src/hg/makeDb/scripts/lrSv/databases.tsv 9fbdfa3416ffde377072fafd2de44059155c3b44

9fbdfa3416ffde377072fafd2de44059155c3b44
max
  Thu Apr 30 06:57:35 2026 -0700
lrSv: add lrSvAll merged track combining all long-read SV subtracks

Variants are merged on exact (chrom, start, end, svType, svLen, insLen).
Per-database AC columns are stored as strings; "unknown" is used for
deCODE and SVatalog 101, whose source datasets have only placeholder AC
values. 1KG ONT 100 (Gustafson) also has only placeholder AC values and
contributes sampleCount instead of AC. Kim PD Brain is split into
affected (PD+ILBD) and healthy (HC) AC columns.
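
A minimal sketch of the exact-key merge described above, not the actual
build script. The row layout, the function name merge_exact, and the
sample values are all hypothetical; only the six-field key and the
string-valued per-database columns come from this commit.

```python
# Hypothetical input rows:
# (source key, chrom, start, end, svType, svLen, insLen, value as string)
rows = [
    ("CoLoRSdb", "chr1", 1000, 1050, "DEL", 50, 0, "12"),
    ("HPRCv2",   "chr1", 1000, 1050, "DEL", 50, 0, "3"),
    ("deCODE",   "chr1", 1000, 1050, "DEL", 50, 0, "unknown"),
    # Differs by one base in end/svLen, so it is NOT merged with the rows above.
    ("HPRCv2",   "chr1", 1000, 1051, "DEL", 51, 0, "1"),
]


def merge_exact(rows):
    """Group rows that agree on every field of the six-part key."""
    merged = {}
    for source, chrom, start, end, sv_type, sv_len, ins_len, value in rows:
        key = (chrom, start, end, sv_type, sv_len, ins_len)
        # One string value per contributing source; dicts keep insertion order.
        merged.setdefault(key, {})[source] = value
    return merged


merged = merge_exact(rows)
for key, per_source in merged.items():
    # sourceCount is taken here to be the number of contributing sources.
    print(key, "sources=" + ",".join(per_source), "sourceCount=%d" % len(per_source))
```

Because the key is exact, near-identical calls from different callers
remain separate records; the sketch makes no attempt at fuzzy matching.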

Output: 2,694,871 unique SVs from 3,706,100 input rows across 15
subtracks (27% of input rows collapsed as duplicates). The merged track
is the first subtrack of the lrSv supertrack, with filters on sources,
svType, svLen, insLen, maxAF/minAF, AC, and sourceCount.

The trackDb stanza is generated by the build script directly into
human/lrSvAll.ra and pulled in via 'include lrSvAll.ra' from lrSv.ra,
so labels in databases.tsv stay the single source of truth.
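
As an illustration of how a script might consume databases.tsv, the
sketch below reads the eight columns and resolves the UNKNOWN and SPLIT
modes. It is an assumption about the approach, not the build script
itself: read_databases, eval_sum, db_values, and the sample record are
hypothetical, and the two config rows are copied from the file in this
commit.

```python
COLUMNS = ["key", "label", "bbPath", "valueField", "valueLabel",
           "affField", "healField", "afField"]

# Two rows copied from databases.tsv; trailing empty columns may be absent.
TSV_TEXT = (
    "#key\tlabel\tbbPath\tvalueField\tvalueLabel\taffField\thealField\tafField\n"
    "deCODE\tdeCODE 3,622 (Icelandic)\t/gbdb/hg38/lrSv/decodeSv.bb\tUNKNOWN\tAC\n"
    "KimPD\tKim PD Brain 100\t/gbdb/hg38/lrSv/kwanho.bb\tSPLIT\tAC"
    "\tacPd+acIlbd\tacHc\tafPd,afHc,afIlbd\n"
)


def read_databases(text):
    """Parse config rows, skipping comments and padding short rows."""
    dbs = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        fields += [""] * (len(COLUMNS) - len(fields))
        dbs.append(dict(zip(COLUMNS, fields)))
    return dbs


def eval_sum(expr, record):
    """Evaluate an 'a+b' expression by summing the named integer fields."""
    return sum(int(record[name]) for name in expr.split("+"))


def db_values(db, record):
    """Return the per-database output column(s) as strings."""
    mode = db["valueField"]
    if mode == "UNKNOWN":
        return ["unknown"]
    if mode == "SPLIT":
        return [str(eval_sum(db["affField"], record)),
                str(eval_sum(db["healField"], record))]
    return [str(record[mode])]


dbs = read_databases(TSV_TEXT)
# Hypothetical source record for the Kim PD Brain subtrack.
print(db_values(dbs[1], {"acPd": 4, "acIlbd": 1, "acHc": 2}))
```

Padding short rows keeps the parser tolerant of editors that strip
trailing tabs, so an absent afField reads as "" (no AF available).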

lrSv.html: add a "Disease cases" column to the dataset summary,
strip parenthesized internal track names from the section headers,
and shorten exact SV counts to ~Nk / ~N.NM in the prose.

refs #36642

diff --git src/hg/makeDb/scripts/lrSv/databases.tsv src/hg/makeDb/scripts/lrSv/databases.tsv
new file mode 100644
index 00000000000..1f392ca0093
--- /dev/null
+++ src/hg/makeDb/scripts/lrSv/databases.tsv
@@ -0,0 +1,32 @@
+# Database configuration for lrSvAll combined long-read SV track
+# Columns:
+#   key            - short key used as field-name prefix and in `sources` list
+#   label          - human-readable label shown in filter dropdown and detail page
+#   bbPath         - path to the source bigBed file (under /gbdb/hg38/lrSv/)
+#   valueField     - autoSql field name to extract for the per-db value column.
+#                    Special values:
+#                      UNKNOWN  - emit literal "unknown" (db has no real AC)
+#                      SPLIT    - use affField/healField for two output columns
+#   valueLabel     - text shown after the dataset name on the detail page
+#                    (typically "AC", but "samples" for gustafson)
+#   affField       - SPLIT mode: expression for affected AC (supports "a+b")
+#   healField      - SPLIT mode: expression for healthy AC
+#   afField        - autoSql field name(s) to use for max-AF aggregation,
+#                    comma-separated. "" means no AF available.
+# Order here = order of per-db columns in the output bigBed.
+#key	label	bbPath	valueField	valueLabel	affField	healField	afField
+CoLoRSdb	CoLoRSdb 1,427 (PacBio)	/gbdb/hg38/lrSv/colorsDb/sv.hg38.bb	AC	AC			AF
+1000G-ONT-Vienna	1KG ONT Vienna 1,019	/gbdb/hg38/lrSv/1kgOnt.bb	AC	AC			alleleFreq
+1000G-ONT	1KG ONT 100 (Gustafson)	/gbdb/hg38/lrSv/gustafson.bb	sampleCount	samples
+AoU1K	All of Us 1,027 (PacBio)	/gbdb/hg38/lrSv/aou1k.bb	AC	AC			afAfr,afAmr,afEas,afEur,afSas
+Han945	Han Chinese 945	/gbdb/hg38/lrSv/han945.bb	AC	AC			alleleFreq
+TommoJapan	ToMMo 333 (Japanese)	/gbdb/hg38/lrSv/tommoJp.bb	AC	AC			alleleFreq
+GA4K	GA4K 502 (rare disease)	/gbdb/hg38/lrSv/ga4kSv.bb	AC	AC			alleleFreq
+deCODE	deCODE 3,622 (Icelandic)	/gbdb/hg38/lrSv/decodeSv.bb	UNKNOWN	AC
+HPRCv2	HPRC v2 233	/gbdb/hg38/lrSv/hprc2.bb	AC	AC			alleleFreq
+HGSVC2	HGSVC2 32	/gbdb/hg38/lrSv/hgsvc2.bb	AC	AC
+HGSVC3	HGSVC3 65	/gbdb/hg38/lrSv/hgsvc3.bb	AC	AC
+KimPD	Kim PD Brain 100	/gbdb/hg38/lrSv/kwanho.bb	SPLIT	AC	acPd+acIlbd	acHc	afPd,afHc,afIlbd
+ArabUAE53	Arab APR 53	/gbdb/hg38/lrSv/apr.bb	AC	AC			alleleFreq
+China58	CPC 58 (Chinese)	/gbdb/hg38/lrSv/cpc1.bb	AC	AC			alleleFreq
+Svatalog101	SVatalog 101	/gbdb/hg38/lrSv/chirmade101.bb	UNKNOWN	AC