9eb4e0937782954c19d664e7d384d210bffb3b25
max
  Sat Jun 13 16:01:42 2026 -0700
lrSv: QA fixes from Lou's review - dedup, shared color palette, deCODE/AoU cleanup

- Drop kwanhoSv (KimPD) from the lrSvAll merge in databases.tsv; it stays on
dev/alpha until published. Dropping it also removes its >5 Mb breakend
artifacts from the merged track.
- Remove searchIndex from colorsDbSv, lrSv1kLin and lrSvAll (and the merge
generator): the bigBeds were built without a name index, so by-name search
never worked.
- Single shared per-SV-type color palette in lrSvCommon.py (svColor), used by
every converter and the merge. CPX is purple everywhere (was orange in
1kgOnt/apr/cpc1, colliding with INV's orange), colorsDb DEL is 200,0,0 like
the rest, and TRA/INSDEL get their own colors.
- deCODE: drop byte-identical duplicate rows and blank the fake AC=50
placeholder (AC is now a string field, omitted from the name and mouseOver).
- AoU: numeric-entity-encode non-ASCII gene/trait text and drop duplicate rows.
- gustafson, chirmade101, hprc2v21: drop byte-identical duplicate rows.
- lrSvMergeAll.py: skip byte-identical duplicate source rows instead of summing
their allele counts, which had inflated the per-database and total AC.
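
For illustration only, the duplicate-skipping idea in the last item can be
sketched as below. This is not the code in lrSvMergeAll.py: the function name
sum_ac_skip_duplicates, the tab-separated row layout and the ac_col index are
all assumptions made for the example.

```python
def sum_ac_skip_duplicates(lines, ac_col=4):
    """Sum allele counts over rows, counting each byte-identical row once.

    lines:  iterable of tab-separated text rows (hypothetical layout)
    ac_col: 0-based index of the allele-count column (assumed, not from the source)
    """
    seen = set()   # raw rows already counted
    total = 0
    for line in lines:
        if line in seen:
            # Byte-identical duplicate: skip it instead of adding its AC again.
            continue
        seen.add(line)
        total += int(line.rstrip("\n").split("\t")[ac_col])
    return total


rows = [
    "chr1\t100\t200\tDEL\t3",
    "chr1\t100\t200\tDEL\t3",  # exact duplicate of the first row
    "chr1\t100\t200\tDEL\t2",  # same variant, different AC: still counted
]
print(sum_ac_skip_duplicates(rows))
```

Only rows that match byte for byte are skipped; rows that share coordinates
but differ in any field are still summed, which is the behavior the item
above describes.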

refs #36258

diff --git src/hg/makeDb/scripts/lrSv/databases.tsv src/hg/makeDb/scripts/lrSv/databases.tsv
index a22cff440af..eddde9e315a 100644
--- src/hg/makeDb/scripts/lrSv/databases.tsv
+++ src/hg/makeDb/scripts/lrSv/databases.tsv
@@ -1,32 +1,33 @@
 # Database configuration for lrSvAll combined long-read SV track
 # Columns:
 #   key            - short key used as field-name prefix and in `sources` list
 #   label          - human-readable label shown in filter dropdown and detail page
 #   bbPath         - path to the source bigBed file (under /gbdb/hg38/lrSv/)
 #   valueField     - autoSql field name to extract for the per-db value column.
 #                    Special values:
 #                      UNKNOWN  - emit literal "unknown" (db has no real AC)
 #                      SPLIT    - use affField/healField for two output columns
 #   valueLabel     - text shown after the dataset name on the detail page
 #                    (typically "AC", but "samples" for gustafson)
 #   affField       - SPLIT mode: expression for affected AC (supports "a+b")
 #   healField      - SPLIT mode: expression for healthy AC
 #   afField        - autoSql field name(s) to use for max-AF aggregation,
 #                    comma-separated. "" means no AF available.
 # Order here = order of per-db columns in the output bigBed.
 #key	label	bbPath	valueField	valueLabel	affField	healField	afField
 CoLoRSdb	CoLoRSdb 1,427 (PacBio)	/gbdb/hg38/lrSv/colorsDb/sv.hg38.bb	AC	AC			AF
 1000G-ONT-Vienna	1KG ONT Vienna 1,019	/gbdb/hg38/lrSv/1kgOnt.bb	AC	AC			alleleFreq
 1000G-ONT	1KG ONT 100 (Gustafson)	/gbdb/hg38/lrSv/gustafson.bb	sampleCount	samples
 AoU1K	All of Us 1,027 (PacBio)	/gbdb/hg38/lrSv/aou1k.bb	AC	AC			afAfr,afAmr,afEas,afEur,afSas
 Han945	Han Chinese 945	/gbdb/hg38/lrSv/han945.bb	AC	AC			alleleFreq
 TommoJapan	ToMMo 333 (Japanese)	/gbdb/hg38/lrSv/tommoJp.bb	AC	AC			alleleFreq
 GA4K	GA4K 502 (rare disease)	/gbdb/hg38/lrSv/ga4kSv.bb	AC	AC			alleleFreq
 deCODE	deCODE 3,622 (Icelandic)	/gbdb/hg38/lrSv/decodeSv.bb	UNKNOWN	AC
 HPRCv2.1	HPRC v2.1 233	/gbdb/hg38/lrSv/hprc2v21.bb	AC	AC			alleleFreq
 HGSVC2	HGSVC2 32	/gbdb/hg38/lrSv/hgsvc2.bb	AC	AC
 HGSVC3	HGSVC3 65	/gbdb/hg38/lrSv/hgsvc3.bb	AC	AC
-KimPD	Kim PD Brain 100	/gbdb/hg38/lrSv/kwanho.bb	SPLIT	AC	acPd+acIlbd	acHc	afPd,afHc,afIlbd
+# KimPD (Kim PD Brain 100) is held on dev/alpha until published; it has
+# breakend artifacts up to 190 Mb, so it is excluded from the lrSvAll merge.
 ArabUAE53	Arab APR 53	/gbdb/hg38/lrSv/apr.bb	AC	AC			alleleFreq
 China58	CPC 58 (Chinese)	/gbdb/hg38/lrSv/cpc1.bb	AC	AC			alleleFreq
 Svatalog101	SVatalog 101	/gbdb/hg38/lrSv/chirmade101.bb	UNKNOWN	AC