9eb4e0937782954c19d664e7d384d210bffb3b25
max
  Sat Jun 13 16:01:42 2026 -0700
lrSv: QA fixes from Lou's review - dedup, shared color palette, deCODE/AoU cleanup

- Drop kwanhoSv (KimPD) from the lrSvAll merge in databases.tsv; it stays on
dev/alpha until published. Dropping it also removes its >5 Mb breakend
artifacts from the merged track.
- Remove searchIndex from colorsDbSv, lrSv1kLin and lrSvAll (and the merge
generator): the bigBeds were built without a name index, so by-name search
never worked.
- Single shared per-SV-type color palette in lrSvCommon.py (svColor), used by
every converter and the merge. CPX is purple everywhere (was orange in
1kgOnt/apr/cpc1, colliding with INV's orange), colorsDb DEL is 200,0,0 like
the rest, and TRA/INSDEL get their own colors.
- deCODE: drop byte-identical duplicate rows and blank the fake AC=50
placeholder (AC is now a string field, omitted from the name and mouseOver).
- AoU: numeric-entity-encode non-ASCII gene/trait text and drop duplicate rows.
- gustafson, chirmade101, hprc2v21: drop byte-identical duplicate rows.
- lrSvMergeAll.py: skip byte-identical duplicate source rows instead of summing
their allele counts, which had inflated the per-database and total AC.

refs #36258

diff --git src/hg/makeDb/doc/hs1/lrSv.txt src/hg/makeDb/doc/hs1/lrSv.txt
index 20c594d0a48..d2269d7ebcb 100644
--- src/hg/makeDb/doc/hs1/lrSv.txt
+++ src/hg/makeDb/doc/hs1/lrSv.txt
@@ -67,15 +67,27 @@
 # (chr1..chrY) already match the hs1 assembly, so no renaming is needed.
 
 mkdir -p /hive/data/genomes/hs1/bed/lrSv/hprc2v21
 cd /hive/data/genomes/hs1/bed/lrSv/hprc2v21
 
 # VCF provided by Glenn Hickey (HPRC graph team):
 wget https://public.gi.ucsc.edu/~ghickey/debug/hprc-v2.1-mc-chm13.gref95.ro.vcf.gz
 
 python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvHprc2RoVcfToBed.py \
     hprc-v2.1-mc-chm13.gref95.ro.vcf.gz hprc2v21.bed
 # kept 608435 SV-sized alleles: 363310 INS, 245125 DEL, 0 CPX
 # (75809 at nested snarl levels LV>0)
 bedSort hprc2v21.bed hprc2v21.sorted.bed
 bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSvHprc2Ro.as \
     -tab hprc2v21.sorted.bed /hive/data/genomes/hs1/chrom.sizes hprc2v21.bb
+
+##########
+# 2026-06-13 Claude max
+#
+# QA fixes (refs #36258), hs1 side. See doc/hg38/lrSv.txt for the full writeup.
+# The converters now drop byte-identical duplicate rows, so re-running the
+# hs1 builds above gives the new counts:
+#   hprc2v21 hs1  608,435 -> 541,176  (67,259 duplicates dropped)
+# colorsDb hs1 and 1kgOnt hs1 were rebuilt from source for the shared color
+# palette (CPX purple, colorsDb DEL 200,0,0); apr hs1 and cpc1 hs1 only needed
+# the CPX color remapped, so their served bigBeds were recolored in place the
+# same way as the hg38 files (see doc/hg38/lrSv.txt). There is no hs1 lrSvAll.