9eb4e0937782954c19d664e7d384d210bffb3b25 max Sat Jun 13 16:01:42 2026 -0700
lrSv: QA fixes from Lou's review - dedup, shared color palette, deCODE/AoU cleanup

- Drop kwanhoSv (KimPD) from the lrSvAll merge in databases.tsv; it stays on
  dev/alpha until published, which also removes its >5 Mb breakend artifacts
  from the merged track.
- Remove searchIndex from colorsDbSv, lrSv1kLin and lrSvAll (and the merge
  generator): the bigBeds were built without a name index, so by-name search
  never worked.
- Single shared per-SV-type color palette in lrSvCommon.py (svColor), used by
  every converter and the merge. CPX is purple everywhere (was orange in
  1kgOnt/apr/cpc1, colliding with INV's orange), colorsDb DEL is 200,0,0 like
  the rest, and TRA/INSDEL get their own colors.
- deCODE: drop byte-identical duplicate rows and blank the fake AC=50
  placeholder (AC is now a string field, omitted from the name and mouseOver).
- AoU: numeric-entity-encode non-ASCII gene/trait text and drop duplicate rows.
- gustafson, chirmade101, hprc2v21: drop byte-identical duplicate rows.
- lrSvMergeAll.py: skip byte-identical duplicate source rows instead of summing
  their allele counts, which had inflated the per-database and total AC.

refs #36258

diff --git src/hg/makeDb/doc/hs1/lrSv.txt src/hg/makeDb/doc/hs1/lrSv.txt
index 20c594d0a48..d2269d7ebcb 100644
--- src/hg/makeDb/doc/hs1/lrSv.txt
+++ src/hg/makeDb/doc/hs1/lrSv.txt
@@ -67,15 +67,27 @@ # (chr1..chrY) already match the hs1 assembly, so no renaming is needed.
 mkdir -p /hive/data/genomes/hs1/bed/lrSv/hprc2v21
 cd /hive/data/genomes/hs1/bed/lrSv/hprc2v21
 # VCF provided by Glenn Hickey (HPRC graph team):
 wget https://public.gi.ucsc.edu/~ghickey/debug/hprc-v2.1-mc-chm13.gref95.ro.vcf.gz
 python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvHprc2RoVcfToBed.py \
     hprc-v2.1-mc-chm13.gref95.ro.vcf.gz hprc2v21.bed
 # kept 608435 SV-sized alleles: 363310 INS, 245125 DEL, 0 CPX
 # (75809 at nested snarl levels LV>0)
 bedSort hprc2v21.bed hprc2v21.sorted.bed
 bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSvHprc2Ro.as \
     -tab hprc2v21.sorted.bed /hive/data/genomes/hs1/chrom.sizes hprc2v21.bb
+
+##########
+# 2026-06-13 Claude max
+#
+# QA fixes (refs #36258), hs1 side. See doc/hg38/lrSv.txt for the full writeup.
+# The converters now drop byte-identical duplicate rows, so re-running the
+# hs1 builds above gives the new counts:
+#   hprc2v21 hs1 608,435 -> 541,176 (67,259 duplicates dropped)
+# colorsDb hs1 and 1kgOnt hs1 were rebuilt from source for the shared color
+# palette (CPX purple, colorsDb DEL 200,0,0); apr hs1 and cpc1 hs1 only needed
+# the CPX color remapped, so their served bigBeds were recolored in place the
+# same way as the hg38 files (see doc/hg38/lrSv.txt). There is no hs1 lrSvAll.
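
For readers unfamiliar with what "drop byte-identical duplicate rows" means in practice, here is a minimal sketch of that kind of filter. It is not the code in lrSvHprc2RoVcfToBed.py or lrSvMergeAll.py; the function name drop_duplicate_rows and the sample rows are hypothetical, and it assumes a row counts as a duplicate only when the whole line matches an earlier one exactly.

```python
def drop_duplicate_rows(lines):
    """Yield each line once, in first-seen order.

    Hypothetical helper: a line is dropped only if it is identical,
    character for character, to a line already seen.
    """
    seen = set()
    for line in lines:
        if line in seen:
            continue
        seen.add(line)
        yield line


# Made-up BED-like rows, for illustration only.
rows = [
    "chr1\t100\t200\tDEL_1\n",
    "chr1\t100\t200\tDEL_1\n",  # exact copy of the first row: dropped
    "chr1\t100\t200\tDEL_2\n",  # same coordinates, different name: kept
]
kept = list(drop_duplicate_rows(rows))
dropped = len(rows) - len(kept)
print(f"kept {len(kept)}, dropped {dropped}")
```

Rows that share coordinates but differ in any field are kept, so a filter like this only removes exact repeats. The hprc2v21 counts reported above are consistent with that bookkeeping: 608,435 - 541,176 = 67,259.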