9eb4e0937782954c19d664e7d384d210bffb3b25 max Sat Jun 13 16:01:42 2026 -0700

lrSv: QA fixes from Lou's review - dedup, shared color palette, deCODE/AoU cleanup

- Drop kwanhoSv (KimPD) from the lrSvAll merge in databases.tsv; it stays on dev/alpha until published, which also removes its >5 Mb breakend artifacts from the merged track.
- Remove searchIndex from colorsDbSv, lrSv1kLin and lrSvAll (and the merge generator): the bigBeds were built without a name index, so by-name search never worked.
- Single shared per-SV-type color palette in lrSvCommon.py (svColor), used by every converter and the merge. CPX is purple everywhere (was orange in 1kgOnt/apr/cpc1, colliding with INV's orange), colorsDb DEL is 200,0,0 like the rest, and TRA/INSDEL get their own colors.
- deCODE: drop byte-identical duplicate rows and blank the fake AC=50 placeholder (AC is now a string field, omitted from the name and mouseOver).
- AoU: numeric-entity-encode non-ASCII gene/trait text and drop duplicate rows.
- gustafson, chirmade101, hprc2v21: drop byte-identical duplicate rows.
- lrSvMergeAll.py: skip byte-identical duplicate source rows instead of summing their allele counts, which had inflated the per-database and total AC.

refs #36258

diff --git src/hg/makeDb/scripts/lrSv/databases.tsv src/hg/makeDb/scripts/lrSv/databases.tsv
index a22cff440af..eddde9e315a 100644
--- src/hg/makeDb/scripts/lrSv/databases.tsv
+++ src/hg/makeDb/scripts/lrSv/databases.tsv
@@ -1,32 +1,33 @@
 # Database configuration for lrSvAll combined long-read SV track
 # Columns:
 #   key - short key used as field-name prefix and in `sources` list
 #   label - human-readable label shown in filter dropdown and detail page
 #   bbPath - path to the source bigBed file (under /gbdb/hg38/lrSv/)
 #   valueField - autoSql field name to extract for the per-db value column.
 #     Special values:
 #       UNKNOWN - emit literal "unknown" (db has no real AC)
 #       SPLIT - use affField/healField for two output columns
 #   valueLabel - text shown after the dataset name on the detail page
 #     (typically "AC", but "samples" for gustafson)
 #   affField - SPLIT mode: expression for affected AC (supports "a+b")
 #   healField - SPLIT mode: expression for healthy AC
 #   afField - autoSql field name(s) to use for max-AF aggregation,
 #     comma-separated. "" means no AF available.
 # Order here = order of per-db columns in the output bigBed.
 #key	label	bbPath	valueField	valueLabel	affField	healField	afField
 CoLoRSdb	CoLoRSdb 1,427 (PacBio)	/gbdb/hg38/lrSv/colorsDb/sv.hg38.bb	AC	AC			AF
 1000G-ONT-Vienna	1KG ONT Vienna 1,019	/gbdb/hg38/lrSv/1kgOnt.bb	AC	AC			alleleFreq
 1000G-ONT	1KG ONT 100 (Gustafson)	/gbdb/hg38/lrSv/gustafson.bb	sampleCount	samples
 AoU1K	All of Us 1,027 (PacBio)	/gbdb/hg38/lrSv/aou1k.bb	AC	AC			afAfr,afAmr,afEas,afEur,afSas
 Han945	Han Chinese 945	/gbdb/hg38/lrSv/han945.bb	AC	AC			alleleFreq
 TommoJapan	ToMMo 333 (Japanese)	/gbdb/hg38/lrSv/tommoJp.bb	AC	AC			alleleFreq
 GA4K	GA4K 502 (rare disease)	/gbdb/hg38/lrSv/ga4kSv.bb	AC	AC			alleleFreq
 deCODE	deCODE 3,622 (Icelandic)	/gbdb/hg38/lrSv/decodeSv.bb	UNKNOWN	AC
 HPRCv2.1	HPRC v2.1 233	/gbdb/hg38/lrSv/hprc2v21.bb	AC	AC			alleleFreq
 HGSVC2	HGSVC2 32	/gbdb/hg38/lrSv/hgsvc2.bb	AC	AC
 HGSVC3	HGSVC3 65	/gbdb/hg38/lrSv/hgsvc3.bb	AC	AC
-KimPD	Kim PD Brain 100	/gbdb/hg38/lrSv/kwanho.bb	SPLIT	AC	acPd+acIlbd	acHc	afPd,afHc,afIlbd
+# KimPD (Kim PD Brain 100) is held on dev/alpha until published; it has
+# breakend artifacts up to 190 Mb, so it is excluded from the lrSvAll merge.
 ArabUAE53	Arab APR 53	/gbdb/hg38/lrSv/apr.bb	AC	AC			alleleFreq
 China58	CPC 58 (Chinese)	/gbdb/hg38/lrSv/cpc1.bb	AC	AC			alleleFreq
 Svatalog101	SVatalog 101	/gbdb/hg38/lrSv/chirmade101.bb	UNKNOWN	AC
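The lrSvMergeAll.py bullet in the commit message says byte-identical duplicate source rows are now skipped rather than having their allele counts summed. A minimal sketch of that idea, assuming each source row arrives as a hypothetical `(dbKey, svKey, rawLine, ac)` tuple (the real record layout in lrSvMergeAll.py is not part of this diff), could look like this:

```python
from collections import defaultdict


def merge_allele_counts(rows):
    """Sum AC per (SV, database) and per SV, skipping byte-identical duplicate rows.

    Hypothetical sketch: `rows` is an iterable of (dbKey, svKey, rawLine, ac)
    tuples, where rawLine is the unparsed source line and ac is an int.
    """
    seen = set()                 # (dbKey, rawLine) pairs already counted
    perDb = defaultdict(int)     # (svKey, dbKey) -> summed AC
    for dbKey, svKey, rawLine, ac in rows:
        if (dbKey, rawLine) in seen:
            continue             # exact duplicate from the same source: count it once
        seen.add((dbKey, rawLine))
        perDb[(svKey, dbKey)] += ac
    totals = defaultdict(int)    # svKey -> AC summed over all databases
    for (svKey, _dbKey), ac in perDb.items():
        totals[svKey] += ac
    return dict(perDb), dict(totals)
```

For example, if deCODE emits the same line twice with AC 3 and AoU1K contributes AC 2, the sketch reports 3 for deCODE and a total of 5; without the `seen` check deCODE would count 6, the kind of inflation the commit describes. Keying on `(dbKey, rawLine)` only collapses exact repeats within one source, so distinct records that merge into the same SV still add up.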
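The databases.tsv header defines three modes for the per-db value column: a plain autoSql field name, the literal UNKNOWN, and SPLIT with affField/healField expressions such as `acPd+acIlbd`. The following is a hedged sketch of how a merge script might interpret one config row. It assumes `record` is a dict from autoSql field name to value and that `+` is the only supported operator (the header only promises "a+b"); `per_db_values` and `eval_field_expr` are hypothetical names, not functions from the lrSv scripts.

```python
def eval_field_expr(expr, record):
    """Evaluate an affField/healField expression like "acPd+acIlbd".

    Assumption: terms are joined only by "+" and each names an integer field.
    """
    return sum(int(record[name.strip()]) for name in expr.split("+"))


def per_db_values(cfg, record):
    """Return the output column value(s) for one database config row.

    cfg holds the databases.tsv columns by name (valueField, affField, healField).
    """
    mode = cfg["valueField"]
    if mode == "UNKNOWN":
        return ["unknown"]                       # database has no real AC
    if mode == "SPLIT":
        return [str(eval_field_expr(cfg["affField"], record)),   # affected AC
                str(eval_field_expr(cfg["healField"], record))]  # healthy AC
    return [str(record[mode])]                   # plain field, e.g. AC or sampleCount
```

With the removed KimPD row's settings (`affField` = `acPd+acIlbd`, `healField` = `acHc`), a record with acPd=4, acIlbd=1 and acHc=2 yields the two columns `5` and `2`.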
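The afField column lists one or more AF fields per database (AoU1K lists five ancestry-specific AFs), with "" meaning no AF is available. A minimal sketch of the max-AF aggregation for one record, assuming missing or non-numeric values such as "" or "." should simply be ignored (the actual handling in lrSvMergeAll.py is not shown here), is:

```python
import math


def max_af(af_field_spec, record):
    """Return the largest AF among the comma-separated field names, or None.

    Assumptions: an empty spec means no AF; empty, ".", non-numeric or NaN
    values are skipped rather than treated as 0.
    """
    best = None
    for name in (n.strip() for n in af_field_spec.split(",")):
        if not name:
            continue
        try:
            value = float(record.get(name, ""))
        except (TypeError, ValueError):
            continue                 # missing or non-numeric AF
        if math.isnan(value):
            continue
        if best is None or value > best:
            best = value
    return best
```

Returning `None` instead of `0.0` keeps "no AF available" distinct from an AF of zero, so a caller taking the maximum across databases can skip sources such as deCODE that carry no AF.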
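For the AoU bullet ("numeric-entity-encode non-ASCII gene/trait text"), one standard-library way to get decimal numeric character references in Python is the built-in `xmlcharrefreplace` error handler. This is a sketch of the technique only; the commit does not show which approach the AoU converter actually uses, and `entity_encode` is a hypothetical helper name.

```python
def entity_encode(text):
    """Replace each non-ASCII character with a decimal entity such as "&#233;".

    ASCII characters pass through unchanged, so the result is ASCII-only.
    """
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
```

For example, `entity_encode("Crohn\u2019s")` returns `Crohn&#8217;s`, and plain ASCII text such as a gene symbol is returned unchanged.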