9fbdfa3416ffde377072fafd2de44059155c3b44 max Thu Apr 30 06:57:35 2026 -0700

lrSv: add lrSvAll merged track combining all long-read SV subtracks

Variants are merged on exact (chrom, start, end, svType, svLen, insLen).
Per-database AC columns are stored as strings; "unknown" is used where
the source dataset has only placeholder AC values (deCODE, SVatalog 101,
1KG ONT 100). Kim PD Brain is split into affected (PD+ILBD) and healthy
(HC) AC columns. Gustafson contributes sampleCount instead of AC.

Output: 2,694,871 unique SVs from 3,706,100 input rows across 15
subtracks (27% dedup).

The merged track sits as the first subtrack of the lrSv supertrack with
filters on sources, svType, svLen, insLen, maxAF/minAF, AC, and
sourceCount. The trackDb stanza is generated by the build script
directly into human/lrSvAll.ra and pulled in via 'include lrSvAll.ra'
from lrSv.ra, so labels in databases.tsv stay the single source of
truth.

lrSv.html: add a "Disease cases" column to the dataset summary, strip
parenthesized internal track names from the section headers, and
shorten exact SV counts to ~Nk / ~N.NM in the prose.

refs #36642

diff --git src/hg/makeDb/trackDb/human/lrSvAll.ra src/hg/makeDb/trackDb/human/lrSvAll.ra
new file mode 100644
index 00000000000..aa7d9f79a29
--- /dev/null
+++ src/hg/makeDb/trackDb/human/lrSvAll.ra
@@ -0,0 +1,41 @@
+# AUTO-GENERATED by ~/kent/src/hg/makeDb/scripts/lrSv/lrSvMergeAll.py
+# Do not edit by hand - re-run the merge script and re-commit.
+
+    track lrSvAll
+    parent lrSv
+    bigDataUrl /gbdb/$D/lrSv/lrSvAll.bb
+    shortLabel All LR SVs merged
+    longLabel All long-read SVs merged across the lrSv subtracks (exact-position match), with per-database AC
+    type bigBed 9 +
+    itemRgb on
+    visibility pack
+    mouseOver <b>$name</b> ($svType) svLen=$svLen insLen=$insLen sources=$sources AF=$minAF-$maxAF AC=$AC
+    searchIndex name
+    filterValues.sources CoLoRSdb|CoLoRSdb 1427 (PacBio),1000G-ONT-Vienna|1KG ONT Vienna 1019,1000G-ONT|1KG ONT 100 (Gustafson),AoU1K|All of Us 1027 (PacBio),Han945|Han Chinese 945,TommoJapan|ToMMo 333 (Japanese),GA4K|GA4K 502 (rare disease),deCODE|deCODE 3622 (Icelandic),HPRCv2|HPRC v2 233,HGSVC2|HGSVC2 32,HGSVC3|HGSVC3 65,KimPD|Kim PD Brain 100,ArabUAE53|Arab APR 53,China58|CPC 58 (Chinese),Svatalog101|SVatalog 101
+    filterType.sources multipleListOr
+    filterLabel.sources Source Database
+    filterValues.svType DEL,INS,DUP,INV,CPX,MIXED,INSDEL,CNV,BND,TRA,MEI
+    filterType.svType multipleListOr
+    filterLabel.svType SV Type
+    filter.svLen 0:30000000
+    filterByRange.svLen on
+    filterLabel.svLen SV Length (bp)
+    filter.insLen 0:600000
+    filterByRange.insLen on
+    filterLabel.insLen Insertion Length (bp)
+    filter.maxAF 0:1
+    filterByRange.maxAF on
+    filterLimits.maxAF 0:1
+    filterLabel.maxAF Max Allele Frequency (across DBs)
+    filter.minAF 0:1
+    filterByRange.minAF on
+    filterLimits.minAF 0:1
+    filterLabel.minAF Min Allele Frequency (across DBs)
+    filter.AC 0:30000
+    filterByRange.AC on
+    filterLabel.AC Total AC (across DBs)
+    filter.sourceCount 1:15
+    filterByRange.sourceCount on
+    filterLabel.sourceCount Number of Source Databases
+    skipEmptyFields on
+    priority 0
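
Illustration only, not part of the commit: a minimal Python sketch of the exact-key merge described in the commit message. It is not taken from lrSvMergeAll.py. The row layout, the function names merge_exact and dedup_fraction, and the way the total AC is summed are all assumptions made for this example. The Kim PD affected/healthy split and Gustafson's sampleCount column are not modeled.

```python
# Hypothetical sketch of an exact-position merge; NOT the code in lrSvMergeAll.py.
# Assumed input row layout (an assumption for this example):
#   (source, chrom, start, end, svType, svLen, insLen, ac)
# where ac is an int, or None when the source only has placeholder AC values.

def merge_exact(rows):
    """Group rows that share the exact key (chrom, start, end, svType, svLen, insLen)."""
    merged = {}
    for source, chrom, start, end, sv_type, sv_len, ins_len, ac in rows:
        key = (chrom, start, end, sv_type, sv_len, ins_len)
        per_source = merged.setdefault(key, {})
        # Per-database AC is kept as a string; "unknown" marks placeholder values.
        # If one source repeats a key, the later row overwrites the earlier one.
        per_source[source] = "unknown" if ac is None else str(ac)

    records = []
    for key, per_source in merged.items():
        known = [int(v) for v in per_source.values() if v != "unknown"]
        records.append({
            "key": key,
            "sources": sorted(per_source),
            "sourceCount": len(per_source),
            # Assumption: total AC sums only the sources with a known AC.
            "AC": sum(known),
            "perSourceAC": per_source,
        })
    return records


def dedup_fraction(input_rows, unique_rows):
    """Fraction of input rows that collapsed into an already-seen key."""
    return 1.0 - unique_rows / input_rows


if __name__ == "__main__":
    demo = [
        ("CoLoRSdb", "chr1", 100, 200, "DEL", 100, 0, 5),
        ("HGSVC3", "chr1", 100, 200, "DEL", 100, 0, 2),
        ("deCODE", "chr1", 100, 200, "DEL", 100, 0, None),
        ("HGSVC3", "chr1", 100, 201, "DEL", 101, 0, 1),  # differs by 1 bp: not merged
    ]
    for rec in merge_exact(demo):
        print(rec["key"], rec["sourceCount"], rec["AC"], rec["perSourceAC"])
```

Because the key is an exact tuple, two calls that differ by a single base pair stay separate records. With the row counts quoted in the commit message, dedup_fraction(3706100, 2694871) rounds to the stated 27%.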