bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700

lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both
the lrSv and srSv supertracks using the "CODE|CODE (Long name)"
filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)",
etc. Labels keep the short code up front so users can match what
hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under
scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers,
consistent insLen / svLen / AC column naming, tightened
field-description text) that had been piling up as an unstaged working
tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/aou1kSv.html src/hg/makeDb/trackDb/human/aou1kSv.html
index bafede33425..523c2be0439 100644
--- src/hg/makeDb/trackDb/human/aou1kSv.html
+++ src/hg/makeDb/trackDb/human/aou1kSv.html
@@ -38,43 +38,62 @@
 <li><b>Regulatory Elements</b>: intersected regulatory elements (e.g. enhancer, promoter)</li>
 <li><b>Other LR Datasets</b>: whether the SV was also detected in HPRC, HGSVC, or 1KG-ONT long-read datasets</li>
 <li><b>eQTLs</b>: expression QTL associations with q-values</li>
 <li><b>GWAS Associations</b>: overlapping GWAS loci with trait, gene, rsID, and LD information</li>
 <li><b>SV-Trait Associations</b>: associations with clinical phenotypes from AoU electronic health records, including odds ratios and confidence intervals</li>
 </ul>
 </p>
 <h2>Methods</h2>
 <p>
-PacBio HiFi long-read sequencing was performed on 1,027 AoU participants
-self-identifying as Black or African American, at a median coverage of ~8x.
-SV calling was performed using a cohort-level pipeline, producing calls for
-insertions and deletions. Allele frequencies were computed separately for
-five ancestry groups. SVs were annotated with gene intersections from OMIM,
-disease gene panels, cancer gene lists, and ACMG actionable genes, along
-with regulatory element overlaps and segmental duplication associations.
+Garimella et al. 2025 performed PacBio HiFi long-read sequencing on 1,027
+All of Us participants self-identifying as Black or African American, to
+~8x per-sample coverage at HudsonAlpha Discovery. SVs (≥50 bp) were
+called per sample with an ensemble of three methods: two alignment-based
+callers, <a href="https://github.com/PacificBiosciences/pbsv" target="_blank">
+PBSV</a> v2.6.0 (with Tandem Repeat Finder context) and
+<a href="https://github.com/fritzsedlazeck/Sniffles" target="_blank">Sniffles2</a>
+v2.0.6, plus the assembly-based <a href="https://github.com/EichlerLab/pav"
+target="_blank">PAV</a> v1.2.1 (hifiasm haplotype-resolved contigs aligned
+to GRCh38 with minimap2 <tt>-x asm20</tt>). Per-caller VCFs were normalized,
+merged within and across samples and filtered into stringent and lenient
+tiers, and the callset was re-genotyped across the cohort to produce the
+final release: 541,049 autosomal SVs (444,524 insertions, 96,525 deletions)
+with per-ancestry allele frequencies (AFR, AMR, EAS, EUR, SAS) and gene,
+regulatory, eQTL, GWAS and EHR-phenotype annotations.
 </p>
 <p>
-A scalable imputation workflow was developed to impute over 750,000 SVs into
-existing short-read whole-genome sequencing datasets. SV-trait associations
-were tested in 848 AoU participants with matched electronic health records,
-identifying 291 significant associations across 226 conditions.
+This track was built from the supplementary media-2 table of the AoU
+long-read sequencing preprint
+(<a href="https://doi.org/10.1101/2025.10.02.25336942" target="_blank">
+doi:10.1101/2025.10.02.25336942</a>). Access to the underlying AoU
+long-read data requires registration through the
+<a href="https://www.researchallofus.org/" target="_blank">All of Us
+Research Hub</a>.
+</p>
+<p>
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 <h2>Data Access</h2>
 <p>
 This track was built from supplementary data (media-2) of the AoU long-read sequencing preprint. Access to the full AoU dataset requires registration through the <a href="https://www.researchallofus.org/" target="_blank">All of Us Research Hub</a>.
 </p>
 <h2>Credits</h2>
 <p>
 Thanks to Garimella et al. and the All of Us Research Program for making their structural variant annotations publicly available.
 </p>