bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700

lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both the lrSv and srSv supertracks using the "CODE|CODE (Long name)" filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)", etc. Labels keep the short code up front so users can match what hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers, consistent insLen / svLen / AC column naming, tightened field-description text) that had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/aou1kSv.html src/hg/makeDb/trackDb/human/aou1kSv.html
index bafede33425..523c2be0439 100644
--- src/hg/makeDb/trackDb/human/aou1kSv.html
+++ src/hg/makeDb/trackDb/human/aou1kSv.html
@@ -38,43 +38,62 @@
  • Regulatory Elements: intersected regulatory elements (e.g. enhancer, promoter)
  • Other LR Datasets: whether the SV was also detected in HPRC, HGSVC, or 1KG-ONT long-read datasets
  • eQTLs: expression QTL associations with q-values
  • GWAS Associations: overlapping GWAS loci with trait, gene, rsID, and LD information
  • SV-Trait Associations: associations with clinical phenotypes from AoU electronic health records, including odds ratios and confidence intervals
  • Methods

    -PacBio HiFi long-read sequencing was performed on 1,027 AoU participants
    -self-identifying as Black or African American, at a median coverage of ~8x.
    -SV calling was performed using a cohort-level pipeline, producing calls for
    -insertions and deletions. Allele frequencies were computed separately for
    -five ancestry groups. SVs were annotated with gene intersections from OMIM,
    -disease gene panels, cancer gene lists, and ACMG actionable genes, along
    -with regulatory element overlaps and segmental duplication associations.
    +Garimella et al. 2025 performed PacBio HiFi long-read sequencing on 1,027
    +All of Us participants self-identifying as Black or African American, to
    +~8x per-sample coverage at HudsonAlpha Discovery. SVs (≥50 bp) were
    +called per sample with an ensemble of three methods: two alignment-based
    +callers,
    +PBSV v2.6.0 (with Tandem Repeat Finder context) and
    +Sniffles2
    +v2.0.6, plus the assembly-based PAV v1.2.1 (hifiasm haplotype-resolved contigs aligned
    +to GRCh38 with minimap2 -x asm20). Per-caller VCFs were normalized,
    +merged within and across samples and filtered into stringent and lenient
    +tiers, and the callset was re-genotyped across the cohort to produce the
    +final release: 541,049 autosomal SVs (444,524 insertions, 96,525 deletions)
    +with per-ancestry allele frequencies (AFR, AMR, EAS, EUR, SAS) and gene,
    +regulatory, eQTL, GWAS and EHR-phenotype annotations.

    -A scalable imputation workflow was developed to impute over 750,000 SVs into
    -existing short-read whole-genome sequencing datasets. SV-trait associations
    -were tested in 848 AoU participants with matched electronic health records,
    -identifying 291 significant associations across 226 conditions.
    +This track was built from the supplementary media-2 table of the AoU
    +long-read sequencing preprint
    +(
    +doi:10.1101/2025.10.02.25336942). Access to the underlying AoU
    +long-read data requires registration through the
    +All of Us
    +Research Hub.
    +

    +

    +The step-by-step build commands (download, format conversion, bigBed build)
    +are recorded in the UCSC makeDoc for this track container:
    +
    +doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in
    +
    +makeDb/scripts/lrSv.

    Data Access

    This track was built from supplementary data (media-2) of the AoU long-read sequencing preprint. Access to the full AoU dataset requires registration through the All of Us Research Hub.

    Credits

    Thanks to Garimella et al. and the All of Us Research Program for making their structural variant annotations publicly available.
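The "CODE|CODE (Long name)" filterValues syntax described in the commit message can be illustrated with a short Python sketch. Python is used because the lrSvCommon.py helpers suggest the converter scripts are written in it. This is a minimal, hypothetical illustration and not the code from this commit. The helper names build_filter_values and filter_values_line and the SV_TYPE_NAMES table are assumptions. Only the DEL and INS labels come from the commit message. The sketch also assumes the setting is written as filterValues.svType in trackDb.

```python
"""Hypothetical sketch: build a trackDb filterValues line in the
"CODE|CODE (Long name)" form described in the commit message."""

# Assumed label table; only DEL and INS are named in the commit message.
SV_TYPE_NAMES = {
    "DEL": "Deletion",
    "INS": "Insertion",
}


def build_filter_values(names):
    """Return comma-separated "CODE|CODE (Long name)" entries.

    Assumption: the text before "|" is the value matched against the
    bigBed field, and the text after "|" is the label shown to users.
    The short code is kept first in the label so it matches what
    hgTracks prints next to each feature.
    """
    return ",".join(f"{code}|{code} ({name})" for code, name in names.items())


def filter_values_line(field, names):
    """Return a full (assumed) trackDb setting line for one field."""
    return f"filterValues.{field} {build_filter_values(names)}"


if __name__ == "__main__":
    # Prints: filterValues.svType DEL|DEL (Deletion),INS|INS (Insertion)
    print(filter_values_line("svType", SV_TYPE_NAMES))
```

Putting the short code first in each label keeps the filter dropdown consistent with the code hgTracks shows beside each feature, as the commit message explains. Adding further SV types would only require new dictionary entries.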