7594507ca126d5242346787e42e13c52ea7709b1 max Fri Apr 17 08:40:31 2026 -0700

Add lrSv supertrack: long-read structural variants from 9 studies (hg38).

#Preview2 week - bugs introduced now will need a build patch to fix

Sub-tracks (all bigBed 9+):
han945Sv - 945 Han Chinese, ONT (Gong 2025, PMID 39929826)
lrSv1kgOnt - 1019 1000 Genomes, ONT, SVAN-annotated (Schloissnig 2025, PMID 40702182; lifted from hs1)
tommoJpSv - 333 Japanese (111 trios), ONT (Otsuki 2022, PMID 36127505)
aou1kSv - 1027 All of Us, PacBio HiFi (Garimella 2025, PMID 41256123)
ga4kSv - 502 GA4K pediatric rare disease, PacBio HiFi (Cohen 2022, PMID 35305867)
decodeSv - 3622 Icelanders, ONT (Beyter 2021, PMID 33972781)
hgsvc3Sv - 65 HGSVC3 diverse haplotype-resolved assemblies, HiFi+ONT (Logsdon 2025, PMID 40702183; merges insdel+inv tables)
kwanhoSv - 100 post-mortem brains (PD/ILBD/HC), PacBio HiFi (Kim 2026, PMID 41929179)
chirmade101Sv - 101 long-read WGS GWAS SVatalog cohort (Chirmade 2026, PMID 41203876)

Includes per-track conversion scripts and autoSql under scripts/lrSv/, the supertrack summary table in lrSv.html, and a consolidated makeDoc at doc/hg38/lrSv.txt.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/ga4kSv.html src/hg/makeDb/trackDb/human/ga4kSv.html
new file mode 100644
index 00000000000..0a4b668a46e
--- /dev/null
+++ src/hg/makeDb/trackDb/human/ga4kSv.html
@@ -0,0 +1,103 @@
+

Description

+

+This track shows structural variants (SVs) identified by PacBio HiFi long-read +sequencing of probands and their families enrolled in the Genomic Answers for +Kids (GA4K) program at Children's Mercy Research Institute. GA4K is a +longitudinal pediatric genomics initiative that aims to enroll 30,000 children +with suspected rare genetic disorders, together with their parents, to build +a large-scale resource of clinical and genomic data. +

+

+The callset contains 115,554 SVs (52,564 deletions, 58,219 insertions, 4,408 +duplications, 363 inversions) from 502 sequenced samples. Variants are +site-level (no per-sample genotypes), and each SV has been replicated: it was +either observed in two or more unrelated GA4K individuals or matched an SV +from an external long-read reference set (deCODE or the Human +Pangenome Reference Consortium). +

+ +

Display Conventions and Configuration

+

+Items are colored by SV type: +

+

+

+Insertions are placed at the insertion site with a width of 1 bp; deletions, +duplications and inversions span the affected interval. Filters are available +for SV type, SV length, carrier-sample count and allele frequency. The detail +page also shows the total number of samples genotyped at each site. +

+ +

Methods

+

+Samples were sequenced on PacBio Revio and Sequel II instruments with HiFi +chemistry. Single-sample SV callsets were produced with pbsv and then merged +across the cohort with JASMINE v1.1.4 (jasmine --output-genotypes), +which clusters equivalent SVs across samples and writes a site-level multi-sample +VCF. +

+

+To reduce false positives, the merged VCF was filtered to retain only SVs +supported by at least two independent observations: either (1) a second SV +from another unrelated Children's Mercy (CMH) individual within the +same Jasmine cluster, or (2) a matching SV from the deCODE Icelandic or Human +Pangenome Reference Consortium (HPRC) callsets, identified using +svpack match with default settings. +
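The replication rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the pipeline used for the track: the `MergedSv` class, its field names, and the use of family IDs to approximate "unrelated individuals" are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class MergedSv:
    """One merged, site-level SV cluster. All field names are hypothetical."""
    sv_id: str
    # Family IDs of carriers; relatives share an ID, so they count only once.
    carrier_families: set = field(default_factory=set)
    # True if the SV matched an SV in an external callset (deCODE or HPRC).
    external_match: bool = False


def is_replicated(sv: MergedSv) -> bool:
    """Keep an SV seen in >= 2 unrelated carriers or matched externally."""
    return len(sv.carrier_families) >= 2 or sv.external_match


def filter_replicated(svs):
    """Return only the SVs that pass the replication rule, in input order."""
    return [sv for sv in svs if is_replicated(sv)]
```

In this sketch an SV carried only by members of a single family is dropped unless it also has an external match.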

+

+Carrier counts (SVC), total sample counts (SVN) and allele frequencies +(SVF = SVC/SVN) were recomputed on the replicated callset. +
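As a worked illustration of the relationship SVF = SVC/SVN, the following Python sketch recomputes the three values for one site. The input encoding (one entry per sample, with `None` meaning the sample was not genotyped at the site) and the function name are assumptions for this example and are not taken from the conversion scripts.

```python
def recompute_site_stats(carrier_flags):
    """Return (SVC, SVN, SVF) for one SV site.

    carrier_flags holds one entry per sample: True for a carrier, False for
    a non-carrier, None for a sample not genotyped at this site (assumed
    encoding, used only in this sketch).
    """
    svn = sum(1 for flag in carrier_flags if flag is not None)  # samples genotyped
    svc = sum(1 for flag in carrier_flags if flag)              # carriers
    svf = svc / svn if svn else 0.0                             # SVF = SVC / SVN
    return svc, svn, svf
```

For example, a site with two carriers among three genotyped samples gives SVC = 2, SVN = 3 and SVF = 2/3.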

+ +

Data Access

+

+The data can be explored interactively in table format with the +Table Browser or the +Data Integrator and exported from there +to spreadsheet or tab-separated tables. From scripts, the data can be accessed +through our API using track=ga4kSv. +

+

+For automated download and analysis, the annotation is stored in a bigBed file +that can be downloaded from +our +download server. The file for this track is called ga4kSv.bb. +Individual regions or the whole annotation can be obtained using the +bigBedToBed utility, available as a precompiled binary or from source +as described on our +utilities +page. +Example: +bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/ga4kSv.bb -chrom=chr21 -start=0 -end=100000000 stdout. +
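For script access through the API, a request for the same chr21 region as the bigBedToBed example might be built as in the Python sketch below. It assumes the public UCSC REST endpoint `https://api.genome.ucsc.edu/getData/track` with its `genome`, `track`, `chrom`, `start` and `end` parameters; the helper names are hypothetical, and the structure of the returned JSON is not described here and should be inspected before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the UCSC REST API for track data.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def ga4k_sv_url(chrom, start, end, genome="hg38", track="ga4kSv"):
    """Build the request URL for one region of the ga4kSv track."""
    query = urlencode({
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    })
    return f"{API_BASE}?{query}"


def fetch_ga4k_sv(chrom, start, end):
    """Download and decode the JSON response (requires network access)."""
    with urlopen(ga4k_sv_url(chrom, start, end)) as response:
        return json.load(response)
```

Calling `fetch_ga4k_sv("chr21", 0, 100000000)` would request the same region as the command-line example; for whole-genome extraction the bigBed file is likely the better choice.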

+

+The original VCF is available from the Children's Mercy Research Institute +GA4K data release at + +github.com/ChildrensMercyResearchInstitute/GA4K. +

+ +

Credits

+

+Thanks to the Children's Mercy Research Institute and the Genomic Answers +for Kids participants and their families for making this dataset publicly +available. +

+ +

References

+ + +

+Cohen ASA, Farrow EG, Abdelmoity AT, Alaimo JT, Amudhavalli SM, Anderson JT, Bansal L, Bartik L, +Baybayan P, Belden B et al. + +Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes. +Genet Med. 2022 Jun;24(6):1336-1348. +PMID: 35305867 +

+