7594507ca126d5242346787e42e13c52ea7709b1 max Fri Apr 17 08:40:31 2026 -0700
Add lrSv supertrack: long-read structural variants from 9 studies (hg38).

#Preview2 week - bugs introduced now will need a build patch to fix

Sub-tracks (all bigBed 9+):
  han945Sv - 945 Han Chinese, ONT (Gong 2025, PMID 39929826)
  lrSv1kgOnt - 1019 1000 Genomes, ONT, SVAN-annotated (Schloissnig 2025, PMID 40702182; lifted from hs1)
  tommoJpSv - 333 Japanese (111 trios), ONT (Otsuki 2022, PMID 36127505)
  aou1kSv - 1027 All of Us, PacBio HiFi (Garimella 2025, PMID 41256123)
  ga4kSv - 502 GA4K pediatric rare disease, PacBio HiFi (Cohen 2022, PMID 35305867)
  decodeSv - 3622 Icelanders, ONT (Beyter 2021, PMID 33972781)
  hgsvc3Sv - 65 HGSVC3 diverse haplotype-resolved assemblies, HiFi+ONT (Logsdon 2025, PMID 40702183; merges insdel+inv tables)
  kwanhoSv - 100 post-mortem brains (PD/ILBD/HC), PacBio HiFi (Kim 2026, PMID 41929179)
  chirmade101Sv - 101 long-read WGS GWAS SVatalog cohort (Chirmade 2026, PMID 41203876)

Includes per-track conversion scripts and autoSql under scripts/lrSv/,
the supertrack summary table in lrSv.html, and a consolidated makeDoc
at doc/hg38/lrSv.txt.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/lrSv.html src/hg/makeDb/trackDb/human/lrSv.html
new file mode 100644
index 00000000000..7517ed38730
--- /dev/null
+++ src/hg/makeDb/trackDb/human/lrSv.html
@@ -0,0 +1,308 @@
+

Description

+

+This track collection contains structural variant (SV) calls derived from long-read sequencing
+studies. Structural variants are genomic rearrangements larger than ~50 bp, including
+deletions, insertions, duplications, inversions, and translocations. Long-read sequencing
+technologies can span repetitive regions and resolve complex rearrangements
+that are difficult to detect with short-read methods.
+

+ +

Available Datasets

+

+SV length statistics (min / median / max) are computed from the svLen
+field of each track, in base pairs. Some tracks include sites with
+svLen=0 (complex events where the reference and alternate alleles
+differ in sequence but not in length).
+

+
Dataset | N samples | Cohort / disease | Sequencing | SVs | Min Median Max
Han 945 | 945 | Han Chinese, general population | ONT (PromethION) | 111,288 | 025499,743
1KG ONT | 1,019 | 1000 Genomes, globally diverse | ONT | 148,375 | 217749,171
ToMMo Japanese | 333 (111 trios) | Japanese, general population | ONT | 74,201 | 5116299,980
AoU 1K | 1,027 | All of Us, self-identified Black/African American | PacBio HiFi | 541,049 | 501529,998
GA4K | 502 | Children's Mercy, pediatric rare disease probands + families | PacBio HiFi | 115,554 | 50186809,711
deCODE 3,622 | 3,622 | Icelandic general population | ONT | 133,886 | 0127861,080
HGSVC3 | 65 | HGSVC3 diverse reference assemblies | PacBio HiFi + ONT | 176,531 | 5015430,176,500
Kim PD Brain | 100 | Parkinson's disease, ILBD, controls (post-mortem brain) | PacBio HiFi | 74,552 | 50160190,088,222
SVatalog 101 | 101 | Long-read WGS cohort for GWAS LD fine-mapping (SickKids) | long-read | 87,183 | 41601,321,484
+ +
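The min / median / max columns can be recomputed from a subtrack's rows. The following is a minimal sketch, assuming the bigBed has been exported to tab-separated text (for example with UCSC's bigBedToBed) and that svLen sits at a known column. The column index 9 and the toy rows are illustrative assumptions, not taken from the tracks' autoSql. Taking the absolute value is also an assumption, in case a callset stores deletions with negative lengths.

```python
import statistics


def sv_len_stats(bed_lines, sv_len_col):
    """Return (min, median, max) of the svLen column over tab-separated BED rows.

    sv_len_col is the 0-based index of the svLen field; its position here is
    a hypothetical choice and should be checked against the track's autoSql.
    """
    lengths = []
    for line in bed_lines:
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Assumption: sign only encodes SV type, so compare magnitudes.
        lengths.append(abs(int(fields[sv_len_col])))
    return min(lengths), statistics.median(lengths), max(lengths)


# Toy rows (invented for illustration): nine BED fields plus svLen at index 9.
toy_rows = [
    "\t".join(["chr1", "100", "400", "DEL_1", "0", ".", "100", "400", "0,0,0", "-300"]),
    "\t".join(["chr1", "500", "501", "INS_1", "0", ".", "500", "501", "0,0,0", "120"]),
    "\t".join(["chr2", "900", "950", "CPX_1", "0", ".", "900", "950", "0,0,0", "0"]),
]

print(sv_len_stats(toy_rows, 9))
```

Rows with svLen=0 are kept rather than filtered, which matches the note above that some tracks report a minimum of 0.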

Han 945 SVs (han945Sv)

+

+Structural variants from 945 Han Chinese individuals. 111,288 SVs
+(deletions, insertions, duplications, inversions, translocations) merged with SURVIVOR.
+Includes allele frequencies and per-sample support.
+

+ +

1KG ONT SVs (lrSv1kgOnt)

+

+Structural variants from 1,019 individuals across 26 populations (1000 Genomes ONT).
+161,332 SVs annotated with SVAN, classifying insertions and deletions by mechanism
+of origin (mobile elements, VNTRs, processed pseudogenes, etc.).
+Original coordinates are on T2T-CHM13 (hs1); the hg38 version was created via liftOver.
+

+ +

ToMMo Japanese SVs (tommoJpSv)

+

+Structural variants from 333 Japanese individuals (111 trios) from the Tohoku Medical
+Megabank (ToMMo). 74,201 SVs (deletions and insertions) with trio-based Mendelian
+error rates and allele frequencies.
+

+ +

AoU 1K SVs (aou1kSv)

+

+Structural variants from 1,027 individuals from the All of Us (AoU) Research Program,
+sequenced with PacBio HiFi long reads. 541,049 SVs (insertions and deletions)
+with population-specific allele frequencies, gene annotations, and clinical
+trait associations.
+

+ +

GA4K SVs (ga4kSv)

+

+Structural variants from 502 probands and family members enrolled in the
+Genomic Answers for Kids (GA4K) pediatric rare-disease program at Children's
+Mercy Research Institute, sequenced with PacBio HiFi long reads. 115,554
+replicated SVs (deletions, insertions, duplications, inversions) called with
+pbsv and merged with JASMINE. The matched GA4K small-variant callset (SNVs
+and short indels) lives alongside other population allele-frequency resources
+as GA4K 552 PacBio LR in the Variant
+Frequencies track collection.
+

+ +

deCODE 3,622 SVs (decodeSv)

+

+High-confidence structural variants from 3,622 Icelanders (deCODE genetics),
+sequenced with Oxford Nanopore long reads. 133,886 SVs (deletions, insertions
+and combined insertion/deletion events). Site-only callset with annotated
+surrounding tandem-repeat regions.
+

+ +

HGSVC3 65 SVs (hgsvc3Sv)

+

+Structural variants from 65 diverse individuals sequenced and de novo
+assembled by the Human Genome Structural Variation Consortium phase 3
+(HGSVC3). 176,532 haplotype-resolved SVs (deletions, insertions and
+inversions) called with PAV and cross-validated with ten additional callers,
+with per-site carrier haplotype lists and structural annotations.
+

+ +

Kim PD Brain SVs (kwanhoSv)

+

+Structural variants from 100 post-mortem brain samples (Parkinson's disease,
+incidental Lewy body disease, and healthy controls) sequenced with PacBio
+HiFi long reads. 74,552 high-confidence SVs (deletions, insertions,
+duplications, inversions) with per-cohort allele frequencies and
+case-control carrier-rate differentials, from Kim et al. 2026.
+

+ +

SVatalog 101 SVs (chirmade101Sv)

+

+Structural variants from 101 long-read whole-genome sequences released
+alongside the GWAS SVatalog tool (Chirmade et al. 2026). 87,183 SVs
+(deletions, insertions, duplications, inversions and complex events)
+annotated with gene overlaps, ClinGen / gnomAD constraint scores, and
+OMIM / ClinVar / DGV / Decipher regional annotations.
+

+ +

Data Access

+

+Each subtrack has its own documentation page with details on how to download
+and intersect the underlying annotations.
+
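As an illustration of what intersecting the annotations can look like once a subtrack has been downloaded, here is a minimal sketch that selects SVs overlapping a query region. It assumes the rows are tab-separated BED text with 0-based, half-open coordinates (the usual BED convention); the function name and the toy rows are hypothetical and not part of the track documentation.

```python
def svs_in_region(bed_lines, chrom, start, end):
    """Return the BED rows (as field lists) that overlap chrom:start-end.

    Coordinates are treated as 0-based, half-open intervals, so two
    intervals overlap when each one starts before the other ends.
    """
    hits = []
    for line in bed_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # not a usable BED row
        row_chrom, row_start, row_end = fields[0], int(fields[1]), int(fields[2])
        if row_chrom == chrom and row_start < end and start < row_end:
            hits.append(fields)
    return hits


# Toy rows invented for illustration (chrom, start, end, name).
example_rows = [
    "chr1\t100\t200\tDEL_A",
    "chr1\t240\t300\tINS_B",
    "chr1\t400\t500\tDEL_C",
    "chr2\t150\t250\tDUP_D",
]

names = [fields[3] for fields in svs_in_region(example_rows, "chr1", 150, 250)]
print(names)
```

A linear scan like this is only reasonable for small exports; for whole-genome queries an indexed lookup on the bigBed itself would be the more practical route.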

+ +

References

+ +

+Gong J, Sun H, Wang K, Zhao Y, Huang Y, Chen Q, Qiao H, Gao Y, Zhao J, Ling Y et al.
+
+Long-read sequencing of 945 Han individuals identifies structural variants associated with
+phenotypic diversity and disease susceptibility.
+Nat Commun. 2025 Feb 10;16(1):1494.
+PMID: 39929826; PMC: PMC11811171
+

+ +

+Schloissnig S, Pani S, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T,
+Asparuhova M et al.
+
+Structural variation in 1,019 diverse humans based on long-read sequencing.
+Nature. 2025 Aug;644(8076):442-452.
+PMID: 40702182; PMC: PMC12350158
+

+ + +

+Otsuki A, Okamura Y, Ishida N, Tadaka S, Takayama J, Kumada K, Kawashima J, Taguchi K, Minegishi N,
+Kuriyama S et al.
+
+Construction of a trio-based structural variation panel utilizing activated T lymphocytes and
+long-read sequencing technology.
+Commun Biol. 2022 Sep 20;5(1):991.
+PMID: 36127505; PMC: PMC9489684
+

+ + + +

+Garimella KV, Li Q, Wertz J, Lee SK, Cunial F, Huang Y, Mostovoy Y, Lorig-Roach R, English A, Su H
+et al.
+
+Population-scale Long-read Sequencing in the All of Us Research Program.
+medRxiv. 2025 Oct 5.
+PMID: 41256123; PMC: PMC12622093
+

+ + + +

+Cohen ASA, Farrow EG, Abdelmoity AT, Alaimo JT, Amudhavalli SM, Anderson JT, Bansal L, Bartik L,
+Baybayan P, Belden B et al.
+
+Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes.
+Genet Med. 2022 Jun;24(6):1336-1348.
+PMID: 35305867
+

+ + + +

+Beyter D, Ingimundardottir H, Oddsson A, Eggertsson HP, Bjornsson E, Jonsson H, Atlason BA,
+Kristmundsdottir S, Mehringer S, Hardarson MT et al.
+
+Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in
+human diseases and other traits.
+Nat Genet. 2021 Jun;53(6):779-786.
+PMID: 33972781
+

+ + + +

+Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo
+D et al.
+
+Complex genetic variation in nearly complete human genomes.
+Nature. 2025 Aug;644(8076):430-441.
+PMID: 40702183; PMC: PMC12350169
+

+ + + +

+Kim K, Lin Z, Simmons SK, Parker J, Kearney M, Liao Z, Haywood N, Zhang J, Cline MP, Tuncali I
+et al.
+
+Integrating Long-Read Structural Variant Analysis with single-nucleus RNA-seq to Elucidate Gene
+Expression Effects in Disease.
+bioRxiv. 2026 Mar 23.
+PMID: 41929179; PMC: PMC13041997
+

+ + + +

+Chirmade S, Wang Z, Mastromatteo S, Sanders E, Thiruvahindrapuram B, Nalpathamkalam T, Pellecchia G,
+Lin F, Keenan K, Patel RV et al.
+
+GWAS SVatalog: a visualization tool to aid fine-mapping of GWAS loci with structural variations.
+Heredity (Edinb). 2026 Mar;135(3):199-210.
+PMID: 41203876; PMC: PMC13031531
+

+