06a482a2120d4d85c7c34fb5038213e07f595554 max Tue Apr 21 15:00:21 2026 -0700

lrSv: add tommoJpCnv short-read CNV comparator (multiWig)

ToMMo 48KJPN-CNV Frequency Panel: copy-number variation frequencies from short-read whole-genome sequencing of 48,874 Japanese individuals (jMorp 20230828 release, GATK CNV germline workflow at 1 kb resolution). Published as a companion short-read comparator to the long-read tommoJpSv track.

Rendered as a multiWig container with two bigWig subtracks (transparent overlay): tommoJpCnvLoss.bw counts samples at CN<2 per bin (red) and tommoJpCnvGain.bw counts samples at CN>2 per bin (green). Values are absolute carrier counts out of 48,874. 2,006,905 bins with at least one CNV carrier; bins that are wholly CN=2 are omitted.

Files:
- trackDb/human/lrSv.ra: new tommoJpCnv multiWig container
- trackDb/human/tommoJpCnv.html: new doc page
- trackDb/human/lrSv.html: summary-table row + per-track blurb
- scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: VCF -> two bedGraphs
- doc/hg38/lrSv.txt: wget, converter invocation, bigWig build steps

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/lrSv.html src/hg/makeDb/trackDb/human/lrSv.html
index 1eec6373f17..ebaced9d96e 100644
--- src/hg/makeDb/trackDb/human/lrSv.html
+++ src/hg/makeDb/trackDb/human/lrSv.html
@@ -1,449 +1,469 @@

Description

This track collection contains structural variant (SV) calls derived from long-read sequencing studies. Structural variants are genomic rearrangements larger than ~50 bp, including deletions, insertions, duplications, inversions, and translocations. Long-read sequencing technologies can span repetitive regions and resolve complex rearrangements that are difficult to detect with short-read methods.

Available Datasets

SV length statistics (min / median / max) are computed from the svLen field of each track, in base pairs. Some tracks include sites with svLen=0 (complex events where the reference and alternate alleles differ in sequence but not in length).
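As a rough illustration of how the min / median / max columns could be derived, the sketch below summarizes a list of svLen values. It is not the script used to build the table: the function name `sv_len_stats` and the sample values are hypothetical, and taking the absolute value is an assumption for inputs where deletions carry negative lengths.

```python
from statistics import median


def sv_len_stats(sv_lens):
    """Return (min, median, max) of SV lengths in base pairs.

    Assumption: lengths may be signed (negative for deletions, as in some
    VCF SVLEN fields), so absolute values are used.
    """
    lengths = [abs(int(n)) for n in sv_lens]
    if not lengths:
        raise ValueError("no svLen values given")
    return min(lengths), median(lengths), max(lengths)


# Hypothetical svLen values; 0 marks a complex event whose reference and
# alternate alleles differ in sequence but not in length.
example = [0, 51, -162, 300, 99980]
print(sv_len_stats(example))  # (0, 162, 99980)
```

A track with svLen=0 sites therefore reports a minimum of 0, which matches the note above about complex events.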

All subtracks below are long-read callsets, except two Illumina short-read rows that are included as short-read comparators: ToMMo 48K CNV and the last row (1KG 3202).

| Dataset | N samples | Cohort / disease | Sequencing | SVs | Min | Median | Max |
|---|---|---|---|---|---|---|---|
| CoLoRSdb | 1,427 | Consortium of Long-Read Sequencing, joint callset | PacBio HiFi | 426,239 | 20 | 33 | 101,381 |
| Han 945 | 945 | Han Chinese, general population | ONT (PromethION) | 111,288 | 0 | 254 | 99,743 |
| 1KG ONT 100 | 100 | 1000 Genomes, 5 superpopulations / 19 subpopulations | ONT (R9.4.1) | 113,696 | 0 | 164 | 98,289 |
| 1KG ONT Vienna | 1,019 | 1000 Genomes, globally diverse | ONT | 148,375 | 2 | 177 | 49,171 |
| ToMMo Japanese | 333 (111 trios) | Japanese, general population | ONT | 74,201 | 51 | 162 | 99,980 |
| ToMMo 48K CNV | 48,874 | Japanese, general population (short-read comparator for ToMMo long-read SVs) | Illumina short-read (GATK CNV, 1 kb bins, shown as two bigWigs) | ~2M bins with CNV carriers; not comparable to per-SV counts above | | | |
| AoU 1K | 1,027 | All of Us, self-identified Black/African American | PacBio HiFi | 541,049 | 50 | 152 | 9,998 |
| GA4K | 502 | Children's Mercy, pediatric rare disease probands + families | PacBio HiFi | 115,554 | 50 | 186 | 809,711 |
| deCODE 3,622 | 3,622 | Icelandic general population | ONT | 133,886 | 0 | 127 | 861,080 |
| HPRC v2 | 233 | HPRC release-2 pangenome (CHM13 + diverse 1KG assemblies) | PacBio HiFi (pangenome graph) | 1,483,114 | 50 | 280 | 97,718 |
| HGSVC2 | 32 | HGSVC2 haplotype-resolved assemblies (5 superpopulations) | PacBio CLR + HiFi + Strand-seq | 111,746 | 50 | 168 | 57,207,414 |
| HGSVC3 | 65 | HGSVC3 diverse reference assemblies | PacBio HiFi + ONT | 176,531 | 50 | 154 | 30,176,500 |
| Kim PD Brain | 100 | Parkinson's disease, ILBD, controls (post-mortem brain) | PacBio HiFi | 74,552 | 50 | 160 | 190,088,222 |
| SVatalog 101 | 101 | Long-read WGS cohort for GWAS LD fine-mapping (SickKids) | long-read | 87,183 | 4 | 160 | 1,321,484 |
| 1KG 3202 (short-read) | 3,202 | 1000 Genomes expanded cohort (short-read comparator) | Illumina short-read | 173,366 | 1 | 314 | 154,807,729 |

CoLoRSdb SVs (colorsDbSv)

Structural variants from the Consortium of Long-Read Sequencing database (CoLoRSdb), from 1,427 PacBio HiFi long-read whole-genome sequences. 426,239 SVs (insertions, deletions, inversions) called with pbsv and merged with Jasmine, with allele frequencies, genotype counts and Hardy-Weinberg statistics across the cohort.

Han 945 SVs (han945Sv)

Structural variants from 945 Han Chinese individuals. 111,288 SVs (deletions, insertions, duplications, inversions, translocations) merged with SURVIVOR. Includes allele frequencies and per-sample support.

1KG ONT 100 SVs (gustafsonSv)

Structural variants from Oxford Nanopore long-read sequencing of 100 1000 Genomes samples (5 superpopulations, 19 subpopulations) released by the 1000 Genomes ONT Sequencing Consortium and described in Gustafson et al. 2024. 113,696 SVs (insertions, deletions, duplications, inversions) called with five callers and merged with Jasmine. This is a
-separate dataset from the Vienna 1KG-ONT release below.
+separate dataset from the Vienna 1KG-ONT release below; the 100 samples
+here do not overlap with the 1,019 samples in the Vienna release.

1KG ONT Vienna SVs (lrSv1kgOnt)

Structural variants from 1,019 individuals across 26 populations (1000 Genomes ONT). 161,332 SVs annotated with SVAN, classifying insertions and deletions by mechanism of origin (mobile elements, VNTRs, processed pseudogenes, etc.). Original coordinates are on T2T-CHM13 (hs1); the hg38 version was created via liftOver.
+This is a separate dataset from the 1KG ONT 100 (Gustafson et al.) track above;
+the 1,019 samples here do not overlap with the 100 samples in that release.

ToMMo Japanese SVs (tommoJpSv)

Structural variants from 333 Japanese individuals (111 trios) from the Tohoku Medical Megabank (ToMMo). 74,201 SVs (deletions and insertions) with trio-based Mendelian error rates and allele frequencies.

+ToMMo 48K CNV SR (tommoJpCnv) - short-read comparator
+
+Short-read CNV comparator for the ToMMo long-read SV track above.
+Per-1 kb-bin copy-number carrier counts from short-read whole-genome
+sequencing of 48,874 Japanese individuals (jMorp 48KJPN-CNV Frequency
+Panel, release 20230828), called with GATK CNV germline workflows.
+Shown as a multiWig overlay: red = samples with copy-number loss
+(CN<2) per bin, green = samples with gain (CN>2) per bin.
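The loss/gain split described for this track can be sketched as follows. This is a minimal illustration and not the lrSvTommoJpCnvVcfToBedGraph.py converter named in the commit: the input layout (one integer copy number per sample per bin) and the function name `carrier_bedgraphs` are assumptions, and the real jMorp VCF may encode carrier counts differently.

```python
def carrier_bedgraphs(bins):
    """Split per-bin copy-number calls into loss and gain bedGraph lines.

    bins: iterable of (chrom, start, end, copy_numbers), where copy_numbers
    holds one integer copy number per sample (hypothetical input layout).
    Returns two lists of tab-separated bedGraph lines: (loss, gain).
    """
    loss_lines, gain_lines = [], []
    for chrom, start, end, copy_numbers in bins:
        n_loss = sum(1 for cn in copy_numbers if cn < 2)  # CN<2 carriers
        n_gain = sum(1 for cn in copy_numbers if cn > 2)  # CN>2 carriers
        # A bin is written only where it has carriers, so bins that are
        # wholly CN=2 appear in neither output.
        if n_loss:
            loss_lines.append(f"{chrom}\t{start}\t{end}\t{n_loss}")
        if n_gain:
            gain_lines.append(f"{chrom}\t{start}\t{end}\t{n_gain}")
    return loss_lines, gain_lines


# Three hypothetical 1 kb bins scored in four samples.
example_bins = [
    ("chr1", 0, 1000, [2, 2, 1, 3]),
    ("chr1", 1000, 2000, [2, 2, 2, 2]),
    ("chr1", 2000, 3000, [0, 1, 2, 2]),
]
loss, gain = carrier_bedgraphs(example_bins)
print(loss)  # ['chr1\t0\t1000\t1', 'chr1\t2000\t3000\t2']
print(gain)  # ['chr1\t0\t1000\t1']
```

Each value is an absolute carrier count, so dividing by the cohort size (48,874 for this panel) would give a per-bin carrier frequency.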

AoU 1K SVs (aou1kSv)

Structural variants from 1,027 individuals from the All of Us (AoU) Research Program, sequenced with PacBio HiFi long reads. 541,049 SVs (insertions and deletions) with population-specific allele frequencies, gene annotations, and clinical trait associations.

GA4K SVs (ga4kSv)

Structural variants from 502 probands and family members enrolled in the Genomic Answers for Kids (GA4K) pediatric rare-disease program at Children's Mercy Research Institute, sequenced with PacBio HiFi long reads. 115,554 replicated SVs (deletions, insertions, duplications, inversions) called with pbsv and merged with Jasmine. The matched GA4K small-variant callset (SNVs and short indels) lives alongside other population allele-frequency resources as GA4K 552 PacBio LR in the Variant Frequencies track collection.

deCODE 3,622 SVs (decodeSv)

High-confidence structural variants from 3,622 Icelanders (deCODE genetics), sequenced with Oxford Nanopore long reads. 133,886 SVs (deletions, insertions and combined insertion/deletion events). Site-only callset with annotated surrounding tandem-repeat regions.

HPRC v2 SVs (hprc2Sv)

Structural variants derived from the Human Pangenome Reference Consortium release-2 minigraph-cactus pangenome graph, built from 233 PacBio HiFi haplotype-resolved assemblies (CHM13 + diverse 1000 Genomes samples). 1,483,114 SV-sized alleles (INS, DEL, COMPLEX, INV) extracted with vg deconstruct and decomposed with vcfwave (WFA2).

HGSVC2 32 SVs (hgsvc2Sv)

Structural variants from 32 haplotype-resolved diploid genomes (HGSVC2 freeze 4, Ebert et al. 2021). 111,746 SVs (deletions, insertions and inversions) called from phased de novo assemblies with PAV, with per-variant 1000 Genomes population allele frequencies (insertions and deletions) and rich structural/gene annotations. An earlier HGSVC release complementary to HGSVC3.

HGSVC3 65 SVs (hgsvc3Sv)

Structural variants from 65 diverse individuals sequenced and de novo assembled by the Human Genome Structural Variation Consortium phase 3 (HGSVC3). 176,532 haplotype-resolved SVs (deletions, insertions and inversions) called with PAV and cross-validated with ten additional callers, with per-site carrier haplotype lists and structural annotations.

Kim PD Brain SVs (kwanhoSv)

Structural variants from 100 post-mortem brain samples (Parkinson's disease, incidental Lewy body disease, and healthy controls) sequenced with PacBio HiFi long reads. 74,552 high-confidence SVs (deletions, insertions, duplications, inversions) with per-cohort allele frequencies and case-control carrier-rate differentials, from Kim et al. 2026.

SVatalog 101 SVs (chirmade101Sv)

Structural variants from 101 long-read whole-genome sequences released alongside the GWAS SVatalog tool (Chirmade et al. 2026). 87,183 SVs (deletions, insertions, duplications, inversions and complex events) annotated with gene overlaps, ClinGen / gnomAD constraint scores, OMIM / ClinVar / DGV / Decipher regional annotations.

1KG 3202 SVs (onekg3202Sr) - short-read comparator

This is a short-read dataset, included for comparison only. Structural variants from the expanded 1000 Genomes cohort of 3,202 Illumina NovaSeq short-read whole genomes (Byrska-Bishop et al. 2022), called with the GATK-SV / svtools pipeline. 173,366 SVs (DEL, INS, DUP, INV, CPX, CNV, CTX) with per-superpopulation allele frequencies. Useful for contrasting short-read vs. long-read SV breakpoints and for spotting variants unique to long-read data.
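One way to look for variants present only in a long-read callset is a reciprocal-overlap comparison, sketched below. This is an illustrative technique chosen here, not a method described by the track authors: the tuple layout, the function names, and the 0.5 threshold are all assumptions.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Overlap length divided by the longer interval's length (0.0 to 1.0)."""
    overlap = min(a_end, b_end) - max(a_start, b_start)
    longest = max(a_end - a_start, b_end - b_start)
    if overlap <= 0 or longest <= 0:
        return 0.0
    return overlap / longest


def long_read_only(long_svs, short_svs, min_ro=0.5):
    """Return long-read SVs with no same-chrom, same-type short-read match.

    Each SV is a (chrom, start, end, sv_type) tuple (hypothetical layout).
    Quadratic in the input sizes, so suitable only for small examples.
    """
    unmatched = []
    for chrom, start, end, sv_type in long_svs:
        matched = any(
            chrom == s_chrom
            and sv_type == s_type
            and reciprocal_overlap(start, end, s_start, s_end) >= min_ro
            for s_chrom, s_start, s_end, s_type in short_svs
        )
        if not matched:
            unmatched.append((chrom, start, end, sv_type))
    return unmatched


# Hypothetical deletions: the first has a short-read counterpart, the second does not.
long_calls = [("chr1", 1000, 2000, "DEL"), ("chr1", 5000, 5300, "DEL")]
short_calls = [("chr1", 1100, 2050, "DEL")]
print(long_read_only(long_calls, short_calls))  # [('chr1', 5000, 5300, 'DEL')]
```

Interval overlap suits deletions and other events with a reference span; insertions occupy a single reference position and would need a breakpoint-distance comparison instead.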

Data Access

Each subtrack has its own documentation page with details on how to download and intersect the underlying annotations.

References

Gong J, Sun H, Wang K, Zhao Y, Huang Y, Chen Q, Qiao H, Gao Y, Zhao J, Ling Y et al. Long-read sequencing of 945 Han individuals identifies structural variants associated with phenotypic diversity and disease susceptibility. Nat Commun. 2025 Feb 10;16(1):1494. PMID: 39929826; PMC: PMC11811171

Schloissnig S, Pani S, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M et al. Structural variation in 1,019 diverse humans based on long-read sequencing. Nature. 2025 Aug;644(8076):442-452. PMID: 40702182; PMC: PMC12350158

Otsuki A, Okamura Y, Ishida N, Tadaka S, Takayama J, Kumada K, Kawashima J, Taguchi K, Minegishi N, Kuriyama S et al. Construction of a trio-based structural variation panel utilizing activated T lymphocytes and long-read sequencing technology. Commun Biol. 2022 Sep 20;5(1):991. PMID: 36127505; PMC: PMC9489684

Garimella KV, Li Q, Wertz J, Lee SK, Cunial F, Huang Y, Mostovoy Y, Lorig-Roach R, English A, Su H et al. Population-scale Long-read Sequencing in the All of Us Research Program. medRxiv. 2025 Oct 5;. PMID: 41256123; PMC: PMC12622093

Cohen ASA, Farrow EG, Abdelmoity AT, Alaimo JT, Amudhavalli SM, Anderson JT, Bansal L, Bartik L, Baybayan P, Belden B et al. Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes. Genet Med. 2022 Jun;24(6):1336-1348. PMID: 35305867

Beyter D, Ingimundardottir H, Oddsson A, Eggertsson HP, Bjornsson E, Jonsson H, Atlason BA, Kristmundsdottir S, Mehringer S, Hardarson MT et al. Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits. Nat Genet. 2021 Jun;53(6):779-786. PMID: 33972781

Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo D et al. Complex genetic variation in nearly complete human genomes. Nature. 2025 Aug;644(8076):430-441. PMID: 40702183; PMC: PMC12350169

Kim K, Lin Z, Simmons SK, Parker J, Kearney M, Liao Z, Haywood N, Zhang J, Cline MP, Tuncali I et al. Integrating Long-Read Structural Variant Analysis with single-nucleus RNA-seq to Elucidate Gene Expression Effects in Disease. bioRxiv. 2026 Mar 23;. PMID: 41929179; PMC: PMC13041997

Chirmade S, Wang Z, Mastromatteo S, Sanders E, Thiruvahindrapuram B, Nalpathamkalam T, Pellecchia G, Lin F, Keenan K, Patel RV et al. GWAS SVatalog: a visualization tool to aid fine-mapping of GWAS loci with structural variations. Heredity (Edinb). 2026 Mar;135(3):199-210. PMID: 41203876; PMC: PMC13031531

Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W et al. High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation. Genome Res. 2024 Nov 20;34(11):2061-2073. PMID: 39358015; PMC: PMC11610458

Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021 Apr 2;372(6537). PMID: 33632895; PMC: PMC8026704

Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell. 2022 Sep 1;185(18):3426-3440.e19. PMID: 36055201; PMC: PMC9439720