f4d6633d6a724d7b682f9f49ed983e22a5e0975d max Mon Apr 20 14:41:07 2026 -0700 updating a few lrSv subtracks, and moving the colorsDbSnv track under the varFreqs track. refs #36642 diff --git src/hg/makeDb/trackDb/human/lrSv.html src/hg/makeDb/trackDb/human/lrSv.html index 7517ed38730..e2d16939843 100644 --- src/hg/makeDb/trackDb/human/lrSv.html +++ src/hg/makeDb/trackDb/human/lrSv.html @@ -2,53 +2,77 @@
This track collection contains structural variant (SV) calls derived from long-read sequencing studies. Structural variants are genomic rearrangements larger than ~50 bp, including deletions, insertions, duplications, inversions, and translocations. Long-read sequencing technologies can span repetitive regions and resolve complex rearrangements that are difficult to detect with short-read methods.
SV length statistics (min / median / max) are computed from the svLen field of each track, in base pairs. Some tracks include sites with svLen=0 (complex events where the reference and alternate alleles differ in sequence but not in length).
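As a rough illustration of how such summary values can be derived, the sketch below computes min / median / max from a list of svLen values. It assumes lengths are taken as absolute values (deletions are often stored with a negative svLen in VCF-style files); the function name `sv_length_stats` and the example numbers are hypothetical and are not taken from any subtrack.

```python
from statistics import median


def sv_length_stats(sv_lens):
    """Return (min, median, max) of SV lengths in base pairs.

    Assumption: the sign of svLen only encodes the variant type
    (e.g. negative for deletions), so absolute values are used.
    """
    lengths = [abs(int(x)) for x in sv_lens]
    if not lengths:
        raise ValueError("no svLen values supplied")
    return min(lengths), median(lengths), max(lengths)


# Hypothetical svLen values; 0 represents a complex event whose alleles
# differ in sequence but not in length.
example = [0, -120, 254, 51, 3000]
print(sv_length_stats(example))
```

A site with svLen=0 pulls the minimum down to 0, which is why several rows in the table report a minimum below the usual ~50 bp SV threshold.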
All subtracks below are long-read callsets, except the last row (1KG 3202, Illumina short-read), which is included as a short-read comparator.
| Dataset | N samples | Cohort / disease | Sequencing | SVs | Min | Median | Max | ||
|---|---|---|---|---|---|---|---|---|---|
| CoLoRSdb | 1,427 | Consortium of Long-Read Sequencing, joint callset | PacBio HiFi | 426,239 | 20 | 33 | 101,381 | ||
| Han 945 | 945 | Han Chinese, general population | ONT (PromethION) | 111,288 | 0 | 254 | 99,743 | ||
| 1KG ONT 100 | 100 | 1000 Genomes, 5 superpopulations / 19 subpopulations | ONT (R9.4.1) | 113,696 | 0 | 164 | 98,289 | ||
| 1KG ONT Vienna | 1,019 | 1000 Genomes, globally diverse | ONT | 148,375 | 2 | 177 | 49,171 | ||
| ToMMo Japanese | 333 (111 trios) | Japanese, general population | ONT | 74,201 | 51 | @@ -73,70 +97,119 @@115,554 | 50 | 186 | 809,711 |
| deCODE 3,622 | 3,622 | Icelandic general population | ONT | 133,886 | 0 | 127 | 861,080 | ||
| HPRC v2 | 233 | HPRC release-2 pangenome (CHM13 + diverse 1KG assemblies) | PacBio HiFi (pangenome graph) | 1,483,114 | 50 | 280 | 97,718 | ||
| HGSVC2 | 32 | HGSVC2 haplotype-resolved assemblies (5 superpopulations) | PacBio CLR + HiFi + Strand-seq | 111,746 | 50 | 168 | 57,207,414 | ||
| HGSVC3 | 65 | HGSVC3 diverse reference assemblies | PacBio HiFi + ONT | 176,531 | 50 | 154 | 30,176,500 | ||
| Kim PD Brain | 100 | Parkinson's disease, ILBD, controls (post-mortem brain) | PacBio HiFi | 74,552 | 50 | 160 | 190,088,222 | ||
| SVatalog 101 | 101 | Long-read WGS cohort for GWAS LD fine-mapping (SickKids) | long-read | 87,183 | 4 | 160 | 1,321,484 | ||
| 1KG 3202 (short-read) | 3,202 | 1000 Genomes expanded cohort (short-read comparator) | Illumina short-read | 173,366 | 1 | 314 | 154,807,729 | ||
Structural variants from the Consortium of Long-Read Sequencing database (CoLoRSdb), from 1,427 PacBio HiFi long-read whole-genome sequences. 426,239 SVs (insertions, deletions, inversions) called with pbsv and merged with Jasmine, with allele frequencies, genotype counts and Hardy-Weinberg statistics across the cohort.
Structural variants from 945 Han Chinese individuals. 111,288 SVs (deletions, insertions, duplications, inversions, translocations) merged with SURVIVOR. Includes allele frequencies and per-sample support.
Structural variants from Oxford Nanopore long-read sequencing of 100 samples from the 1000 Genomes Project (5 superpopulations, 19 subpopulations), released by the 1000 Genomes ONT Sequencing Consortium and described in Gustafson et al. 2024. 113,696 SVs (insertions, deletions, duplications, inversions) called with five callers and merged with Jasmine. This is a separate dataset from the Vienna 1KG-ONT release below.
Structural variants from 1,019 individuals across 26 populations (1000 Genomes ONT). 161,332 SVs annotated with SVAN, classifying insertions and deletions by mechanism of origin (mobile elements, VNTRs, processed pseudogenes, etc.). Original coordinates are on T2T-CHM13 (hs1); the hg38 version was created via liftOver.
Structural variants from 333 Japanese individuals (111 trios) from the Tohoku Medical Megabank (ToMMo). 74,201 SVs (deletions and insertions) with trio-based Mendelian error rates and allele frequencies.
High-confidence structural variants from 3,622 Icelanders (deCODE genetics), sequenced with Oxford Nanopore long reads. 133,886 SVs (deletions, insertions and combined insertion/deletion events). Site-only callset with annotated surrounding tandem-repeat regions.
Structural variants derived from the Human Pangenome Reference Consortium release-2 minigraph-cactus pangenome graph, built from 233 PacBio HiFi haplotype-resolved assemblies (CHM13 + diverse 1000 Genomes samples). 1,483,114 SV-sized alleles (INS, DEL, COMPLEX, INV) extracted with vg deconstruct and decomposed with vcfwave (WFA2).
Structural variants from 32 haplotype-resolved diploid genomes (HGSVC2 freeze 4, Ebert et al. 2021). 111,746 SVs (deletions, insertions and inversions) called from phased de novo assemblies with PAV, with per-variant 1000 Genomes population allele frequencies (insertions and deletions) and rich structural/gene annotations. An earlier HGSVC release complementary to HGSVC3.
Structural variants from 65 diverse individuals sequenced and de novo assembled by the Human Genome Structural Variation Consortium phase 3 (HGSVC3). 176,532 haplotype-resolved SVs (deletions, insertions and inversions) called with PAV and cross-validated with ten additional callers, with per-site carrier haplotype lists and structural annotations.
Structural variants from 100 post-mortem brain samples (Parkinson's disease, incidental Lewy body disease, and healthy controls) sequenced with PacBio HiFi long reads. 74,552 high-confidence SVs (deletions, insertions, duplications, inversions) with per-cohort allele frequencies and case-control carrier-rate differentials, from Kim et al. 2026.
Structural variants from 101 long-read whole-genome sequences released alongside the GWAS SVatalog tool (Chirmade et al. 2026). 87,183 SVs (deletions, insertions, duplications, inversions and complex events) annotated with gene overlaps, ClinGen / gnomAD constraint scores, OMIM / ClinVar / DGV / Decipher regional annotations.
This is a short-read dataset, included for comparison only. Structural variants from the expanded 1000 Genomes cohort of 3,202 Illumina NovaSeq short-read whole genomes (Byrska-Bishop et al. 2022), called with the GATK-SV / svtools pipeline. 173,366 SVs (DEL, INS, DUP, INV, CPX, CNV, CTX) with per-superpopulation allele frequencies. Useful for contrasting short-read vs. long-read SV breakpoints and for spotting variants unique to long-read data.
Each subtrack has its own documentation page with details on how to download and intersect the underlying annotations.
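One way to look for variants seen only in long-read data is a plain interval-overlap comparison, sketched below. This is an illustration under stated assumptions, not the method used to build any of these tracks: SVs are represented as hypothetical `(chrom, start, end)` tuples with half-open coordinates, the function names are invented for this example, and the coordinates are made up.

```python
def overlaps(a, b):
    """True if two half-open (chrom, start, end) intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def unique_to_long_read(long_read_svs, short_read_svs):
    """Return long-read SVs that overlap no short-read SV.

    Simple O(n*m) scan; fine for a sketch, too slow for full callsets.
    """
    return [
        lr for lr in long_read_svs
        if not any(overlaps(lr, sr) for sr in short_read_svs)
    ]


# Hypothetical coordinates, not taken from any subtrack.
long_read = [("chr1", 1000, 1300), ("chr1", 5000, 5200), ("chr2", 100, 400)]
short_read = [("chr1", 1100, 1250), ("chr2", 900, 1200)]

print(unique_to_long_read(long_read, short_read))
```

Insertions occupy a single reference position, so a zero-length interval would never overlap anything under this test; in practice one would likely pad breakpoints by some tolerance before comparing, and use an indexed interval tool rather than a nested scan.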
Gong J, Sun H, Wang K, Zhao Y, Huang Y, Chen Q, Qiao H, Gao Y, Zhao J, Ling Y et al. Long-read sequencing of 945 Han individuals identifies structural variants associated with phenotypic diversity and disease susceptibility. Nat Commun. 2025 Feb 10;16(1):1494. PMID: 39929826; PMC: PMC13041997
Chirmade S, Wang Z, Mastromatteo S, Sanders E, Thiruvahindrapuram B, Nalpathamkalam T, Pellecchia G, Lin F, Keenan K, Patel RV et al. GWAS SVatalog: a visualization tool to aid fine-mapping of GWAS loci with structural variations. Heredity (Edinb). 2026 Mar;135(3):199-210. PMID: 41203876; PMC: PMC13031531
Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W et al. High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation. Genome Res. 2024 Nov 20;34(11):2061-2073. PMID: 39358015; PMC: PMC11610458
Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021 Apr 2;372(6537). PMID: 33632895; PMC: PMC8026704
Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell. 2022 Sep 1;185(18):3426-3440.e19. PMID: 36055201; PMC: PMC9439720