ac18a42f0dafb4febaaeaebcd53fe75df9b83234 max Mon May 11 08:29:14 2026 -0700

lrSv: add Coverage column, drop redundant Sequencing column, rename SVs to SV count

Coverage values pulled from the per-subtrack methods sections and the underlying papers (Han 17x, deCODE 17x, GA4K 27x, HPRC 60x HiFi + 30x ONT, etc.). Sequencing technology is now folded into the Coverage cells. Also cross-links to the new HGSVC3 Mobile Insertions tracks. refs #36642

diff --git src/hg/makeDb/trackDb/human/lrSv.html src/hg/makeDb/trackDb/human/lrSv.html
index ae8d30a51e9..ebf9c5bdf1f 100644
--- src/hg/makeDb/trackDb/human/lrSv.html
+++ src/hg/makeDb/trackDb/human/lrSv.html
@@ -7,213 +7,221 @@
that are difficult to detect with short-read methods.
SV length statistics (min / median / max) are computed from the svLen field of each track, in base pairs. Some tracks include sites with svLen=0 (complex events where the reference and alternate alleles differ in sequence but not in length).
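As a rough illustration of how the Min / Median / Max columns could be derived, the sketch below computes the three statistics from a list of svLen values. The function name `sv_len_stats` and the sample values are hypothetical and not taken from the track build scripts; it only assumes that svLen is a non-negative integer in base pairs, as the text describes.

```python
from statistics import median


def sv_len_stats(sv_lens):
    """Return (min, median, max) of SV lengths, in base pairs.

    Hypothetical helper: assumes each value is the svLen field of one
    track item. A value of 0 is kept, since some tracks include
    svLen=0 sites.
    """
    lengths = [int(v) for v in sv_lens]
    if not lengths:
        raise ValueError("no svLen values given")
    return min(lengths), median(lengths), max(lengths)


# Made-up svLen values, for illustration only.
example = [0, 50, 152, 310, 9998]
print(sv_len_stats(example))
```

A track that contains svLen=0 sites will report 0 as its minimum, which matches rows such as Han 945 or deCODE 3,622 in the table.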
For short-read structural-variant comparators (CCDG 17,795, 1KG 3202, ToMMo 48K CNV) see the companion Short-read SVs supertrack.
+
+Polymorphic Mobile Element Insertions (Alu, L1, SVA, HERVK,
+snRNA) called from HGSVC3 long-read assemblies are released as a
+separate track collection — see the
+Mobile Insertions tracks. Those MEIs are
+the insertions identified in the 65 HGSVC3 samples relative to the
+reference, available on both GRCh38/hg38 and T2T-CHM13/hs1.
+
| Dataset | N samples | Cohort / disease | Disease cases | -Sequencing | -SVs | +Coverage | +SV count | Min | Median | Max |
|---|---|---|---|---|---|---|---|---|---|---|
| All merged | -— | +— | All long-read SV datasets merged on identical position+type+length, with per-database AC | mixed | mixed (PacBio HiFi, ONT) | 2,694,871 | 50 | 200 | 190,088,223 | |
| CoLoRSdb | 1,427 | Consortium of Long-Read Sequencing, joint callset | No | -PacBio HiFi | +mixed (HiFi) | 426,239 | 20 | 33 | 101,381 | |
| Han 945 | 945 | Han Chinese, general population | No | -ONT (PromethION) | +~17x ONT | 111,288 | 0 | 254 | 99,743 | |
| 1KG ONT 100 | 100 | -1000 Genomes, 5 superpopulations / 19 subpop., high 37x seq. coverage | +1000 Genomes, 5 superpopulations / 19 subpopulations | No | -ONT (R9.4.1) | +~37x ONT (R9.4.1) | 113,696 | 0 | 164 | 98,289 |
| 1KG ONT Vienna | 1,019 | -1000 Genomes, diverse, normal 17x seq. coverage | +1000 Genomes, diverse | No | -ONT | +~17x ONT | 148,375 | 2 | 177 | 49,171 |
| ToMMo Japanese | 333 (111 trios) | Japanese, general population | No | -ONT | +~22x ONT | 74,201 | 51 | 162 | 99,980 | |
| AoU 1K | 1,027 | -All of Us, self-identified Black/African American, 8x cov.; biobank includes a variety of conditions (diabetes, hearing loss, etc.) | +All of Us, self-identified Black/African American; biobank includes a variety of conditions (diabetes, hearing loss, etc.) | Yes (mixed) | -PacBio HiFi | +~8x HiFi | 541,049 | 50 | 152 | 9,998 |
| GA4K | 502 | Children's Mercy, pediatric rare disease probands + families | Yes (probands) | -PacBio HiFi | +~27x HiFi | 115,554 | 50 | 186 | 809,711 | |
| deCODE 3,622 | 3,622 | Icelandic general population | No | -ONT | +~17x ONT | 133,886 | 0 | 127 | 861,080 | |
| HPRC v2 | 233 | HPRC release-2 pangenome (CHM13 + diverse 1KG assemblies) | No | -PacBio HiFi (pangenome graph) | +~60x HiFi + ~30x ONT (pangenome graph) | 1,483,114 | 50 | 280 | 97,718 | |
| HGSVC2 | 32 | HGSVC2 haplotype-resolved assemblies (5 superpopulations) | No | -PacBio CLR + HiFi + Strand-seq | +>40x PacBio CLR + >20x HiFi (+ Strand-seq) | 111,746 | 50 | 168 | 57,207,414 | |
| HGSVC3 | 65 | HGSVC3 diverse reference assemblies | No | -PacBio HiFi + ONT | +~47x HiFi + ~56x ONT | 176,531 | 50 | 154 | 30,176,500 | |
| Arab UPR | 53 | UAE-resident Arabs from 8 countries (UAE Pangenome Reference) | No | -PacBio HiFi + ONT + Hi-C (pangenome graph) | +~35x HiFi + ~54x ONT (+ Hi-C, pangenome graph) | 72,656 | 1 | 21 | 99,885 | |
| CPC | 58 | Chinese Pangenome Consortium, 36 minority ethnic groups (HPRC-specific SVs removed) | No | -PacBio HiFi (pangenome graph) | +~30x HiFi (pangenome graph) | 36,030 | 1 | 53 | 8,998,096 | |
| Kim PD Brain | 100 | Parkinson's disease, ILBD, controls (post-mortem brain) | Yes (PD + ILBD) | -PacBio HiFi | +~17x HiFi | 74,552 | 50 | 160 | 190,088,222 | |
| SVatalog 101 | 101 | Cystic fibrosis (CF) patients from the CF Canada-Sick Kids Program in Individual CF Therapy (CFIT). Long-read WGS used for GWAS LD fine-mapping | Yes (all CF) | -long-read | +~50x PacBio CLR (34, Sequel I) + ~76x HiFi (67, Sequel II) | 87,183 | 4 | 160 | 1,321,484 |
Note: there is likely some overlap in sample composition across these collections. For example, 1000 Genomes samples are also included in HPRC and CoLoRSdb.
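The "All merged" row describes merging on identical position+type+length while keeping a per-database AC. A minimal sketch of that idea follows. It is not the actual build code: the record fields (`chrom`, `start`, `sv_type`, `sv_len`, `dataset`, `ac`), the reading of "position" as chromosome plus start, and all sample values are assumptions made for illustration.

```python
from collections import defaultdict


def merge_sv_calls(calls):
    """Group calls by exact (chrom, start, sv_type, sv_len).

    Returns a dict mapping each key to a {dataset: allele count} dict,
    so each merged site keeps one AC per source database.
    """
    merged = defaultdict(dict)
    for call in calls:
        key = (call["chrom"], call["start"], call["sv_type"], call["sv_len"])
        per_db = merged[key]
        # Sum in case one dataset lists the same site more than once.
        per_db[call["dataset"]] = per_db.get(call["dataset"], 0) + call["ac"]
    return dict(merged)


# Made-up records; the AC values are not real data.
calls = [
    {"chrom": "chr1", "start": 1000, "sv_type": "DEL", "sv_len": 300,
     "dataset": "HGSVC3", "ac": 4},
    {"chrom": "chr1", "start": 1000, "sv_type": "DEL", "sv_len": 300,
     "dataset": "HPRC v2", "ac": 11},
    {"chrom": "chr1", "start": 1000, "sv_type": "INS", "sv_len": 300,
     "dataset": "HGSVC3", "ac": 2},
]

for key, acs in merge_sv_calls(calls).items():
    print(key, acs)
```

Because the key requires an exact match, two calls that differ by a single base in position or length stay as separate merged sites. Samples shared between collections, as mentioned in the note above, would contribute to the AC of every dataset they appear in.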
Structural variants from the Consortium of Long-Read Sequencing database