c81011d4a8f57db347e15aa1248c501b2c8a6fea lrnassar Mon Jun 1 13:16:15 2026 -0700

QA fixes for the lrSv long-read SV supertrack: labels and description cleanups. refs #36258

Trim six subtrack longLabels to the 85-char limit (ga4kSv, hprc2Sv, hgsvc2Sv, chirmade101Sv, cpc1Sv, and lrSvAll; the lrSvAll change is also made in the lrSvMergeAll.py generator so a re-run reproduces it).

Standardize the APR dataset name to "Arab Pangenome Reference (APR)" across lrSv.ra, lrSv.html, aprSv.html, and the makeDoc comment (was a mix of "Arabic" and "UAE UPR").

lrSv1kgOnt.html: state per-assembly SV counts (hg38 lifted 148,375 vs hs1 native 161,332, each with its own type breakdown) and encode non-ASCII author names as numeric entities.

hgsvc3Sv.html: correct the hg38 counts to match the served bigBed (176,231 DEL+INS, 176,531 total).

colorsDbSv.html: use $db in the hgdownload path so it resolves on hs1 as well as hg38.

cpc1Sv.html: encode a Unicode minus sign as a numeric entity.

diff --git src/hg/makeDb/trackDb/human/lrSv.html src/hg/makeDb/trackDb/human/lrSv.html
index 4a6736c66a4..c9c68e0f0a9 100644
--- src/hg/makeDb/trackDb/human/lrSv.html
+++ src/hg/makeDb/trackDb/human/lrSv.html
@@ -1,500 +1,500 @@
 <h2>Description</h2> <p> This track collection contains structural variant (SV) calls derived from long-read sequencing studies. Structural variants are genomic rearrangements larger than ~50 bp, including deletions, insertions, duplications, inversions, and translocations. Long-read sequencing technologies can span repetitive regions and resolve complex rearrangements that are difficult to detect with short-read methods. </p> <h3>Available Datasets</h3> <p> SV length statistics (min / median / max) are computed from the <tt>svLen</tt> field of each track, in base pairs. Some tracks include sites with <tt>svLen=0</tt> (complex events where the reference and alternate alleles differ in sequence but not in length). 
</p> <p> For comparison with short-read structural-variant callsets (CCDG 17,795, 1KG 3202, ToMMo 48K CNV), see the companion <a href="hgTrackUi?g=srSv">Short-read SVs</a> supertrack. </p> <p> Polymorphic <b>Mobile Element Insertions</b> (Alu, L1, SVA, HERVK, snRNA) called from HGSVC3 long-read assemblies are released as a separate track collection; see the <a href="hgTrackUi?g=mei">Mobile Insertions</a> tracks. These MEIs are insertions, relative to the reference, identified in the 65 HGSVC3 samples, and the tracks are available on both GRCh38/hg38 and T2T-CHM13/hs1. </p> <table class="stdTbl"> <tr> <th>Dataset</th> <th>N samples</th> <th>Cohort / disease</th> <th>Disease cases</th> <th>Coverage</th> <th>SV count</th> <th>Min (bp)</th> <th>Median (bp)</th> <th>Max (bp)</th> </tr> <tr> <td><a href="hgTrackUi?g=lrSvAll"><b>All merged</b></a></td> <td>—</td> <td>All long-read SV datasets merged on identical position, type and length, with per-dataset allele counts (AC)</td> <td>mixed</td> <td>mixed (PacBio HiFi, ONT)</td> <td>2,694,871</td> <td>50</td> <td>200</td> <td>190,088,223</td> </tr> <tr> <td><a href="hgTrackUi?g=colorsDbSv">CoLoRSdb</a></td> <td>1,427</td> <td>Consortium of Long-Read Sequencing, joint callset</td> <td>No</td> <td>mixed (HiFi)</td> <td>426,239</td> <td>20</td> <td>33</td> <td>101,381</td> </tr> <tr> <td><a href="hgTrackUi?g=han945Sv">Han 945</a></td> <td>945</td> <td>Han Chinese, general population</td> <td>No</td> <td>~17x ONT</td> <td>111,288</td> <td>0</td> <td>254</td> <td>99,743</td> </tr> <tr> <td><a href="hgTrackUi?g=gustafsonSv">1KG ONT 100</a></td> <td>100</td> <td>1000 Genomes, 5 superpopulations / 19 subpopulations</td> <td>No</td> <td>~37x ONT (R9.4.1)</td> <td>113,696</td> <td>0</td> <td>164</td> <td>98,289</td> </tr> <tr> <td><a href="hgTrackUi?g=lrSv1kgOnt">1KG ONT Vienna</a></td> <td>1,019</td> <td>1000 Genomes, diverse</td> <td>No</td> <td>~17x ONT</td> <td>148,375</td> <td>2</td> <td>177</td> <td>49,171</td> </tr> <tr> <td><a href="hgTrackUi?g=tommoJpSv">ToMMo Japanese</a></td> <td>333 
(111 trios)</td> <td>Japanese, general population</td> <td>No</td> <td>~22x ONT</td> <td>74,201</td> <td>51</td> <td>162</td> <td>99,980</td> </tr> <tr> <td><a href="hgTrackUi?g=aou1kSv">AoU 1K</a></td> <td>1,027</td> <td>All of Us, self-identified Black/African American; biobank includes a variety of conditions (diabetes, hearing loss, etc.)</td> <td>Yes (mixed)</td> <td>~8x HiFi</td> <td>541,049</td> <td>50</td> <td>152</td> <td>9,998</td> </tr> <tr> <td><a href="hgTrackUi?g=ga4kSv">GA4K</a></td> <td>502</td> <td>Children's Mercy, pediatric rare disease probands + families</td> <td>Yes (probands)</td> <td>~27x HiFi</td> <td>115,554</td> <td>50</td> <td>186</td> <td>809,711</td> </tr> <tr> <td><a href="hgTrackUi?g=decodeSv">deCODE 3,622</a></td> <td>3,622</td> <td>Icelandic general population</td> <td>No</td> <td>~17x ONT</td> <td>133,886</td> <td>0</td> <td>127</td> <td>861,080</td> </tr> <tr> <td><a href="hgTrackUi?g=hprc2Sv">HPRC v2</a></td> <td>233</td> <td>HPRC release-2 pangenome (CHM13 + diverse 1KG assemblies)</td> <td>No</td> <td>~60x HiFi + ~30x ONT (pangenome graph)</td> <td>1,483,114</td> <td>50</td> <td>280</td> <td>97,718</td> </tr> <tr> <td><a href="hgTrackUi?g=hgsvc2Sv">HGSVC2</a></td> <td>32</td> <td>HGSVC2 haplotype-resolved assemblies (5 superpopulations)</td> <td>No</td> <td>>40x PacBio CLR + >20x HiFi (+ Strand-seq)</td> <td>111,746</td> <td>50</td> <td>168</td> <td>57,207,414</td> </tr> <tr> <td><a href="hgTrackUi?g=hgsvc3Sv">HGSVC3</a></td> <td>65</td> <td>HGSVC3 diverse reference assemblies</td> <td>No</td> <td>~47x HiFi + ~56x ONT</td> <td>176,531</td> <td>50</td> <td>154</td> <td>30,176,500</td> </tr> <tr> - <td><a href="hgTrackUi?g=aprSv">Arab UPR</a></td> + <td><a href="hgTrackUi?g=aprSv">Arab APR</a></td> <td>53</td> - <td>UAE-resident Arabs from 8 countries (UAE Pangenome Reference)</td> + <td>UAE-resident Arabs from 8 countries (Arab Pangenome Reference)</td> <td>No</td> <td>~35x HiFi + ~54x ONT (+ Hi-C, pangenome graph)</td> 
<td>72,656</td> <td>1</td> <td>21</td> <td>99,885</td> </tr> <tr> <td><a href="hgTrackUi?g=cpc1Sv">CPC</a></td> <td>58</td> <td>Chinese Pangenome Consortium, 36 minority ethnic groups (HPRC-specific SVs removed)</td> <td>No</td> <td>~30x HiFi (pangenome graph)</td> <td>36,030</td> <td>1</td> <td>53</td> <td>8,998,096</td> </tr> <tr> <td><a href="hgTrackUi?g=kwanhoSv">Kim PD Brain</a></td> <td>100</td> <td>Parkinson's disease, ILBD, controls (post-mortem brain)</td> <td>Yes (PD + ILBD)</td> <td>~17x HiFi</td> <td>74,552</td> <td>50</td> <td>160</td> <td>190,088,222</td> </tr> <tr> <td><a href="hgTrackUi?g=chirmade101Sv">SVatalog 101</a></td> <td>101</td> <td>Cystic fibrosis (CF) patients from the CF Canada-Sick Kids Program in Individual CF Therapy (CFIT); long-read WGS used for GWAS LD fine-mapping</td> <td>Yes (all CF)</td> <td>~50x PacBio CLR (34, Sequel I) + ~76x HiFi (67, Sequel II)</td> <td>87,183</td> <td>4</td> <td>160</td> <td>1,321,484</td> </tr> </table> <p> Note: sample composition likely overlaps across these collections. For example, 1000 Genomes samples are also included in HPRC and CoLoRSdb. </p> <h3><a href="hgTrackUi?g=colorsDbSv">CoLoRSdb SVs</a></h3> <p> Structural variants from the Consortium of Long-Read Sequencing database (CoLoRSdb), built from 1,427 PacBio HiFi long-read whole-genome sequences. ~426k SVs (insertions, deletions, inversions) called with pbsv and merged with Jasmine, with allele frequencies, genotype counts and Hardy-Weinberg statistics across the cohort. </p> <h3><a href="hgTrackUi?g=han945Sv">Han 945 SVs</a></h3> <p> Structural variants from 945 Han Chinese individuals. ~111k SVs (deletions, insertions, duplications, inversions, translocations) merged with SURVIVOR. The callset includes allele frequencies and per-sample support. 
</p> <h3><a href="hgTrackUi?g=gustafsonSv">1KG ONT 100 SVs</a></h3> <p> Structural variants from Oxford Nanopore long-read sequencing of 100 samples from the 1000 Genomes Project (5 superpopulations, 19 subpopulations), released by the 1000 Genomes ONT Sequencing Consortium and described in Gustafson et al. 2024. ~114k SVs (insertions, deletions, duplications, inversions) called with five callers and merged with Jasmine. This dataset is separate from the Vienna 1KG ONT release below; the 100 samples here do not overlap with the 1,019 samples in the Vienna release. </p> <h3><a href="hgTrackUi?g=lrSv1kgOnt">1KG ONT Vienna SVs</a></h3> <p> Structural variants from 1,019 individuals across 26 populations (1000 Genomes ONT), annotated with SVAN, which classifies insertions and deletions by mechanism of origin (mobile elements, VNTRs, processed pseudogenes, etc.). The native callset is on T2T-CHM13 (hs1) and contains ~161k SVs; the hg38 version was created with liftOver and contains ~148k SVs. This dataset is separate from the 1KG ONT 100 (Gustafson et al.) track above; the 1,019 samples here do not overlap with the 100 samples in that release. </p> <h3><a href="hgTrackUi?g=tommoJpSv">ToMMo Japanese SVs</a></h3> <p> Structural variants from 333 Japanese individuals (111 trios) enrolled in the Tohoku Medical Megabank (ToMMo). ~74k SVs (deletions and insertions) with trio-based Mendelian error rates and allele frequencies. </p> <h3><a href="hgTrackUi?g=aou1kSv">AoU 1K SVs</a></h3> <p> Structural variants from 1,027 participants in the All of Us (AoU) Research Program, sequenced with PacBio HiFi long reads. AoU is a deeply phenotyped biobank that includes participants with a range of conditions (e.g., diabetes, hearing loss, hypertension), so the cohort is not disease-free. ~541k SVs (insertions and deletions) with population-specific allele frequencies, gene annotations, and clinical trait associations. 
</p> <h3><a href="hgTrackUi?g=ga4kSv">GA4K SVs</a></h3> <p> Structural variants from 502 probands and family members enrolled in the Genomic Answers for Kids (GA4K) pediatric rare-disease program at Children's Mercy Research Institute, sequenced with PacBio HiFi long reads. ~116k replicated SVs (deletions, insertions, duplications, inversions) called with pbsv and merged with Jasmine. The matched GA4K small-variant callset (SNVs and short indels) is available as <a href="hgTrackUi?g=ga4kSnv">GA4K 552 PacBio LR</a> in the Variant Frequencies track collection, alongside other population allele-frequency resources. </p> <h3><a href="hgTrackUi?g=decodeSv">deCODE 3,622 SVs</a></h3> <p> High-confidence structural variants from 3,622 Icelanders (deCODE genetics), sequenced with Oxford Nanopore long reads. ~134k SVs (deletions, insertions and combined insertion/deletion events). This is a site-only callset (no per-sample genotypes), with annotations of the surrounding tandem-repeat regions. </p> <h3><a href="hgTrackUi?g=hprc2Sv">HPRC v2 SVs</a></h3> <p> Structural variants derived from the Human Pangenome Reference Consortium release-2 Minigraph-Cactus pangenome graph, built from 233 PacBio HiFi haplotype-resolved assemblies (CHM13 + diverse 1000 Genomes samples). ~1.5M SV-sized alleles (INS, DEL, COMPLEX, INV) extracted with <tt>vg deconstruct</tt> and decomposed with <tt>vcfwave</tt> (WFA2). </p> <h3><a href="hgTrackUi?g=hgsvc2Sv">HGSVC2 32 SVs</a></h3> <p> Structural variants from 32 haplotype-resolved diploid genomes (HGSVC2 freeze 4, Ebert et al. 2021). ~112k SVs (deletions, insertions and inversions) called from phased de novo assemblies with PAV, with per-variant 1000 Genomes population allele frequencies (insertions and deletions) and rich structural/gene annotations. This is an earlier HGSVC release, complementary to <a href="hgTrackUi?g=hgsvc3Sv">HGSVC3</a>. 
</p> <h3><a href="hgTrackUi?g=hgsvc3Sv">HGSVC3 65 SVs</a></h3> <p> Structural variants from 65 diverse individuals sequenced and de novo assembled by the Human Genome Structural Variation Consortium phase 3 (HGSVC3). ~177k haplotype-resolved SVs (deletions, insertions and inversions) called with PAV and cross-validated with ten additional callers, with per-site carrier haplotype lists and structural annotations. </p> <h3><a href="hgTrackUi?g=kwanhoSv">Kim PD Brain SVs</a></h3> <p> Structural variants from 100 post-mortem brain samples (Parkinson's disease, incidental Lewy body disease, and healthy controls) sequenced with PacBio HiFi long reads. ~75k high-confidence SVs (deletions, insertions, duplications, inversions) with per-cohort allele frequencies and case-control carrier-rate differentials, from Kim et al. 2026. </p> <h3><a href="hgTrackUi?g=chirmade101Sv">SVatalog 101 SVs</a></h3> <p> Structural variants from 101 long-read whole-genome sequences released alongside the GWAS SVatalog tool (Chirmade et al. 2026). The samples come from the CF Canada-Sick Kids Program in Individual CF Therapy (CFIT), a cystic-fibrosis (CF) patient cohort assembled to model patient-specific responses to CFTR modulator therapies (most participants are F508del homozygotes or F508del / minimal-function compound heterozygotes; a smaller number carry rare nonsense or missense CFTR mutations). ~87k SVs (deletions, insertions, duplications, inversions and complex events) annotated with gene overlaps, ClinGen / gnomAD constraint scores, and OMIM / ClinVar / DGV / DECIPHER regional annotations. </p> <h2>Data Access</h2> <p> Each subtrack has its own documentation page with details on how to download and intersect the underlying annotations. </p> <h2>References</h2> <p> Gong J, Sun H, Wang K, Zhao Y, Huang Y, Chen Q, Qiao H, Gao Y, Zhao J, Ling Y <em>et al</em>. 
<a href="https://doi.org/10.1038/s41467-025-56661-9" target="_blank"> Long-read sequencing of 945 Han individuals identifies structural variants associated with phenotypic diversity and disease susceptibility</a>. <em>Nat Commun</em>. 2025 Feb 10;16(1):1494. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39929826" target="_blank">39929826</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11811171/" target="_blank">PMC11811171</a> </p> <p> Schloissnig S, Pani S, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T, Asparuhova M <em>et al</em>. <a href="https://doi.org/10.1038/s41586-025-09290-7" target="_blank"> Structural variation in 1,019 diverse humans based on long-read sequencing</a>. <em>Nature</em>. 2025 Aug;644(8076):442-452. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/40702182" target="_blank">40702182</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12350158/" target="_blank">PMC12350158</a> </p> <p> Otsuki A, Okamura Y, Ishida N, Tadaka S, Takayama J, Kumada K, Kawashima J, Taguchi K, Minegishi N, Kuriyama S <em>et al</em>. <a href="https://doi.org/10.1038/s42003-022-03953-1" target="_blank"> Construction of a trio-based structural variation panel utilizing activated T lymphocytes and long-read sequencing technology</a>. <em>Commun Biol</em>. 2022 Sep 20;5(1):991. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36127505" target="_blank">36127505</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9489684/" target="_blank">PMC9489684</a> </p> <p> Garimella KV, Li Q, Wertz J, Lee SK, Cunial F, Huang Y, Mostovoy Y, Lorig-Roach R, English A, Su H <em>et al</em>. <a href="https://doi.org/10.1101/2025.10.02.25336942" target="_blank"> Population-scale Long-read Sequencing in the All of Us Research Program</a>. <em>medRxiv</em>. 2025 Oct 5;. 
PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/41256123" target="_blank">41256123</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12622093/" target="_blank">PMC12622093</a> </p> <p> Cohen ASA, Farrow EG, Abdelmoity AT, Alaimo JT, Amudhavalli SM, Anderson JT, Bansal L, Bartik L, Baybayan P, Belden B <em>et al</em>. <a href="https://linkinghub.elsevier.com/retrieve/pii/S1098-3600(22)00653-0" target="_blank"> Genomic answers for children: Dynamic analyses of >1000 pediatric rare disease genomes</a>. <em>Genet Med</em>. 2022 Jun;24(6):1336-1348. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35305867" target="_blank">35305867</a> </p> <p> Beyter D, Ingimundardottir H, Oddsson A, Eggertsson HP, Bjornsson E, Jonsson H, Atlason BA, Kristmundsdottir S, Mehringer S, Hardarson MT <em>et al</em>. <a href="https://doi.org/10.1038/s41588-021-00865-4" target="_blank"> Long-read sequencing of 3,622 Icelanders provides insight into the role of structural variants in human diseases and other traits</a>. <em>Nat Genet</em>. 2021 Jun;53(6):779-786. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/33972781" target="_blank">33972781</a> </p> <p> Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo D <em>et al</em>. <a href="https://doi.org/10.1038/s41586-025-09140-6" target="_blank"> Complex genetic variation in nearly complete human genomes</a>. <em>Nature</em>. 2025 Aug;644(8076):430-441. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/40702183" target="_blank">40702183</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12350169/" target="_blank">PMC12350169</a> </p> <p> Kim K, Lin Z, Simmons SK, Parker J, Kearney M, Liao Z, Haywood N, Zhang J, Cline MP, Tuncali I <em>et al</em>. <a href="https://doi.org/10.64898/2026.03.20.713192" target="_blank"> Integrating Long-Read Structural Variant Analysis with single-nucleus RNA-seq to Elucidate Gene Expression Effects in Disease</a>. <em>bioRxiv</em>. 
2026 Mar 23;. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/41929179" target="_blank">41929179</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13041997/" target="_blank">PMC13041997</a> </p> <p> Chirmade S, Wang Z, Mastromatteo S, Sanders E, Thiruvahindrapuram B, Nalpathamkalam T, Pellecchia G, Lin F, Keenan K, Patel RV <em>et al</em>. <a href="https://doi.org/10.1038/s41437-025-00809-2" target="_blank"> GWAS SVatalog: a visualization tool to aid fine-mapping of GWAS loci with structural variations</a>. <em>Heredity (Edinb)</em>. 2026 Mar;135(3):199-210. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/41203876" target="_blank">41203876</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13031531/" target="_blank">PMC13031531</a> </p> <p> Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA, Richmond PA, De Coster W <em>et al</em>. <a href="http://genome.cshlp.org/lookup/pmidlookup?view=long&pmid=39358015" target="_blank"> High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation</a>. <em>Genome Res</em>. 2024 Nov 20;34(11):2061-2073. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39358015" target="_blank">39358015</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610458/" target="_blank">PMC11610458</a> </p> <p> Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R <em>et al</em>. <a href="https://www.science.org/doi/10.1126/science.abf7117" target="_blank"> Haplotype-resolved diverse human genomes and integrated analysis of structural variation</a>. <em>Science</em>. 2021 Apr 2;372(6537). 
PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/33632895" target="_blank">33632895</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8026704/" target="_blank">PMC8026704</a> </p> <p> Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, Nagulapalli K <em>et al</em>. <a href="https://linkinghub.elsevier.com/retrieve/pii/S0092-8674(22)00991-6" target="_blank"> High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios</a>. <em>Cell</em>. 2022 Sep 1;185(18):3426-3440.e19. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36055201" target="_blank">36055201</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9439720/" target="_blank">PMC9439720</a> </p>