c81011d4a8f57db347e15aa1248c501b2c8a6fea lrnassar Mon Jun 1 13:16:15 2026 -0700

QA fixes for the lrSv long-read SV supertrack: labels and description cleanups. refs #36258

Trim six subtrack longLabels to the 85-char limit (ga4kSv, hprc2Sv, hgsvc2Sv, chirmade101Sv, cpc1Sv, and lrSvAll; the lrSvAll change is also made in the lrSvMergeAll.py generator so a re-run reproduces it).

Standardize the APR dataset name to "Arab Pangenome Reference (APR)" across lrSv.ra, lrSv.html, aprSv.html, and the makeDoc comment (was a mix of "Arabic" and "UAE UPR").

lrSv1kgOnt.html: state per-assembly SV counts (hg38 lifted 148,375 vs hs1 native 161,332, each with its own type breakdown) and encode non-ASCII author names as numeric entities.

hgsvc3Sv.html: correct the hg38 counts to match the served bigBed (176,231 DEL+INS, 176,531 total).

colorsDbSv.html: use $db in the hgdownload path so it resolves on hs1 as well as hg38.

cpc1Sv.html: encode a Unicode minus sign as a numeric entity.

diff --git src/hg/makeDb/trackDb/human/lrSv1kgOnt.html src/hg/makeDb/trackDb/human/lrSv1kgOnt.html
index 9a379e674a6..fa8be24c6ae 100644
--- src/hg/makeDb/trackDb/human/lrSv1kgOnt.html
+++ src/hg/makeDb/trackDb/human/lrSv1kgOnt.html
@@ -1,117 +1,118 @@
<h2>Description</h2> <p> This track shows structural variants (SVs) identified by Oxford Nanopore long-read sequencing of 1,019 individuals from the 1000 Genomes Project, representing 26 populations across 5 continental regions: Africa (275 samples), East Asia (192), South Asia (199), Europe (189), and Americas (164). Median sequencing coverage was 16.9x per sample with a median N50 read length of 20.3 kb. </p> <p> SVs were discovered using the SAGA framework (SV Analysis by Graph Augmentation) and annotated with SVAN, which classifies insertions and deletions by their
-mechanism of origin. The dataset contains 161,332 annotated SVs,
-including 75,324 insertions, 66,192 deletions, and 19,816 complex rearrangements.
-The original coordinates are on the T2T-CHM13 assembly (hs1); for GRCh38 (hg38),
-coordinates were converted using liftOver (148,375 records mapped successfully).
+mechanism of origin. The full release is native to the T2T-CHM13 assembly
+(hs1) and contains 161,332 annotated SVs (75,324 insertions, 66,192 deletions,
+and 19,816 complex rearrangements). For GRCh38 (hg38), coordinates were converted
+using liftOver and 148,375 records mapped successfully (73,298 insertions,
+58,637 deletions, and 16,440 complex rearrangements).
</p> <p> The 1,019 samples sequenced here are distinct from those in the <a href="hgTrackUi?g=gustafsonSv">1KG ONT 100</a> track (Gustafson et al. 2024); the two releases were produced by separate consortia (Vienna and the 1000 Genomes ONT Sequencing Consortium, respectively) and there is no sample overlap between the two. </p> <h2>Display Conventions and Configuration</h2> <p> Items are colored by SV class: <ul> <li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li> <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li> <li><span style="color: rgb(230,140,0);">Complex (CPX)</span> - orange</li> </ul> </p> <p> Filters are available for SV type, insertion/deletion type, transposon family, and SV length. For insertions, the item is placed at the insertion site with a width of 1 bp; for deletions, the item spans the deleted region.
</p> <p> The detail page for each item shows SVAN annotation fields including: <ul> <li><b>Insertion/Deletion Type</b>: solo (single mobile element), partnered (with transduction), orphan (transduction only), VNTR, PSD (processed pseudogene), NUMT (nuclear mitochondrial insertion), DUP (tandem duplication), DUP_INTERSPERSED, INV_DUP (inverted duplication), COMPLEX_DUP, or chimera</li> <li><b>Transposon Family</b>: Alu, L1, SVA, HERVK, or LTR5_Hs</li> <li><b>Percent Resolved</b>: fraction of inserted sequence resolved by assembly</li> <li><b>TSD Length</b>: target site duplication length</li> <li><b>Poly-A Length</b>: poly-A tail length</li> <li><b>Conformation</b>: structural conformation of the insertion (e.g. FOR+POLYA, Hexamer+Alu-like+VNTR+SINE-R+POLYA)</li> <li><b>Source Coordinates</b>: genomic location of the source element (for transductions)</li> </ul> </p> <h2>Methods</h2> <p> Schloissnig et al. 2025 generated intermediate-coverage Oxford Nanopore long-read sequencing of 1,019 samples from the 1000 Genomes Project on PromethION 48 instruments with R9.4.1 (FLO-PRO002) flow cells (SQK-LSK110 libraries, 24-h runs with flow-cell wash and reload). SVs were discovered with the SAGA framework (SV Analysis by Graph Augmentation), which combines linear-reference callers (Sniffles and DELLY, run against both GRCh38 and T2T-CHM13), graph-aware discovery with SVarp (local long-read assembly of SV-supporting graph-aligned reads) and graph-based joint genotyping with Giggles across a pangenome graph. Insertions and deletions were then annotated with <a href="https://github.com/REPBIO-LAB/svan" target="_blank"> SVAN</a> v1.3, which classifies SVs by mechanism of origin. The release contains 161,332 SVAN-annotated SVs: 75,324 insertions, 66,192 deletions and 19,816 complex rearrangements. 
The original VCF is on T2T-CHM13 contig coordinates; for the hg38 version of this track, SVs were lifted with liftOver (148,375 of 161,332 records mapped), while the hs1 version uses the native coordinates. </p> <p> The SVAN-annotated unphased VCF (<tt>final-vcf.unphased.SVAN_1.3.vcf.gz</tt>) was downloaded from <a href="https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1KG_ONT_VIENNA/release/v1.1/svan-annotation/" target="_blank"> the IGSR 1KG_ONT_VIENNA v1.1 SVAN-annotation directory</a>; allele counts were added from the companion shapeit5-phased-callset (<tt>shapeit5-phased-callset_final-vcf.phased.vcf.gz</tt>) in the same release tree. </p> <p> The step-by-step build commands (download, liftOver, format conversion, bigBed build) are recorded in the UCSC makeDoc for this track container: <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank"> doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank"> makeDb/scripts/lrSv</a>. </p> <h2>Data Access</h2> <p> Source data is available from the <a href="https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1KG_ONT_VIENNA/" target="_blank">1000 Genomes ONT Vienna</a> data collection at IGSR. </p> <h2>Credits</h2> <p> Thanks to the 1000 Genomes ONT Vienna consortium for making their structural variant calls and SVAN annotations publicly available. </p> <h2>References</h2> <p>
-Schloissnig S, Pani S, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T,
+Schloissnig S, Pani S, Ebler J, Hain C, Tsapalou V, S&#246;ylev A, H&#252;ther P, Ashraf H, Prodanov T,
Asparuhova M <em>et al</em>. <a href="https://doi.org/10.1038/s41586-025-09290-7" target="_blank"> Structural variation in 1,019 diverse humans based on long-read sequencing</a>. <em>Nature</em>. 2025 Aug;644(8076):442-452.
PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/40702182" target="_blank">40702182</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12350158/" target="_blank">PMC12350158</a> </p>