c81011d4a8f57db347e15aa1248c501b2c8a6fea lrnassar Mon Jun 1 13:16:15 2026 -0700

QA fixes for the lrSv long-read SV supertrack: labels and description cleanups. refs #36258

Trim six subtrack longLabels to the 85-char limit (ga4kSv, hprc2Sv, hgsvc2Sv, chirmade101Sv, cpc1Sv, and lrSvAll; the lrSvAll change is also made in the lrSvMergeAll.py generator so a re-run reproduces it).

Standardize the APR dataset name to "Arab Pangenome Reference (APR)" across lrSv.ra, lrSv.html, aprSv.html, and the makeDoc comment (was a mix of "Arabic" and "UAE UPR").

lrSv1kgOnt.html: state per-assembly SV counts (hg38 lifted 148,375 vs hs1 native 161,332, each with its own type breakdown) and encode non-ASCII author names as numeric entities.

hgsvc3Sv.html: correct the hg38 counts to match the served bigBed (176,231 DEL+INS, 176,531 total).

colorsDbSv.html: use $db in the hgdownload path so it resolves on hs1 as well as hg38.

cpc1Sv.html: encode a Unicode minus sign as a numeric entity.

diff --git src/hg/makeDb/trackDb/human/lrSv1kgOnt.html src/hg/makeDb/trackDb/human/lrSv1kgOnt.html
index 9a379e674a6..fa8be24c6ae 100644
--- src/hg/makeDb/trackDb/human/lrSv1kgOnt.html
+++ src/hg/makeDb/trackDb/human/lrSv1kgOnt.html
@@ -1,117 +1,118 @@

Description

This track shows structural variants (SVs) identified by Oxford Nanopore long-read sequencing of 1,019 individuals from the 1000 Genomes Project, representing 26 populations across 5 continental regions: Africa (275 samples), East Asia (192), South Asia (199), Europe (189), and Americas (164). Median sequencing coverage was 16.9x per sample with a median N50 read length of 20.3 kb.

SVs were discovered using the SAGA framework (SV Analysis by Graph Augmentation) and annotated with SVAN, which classifies insertions and deletions by their
-mechanism of origin. The dataset contains 161,332 annotated SVs,
-including 75,324 insertions, 66,192 deletions, and 19,816 complex rearrangements.
-The original coordinates are on the T2T-CHM13 assembly (hs1); for GRCh38 (hg38),
-coordinates were converted using liftOver (148,375 records mapped successfully).
+mechanism of origin. The full release is native to the T2T-CHM13 assembly
+(hs1) and contains 161,332 annotated SVs (75,324 insertions, 66,192 deletions,
+and 19,816 complex rearrangements). For GRCh38 (hg38), coordinates were converted
+using liftOver and 148,375 records mapped successfully (73,298 insertions,
+58,637 deletions, and 16,440 complex rearrangements).
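The per-assembly counts in the revised paragraph can be sanity-checked with a few lines of Python. This is only a sketch: the numbers are copied from the description text, while the dictionary and function names are made up for illustration and are not part of the lrSv build scripts.

```python
# Per-type SV counts as stated in the revised description text.
# The names HS1_NATIVE, HG38_LIFTED, total and lifted_fraction are
# illustrative only; they do not come from the UCSC source tree.
HS1_NATIVE = {"INS": 75_324, "DEL": 66_192, "CPX": 19_816}
HG38_LIFTED = {"INS": 73_298, "DEL": 58_637, "CPX": 16_440}


def total(counts):
    """Sum the per-type counts for one assembly."""
    return sum(counts.values())


def lifted_fraction(native, lifted):
    """Per-type fraction of native records that are present after liftOver."""
    return {sv_type: lifted[sv_type] / native[sv_type] for sv_type in native}


if __name__ == "__main__":
    # The type breakdowns should add up to the stated totals.
    assert total(HS1_NATIVE) == 161_332
    assert total(HG38_LIFTED) == 148_375

    print(f"overall mapped: {total(HG38_LIFTED) / total(HS1_NATIVE):.1%}")
    for sv_type, fraction in lifted_fraction(HS1_NATIVE, HG38_LIFTED).items():
        print(f"{sv_type}: {fraction:.1%}")
```

Both breakdowns sum to their stated totals, so the new text is internally consistent; the per-type fractions show how much of each SV class is kept in the hg38 lift.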

The 1,019 samples sequenced here are distinct from those in the 1KG ONT 100 track (Gustafson et al. 2024); the two releases were produced by separate consortia (Vienna and the 1000 Genomes ONT Sequencing Consortium, respectively) and there is no sample overlap between the two.

Display Conventions and Configuration

Items are colored by SV class:

Deletions (DEL) - red
Insertions (INS) - blue
Complex (CPX) - orange

Filters are available for SV type, insertion/deletion type, transposon family, and SV length. For insertions, the item is placed at the insertion site with a width of 1 bp; for deletions, the item spans the deleted region.
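The placement rule above can be illustrated with a short Python sketch. It assumes 1-based inclusive input coordinates and 0-based half-open BED output; the function name, the anchoring of the 1 bp insertion item, and the treatment of non-insertion classes such as CPX are assumptions for illustration, and the actual conversion scripts in makeDb/scripts/lrSv may differ.

```python
def bed_interval(sv_type, pos, end):
    """Return a (chromStart, chromEnd) pair in 0-based half-open BED coordinates.

    Hypothetical helper: pos and end are assumed to be 1-based inclusive.
    Insertions become a 1 bp item at the insertion site; every other class
    (including CPX, which is an assumption here) spans pos..end.
    """
    if pos < 1 or end < pos:
        raise ValueError(f"invalid coordinates: pos={pos}, end={end}")
    chrom_start = pos - 1  # convert 1-based start to 0-based
    if sv_type == "INS":
        # Insertion: fixed width of 1 bp regardless of inserted length.
        return (chrom_start, chrom_start + 1)
    # Deletion (and other spanning classes): cover the affected region.
    return (chrom_start, end)


if __name__ == "__main__":
    print(bed_interval("INS", 1000, 1000))
    print(bed_interval("DEL", 1000, 1500))
```

With this convention the displayed width of an insertion says nothing about its length, which is why the track offers a separate SV length filter.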

The detail page for each item shows SVAN annotation fields including:

Insertion/Deletion Type: solo (single mobile element), partnered (with transduction), orphan (transduction only), VNTR, PSD (processed pseudogene), NUMT (nuclear mitochondrial insertion), DUP (tandem duplication), DUP_INTERSPERSED, INV_DUP (inverted duplication), COMPLEX_DUP, or chimera
Transposon Family: Alu, L1, SVA, HERVK, or LTR5_Hs
Percent Resolved: fraction of inserted sequence resolved by assembly
TSD Length: target site duplication length
Poly-A Length: poly-A tail length
Conformation: structural conformation of the insertion (e.g. FOR+POLYA, Hexamer+Alu-like+VNTR+SINE-R+POLYA)
Source Coordinates: genomic location of the source element (for transductions)

Methods

Schloissnig et al. 2025 generated intermediate-coverage Oxford Nanopore long-read sequencing of 1,019 samples from the 1000 Genomes Project on PromethION 48 instruments with R9.4.1 (FLO-PRO002) flow cells (SQK-LSK110 libraries, 24-h runs with flow-cell wash and reload). SVs were discovered with the SAGA framework (SV Analysis by Graph Augmentation), which combines linear-reference callers (Sniffles and DELLY, run against both GRCh38 and T2T-CHM13), graph-aware discovery with SVarp (local long-read assembly of SV-supporting graph-aligned reads) and graph-based joint genotyping with Giggles across a pangenome graph. Insertions and deletions were then annotated with SVAN v1.3, which classifies SVs by mechanism of origin. The release contains 161,332 SVAN-annotated SVs: 75,324 insertions, 66,192 deletions and 19,816 complex rearrangements. The original VCF is on T2T-CHM13 contig coordinates; for the hg38 version of this track, SVs were lifted with liftOver (148,375 of 161,332 records mapped), while the hs1 version uses the native coordinates.

The SVAN-annotated unphased VCF (final-vcf.unphased.SVAN_1.3.vcf.gz) was downloaded from the IGSR 1KG_ONT_VIENNA v1.1 SVAN-annotation directory; allele counts were added from the companion shapeit5-phased-callset (shapeit5-phased-callset_final-vcf.phased.vcf.gz) in the same release tree.

The step-by-step build commands (download, liftOver, format conversion, bigBed build) are recorded in the UCSC makeDoc for this track container: doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in makeDb/scripts/lrSv.

Data Access

Source data is available from the 1000 Genomes ONT Vienna data collection at IGSR.

Credits

Thanks to the 1000 Genomes ONT Vienna consortium for making their structural variant calls and SVAN annotations publicly available.

References

-Schloissnig S, Pani S, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T,
+Schloissnig S, Pani S, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T,
Asparuhova M et al. Structural variation in 1,019 diverse humans based on long-read sequencing. Nature. 2025 Aug;644(8076):442-452. PMID: 40702182; PMC: PMC12350158
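The commit message describes encoding non-ASCII author names (and, in cpc1Sv.html, a Unicode minus sign) as numeric entities. One way to do that is Python's built-in xmlcharrefreplace error handler, sketched below. The function name is hypothetical, and the sketch emits decimal references; whether the commit used decimal or hexadecimal entities is not visible in this text.

```python
def to_numeric_entities(text):
    """Replace every non-ASCII character with a decimal numeric character reference.

    ASCII characters, including existing HTML markup such as '<' and '&',
    pass through unchanged, so the function can be run over HTML source.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    # Escapes are used so this example file itself stays pure ASCII.
    print(to_numeric_entities("S\u00f6ylev A, H\u00fcther P"))
    print(to_numeric_entities("\u2212"))  # U+2212 MINUS SIGN
```

Running a page through a function like this keeps it pure ASCII, so it renders identically regardless of the encoding the file is served with.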