src/hg/makeDb/trackDb/human/lrSv1kgOnt.html c81011d4a8f57db347e15aa1248c501b2c8a6fea

c81011d4a8f57db347e15aa1248c501b2c8a6fea
lrnassar
  Mon Jun 1 13:16:15 2026 -0700
QA fixes for the lrSv long-read SV supertrack: labels and description cleanups. refs #36258

Trim six subtrack longLabels to the 85-char limit (ga4kSv, hprc2Sv, hgsvc2Sv,
chirmade101Sv, cpc1Sv, and lrSvAll; the lrSvAll change is also made in the
lrSvMergeAll.py generator so a re-run reproduces it).
Standardize the APR dataset name to "Arab Pangenome Reference (APR)" across
lrSv.ra, lrSv.html, aprSv.html, and the makeDoc comment (was a mix of "Arabic"
and "UAE UPR").
lrSv1kgOnt.html: state per-assembly SV counts (hg38 lifted 148,375 vs hs1
native 161,332, each with its own type breakdown) and encode non-ASCII author
names as numeric entities.
hgsvc3Sv.html: correct the hg38 counts to match the served bigBed (176,231
DEL+INS, 176,531 total).
colorsDbSv.html: use $db in the hgdownload path so it resolves on hs1 as well
as hg38.
cpc1Sv.html: encode a Unicode minus sign as a numeric entity.
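
A quick arithmetic check on the lrSv1kgOnt.html counts quoted above, as a minimal sketch. The totals and type breakdowns are copied by hand from the description text in this commit, not read from the served bigBed files. The dictionary layout and the helper names (breakdown_matches, unmapped) are illustrative only and are not part of the lrSv scripts.

```python
# Counts copied by hand from the lrSv1kgOnt.html description in this commit.
# Keys INS/DEL/CPX follow the SV class labels used on the track page.
COUNTS = {
    "hs1": {"total": 161_332, "INS": 75_324, "DEL": 66_192, "CPX": 19_816},
    "hg38": {"total": 148_375, "INS": 73_298, "DEL": 58_637, "CPX": 16_440},
}


def breakdown_matches(entry):
    """Return True when the per-type counts add up to the stated total."""
    parts = sum(count for key, count in entry.items() if key != "total")
    return parts == entry["total"]


def unmapped(native, lifted):
    """Number of native records that have no lifted counterpart."""
    return native["total"] - lifted["total"]


if __name__ == "__main__":
    for db, entry in COUNTS.items():
        status = "ok" if breakdown_matches(entry) else "MISMATCH"
        print(f"{db}: breakdown {status}")
    lost = unmapped(COUNTS["hs1"], COUNTS["hg38"])
    print(f"records not carried over by liftOver: {lost}")
```

Both breakdowns sum to their stated totals, and the difference between the hs1 and hg38 totals is 12,957 records.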

diff --git src/hg/makeDb/trackDb/human/lrSv1kgOnt.html src/hg/makeDb/trackDb/human/lrSv1kgOnt.html
index 9a379e674a6..fa8be24c6ae 100644
--- src/hg/makeDb/trackDb/human/lrSv1kgOnt.html
+++ src/hg/makeDb/trackDb/human/lrSv1kgOnt.html
@@ -1,117 +1,118 @@
 <h2>Description</h2>
 <p>
 This track shows structural variants (SVs) identified by Oxford Nanopore long-read
 sequencing of 1,019 individuals from the 1000 Genomes Project, representing 26
 populations across 5 continental regions: Africa (275 samples), East Asia (192),
 South Asia (199), Europe (189), and Americas (164). Median sequencing coverage
 was 16.9x per sample with a median N50 read length of 20.3 kb.
 </p>
 <p>
 SVs were discovered using the SAGA framework (SV Analysis by Graph Augmentation)
 and annotated with SVAN, which classifies insertions and deletions by their
-mechanism of origin. The dataset contains 161,332 annotated SVs,
-including 75,324 insertions, 66,192 deletions, and 19,816 complex rearrangements.
-The original coordinates are on the T2T-CHM13 assembly (hs1); for GRCh38 (hg38),
-coordinates were converted using liftOver (148,375 records mapped successfully).
+mechanism of origin. The full release is native to the T2T-CHM13 assembly
+(hs1) and contains 161,332 annotated SVs (75,324 insertions, 66,192 deletions,
+and 19,816 complex rearrangements). For GRCh38 (hg38), coordinates were converted
+using liftOver and 148,375 records mapped successfully (73,298 insertions,
+58,637 deletions, and 16,440 complex rearrangements).
 </p>
 <p>
 The 1,019 samples sequenced here are distinct from those in the
 <a href="hgTrackUi?g=gustafsonSv">1KG ONT 100</a> track (Gustafson et al. 2024);
 the two releases were produced by separate consortia (Vienna and the 1000 Genomes
 ONT Sequencing Consortium, respectively) and there is no sample overlap between
 the two.
 </p>
 
 <h2>Display Conventions and Configuration</h2>
 <p>
 Items are colored by SV class:
 <ul>
 <li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 <li><span style="color: rgb(230,140,0);">Complex (CPX)</span> - orange</li>
 </ul>
 </p>
 <p>
 Filters are available for SV type, insertion/deletion type, transposon family,
 and SV length. For insertions, the item is placed at the insertion site with a
 width of 1 bp; for deletions, the item spans the deleted region.
 </p>
 <p>
 The detail page for each item shows SVAN annotation fields including:
 <ul>
 <li><b>Insertion/Deletion Type</b>: solo (single mobile element), partnered
 (with transduction), orphan (transduction only), VNTR, PSD (processed pseudogene),
 NUMT (nuclear mitochondrial insertion), DUP (tandem duplication),
 DUP_INTERSPERSED, INV_DUP (inverted duplication), COMPLEX_DUP, or chimera</li>
 <li><b>Transposon Family</b>: Alu, L1, SVA, HERVK, or LTR5_Hs</li>
 <li><b>Percent Resolved</b>: fraction of inserted sequence resolved by assembly</li>
 <li><b>TSD Length</b>: target site duplication length</li>
 <li><b>Poly-A Length</b>: poly-A tail length</li>
 <li><b>Conformation</b>: structural conformation of the insertion
 (e.g. FOR+POLYA, Hexamer+Alu-like+VNTR+SINE-R+POLYA)</li>
 <li><b>Source Coordinates</b>: genomic location of the source element (for transductions)</li>
 </ul>
 </p>
 
 <h2>Methods</h2>
 <p>
 Schloissnig et al. 2025 generated intermediate-coverage Oxford Nanopore
 long-read sequencing of 1,019 samples from the 1000 Genomes Project on
 PromethION 48 instruments with R9.4.1 (FLO-PRO002) flow cells (SQK-LSK110
 libraries, 24-h runs with flow-cell wash and reload). SVs were discovered
 with the SAGA framework (SV Analysis by Graph Augmentation), which combines
 linear-reference callers (Sniffles and DELLY, run against both GRCh38 and
 T2T-CHM13), graph-aware discovery with SVarp (local long-read assembly of
 SV-supporting graph-aligned reads) and graph-based joint genotyping with
 Giggles across a pangenome graph. Insertions and deletions were then
 annotated with <a href="https://github.com/REPBIO-LAB/svan" target="_blank">
 SVAN</a> v1.3, which classifies SVs by mechanism of origin. The release
 contains 161,332 SVAN-annotated SVs: 75,324 insertions, 66,192 deletions
 and 19,816 complex rearrangements. The original VCF is on T2T-CHM13 contig
 coordinates; for the hg38 version of this track, SVs were lifted with
 liftOver (148,375 of 161,332 records mapped), while the hs1 version uses
 the native coordinates.
 </p>
 <p>
 The SVAN-annotated unphased VCF (<tt>final-vcf.unphased.SVAN_1.3.vcf.gz</tt>)
 was downloaded from
 <a href="https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1KG_ONT_VIENNA/release/v1.1/svan-annotation/" target="_blank">
 the IGSR 1KG_ONT_VIENNA v1.1 SVAN-annotation directory</a>; allele counts
 were added from the companion shapeit5-phased-callset
 (<tt>shapeit5-phased-callset_final-vcf.phased.vcf.gz</tt>) in the same
 release tree.
 </p>
 <p>
 The step-by-step build commands (download, liftOver, format conversion,
 bigBed build) are recorded in the UCSC makeDoc for this track container:
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
 doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
 <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
 makeDb/scripts/lrSv</a>.
 </p>
 
 <h2>Data Access</h2>
 <p>
 Source data is available from the
 <a href="https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1KG_ONT_VIENNA/"
    target="_blank">1000 Genomes ONT Vienna</a> data collection at IGSR.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to the 1000 Genomes ONT Vienna consortium for making their structural
 variant calls and SVAN annotations publicly available.
 </p>
 
 <h2>References</h2>
 
 <p>
-Schloissnig S, Pani S, Ebler J, Hain C, Tsapalou V, Söylev A, Hüther P, Ashraf H, Prodanov T,
+Schloissnig S, Pani S, Ebler J, Hain C, Tsapalou V, S&#246;ylev A, H&#252;ther P, Ashraf H, Prodanov T,
 Asparuhova M <em>et al</em>.
 <a href="https://doi.org/10.1038/s41586-025-09290-7" target="_blank">
 Structural variation in 1,019 diverse humans based on long-read sequencing</a>.
 <em>Nature</em>. 2025 Aug;644(8076):442-452.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/40702182" target="_blank">40702182</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12350158/" target="_blank">PMC12350158</a>
 </p>