bac95a147f49cd331052e597006e04b3deee40fc
max
  Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on
both the lrSv and srSv supertracks using the "CODE|CODE (Long name)"
filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)",
etc. Labels keep the short code up front so users can match what
hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under
scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py
helpers, consistent insLen / svLen / AC column naming, tightened
field-description text) that had been piling up unstaged in the
working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/han945Sv.html src/hg/makeDb/trackDb/human/han945Sv.html
index 2742da33cbe..a719ff0018a 100644
--- src/hg/makeDb/trackDb/human/han945Sv.html
+++ src/hg/makeDb/trackDb/human/han945Sv.html
@@ -1,72 +1,90 @@
 <h2>Description</h2>
 <p>
 This track shows structural variants (SVs) identified by long-read sequencing
 of 945 Han Chinese individuals. The dataset contains 111,288 SVs merged across
 samples using SURVIVOR, including 49,518 deletions, 42,300 insertions,
 13,503 duplications, 5,595 inversions, and 372 translocations.
 </p>
 
 <h2>Display Conventions and Configuration</h2>
 <p>
 Items are colored by SV type:
 <ul>
 <li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 <li><span style="color: rgb(0,160,0);">Duplications (DUP)</span> - green</li>
 <li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
 <li><span style="color: rgb(140,0,200);">Translocations (TRA)</span> - purple</li>
 </ul>
 </p>
 <p>
 Filters are available for SV type, SV length, allele frequency, and number of
 supporting samples. For insertions, the item is placed at the insertion site
 with a width of 1 bp. For translocations, only the first breakpoint is shown;
 the second breakpoint chromosome and position are listed in the item details.
 </p>
 
 <h2>Methods</h2>
 <p>
-Long-read sequencing was performed on 945 Han Chinese individuals.
-Structural variants were called per sample and then merged across all samples using
+Gong et al. 2025 performed Oxford Nanopore long-read sequencing of 945
+Han Chinese individuals on PromethION instruments with R9.4 flow cells.
+Reads were aligned to GRCh38.p13 with NGMLR v0.2.7 using ONT-tuned
+parameters, and a joint-calling strategy was used to call SVs at moderate
+coverage: per-sample discovery with
+<a href="https://github.com/tjiangHIT/cuteSV" target="_blank">cuteSV</a>
+v1.0.13, merging of breakpoints within 500 bp across individuals with
 <a href="https://github.com/fritzsedlazeck/SURVIVOR" target="_blank">SURVIVOR</a>
-(v1.0.6). The merged VCF was converted to bigBed format for display.
-Allele frequencies and per-sample support information were extracted from the
-INFO fields of the merged VCF. The study identified two notable variants:
-an ancestral deletion in GSDMD associated with bone density and kidney injury
-risk, and a modern human-specific variant in WWP2 influencing height, body
-composition, and facial features.
+v1.0.6, per-sample re-genotyping of the merged set with LRcaller v1.0, and
+a final BCFtools merge. SVs in centromeric, pericentromeric and gap regions
+were filtered out, yielding 111,288 high-quality SVs: 49,518 deletions,
+42,300 insertions, 13,503 duplications, 5,595 inversions and 372
+translocations.
+</p>
+<p>
+The site-only VCF released at
+<a href="https://www.biosino.org/node/analysis/detail/OEZ007028" target="_blank">
+OMIX accession OED00945268</a> (<tt>OED00945268_Han_945samples_SV.vcf.gz</tt>)
+was converted to BED for this track.
+</p>
+<p>
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The raw VCF data was obtained from the
 <a href="https://www.biosino.org/node/analysis/detail/OEZ007028" target="_blank">OMIX</a>
 repository (accession OED00945268) at the National Genomics Data Center (NGDC),
 China National Center for Bioinformation.
 </p>
 <p>
 The source VCF also encodes phased per-sample genotypes: the <tt>sampleList</tt>
 field on the detail page is derived from the SURVIVOR <tt>SUPP_VEC</tt> bitmask
 and is an ordered list of the 1-based indices of the 945 samples carrying
 each SV. The full per-sample phased VCF can be browsed as a separate track in
 the <a href="hgTrackUi?g=han945SvVcf">SVs from 945 Han Chinese</a> entry of
 the <a href="hgTrackUi?g=phasedVars">Phased Variants</a> track collection.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to Gong et al. for making their structural variant calls publicly available.
 </p>
 
 <h2>References</h2>
 
 <p>
 Gong J, Sun H, Wang K, Zhao Y, Huang Y, Chen Q, Qiao H, Gao Y, Zhao J, Ling Y <em>et al</em>.
 <a href="https://doi.org/10.1038/s41467-025-56661-9" target="_blank">
 Long-read sequencing of 945 Han individuals identifies structural variants associated with
 phenotypic diversity and disease susceptibility</a>.
 <em>Nat Commun</em>. 2025 Feb 10;16(1):1494.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39929826" target="_blank">39929826</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11811171/" target="_blank">PMC11811171</a>
 </p>