bac95a147f49cd331052e597006e04b3deee40fc
max
  Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on
both the lrSv and srSv supertracks using the "CODE|CODE (Long name)"
filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)",
etc. Labels keep the short code up front so users can match what
hgTracks shows next to each feature.

Also sweep in the in-progress converter and autoSql (.as) file cleanups
under scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py
helpers, consistent insLen / svLen / AC column naming, tightened
field-description text) that had been accumulating as unstaged changes
in the working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/han945Sv.html src/hg/makeDb/trackDb/human/han945Sv.html
index 2742da33cbe..a719ff0018a 100644
--- src/hg/makeDb/trackDb/human/han945Sv.html
+++ src/hg/makeDb/trackDb/human/han945Sv.html
@@ -14,39 +14,57 @@
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 <li><span style="color: rgb(0,160,0);">Duplications (DUP)</span> - green</li>
 <li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
 <li><span style="color: rgb(140,0,200);">Translocations (TRA)</span> - purple</li>
 </ul>
 </p>
 <p>
 Filters are available for SV type, SV length, allele frequency, and number of
 supporting samples. For insertions, the item is placed at the insertion site
 with a width of 1 bp. For translocations, only the first breakpoint is shown;
 the second breakpoint chromosome and position are listed in the item details.
 </p>
 
 <h2>Methods</h2>
 <p>
-Long-read sequencing was performed on 945 Han Chinese individuals.
-Structural variants were called per sample and then merged across all samples using
+Gong et al. 2025 performed Oxford Nanopore long-read sequencing of 945
+Han Chinese individuals on PromethION instruments with R9.4 flow cells.
+Reads were aligned to GRCh38.p13 with NGMLR v0.2.7 using ONT-tuned
+parameters, and a joint-calling strategy was used to call SVs at moderate
+coverage: per-sample discovery with
+<a href="https://github.com/tjiangHIT/cuteSV" target="_blank">cuteSV</a>
+v1.0.13, merging of breakpoints within 500 bp across individuals with
 <a href="https://github.com/fritzsedlazeck/SURVIVOR" target="_blank">SURVIVOR</a>
-(v1.0.6). The merged VCF was converted to bigBed format for display.
-Allele frequencies and per-sample support information were extracted from the
-INFO fields of the merged VCF. The study identified two notable variants:
-an ancestral deletion in GSDMD associated with bone density and kidney injury
-risk, and a modern human-specific variant in WWP2 influencing height, body
-composition, and facial features.
+v1.0.6, per-sample re-genotyping of the merged set with LRcaller v1.0, and
+a final BCFtools merge. SVs in centromeric, pericentromeric and gap regions
+were filtered out, yielding 111,288 high-quality SVs: 49,518 deletions,
+42,300 insertions, 13,503 duplications, 5,595 inversions and 372
+translocations.
+</p>
+<p>
+The site-only VCF released at
+<a href="https://www.biosino.org/node/analysis/detail/OEZ007028" target="_blank">
+OMIX accession OED00945268</a> (<tt>OED00945268_Han_945samples_SV.vcf.gz</tt>)
+was converted to BED for this track.
+</p>
+<p>
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The raw VCF data was obtained from the
 <a href="https://www.biosino.org/node/analysis/detail/OEZ007028" target="_blank">OMIX</a>
 repository (accession OED00945268) at the National Genomics Data Center (NGDC),
 China National Center for Bioinformation.
 </p>
 <p>
 The source VCF also encodes phased per-sample genotypes: the <tt>sampleList</tt>
 field on the detail page is derived from the SURVIVOR <tt>SUPP_VEC</tt> bitmask
 and is an ordered list of the 1-based indices of the 945 samples carrying
 each SV. The full per-sample phased VCF can be browsed as a separate track in
 the <a href="hgTrackUi?g=han945SvVcf">SVs from 945 Han Chinese</a> entry of