bac95a147f49cd331052e597006e04b3deee40fc
max
  Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on
both the lrSv and srSv supertracks using the "CODE|CODE (Long name)"
filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)",
etc. Labels keep the short code up front so users can match what
hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under
scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py
helpers, consistent insLen / svLen / AC column naming, tightened
field-description text) that had been piling up as an unstaged
working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/chirmade101Sv.html src/hg/makeDb/trackDb/human/chirmade101Sv.html
index 3f78a6c24dc..1ffe94e15a3 100644
--- src/hg/makeDb/trackDb/human/chirmade101Sv.html
+++ src/hg/makeDb/trackDb/human/chirmade101Sv.html
@@ -28,42 +28,64 @@
 <li><span style="color: rgb(230,140,0);">Inversions (inv)</span> - orange</li>
 <li><span style="color: rgb(140,0,200);">Complex</span> - purple</li>
 </ul>
 </p>
 <p>
 Filters are available for SV type, SV length and the number of overlapping
 genes. The detail page shows the full annotation row: gene-level constraint
 scores (per overlapping gene), ClinGen / Decipher / ClinVar region matches,
 OMIM phenotype annotations and gnomAD SV frequencies at &gt;=90% reciprocal
 overlap. Because most genomic regions carry no clinical annotation, many
 columns will be blank for an arbitrary SV.
 </p>
 
 <h2>Methods</h2>
 <p>
-SVs were called from 101 long-read whole-genome sequencing samples and
-annotated as described in Chirmade et al. 2026. The annotation table used
-here (<tt>sv_annotations.tsv</tt>) is the companion data release for GWAS
-SVatalog, available from the Zenodo record linked below. Coordinates in
-the source TSV are 1-based closed and were converted to 0-based half-open
-BED for this track.
+Chirmade et al. 2026 called SVs from 101 whole-genome sequenced individuals
+enrolled in the CF Canada-SickKids Program in Individualized Therapy
+(CFIT), a predominantly European cohort of people with cystic fibrosis.
+Each sample was sequenced with both a long-read and a linked-read technology:
+PacBio continuous long reads on Sequel I (34 samples, 50x) or Sequel II
+(67 samples, 76x), and 10X Genomics linked reads on Illumina HiSeq X at
+~30x. SVs were called per sample with pbsv v2.2.2 (pbmm2 alignments) and
+Sniffles v1.0.11 (NGMLR alignments) on the PacBio CLR data, and with Long
+Ranger, CNVnator v0.4, ERDS v1.1 and Manta v1.6.0 on the 10XG data.
+Per-platform and cross-platform calls were merged in three steps using a
+50% reciprocal overlap rule (pbsv anchored, tagged by Sniffles on PacBio;
+Manta anchored, augmented by CNVnator, ERDS and Long Ranger deletions on
+10XG; then a cross-platform merge with PacBio coordinates preferred), and
+SV records present in fewer than three participants were dropped. The
+released catalog contains 87,183 SVs (42,435 deletions, 41,734 insertions,
+1,394 duplications, 912 inversions and 708 complex events); the
+pre-computed GWAS SVatalog LD analyses use a common-SV subset of 35,732
+sites against 116,870 GWAS-Catalog SNPs.
 </p>
 <p>
-Note that the SVatalog tool's pre-computed LD analyses use a common-SV
-subset (35,732 sites); the underlying long-read callset released in this
-TSV (87,183 SVs) is larger and includes rarer variants not used for LD
-visualisation.
+The annotation TSV <tt>sv_annotations.tsv</tt> was downloaded from the
+Zenodo companion record,
+<a href="https://zenodo.org/records/13367574" target="_blank">
+zenodo.org/records/13367574</a>. Coordinates in the TSV are 1-based closed
+and were converted to 0-based half-open BED for this track.
+</p>
+<p>
+The step-by-step build commands (download, coordinate shift, format
+conversion, bigBed build) are recorded in the UCSC makeDoc for this track
+container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively in table format with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed
 programmatically through our <a href="https://api.genome.ucsc.edu">API</a>,
 track=<i>chirmade101Sv</i>.
 </p>
 <p>
 The bigBed is available from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
 download server</a> as <tt>chirmade101.bb</tt>. Example:
 <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/chirmade101.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>.