bac95a147f49cd331052e597006e04b3deee40fc
max
  Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on
both the lrSv and srSv supertracks using the "CODE|CODE (Long name)"
filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)",
etc. Labels keep the short code up front so users can match what
hgTracks shows next to each feature.

Also sweep in the in-progress converter and autoSql (.as) file cleanups
under scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py
helpers, consistent insLen / svLen / AC column naming, tightened
field-description text) that had been accumulating as unstaged changes
in the working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/ga4kSv.html src/hg/makeDb/trackDb/human/ga4kSv.html
index 0a4b668a46e..1f6ef9d049e 100644
--- src/hg/makeDb/trackDb/human/ga4kSv.html
+++ src/hg/makeDb/trackDb/human/ga4kSv.html
@@ -23,47 +23,62 @@
 <li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 <li><span style="color: rgb(0,160,0);">Duplications (DUP)</span> - green</li>
 <li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
 </ul>
 </p>
 <p>
 Insertions are placed at the insertion site with a width of 1 bp; deletions,
 duplications and inversions span the affected interval. Filters are available
 for SV type, SV length, carrier-sample count and allele frequency. The detail
 page also shows the total number of samples genotyped at each site.
 </p>
 
 <h2>Methods</h2>
 <p>
-Samples were sequenced on PacBio Revio and Sequel II instruments with HiFi
-chemistry. Single-sample SV callsets were produced with pbsv and then merged
-across the cohort with JASMINE v1.1.4 (<tt>jasmine --output-genotypes</tt>),
-which clusters equivalent SVs across samples and writes a site-level multi-sample
-VCF.
+The Genomic Answers for Kids (GA4K) program at Children's Mercy Research
+Institute is a longitudinal pediatric rare-disease initiative described in
+Cohen et al. 2022. GA4K probands and their families are sequenced with
+PacBio HiFi long reads (Revio and Sequel II), and the 502-sample GA4K
+PacBio SV release (<tt>pb_joint_merged.sv.vcf.gz</tt>) is produced by
+running <a href="https://github.com/PacificBiosciences/pbsv" target="_blank">
+pbsv</a> per sample and merging with
+<a href="https://github.com/mkirsche/Jasmine" target="_blank">JASMINE</a>
+v1.1.4 (<tt>--output-genotypes</tt>). The merged site-level VCF is
+filtered to SVs replicated in at least two independent observations
+(either matching a second unrelated CMH individual in the same Jasmine
+cluster, or matching an SV in the deCODE Icelandic or HPRC callsets via
+<a href="https://github.com/PacificBiosciences/svpack" target="_blank">
+svpack match</a>). The released catalog contains 115,554 replicated SVs
+(52,564 deletions, 58,219 insertions, 4,408 duplications and 363
+inversions) with recomputed carrier counts (SVC), total sample counts
+(SVN) and allele frequencies (SVF = SVC/SVN).
 </p>
 <p>
-To reduce false positives, the merged VCF was filtered to retain only SVs that
-were replicated in at least two independent observations: either (1) matching a
-second SV from another unrelated Children's Mercy (CMH) individual within the
-same Jasmine cluster, or (2) matching an SV from the Decode Icelandic or Human
-Pangenome Reference Consortium (HPRC) callsets using
-<tt>svpack match</tt> with default settings.
+The source VCF was cloned from the Children's Mercy Research Institute
+GA4K GitHub repository,
+<a href="https://github.com/ChildrensMercyResearchInstitute/GA4K" target="_blank">
+github.com/ChildrensMercyResearchInstitute/GA4K</a>
+(<tt>pacbio_sv_vcf/pb_joint_merged.sv.vcf.gz</tt>).
 </p>
 <p>
-Carrier counts (SVC), total sample counts (SVN) and allele frequencies
-(SVF = SVC/SVN) were recomputed on the replicated callset.
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively in table format with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a> and exported from there
 to spreadsheet or tab-sep tables. From scripts, the data can be accessed
 through our <a href="https://api.genome.ucsc.edu">API</a>, track=<i>ga4kSv</i>.
 </p>
 <p>
 For automated download and analysis, the annotation is stored in a bigBed file
 that can be downloaded from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
 download server</a>. The file for this track is called <tt>ga4kSv.bb</tt>.