bac95a147f49cd331052e597006e04b3deee40fc
max
  Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on
both the lrSv and srSv supertracks using the "CODE|CODE (Long name)"
filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)",
etc. Labels keep the short code up front so users can match what
hgTracks shows next to each feature.
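
For illustration, a minimal Python sketch of how a label string in this
form could be assembled. The SV_TYPE_NAMES table, its DUP and INV entries,
the filter_values_line helper and the "filterValues.svType" setting name
are assumptions made for this example; they are not copied from the
trackDb files touched by this commit.

```python
# Sketch only: the mapping and the setting name below are assumptions for
# illustration, not the actual trackDb contents of this commit.
SV_TYPE_NAMES = {
    "DEL": "Deletion",
    "INS": "Insertion",
    "DUP": "Duplication",   # assumed entry
    "INV": "Inversion",     # assumed entry
}


def filter_values_line(field, names):
    """Build a filterValues line in the "CODE|CODE (Long name)" form.

    Each entry keeps the short code before the "|" as the stored value and
    repeats it at the start of the label shown to the user.
    """
    entries = [f"{code}|{code} ({long_name})" for code, long_name in names.items()]
    return f"filterValues.{field} " + ",".join(entries)


if __name__ == "__main__":
    print(filter_values_line("svType", SV_TYPE_NAMES))
```

Keeping the code on both sides of the "|" means the value matched against
the data is unchanged while the label gains the long name.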

Also sweep in the in-progress converter and .as-file cleanups under
scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py
helpers, consistent insLen / svLen / AC column naming, tightened
field-description text) that had been piling up unstaged in the
working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/kwanhoSv.html src/hg/makeDb/trackDb/human/kwanhoSv.html
index 4c641c20175..76c2050aabd 100644
--- src/hg/makeDb/trackDb/human/kwanhoSv.html
+++ src/hg/makeDb/trackDb/human/kwanhoSv.html
@@ -47,41 +47,66 @@
 which cohorts include at least one carrier.</li>
 <li><b>Carrier rates</b>: fraction of cases (PD+ILBD) and controls (HC)
 carrying the variant, and the case-minus-control differential.</li>
 <li><b>Per-cohort AF / AC / AN</b>: alternate allele frequency, alternate
 allele count, and total called alleles in PD, HC and ILBD samples.</li>
 <li><b>Carrier lists</b>: sample IDs carrying the variant in each cohort.</li>
 <li><b>Nearby SNP context</b>: number of SNPs nearby and the number in
 linkage disequilibrium with the SV (from the paper's LD analyses).</li>
 <li><b>Read support</b>: average mapping quality and average supporting
 reads per sample at the variant site.</li>
 </ul>
 </p>
 
 <h2>Methods</h2>
 <p>
-Long-read whole-genome sequencing was performed on 100 post-mortem brain
-samples (35 PD, 31 ILBD, 34 HC) with PacBio HiFi chemistry. Per-sample SV
-calls from multiple callers were merged into a joint callset; the
-high-confidence filtered catalog released in Supplementary Table 13
-(<tt>media-13.txt</tt>) of the Kim et al. 2026 preprint is used directly
-here. Per-cohort allele frequencies, Hardy-Weinberg statistics and case /
-control carrier rates are reported in the source table; the track exposes
-the allele counts and the case-control differential as filterable fields.
-The paper also integrates single-nucleus RNA-seq from two brain regions
-of the same donors to test SV-expression associations in specific cell
-types, but that layer is not shown in this track.
+Kim et al. 2026 performed PacBio HiFi long-read whole-genome sequencing on
+100 post-mortem cerebellum samples from the Arizona Study of Aging and
+Neurodegenerative Disorders / Brain and Body Donation Program cohort
+(35 Parkinson's disease, 31 incidental Lewy body disease, 34 healthy
+controls). gDNA was isolated with either the Qiagen DNeasy or PacBio
+Nanobind PanDNA kit, sheared on a Megaruptor 3 to 10-23.5 kb, built into
+SMRTbell libraries (Prep Kit 3.0) and sequenced on PacBio Revio (25M
+SMRT cells, 2-h pre-extension, 24-h movies) to ~17x per-sample coverage.
+Reads were processed with the Broad long-read WDL pipelines (CCS v6.2.0,
+pbmm2 v1.4.0 aligned to GRCh38, SAMtools v1.13 merge/sort) and an
+ensemble of three callers was run per sample: Sniffles2 v2.0.6,
+<a href="https://github.com/PacificBiosciences/pbsv" target="_blank">
+PBSV</a> v2.9.0 (with GRCh38 tandem-repeat context) and Cue2 v2.0.0
+(deep-learning image-based long-read caller). Per-caller VCFs were
+filtered to PASS calls of &ge;40 bp, split by SV type with BCFtools, and
+merged by type across the 100 individuals and across the three callers
+with <a href="https://github.com/fritzsedlazeck/SURVIVOR" target="_blank">
+SURVIVOR</a> v1.0.7 (1 kb distance, strand-match, min 50 bp). Centromere,
+reference-gap, segmental-duplication and sex-chromosome SVs were excluded.
+The high-confidence catalog contains 74,552 SVs (34,056 insertions,
+29,545 deletions, 9,707 duplications and 1,244 inversions) released in
+Supplementary Table 13 (<tt>media-13.txt</tt>), with per-cohort AF / AC /
+AN, Hardy-Weinberg statistics and case/control carrier differentials.
+</p>
+<p>
+The supplementary table <tt>media-13.txt</tt> was downloaded from the Kim
+et al. 2026 bioRxiv preprint (<a href="https://doi.org/10.64898/2026.03.20.713192" target="_blank">
+doi:10.64898/2026.03.20.713192</a>).
+</p>
+<p>
+The step-by-step build commands (download, TSV parsing, bigBed build) are
+recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively in table format with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed
 programmatically through our <a href="https://api.genome.ucsc.edu">API</a>,
 track=<i>kwanhoSv</i>.
 </p>
 <p>
 The bigBed is available from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
 download server</a> as <tt>kwanho.bb</tt>. Example:
 <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/kwanho.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>.