06a482a2120d4d85c7c34fb5038213e07f595554
max
  Tue Apr 21 15:00:21 2026 -0700
lrSv: add tommoJpCnv short-read CNV comparator (multiWig)

ToMMo 48KJPN-CNV Frequency Panel: copy-number variation frequencies
from short-read whole-genome sequencing of 48,874 Japanese individuals
(jMorp 20230828 release, GATK CNV germline workflow at 1 kb
resolution). Published as a companion short-read comparator to the
long-read tommoJpSv track.

Rendered as a multiWig container with two bigWig subtracks (transparent
overlay): tommoJpCnvLoss.bw counts samples at CN<2 per bin (red) and
tommoJpCnvGain.bw counts samples at CN>2 per bin (green). Values are
absolute carrier counts out of 48,874. The tracks cover 2,006,905 bins
with at least one CNV carrier; bins that are wholly CN=2 are omitted.
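
The per-bin counting described above can be sketched as follows. This is
an illustration only, not the code in lrSvTommoJpCnvVcfToBedGraph.py: the
input shape (one list of per-sample copy numbers per bin) and the names
carrier_counts and to_bedgraph_lines are hypothetical, and the sketch
assumes a diploid reference state (CN=2) everywhere, which would not hold
for sex chromosomes. It also assumes each bedGraph lists only bins with a
nonzero count.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical input record: chrom, 0-based start, end, per-sample copy numbers.
Bin = Tuple[str, int, int, List[int]]


def carrier_counts(bins: Iterable[Bin]) -> Iterator[Tuple[str, int, int, int, int]]:
    """Yield (chrom, start, end, loss, gain) for bins with at least one carrier."""
    for chrom, start, end, copy_numbers in bins:
        loss = sum(1 for cn in copy_numbers if cn < 2)  # samples at CN<2
        gain = sum(1 for cn in copy_numbers if cn > 2)  # samples at CN>2
        if loss == 0 and gain == 0:
            continue  # bin is wholly CN=2, so it is omitted
        yield chrom, start, end, loss, gain


def to_bedgraph_lines(bins: Iterable[Bin]) -> Tuple[List[str], List[str]]:
    """Return (loss_lines, gain_lines) as tab-separated bedGraph rows."""
    loss_lines: List[str] = []
    gain_lines: List[str] = []
    for chrom, start, end, loss, gain in carrier_counts(bins):
        # Assumption: a bin appears in a file only when its count is nonzero.
        if loss:
            loss_lines.append(f"{chrom}\t{start}\t{end}\t{loss}")
        if gain:
            gain_lines.append(f"{chrom}\t{start}\t{end}\t{gain}")
    return loss_lines, gain_lines
```

Each returned list maps to one subtrack: the loss rows would feed
tommoJpCnvLoss.bw and the gain rows tommoJpCnvGain.bw after bigWig
conversion.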

Files:
- trackDb/human/lrSv.ra: new tommoJpCnv multiWig container
- trackDb/human/tommoJpCnv.html: new doc page
- trackDb/human/lrSv.html: summary-table row + per-track blurb
- scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: VCF -> two bedGraphs
- doc/hg38/lrSv.txt: wget, converter invocation, bigWig build steps

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/lrSv.html src/hg/makeDb/trackDb/human/lrSv.html
index 1eec6373f17..ebaced9d96e 100644
--- src/hg/makeDb/trackDb/human/lrSv.html
+++ src/hg/makeDb/trackDb/human/lrSv.html
@@ -67,30 +67,37 @@
   <td>148,375</td>
   <td>2</td>
   <td>177</td>
   <td>49,171</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=tommoJpSv">ToMMo Japanese</a></td>
   <td>333 (111 trios)</td>
   <td>Japanese, general population</td>
   <td>ONT</td>
   <td>74,201</td>
   <td>51</td>
   <td>162</td>
   <td>99,980</td>
 </tr>
+<tr>
+  <td><a href="hgTrackUi?g=tommoJpCnv">ToMMo 48K CNV</a></td>
+  <td>48,874</td>
+  <td>Japanese, general population (<b>short-read comparator</b> for ToMMo long-read SVs)</td>
+  <td><b>Illumina short-read</b> (GATK CNV, 1 kb bins, shown as two bigWigs)</td>
+  <td colspan="4">~2M bins with CNV carriers; not comparable to the per-SV counts in other rows</td>
+</tr>
 <tr>
   <td><a href="hgTrackUi?g=aou1kSv">AoU 1K</a></td>
   <td>1,027</td>
   <td>All of Us, self-identified Black/African American</td>
   <td>PacBio HiFi</td>
   <td>541,049</td>
   <td>50</td>
   <td>152</td>
   <td>9,998</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=ga4kSv">GA4K</a></td>
   <td>502</td>
   <td>Children's Mercy, pediatric rare disease probands + families</td>
   <td>PacBio HiFi</td>
@@ -182,48 +189,61 @@
 
 <h3>Han 945 SVs (<a href="hgTrackUi?g=han945Sv">han945Sv</a>)</h3>
 <p>
 Structural variants from 945 Han Chinese individuals. 111,288 SVs
 (deletions, insertions, duplications, inversions, translocations) merged with SURVIVOR.
 Includes allele frequencies and per-sample support.
 </p>
 
 <h3>1KG ONT 100 SVs (<a href="hgTrackUi?g=gustafsonSv">gustafsonSv</a>)</h3>
 <p>
 Structural variants from Oxford Nanopore long-read sequencing of 100
 1000 Genomes samples (5 superpopulations, 19 subpopulations) released
 by the 1000 Genomes ONT Sequencing Consortium and described in
 Gustafson et al. 2024. 113,696 SVs (insertions, deletions, duplications,
 inversions) called with five callers and merged with Jasmine. This is a
-separate dataset from the Vienna 1KG-ONT release below.
+separate dataset from the Vienna 1KG-ONT release below; the 100 samples
+here do not overlap with the 1,019 samples in the Vienna release.
 </p>
 
 <h3>1KG ONT Vienna SVs (<a href="hgTrackUi?g=lrSv1kgOnt">lrSv1kgOnt</a>)</h3>
 <p>
 Structural variants from 1,019 individuals across 26 populations (1000 Genomes ONT).
 161,332 SVs annotated with SVAN, classifying insertions and deletions by mechanism
 of origin (mobile elements, VNTRs, processed pseudogenes, etc.).
 Original coordinates are on T2T-CHM13 (hs1); the hg38 version was created via liftOver.
+This is a separate dataset from the 1KG ONT 100 (Gustafson et al.) track above;
+the 1,019 samples here do not overlap with the 100 samples in that release.
 </p>
 
 <h3>ToMMo Japanese SVs (<a href="hgTrackUi?g=tommoJpSv">tommoJpSv</a>)</h3>
 <p>
 Structural variants from 333 Japanese individuals (111 trios) from the Tohoku Medical
 Megabank (ToMMo). 74,201 SVs (deletions and insertions) with trio-based Mendelian
 error rates and allele frequencies.
 </p>
 
+<h3>ToMMo 48K CNV SR (<a href="hgTrackUi?g=tommoJpCnv">tommoJpCnv</a>) - short-read comparator</h3>
+<p>
+<b>Short-read CNV comparator for the ToMMo long-read SV track above.</b>
+Copy-number carrier counts per 1 kb bin from short-read whole-genome
+sequencing of 48,874 Japanese individuals (jMorp 48KJPN-CNV Frequency
+Panel, release 20230828), called with GATK CNV germline workflows.
+Shown as a multiWig overlay: red = samples with copy-number loss
+(CN&lt;2) per bin, green = samples with gain (CN&gt;2) per bin.
+</p>
+
 <h3>AoU 1K SVs (<a href="hgTrackUi?g=aou1kSv">aou1kSv</a>)</h3>
 <p>
 Structural variants from 1,027 individuals from the All of Us (AoU) Research Program,
 sequenced with PacBio HiFi long reads. 541,049 SVs (insertions and deletions)
 with population-specific allele frequencies, gene annotations, and clinical
 trait associations.
 </p>
 
 <h3>GA4K SVs (<a href="hgTrackUi?g=ga4kSv">ga4kSv</a>)</h3>
 <p>
 Structural variants from 502 probands and family members enrolled in the
 Genomic Answers for Kids (GA4K) pediatric rare-disease program at Children's
 Mercy Research Institute, sequenced with PacBio HiFi long reads. 115,554
 replicated SVs (deletions, insertions, duplications, inversions) called with
 pbsv and merged with JASMINE. The matched GA4K small-variant callset (SNVs