06a482a2120d4d85c7c34fb5038213e07f595554
max
  Tue Apr 21 15:00:21 2026 -0700
lrSv: add tommoJpCnv short-read CNV comparator (multiWig)

ToMMo 48KJPN-CNV Frequency Panel: copy-number variation frequencies
from short-read whole-genome sequencing of 48,874 Japanese individuals
(jMorp 20230828 release, GATK CNV germline workflow at 1 kb
resolution). Published as a companion short-read comparator to the
long-read tommoJpSv track.

Rendered as a multiWig container with two bigWig subtracks (transparent
overlay): tommoJpCnvLoss.bw counts samples at CN<2 per bin (red) and
tommoJpCnvGain.bw counts samples at CN>2 per bin (green). Values are
absolute carrier counts out of 48,874. The tracks keep 2,006,905 bins
with at least one CNV carrier; bins that are wholly CN=2 are omitted.
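
The loss/gain collapse described above can be sketched roughly as follows.
This is a minimal illustration, not the code in
lrSvTommoJpCnvVcfToBedGraph.py: the function name and the input layout (a
list of per-bin sample counts indexed by copy number, CN0..CN5) are
assumptions made for the example.

```python
def collapse_cn_counts(sample_counts, ref_cn=2):
    """Collapse per-copy-number sample counts into (loss, gain) carrier counts.

    sample_counts[i] is the number of samples at copy number i for one bin
    (hypothetical layout: index 0 = CN0, ..., index 5 = CN5).
    ref_cn is the diploid reference state (CN=2 on autosomes).
    """
    loss = sum(sample_counts[:ref_cn])       # samples with CN < ref_cn
    gain = sum(sample_counts[ref_cn + 1:])   # samples with CN > ref_cn
    return loss, gain


# Example bin: 48,874 samples in total, 13 below CN=2 and 11 above it.
example_counts = [3, 10, 48850, 8, 2, 1]
loss, gain = collapse_cn_counts(example_counts)
print(loss, gain)
```

A bin for which both values are 0 (every sample at CN=2) would be
skipped rather than written to the bedGraphs.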

Files:
- trackDb/human/lrSv.ra: new tommoJpCnv multiWig container
- trackDb/human/tommoJpCnv.html: new doc page
- trackDb/human/lrSv.html: summary-table row + per-track blurb
- scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: VCF -> two bedGraphs
- doc/hg38/lrSv.txt: wget, converter invocation, bigWig build steps

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/tommoJpCnv.html src/hg/makeDb/trackDb/human/tommoJpCnv.html
new file mode 100644
index 00000000000..86cb84c1167
--- /dev/null
+++ src/hg/makeDb/trackDb/human/tommoJpCnv.html
@@ -0,0 +1,110 @@
+<h2>Description</h2>
+<p>
+<b>This track is a short-read CNV comparator to the long-read
+<a href="hgTrackUi?g=tommoJpSv">ToMMo Japanese SVs</a> track</b>. It shows
+copy number variation (CNV) frequency estimates from short-read
+whole-genome sequencing of 48,874 Japanese individuals from the Tohoku
+Medical Megabank Project (jMorp 48KJPN-CNV Frequency Panel, release
+20230828).
+</p>
+<p>
+The callset is binned at ~1 kb resolution. For each bin, the source
+VCF reports how many of the 48,874 samples are at each observed
+integer copy number (CN0 through CN5). In an autosomal region the
+diploid reference state is CN=2; CN&lt;2 indicates a copy-number loss
+and CN&gt;2 indicates a copy-number gain.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+This track is a multiWig container of two bigWig subtracks displayed as
+a two-color transparent overlay, showing, per 1 kb bin, the <b>absolute
+number of samples</b> (out of 48,874) carrying:
+<ul>
+<li><span style="color: rgb(200,0,0);">Loss (CN&lt;2)</span> - red</li>
+<li><span style="color: rgb(0,160,0);">Gain (CN&gt;2)</span> - green</li>
+</ul>
+Peaks in the overlay correspond to genomic regions where many samples
+show CNVs. Bins where every sample was at CN=2 (no CNV observed) are
+omitted from the tracks.
+</p>
+<p>
+The default y-axis runs from 0 to ~1,000 carriers with auto-scale
+enabled; the maximum supported value is 48,874 (every sample). Toggle
+<i>Show subtrack colors on UI</i> to switch the subtrack visibility
+individually.
+</p>
+
+<h2>Methods</h2>
+<p>
+The ToMMo 48KJPN-CNV Frequency Panel is derived from short-read WGS of
+48,874 Japanese individuals (blood buffy coat and saliva samples). Per the
+jMorp data provider, the analysis runs on CRAM files produced for the
+sibling 54KJPN-SNV/INDEL release. First, 200 samples per (sequencer,
+sequencing institution) combination are used to build a Panel of Normals with the
+<a href="https://gatk.broadinstitute.org/hc/en-us/articles/360035535892-GATK-CNV-germline-pipelines" target="_blank">
+GATK CNV Germline Cohort Workflow</a> on 1 kb intervals of the non-N
+reference. The full cohort is then processed in 200-sample batches with
+the matching Case Workflow. Per-sample amplification / loss counts are
+filtered by a 1.5&times;IQR outlier rule, and each surviving sample is tallied
+per 1 kb bin at each integer copy-number state (CN0..CN5). The resulting
+per-bin sample counts (SC) and frequencies (SF) are released as a VCF. For
+display here, the per-CN counts are collapsed into two per-bin values
+(samples with CN&lt;2, samples with CN&gt;2) and written as two bedGraphs
+/ bigWigs; bins where every sample was CN=2 are omitted. In total, 2,006,905
+bins with at least one carrier are kept across the genome.
+</p>
+<p>
+The source VCF <tt>tommo-jcnvv1-20230828-GRCh38.vcf.gz</tt> was downloaded
+from the
+<a href="https://jmorp.megabank.tohoku.ac.jp/downloads/tommo-jcnvv1-20230828" target="_blank">
+jMorp 48KJPN-CNV download page</a>.
+</p>
+<p>
+The step-by-step build commands (download, VCF-to-bedGraph conversion,
+bigWig build) are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
+</p>
+
+<h2>Data Access</h2>
+<p>
+The data can be explored interactively in table format with the
+<a href="../cgi-bin/hgTables">Table Browser</a> or the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed
+programmatically through our <a href="https://api.genome.ucsc.edu">API</a>,
+track=<i>tommoJpCnv</i>.
+</p>
+<p>
+The bigWigs are available from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
+download server</a> as <tt>tommoJpCnvLoss.bw</tt> and
+<tt>tommoJpCnvGain.bw</tt>. Example:
+<tt>bigWigAverageOverBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/tommoJpCnvLoss.bw regions.bed regions.tab</tt>
+or
+<tt>bigWigToWig http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/tommoJpCnvGain.bw -chrom=chr21 -start=0 -end=100000000 stdout</tt>.
+</p>
+<p>
+The original VCF is available from the
+<a href="https://jmorp.megabank.tohoku.ac.jp/downloads/tommo-jcnvv1-20230828" target="_blank">jMorp
+48KJPN-CNV download page</a>
+(<tt>tommo-jcnvv1-20230828-GRCh38.vcf.gz</tt>).
+</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to the Tohoku Medical Megabank Organization (ToMMo) and the
+jMorp team for releasing the 48KJPN-CNV Frequency Panel and its
+detailed methodology.
+</p>
+
+<h2>References</h2>
+<p>
+See the
+<a href="https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jcnvv1-20230828" target="_blank">jMorp
+48KJPN-CNV dataset page</a> for the official description. Earlier
+ToMMo CNV releases are described in Tadaka et al.; see the dataset page
+for the current citation list.
+</p>