06a482a2120d4d85c7c34fb5038213e07f595554
max
Tue Apr 21 15:00:21 2026 -0700
lrSv: add tommoJpCnv short-read CNV comparator (multiWig)
ToMMo 48KJPN-CNV Frequency Panel: copy-number variation frequencies
from short-read whole-genome sequencing of 48,874 Japanese individuals
(jMorp 20230828 release, GATK CNV germline workflow at 1 kb
resolution). Published as a companion short-read comparator to the
long-read tommoJpSv track.
Rendered as a multiWig container with two bigWig subtracks (transparent
overlay): tommoJpCnvLoss.bw counts samples at CN<2 per bin (red) and
tommoJpCnvGain.bw counts samples at CN>2 per bin (green). Values are
absolute carrier counts out of 48,874. 2,006,905 bins have at least one
CNV carrier; bins that are wholly CN=2 are omitted.
Files:
- trackDb/human/lrSv.ra: new tommoJpCnv multiWig container
- trackDb/human/tommoJpCnv.html: new doc page
- trackDb/human/lrSv.html: summary-table row + per-track blurb
- scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: VCF -> two bedGraphs
- doc/hg38/lrSv.txt: wget, converter invocation, bigWig build steps
refs #36258
Co-Authored-By: Claude Opus 4.7 (1M context)
+Description
+
+This track is a short-read CNV comparator to the long-read
+ToMMo Japanese SVs track. It shows
+copy number variation (CNV) frequency estimates from short-read
+whole-genome sequencing of 48,874 Japanese individuals from the Tohoku
+Medical Megabank Project (jMorp 48KJPN-CNV Frequency Panel, release
+20230828).
+
+The callset is binned at ~1 kb resolution. For each bin, the source
+VCF reports how many of the 48,874 samples are at each observed
+integer copy number (CN0 through CN5). In an autosomal region the
+diploid reference state is CN=2; CN<2 indicates a copy-number loss
+and CN>2 indicates a copy-number gain.
+
+Display Conventions and Configuration
+
+This track is a multiWig container of two bigWig subtracks displayed as a
+two-color transparent overlay, showing, per 1 kb bin, the absolute
+number of samples (out of 48,874) carrying:
+- a copy-number loss (CN<2), in red (tommoJpCnvLoss.bw)
+- a copy-number gain (CN>2), in green (tommoJpCnvGain.bw)
+
+Peaks in the overlay correspond to genomic regions where many samples
+show CNVs. Bins where every sample was at CN=2 (no CNV observed) are
+omitted from the tracks.
+
+The default y-axis runs from 0 to ~1,000 carriers with auto-scale
+enabled; the maximum supported value is 48,874 (every sample). Toggle
+Show subtrack colors on UI to switch the subtrack visibility
+individually.
+
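Because the subtrack values are absolute carrier counts, turning a bin's value into a carrier frequency is a single division by the cohort size. The following is a minimal Python sketch of that arithmetic; the function name `carrier_frequency` is illustrative and is not part of the track's scripts.

```python
COHORT_SIZE = 48_874  # samples in the 48KJPN-CNV panel, per the track description


def carrier_frequency(carrier_count: float, cohort_size: int = COHORT_SIZE) -> float:
    """Convert one bin's absolute carrier count into a fraction of the cohort.

    Hypothetical helper for illustration only.
    """
    if not 0 <= carrier_count <= cohort_size:
        raise ValueError(
            f"carrier count {carrier_count} is outside 0..{cohort_size}"
        )
    return carrier_count / cohort_size


if __name__ == "__main__":
    # A bin at the top of the default y-axis (~1,000 carriers).
    print(f"{carrier_frequency(1000):.4f}")
```

Loss and gain are stored in separate subtracks, so each value is converted on its own; the two frequencies for one bin sum to at most 1.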
+
+The ToMMo 48KJPN-CNV Frequency Panel is generated by short-read WGS of
+48,874 Japanese individuals (blood buffy coat and saliva samples). Per the
+jMorp data provider, the analysis runs on CRAM files produced for the
+sibling 54KJPN-SNV/INDEL release. For each (sequencer, sequencing
+institution) combination, 200 samples are used to build a Panel of Normals
+with the GATK CNV Germline Cohort Workflow on 1 kb intervals of the non-N
+reference. The full cohort is then processed in 200-sample batches with
+the matching Case Workflow. Per-sample amplification / loss counts are
+filtered by a 1.5×IQR outlier rule, and each surviving sample is tallied
+per 1 kb bin at each integer copy-number state (CN0..CN5). The resulting
+per-bin sample counts (SC) and frequencies (SF) are released as a VCF. For
+display here, the per-CN counts are collapsed into two per-bin values
+(samples with CN<2, samples with CN>2) and written as two bedGraphs
+/ bigWigs; bins where every sample was CN=2 are omitted. 2,006,905 bins
+with at least one carrier are kept across the genome.
+
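The collapse step described above can be sketched in Python as follows. This is not the code in lrSvTommoJpCnvVcfToBedGraph.py: the input shape (a dict mapping copy number to sample count), the function names, and the choice to write a zero-valued record when only one of the two directions has carriers are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple

REFERENCE_CN = 2  # diploid reference state for autosomal bins


def collapse_cn_counts(cn_counts: Dict[int, int]) -> Tuple[int, int]:
    """Return (loss, gain): samples with CN<2 and samples with CN>2."""
    loss = sum(n for cn, n in cn_counts.items() if cn < REFERENCE_CN)
    gain = sum(n for cn, n in cn_counts.items() if cn > REFERENCE_CN)
    return loss, gain


def bedgraph_line(chrom: str, start: int, end: int, value: int) -> str:
    """Format one bedGraph record (0-based start, half-open end)."""
    return f"{chrom}\t{start}\t{end}\t{value}"


def bin_to_bedgraph(
    chrom: str, start: int, end: int, cn_counts: Dict[int, int]
) -> Optional[Tuple[str, str]]:
    """Return (loss_line, gain_line), or None for a bin that is wholly CN=2.

    Assumption: a bin with carriers in only one direction still gets a
    zero-valued record in the other bedGraph.
    """
    loss, gain = collapse_cn_counts(cn_counts)
    if loss == 0 and gain == 0:
        return None  # no CNV carrier: the bin is omitted from both files
    return (
        bedgraph_line(chrom, start, end, loss),
        bedgraph_line(chrom, start, end, gain),
    )


if __name__ == "__main__":
    # Made-up per-CN counts for one bin, CN0 through CN5.
    example = {0: 1, 1: 4, 2: 48860, 3: 7, 4: 2, 5: 0}
    print(bin_to_bedgraph("chr1", 1000, 2000, example))
```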
+The source VCF tommo-jcnvv1-20230828-GRCh38.vcf.gz was downloaded
+from the jMorp 48KJPN-CNV download page.
+
+The step-by-step build commands (download, VCF-to-bedGraph conversion,
+bigWig build) are recorded in the UCSC makeDoc for this track container:
+doc/hg38/lrSv.txt. The conversion scripts live in
+makeDb/scripts/lrSv.
+
+
+The data can be explored interactively in table format with the
+Table Browser or the
+Data Integrator, and accessed
+programmatically through our API,
+track=tommoJpCnv.
+
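As a rough illustration of programmatic access, the sketch below only builds a request URL for the UCSC REST API `getData/track` endpoint and does not contact the server. The helper name is hypothetical, and whether the container name `tommoJpCnv` or an individual subtrack name is the right value for `track` is an assumption to check against the API documentation.

```python
from urllib.parse import quote

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track URL with semicolon-separated parameters.

    Hypothetical helper; it performs no network access.
    """
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", str(start)),
        ("end", str(end)),
    ]
    query = ";".join(f"{key}={quote(value, safe='')}" for key, value in params)
    return f"{API_BASE}?{query}"


if __name__ == "__main__":
    print(build_track_url("hg38", "tommoJpCnv", "chr21", 0, 100000))
```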
+The bigWigs are available from our
+download server as tommoJpCnvLoss.bw and
+tommoJpCnvGain.bw. Example:
+bigWigAverageOverBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/tommoJpCnvLoss.bw regions.bed regions.tab
+or
+bigWigToWig http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/tommoJpCnvGain.bw -chrom=chr21 -start=0 -end=100000000 stdout
+
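The regions.tab file written by bigWigAverageOverBed has six tab-separated columns: name, size, covered, sum, mean0, and mean. A minimal Python sketch for reading it is shown below; the class and function names are illustrative, and the sample row is made up.

```python
from typing import Dict, Iterable, NamedTuple


class RegionSummary(NamedTuple):
    name: str       # name field from the input BED
    size: int       # region size in bases
    covered: int    # bases covered by bigWig data
    total: float    # sum of values over covered bases
    mean0: float    # mean with uncovered bases counted as zero
    mean: float     # mean over covered bases only


def parse_average_over_bed(lines: Iterable[str]) -> Dict[str, RegionSummary]:
    """Parse bigWigAverageOverBed output lines into a dict keyed by region name."""
    summaries: Dict[str, RegionSummary] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        name, size, covered, total, mean0, mean = fields[:6]
        summaries[name] = RegionSummary(
            name, int(size), int(covered), float(total), float(mean0), float(mean)
        )
    return summaries


if __name__ == "__main__":
    sample = ["regionA\t3000\t2000\t150\t0.05\t0.075\n"]  # made-up row
    print(parse_average_over_bed(sample)["regionA"])
```

Since bins with no CNV carrier are omitted from the bigWigs, `mean0` (uncovered bases counted as zero) is usually the column that matches the intended reading of the data, while `mean` averages only over bins that have at least one carrier.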
+The original VCF is available from the
+jMorp 48KJPN-CNV download page
+(tommo-jcnvv1-20230828-GRCh38.vcf.gz).
+
+
+Thanks to the Tohoku Medical Megabank Organization (ToMMo) and the
+jMorp team for releasing the 48KJPN-CNV Frequency Panel and its
+detailed methodology.
+
+
+See the jMorp 48KJPN-CNV dataset page for the official description. Earlier
+ToMMo CNV releases are described in Tadaka et al.; see the dataset page
+for the current citation list.
+