06a482a2120d4d85c7c34fb5038213e07f595554 max Tue Apr 21 15:00:21 2026 -0700 lrSv: add tommoJpCnv short-read CNV comparator (multiWig) ToMMo 48KJPN-CNV Frequency Panel: copy-number variation frequencies from short-read whole-genome sequencing of 48,874 Japanese individuals (jMorp 20230828 release, GATK CNV germline workflow at 1 kb resolution). Published as a companion short-read comparator to the long-read tommoJpSv track. Rendered as a multiWig container with two bigWig subtracks (transparent overlay): tommoJpCnvLoss.bw counts samples at CN<2 per bin (red) and tommoJpCnvGain.bw counts samples at CN>2 per bin (green). Values are absolute carrier counts out of 48,874. 2,006,905 bins with at least one CNV carrier; bins that are wholly CN=2 are omitted. Files: - trackDb/human/lrSv.ra: new tommoJpCnv multiWig container - trackDb/human/tommoJpCnv.html: new doc page - trackDb/human/lrSv.html: summary-table row + per-track blurb - scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: VCF -> two bedGraphs - doc/hg38/lrSv.txt: wget, converter invocation, bigWig build steps refs #36258 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> diff --git src/hg/makeDb/trackDb/human/tommoJpCnv.html src/hg/makeDb/trackDb/human/tommoJpCnv.html new file mode 100644 index 00000000000..86cb84c1167 --- /dev/null +++ src/hg/makeDb/trackDb/human/tommoJpCnv.html @@ -0,0 +1,110 @@ +<h2>Description</h2> +<p> +<b>This track is a short-read CNV comparator to the long-read +<a href="hgTrackUi?g=tommoJpSv">ToMMo Japanese SVs</a> track</b>. It shows +copy number variation (CNV) frequency estimates from short-read +whole-genome sequencing of 48,874 Japanese individuals from the Tohoku +Medical Megabank Project (jMorp 48KJPN-CNV Frequency Panel, release +20230828). +</p> +<p> +The callset is binned at ~1 kb resolution. For each bin, the source +VCF reports how many of the 48,874 samples are at each observed +integer copy number (CN0 through CN5). 
In an autosomal region the
+diploid reference state is CN=2; CN&lt;2 indicates a copy-number loss
+and CN&gt;2 indicates a copy-number gain.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+This track is a multiWig container of two bigWig subtracks displayed as
+a two-color transparent overlay, showing, per 1 kb bin, the <b>absolute
+number of samples</b> (out of 48,874) carrying:</p>
+<ul>
+<li><span style="color: rgb(200,0,0);">Loss (CN&lt;2)</span> - red</li>
+<li><span style="color: rgb(0,160,0);">Gain (CN&gt;2)</span> - green</li>
+</ul>
+<p>Peaks in the overlay mark genomic regions where many samples carry
+a CNV. Bins where every sample was at CN=2 (no CNV observed) are
+omitted from both subtracks.
+</p>
+<p>
+The default y-axis range is 0 to ~1,000 carriers, with auto-scale
+enabled; the maximum possible value is 48,874 (every sample). The loss
+and gain subtracks can be shown or hidden individually from the track
+settings page.
+</p>
+
+<h2>Methods</h2>
+<p>
+The ToMMo 48KJPN-CNV Frequency Panel is generated by short-read WGS of
+48,874 Japanese individuals (blood buffy coat and saliva samples). Per the
+jMorp data provider, the analysis starts from CRAM files produced for the
+sibling 54KJPN-SNV/INDEL release. For each (sequencer, sequencing
+institution) combination, 200 samples are used to build a Panel of Normals with the
+<a href="https://gatk.broadinstitute.org/hc/en-us/articles/360035535892-GATK-CNV-germline-pipelines" target="_blank">
+GATK CNV Germline Cohort Workflow</a> on 1 kb intervals of the non-N
+reference. The full cohort is then processed in 200-sample batches with the
+matching Case Workflow. Samples whose amplification or loss call counts are
+1.5×IQR outliers are excluded, and each remaining sample is tallied per 1 kb
+bin at each integer copy-number state (CN0..CN5). The resulting per-bin
+sample counts (SC) and frequencies (SF) are released as a VCF.
For
+display here, the per-CN counts are collapsed into two per-bin values
+(samples with CN&lt;2 and samples with CN&gt;2), written as two bedGraph
+files, and converted to bigWig. Only the 2,006,905 bins genome-wide with
+at least one CNV carrier are kept.
+</p>
+<p>
+The source VCF <tt>tommo-jcnvv1-20230828-GRCh38.vcf.gz</tt> was downloaded
+from the
+<a href="https://jmorp.megabank.tohoku.ac.jp/downloads/tommo-jcnvv1-20230828" target="_blank">
+jMorp 48KJPN-CNV download page</a>.
+</p>
+<p>
+The step-by-step build commands (download, VCF-to-bedGraph conversion,
+bigWig build) are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts are in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
+</p>
+
+<h2>Data Access</h2>
+<p>
+The data can be explored interactively in table format with the
+<a href="../cgi-bin/hgTables">Table Browser</a> or the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed
+programmatically through our <a href="https://api.genome.ucsc.edu">API</a>,
+track=<i>tommoJpCnv</i>.
+</p>
+<p>
+The bigWigs are available from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
+download server</a> as <tt>tommoJpCnvLoss.bw</tt> and
+<tt>tommoJpCnvGain.bw</tt>. For example, to average the loss counts over the named regions in a BED file:
+<tt>bigWigAverageOverBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/tommoJpCnvLoss.bw regions.bed regions.tab</tt>
+or, to extract the gain counts on chr21 as text:
+<tt>bigWigToWig http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/tommoJpCnvGain.bw -chrom=chr21 -start=0 -end=100000000 stdout</tt>.
+</p>
+<p>
+The original VCF is available from the
+<a href="https://jmorp.megabank.tohoku.ac.jp/downloads/tommo-jcnvv1-20230828" target="_blank">jMorp
+48KJPN-CNV download page</a>
+(<tt>tommo-jcnvv1-20230828-GRCh38.vcf.gz</tt>).
+</p> + +<h2>Credits</h2> +<p> +Thanks to the Tohoku Medical Megabank Organization (ToMMo) and the +jMorp team for releasing the 48KJPN-CNV Frequency Panel and its +detailed methodology. +</p> + +<h2>References</h2> +<p> +See the +<a href="https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jcnvv1-20230828" target="_blank">jMorp +48KJPN-CNV dataset page</a> for the official description. Earlier +ToMMo CNV releases are described in Tadaka et al.; see the dataset page +for the current citation list. +</p>
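The collapse step described in the Methods section (per-copy-number sample counts reduced to one loss value and one gain value per bin, with all-CN=2 bins dropped) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the contents of scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: it assumes each VCF record has already been parsed into a bin with 0-based, half-open coordinates and a mapping from integer copy number to sample count (the layout of the SC field is not reproduced here), and the names CnvBin, collapse_bin and to_bedgraphs are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical sketch: names and the parsed-bin input format are assumptions.
# Diploid reference state for autosomes; CN below this is a loss, above is a gain.
DIPLOID_CN = 2


@dataclass
class CnvBin:
    """One ~1 kb bin in 0-based, half-open (bedGraph) coordinates.

    cn_counts maps an integer copy number (e.g. 0..5) to the number of
    samples observed at that copy number in this bin. Extracting these
    counts from the VCF's SC field is assumed and not shown.
    """
    chrom: str
    start: int
    end: int
    cn_counts: Dict[int, int]


def collapse_bin(b: CnvBin) -> Tuple[int, int]:
    """Return (loss, gain): numbers of samples with CN<2 and with CN>2."""
    loss = sum(n for cn, n in b.cn_counts.items() if cn < DIPLOID_CN)
    gain = sum(n for cn, n in b.cn_counts.items() if cn > DIPLOID_CN)
    return loss, gain


def to_bedgraphs(bins: Iterable[CnvBin]) -> Tuple[List[str], List[str]]:
    """Build tab-separated loss and gain bedGraph lines.

    Bins with no CNV carrier (every sample at CN=2) are skipped; any other
    bin is written to both outputs, possibly with a zero value in one.
    """
    loss_lines: List[str] = []
    gain_lines: List[str] = []
    for b in bins:
        loss, gain = collapse_bin(b)
        if loss == 0 and gain == 0:
            continue  # wholly CN=2: omit from both tracks
        loss_lines.append(f"{b.chrom}\t{b.start}\t{b.end}\t{loss}")
        gain_lines.append(f"{b.chrom}\t{b.start}\t{b.end}\t{gain}")
    return loss_lines, gain_lines


if __name__ == "__main__":
    # Made-up counts for illustration; each bin sums to 48,874 samples.
    demo = [
        CnvBin("chr1", 10000, 11000, {1: 3, 2: 48860, 3: 11}),
        CnvBin("chr1", 11000, 12000, {2: 48874}),
        CnvBin("chr1", 12000, 13000, {0: 1, 2: 48873}),
    ]
    loss_bg, gain_bg = to_bedgraphs(demo)
    print("\n".join(loss_bg))
    print("\n".join(gain_bg))
```

In this sketch a bin with losses but no gains still gets a zero-valued line in the gain bedGraph, so both files cover the same set of bins; writing only non-zero values per file would be an equally valid choice, and the source does not say which the real converter uses. bedGraphToBigWig expects input sorted by chromosome and start (for example with `sort -k1,1 -k2,2n`), so the lines would need sorting before the bigWig build.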