06a482a2120d4d85c7c34fb5038213e07f595554 max Tue Apr 21 15:00:21 2026 -0700
lrSv: add tommoJpCnv short-read CNV comparator (multiWig)

ToMMo 48KJPN-CNV Frequency Panel: copy-number variation frequencies from
short-read whole-genome sequencing of 48,874 Japanese individuals (jMorp
20230828 release, GATK CNV germline workflow at 1 kb resolution).
Published as a companion short-read comparator to the long-read tommoJpSv
track.

Rendered as a multiWig container with two bigWig subtracks (transparent
overlay): tommoJpCnvLoss.bw counts samples at CN<2 per bin (red) and
tommoJpCnvGain.bw counts samples at CN>2 per bin (green). Values are
absolute carrier counts out of 48,874. 2,006,905 bins with at least one
CNV carrier; bins that are wholly CN=2 are omitted.

Files:
- trackDb/human/lrSv.ra: new tommoJpCnv multiWig container
- trackDb/human/tommoJpCnv.html: new doc page
- trackDb/human/lrSv.html: summary-table row + per-track blurb
- scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: VCF -> two bedGraphs
- doc/hg38/lrSv.txt: wget, converter invocation, bigWig build steps

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/tommoJpCnv.html src/hg/makeDb/trackDb/human/tommoJpCnv.html
new file mode 100644
index 00000000000..86cb84c1167
--- /dev/null
+++ src/hg/makeDb/trackDb/human/tommoJpCnv.html
@@ -0,0 +1,110 @@
+

Description

+

+This track is a short-read CNV comparator to the long-read ToMMo Japanese SVs track. It shows copy number variation (CNV) frequency estimates from short-read whole-genome sequencing of 48,874 Japanese individuals from the Tohoku Medical Megabank Project (jMorp 48KJPN-CNV Frequency Panel, release 20230828).

+

+The callset is binned at ~1 kb resolution. For each bin, the source VCF reports how many of the 48,874 samples are at each observed integer copy number (CN0 through CN5). In an autosomal region the diploid reference state is CN=2; CN<2 indicates a copy-number loss and CN>2 indicates a copy-number gain.
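+The following minimal illustration shows how one bin's counts split into loss and gain. It is not the lrSvTommoJpCnvVcfToBedGraph.py converter itself. It assumes the bin's counts have already been read into a mapping from integer copy number to sample count. The function name collapse_bin and the example counts are hypothetical.

```python
from typing import Mapping, Tuple

REFERENCE_CN = 2  # diploid reference state in autosomal regions

def collapse_bin(counts_by_cn: Mapping[int, int]) -> Tuple[int, int]:
    """Return (loss, gain) sample counts for one bin.

    counts_by_cn maps an integer copy number (0-5) to the number of samples
    observed at that copy number in the bin (hypothetical input shape).
    """
    loss = sum(n for cn, n in counts_by_cn.items() if cn < REFERENCE_CN)
    gain = sum(n for cn, n in counts_by_cn.items() if cn > REFERENCE_CN)
    return loss, gain

# Made-up bin: 3 samples at CN0, 40 at CN1, 48,800 at CN2, 25 at CN3, 6 at CN4
example = {0: 3, 1: 40, 2: 48800, 3: 25, 4: 6}
print(collapse_bin(example))  # (43, 31)
```

+A bin whose samples are all at CN=2 collapses to (0, 0), which is why such bins carry no information for either subtrack.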

+ +

Display Conventions and Configuration

+

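+This track is a multiWig composite of two bigWig subtracks displayed as a two-color transparent overlay. For each 1 kb bin, the subtracks show the absolute number of samples (out of 48,874) carrying a copy-number loss (CN<2; tommoJpCnvLoss.bw, red) or a copy-number gain (CN>2; tommoJpCnvGain.bw, green).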

+Peaks in the overlay correspond to genomic regions where many samples show CNVs. Bins where every sample was at CN=2 (no CNV observed) are omitted from the tracks.

+

+The y-axis range defaults to 0 to ~1,000 carriers with auto-scale enabled; the largest possible value is 48,874 (every sample). Toggle Show subtrack colors on UI to show or hide each subtrack individually.
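+Because the plotted values are absolute counts, converting one to a cohort frequency only requires dividing by 48,874. For example, the ~1,000-carrier default ceiling corresponds to roughly 2% of the cohort. The helper below is a hypothetical convenience function, not part of the track or its build scripts.

```python
TOTAL_SAMPLES = 48_874  # size of the 48KJPN-CNV cohort

def carrier_percent(carriers: int) -> float:
    """Convert an absolute carrier count from the track into a percentage of the cohort."""
    if not 0 <= carriers <= TOTAL_SAMPLES:
        raise ValueError(f"carrier count must be between 0 and {TOTAL_SAMPLES}")
    return 100.0 * carriers / TOTAL_SAMPLES

print(f"{carrier_percent(1_000):.2f}%")   # 2.05%
print(f"{carrier_percent(48_874):.2f}%")  # 100.00%
```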

+ +

Methods

+

+The ToMMo 48KJPN-CNV Frequency Panel was generated from short-read WGS of 48,874 Japanese individuals (blood buffy coat and saliva samples). According to the jMorp data provider, the analysis starts from the CRAM files produced for the sibling 54KJPN-SNV/INDEL release.

For each (sequencer, sequencing institution) combination, 200 samples are used to build a Panel of Normals with the GATK CNV Germline Cohort Workflow on 1 kb intervals of the non-N reference. The full cohort is then processed in 200-sample batches with the matching Case Workflow.

Per-sample amplification/loss counts are filtered with a 1.5×IQR outlier rule. Each surviving sample is tallied per 1 kb bin at each integer copy-number state (CN0 through CN5). The resulting per-bin sample counts (SC) and frequencies (SF) are released as a VCF.

For display here, the per-CN counts are collapsed into two per-bin values (samples with CN<2 and samples with CN>2). These are written as two bedGraphs and then converted to bigWigs. Bins where every sample was at CN=2 are omitted, leaving 2,006,905 bins with at least one carrier across the genome.
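+The 1.5×IQR outlier rule mentioned above can be illustrated with a small stand-in. It is not jMorp's code. It assumes a two-sided fence applied to hypothetical per-sample counts of CNV calls, with quartiles from the default method of Python's statistics.quantiles. The exact quartile definition and fence direction used by jMorp are not given on this page. The function name iqr_filter is made up.

```python
import statistics
from typing import Dict

def iqr_filter(per_sample_counts: Dict[str, int], k: float = 1.5) -> Dict[str, int]:
    """Keep samples whose count lies within [Q1 - k*IQR, Q3 + k*IQR].

    Assumptions: the fence is two-sided, and quartiles come from
    statistics.quantiles with its default 'exclusive' method.
    """
    q1, _median, q3 = statistics.quantiles(per_sample_counts.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {s: c for s, c in per_sample_counts.items() if low <= c <= high}

# Hypothetical per-sample CNV call counts (a real batch would hold 200 samples).
counts = {"s1": 10, "s2": 12, "s3": 11, "s4": 13, "s5": 9, "s6": 12, "s7": 80}
kept = iqr_filter(counts)
print(sorted(kept))  # ['s1', 's2', 's3', 's4', 's5', 's6']
```

+Here Q1 = 10 and Q3 = 13, so the fence is [5.5, 17.5] and sample s7, with 80 calls, is dropped.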

+

+The source VCF tommo-jcnvv1-20230828-GRCh38.vcf.gz was downloaded from the jMorp 48KJPN-CNV download page.

+

+The step-by-step build commands (download, VCF-to-bedGraph conversion, bigWig build) are recorded in the UCSC makeDoc for this track container: doc/hg38/lrSv.txt. The conversion scripts live in makeDb/scripts/lrSv.
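+The VCF-to-bedGraph step can be sketched roughly as follows. The record shape is hypothetical, since the jMorp VCF layout is not described on this page. The name records_to_bedgraphs is illustrative, not taken from the actual script. The sketch shows two details any such conversion has to handle: VCF positions are 1-based while bedGraph starts are 0-based, and bedGraphToBigWig expects sorted input.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical pre-parsed bin: (chrom, VCF POS, END, {copy_number: sample_count}).
# POS and END are assumed 1-based and inclusive, as in VCF; extracting them
# from the real jMorp VCF is not shown here.
Record = Tuple[str, int, int, Dict[int, int]]

def _sort_key(line: str) -> Tuple[str, int]:
    chrom, start, _rest = line.split("\t", 2)
    return chrom, int(start)

def records_to_bedgraphs(records: Iterable[Record]) -> Tuple[List[str], List[str]]:
    """Return (loss_lines, gain_lines) as sorted 4-column bedGraph lines."""
    loss_lines: List[str] = []
    gain_lines: List[str] = []
    for chrom, pos, end, counts in records:
        loss = sum(n for cn, n in counts.items() if cn < 2)
        gain = sum(n for cn, n in counts.items() if cn > 2)
        if loss == 0 and gain == 0:
            continue  # every sample at CN=2: bin omitted from both tracks
        start = pos - 1  # 1-based VCF POS -> 0-based, half-open bedGraph start
        # Assumption: zero-valued entries are also dropped within each track.
        if loss:
            loss_lines.append(f"{chrom}\t{start}\t{end}\t{loss}")
        if gain:
            gain_lines.append(f"{chrom}\t{start}\t{end}\t{gain}")
    # bedGraphToBigWig needs input sorted by chrom, then start (cf. LC_ALL=C sort -k1,1 -k2,2n).
    return sorted(loss_lines, key=_sort_key), sorted(gain_lines, key=_sort_key)

records = [
    ("chr2", 1001, 2000, {1: 5, 2: 48869}),
    ("chr1", 2001, 3000, {2: 48874}),
    ("chr1", 1, 1000, {0: 1, 2: 48870, 3: 3}),
]
loss, gain = records_to_bedgraphs(records)
print(loss)  # ['chr1\t0\t1000\t1', 'chr2\t1000\t2000\t5']
print(gain)  # ['chr1\t0\t1000\t3']
```

+Each list could then be written to a file, one entry per line. A command such as bedGraphToBigWig loss.bedGraph hg38.chrom.sizes tommoJpCnvLoss.bw would then build the bigWig. The file names here are illustrative.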

+ +

Data Access

+

+The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API using track=tommoJpCnv.
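+As an illustration of programmatic access, the sketch below builds a request URL for the UCSC REST API getData/track endpoint. The region and the helper name track_query_url are arbitrary. This page does not say whether the API returns data for the multiWig container itself or requires a subtrack name, so treat the track value as an assumption.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu/getData/track"

def track_query_url(chrom: str, start: int, end: int,
                    track: str = "tommoJpCnv", genome: str = "hg38") -> str:
    """Build a getData/track request URL for one region (0-based start)."""
    params = {"genome": genome, "track": track, "chrom": chrom,
              "start": start, "end": end}
    return f"{API_BASE}?{urlencode(params)}"

url = track_query_url("chr21", 10_000_000, 10_010_000)
print(url)
```

+The resulting URL can be fetched with urllib.request.urlopen and decoded with json.load.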

+

+The bigWigs are available from our download server as tommoJpCnvLoss.bw and tommoJpCnvGain.bw. For example:
+bigWigAverageOverBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/tommoJpCnvLoss.bw regions.bed regions.tab
+or
+bigWigToWig http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/tommoJpCnvGain.bw -chrom=chr21 -start=0 -end=100000000 stdout
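+The output of the bigWigToWig command above can be post-processed in a few lines. This sketch assumes bigWigToWig writes the data back as bedGraph sections: a '#bedGraph section' header followed by chrom/start/end/value lines. That layout is expected for bigWigs built from bedGraphs. The function names and example values are made up.

```python
from typing import Iterable, List, Tuple

Bin = Tuple[str, int, int, float]

def parse_bedgraph_sections(lines: Iterable[str]) -> List[Bin]:
    """Parse 4-column bedGraph lines, skipping blank lines and '#' header lines."""
    bins: List[Bin] = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, start, end, value = line.split()
        bins.append((chrom, int(start), int(end), float(value)))
    return bins

def top_bins(bins: List[Bin], k: int = 3) -> List[Bin]:
    """Return the k bins with the highest carrier counts."""
    return sorted(bins, key=lambda b: b[3], reverse=True)[:k]

# Made-up output in the assumed bigWigToWig bedGraph-section layout.
example = """#bedGraph section chr21:0-3000
chr21\t0\t1000\t4
chr21\t1000\t2000\t120
chr21\t2000\t3000\t37
"""
bins = parse_bedgraph_sections(example.splitlines())
print(top_bins(bins, k=2))  # [('chr21', 1000, 2000, 120.0), ('chr21', 2000, 3000, 37.0)]
```

+In practice, parse_bedgraph_sections(sys.stdin) could read the output of bigWigToWig ... stdout piped into the script.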

+

+The original VCF is available from the jMorp 48KJPN-CNV download page (tommo-jcnvv1-20230828-GRCh38.vcf.gz).

+ +

Credits

+

+Thanks to the Tohoku Medical Megabank Organization (ToMMo) and the jMorp team for releasing the 48KJPN-CNV Frequency Panel and its detailed methodology.

+ +

References

+

+See the jMorp 48KJPN-CNV dataset page for the official description. Earlier ToMMo CNV releases are described in Tadaka et al.; see the dataset page for the current citation list.