06a482a2120d4d85c7c34fb5038213e07f595554
max
  Tue Apr 21 15:00:21 2026 -0700
lrSv: add tommoJpCnv short-read CNV comparator (multiWig)

ToMMo 48KJPN-CNV Frequency Panel: copy-number variation frequencies
from short-read whole-genome sequencing of 48,874 Japanese individuals
(jMorp 20230828 release, GATK CNV germline workflow at 1 kb
resolution). Added as a companion short-read comparator to the
long-read tommoJpSv track.

Rendered as a multiWig container with two bigWig subtracks (transparent
overlay): tommoJpCnvLoss.bw counts samples at CN<2 per bin (red) and
tommoJpCnvGain.bw counts samples at CN>2 per bin (green). Values are
absolute carrier counts out of 48,874 samples. The converter keeps
2,006,905 bins with at least one CNV carrier; bins where every sample
is CN=2 are omitted.
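
A minimal Python sketch of the per-bin loss/gain collapse follows. It
is not the committed lrSvTommoJpCnvVcfToBedGraph.py. It assumes, without
having checked the jMorp VCF, that ALT holds symbolic alleles named
<CN0>..<CN5>, that INFO SC lists one comma-separated sample count per
ALT, and that INFO END gives the inclusive 1-based bin end. The function
names (parse_info, collapse_counts, vcf_line_to_bedgraph, convert) are
hypothetical.

```python
import gzip
import re
import sys

# Hypothetical ALT naming (assumption): symbolic alleles such as <CN0>, <CN3>.
CN_ALLELE = re.compile(r"^<CN(\d+)>$")


def parse_info(info):
    """Split a VCF INFO string 'K=V;K=V;FLAG' into a dict (flags -> True)."""
    out = {}
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        out[key] = value if sep else True
    return out


def collapse_counts(alts, sample_counts):
    """Sum per-ALT sample counts into (loss, gain) for one bin.

    loss = samples at CN < 2, gain = samples at CN > 2; CN=2 and
    alleles that are not <CNn> contribute nothing.
    """
    loss = gain = 0
    for alt, count in zip(alts, sample_counts):
        match = CN_ALLELE.match(alt)
        if match is None:
            continue
        cn = int(match.group(1))
        if cn < 2:
            loss += count
        elif cn > 2:
            gain += count
    return loss, gain


def vcf_line_to_bedgraph(line):
    """Return (chrom, start, end, loss, gain), or None if the bin has no carrier.

    Assumes 1-based POS and an inclusive INFO END, giving the
    0-based half-open bedGraph interval [POS-1, END).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, alt = fields[0], int(fields[1]), fields[4]
    info = parse_info(fields[7])
    sc = info.get("SC")
    if not isinstance(sc, str):
        return None  # no per-ALT sample counts on this record
    counts = [int(x) if x != "." else 0 for x in sc.split(",")]
    loss, gain = collapse_counts(alt.split(","), counts)
    if loss == 0 and gain == 0:
        return None  # every sample is CN=2 in this bin
    return chrom, pos - 1, int(info["END"]), loss, gain


def convert(vcf_path, loss_path, gain_path):
    """Write unsorted loss and gain bedGraphs from a site-only VCF."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as vcf, \
            open(loss_path, "w") as loss_out, \
            open(gain_path, "w") as gain_out:
        for line in vcf:
            if line.startswith("#"):
                continue  # skip VCF header lines
            rec = vcf_line_to_bedgraph(line)
            if rec is None:
                continue
            chrom, start, end, loss, gain = rec
            # Only nonzero values are written to each file.
            if loss:
                loss_out.write(f"{chrom}\t{start}\t{end}\t{loss}\n")
            if gain:
                gain_out.write(f"{chrom}\t{start}\t{end}\t{gain}\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    convert(sys.argv[1], sys.argv[2], sys.argv[3])
```

The sketch takes the same three arguments as the committed converter
(VCF, loss bedGraph, gain bedGraph). Its output is unsorted, so it
would still need the sort -k1,1 -k2,2n step before bedGraphToBigWig.
Writing only nonzero values per file is a choice made here; the
committed script may behave differently.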

Files:
- trackDb/human/lrSv.ra: new tommoJpCnv multiWig container
- trackDb/human/tommoJpCnv.html: new doc page
- trackDb/human/lrSv.html: summary-table row + per-track blurb
- scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: VCF -> two bedGraphs
- doc/hg38/lrSv.txt: wget, converter invocation, bigWig build steps

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/doc/hg38/lrSv.txt src/hg/makeDb/doc/hg38/lrSv.txt
index 918b750b12e..8f373210431 100644
--- src/hg/makeDb/doc/hg38/lrSv.txt
+++ src/hg/makeDb/doc/hg38/lrSv.txt
@@ -98,30 +98,56 @@
 # VCF downloaded from jMorp:
 # https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jsv1-20211208-af
 # File: tommo-JSV1-20211208-GRCh38-without-genotype-count.vcf.gz
 # 74,201 SVs: 37,981 DEL, 36,220 INS
 # Site-only VCF, merged with SURVIVOR v1.0.6
 # Native GRCh38 coordinates (confirmed via contig headers)
 # Trio-based: 111 families, includes Mendelian error rates
 
 # Convert VCF to BED and build bigBed
 python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvTommoJpVcfToBed.py \
     tommo-JSV1-20211208-GRCh38-without-genotype-count.vcf.gz tommoJp.bed
 bedSort tommoJp.bed tommoJp.sorted.bed
 bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSvTommoJp.as \
     -tab tommoJp.sorted.bed /hive/data/genomes/hg38/chrom.sizes tommoJp.bb
 
+##########
+# 2026-04-21 Claude max
+
+# ToMMo 48KJPN CNV Frequency Panel - short-read CNV comparator to the
+# long-read tommoJpSv track above. 48,874 Japanese individuals,
+# short-read WGS, GATK CNV germline workflow at 1 kb bin resolution.
+# Data page: https://jmorp.megabank.tohoku.ac.jp/downloads/tommo-jcnvv1-20230828
+
+mkdir -p /hive/data/genomes/hg38/bed/lrSv/tommoJpCnv
+cd /hive/data/genomes/hg38/bed/lrSv/tommoJpCnv
+wget https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jcnvv1-20230828/files/tommo-jcnvv1-20230828-GRCh38.vcf.gz
+
+# The VCF has one record per 1 kb non-N bin with per-ALT sample counts
+# (SC) for each observed CN state (CN0..CN5). The converter collapses
+# the per-CN counts into two per-bin values (samples with CN<2, samples
+# with CN>2) and writes two bedGraphs, skipping bins with no CNV
+# carrier. Displayed as a multiWig transparent overlay via trackDb
+# (loss red / gain green) so CNV carrier-count density is visible at
+# any zoom level. 2,006,905 bins kept.
+python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py \
+    tommo-jcnvv1-20230828-GRCh38.vcf.gz tommoJpCnvLoss.bg tommoJpCnvGain.bg
+sort -k1,1 -k2,2n tommoJpCnvLoss.bg > tommoJpCnvLoss.sorted.bg
+sort -k1,1 -k2,2n tommoJpCnvGain.bg > tommoJpCnvGain.sorted.bg
+bedGraphToBigWig tommoJpCnvLoss.sorted.bg /hive/data/genomes/hg38/chrom.sizes tommoJpCnvLoss.bw
+bedGraphToBigWig tommoJpCnvGain.sorted.bg /hive/data/genomes/hg38/chrom.sizes tommoJpCnvGain.bw
+
 ##########
 # 2026-03-26 Claude max
 
 # Fourth subtrack: AoU 1K - SVs from 1,027 AoU individuals (PacBio HiFi)
 # Paper: Garimella et al. 2025, medRxiv, PMID 41256123
 # Data: Supplementary media-2 from preprint
 
 mkdir -p /hive/data/genomes/hg38/bed/lrSv/aou1k
 cd /hive/data/genomes/hg38/bed/lrSv/aou1k
 
 # Downloaded supplementary CSV from preprint (media-2.gz)
 # 541,049 SVs: 444,524 INS, 96,525 DEL (autosomes only)
 # Population-specific AFs (AFR, AMR, EAS, EUR, SAS)
 # Gene annotations (OMIM, disease, cancer, ACMG), regulatory elements
 # eQTL, GWAS, and SV-trait associations