06a482a2120d4d85c7c34fb5038213e07f595554 max Tue Apr 21 15:00:21 2026 -0700

lrSv: add tommoJpCnv short-read CNV comparator (multiWig)

ToMMo 48KJPN-CNV Frequency Panel: copy-number variation frequencies from
short-read whole-genome sequencing of 48,874 Japanese individuals (jMorp
20230828 release, GATK CNV germline workflow at 1 kb resolution). Published
as a companion short-read comparator to the long-read tommoJpSv track.

Rendered as a multiWig container with two bigWig subtracks (transparent
overlay): tommoJpCnvLoss.bw counts samples at CN<2 per bin (red) and
tommoJpCnvGain.bw counts samples at CN>2 per bin (green). Values are
absolute carrier counts out of 48,874. 2,006,905 bins with at least one
CNV carrier; bins that are wholly CN=2 are omitted.

Files:
- trackDb/human/lrSv.ra: new tommoJpCnv multiWig container
- trackDb/human/tommoJpCnv.html: new doc page
- trackDb/human/lrSv.html: summary-table row + per-track blurb
- scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: VCF -> two bedGraphs
- doc/hg38/lrSv.txt: wget, converter invocation, bigWig build steps

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/doc/hg38/lrSv.txt src/hg/makeDb/doc/hg38/lrSv.txt
index 918b750b12e..8f373210431 100644
--- src/hg/makeDb/doc/hg38/lrSv.txt
+++ src/hg/makeDb/doc/hg38/lrSv.txt
@@ -98,30 +98,56 @@
 # VCF downloaded from jMorp:
 # https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jsv1-20211208-af
 # File: tommo-JSV1-20211208-GRCh38-without-genotype-count.vcf.gz
 # 74,201 SVs: 37,981 DEL, 36,220 INS
 # Site-only VCF, merged with SURVIVOR v1.0.6
 # Native GRCh38 coordinates (confirmed via contig headers)
 # Trio-based: 111 families, includes Mendelian error rates
 
 # Convert VCF to BED and build bigBed
 python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvTommoJpVcfToBed.py \
     tommo-JSV1-20211208-GRCh38-without-genotype-count.vcf.gz tommoJp.bed
 bedSort tommoJp.bed tommoJp.sorted.bed
 bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSvTommoJp.as \
     -tab tommoJp.sorted.bed /hive/data/genomes/hg38/chrom.sizes tommoJp.bb
 
+##########
+# 2026-04-21 Claude max
+
+# ToMMo 48KJPN CNV Frequency Panel - short-read CNV comparator to the
+# long-read tommoJpSv track above. 48,874 Japanese individuals,
+# short-read WGS, GATK CNV germline workflow at 1 kb bin resolution.
+# Data page: https://jmorp.megabank.tohoku.ac.jp/downloads/tommo-jcnvv1-20230828
+
+mkdir -p /hive/data/genomes/hg38/bed/lrSv/tommoJpCnv
+cd /hive/data/genomes/hg38/bed/lrSv/tommoJpCnv
+wget https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jcnvv1-20230828/files/tommo-jcnvv1-20230828-GRCh38.vcf.gz
+
+# The VCF has one record per 1 kb non-N bin with per-ALT sample counts
+# (SC) for each observed CN state (CN0..CN5). The converter collapses
+# the per-CN counts into two per-bin values (samples with CN<2, samples
+# with CN>2) and writes two bedGraphs, skipping bins with no CNV
+# carrier. Displayed as a multiWig transparent overlay via trackDb
+# (loss red / gain green) so CNV carrier-count density is visible at
+# any zoom level. 2,006,905 bins kept.
+python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py \
+    tommo-jcnvv1-20230828-GRCh38.vcf.gz tommoJpCnvLoss.bg tommoJpCnvGain.bg
+sort -k1,1 -k2,2n tommoJpCnvLoss.bg > tommoJpCnvLoss.sorted.bg
+sort -k1,1 -k2,2n tommoJpCnvGain.bg > tommoJpCnvGain.sorted.bg
+bedGraphToBigWig tommoJpCnvLoss.sorted.bg /hive/data/genomes/hg38/chrom.sizes tommoJpCnvLoss.bw
+bedGraphToBigWig tommoJpCnvGain.sorted.bg /hive/data/genomes/hg38/chrom.sizes tommoJpCnvGain.bw
+
 ##########
 # 2026-03-26 Claude max
 
 # Fourth subtrack: AoU 1K - SVs from 1,027 AoU individuals (PacBio HiFi)
 # Paper: Garimella et al. 2025, medRxiv, PMID 41256123
 # Data: Supplementary media-2 from preprint
 
 mkdir -p /hive/data/genomes/hg38/bed/lrSv/aou1k
 cd /hive/data/genomes/hg38/bed/lrSv/aou1k
 
 # Downloaded supplementary CSV from preprint (media-2.gz)
 # 541,049 SVs: 444,524 INS, 96,525 DEL (autosomes only)
 # Population-specific AFs (AFR, AMR, EAS, EUR, SAS)
 # Gene annotations (OMIM, disease, cancer, ACMG), regulatory elements
 # eQTL, GWAS, and SV-trait associations
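The per-bin collapse described in the new tommoJpCnv comments of lrSv.txt can be illustrated with the Python sketch below. Per-CN sample counts are summed into a CN<2 loss value and a CN>2 gain value, and bins with no carrier are skipped. The sketch is not the contents of lrSvTommoJpCnvVcfToBedGraph.py. It rests on several assumptions that have not been checked against the jMorp file. It assumes that each observed CN state is a symbolic ALT allele `<CN0>`..`<CN5>`. It assumes that INFO `SC` lists one sample count per ALT in ALT order. It also assumes that INFO `END` is the 1-based inclusive bin end. The helper names `parse_info`, `collapse_record` and `convert` are hypothetical.

```python
import gzip
import sys


def parse_info(info_field):
    """Split a VCF INFO column into a dict; flag keys map to ""."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        if key:
            info[key] = value
    return info


def collapse_record(line):
    """Collapse one VCF data line to (chrom, start, end, loss, gain).

    ASSUMED layout (hypothetical, not confirmed against the jMorp VCF):
    ALT lists symbolic alleles like <CN0>,<CN1>,<CN3>; INFO SC gives one
    sample count per ALT in the same order; INFO END is the 1-based
    inclusive end of the bin.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, alts = fields[0], int(fields[1]), fields[4].split(",")
    info = parse_info(fields[7])
    counts = [int(c) for c in info.get("SC", "").split(",") if c]
    loss = gain = 0
    for alt, count in zip(alts, counts):
        if not (alt.startswith("<CN") and alt.endswith(">")):
            continue  # ignore alleles that are not <CNn>
        copy_number = int(alt[3:-1])
        if copy_number < 2:
            loss += count  # CN0 or CN1: copy-number loss
        elif copy_number > 2:
            gain += count  # CN3 and above: copy-number gain
    # VCF POS is 1-based; bedGraph starts are 0-based, ends exclusive.
    return chrom, pos - 1, int(info["END"]), loss, gain


def convert(vcf_path, loss_path, gain_path):
    """Write loss and gain bedGraphs, skipping bins with no carrier."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as vcf, \
            open(loss_path, "w") as loss_out, \
            open(gain_path, "w") as gain_out:
        for line in vcf:
            if line.startswith("#") or not line.strip():
                continue  # header or blank line
            chrom, start, end, loss, gain = collapse_record(line)
            # Each file only gets bins with a nonzero count in its direction.
            if loss:
                loss_out.write(f"{chrom}\t{start}\t{end}\t{loss}\n")
            if gain:
                gain_out.write(f"{chrom}\t{start}\t{end}\t{gain}\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    convert(sys.argv[1], sys.argv[2], sys.argv[3])
```

Run it as `python3 cnvSketch.py in.vcf.gz loss.bg gain.bg` (the script name is hypothetical). Then apply the same `sort -k1,1 -k2,2n` and `bedGraphToBigWig` steps as in the log. Each output receives only bins with a nonzero value in its own direction, so a bin with losses but no gains appears in the loss bedGraph only. That is a choice made in this sketch, and the real converter may handle such bins differently.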