06a482a2120d4d85c7c34fb5038213e07f595554 max Tue Apr 21 15:00:21 2026 -0700
lrSv: add tommoJpCnv short-read CNV comparator (multiWig)

ToMMo 48KJPN-CNV Frequency Panel: copy-number variation frequencies from
short-read whole-genome sequencing of 48,874 Japanese individuals (jMorp
20230828 release, GATK CNV germline workflow at 1 kb resolution).
Published as a companion short-read comparator to the long-read
tommoJpSv track.

Rendered as a multiWig container with two bigWig subtracks (transparent
overlay): tommoJpCnvLoss.bw counts samples at CN<2 per bin (red) and
tommoJpCnvGain.bw counts samples at CN>2 per bin (green). Values are
absolute carrier counts out of 48,874. 2,006,905 bins with at least one
CNV carrier; bins that are wholly CN=2 are omitted.

Files:
- trackDb/human/lrSv.ra: new tommoJpCnv multiWig container
- trackDb/human/tommoJpCnv.html: new doc page
- trackDb/human/lrSv.html: summary-table row + per-track blurb
- scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: VCF -> two bedGraphs
- doc/hg38/lrSv.txt: wget, converter invocation, bigWig build steps

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/lrSv.html src/hg/makeDb/trackDb/human/lrSv.html
index 1eec6373f17..ebaced9d96e 100644
--- src/hg/makeDb/trackDb/human/lrSv.html
+++ src/hg/makeDb/trackDb/human/lrSv.html
@@ -67,30 +67,37 @@
   <td>148,375</td>
   <td>2</td>
   <td>177</td>
   <td>49,171</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=tommoJpSv">ToMMo Japanese</a></td>
   <td>333 (111 trios)</td>
   <td>Japanese, general population</td>
   <td>ONT</td>
   <td>74,201</td>
   <td>51</td>
   <td>162</td>
   <td>99,980</td>
 </tr>
+<tr>
+  <td><a href="hgTrackUi?g=tommoJpCnv">ToMMo 48K CNV</a></td>
+  <td>48,874</td>
+  <td>Japanese, general population (<b>short-read comparator</b> for ToMMo long-read SVs)</td>
+  <td><b>Illumina short-read</b> (GATK CNV, 1 kb bins, shown as two bigWigs)</td>
+  <td colspan="4">~2M bins with CNV carriers; not comparable to per-SV counts above</td>
+</tr>
 <tr>
   <td><a href="hgTrackUi?g=aou1kSv">AoU 1K</a></td>
   <td>1,027</td>
   <td>All of Us, self-identified Black/African American</td>
   <td>PacBio HiFi</td>
   <td>541,049</td>
   <td>50</td>
   <td>152</td>
   <td>9,998</td>
 </tr>
 <tr>
   <td><a href="hgTrackUi?g=ga4kSv">GA4K</a></td>
   <td>502</td>
   <td>Children's Mercy, pediatric rare disease probands + families</td>
   <td>PacBio HiFi</td>
@@ -182,48 +189,61 @@
 <h3>Han 945 SVs (<a href="hgTrackUi?g=han945Sv">han945Sv</a>)</h3>
 <p>
 Structural variants from 945 Han Chinese individuals. 111,288 SVs (deletions,
 insertions, duplications, inversions, translocations) merged with SURVIVOR.
 Includes allele frequencies and per-sample support.
 </p>
 
 <h3>1KG ONT 100 SVs (<a href="hgTrackUi?g=gustafsonSv">gustafsonSv</a>)</h3>
 <p>
 Structural variants from Oxford Nanopore long-read sequencing of 100
 1000 Genomes samples (5 superpopulations, 19 subpopulations) released
 by the 1000 Genomes ONT Sequencing Consortium and described in
 Gustafson et al. 2024. 113,696 SVs (insertions, deletions,
 duplications, inversions) called with five callers and merged with
 Jasmine. This is a
-separate dataset from the Vienna 1KG-ONT release below.
+separate dataset from the Vienna 1KG-ONT release below; the 100 samples
+here do not overlap with the 1,019 samples in the Vienna release.
 </p>
 
 <h3>1KG ONT Vienna SVs (<a href="hgTrackUi?g=lrSv1kgOnt">lrSv1kgOnt</a>)</h3>
 <p>
 Structural variants from 1,019 individuals across 26 populations (1000 Genomes
 ONT). 161,332 SVs annotated with SVAN, classifying insertions and deletions by
 mechanism of origin (mobile elements, VNTRs, processed pseudogenes, etc.). Original
 coordinates are on T2T-CHM13 (hs1); the hg38 version was created via liftOver.
+This is a separate dataset from the 1KG ONT 100 (Gustafson et al.) track above;
+the 1,019 samples here do not overlap with the 100 samples in that release.
 </p>
 
 <h3>ToMMo Japanese SVs (<a href="hgTrackUi?g=tommoJpSv">tommoJpSv</a>)</h3>
 <p>
 Structural variants from 333 Japanese individuals (111 trios) from the Tohoku
 Medical Megabank (ToMMo). 74,201 SVs (deletions and insertions) with trio-based
 Mendelian error rates and allele frequencies.
 </p>
 
+<h3>ToMMo 48K CNV SR (<a href="hgTrackUi?g=tommoJpCnv">tommoJpCnv</a>) - short-read comparator</h3>
+<p>
+<b>Short-read CNV comparator for the ToMMo long-read SV track above.</b>
+Per-1 kb-bin copy-number carrier counts from short-read whole-genome
+sequencing of 48,874 Japanese individuals (jMorp 48KJPN-CNV Frequency
+Panel, release 20230828), called with GATK CNV germline workflows.
+Shown as a multiWig overlay: red = samples with copy-number loss
+(CN<2) per bin, green = samples with gain (CN>2) per bin.
+</p>
+
 <h3>AoU 1K SVs (<a href="hgTrackUi?g=aou1kSv">aou1kSv</a>)</h3>
 <p>
 Structural variants from 1,027 individuals from the All of Us (AoU) Research
 Program, sequenced with PacBio HiFi long reads. 541,049 SVs (insertions and
 deletions) with population-specific allele frequencies, gene annotations, and
 clinical trait associations.
 </p>
 
 <h3>GA4K SVs (<a href="hgTrackUi?g=ga4kSv">ga4kSv</a>)</h3>
 <p>
 Structural variants from 502 probands and family members enrolled in the
 Genomic Answers for Kids (GA4K) pediatric rare-disease program at Children's
 Mercy Research Institute, sequenced with PacBio HiFi long reads. 115,554
 replicated SVs (deletions, insertions, duplications, inversions) called with
 pbsv and merged with JASMINE. The matched GA4K small-variant callset (SNVs