06a482a2120d4d85c7c34fb5038213e07f595554
max
  Tue Apr 21 15:00:21 2026 -0700
lrSv: add tommoJpCnv short-read CNV comparator (multiWig)

ToMMo 48KJPN-CNV Frequency Panel: copy-number variation frequencies
from short-read whole-genome sequencing of 48,874 Japanese individuals
(jMorp 20230828 release, GATK CNV germline workflow at 1 kb
resolution). Published as a companion short-read comparator to the
long-read tommoJpSv track.

Rendered as a multiWig container with two bigWig subtracks (transparent
overlay): tommoJpCnvLoss.bw counts samples at CN<2 per bin (red) and
tommoJpCnvGain.bw counts samples at CN>2 per bin (green). Values are
absolute carrier counts out of 48,874 samples. Only the 2,006,905 bins
with at least one CNV carrier are included; bins that are wholly CN=2
are omitted.
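
The per-bin loss/gain counting described above can be sketched roughly as
follows. This is an illustration only, not the code in
lrSvTommoJpCnvVcfToBedGraph.py: the input record shape, the function name
carrier_bedgraphs, and the rule that a bin is written to a file only when
its count for that file is nonzero are all assumptions.

```python
from typing import Iterable, List, Tuple

# Hypothetical input record: (chrom, start, end, per-sample copy numbers).
# The real converter reads a VCF; this shape is assumed for illustration only.
BinRecord = Tuple[str, int, int, List[int]]


def carrier_bedgraphs(bins: Iterable[BinRecord]) -> Tuple[List[str], List[str]]:
    """Return (loss_lines, gain_lines) as bedGraph text lines."""
    loss_lines: List[str] = []
    gain_lines: List[str] = []
    for chrom, start, end, copy_numbers in bins:
        n_loss = sum(1 for cn in copy_numbers if cn < 2)  # samples at CN<2
        n_gain = sum(1 for cn in copy_numbers if cn > 2)  # samples at CN>2
        # Assumption: a bin is written to a file only when its count is
        # nonzero, so bins that are wholly CN=2 appear in neither output.
        if n_loss:
            loss_lines.append(f"{chrom}\t{start}\t{end}\t{n_loss}")
        if n_gain:
            gain_lines.append(f"{chrom}\t{start}\t{end}\t{n_gain}")
    return loss_lines, gain_lines


# Toy data: three 1 kb bins with made-up copy numbers for a few samples.
demo = [
    ("chr1", 0, 1000, [2, 2, 2]),        # wholly CN=2, omitted everywhere
    ("chr1", 1000, 2000, [1, 2, 3, 0]),  # two losses, one gain
    ("chr1", 2000, 3000, [2, 4, 2]),     # one gain only
]
loss, gain = carrier_bedgraphs(demo)
```

Each returned line is a four-column bedGraph record (chrom, start, end,
carrier count), the form that would then be converted to bigWig.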

Files:
- trackDb/human/lrSv.ra: new tommoJpCnv multiWig container
- trackDb/human/tommoJpCnv.html: new doc page
- trackDb/human/lrSv.html: summary-table row + per-track blurb
- scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: VCF -> two bedGraphs
- doc/hg38/lrSv.txt: wget, converter invocation, bigWig build steps

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/tommoJpSv.html src/hg/makeDb/trackDb/human/tommoJpSv.html
index 10015b98804..10c3117337e 100644
--- src/hg/makeDb/trackDb/human/tommoJpSv.html
+++ src/hg/makeDb/trackDb/human/tommoJpSv.html
@@ -1,81 +1,103 @@
 <h2>Description</h2>
 <p>
 This track shows structural variants (SVs) identified by Oxford Nanopore long-read
 sequencing of 333 Japanese individuals from the Tohoku Medical Megabank (ToMMo)
 project. The 333 individuals form 111 parent-offspring trios, enabling
 Mendelian consistency checks on the SV calls. Activated T lymphocytes were used
 as a source of high-molecular-weight DNA for nanopore sequencing at a median
 coverage of 22.2x with an N50 read length of 25.8 kb.
 </p>
 <p>
 The dataset contains 74,201 SVs (37,981 deletions and 36,220 insertions),
 merged across individuals using SURVIVOR v1.0.6. Over 95% of the SVs are
 concordant with Mendelian inheritance in the trio families.
 </p>
 
 <h2>Display Conventions and Configuration</h2>
 <p>
 Items are colored by SV type:
 <ul>
 <li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 </ul>
 </p>
 <p>
 Filters are available for SV type, SV length, and allele frequency.
 For insertions, the item is placed at the insertion site with a width of 1 bp;
 for deletions, the item spans the deleted region.
 </p>
 <p>
 The detail page for each item shows:
 <ul>
 <li><b>Allele Frequency</b>: fraction of alleles carrying this variant
 (based on 444 alleles from 222 unrelated parents)</li>
 <li><b>Allele Count / Allele Number</b>: number of variant alleles and
 total alleles genotyped</li>
 <li><b>Mendelian Error Rate</b>: fraction of trio families showing
 inheritance errors for this variant</li>
 <li><b>Families with Errors / Families Genotyped</b>: number of families
 with Mendelian errors and total families with complete genotype calls</li>
 </ul>
 </p>
 
 <h2>Methods</h2>
 <p>
-Oxford Nanopore sequencing was performed on genomic DNA extracted from activated
-T lymphocytes of 333 individuals (111 trios) from the Tohoku Medical Megabank
-(ToMMo) cohort. SV calling was performed with Sniffles on each sample, and
-calls were merged across individuals with SURVIVOR v1.0.6 using a maximum
-distance of 1 kbp. Allele frequencies were computed from 222 unrelated parents
-(excluding offspring to avoid double-counting). Mendelian error rates were
-calculated by checking transmission consistency within each trio family.
+Otsuki <em>et al</em>. (2022) extracted high-molecular-weight genomic DNA from activated
+T lymphocytes of 333 individuals (111 parent-offspring trios) from the Tohoku
+Medical Megabank (ToMMo) BirThree cohort and performed Oxford Nanopore
+whole-genome sequencing on PromethION instruments with R9.4.1 flow cells
+(SQK-LSK109 libraries, Guppy v4.2.2 high-accuracy base-calling). After QC,
+median per-sample sequencing coverage was 22.2x with a read N50 of 25.8 kb.
+Reads were aligned to GRCh38 with LRA, SVs were called per sample with
+<a href="https://github.com/tjiangHIT/cuteSV" target="_blank">cuteSV</a>
+v1.0.9 (<tt>-min_sv_length 50</tt>), and per-sample calls were merged with
+<a href="https://github.com/fritzsedlazeck/SURVIVOR" target="_blank">SURVIVOR</a>
+v1.0.6 (1000 bp distance, type-match, no length-match) into a nonredundant
+panel of 74,201 autosomal SVs (37,981 deletions and 36,220 insertions).
+Over 95% of the SVs were concordant with Mendelian inheritance in the 111
+trio families; allele frequencies in this track are computed from the 222
+unrelated parents to avoid double-counting.
+</p>
+<p>
+The site-only VCF <tt>tommo-JSV1-20211208-GRCh38-without-genotype-count.vcf.gz</tt>
+was downloaded from the jMorp JSV1 dataset page,
+<a href="https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jsv1-20211208-af" target="_blank">
+tommo-jsv1-20211208-af</a>.
+</p>
+<p>
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 
 <h2>Data Access</h2>
 <p>
 Source data is available from the
 <a href="https://jmorp.megabank.tohoku.ac.jp/downloads"
    target="_blank">jMorp downloads page</a> (ToMMo Japanese Multi Omics Reference Panel).
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to the Tohoku Medical Megabank Organization for making their structural
 variant calls publicly available through the jMorp data portal.
 </p>
 
 <h2>References</h2>
 
 
 
 <p>
 Otsuki A, Okamura Y, Ishida N, Tadaka S, Takayama J, Kumada K, Kawashima J, Taguchi K, Minegishi N,
 Kuriyama S <em>et al</em>.
 <a href="https://doi.org/10.1038/s42003-022-03953-1" target="_blank">
 Construction of a trio-based structural variation panel utilizing activated T lymphocytes and long-
 read sequencing technology</a>.
 <em>Commun Biol</em>. 2022 Sep 20;5(1):991.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36127505" target="_blank">36127505</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9489684/" target="_blank">PMC9489684</a>
 </p>