06a482a2120d4d85c7c34fb5038213e07f595554 max Tue Apr 21 15:00:21 2026 -0700
lrSv: add tommoJpCnv short-read CNV comparator (multiWig)

ToMMo 48KJPN-CNV Frequency Panel: copy-number variation frequencies from
short-read whole-genome sequencing of 48,874 Japanese individuals (jMorp
20230828 release, GATK CNV germline workflow at 1 kb resolution). Published
as a companion short-read comparator to the long-read tommoJpSv track.

Rendered as a multiWig container with two bigWig subtracks (transparent
overlay): tommoJpCnvLoss.bw counts samples at CN<2 per bin (red) and
tommoJpCnvGain.bw counts samples at CN>2 per bin (green). Values are
absolute carrier counts out of 48,874. 2,006,905 bins with at least one
CNV carrier; bins that are wholly CN=2 are omitted.

Files:
- trackDb/human/lrSv.ra: new tommoJpCnv multiWig container
- trackDb/human/tommoJpCnv.html: new doc page
- trackDb/human/lrSv.html: summary-table row + per-track blurb
- scripts/lrSv/lrSvTommoJpCnvVcfToBedGraph.py: VCF -> two bedGraphs
- doc/hg38/lrSv.txt: wget, converter invocation, bigWig build steps

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/tommoJpSv.html src/hg/makeDb/trackDb/human/tommoJpSv.html
index 10015b98804..10c3117337e 100644
--- src/hg/makeDb/trackDb/human/tommoJpSv.html
+++ src/hg/makeDb/trackDb/human/tommoJpSv.html
@@ -1,81 +1,103 @@
 <h2>Description</h2>
 <p>
 This track shows structural variants (SVs) identified by Oxford Nanopore
 long-read sequencing of 333 Japanese individuals from the Tohoku Medical
 Megabank (ToMMo) project. The 333 individuals form 111 parent-offspring trios,
 enabling Mendelian consistency checks on the SV calls. Activated T lymphocytes
 were used as a source of high-molecular-weight DNA for nanopore sequencing at
 a median coverage of 22.2x with an N50 read length of 25.8 kb.
 </p>
 <p>
 The dataset contains 74,201 SVs (37,981 deletions and 36,220 insertions),
 merged across individuals using SURVIVOR v1.0.6. Over 95% of the SVs are
 concordant with Mendelian inheritance in the trio families.
 </p>
 <h2>Display Conventions and Configuration</h2>
 <p>
 Items are colored by SV type:
 <ul>
 <li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 </ul>
 </p>
 <p>
 Filters are available for SV type, SV length, and allele frequency.
 For insertions, the item is placed at the insertion site with a width of 1 bp;
 for deletions, the item spans the deleted region.
 </p>
 <p>
 The detail page for each item shows:
 <ul>
 <li><b>Allele Frequency</b>: fraction of alleles carrying this variant
 (based on 444 alleles from 222 unrelated parents)</li>
 <li><b>Allele Count / Allele Number</b>: number of variant alleles and
 total alleles genotyped</li>
 <li><b>Mendelian Error Rate</b>: fraction of trio families showing
 inheritance errors for this variant</li>
 <li><b>Families with Errors / Families Genotyped</b>: number of families
 with Mendelian errors and total families with complete genotype calls</li>
 </ul>
 </p>
 <h2>Methods</h2>
 <p>
-Oxford Nanopore sequencing was performed on genomic DNA extracted from activated
-T lymphocytes of 333 individuals (111 trios) from the Tohoku Medical Megabank
-(ToMMo) cohort. SV calling was performed with Sniffles on each sample, and
-calls were merged across individuals with SURVIVOR v1.0.6 using a maximum
-distance of 1 kbp. Allele frequencies were computed from 222 unrelated parents
-(excluding offspring to avoid double-counting). Mendelian error rates were
-calculated by checking transmission consistency within each trio family.
+Otsuki et al. 2022 extracted high-molecular-weight genomic DNA from activated
+T lymphocytes of 333 individuals (111 parent-offspring trios) from the Tohoku
+Medical Megabank (ToMMo) BirThree cohort and performed Oxford Nanopore
+whole-genome sequencing on PromethION instruments with R9.4.1 flow cells
+(SQK-LSK109 libraries, Guppy v4.2.2 high-accuracy base-calling). After QC,
+median per-sample sequencing coverage was 22.2x with a read N50 of 25.8 kb.
+Reads were aligned to GRCh38 with LRA, SVs were called per sample with
+<a href="https://github.com/tjiangHIT/cuteSV" target="_blank">CuteSV</a>
+v1.0.9 (<tt>-min_sv_length 50</tt>), and per-sample calls were merged with
+<a href="https://github.com/fritzsedlazeck/SURVIVOR" target="_blank">SURVIVOR</a>
+v1.0.6 (1000 bp distance, type-match, no length-match) into a nonredundant
+panel of 74,201 autosomal SVs (37,981 deletions and 36,220 insertions).
+Over 95% of the SVs were concordant with Mendelian inheritance in the 111
+trio families; allele frequencies in this track are computed from the 222
+unrelated parents to avoid double-counting.
+</p>
+<p>
+The site-only VCF <tt>tommo-JSV1-20211208-GRCh38-without-genotype-count.vcf.gz</tt>
+was downloaded from the jMorp JSV1 dataset page,
+<a href="https://jmorp.megabank.tohoku.ac.jp/datasets/tommo-jsv1-20211208-af" target="_blank">
+tommo-jsv1-20211208-af</a>.
+</p>
+<p>
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 <h2>Data Access</h2>
 <p>
 Source data is available from the
 <a href="https://jmorp.megabank.tohoku.ac.jp/downloads" target="_blank">jMorp downloads page</a>
 (ToMMo Japanese Multi Omics Reference Panel).
 </p>
 <h2>Credits</h2>
 <p>
 Thanks to the Tohoku Medical Megabank Organization for making their structural
 variant calls publicly available through the jMorp data portal.
 </p>
 <h2>References</h2>
 <p>
 Otsuki A, Okamura Y, Ishida N, Tadaka S, Takayama J, Kumada K, Kawashima J, Taguchi K, Minegishi N,
 Kuriyama S <em>et al</em>.
 <a href="https://doi.org/10.1038/s42003-022-03953-1" target="_blank">
 Construction of a trio-based structural variation panel utilizing activated T lymphocytes and
 long-read sequencing technology</a>.
 <em>Commun Biol</em>. 2022 Sep 20;5(1):991.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36127505" target="_blank">36127505</a>;
 PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9489684/" target="_blank">PMC9489684</a>
 </p>