bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700

lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both
the lrSv and srSv supertracks using the "CODE|CODE (Long name)" filterValues
syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)", etc. Labels keep
the short code up front so users can match what hgTracks shows next to each
feature.

Also sweep in the in-progress converter/as-file cleanups under scripts/lrSv/
and scripts/srSv/ (introduction of lrSvCommon.py helpers, consistent
insLen / svLen / AC column naming, tightened field-description text) that
had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/han945Sv.html src/hg/makeDb/trackDb/human/han945Sv.html
index 2742da33cbe..a719ff0018a 100644
--- src/hg/makeDb/trackDb/human/han945Sv.html
+++ src/hg/makeDb/trackDb/human/han945Sv.html
@@ -1,72 +1,90 @@
 <h2>Description</h2>
 <p>
 This track shows structural variants (SVs) identified by long-read sequencing of
 945 Han Chinese individuals. The dataset contains 111,288 SVs merged across samples
 using SURVIVOR, including 49,518 deletions, 42,300 insertions, 13,503 duplications,
 5,595 inversions, and 372 translocations.
 </p>
 <h2>Display Conventions and Configuration</h2>
 <p>
 Items are colored by SV type:
 <ul>
 <li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 <li><span style="color: rgb(0,160,0);">Duplications (DUP)</span> - green</li>
 <li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
 <li><span style="color: rgb(140,0,200);">Translocations (TRA)</span> - purple</li>
 </ul>
 </p>
 <p>
 Filters are available for SV type, SV length, allele frequency, and number of
 supporting samples.
 For insertions, the item is placed at the insertion site with a width of 1 bp.
 For translocations, only the first breakpoint is shown; the second breakpoint
 chromosome and position are listed in the item details.
 </p>
 <h2>Methods</h2>
 <p>
-Long-read sequencing was performed on 945 Han Chinese individuals.
-Structural variants were called per sample and then merged across all samples using
+Gong et al. 2025 performed Oxford Nanopore long-read sequencing of 945
+Han Chinese individuals on PromethION instruments with R9.4 flow cells.
+Reads were aligned to GRCh38.p13 with NGMLR v0.2.7 using ONT-tuned
+parameters, and a joint-calling strategy was used to call SVs at moderate
+coverage: per-sample discovery with
+<a href="https://github.com/tjiangHIT/cuteSV" target="_blank">cuteSV</a>
+v1.0.13, merging of breakpoints within 500 bp across individuals with
 <a href="https://github.com/fritzsedlazeck/SURVIVOR" target="_blank">SURVIVOR</a>
-(v1.0.6). The merged VCF was converted to bigBed format for display.
-Allele frequencies and per-sample support information were extracted from the
-INFO fields of the merged VCF. The study identified two notable variants:
-an ancestral deletion in GSDMD associated with bone density and kidney injury
-risk, and a modern human-specific variant in WWP2 influencing height, body
-composition, and facial features.
+v1.0.6, per-sample re-genotyping of the merged set with LRcaller v1.0, and
+a final BCFtools merge. SVs in centromeric, pericentromeric and gap regions
+were filtered out, yielding 111,288 high-quality SVs: 49,518 deletions,
+42,300 insertions, 13,503 duplications, 5,595 inversions and 372
+translocations.
+</p>
+<p>
+The site-only VCF released at
+<a href="https://www.biosino.org/node/analysis/detail/OEZ007028" target="_blank">
+OMIX accession OED00945268</a> (<tt>OED00945268_Han_945samples_SV.vcf.gz</tt>)
+was converted to BED for this track.
+</p>
+<p>
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 <h2>Data Access</h2>
 <p>
 The raw VCF data was obtained from the
 <a href="https://www.biosino.org/node/analysis/detail/OEZ007028" target="_blank">OMIX</a>
 repository (accession OED00945268) at the National Genomics Data Center (NGDC),
 China National Center for Bioinformation.
 </p>
 <p>
 The source VCF also encodes phased per-sample genotypes: the
 <tt>sampleList</tt> field on the detail page is derived from the SURVIVOR
 <tt>SUPP_VEC</tt> bitmask and is an ordered list of the 1-based indices of
 the 945 samples carrying each SV. The full per-sample phased VCF can be
 browsed as a separate track in the
 <a href="hgTrackUi?g=han945SvVcf">SVs from 945 Han Chinese</a> entry of the
 <a href="hgTrackUi?g=phasedVars">Phased Variants</a> track collection.
 </p>
 <h2>Credits</h2>
 <p>
 Thanks to Gong et al. for making their structural variant calls publicly available.
 </p>
 <h2>References</h2>
 <p>
 Gong J, Sun H, Wang K, Zhao Y, Huang Y, Chen Q, Qiao H, Gao Y, Zhao J, Ling Y <em>et al</em>.
 <a href="https://doi.org/10.1038/s41467-025-56661-9" target="_blank">
 Long-read sequencing of 945 Han individuals identifies structural variants associated with
 phenotypic diversity and disease susceptibility</a>.
 <em>Nat Commun</em>. 2025 Feb 10;16(1):1494.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39929826" target="_blank">39929826</a>;
 PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11811171/" target="_blank">PMC11811171</a>
 </p>
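
Note on the sampleList field described in the Data Access text above: the mapping from a SURVIVOR SUPP_VEC value to an ordered list of 1-based sample indices can be sketched roughly as follows. This is an illustration only and is not the code in scripts/lrSv; the function names are hypothetical, and it assumes SUPP_VEC is a string of '0'/'1' characters with one character per sample in merge order.

```python
from typing import List


def supp_vec_to_sample_list(supp_vec: str) -> List[int]:
    """Return the 1-based positions of every '1' in a SUPP_VEC string.

    Assumption: one character per sample, '1' meaning the sample
    supports the SV. Hypothetical helper, not taken from the kent tree.
    """
    return [i for i, flag in enumerate(supp_vec, start=1) if flag == "1"]


def format_sample_list(supp_vec: str) -> str:
    """Join the indices with commas, as a list-valued field might store them."""
    return ",".join(str(i) for i in supp_vec_to_sample_list(supp_vec))


if __name__ == "__main__":
    # Toy 5-sample vector; the real vectors would have 945 characters.
    print(format_sample_list("01001"))
```

Because the indices are produced in left-to-right order, the resulting list is already sorted, which matches the "ordered list" wording in the description page.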