bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700

lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both
the lrSv and srSv supertracks using the "CODE|CODE (Long name)"
filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)",
etc. Labels keep the short code up front so users can match what
hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under
scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers,
consistent insLen / svLen / AC column naming, tightened field-description
text) that had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/han945Sv.html src/hg/makeDb/trackDb/human/han945Sv.html
index 2742da33cbe..a719ff0018a 100644
--- src/hg/makeDb/trackDb/human/han945Sv.html
+++ src/hg/makeDb/trackDb/human/han945Sv.html
@@ -14,39 +14,57 @@
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 <li><span style="color: rgb(0,160,0);">Duplications (DUP)</span> - green</li>
 <li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
 <li><span style="color: rgb(140,0,200);">Translocations (TRA)</span> - purple</li>
 </ul>
 </p>
 <p>
 Filters are available for SV type, SV length, allele frequency, and number of
 supporting samples. For insertions, the item is placed at the insertion site
 with a width of 1 bp. For translocations, only the first breakpoint is shown;
 the second breakpoint chromosome and position are listed in the item details.
 </p>
 <h2>Methods</h2>
 <p>
-Long-read sequencing was performed on 945 Han Chinese individuals.
-Structural variants were called per sample and then merged across all samples using
+Gong et al. 2025 performed Oxford Nanopore long-read sequencing of 945
+Han Chinese individuals on PromethION instruments with R9.4 flow cells.
+Reads were aligned to GRCh38.p13 with NGMLR v0.2.7 using ONT-tuned
+parameters, and a joint-calling strategy was used to call SVs at moderate
+coverage: per-sample discovery with
+<a href="https://github.com/tjiangHIT/cuteSV" target="_blank">cuteSV</a>
+v1.0.13, merging of breakpoints within 500 bp across individuals with
 <a href="https://github.com/fritzsedlazeck/SURVIVOR" target="_blank">SURVIVOR</a>
-(v1.0.6). The merged VCF was converted to bigBed format for display.
-Allele frequencies and per-sample support information were extracted from the
-INFO fields of the merged VCF. The study identified two notable variants:
-an ancestral deletion in GSDMD associated with bone density and kidney injury
-risk, and a modern human-specific variant in WWP2 influencing height, body
-composition, and facial features.
+v1.0.6, per-sample re-genotyping of the merged set with LRcaller v1.0, and
+a final BCFtools merge. SVs in centromeric, pericentromeric and gap regions
+were filtered out, yielding 111,288 high-quality SVs: 49,518 deletions,
+42,300 insertions, 13,503 duplications, 5,595 inversions and 372
+translocations.
+</p>
+<p>
+The site-only VCF released at
+<a href="https://www.biosino.org/node/analysis/detail/OEZ007028" target="_blank">
+OMIX accession OED00945268</a> (<tt>OED00945268_Han_945samples_SV.vcf.gz</tt>)
+was converted to BED for this track.
+</p>
+<p>
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 <h2>Data Access</h2>
 <p>
 The raw VCF data was obtained from the
 <a href="https://www.biosino.org/node/analysis/detail/OEZ007028" target="_blank">OMIX</a>
 repository (accession OED00945268) at the National Genomics Data Center (NGDC),
 China National Center for Bioinformation.
 </p>
 <p>
 The source VCF also encodes phased per-sample genotypes: the
 <tt>sampleList</tt> field on the detail page is derived from the SURVIVOR
 <tt>SUPP_VEC</tt> bitmask and is an ordered list of the 1-based indices of
 the 945 samples carrying each SV. The full per-sample phased VCF can be
 browsed as a separate track in the
 <a href="hgTrackUi?g=han945SvVcf">SVs from 945 Han Chinese</a> entry of