bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both the lrSv and srSv supertracks using the "CODE|CODE (Long name)" filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)", etc. Labels keep the short code up front so users can match what hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers, consistent insLen / svLen / AC column naming, tightened field-description text) that had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/han945Sv.html src/hg/makeDb/trackDb/human/han945Sv.html
index 2742da33cbe..a719ff0018a 100644
--- src/hg/makeDb/trackDb/human/han945Sv.html
+++ src/hg/makeDb/trackDb/human/han945Sv.html
@@ -14,39 +14,57 @@

Insertions (INS) - blue

Duplications (DUP) - green

Inversions (INV) - orange

Translocations (TRA) - purple

Filters are available for SV type, SV length, allele frequency, and number of supporting samples. For insertions, the item is placed at the insertion site with a width of 1 bp. For translocations, only the first breakpoint is shown; the second breakpoint chromosome and position are listed in the item details.

Methods

-Long-read sequencing was performed on 945 Han Chinese individuals.
-Structural variants were called per sample and then merged across all samples using
+Gong et al. 2025 performed Oxford Nanopore long-read sequencing of 945
+Han Chinese individuals on PromethION instruments with R9.4 flow cells.
+Reads were aligned to GRCh38.p13 with NGMLR v0.2.7 using ONT-tuned
+parameters, and a joint-calling strategy was used to call SVs at moderate
+coverage: per-sample discovery with
+cuteSV
+v1.0.13, merging of breakpoints within 500 bp across individuals with
 SURVIVOR
-(v1.0.6). The merged VCF was converted to bigBed format for display.
-Allele frequencies and per-sample support information were extracted from the
-INFO fields of the merged VCF. The study identified two notable variants:
-an ancestral deletion in GSDMD associated with bone density and kidney injury
-risk, and a modern human-specific variant in WWP2 influencing height, body
-composition, and facial features.
+v1.0.6, per-sample re-genotyping of the merged set with LRcaller v1.0, and
+a final BCFtools merge. SVs in centromeric, pericentromeric and gap regions
+were filtered out, yielding 111,288 high-quality SVs: 49,518 deletions,
+42,300 insertions, 13,503 duplications, 5,595 inversions and 372
+translocations.
+

+The site-only VCF released at
+
+OMIX accession OED00945268 (OED00945268_Han_945samples_SV.vcf.gz)
+was converted to BED for this track.
+

+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+
+doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in
+
+makeDb/scripts/lrSv.

Data Access

The raw VCF data was obtained from the OMIX repository (accession OED00945268) at the National Genomics Data Center (NGDC), China National Center for Bioinformation.

The source VCF also encodes phased per-sample genotypes: the sampleList field on the detail page is derived from the SURVIVOR SUPP_VEC bitmask and is an ordered list of the 1-based indices of the 945 samples carrying each SV. The full per-sample phased VCF can be browsed as a separate track in the SVs from 945 Han Chinese entry of