bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700

lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both the lrSv and srSv supertracks using the "CODE|CODE (Long name)" filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)", etc. Labels keep the short code up front so users can match what hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers, consistent insLen / svLen / AC column naming, tightened field-description text) that had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/hgsvc2Sv.html src/hg/makeDb/trackDb/human/hgsvc2Sv.html
index 500ba0a04f9..994ff8c8441 100644
--- src/hg/makeDb/trackDb/human/hgsvc2Sv.html
+++ src/hg/makeDb/trackDb/human/hgsvc2Sv.html
@@ -1,122 +1,135 @@

Description

This track shows structural variants (SVs) from the second phase of the Human Genome Structural Variation Consortium (HGSVC2). The callset is derived from 32 haplotype-resolved diploid genomes (64 phased haplotypes) spanning five 1000 Genomes superpopulations (African, Admixed American, East Asian, European, South Asian). Each genome was sequenced with PacBio long reads (continuous long-read and HiFi) and phased with Strand-seq, enabling comprehensive characterization of SVs that short-read approaches miss.

The track merges the two SV annotation tables from the HGSVC2 v2.0 integrated callset freeze 4: 111,330 insertions/deletions and 416 inversions, for a total of 111,746 SVs. Each row is a site-level variant with per-site allele count, carrier haplotypes, population-scale allele frequencies (imputed from the phased callset back into 1000 Genomes, insertions and deletions only) and structural annotations.
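The merge of the two annotation tables can be pictured with a minimal Python sketch. It is not the converter used for this track: the function name `merge_sv_tables` and the toy rows are hypothetical, and `POP_ALL_AF` merely stands in for one of the POP_*_AF columns. The sketch takes the union of the columns and leaves type-specific columns empty where they do not apply.

```python
def merge_sv_tables(insdel_rows, inv_rows):
    """Merge two lists of dict rows into one table (hypothetical helper).

    Returns the union of column names (first-seen order) and the merged
    rows, with an empty string wherever a column does not apply.
    """
    all_rows = list(insdel_rows) + list(inv_rows)
    columns = []
    for row in all_rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    merged = [{name: row.get(name, "") for name in columns} for row in all_rows]
    return columns, merged


# Toy rows; IDs, column names and values are made up for illustration only.
insdel = [{"ID": "sv1", "SVTYPE": "INS", "POP_ALL_AF": "0.12"}]
inv = [{"ID": "sv2", "SVTYPE": "INV", "RGN_REF_INNER": "chr1:100-200"}]
columns, merged = merge_sv_tables(insdel, inv)

# Record counts stated for the two freeze-4 tables add up to the track total.
assert 111_330 + 416 == 111_746
```

In this sketch the inversion row ends up with an empty `POP_ALL_AF` and the insertion row with an empty `RGN_REF_INNER`, which mirrors how type-specific fields appear empty on the detail page.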

Display Conventions and Configuration

Items are colored by SV type:

Insertions are placed at the insertion site with a width of 1 bp; deletions and inversions span the affected reference interval. Filters are available for SV type, SV length, carrier-haplotype count, distinct sample count, whether the site falls in a Tandem Repeats Finder region, and the fraction of the variant overlapping segmental duplications.
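The placement rule above can be sketched in a few lines of Python. This is an illustration, not the track's converter: the function name `display_interval` is hypothetical, and the sketch assumes 0-based, half-open reference coordinates as used in BED files.

```python
def display_interval(sv_type, pos, end):
    """Return (chromStart, chromEnd) for drawing one SV (hypothetical helper).

    Assumes pos/end are 0-based, half-open reference coordinates.
    """
    if sv_type == "INS":
        # Insertions add no reference bases, so draw a 1-bp item at the site.
        return pos, pos + 1
    if sv_type in ("DEL", "INV"):
        # Deletions and inversions cover the affected reference interval.
        return pos, end
    raise ValueError(f"unexpected SV type: {sv_type}")
```

Under these assumptions, an insertion at position 1000 is drawn as the interval 1000 to 1001 regardless of the inserted sequence length, which is why the SV length filter is more informative than item width for insertions.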

The detail page shows, where available:

Methods

-HGSVC2 generated phased haplotype-resolved de novo assemblies for 32
-diploid samples across five 1000 Genomes superpopulations. Assemblies
-were built from PacBio continuous long reads and HiFi reads and phased
-with Strand-seq. Structural variants were discovered from each haplotype
-assembly using PAV and validated with multiple orthogonal callers
-(including PBSV, Bionano, DeepVariant, PAV-LRA, and others recorded in
-per-site validation columns). The final SV set was merged to produce the
-integrated callset used here.
+Ebert et al. 2021 produced phased haplotype-resolved de novo assemblies for
+32 diploid samples (64 unrelated haplotypes) across five 1000 Genomes
+superpopulations on the PacBio Sequel II platform, using continuous
+long-read sequencing (CLR, >40x) and high-fidelity sequencing (HiFi,
+>20x). Single-cell Strand-seq data from the same samples were used to
+phase the assemblies without parental trios, yielding N50 contigs >25 Mbp
+at QV > 40. SVs were discovered from the two haplotype assemblies of
+each sample with the Phased Assembly Variant (PAV) caller against GRCh38,
+and candidate SVs were orthogonally supported by at least one of seven
+other sources (read-based callers MELT, PBSV and PALMER; Bionano optical
+mapping; breakpoint k-mer analysis; PAV replication with LRA). This
+yielded the integrated nonredundant callset of 107,590 insertion/deletion
+SVs and 316 inversions. Population-scale allele frequencies (POP_*_AF) were
+obtained by graph-based re-genotyping of the HGSVC2 SVs into the
+3,202-sample 1000 Genomes short-read cohort with PanGenie (insertions and
+deletions only).

-Population-scale allele frequencies (POP_*_AF) were derived by imputing
-the HGSVC2 SVs back into the full 1000 Genomes short-read cohort. These
-fields are only available for insertions and deletions.
+For display, the HGSVC2 v2.0 freeze-4 annotation tables
+variants_freeze4_sv_insdel.tsv.gz (111,330 DEL+INS) and
+variants_freeze4_sv_inv.tsv.gz (416 INV) were downloaded from the
+
+IGSR HGSVC2 v2.0 integrated-callset directory and merged into a single
+bigBed; type-specific columns (POP_*_AF for insdel, RGN_REF_INNER for
+inversions) are empty on the detail page when they do not apply.

-Two tables were merged for display here:
-variants_freeze4_sv_insdel.tsv.gz (DEL + INS, 111,330 records) and
-variants_freeze4_sv_inv.tsv.gz (INV, 416 records). Type-specific
-columns (POP_*_AF for insdel, RGN_REF_INNER for inversions) are shown as
-empty on the detail page when they do not apply.
+The step-by-step build commands (download, format conversion, bigBed build)
+are recorded in the UCSC makeDoc for this track container:
+
+doc/hg38/lrSv.txt. The conversion scripts and autoSql schemas live in
+
+makeDb/scripts/lrSv.

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator, and accessed programmatically through our API, track=hgsvc2Sv.
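For programmatic access, a request URL for the track can be assembled as in the Python sketch below. The helper name `track_query_url` is hypothetical, and the sketch assumes the public UCSC REST API `getData/track` endpoint with `genome`, `track`, `chrom`, `start` and `end` parameters joined by semicolons; check the API documentation before relying on it. The block only builds the URL and performs no network access.

```python
from urllib.parse import quote

# Assumed base URL of the UCSC REST API "getData/track" endpoint.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_query_url(genome, track, chrom, start=None, end=None):
    """Build a getData/track URL (hypothetical helper, no request is sent)."""
    params = [("genome", genome), ("track", track), ("chrom", chrom)]
    # Only add a range when both ends are given.
    if start is not None and end is not None:
        params += [("start", start), ("end", end)]
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params)
    return f"{API_BASE}?{query}"


url = track_query_url("hg38", "hgsvc2Sv", "chr21", 0, 100000)
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen(url)`; the response is expected to be JSON.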

The bigBed is available from our download server as hgsvc2.bb. Example: bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hgsvc2.bb -chrom=chr21 -start=0 -end=100000000 stdout.
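When the same extraction is scripted, the command line above can be assembled as an argument list, as in this Python sketch. The helper name `bigbed_to_bed_cmd` is hypothetical; running the command assumes the `bigBedToBed` utility is installed and on the `PATH`, so the call itself is left as a comment.

```python
import shlex

BIGBED_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hgsvc2.bb"


def bigbed_to_bed_cmd(url, chrom, start, end, out="stdout"):
    """Return the bigBedToBed argument list for one region (hypothetical helper)."""
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        out,  # "stdout" writes BED lines to standard output
    ]


cmd = bigbed_to_bed_cmd(BIGBED_URL, "chr21", 0, 100000000)
command_line = shlex.join(cmd)  # shell-ready string form of the same command

# To run it (requires bigBedToBed to be installed):
# import subprocess
# bed_text = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

Passing a list rather than a single string to `subprocess.run` avoids shell quoting issues if the region or output path ever contains unusual characters.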

The original annotation tables and VCFs are available from the HGSVC2 v2.0 integrated callset on the IGSR FTP site.

Credits

Thanks to the Human Genome Structural Variation Consortium (HGSVC) and the 1000 Genomes Project for releasing this dataset. Later HGSVC releases are also available as UCSC tracks: HGSVC3 65 SVs.

References

Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, Serra Mari R et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science. 2021 Apr 2;372(6537). PMID: 33632895; PMC: PMC8026704