9a11061ca6b40fe16bdfd09b1af53192f6c7c85b
max
Tue Apr 21 08:13:02 2026 -0700
lrSv: add HTML doc pages and conversion scripts for recent subtracks, + hs1 HGSVC3
Subtrack stanzas for these SV callsets landed in earlier commits but
the conversion scripts and per-track HTML description pages were
never added; trackDb therefore had no doc to serve. This commit
catches up.
Docs (new):
- colorsDbSv.html CoLoRSdb 1,427-sample long-read SVs
- gustafsonSv.html 1KG ONT 100 (Gustafson 2024, PMID 39358015)
- hgsvc2Sv.html HGSVC2 (Ebert 2021, PMID 33632895)
- hprc2Sv.html HPRC release-2 pangenome SVs (no PMID yet;
see humanpangenome.org/hprc-data-release-2/)
- onekg3202Sr.html 1KG 3202 Illumina SHORT-READ GATK-SV
(Byrska-Bishop 2022, PMID 36055201)
Scripts (new):
- lrSvGustafson.as / lrSvGustafsonVcfToBed.py
- lrSvHgsvc2.as / lrSvHgsvc2TsvToBed.py (merges insdel + inv tables)
- lrSvHprc2.as / lrSvHprc2VcfToBed.py (streams wave-decomposed VCF,
explodes multi-allelic rows,
filters to SV-sized or INV)
- lrSv1kg3202Sr.as / lrSv1kg3202SrVcfToBed.py
HGSVC3 also on hs1:
- hgsvc3Sv.html: note that the hs1 build is native (not lifted):
HGSVC3 aligned all assemblies to both GRCh38 and T2T-CHM13 and
released separate annotation tables per reference. Added the
T2T-CHM13 source URL to the Methods section and the hs1 hgsvc3.bb
download link to Data Access.
- doc/hs1/lrSv.txt (new): hs1-specific wget + build steps; refers
back to doc/hg38/lrSv.txt for the full process.
refs #36258
Co-Authored-By: Claude Opus 4.7 (1M context)
+Description
+
+This track shows structural variants (SVs) from the second phase of the
+Human Genome Structural Variation Consortium (HGSVC2). The callset is
+derived from 32 haplotype-resolved diploid genomes (64 phased haplotypes)
+spanning five 1000 Genomes superpopulations (African, Admixed American,
+East Asian, European, South Asian). Each genome was sequenced with
+PacBio long reads (continuous long-read and HiFi) and phased with
+Strand-seq, enabling comprehensive characterization of SVs that short-read
+approaches miss.
+
+The track merges the two SV annotation tables from the HGSVC2 v2.0
+integrated callset freeze 4: 111,330 insertions/deletions and 416
+inversions, for a total of 111,746 SVs. Each row is a site-level variant
+with per-site allele count, carrier haplotypes, population-scale allele
+frequencies (imputed from the phased callset back into 1000 Genomes,
+insertions and deletions only) and structural annotations.
+
+Display Conventions and Configuration
+
+Items are colored by SV type:
+
+Insertions are placed at the insertion site with a width of 1 bp; deletions
+and inversions span the affected reference interval. Filters are available
+for SV type, SV length, carrier-haplotype count, distinct sample count,
+whether the site falls in a Tandem Repeats Finder region, and the fraction
+of the variant overlapping segmental duplications.
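The placement rule above can be illustrated with a minimal sketch. The function name `sv_to_bed_interval` and its arguments are hypothetical and are not taken from lrSvHgsvc2TsvToBed.py; the sketch also assumes the start coordinate is already 0-based, as BED requires.

```python
def sv_to_bed_interval(sv_type, start, sv_len):
    """Return (chromStart, chromEnd) for one SV in BED coordinates.

    Hypothetical helper: assumes `start` is already 0-based and
    `sv_len` is the SV length in bp.
    """
    if sv_len < 0:
        raise ValueError("sv_len must be non-negative")
    if sv_type == "INS":
        # The inserted sequence is absent from the reference, so the item
        # is drawn as a 1 bp feature at the insertion site.
        return start, start + 1
    if sv_type in ("DEL", "INV"):
        # Deletions and inversions cover the affected reference interval.
        return start, start + sv_len
    raise ValueError(f"unexpected SV type: {sv_type!r}")
```

With this rule, an insertion keeps its true length only as an attribute (for example for the SV length filter), while its drawn width stays at 1 bp.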
+
+The detail page shows, where available:
+
+HGSVC2 generated phased haplotype-resolved de novo assemblies for 32
+diploid samples across five 1000 Genomes superpopulations. Assemblies
+were built from PacBio continuous long reads and HiFi reads and phased
+with Strand-seq. Structural variants were discovered from each haplotype
+assembly using PAV and validated with multiple orthogonal callers
+(including PBSV, Bionano, DeepVariant, PAV-LRA, and others recorded in
+per-site validation columns). The final SV set was merged to produce the
+integrated callset used here.
+
+Population-scale allele frequencies (POP_*_AF) were derived by imputing
+the HGSVC2 SVs back into the full 1000 Genomes short-read cohort. These
+fields are only available for insertions and deletions.
+
+Two tables were merged for display here:
+variants_freeze4_sv_insdel.tsv.gz (DEL + INS, 111,330 records) and
+variants_freeze4_sv_inv.tsv.gz (INV, 416 records). Type-specific
+columns (POP_*_AF for insdel, RGN_REF_INNER for inversions) are shown as
+empty on the detail page when they do not apply.
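One way such a merge could work is sketched below. This is an illustration, not the logic of lrSvHgsvc2TsvToBed.py: the function name `merge_sv_tables`, the sample rows, and the column `POP_AFR_AF` (one guess at a concrete `POP_*_AF` name) are assumptions. The sketch takes the union of both headers and fills columns that do not apply to a row with an empty string.

```python
import csv
import io


def merge_sv_tables(insdel_tsv, inv_tsv):
    """Merge two tab-separated tables; return (columns, rows).

    Hypothetical helper: columns present in only one table are filled
    with "" for rows coming from the other table.
    """
    columns, rows = [], []
    for text in (insdel_tsv, inv_tsv):
        reader = csv.DictReader(io.StringIO(text), delimiter="\t")
        # Union of header names, keeping first-seen order.
        for name in reader.fieldnames or []:
            if name not in columns:
                columns.append(name)
        rows.extend(reader)
    merged = [{name: row.get(name, "") for name in columns} for row in rows]
    return columns, merged


# Illustrative inputs only; real tables have many more columns.
INSDEL = "ID\tSVTYPE\tPOP_AFR_AF\nsv1\tINS\t0.12\nsv2\tDEL\t0.05\n"
INV = "ID\tSVTYPE\tRGN_REF_INNER\nsv3\tINV\tchr1:100-200\n"
COLUMNS, MERGED = merge_sv_tables(INSDEL, INV)
```

Keeping a single schema with empty cells means one autoSql (.as) definition can describe every row of the resulting bigBed.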
+
+The data can be explored interactively in table format with the
+Table Browser or the
+Data Integrator, and accessed
+programmatically through our API,
+track=hgsvc2Sv.
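As a rough sketch of programmatic access, the helper below builds a request URL for this track. The page gives only the track name; the endpoint `https://api.genome.ucsc.edu/getData/track` and the `genome`, `chrom`, `start`, and `end` parameters are assumptions based on the public UCSC REST API, and the function name `build_track_url` is hypothetical.

```python
def build_track_url(track, genome, chrom, start, end):
    """Build a UCSC REST API URL for one region of a track.

    Assumes the public getData/track endpoint; parameters are joined
    with semicolons, as in the UCSC API examples.
    """
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    query = ";".join(f"{key}={value}" for key, value in params)
    return f"https://api.genome.ucsc.edu/getData/track?{query}"


# Example region on chr21; the coordinates are illustrative.
URL = build_track_url("hgsvc2Sv", "hg38", "chr21", 0, 100000000)
```

The resulting URL can be fetched with `urllib.request.urlopen` and decoded with `json.load`; the exact layout of the JSON response should be checked against the API documentation rather than inferred from this sketch.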
+
+The bigBed is available from our download server as hgsvc2.bb. Example:
+bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hgsvc2.bb -chrom=chr21 -start=0 -end=100000000 stdout
+
+The original annotation tables and VCFs are available from the
+HGSVC2 v2.0 integrated callset on the IGSR FTP site.
+
+Thanks to the Human Genome Structural Variation Consortium (HGSVC) and
+the 1000 Genomes Project for releasing this dataset. Later HGSVC releases
+are also available as UCSC tracks:
+HGSVC3 65 SVs.
+
+Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W,
+Serra Mari R et al.
+Haplotype-resolved diverse human genomes and integrated analysis of structural variation.
+Science. 2021 Apr 2;372(6537).
+PMID: 33632895; PMC: PMC8026704
+