9a11061ca6b40fe16bdfd09b1af53192f6c7c85b max Tue Apr 21 08:13:02 2026 -0700

lrSv: add HTML doc pages and conversion scripts for recent subtracks, + hs1 HGSVC3

Subtrack stanzas for these SV callsets landed in earlier commits but the
conversion scripts and per-track HTML description pages were never added;
trackDb therefore had no doc to serve. This commit catches up.

Docs (new):
- colorsDbSv.html CoLoRSdb 1,427-sample long-read SVs
- gustafsonSv.html 1KG ONT 100 (Gustafson 2024, PMID 39358015)
- hgsvc2Sv.html HGSVC2 (Ebert 2021, PMID 33632895)
- hprc2Sv.html HPRC release-2 pangenome SVs (no PMID yet; see
  humanpangenome.org/hprc-data-release-2/)
- onekg3202Sr.html 1KG 3202 Illumina SHORT-READ GATK-SV
  (Byrska-Bishop 2022, PMID 36055201)

Scripts (new):
- lrSvGustafson.as / lrSvGustafsonVcfToBed.py
- lrSvHgsvc2.as / lrSvHgsvc2TsvToBed.py (merges insdel + inv tables)
- lrSvHprc2.as / lrSvHprc2VcfToBed.py (streams wave-decomposed VCF,
  explodes multi-allelic rows, filters to SV-sized or INV)
- lrSv1kg3202Sr.as / lrSv1kg3202SrVcfToBed.py

HGSVC3 also on hs1:
- hgsvc3Sv.html: note that the hs1 build is native (not lifted): HGSVC3
  aligned all assemblies to both GRCh38 and T2T-CHM13 and released separate
  annotation tables per reference. Added the T2T-CHM13 source URL to the
  Methods section and the hs1 hgsvc3.bb download link to Data Access.
- doc/hs1/lrSv.txt (new): hs1-specific wget + build steps; refers back to
  doc/hg38/lrSv.txt for the full process.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/hgsvc2Sv.html src/hg/makeDb/trackDb/human/hgsvc2Sv.html
new file mode 100644
index 00000000000..500ba0a04f9
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hgsvc2Sv.html
@@ -0,0 +1,122 @@
+

Description

+

+This track shows structural variants (SVs) from the second phase of the +Human Genome Structural Variation Consortium (HGSVC2). The callset is +derived from 32 haplotype-resolved diploid genomes (64 phased haplotypes) +spanning five 1000 Genomes superpopulations (African, Admixed American, +East Asian, European, South Asian). Each genome was sequenced with +PacBio long reads (continuous long-read and HiFi) and phased with +Strand-seq, enabling comprehensive characterization of SVs that short-read +approaches miss. +

+

+The track merges the two SV annotation tables from the HGSVC2 v2.0 +integrated callset freeze 4: 111,330 insertions/deletions and 416 +inversions, for a total of 111,746 SVs. Each row is a site-level variant +with per-site allele count, carrier haplotypes, population-scale allele +frequencies (imputed from the phased callset back into 1000 Genomes, +insertions and deletions only) and structural annotations. +

+ +

Display Conventions and Configuration

+

+Items are colored by SV type: +

+

+

+Insertions are placed at the insertion site with a width of 1 bp; deletions +and inversions span the affected reference interval. Filters are available +for SV type, SV length, carrier-haplotype count, distinct sample count, +whether the site falls in a Tandem Repeats Finder region, and the fraction +of the variant overlapping segmental duplications. +
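The placement rule above can be sketched in a few lines of Python. This is only an illustration, not the code in lrSvHgsvc2TsvToBed.py: the function name `bed_interval` is hypothetical, and it assumes the input positions are already 0-based, half-open reference coordinates, which the source tables may not use.

```python
def bed_interval(sv_type: str, pos: int, end: int) -> tuple[int, int]:
    """Return (chromStart, chromEnd) for display.

    Assumption: pos/end are already 0-based, half-open reference
    coordinates. The real HGSVC2 tables may need conversion first.
    """
    if sv_type == "INS":
        # Insertions have no reference span, so draw a 1 bp item
        # at the insertion site.
        return pos, pos + 1
    # Deletions and inversions cover the affected reference interval.
    return pos, end


if __name__ == "__main__":
    print(bed_interval("INS", 100, 100))
    print(bed_interval("DEL", 100, 250))
```

Keeping insertions at a fixed 1 bp width means the drawn size of an item does not reflect insertion length, so the SV length filter is the way to select large insertions.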

+

+The detail page shows, where available: +

+

+ +

Methods

+

+HGSVC2 generated phased haplotype-resolved de novo assemblies for 32 +diploid samples across five 1000 Genomes superpopulations. Assemblies +were built from PacBio continuous long reads and HiFi reads and phased +with Strand-seq. Structural variants were discovered from each haplotype +assembly using PAV and validated with multiple orthogonal callers +(including PBSV, Bionano, DeepVariant, PAV-LRA, and others recorded in +per-site validation columns). The final SV set was merged to produce the +integrated callset used here. +

+

+Population-scale allele frequencies (POP_*_AF) were derived by imputing +the HGSVC2 SVs back into the full 1000 Genomes short-read cohort. These +fields are only available for insertions and deletions. +

+

+Two tables were merged for display here: +variants_freeze4_sv_insdel.tsv.gz (DEL + INS, 111,330 records) and +variants_freeze4_sv_inv.tsv.gz (INV, 416 records). Type-specific +columns (POP_*_AF for insdel, RGN_REF_INNER for inversions) are shown as +empty on the detail page when they do not apply. +
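As a rough sketch of the merge described above, the two tables can be combined over the union of their columns, with type-specific fields left empty where they do not apply. The function name `merge_sv_tables` and the column `POP_ALL_AF` are hypothetical stand-ins (the page only names the `POP_*_AF` pattern); this is not the code in lrSvHgsvc2TsvToBed.py.

```python
def merge_sv_tables(insdel_rows, inv_rows):
    """Merge two lists of row dicts over the union of their columns.

    Columns missing from a row (e.g. allele-frequency fields on an
    inversion) are filled with an empty string.
    """
    all_rows = [*insdel_rows, *inv_rows]

    # Build the column union, preserving first-seen order.
    columns = []
    for row in all_rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    merged = [{name: row.get(name, "") for name in columns} for row in all_rows]
    return columns, merged


if __name__ == "__main__":
    # Toy rows; column names other than RGN_REF_INNER are illustrative.
    insdel = [{"ID": "sv1", "SVTYPE": "DEL", "POP_ALL_AF": "0.12"}]
    inv = [{"ID": "sv2", "SVTYPE": "INV", "RGN_REF_INNER": "chr1:100-200"}]
    columns, merged = merge_sv_tables(insdel, inv)
    print(columns)
    print(merged)
```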

+ +

Data Access

+

+The data can be explored interactively in table format with the +Table Browser or the +Data Integrator, and accessed +programmatically through our API, +track=hgsvc2Sv. +
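For programmatic access, a minimal Python sketch along these lines may help. It assumes the public UCSC REST endpoint `https://api.genome.ucsc.edu/getData/track` with semicolon-separated arguments; the helper names `track_url` and `fetch_track` are hypothetical, and the layout of the returned JSON should be checked against the API documentation rather than taken from this example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed endpoint of the UCSC Genome Browser REST API.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track URL for one genomic region."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    query = ";".join(f"{key}={quote(str(value))}" for key, value in params.items())
    return f"{API_BASE}?{query}"


def fetch_track(genome: str, track: str, chrom: str, start: int, end: int) -> dict:
    """Fetch the region and return the parsed JSON response (needs network)."""
    with urlopen(track_url(genome, track, chrom, start, end)) as response:
        return json.load(response)


if __name__ == "__main__":
    print(track_url("hg38", "hgsvc2Sv", "chr21", 0, 100000))
```

Restricting requests to a chromosome and a start/end window keeps responses small; whole-genome extraction is better served by the bigBed file.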

+

+The bigBed is available from +our +download server as hgsvc2.bb. Example: +bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hgsvc2.bb -chrom=chr21 -start=0 -end=100000000 stdout. +

+

+The original annotation tables and VCFs are available from the + +HGSVC2 v2.0 integrated callset on the IGSR FTP site. +

+ +

Credits

+

+Thanks to the Human Genome Structural Variation Consortium (HGSVC) and +the 1000 Genomes Project for releasing this dataset. Later HGSVC releases +are also available as UCSC tracks: +HGSVC3 65 SVs. +

+ +

References

+ + +

+Ebert P, Audano PA, Zhu Q, Rodriguez-Martin B, Porubsky D, Bonder MJ, Sulovari A, Ebler J, Zhou W, +Serra Mari R et al. + +Haplotype-resolved diverse human genomes and integrated analysis of structural variation. +Science. 2021 Apr 2;372(6537). +PMID: 33632895; PMC: PMC8026704 +

+