9a11061ca6b40fe16bdfd09b1af53192f6c7c85b max Tue Apr 21 08:13:02 2026 -0700

lrSv: add HTML doc pages and conversion scripts for recent subtracks, + hs1 HGSVC3

Subtrack stanzas for these SV callsets landed in earlier commits but the
conversion scripts and per-track HTML description pages were never added;
trackDb therefore had no doc to serve. This commit catches up.

Docs (new):
- colorsDbSv.html    CoLoRSdb 1,427-sample long-read SVs
- gustafsonSv.html   1KG ONT 100 (Gustafson 2024, PMID 39358015)
- hgsvc2Sv.html      HGSVC2 (Ebert 2021, PMID 33632895)
- hprc2Sv.html       HPRC release-2 pangenome SVs (no PMID yet; see
                     humanpangenome.org/hprc-data-release-2/)
- onekg3202Sr.html   1KG 3202 Illumina SHORT-READ GATK-SV
                     (Byrska-Bishop 2022, PMID 36055201)

Scripts (new):
- lrSvGustafson.as / lrSvGustafsonVcfToBed.py
- lrSvHgsvc2.as / lrSvHgsvc2TsvToBed.py (merges insdel + inv tables)
- lrSvHprc2.as / lrSvHprc2VcfToBed.py (streams wave-decomposed VCF,
  explodes multi-allelic rows, filters to SV-sized or INV)
- lrSv1kg3202Sr.as / lrSv1kg3202SrVcfToBed.py

HGSVC3 also on hs1:
- hgsvc3Sv.html: note that the hs1 build is native (not lifted): HGSVC3
  aligned all assemblies to both GRCh38 and T2T-CHM13 and released
  separate annotation tables per reference. Added the T2T-CHM13 source
  URL to the Methods section and the hs1 hgsvc3.bb download link to
  Data Access.
- doc/hs1/lrSv.txt (new): hs1-specific wget + build steps; refers back
  to doc/hg38/lrSv.txt for the full process.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/hprc2Sv.html src/hg/makeDb/trackDb/human/hprc2Sv.html
new file mode 100644
index 00000000000..5c3e89e4d93
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hprc2Sv.html
@@ -0,0 +1,96 @@
+

Description

+

+This track shows structural variants (SVs) derived from the Human Pangenome
+Reference Consortium (HPRC) release-2 pangenome graph. The graph was built
+with minigraph-cactus from PacBio HiFi haplotype-resolved assemblies of 233
+samples (including T2T-CHM13 and the diverse 1000 Genomes Project sample
+set), aligned to the GRCh38 reference path. Variants were extracted from
+the graph with vg deconstruct and decomposed into atomic alleles
+with vcfwave (WFA2-lib).
+

+

+The track contains 1,483,114 alleles that are either SV-sized (length
+≥ 50 bp) or flagged as inversions, split by type: 1,106,190 insertions,
+192,597 deletions, 178,178 complex alleles and 6,149 inversions. Each row
+carries the allele count, allele frequency, number of samples with data
+and the snarl-nesting level of the variant in the pangenome decomposition
+tree.
+

+ +

Display Conventions and Configuration

+

+Items are colored by SV type: +

+

+

+Insertions are placed at the insertion site with a width of 1 bp; deletions,
+complex alleles and inversions span the affected reference interval.
+Filters are available for SV type, SV length, allele frequency and snarl
+level (0 = top-level bubble; higher values are nested within parent
+bubbles).
+
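The placement rule above can be illustrated with a minimal sketch. It is not the code in lrSvHprc2VcfToBed.py: the function name `bed_interval` is hypothetical, and the sketch assumes a 1-based VCF POS converted to 0-based, half-open BED coordinates, with insertions anchored at the POS base.

```python
def bed_interval(pos, ref_len, sv_type):
    """Return a 0-based, half-open (start, end) pair for one allele.

    pos is the 1-based VCF POS; ref_len is the length of the REF allele.
    Anchoring the 1 bp insertion footprint at POS is an assumption.
    """
    start = pos - 1
    if sv_type == "INS":
        # Insertions get a fixed 1 bp footprint at the insertion site.
        return start, start + 1
    # Deletions, complex alleles and inversions span the reference interval.
    return start, start + max(ref_len, 1)
```

With these assumptions, an insertion at POS 1000 is drawn as a single base, while a deletion whose REF allele is 301 bp long covers the whole deleted interval.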

+ +

Methods

+

+The HPRC v2.0 minigraph-cactus pangenome was downloaded as
+hprc-v2.0-mc-grch38.sv.gfa.gz (the graph) and
+hprc-v2.0-mc-grch38.wave.vcf.gz (the corresponding
+wave-decomposed VCF) from the HPRC S3 release bucket. The VCF is the
+result of running vg deconstruct on the graph with GRCh38 as the
+reference path and then vcfwave / WFA2-lib to split complex
+multi-allelic records into atomic alleles with per-allele TYPE and LEN
+fields.
+

+

+For display here, the wave VCF was streamed and each ALT was emitted as
+its own BED row. Alleles were retained if their absolute length was
+≥ 50 bp or if the record carried the INV flag (inversions may
+be shorter than 50 bp). Allele counts, frequencies, and sample counts are
+taken directly from the per-allele INFO fields.
+
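The retention rule described above can be sketched as follows. This is an illustration only, not the contents of lrSvHprc2VcfToBed.py: the function names are hypothetical, and the INFO keys `TYPE`, `LEN`, `AC` and `AF` and the `INV` flag are assumed to be comma-separated per-ALT values as in a typical vcfwave output.

```python
def parse_info(info):
    """Parse a VCF INFO string into a dict; bare flags map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def explode_sv_alleles(vcf_line, min_len=50):
    """Yield one dict per ALT that is SV-sized or on an INV-flagged record."""
    fields = vcf_line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    alts = fields[4].split(",")
    info = parse_info(fields[7])
    is_inv = "INV" in info  # assumed to be a record-level flag

    def per_allele(key):
        # Return one value per ALT, or None placeholders if the key is unusable.
        raw = info.get(key)
        values = raw.split(",") if isinstance(raw, str) else []
        return values if len(values) == len(alts) else [None] * len(alts)

    rows = zip(alts, per_allele("TYPE"), per_allele("LEN"),
               per_allele("AC"), per_allele("AF"))
    for alt, sv_type, length, ac, af in rows:
        if length and length.lstrip("-").isdigit():
            sv_len = abs(int(length))
        else:
            # Fall back to the REF/ALT length difference when LEN is missing.
            sv_len = abs(len(alt) - len(ref))
        if sv_len >= min_len or is_inv:
            yield {"chrom": chrom, "pos": pos, "type": sv_type,
                   "len": sv_len, "ac": ac, "af": af}
```

Because each line is handled independently, the function can be applied to the lines of a gzip stream without holding the whole VCF in memory.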

+

+The list of assemblies underlying the pangenome is documented at
+human-pangenomics/hprc_intermediate_assembly
+alignments_v2.0.csv.
+

+ +

Data Access

+

+The data can be explored interactively in table format with the
+Table Browser or the
+Data Integrator, and accessed
+programmatically through our API,
+track=hprc2Sv.
+
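As a hedged illustration of programmatic access, the sketch below builds a request URL for the UCSC REST API `getData/track` endpoint with `track=hprc2Sv`. The helper name `track_url` and the example region are assumptions made for illustration; the sketch only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(chrom, start, end, genome="hg38", track="hprc2Sv"):
    """Build a getData/track URL for one region of the hprc2Sv track."""
    query = urlencode({
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    })
    return f"{API_BASE}?{query}"


print(track_url("chr21", 0, 100000))
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen`, and the JSON response decoded with `json.load`.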

+

+The bigBed is available from
+our
+download server as hprc2.bb. Example:
+bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hprc2.bb -chrom=chr21 -start=0 -end=100000000 stdout.
+

+

+The original pangenome graph and the wave-decomposed VCF are available
+from the HPRC public S3 bucket, as linked from the
+HPRC
+release-2 announcement.
+

+ +

Credits

+

+Thanks to the Human Pangenome Reference Consortium for building and
+publicly releasing the release-2 minigraph-cactus pangenome.
+

+ +

References

+

+HPRC release-2 data is not yet described in a formal peer-reviewed
+publication. See the Human Pangenome Project release announcement
+for background and data-access details:
+
+HPRC data release 2.
+