src/hg/makeDb/trackDb/human/hprc2Sv.html 9a11061ca6b40fe16bdfd09b1af53192f6c7c85b

9a11061ca6b40fe16bdfd09b1af53192f6c7c85b
max
  Tue Apr 21 08:13:02 2026 -0700
lrSv: add HTML doc pages and conversion scripts for recent subtracks, + hs1 HGSVC3

Subtrack stanzas for these SV callsets landed in earlier commits but
the conversion scripts and per-track HTML description pages were
never added; trackDb therefore had no doc to serve. This commit
catches up.

Docs (new):
- colorsDbSv.html     CoLoRSdb 1,427-sample long-read SVs
- gustafsonSv.html    1KG ONT 100 (Gustafson 2024, PMID 39358015)
- hgsvc2Sv.html       HGSVC2 (Ebert 2021, PMID 33632895)
- hprc2Sv.html        HPRC release-2 pangenome SVs (no PMID yet;
                      see humanpangenome.org/hprc-data-release-2/)
- onekg3202Sr.html    1KG 3202 Illumina SHORT-READ GATK-SV
                      (Byrska-Bishop 2022, PMID 36055201)

Scripts (new):
- lrSvGustafson.as / lrSvGustafsonVcfToBed.py
- lrSvHgsvc2.as / lrSvHgsvc2TsvToBed.py  (merges insdel + inv tables)
- lrSvHprc2.as / lrSvHprc2VcfToBed.py    (streams wave-decomposed VCF,
                                          explodes multi-allelic rows,
                                          filters to SV-sized or INV)
- lrSv1kg3202Sr.as / lrSv1kg3202SrVcfToBed.py
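
The three steps listed for lrSvHprc2VcfToBed.py (stream the wave-decomposed
VCF, explode multi-allelic rows, keep SV-sized or INV alleles) can be
sketched in Python roughly as below. This is a hypothetical reconstruction,
not the committed script: the INFO keys (LEN, TYPE, AC, AF, NS, LV, INV),
the vcfwave TYPE values ("ins", "del", ...) and the BED9+6 column order are
assumptions. Only the 50 bp threshold, the INV exception, the 1 bp
insertion footprint and the item colors are taken from hprc2Sv.html.

```python
import gzip
import sys
from typing import Dict, Iterable, Iterator, List, Union

MIN_SV_LEN = 50  # SV-size threshold stated in hprc2Sv.html

# itemRgb per SV class, copied from the Display Conventions of hprc2Sv.html
COLORS = {
    "INS": "0,0,200",
    "DEL": "200,0,0",
    "COMPLEX": "140,0,200",
    "INV": "230,140,0",
}

InfoValue = Union[str, bool]


def parse_info(column: str) -> Dict[str, InfoValue]:
    """Split a VCF INFO column into a dict; flag keys (no '=') map to True."""
    info: Dict[str, InfoValue] = {}
    for item in column.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def per_allele(info: Dict[str, InfoValue], key: str, n_alts: int) -> List[str]:
    """Return a Number=A INFO field as a list, padded with '.' if short or missing."""
    raw = info.get(key)
    values = raw.split(",") if isinstance(raw, str) else []
    return values + ["."] * (n_alts - len(values))


def sv_class(wave_type: str, is_inv: bool) -> str:
    """Map an (assumed) vcfwave TYPE value onto the track's four SV classes."""
    if is_inv:
        return "INV"
    return {"ins": "INS", "del": "DEL"}.get(wave_type, "COMPLEX")


def vcf_to_bed(lines: Iterable[str]) -> Iterator[str]:
    """Explode each VCF record into one BED9+6 row per retained ALT allele."""
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = cols[0], int(cols[1]), cols[3], cols[4].split(",")
        info = parse_info(cols[7])
        is_inv = "INV" in info  # record-level inversion marker (assumed key name)
        lens = per_allele(info, "LEN", len(alts))
        types = per_allele(info, "TYPE", len(alts))
        acs = per_allele(info, "AC", len(alts))
        afs = per_allele(info, "AF", len(alts))
        for i, alt in enumerate(alts):
            try:
                length = abs(int(lens[i]))
            except ValueError:  # LEN missing: fall back to the REF/ALT length difference
                length = abs(len(alt) - len(ref))
            if length < MIN_SV_LEN and not is_inv:
                continue  # neither SV-sized nor an inversion
            kind = sv_class(types[i], is_inv)
            start = pos - 1  # VCF POS is 1-based; BED is 0-based half-open
            # Insertions get a 1 bp footprint; other classes span REF (incl. any padding base).
            end = start + 1 if kind == "INS" else start + len(ref)
            yield "\t".join([
                chrom, str(start), str(end), kind, "0", ".", str(start), str(end),
                COLORS[kind], kind, str(length), acs[i], afs[i],
                str(info.get("NS", ".")), str(info.get("LV", ".")),
            ])


def main(path: str) -> None:
    """Stream a (b)gzipped VCF line by line and write unsorted BED rows to stdout."""
    with gzip.open(path, "rt") as vcf:
        for row in vcf_to_bed(vcf):
            sys.stdout.write(row + "\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Usage, assuming the sketch is saved under the hypothetical name
hprc2_sketch.py:

    python3 hprc2_sketch.py hprc-v2.0-mc-grch38.wave.vcf.gz | sort -k1,1 -k2,2n > hprc2.bed

Streaming keeps memory flat on the large VCF; the sort is there because
bedToBigBed expects input sorted with sort -k1,1 -k2,2n, and it also needs
an .as file describing whatever extra columns the real converter emits.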

HGSVC3 also on hs1:
- hgsvc3Sv.html: note that the hs1 build is native (not lifted):
  HGSVC3 aligned all assemblies to both GRCh38 and T2T-CHM13 and
  released separate annotation tables per reference. Added the
  T2T-CHM13 source URL to the Methods section and the hs1 hgsvc3.bb
  download link to Data Access.
- doc/hs1/lrSv.txt (new): hs1-specific wget + build steps; refers
  back to doc/hg38/lrSv.txt for the full process.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/hprc2Sv.html src/hg/makeDb/trackDb/human/hprc2Sv.html
new file mode 100644
index 00000000000..5c3e89e4d93
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hprc2Sv.html
@@ -0,0 +1,96 @@
+<h2>Description</h2>
+<p>
+This track shows structural variants (SVs) derived from the Human Pangenome
+Reference Consortium (HPRC) release-2 pangenome graph. The graph was built
+with minigraph-cactus from PacBio HiFi haplotype-resolved assemblies of 233
+samples (including T2T-CHM13 and the diverse 1000 Genomes Project sample
+set), with GRCh38 as the reference path. Variants were extracted from
+the graph with <tt>vg deconstruct</tt> and decomposed into atomic alleles
+with <tt>vcfwave</tt> (WFA2-lib).
+</p>
+<p>
+The track contains 1,483,114 alleles that are SV-sized (length &ge; 50 bp)
+or flagged as inversions, split by type: 1,106,190 insertions, 192,597
+deletions, 178,178 complex alleles and 6,149 inversions. Each row carries
+the allele count, allele frequency, number of samples with data and the
+snarl-nesting level of the variant in the pangenome graph.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+Items are colored by SV type:
+</p>
+<ul>
+<li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
+<li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
+<li><span style="color: rgb(140,0,200);">Complex alleles (COMPLEX)</span> - purple</li>
+<li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
+</ul>
+<p>
+Insertions are placed at the insertion site with a width of 1 bp; deletions,
+complex alleles and inversions span the affected reference interval.
+Filters are available for SV type, SV length, allele frequency and snarl
+level (0 = top-level bubble; higher values are nested within parent
+bubbles).
+</p>
+
+<h2>Methods</h2>
+<p>
+The HPRC v2.0 minigraph-cactus pangenome graph
+(<tt>hprc-v2.0-mc-grch38.sv.gfa.gz</tt>) and the corresponding
+wave-decomposed VCF (<tt>hprc-v2.0-mc-grch38.wave.vcf.gz</tt>) were
+downloaded from the HPRC S3 release bucket. The VCF is the result of
+running <tt>vg deconstruct</tt> on the graph with GRCh38 as the reference
+path, followed by <tt>vcfwave</tt> (which uses WFA2-lib) to split
+complex multi-allelic records into atomic alleles with per-allele TYPE
+and LEN fields.
+</p>
+<p>
+For display here, the wave VCF was streamed and each ALT allele was
+written as its own BED row. Alleles were retained if their absolute
+length was &ge; 50 bp or if the record carried the <tt>INV</tt> flag, so
+inversions shorter than 50 bp are also kept. Allele counts, frequencies,
+and sample counts were copied directly from the record's INFO fields.
+</p>
+<p>
+The list of assemblies underlying the pangenome is documented at
+<a href="https://github.com/human-pangenomics/hprc_intermediate_assembly/blob/main/data_tables/pangenomes/alignments_v2.0.csv"
+   target="_blank">human-pangenomics/hprc_intermediate_assembly
+<tt>alignments_v2.0.csv</tt></a>.
+</p>
+
+<h2>Data Access</h2>
+<p>
+The data can be explored interactively in table format with the
+<a href="../cgi-bin/hgTables">Table Browser</a> or the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed
+programmatically through our <a href="https://api.genome.ucsc.edu">API</a>,
+track=<i>hprc2Sv</i>.
+</p>
+<p>
+The bigBed is available from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
+download server</a> as <tt>hprc2.bb</tt>. For example, to extract all
+chr21 items: <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hprc2.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>.
+</p>
+<p>
+The original pangenome graph and the wave-decomposed VCF are available
+from the HPRC public S3 bucket, as linked from the
+<a href="https://humanpangenome.org/hprc-data-release-2/" target="_blank">HPRC
+release-2 announcement</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to the Human Pangenome Reference Consortium for building and
+publicly releasing the release-2 minigraph-cactus pangenome.
+</p>
+
+<h2>References</h2>
+<p>
+HPRC release-2 data have not yet been described in a formal peer-reviewed
+publication. See the HPRC release announcement for background and
+data-access details:
+<a href="https://humanpangenome.org/hprc-data-release-2/" target="_blank">
+HPRC data release 2</a>.
+</p>