9a11061ca6b40fe16bdfd09b1af53192f6c7c85b
max
  Tue Apr 21 08:13:02 2026 -0700
lrSv: add HTML doc pages and conversion scripts for recent subtracks, + hs1 HGSVC3

Subtrack stanzas for these SV callsets landed in earlier commits but
the conversion scripts and per-track HTML description pages were
never added; trackDb therefore had no doc to serve. This commit
catches up.

Docs (new):
- colorsDbSv.html     CoLoRSdb 1,427-sample long-read SVs
- gustafsonSv.html    1KG ONT 100 (Gustafson 2024, PMID 39358015)
- hgsvc2Sv.html       HGSVC2 (Ebert 2021, PMID 33632895)
- hprc2Sv.html        HPRC release-2 pangenome SVs (no PMID yet;
                      see humanpangenome.org/hprc-data-release-2/)
- onekg3202Sr.html    1KG 3202 Illumina SHORT-READ GATK-SV
                      (Byrska-Bishop 2022, PMID 36055201)

Scripts (new):
- lrSvGustafson.as / lrSvGustafsonVcfToBed.py
- lrSvHgsvc2.as / lrSvHgsvc2TsvToBed.py  (merges insdel + inv tables)
- lrSvHprc2.as / lrSvHprc2VcfToBed.py    (streams wave-decomposed VCF,
                                          explodes multi-allelic rows,
                                          filters to SV-sized or INV)
- lrSv1kg3202Sr.as / lrSv1kg3202SrVcfToBed.py

HGSVC3 also on hs1:
- hgsvc3Sv.html: note that the hs1 build is native (not lifted):
  HGSVC3 aligned all assemblies to both GRCh38 and T2T-CHM13 and
  released separate annotation tables per reference. Added the
  T2T-CHM13 source URL to the Methods section and the hs1 hgsvc3.bb
  download link to Data Access.
- doc/hs1/lrSv.txt (new): hs1-specific wget + build steps; refers
  back to doc/hg38/lrSv.txt for the full process.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/gustafsonSv.html src/hg/makeDb/trackDb/human/gustafsonSv.html
new file mode 100644
index 00000000000..65923b24b32
--- /dev/null
+++ src/hg/makeDb/trackDb/human/gustafsonSv.html
@@ -0,0 +1,96 @@
+<h2>Description</h2>
+<p>
+This track shows structural variants (SVs) from Oxford Nanopore long-read
+whole-genome sequencing of 100 individuals in the 1000 Genomes Project,
+as released by the 1000 Genomes Project ONT Sequencing Consortium and
+described in Gustafson et al. 2024. The cohort spans all five 1000
+Genomes superpopulations and 19 subpopulations. Samples were sequenced
+with ONT R9.4.1 pores at ~37x coverage with median read N50 of ~54 kb.
+</p>
+<p>
+The track contains 113,696 SVs (63,177 insertions, 49,704 deletions,
+744 inversions, 71 duplications). Each variant was called by up to five
+independent methods (three alignment-based: Sniffles2, cuteSV and SVIM;
+two assembly-based: hapdiff on Flye and on Shasta/Hapdup assemblies) and then
+merged across callers and samples with Jasmine to produce a
+cross-sample consensus catalog.
+</p>
+<p>
+This 100-sample Gustafson cohort is distinct from the Vienna
+1000-Genomes-ONT release (<a href="hgTrackUi?g=lrSv1kgOnt">1KG ONT SVs</a>),
+which uses a different pore chemistry and different callers; the two
+releases share neither samples nor calls.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+Items are colored by SV type:
+</p>
+<ul>
+<li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
+<li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
+<li><span style="color: rgb(0,160,0);">Duplications (DUP)</span> - green</li>
+<li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
+</ul>
+<p>
+Insertions are placed at the insertion site with a width of 1 bp; deletions,
+duplications and inversions span the affected reference interval. Filters
+are available for SV type, SV length and carrier-sample count. The detail
+page also shows the number of per-caller calls supporting each site
+(VARCALLS) and whether the source caller marked the breakpoints as precise.
+</p>
+
+<h2>Methods</h2>
+<p>
+Long-read whole-genome sequencing was performed on 100 samples from the
+1000 Genomes Project with ONT R9.4.1 pores, at a median coverage of ~37x and
+read N50 of ~54 kb. Reads were aligned to GRCh38 with minimap2 and, for a subset,
+with the CARD pipeline. De novo assemblies were produced with Flye and
+with Shasta/Hapdup. Per-sample structural variant calls were generated
+with five independent methods (Sniffles2, cuteSV, SVIM on alignments;
+hapdiff on Flye and on Shasta/Hapdup assemblies) and merged with
+Jasmine in two stages: first across callers within each sample
+(intra-sample) to build per-sample consensus SVs, then across all 100
+samples to produce the shared site-level callset used here.
+</p>
+
+<h2>Data Access</h2>
+<p>
+The data can be explored interactively in table format with the
+<a href="../cgi-bin/hgTables">Table Browser</a> or the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed
+programmatically through our <a href="https://api.genome.ucsc.edu">API</a>,
+track=<i>gustafsonSv</i>.
+</p>
+<p>
+The bigBed is available from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
+download server</a> as <tt>gustafson.bb</tt>. Example:
+<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/gustafson.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>.
+</p>
+<p>
+The original VCF is available from the 1000 Genomes ONT S3 bucket:
+<a href="https://s3.amazonaws.com/1000g-ont/Gustafson_etal_2024_preprint_SUPPLEMENTAL/20240423_jasmine_intrasample_noBND_custom_suppvec_alphanumeric_header_JASMINE.vcf.gz" target="_blank">
+20240423_jasmine_intrasample_noBND_custom_suppvec_alphanumeric_header_JASMINE.vcf.gz</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to Gustafson and colleagues and the 1000 Genomes Project ONT
+Sequencing Consortium for releasing this dataset.
+</p>
+
+<h2>References</h2>
+
+
+<p>
+Gustafson JA, Gibson SB, Damaraju N, Zalusky MPG, Hoekzema K, Twesigomwe D, Yang L, Snead AA,
+Richmond PA, De Coster W <em>et al</em>.
+<a href="http://genome.cshlp.org/lookup/pmidlookup?view=long&amp;pmid=39358015" target="_blank">
+High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive
+catalog of human genetic variation</a>.
+<em>Genome Res</em>. 2024 Nov 20;34(11):2061-2073.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39358015" target="_blank">39358015</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11610458/" target="_blank">PMC11610458</a>
+</p>
+