9a11061ca6b40fe16bdfd09b1af53192f6c7c85b
max
  Tue Apr 21 08:13:02 2026 -0700
lrSv: add HTML doc pages and conversion scripts for recent subtracks, + hs1 HGSVC3

Subtrack stanzas for these SV callsets landed in earlier commits, but
the conversion scripts and per-track HTML description pages were
never added; trackDb therefore had no doc to serve. This commit
catches up.

Docs (new):
- colorsDbSv.html     CoLoRSdb 1,427-sample long-read SVs
- gustafsonSv.html    1KG ONT 100 (Gustafson 2024, PMID 39358015)
- hgsvc2Sv.html       HGSVC2 (Ebert 2021, PMID 33632895)
- hprc2Sv.html        HPRC release-2 pangenome SVs (no PMID yet;
                      see humanpangenome.org/hprc-data-release-2/)
- onekg3202Sr.html    1KG 3202 Illumina SHORT-READ GATK-SV
                      (Byrska-Bishop 2022, PMID 36055201)

Scripts (new):
- lrSvGustafson.as / lrSvGustafsonVcfToBed.py
- lrSvHgsvc2.as / lrSvHgsvc2TsvToBed.py  (merges insdel + inv tables)
- lrSvHprc2.as / lrSvHprc2VcfToBed.py    (streams wave-decomposed VCF,
                                          explodes multi-allelic rows,
                                          filters to SV-sized or INV)
- lrSv1kg3202Sr.as / lrSv1kg3202SrVcfToBed.py
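
For review, the explode-and-filter step listed for lrSvHprc2VcfToBed.py
can be illustrated with the small sketch below. It is not the committed
script: the function names, the 50 bp SV cutoff (MIN_SV_LEN) and the
detection of inversions via an INFO SVTYPE=INV tag are assumptions made
for illustration only.

```python
# Hypothetical sketch of an explode-and-filter step for a wave-decomposed
# VCF; NOT the committed lrSvHprc2VcfToBed.py. Cutoff and INV test are assumed.

MIN_SV_LEN = 50  # assumed SV-size cutoff in bp


def parse_info(info):
    """Split a VCF INFO column into a dict; flag keys map to True."""
    fields = {}
    if info in ("", "."):
        return fields
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def explode_and_filter(line):
    """Yield one BED-like tuple (chrom, start, end, svType, svLen) per kept ALT.

    Multi-allelic rows are exploded into one record per ALT allele. An allele
    is kept if its length change is SV-sized or the row is tagged SVTYPE=INV
    (the INV test is an assumption about the input, not a documented format).
    """
    cols = line.rstrip("\n").split("\t")
    chrom, pos, ref, alts, info = cols[0], int(cols[1]), cols[3], cols[4], cols[7]
    is_inv = parse_info(info).get("SVTYPE") == "INV"
    start = pos - 1         # VCF POS is 1-based; BED start is 0-based
    end = start + len(ref)  # half-open end covers the REF allele
    for alt in alts.split(","):
        if alt in (".", "*"):
            continue        # skip missing and spanning-deletion alleles
        if is_inv:
            yield chrom, start, end, "INV", len(ref)
            continue
        sv_len = len(alt) - len(ref)  # > 0 insertion, < 0 deletion
        if abs(sv_len) < MIN_SV_LEN:
            continue        # too small to count as an SV
        yield chrom, start, end, ("INS" if sv_len > 0 else "DEL"), sv_len


# Example: a two-allele row keeps the 60 bp insertion and drops the 1 bp one.
row = "\t".join(["chr1", "1000", ".", "A", "A" + "T" * 60 + ",AG", ".", "PASS", "."])
for rec in explode_and_filter(row):
    print(*rec, sep="\t")
```

Exploding before filtering lets each ALT allele of a multi-allelic row be
judged on its own length, so one small allele does not hide an SV-sized one.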

HGSVC3 also on hs1:
- hgsvc3Sv.html: note that the hs1 build is native (not lifted):
  HGSVC3 aligned all assemblies to both GRCh38 and T2T-CHM13 and
  released separate annotation tables per reference. Added the
  T2T-CHM13 source URL to the Methods section and the hs1 hgsvc3.bb
  download link to Data Access.
- doc/hs1/lrSv.txt (new): hs1-specific wget + build steps; refers
  back to doc/hg38/lrSv.txt for the full process.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/hgsvc3Sv.html src/hg/makeDb/trackDb/human/hgsvc3Sv.html
index e13dc55629b..df83bb19c5a 100644
--- src/hg/makeDb/trackDb/human/hgsvc3Sv.html
+++ src/hg/makeDb/trackDb/human/hgsvc3Sv.html
@@ -1,115 +1,138 @@
 <h2>Description</h2>
 <p>
 This track shows structural variants (SVs) from the third phase of the
 Human Genome Structural Variation Consortium (HGSVC3). The callset comes
 from 65 diverse individuals across five continental groups, each sequenced
 with PacBio HiFi (~47x), Oxford Nanopore ultra-long reads (~56x) and
 complemented with Strand-seq, optical mapping, Hi-C and Iso-Seq for
 haplotype-resolved assembly. SVs were discovered from the de novo assemblies
 with PAV v2.4.0.1 and cross-validated by ten additional orthogonal callers.
 </p>
 <p>
 The track merges the two final SV annotation tables from the HGSVC3 v1.0
 release on GRCh38: 176,232 insertions/deletions and 300 inversions, for a
 total of 176,532 SVs. Each row is a site-level variant with the list of
 carrier haplotypes and additional structural annotations.
 </p>
+<p>
+The same track is also available natively on the T2T-CHM13 (hs1)
+assembly: HGSVC3 independently aligned all haplotype-resolved assemblies
+to both GRCh38 and T2T-CHM13 and released a separate set of annotation
+tables per reference. The hs1 track is built directly from the
+<a href="https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/Variant_Calls/1.0/T2T-CHM13/annotation_table/" target="_blank">
+HGSVC3 T2T-CHM13 annotation tables</a> (188,224 DEL+INS and 276 INV;
+188,500 SVs total); no liftOver is involved.
+</p>
 
 <h2>Display Conventions and Configuration</h2>
 <p>
 Items are colored by SV type:
 <ul>
 <li><span style="color: rgb(200,0,0);">Deletions (DEL)</span> - red</li>
 <li><span style="color: rgb(0,0,200);">Insertions (INS)</span> - blue</li>
 <li><span style="color: rgb(230,140,0);">Inversions (INV)</span> - orange</li>
 </ul>
 </p>
 <p>
 Insertions are placed at the insertion site with a width of 1 bp; deletions
 and inversions span the affected reference interval. Filters are available
 for SV type, SV length, carrier-haplotype count, distinct sample count,
 whether the site falls in a Tandem Repeat Finder region and the fraction
 of the variant overlapping segmental duplications.
 </p>
 <p>
 The detail page shows, where available:
 <ul>
 <li><b>Allele / Sample Count</b>: number of carrier haplotypes (out of the
 2*65 = 130 phased haplotypes plus unphased "un" entries) and the number of
 distinct samples carrying the variant.</li>
 <li><b>Reference / Contig Homology</b>: microhomology length (5',3') at the
 breakpoints in the reference and in the assembly contig (insertions and
 deletions only).</li>
 <li><b>Inner Inversion Region</b>: for inversions, the coordinate range of
 the inner inverted sequence, distinct from the outer breakpoint interval.</li>
 <li><b>Transposable Element</b>: when the inserted or deleted sequence was
 classified as a known TE family.</li>
 <li><b>Segmental Duplication Overlap</b>: fraction of the variant interval
 overlapping UCSC segmental duplications in the reference.</li>
 <li><b>Carrier Haplotypes</b>: full list of haplotype IDs (e.g.
 <tt>HG00096-h1</tt>, <tt>HG00096-h2</tt>, <tt>HG00514-un</tt>) carrying the
 variant.</li>
 </ul>
 </p>
 
 <h2>Methods</h2>
 <p>
 HGSVC3 produced haplotype-resolved de novo assemblies for 65 samples
 spanning five continental groups. Assemblies were built from PacBio HiFi
 and Oxford Nanopore reads, phased with Strand-seq and further validated
 with Hi-C and optical mapping. Structural variants were called by aligning
 each haplotype back to the reference with PAV v2.4.0.1; calls were then
 cross-referenced with ten independent callers. The final annotation tables
 (this track's input) include merge statistics (MERGE_RO, MERGE_OFFSET,
 MERGE_SZRO, MERGE_OFFSZ, MERGE_MATCH) that describe how well each
 per-sample call matched the merged consensus site.
 </p>
 <p>
 Two tables were merged for display here:
 <tt>variants_GRCh38_sv_insdel_HGSVC2024v1.0.tsv.gz</tt> (DEL + INS, 176,232
 records) and <tt>variants_GRCh38_sv_inv_HGSVC2024v1.0.tsv.gz</tt> (INV, 300
 records). Type-specific columns (HOM_REF/HOM_TIG/TE for insdel;
 RGN_REF_INNER for inversions) are shown as empty on the detail page when
 they do not apply.
 </p>
+<p>
+The hs1 (T2T-CHM13) version of this track applies the same merge pipeline to
+the HGSVC3 T2T-CHM13 tables
+(<tt>variants_T2T-CHM13_sv_insdel_HGSVC2024v1.0.tsv.gz</tt> and
+<tt>variants_T2T-CHM13_sv_inv_HGSVC2024v1.0.tsv.gz</tt>) downloaded from
+<a href="https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/Variant_Calls/1.0/T2T-CHM13/annotation_table/" target="_blank">
+the HGSVC3 T2T-CHM13 release directory</a>.
+</p>
 
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively in table format with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed
 programmatically through our <a href="https://api.genome.ucsc.edu">API</a>,
 track=<i>hgsvc3Sv</i>.
 </p>
 <p>
-The bigBed is available from
-<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
-download server</a> as <tt>hgsvc3.bb</tt>. Example:
-<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hgsvc3.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>.
+The bigBed is available from our download server for both assemblies:
+<ul>
+<li>GRCh38:
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hgsvc3.bb" target="_blank">
+hg38 hgsvc3.bb</a></li>
+<li>T2T-CHM13:
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hs1/lrSv/hgsvc3.bb" target="_blank">
+hs1 hgsvc3.bb</a></li>
+</ul>
+Example: <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/hgsvc3.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>.
 </p>
 <p>
 The original annotation tables are available from the
 <a href="https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/Variant_Calls/1.0/GRCh38/annotation_table/" target="_blank">
 HGSVC3 release</a> on the IGSR FTP site.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to the Human Genome Structural Variation Consortium (HGSVC) and all
 participating sequencing and analysis centers for making the HGSVC3
 annotation tables publicly available.
 </p>
 
 <h2>References</h2>
 
 
 <p>
 Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo
 D <em>et al</em>.
 <a href="https://doi.org/10.1038/s41586-025-09140-6" target="_blank">
 Complex genetic variation in nearly complete human genomes</a>.
 <em>Nature</em>. 2025 Aug;644(8076):430-441.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/40702183" target="_blank">40702183</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12350169/" target="_blank">PMC12350169</a>
 </p>