9a11061ca6b40fe16bdfd09b1af53192f6c7c85b
max
  Tue Apr 21 08:13:02 2026 -0700
lrSv: add HTML doc pages and conversion scripts for recent subtracks, + hs1 HGSVC3

Subtrack stanzas for these SV callsets landed in earlier commits but
the conversion scripts and per-track HTML description pages were
never added; trackDb therefore had no doc to serve. This commit
catches up.

Docs (new):
- colorsDbSv.html     CoLoRSdb 1,427-sample long-read SVs
- gustafsonSv.html    1KG ONT 100 (Gustafson 2024, PMID 39358015)
- hgsvc2Sv.html       HGSVC2 (Ebert 2021, PMID 33632895)
- hprc2Sv.html        HPRC release-2 pangenome SVs (no PMID yet;
                      see humanpangenome.org/hprc-data-release-2/)
- onekg3202Sr.html    1KG 3202 Illumina SHORT-READ GATK-SV
                      (Byrska-Bishop 2022, PMID 36055201)

Scripts (new):
- lrSvGustafson.as / lrSvGustafsonVcfToBed.py
- lrSvHgsvc2.as / lrSvHgsvc2TsvToBed.py  (merges insdel + inv tables)
- lrSvHprc2.as / lrSvHprc2VcfToBed.py    (streams wave-decomposed VCF,
                                          explodes multi-allelic rows,
                                          filters to SV-sized or INV)
- lrSv1kg3202Sr.as / lrSv1kg3202SrVcfToBed.py

HGSVC3 also on hs1:
- hgsvc3Sv.html: note that the hs1 build is native, not lifted.
  HGSVC3 aligned all assemblies to both GRCh38 and T2T-CHM13 and
  released separate annotation tables per reference. Added the
  T2T-CHM13 source URL to the Methods section and the hs1 hgsvc3.bb
  download link to Data Access.
- doc/hs1/lrSv.txt (new): hs1-specific wget + build steps; refers
  back to doc/hg38/lrSv.txt for the full process.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/doc/hs1/lrSv.txt src/hg/makeDb/doc/hs1/lrSv.txt
new file mode 100644
index 00000000000..2ac7a1808ab
--- /dev/null
+++ src/hg/makeDb/doc/hs1/lrSv.txt
@@ -0,0 +1,30 @@
+# 2026-04-21 Claude max
+
+# Long-read SVs on hs1 (T2T-CHM13). HGSVC3 released a parallel set of SV
+# annotation tables native to T2T-CHM13, which we convert with the same
+# pipeline as the hg38 HGSVC3 subtrack. The full process (converter,
+# autoSql, bigBed build, trackDb setup, summary table, references) is
+# documented in ~/kent/src/hg/makeDb/doc/hg38/lrSv.txt; this file only
+# lists the hs1-specific shell steps.
+
+mkdir -p /hive/data/genomes/hs1/bed/lrSv/hgsvc3
+cd /hive/data/genomes/hs1/bed/lrSv/hgsvc3
+
+wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/Variant_Calls/1.0/T2T-CHM13/annotation_table/variants_T2T-CHM13_sv_insdel_HGSVC2024v1.0.tsv.gz
+wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/Variant_Calls/1.0/T2T-CHM13/annotation_table/variants_T2T-CHM13_sv_inv_HGSVC2024v1.0.tsv.gz
+
+# 188,224 DEL+INS + 276 INV = 188,500 SVs, natively on T2T-CHM13. The
+# converter is the same one used for the hg38 track (shared .as + .py).
+python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvHgsvc3TsvToBed.py \
+    variants_T2T-CHM13_sv_insdel_HGSVC2024v1.0.tsv.gz \
+    variants_T2T-CHM13_sv_inv_HGSVC2024v1.0.tsv.gz \
+    hgsvc3.bed
+bedSort hgsvc3.bed hgsvc3.sorted.bed
+bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSvHgsvc3.as \
+    -tab hgsvc3.sorted.bed /hive/data/genomes/hs1/chrom.sizes hgsvc3.bb
+
+# Symlink under /gbdb/hs1/lrSv with the same filename as the hg38 track,
+# so the trackDb bigDataUrl (/gbdb/$D/lrSv/hgsvc3.bb) resolves on both
+# assemblies.
+mkdir -p /gbdb/hs1/lrSv
+ln -sf /hive/data/genomes/hs1/bed/lrSv/hgsvc3/hgsvc3.bb /gbdb/hs1/lrSv/hgsvc3.bb