9a11061ca6b40fe16bdfd09b1af53192f6c7c85b max Tue Apr 21 08:13:02 2026 -0700

lrSv: add HTML doc pages and conversion scripts for recent subtracks, + hs1 HGSVC3

Subtrack stanzas for these SV callsets landed in earlier commits but the
conversion scripts and per-track HTML description pages were never added;
trackDb therefore had no doc to serve. This commit catches up.

Docs (new):
- colorsDbSv.html   CoLoRSdb 1,427-sample long-read SVs
- gustafsonSv.html  1KG ONT 100 (Gustafson 2024, PMID 39358015)
- hgsvc2Sv.html     HGSVC2 (Ebert 2021, PMID 33632895)
- hprc2Sv.html      HPRC release-2 pangenome SVs (no PMID yet; see
                    humanpangenome.org/hprc-data-release-2/)
- onekg3202Sr.html  1KG 3202 Illumina SHORT-READ GATK-SV
                    (Byrska-Bishop 2022, PMID 36055201)

Scripts (new):
- lrSvGustafson.as / lrSvGustafsonVcfToBed.py
- lrSvHgsvc2.as / lrSvHgsvc2TsvToBed.py (merges insdel + inv tables)
- lrSvHprc2.as / lrSvHprc2VcfToBed.py (streams wave-decomposed VCF,
  explodes multi-allelic rows, filters to SV-sized or INV)
- lrSv1kg3202Sr.as / lrSv1kg3202SrVcfToBed.py

HGSVC3 also on hs1:
- hgsvc3Sv.html: note that the hs1 build is native (not lifted): HGSVC3
  aligned all assemblies to both GRCh38 and T2T-CHM13 and released
  separate annotation tables per reference. Added the T2T-CHM13 source
  URL to the Methods section and the hs1 hgsvc3.bb download link to
  Data Access.
- doc/hs1/lrSv.txt (new): hs1-specific wget + build steps; refers back
  to doc/hg38/lrSv.txt for the full process.

refs #36258

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/doc/hs1/lrSv.txt src/hg/makeDb/doc/hs1/lrSv.txt
new file mode 100644
index 00000000000..2ac7a1808ab
--- /dev/null
+++ src/hg/makeDb/doc/hs1/lrSv.txt
@@ -0,0 +1,30 @@
+# 2026-04-21 Claude max
+
+# Long-read SVs on hs1 (T2T-CHM13). HGSVC3 released a parallel set of SV
+# annotation tables native to T2T-CHM13, which we convert with the same
+# pipeline as the hg38 HGSVC3 subtrack. The full process (converter,
+# autoSql, bigBed build, trackDb setup, summary table, references) is
+# documented in ~/kent/src/hg/makeDb/doc/hg38/lrSv.txt; this file only
+# lists the hs1-specific shell steps.
+
+mkdir -p /hive/data/genomes/hs1/bed/lrSv/hgsvc3
+cd /hive/data/genomes/hs1/bed/lrSv/hgsvc3
+
+wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/Variant_Calls/1.0/T2T-CHM13/annotation_table/variants_T2T-CHM13_sv_insdel_HGSVC2024v1.0.tsv.gz
+wget https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/Variant_Calls/1.0/T2T-CHM13/annotation_table/variants_T2T-CHM13_sv_inv_HGSVC2024v1.0.tsv.gz
+
+# 188,224 DEL+INS + 276 INV = 188,500 SVs, natively on T2T-CHM13. The
+# converter is the same one used for the hg38 track (shared .as + .py).
+python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvHgsvc3TsvToBed.py \
+    variants_T2T-CHM13_sv_insdel_HGSVC2024v1.0.tsv.gz \
+    variants_T2T-CHM13_sv_inv_HGSVC2024v1.0.tsv.gz \
+    hgsvc3.bed
+bedSort hgsvc3.bed hgsvc3.sorted.bed
+bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSvHgsvc3.as \
+    -tab hgsvc3.sorted.bed /hive/data/genomes/hs1/chrom.sizes hgsvc3.bb
+
+# Symlink under /gbdb/hs1/lrSv with the same filename as the hg38 track,
+# so the trackDb bigDataUrl (/gbdb/$D/lrSv/hgsvc3.bb) resolves on both
+# assemblies.
+mkdir -p /gbdb/hs1/lrSv
+ln -sf /hive/data/genomes/hs1/bed/lrSv/hgsvc3/hgsvc3.bb /gbdb/hs1/lrSv/hgsvc3.bb