8a5a466f5e13a020954014cdefc81400072db516
max
  Tue Apr 21 08:29:55 2026 -0700
lrSv: add hprc2 hs1 subtrack using T2T-CHM13 wave VCF, refs #36258

HPRC release 2 publishes a wave-decomposed pangenome VCF against both
GRCh38 and T2T-CHM13. We already had the GRCh38 version as the
hprc2Sv subtrack on hg38; this adds the parallel T2T-CHM13 build under
/gbdb/hs1/lrSv/hprc2.bb. The existing trackDb stanza (bigDataUrl
/gbdb/$D/lrSv/hprc2.bb) picks it up on hs1 without changes.
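The "without changes" works because $D in the stanza stands for the
assembly name. Below is a minimal Python sketch of that expansion. It assumes
a plain text substitution, which is all this commit relies on.
resolve_big_data_url is a hypothetical helper, not browser code.

```python
def resolve_big_data_url(template: str, db: str) -> str:
    """Hypothetical helper: expand $D in a bigDataUrl to the database name."""
    return template.replace("$D", db)


# One trackDb stanza, two assemblies, two different .bb files.
url = "/gbdb/$D/lrSv/hprc2.bb"
for db in ("hg38", "hs1"):
    print(resolve_big_data_url(url, db))
```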

The existing lrSvHprc2VcfToBed.py converter kept 1,451,269 SV rows on hs1
(937,425 INS, 360,960 DEL, 147,898 COMPLEX, 4,986 INV).
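Below is a minimal Python sketch of the filtering that the lrSv doc
describes for lrSvHprc2VcfToBed.py. It streams the multi-allelic rows, emits
one BED row per ALT, and keeps alleles with |LEN| >= 50 bp plus every
record carrying the INV flag. It is not the real converter. The following
are all assumptions:
- the INS/DEL/COMPLEX rules in classify()
- computing LEN as len(ALT) - len(REF)
- treating INV as an INFO key
- skipping "*" and symbolic alleles
- the five-column output, which replaces the bed9+ lrSvHprc2.as layout

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a wave-VCF -> BED SV filter (not lrSvHprc2VcfToBed.py)."""
import gzip
import sys
from typing import Iterable, Iterator, NamedTuple

MIN_SV_LEN = 50  # |LEN| >= 50 bp, the threshold stated in the lrSv doc


class SvRow(NamedTuple):
    chrom: str
    start: int   # 0-based BED start
    end: int     # BED end (half-open)
    svtype: str  # INS, DEL, COMPLEX or INV (assumed rules, see classify)
    length: int  # assumed LEN = len(ALT) - len(REF); negative for deletions


def info_keys(info: str) -> set[str]:
    """Return the INFO keys present, including bare flags such as INV."""
    if info == ".":
        return set()
    return {field.split("=", 1)[0] for field in info.split(";") if field}


def classify(ref: str, alt: str, is_inv: bool) -> str:
    """Assumed classification; the real converter's rules may differ."""
    if is_inv:
        return "INV"
    if alt.startswith(ref):
        return "INS"      # ALT extends the REF anchor base(s)
    if ref.startswith(alt):
        return "DEL"      # ALT keeps only a prefix of REF
    return "COMPLEX"


def explode(lines: Iterable[str]) -> Iterator[SvRow]:
    """Stream VCF lines, emit one row per ALT, keep SV-sized or INV alleles."""
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and column-header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts, info = fields[0], int(fields[1]), fields[3], fields[4], fields[7]
        is_inv = "INV" in info_keys(info)
        start = pos - 1            # VCF POS is 1-based; BED start is 0-based
        end = start + len(ref)     # span of the REF allele
        for alt in alts.split(","):
            if alt == "*" or alt.startswith("<"):
                continue  # assumption: skip spanning-deletion and symbolic alleles
            length = len(alt) - len(ref)
            if abs(length) >= MIN_SV_LEN or is_inv:
                yield SvRow(chrom, start, end, classify(ref, alt, is_inv), length)


def main(vcf_path: str, bed_path: str) -> None:
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as vcf, open(bed_path, "w") as bed:
        for row in explode(vcf):
            bed.write("\t".join(map(str, row)) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage would be `python3 sketch.py in.wave.vcf.gz out.bed`, where
sketch.py is a hypothetical file name. The output still needs bedSort before
bedToBigBed, as in the doc.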

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/doc/hg38/lrSv.txt src/hg/makeDb/doc/hg38/lrSv.txt
index b164679d86d..918b750b12e 100644
--- src/hg/makeDb/doc/hg38/lrSv.txt
+++ src/hg/makeDb/doc/hg38/lrSv.txt
@@ -348,30 +348,33 @@
 # wave-decomposed VCF (what we actually convert):
 wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/release2/minigraph-cactus/hprc-v2.0-mc-grch38.wave.vcf.gz
 
 # The wave VCF contains ~20M atomic alleles including SNVs. The converter
 # streams the multi-allelic rows, explodes one BED row per ALT, and keeps
 # only SV-sized alleles (|LEN| >= 50 bp) plus all records carrying the
 # INV flag. 1,483,114 SVs kept (1,106,190 INS, 192,597 DEL, 178,178
 # COMPLEX, 6,149 INV).
 
 python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvHprc2VcfToBed.py \
     hprc-v2.0-mc-grch38.wave.vcf.gz hprc2.bed
 bedSort hprc2.bed hprc2.sorted.bed
 bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSvHprc2.as \
     -tab hprc2.sorted.bed /hive/data/genomes/hg38/chrom.sizes hprc2.bb
 
+# HPRC also releases a wave VCF against T2T-CHM13; the build steps for the
+# hs1 version of this subtrack are in ~/kent/src/hg/makeDb/doc/hs1/lrSv.txt.
+
 ##########
 # 2026-04-20 Claude max
 
 # CPC + HPRC Phase 1 pangenome SVs (105 samples).
 # Paper: Gao et al. 2023, Nature, PMID 37316654
 # Data : https://github.com/Shuhua-Group/Chinese-Pangenome-Consortium-Phase-I
 # The VCF is on T2T-CHM13v2 (hs1) contigs renamed "CHM13v2.chrN".
 # Source VCF (CPC.HPRC.Phase1.processed.SVs.normed.vcf.gz, 3.7 GB) was
 # produced with pggb + vcfwave + bcftools norm; each graph snarl appears
 # as one VCF row per alternative allele, with genotypes for 105 samples.
 
 mkdir -p /hive/data/genomes/hg38/bed/lrSv/cpc1
 cd /hive/data/genomes/hg38/bed/lrSv/cpc1
 
 # (VCF already placed here by the user)