8a5a466f5e13a020954014cdefc81400072db516 max Tue Apr 21 08:29:55 2026 -0700

lrSv: add hprc2 hs1 subtrack using T2T-CHM13 wave VCF, refs #36258

The HPRC release-2 pangenome publishes a wave-decomposed VCF against both
GRCh38 and T2T-CHM13. We already had the GRCh38 version as the hprc2Sv
subtrack on hg38; this adds the parallel T2T-CHM13 build under
/gbdb/hs1/lrSv/hprc2.bb. The existing trackDb stanza (bigDataUrl
/gbdb/$D/lrSv/hprc2.bb) picks it up on hs1 without changes.

1,451,269 SV rows kept (937,425 INS, 360,960 DEL, 147,898 COMPLEX,
4,986 INV) using the existing lrSvHprc2VcfToBed.py converter.

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/doc/hg38/lrSv.txt src/hg/makeDb/doc/hg38/lrSv.txt
index b164679d86d..918b750b12e 100644
--- src/hg/makeDb/doc/hg38/lrSv.txt
+++ src/hg/makeDb/doc/hg38/lrSv.txt
@@ -348,30 +348,33 @@
 # wave-decomposed VCF (what we actually convert):
 wget https://s3-us-west-2.amazonaws.com/human-pangenomics/pangenomes/freeze/release2/minigraph-cactus/hprc-v2.0-mc-grch38.wave.vcf.gz

 # The wave VCF contains ~20M atomic alleles including SNVs. The converter
 # streams the multi-allelic rows, explodes one BED row per ALT, and keeps
 # only SV-sized alleles (|LEN| >= 50 bp) plus all records carrying the
 # INV flag. 1,483,114 SVs kept (1,106,190 INS, 192,597 DEL, 178,178
 # COMPLEX, 6,149 INV).
 python3 ~/kent/src/hg/makeDb/scripts/lrSv/lrSvHprc2VcfToBed.py \
     hprc-v2.0-mc-grch38.wave.vcf.gz hprc2.bed
 bedSort hprc2.bed hprc2.sorted.bed
 bedToBigBed -type=bed9+ -as=$HOME/kent/src/hg/makeDb/scripts/lrSv/lrSvHprc2.as \
     -tab hprc2.sorted.bed /hive/data/genomes/hg38/chrom.sizes hprc2.bb

+# HPRC also releases a wave VCF against T2T-CHM13; the hs1 version of this
+# subtrack is built in ~/kent/src/hg/makeDb/doc/hs1/lrSv.txt.
+
 ##########
 # 2026-04-20 Claude max
 # CPC + HPRC Phase 1 pangenome SVs (105 samples).
 # Paper: Gao et al. 2023, Nature, PMID 37316654
 # Data : https://github.com/Shuhua-Group/Chinese-Pangenome-Consortium-Phase-I
 # The VCF is on T2T-CHM13v2 (hs1) contigs renamed "CHM13v2.chrN".
 # Source VCF (CPC.HPRC.Phase1.processed.SVs.normed.vcf.gz, 3.7 GB) was
 # produced with pggb + vcfwave + bcftools norm; each graph snarl appears
 # as one VCF row per alternative allele, with genotypes for 105 samples.

 mkdir -p /hive/data/genomes/hg38/bed/lrSv/cpc1
 cd /hive/data/genomes/hg38/bed/lrSv/cpc1
 # (VCF already placed here by the user)
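
The converter step described in the diff streams multi-allelic rows, explodes one BED row per ALT, and keeps alleles with |LEN| >= 50 bp plus every INV-flagged record. The Python sketch below illustrates that logic. It is not lrSvHprc2VcfToBed.py. The names `classify`, `explode_record`, `vcf_to_sv_rows` and `convert` are hypothetical. LEN is assumed to be the net length change len(ALT) - len(REF), and the INS/DEL/COMPLEX rule is an assumption. The output is a simplified six-column table, not the bed9+ layout defined by lrSvHprc2.as.

```python
"""Hypothetical sketch of the SV filter described for lrSvHprc2VcfToBed.py.

NOT the real converter: the LEN rule (len(ALT) - len(REF)), the
INS/DEL/COMPLEX labels and the row name format are assumptions.
"""
import gzip
from typing import Iterable, Iterator, NamedTuple

MIN_SV_LEN = 50  # |LEN| >= 50 bp, the threshold stated in the makeDb doc


class SvRow(NamedTuple):
    chrom: str
    start: int  # 0-based BED start
    end: int    # BED end (start + len(REF))
    name: str
    sv_type: str
    sv_len: int


def classify(ref: str, alt: str, is_inv: bool) -> tuple[str, int]:
    """Return an assumed (type, signed length) for one REF/ALT pair."""
    sv_len = len(alt) - len(ref)
    if is_inv:
        return "INV", sv_len
    if len(ref) == 1 and len(alt) > 1 and alt[0] == ref:
        return "INS", sv_len  # anchored pure insertion
    if len(alt) == 1 and len(ref) > 1 and ref[0] == alt:
        return "DEL", sv_len  # anchored pure deletion
    return "COMPLEX", sv_len  # everything else (assumption)


def parse_vcf_line(line: str) -> tuple[str, int, str, list[str], set[str]]:
    """Split a VCF data line into CHROM, POS, REF, ALT list and INFO flags."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
    # INFO flags are keys without '=' (e.g. INV); '.' means empty INFO.
    flags = {kv for kv in info.split(";") if kv and kv != "." and "=" not in kv}
    return chrom, int(pos), ref, alt.split(","), flags


def explode_record(chrom, pos, ref, alts, flags) -> Iterator[SvRow]:
    """Yield one row per ALT allele that passes the SV filter."""
    is_inv = "INV" in flags  # record-level flag applies to every ALT
    for idx, alt in enumerate(alts, start=1):
        if alt == "*" or alt.startswith("<"):
            continue  # skip spanning-deletion / symbolic alleles (assumption)
        sv_type, sv_len = classify(ref, alt, is_inv)
        if is_inv or abs(sv_len) >= MIN_SV_LEN:
            start = pos - 1  # VCF POS is 1-based
            yield SvRow(chrom, start, start + len(ref),
                        f"{chrom}:{pos}:{idx}", sv_type, sv_len)


def vcf_to_sv_rows(lines: Iterable[str]) -> Iterator[SvRow]:
    """Stream VCF lines one record at a time, skipping header lines."""
    for line in lines:
        if line.startswith("#"):
            continue
        yield from explode_record(*parse_vcf_line(line))


def convert(vcf_gz_path: str, bed_path: str) -> int:
    """Write tab-separated rows from a (b)gzipped VCF; return rows written."""
    count = 0
    with gzip.open(vcf_gz_path, "rt") as vcf, open(bed_path, "w") as bed:
        for row in vcf_to_sv_rows(vcf):
            bed.write("\t".join(map(str, row)) + "\n")
            count += 1
    return count
```

Because the sketch streams line by line, memory stays flat even on a ~20M-allele wave VCF. Under the assumed net-length rule, a balanced substitution such as 60 bp replaced by 60 bp is dropped unless it is INV-flagged. The real converter may measure allele length differently. As in the doc, the output would still need bedSort before bedToBigBed.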