src/hg/makeDb/doc/hg38/lrSv.txt 6b0d68657267f1e02c47d4224ea62446bbbb2ba0

6b0d68657267f1e02c47d4224ea62446bbbb2ba0
max
  Fri May 22 06:55:52 2026 -0700
small non-AI changes to the html docs pages of the long-read SV tracks

diff --git src/hg/makeDb/doc/hg38/lrSv.txt src/hg/makeDb/doc/hg38/lrSv.txt
index f19e8c2e813..67d923f950d 100644
--- src/hg/makeDb/doc/hg38/lrSv.txt
+++ src/hg/makeDb/doc/hg38/lrSv.txt
@@ -426,15 +426,57 @@
 # directly (placed in /hive/data/genomes/hg38/bed/lrSv/colorsDb/).
 # The previous bigBed came from an older build and declared `af` as a
 # string; the new build uses a checked-in converter, stores AF as a
 # float so the numeric filter works, and adds a derived `insLen`
 # column so the shared lrSv supertrack-level filter.insLen does not
 # error for this subtrack.
 
 cd /hive/data/genomes/hg38/bed/lrSv/colorsDb
 # Upstream VCFs (same pbsv.jasmine release, one per reference path):
 #   CoLoRSdb.GRCh38.v1.2.0.pbsv.jasmine.vcf.gz  (hg38, 426,239 SVs)
 #   CoLoRSdb.CHM13.v1.2.0.pbsv.jasmine.vcf.gz   (hs1,  839,714 SVs)
 bash ~/kent/src/hg/makeDb/scripts/lrSv/lrSvColorsDbSvBuild.sh
 # hg38: 59 MB, 192,534 DEL + 232,973 INS + 732 INV
 # hs1 : 87 MB (more variants due to T2T-added regions)
 # Existing /gbdb symlinks (sv.hg38.bb, sv.hs1.bb) are unchanged.
+
+##########
+# 2026-05-21 Claude max
+#
+# hprc2JasmineSv: SV callsets from 231 HPRC v2 haplotype-resolved
+# assemblies. The Hall lab (Wen-Wei Liao) ran 14 SV callers per sample:
+# DELLY, DeBreak, DeepVariant, PAV, SVDSS, SVIM, SVIM-asm, Sniffles2,
+# cuteSV, cuteSV-asm, dipcall, longcallD, pbsv, sawfish.
+# The per-sample multi-caller output was harmonized into three per-sample
+# VCFs (dipcall, PAV, longcallD pipelines). One file per sample per
+# assembly path (GRCh38_no_alt, CHM13v2) was placed into
+# /hive/data/genomes/hg38/bed/lrSv/hprc2jasmine/input/ via download_inputs.sh
+# (S3 URLs from merged_callsets.index.csv). No publication yet.
+#
+# Build steps:
+#   1. Split each per-sample VCF by chromosome and filter to SV-sized
+#      records (|len(ALT) - len(REF)| >= 30 bp), keeping the REF/ALT
+#      sequences intact. Route by assembly tag in the filename into
+#      split2-hg38/<chr>/<sample>.vcf and split2-hs1/<chr>/<sample>.vcf.
+#      Filtering at this stage cuts the record count ~135-fold (dropping
+#      SNVs + small indels), which is what makes the next step tractable.
+#   2. Jasmine-merge across samples per chromosome with default
+#      sequence-aware options: --ignore_merged_inputs --normalize_type.
+#      Outputs go to output2/<asm>/merged.<chr>.vcf, then bcftools
+#      concat + sort produce output2/<asm>/merged_all.vcf.gz.
+#   3. Convert merged VCFs to BED9+ with the multi-caller fields
+#      preserved (SUPP -> nSamples and AC, NCALLERS, CALLERS, SOURCES,
+#      MR), then run bedSort + bedToBigBed.
+cd /hive/data/genomes/hg38/bed/lrSv/hprc2jasmine
+bash splitVcfsFilterSv.sh
+bash processJasmineSvSeq.sh hg38
+bash processJasmineSvSeq.sh hs1
+bash ~/kent/src/hg/makeDb/scripts/lrSv/lrSvHprc2JasmineBuild.sh
+# hg38: 335,494 SVs merged (all 22 autosomes; chrX/chrY absent from inputs)
+# hs1:  (built same way from CHM13v2 per-sample calls)
+#
+# Note: an earlier symbolic-ALT pipeline (splitVcfs.sh + symbolizeVcfs.sh
+# + processJasmine.sh, output/) was used as a workaround for a Jasmine
+# NullPointerException (NPE) in sequence comparison. With inputs
+# pre-filtered to SV-sized records the NPE no longer fires, so the current
+# pipeline runs Jasmine with its normal sequence-aware merging. The
+# symbolic-pipeline scripts and output/ tree are retained for comparison.
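
The size filter described in build step 1 can be sketched roughly as below. This is an illustration only, not the contents of splitVcfsFilterSv.sh: the function name filter_sv_sized is hypothetical, and it assumes sequence-resolved REF/ALT alleles and looks only at the first ALT allele. A symbolic ALT such as <DEL> would be measured by its literal string length, so this sketch is not suitable for symbolic-ALT input.

```shell
# Hypothetical sketch of the step-1 size filter; NOT the checked-in script.
# Usage: filter_sv_sized [MINLEN] < in.vcf > out.vcf
# Passes header lines through unchanged and keeps only records where the
# first ALT allele differs from REF in length by at least MINLEN bp
# (default 30). REF and ALT sequences are left intact.
filter_sv_sized() {
    local minLen="${1:-30}"
    awk -F'\t' -v min="$minLen" '
        /^#/ { print; next }              # VCF meta and column header lines
        {
            split($5, alts, ",")          # assumption: judge by first ALT only
            d = length(alts[1]) - length($4)
            if (d < 0) d = -d             # absolute length difference
            if (d >= min) print
        }'
}
```

For example, filter_sv_sized 30 < sample.chr1.vcf > sample.chr1.sv.vcf (file names hypothetical) would drop SNVs and small indels while keeping insertions and deletions of 30 bp or more.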