1732661494ece5e645a9522f15a0f5922b035d1a
max
  Wed Apr 22 08:57:11 2026 -0700
colorsDbSv: rebuild from pbsv+Jasmine source VCFs with richer AS

Rebuild the CoLoRSdb SV bigBeds for hg38 and hs1 from the upstream
pbsv+Jasmine VCFs that the CoLoRSdb project distributes directly.
The previous bigBed stored AF as a string (breaking the numeric
filter) and lacked insLen (causing a "filter on field insLen not in
AS file" error under the supertrack-level filter). The new build:

- stores AF as a float
- adds a derived insLen column (alt-ref length delta for INS, 0
  otherwise) so the shared lrSv insLen filter applies
- keeps every INFO field from the source (SVTYPE, SVLEN, END, AC,
  AN, NS, AC_Hom, AC_Het, AC_Hemi, AF, HWE, ExcHet, nhomalt) plus
  REF/ALT
- uses the canonical svName(TYPE, featLen, AC) label via lrSvCommon
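
For illustration only, the AF and insLen handling described above could
look roughly like the sketch below. This is not the checked-in
converter: the function names ins_len and parse_af are hypothetical,
and it assumes sequence-resolved REF/ALT alleles (no symbolic <INS>
ALT) and a single AF value per record.

```python
# Hypothetical sketch, not the checked-in converter.
# Assumes sequence-resolved REF/ALT strings and one AF value per record.

def ins_len(svtype: str, ref: str, alt: str) -> int:
    """Return ALT length minus REF length for INS records, 0 otherwise."""
    if svtype != "INS":
        return 0
    return len(alt) - len(ref)


def parse_af(raw: str) -> float:
    """Convert the AF INFO value to a float so numeric filters can use it."""
    return float(raw)


if __name__ == "__main__":
    # A 3 bp insertion: REF "A", ALT "ACGT"
    print(ins_len("INS", "A", "ACGT"))
    # Non-INS records always report 0
    print(ins_len("DEL", "ACGT", "A"))
    print(parse_af("0.25"))
```

Returning 0 for non-INS records keeps the column populated on every
row, which is what lets a single shared insLen filter be applied to the
whole subtrack.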

Record counts match the source VCFs: 426,239 on hg38 (59 MB) and
839,714 on hs1 (87 MB). /gbdb symlinks unchanged. The trackDb
colorsDbSv stanza is updated to reference the new AS field names
(acHom/acHet/acHemi, AF, AN) and to add the insLen filter. Also
fixes a nearby `version 1.1` -> `dataVersion 1.1` typo in
lrSv1kgOnt that was failing the tagTypes check.

refs #36258

diff --git src/hg/makeDb/doc/hg38/lrSv.txt src/hg/makeDb/doc/hg38/lrSv.txt
index 8f373210431..914cb1d001b 100644
--- src/hg/makeDb/doc/hg38/lrSv.txt
+++ src/hg/makeDb/doc/hg38/lrSv.txt
@@ -449,15 +449,36 @@
 # 2026-04-21 Claude max
 #
 # cpc1Sv rebuilt as CPC-only (58 samples). The upstream VCF contains
 # 105 samples (58 CPC + 47 HPRC Phase 1). For this version we
 # identify the 58 CPC columns by sample name prefix (HIFI032* or
 # RY*), recompute AC/AN/NS from those GT columns only, and drop
 # snarls where no CPC sample carries any alt. HPRC-specific SVs are
 # therefore excluded; the HPRC contribution is already represented
 # in the HPRC SV tracks elsewhere in this lrSv supertrack.
 #
 # Pipeline (same build script, updated Python converter):
 cd /hive/data/genomes/hg38/bed/lrSv/cpc1
 bash ~/kent/src/hg/makeDb/scripts/lrSv/lrSvCpc1Build.sh
 #   hs1 sites: 46,092 (down from 97,205 combined)
 #   hg38 lifted: 36,030 (down from 81,261); 10,062 unmapped
+
+##########
+# 2026-04-22 Claude max
+#
+# colorsDbSv: rebuild both hg38 and hs1 bigBeds from the upstream
+# pbsv+Jasmine VCFs now that the CoLoRSdb project redistributes them
+# directly (placed in /hive/data/genomes/hg38/bed/lrSv/colorsDb/).
+# The previous bigBed came from an older build and declared `af` as a
+# string; the new build uses a checked-in converter, stores AF as a
+# float so the numeric filter works, and adds a derived `insLen`
+# column so the shared lrSv supertrack-level filter.insLen does not
+# error for this subtrack.
+
+cd /hive/data/genomes/hg38/bed/lrSv/colorsDb
+# Upstream VCFs (same pbsv.jasmine release, one per reference assembly):
+#   CoLoRSdb.GRCh38.v1.2.0.pbsv.jasmine.vcf.gz  (hg38, 426,239 SVs)
+#   CoLoRSdb.CHM13.v1.2.0.pbsv.jasmine.vcf.gz   (hs1,  839,714 SVs)
+bash ~/kent/src/hg/makeDb/scripts/lrSv/lrSvColorsDbSvBuild.sh
+# hg38: 59 MB, 192,534 DEL + 232,973 INS + 732 INV
+# hs1 : 87 MB (more variants due to T2T-added regions)
+# Existing /gbdb symlinks (sv.hg38.bb, sv.hs1.bb) are unchanged.