1732661494ece5e645a9522f15a0f5922b035d1a max Wed Apr 22 08:57:11 2026 -0700
colorsDbSv: rebuild from pbsv+Jasmine source VCFs with richer AS

Rebuild the CoLoRSdb SV bigBeds for hg38 and hs1 from the upstream
pbsv+Jasmine VCFs that the CoLoRSdb project distributes directly. The
previous bigBed stored AF as a string (breaking the numeric filter) and
lacked insLen (causing a "filter on field insLen not in AS file" error
under the supertrack-level filter). The new build:

- stores AF as a float
- adds a derived insLen column (alt-ref length delta for INS, 0
  otherwise) so the shared lrSv insLen filter applies
- keeps every INFO field from the source (SVTYPE, SVLEN, END, AC, AN,
  NS, AC_Hom, AC_Het, AC_Hemi, AF, HWE, ExcHet, nhomalt) plus REF/ALT
- uses the canonical svName(TYPE, featLen, AC) label via lrSvCommon

Record counts match the source VCFs: 426,239 on hg38 (59 MB) and
839,714 on hs1 (87 MB). /gbdb symlinks unchanged.

The trackDb colorsDbSv stanza is updated to reference the new AS field
names (acHom/acHet/acHemi, AF, AN) and to add the insLen filter. Also
fixes a nearby `version 1.1` -> `dataVersion 1.1` typo in lrSv1kgOnt
that was failing the tagTypes check.

refs #36258

diff --git src/hg/makeDb/doc/hg38/lrSv.txt src/hg/makeDb/doc/hg38/lrSv.txt
index 8f373210431..914cb1d001b 100644
--- src/hg/makeDb/doc/hg38/lrSv.txt
+++ src/hg/makeDb/doc/hg38/lrSv.txt
@@ -449,15 +449,36 @@
 # 2026-04-21 Claude max
 #
 # cpc1Sv rebuilt as CPC-only (58 samples). The upstream VCF contains
 # 105 samples (58 CPC + 47 HPRC Phase 1). For this version we
 # identify the 58 CPC columns by sample name prefix (HIFI032* or
 # RY*), recompute AC/AN/NS from those GT columns only, and drop
 # snarls where no CPC sample carries any alt. HPRC-specific SVs are
 # therefore excluded; the HPRC contribution is already represented
 # in the HPRC SV tracks elsewhere in this lrSv supertrack.
 #
 # Pipeline (same build script, updated Python converter):
 cd /hive/data/genomes/hg38/bed/lrSv/cpc1
 bash ~/kent/src/hg/makeDb/scripts/lrSv/lrSvCpc1Build.sh
 # hs1 sites: 46,092 (down from 97,205 combined)
 # hg38 lifted: 36,030 (down from 81,261); 10,062 unmapped
+
+##########
+# 2026-04-22 Claude max
+#
+# colorsDbSv: rebuild both hg38 and hs1 bigBeds from the upstream
+# pbsv+Jasmine VCFs now that the CoLoRSdb project redistributes them
+# directly (placed in /hive/data/genomes/hg38/bed/lrSv/colorsDb/).
+# The previous bigBed came from an older build and declared `af` as a
+# string; the new build uses a checked-in converter, stores AF as a
+# float so the numeric filter works, and adds a derived `insLen`
+# column so the shared lrSv supertrack-level filter.insLen does not
+# error for this subtrack.
+
+cd /hive/data/genomes/hg38/bed/lrSv/colorsDb
+# Upstream VCFs (same pbsv.jasmine release, one per reference path):
+# CoLoRSdb.GRCh38.v1.2.0.pbsv.jasmine.vcf.gz (hg38, 426,239 SVs)
+# CoLoRSdb.CHM13.v1.2.0.pbsv.jasmine.vcf.gz (hs1, 839,714 SVs)
+bash ~/kent/src/hg/makeDb/scripts/lrSv/lrSvColorsDbSvBuild.sh
+# hg38: 59 MB, 192,534 DEL + 232,973 INS + 732 INV
+# hs1 : 87 MB (more variants due to T2T-added regions)
+# Existing /gbdb symlinks (sv.hg38.bb, sv.hs1.bb) are unchanged.
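
The commit message describes two per-record conversions: AF is stored as a float, and insLen is the alt-ref length delta for INS records and 0 otherwise. A minimal Python sketch of those two rules is given here for illustration. It is not the checked-in converter: the function names (parse_info, derive_ins_len, parse_af) are hypothetical, it assumes sequence-resolved REF/ALT alleles rather than symbolic ones such as <INS>, and the 0.0 default for a missing AF is an assumption.

```python
def parse_info(info_field: str) -> dict:
    """Split a VCF INFO column into a dict; flag-style keys map to True."""
    info = {}
    for item in info_field.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def derive_ins_len(svtype: str, ref: str, alt: str) -> int:
    """Alt-ref length delta for INS records, 0 for every other SV type.

    Assumes ALT is a literal sequence, not a symbolic allele.
    """
    if svtype != "INS":
        return 0
    return len(alt) - len(ref)


def parse_af(info: dict) -> float:
    """Return AF as a float so a numeric filter can compare it.

    Assumption: a missing AF becomes 0.0, and only the first value of a
    comma-separated AF list is used.
    """
    raw = info.get("AF")
    if raw is None or raw is True:
        return 0.0
    return float(str(raw).split(",")[0])


if __name__ == "__main__":
    info = parse_info("SVTYPE=INS;SVLEN=4;AC=3;AN=12;AF=0.25")
    print(derive_ins_len(info["SVTYPE"], "A", "ACGTT"), parse_af(info))
```

Storing the value as a float rather than a string is what lets a numeric range filter compare it, which matches the failure the commit message reports for the previous build.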