366afa4a74c46ec6fb2b667a2902a873feec40cf
max
  Mon Apr 20 23:00:05 2026 -0700
varFreqsAll: rebuild combined bigBed to include GA4K and CoLoRSdb

Regenerate the All Databases Combined track with the two long-read
PacBio subtracks (GA4K 552 samples and CoLoRSdb v1.2.0 1,027 samples)
that were added to varFreqs since the March build. Source count rises
from 21 to 23 databases; final bigBed is 37.7 GB with 1.17B records
and 113 fields. Also updates filterValues.sources and the per-database
AF/AC filters in varFreqs.ra for the two new sources, plus
databases.tsv and varFreqs.txt (build notes).

refs #36642

diff --git src/hg/makeDb/doc/hg38/varFreqs.txt src/hg/makeDb/doc/hg38/varFreqs.txt
index 0dcc1232735..990cf38c473 100644
--- src/hg/makeDb/doc/hg38/varFreqs.txt
+++ src/hg/makeDb/doc/hg38/varFreqs.txt
@@ -250,15 +250,40 @@
 # /hive/data/genomes/hg38/bed/lrSv/colorsDb/ (placed there when the
 # CoLoRSdb SV track was first built under lrSv). We just add VCF
 # symlinks under each assembly's varFreqs directory using a consistent
 # filename so the shared trackDb stanza can use $D.
 
 mkdir -p /gbdb/hg38/varFreqs/colorsDb /gbdb/hs1/varFreqs/colorsDb
 ln -sf /hive/data/genomes/hg38/bed/lrSv/colorsDb/CoLoRSdb.GRCh38.v1.2.0.deepvariant.glnexus.vcf.gz     /gbdb/hg38/varFreqs/colorsDb/colorsDbSnv.vcf.gz
 ln -sf /hive/data/genomes/hg38/bed/lrSv/colorsDb/CoLoRSdb.GRCh38.v1.2.0.deepvariant.glnexus.vcf.gz.tbi /gbdb/hg38/varFreqs/colorsDb/colorsDbSnv.vcf.gz.tbi
 ln -sf /hive/data/genomes/hg38/bed/lrSv/colorsDb/CoLoRSdb.CHM13.v1.2.0.deepvariant.glnexus.vcf.gz      /gbdb/hs1/varFreqs/colorsDb/colorsDbSnv.vcf.gz
 ln -sf /hive/data/genomes/hg38/bed/lrSv/colorsDb/CoLoRSdb.CHM13.v1.2.0.deepvariant.glnexus.vcf.gz.tbi  /gbdb/hs1/varFreqs/colorsDb/colorsDbSnv.vcf.gz.tbi
 
 # The varFreqs.ra trackDb file is already in human/ (shared for both
 # hg38 and hs1 via the human/trackDb.ra include), so no move was needed.
 # Only colorsDbSnv is expected to render on hs1 - the other varFreqs
 # subtracks have hg38-only data and will silently show nothing there.
+
+##########
+# 2026-04-20 Claude max
+#
+# Rebuilt varFreqsAll combined bigBed to include GA4K and CoLoRSdb
+# long-read PacBio subtracks that were added to varFreqs since the
+# last build (Mar 20).
+#
+# Steps (in /hive/data/genomes/hg38/bed/varFreqs/all):
+# 1. Added GA4K and CoLoRSdb rows to
+#      ~/kent/src/hg/makeDb/scripts/varFreqs/databases.tsv
+#    and appended their /gbdb paths to files.txt.
+# 2. Deleted merged.vcf.gz and merged.annotated.vcf.gz to force a full
+#    merge + bcftools csq re-annotation (per-file normalized VCFs
+#    from the previous run were kept; only the two new VCFs were
+#    normalized in step 3).
+# 3. Ran ./mergeAndAnnotate.sh (~55 min: 5 min per-file, ~15 min merge,
+#    ~35 min csq).
+# 4. Ran ./vcfToBigBed.py --output-prefix varFreqsAll --threads 8
+#    (Phase 1 pre-extract ~90 min, Phase 2 chrom BED build ~30 min).
+# 5. Ran bedToBigBed on the 275 GB sorted BED (~2 h) to produce the
+#    37.7 GB varFreqsAll.bb with 1,165,666,478 records and 113 fields.
+# 6. Updated varFreqs.ra filterValues.sources and added
+#    filterByRange.GA4KAF/AC and filterByRange.CoLoRSdbAF/AC.
+# Existing /gbdb/hg38/varFreqs/varFreqsAll.bb symlink was preserved.