65da29c9d74d4dd832ab7f16899ad3b209b92da4
max
  Wed May 6 08:43:57 2026 -0700
varFreqs: 5 vcfToBigBed.py fixes + add NPM Singapore to combined track

vcfToBigBed.py and mergeAndAnnotate.sh moved into kent (they were
hive-only); the build is now reproducible from a fresh kent checkout.

Five vcfToBigBed.py fixes (all caught by Lou's QA pass on #36642):

- normalize_consequence(): bcftools csq emits "&"-joined compound terms
like "stop_gained&frameshift", which failed the exact-match test in the
old 8-bucket consequence filter and orphaned ~8.5M records. The function
rewrites "&" to "," so a single record can match multiple buckets, and
appends ",others" to any token list that contains no named-filter token.
trackDb gains 4 buckets (3' UTR, 5' UTR, Non-coding, Other) and switches
to filterType.consequence multipleListOr.

- Source-attribution bug: the old check only inspected the unified AC/AF
slot. AllOfUs ships only per-population fields ("." in the unified
slot), so all 67M+ AllOfUs variants got no source attribution, and ~43M
rows in the previous bigBed had an empty "sources" column. The fix scans
the per-population slots before declaring "no data".

- parse_bcsq() returns "" instead of "." for aaChange/dnaChange on
non-coding variants, so the mouseOver and detail page render a clean
blank line.

- maxAF format: "{:.6g}" -> "{:.6f}" so very small AFs render as
"0.000003" instead of "3.31347e-06".

- autoSql `table varFreqs` -> `table varFreqsAll` (matches the bigBed
filename; required for hgIntegrator wiring).
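
A minimal Python sketch of the first, second and fourth fixes above. It
is not the code in vcfToBigBed.py: NAMED_BUCKETS, has_data() and
format_max_af() are hypothetical names, and the bucket tokens are
placeholders for whatever the trackDb filter actually lists.

```python
# Assumption: placeholder tokens, not the real trackDb filter values.
NAMED_BUCKETS = {"missense", "stop_gained", "frameshift", "synonymous"}


def normalize_consequence(csq, named=NAMED_BUCKETS):
    """Turn "&"-joined compound terms into a comma list.

    Appends "others" when no token belongs to a named filter bucket,
    so the record still matches one bucket of a multipleListOr filter.
    """
    tokens = [t for t in csq.replace("&", ",").split(",") if t]
    if not any(t in named for t in tokens):
        tokens.append("others")
    return ",".join(tokens)


def has_data(unified, per_population):
    """True if the unified slot or any per-population slot has a value.

    "." and "" are treated as missing, following the databases.tsv
    convention of "." for fields that do not exist in the VCF.
    """
    def present(value):
        return value not in (None, "", ".")

    return present(unified) or any(present(v) for v in per_population)


def format_max_af(af):
    """Fixed-point formatting, so small AFs never use exponent notation."""
    return "{:.6f}".format(af)


print(normalize_consequence("stop_gained&frameshift"))  # stop_gained,frameshift
print(normalize_consequence("intergenic"))              # intergenic,others
print(has_data(".", [".", "0.01"]))                     # True
print(format_max_af(3.31347e-06))                       # 0.000003
```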

NPM Singapore (SG10K_Health, 9.7k WGS) added to databases.tsv,
files.txt, populations.tsv (SgChinese / SgMalay / SgIndian) and the
trackDb filter UI. The NPM individual subtrack stays tableBrowser off
(license); its data is folded into varFreqsAll, same as finngen / kova /
mgrb / swefreq / tishkoff180.

The varFreqsAll bigBed rebuild is in progress at
/hive/data/genomes/hg38/bed/varFreqs/all/; it will land in /gbdb when
the bedToBigBed step completes.

refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/scripts/varFreqs/databases.tsv src/hg/makeDb/scripts/varFreqs/databases.tsv
index 142589ff57b..66ec3032f7b 100644
--- src/hg/makeDb/scripts/varFreqs/databases.tsv
+++ src/hg/makeDb/scripts/varFreqs/databases.tsv
@@ -1,23 +1,24 @@
 # Database configuration for varFreqsAll combined track
 # key	name	vcf	ac_field	af_field
 # Use "." for fields that don't exist in the VCF
 AllOfUs	AllOfUs	/gbdb/hg38/varFreqs/allofus/allOfUs.locAncFreq.vcf.gz	.	.
 SPARK	SPARK WES	/gbdb/hg38/varFreqs/sfari/SPARK.iWES_v3.2024_08.deepvariant.norm.vcf.gz	AC	AF
 SFARI_WGS	SFARI WGS	/gbdb/hg38/varFreqs/sfari/wgs_12519_genome.deepvariant.norm.vcf.gz	AC	AF
 GenomeAsia	GenomeAsia SNVs	/gbdb/hg38/varFreqs/ga100k/ga100k.subst.vcf.gz	AC	AF
 GenomeAsiaIndel	GenomeAsia Indels	/gbdb/hg38/varFreqs/ga100k/ga100k.indels.vcf.gz	AC	AF
+NPM	NPM Singapore	/gbdb/hg38/varFreqs/npm/SG10K_Health_r5.3.2.sites.vcf.bgz	AC	AF
 KOVA	KOVA Korea	/gbdb/hg38/varFreqs/kova/kova.v7.vcf.gz	AC	AF
 ToMMo	ToMMo Japan	/gbdb/hg38/varFreqs/tommo61kjpn/tommo-61kjpn-20250616-GRCh38-snvindel-af-autosome.vcf.gz	AC	AF
 IndiGen	IndiGenomes India	/gbdb/hg38/varFreqs/indigenomes/IndiGenomes_Variants.vcf.gz	AC	AF
 FinnGen	FinnGen Finland	/gbdb/hg38/varFreqs/finngen/finnge_R12_annotated_variants_v1.vcf.gz	AC	AF
 Saudi	Saudi	/gbdb/hg38/varFreqs/saudi/saudi.vcf.gz	AC	AF
 SweGen	SweGen Sweden	/gbdb/hg38/varFreqs/swefreq/swegen_frequencies_fixploidy_GRCh38_20190204.vcf.gz	AC	AF
 TOPMed	TOPMed	/gbdb/hg38/varFreqs/topmed/topmed10.vcf.gz	AC	AF
 ABraOM	ABraOM Brazil	/gbdb/hg38/varFreqs/abraom/abraom.vcf.gz	.	AF
 ALFA	ALFA	/gbdb/hg38/varFreqs/alfa/ALFA.vcf.gz	.	AF_GLB
 MGRB	MGRB Australia	/gbdb/hg38/varFreqs/mgrb/MGRB.phase3.GRCh38.norm.vcf.gz	AC	AF
 HRC	HRC	/gbdb/hg38/varFreqs/hrc/hrc.vcf.gz	AC	AF
 MexBB	Mexico Biobank	/gbdb/hg38/varFreqs/mxb/mxb.freq.vcf.gz	AC	AF
 SGDP	SGDP	/gbdb/hg38/varFreqs/sgdpFreq/sgdp.freq.vcf.gz	AC	AF
 HGDP1kG	gnomAD HGDP+1kG	/gbdb/hg38/varFreqs/hgdp1kFreq/hgdp1k.freq.vcf.gz	AC	AF
 GREGoR	GREGoR	/gbdb/hg38/varFreqs/gregor/gregor.vcf.gz	AC	AF