65da29c9d74d4dd832ab7f16899ad3b209b92da4
max
  Wed May 6 08:43:57 2026 -0700
varFreqs: 5 vcfToBigBed.py fixes + add NPM Singapore to combined track

vcfToBigBed.py and mergeAndAnnotate.sh moved into kent (they were
hive-only); the build is now reproducible from a fresh kent checkout.

Five vcfToBigBed.py fixes (all caught by Lou's QA pass on #36642):

- normalize_consequence(): bcftools csq emits "&"-joined compound terms
like "stop_gained&frameshift", which failed the exact match in the old
8-bucket consequence filter and orphaned ~8.5M records. The function
rewrites "&" to "," so a single record can match multiple buckets, and
appends ",others" to any token list that contains no named-filter
token. trackDb gains 4 buckets (3' UTR, 5' UTR, Non-coding, Other) and
switches to filterType.consequence multipleListOr.

- Source-attribution bug: the old check only inspected the unified AC/AF
slot. AllOfUs ships only per-population fields ("." in the unified
slot), so all 67M+ AllOfUs variants got no source attribution -- ~43M
rows in the previous bigBed had an empty "sources" column. The fix scans
the per-population slots before declaring "no data".

- parse_bcsq() returns "" instead of "." for aaChange/dnaChange on
non-coding variants, so the mouseOver and detail page render a clean
blank line.

- maxAF format: "{:.6g}" -> "{:.6f}" so very small AFs render as
"0.000003" instead of "3.31347e-06".

- autoSql `table varFreqs` -> `table varFreqsAll` (matches the bigBed
filename; required for hgIntegrator wiring).
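
A minimal sketch of three of the fixes above, for reviewers who want to
see the intended behavior without opening vcfToBigBed.py. This is not
the code from the script: NAMED_BUCKETS is a made-up subset of the
filter buckets, and has_frequency_data() and format_max_af() are
hypothetical helper names used only for illustration.

```python
# Hypothetical subset of the named filter buckets; the real list lives in
# vcfToBigBed.py / trackDb and is not reproduced here.
NAMED_BUCKETS = {"missense", "synonymous", "stop_gained", "frameshift"}


def normalize_consequence(csq):
    """Rewrite '&' to ',' and append 'others' if no token is a named bucket."""
    tokens = [t for t in csq.replace("&", ",").split(",") if t]
    if not any(t in NAMED_BUCKETS for t in tokens):
        tokens.append("others")
    return ",".join(tokens)


def has_frequency_data(unified, per_population):
    """True if the unified AC/AF slot or any per-population slot is set.

    Assumes '.' (or an empty value) marks a missing slot, as in VCF INFO.
    """
    slots = [unified] + list(per_population)
    return any(s not in (".", "", None) for s in slots)


def format_max_af(af):
    """Fixed-point with six decimals, so tiny AFs never switch to exponents."""
    return "{:.6f}".format(af)


print(normalize_consequence("stop_gained&frameshift"))
print(normalize_consequence("intron"))
print(has_frequency_data(".", [".", "0.01"]))
print(format_max_af(3.31347e-06))
```

With "{:.6g}" the last value would print as "3.31347e-06"; "{:.6f}"
prints "0.000003" instead.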

NPM Singapore (SG10K_Health, 9.7k WGS) added to databases.tsv,
files.txt, populations.tsv (SgChinese / SgMalay / SgIndian) and the
trackDb filter UI. The individual NPM subtrack stays tableBrowser off
(license); its data is folded into varFreqsAll, the same as finngen /
kova / mgrb / swefreq / tishkoff180.

The varFreqsAll bigBed rebuild is in progress at
/hive/data/genomes/hg38/bed/varFreqs/all/; it will land in /gbdb when
the bedToBigBed step completes.

refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/scripts/varFreqs/populations.tsv src/hg/makeDb/scripts/varFreqs/populations.tsv
index 86333218b55..b4e2d61e7db 100644
--- src/hg/makeDb/scripts/varFreqs/populations.tsv
+++ src/hg/makeDb/scripts/varFreqs/populations.tsv
@@ -1,32 +1,36 @@
 # Population breakdown configuration for varFreqsAll combined track
 # db_key	pop_key	pop_name	ac_field	af_field
 # AllOfUs local ancestry populations
 AllOfUs	AFR	African	AC_AFR	AF_AFR
 AllOfUs	AMR	Indigenous American	AC_AMR	AF_AMR
 AllOfUs	EAS	East Asian	AC_EAS	AF_EAS
 AllOfUs	EUR	European	AC_EUR	AF_EUR
 AllOfUs	OCE	Oceanian	AC_OCE	AF_OCE
 AllOfUs	SAS	South Asian	AC_SAS	AF_SAS
 # GenomeAsia populations (7 groups in source VCF)
 GenomeAsia	NEA	Northeast Asian	AC_NEA	AF_NEA
 GenomeAsia	SEA	Southeast Asian	AC_SEA	AF_SEA
 GenomeAsia	SAS	South Asian	AC_SAS	AF_SAS
 GenomeAsia	OCE	Oceanian	AC_OCE	AF_OCE
 GenomeAsia	AMR	American	AC_AMR	AF_AMR
 GenomeAsia	AFR	African	AC_AFR	AF_AFR
 GenomeAsia	WER	Western European Ref	AC_WER	AF_WER
 # gnomAD HGDP+1kG continental groups
 HGDP1kG	afr	African	gnomad_AC_afr	gnomad_AF_afr
 HGDP1kG	ami	Amish	gnomad_AC_ami	gnomad_AF_ami
 HGDP1kG	amr	Latino	gnomad_AC_amr	gnomad_AF_amr
 HGDP1kG	asj	Ashkenazi Jewish	gnomad_AC_asj	gnomad_AF_asj
 HGDP1kG	eas	East Asian	gnomad_AC_eas	gnomad_AF_eas
 HGDP1kG	fin	Finnish	gnomad_AC_fin	gnomad_AF_fin
 HGDP1kG	mid	Middle Eastern	gnomad_AC_mid	gnomad_AF_mid
 HGDP1kG	nfe	Non-Finnish European	gnomad_AC_nfe	gnomad_AF_nfe
 HGDP1kG	oth	Other	gnomad_AC_oth	gnomad_AF_oth
 HGDP1kG	sas	South Asian	gnomad_AC_sas	gnomad_AF_sas
 # GREGoR affected/unaffected breakdown
 GREGoR	AFF	Affected	AC_AFFECTED	.
 GREGoR	UNA	Unaffected	AC_UNAFFECTED	.
 GREGoR	UNK	Unknown	AC_UNKNOWN	.
+# NPM Singapore (SG10K_Health) ancestry groups
+NPM	Chinese	Singapore Chinese	AC_SgChinese	AF_SgChinese
+NPM	Malay	Singapore Malay	AC_SgMalay	AF_SgMalay
+NPM	Indian	Singapore Indian	AC_SgIndian	AF_SgIndian