65da29c9d74d4dd832ab7f16899ad3b209b92da4
max
  Wed May 6 08:43:57 2026 -0700
varFreqs: 5 vcfToBigBed.py fixes + add NPM Singapore to combined track

vcfToBigBed.py and mergeAndAnnotate.sh moved into kent (they were
hive-only); the build is now reproducible from a fresh kent checkout.

Five vcfToBigBed.py fixes (all caught by Lou's QA pass on #36642):

- normalize_consequence(): bcftools csq emits "&"-joined compound terms
like "stop_gained&frameshift", which failed the exact match against the
old 8-bucket consequence filter and orphaned ~8.5M records. The fix
rewrites "&" to "," so a single record can match multiple buckets, and
appends ",others" to any token list that contains no named-filter token.
The trackDb filter gains 4 buckets (3' UTR, 5' UTR, Non-coding, Other)
and switches to filterType.consequence multipleListOr.

- Source-attribution bug: the old check only inspected the unified AC/AF
slot. AllOfUs ships only per-population fields ("." in the unified
slot), so all 67M+ AllOfUs variants got no source attribution -- ~43M
rows in the previous bigBed had an empty "sources" column. Fix scans
per-population slots before declaring "no data".

- parse_bcsq() returns "" instead of "." for aaChange/dnaChange on
non-coding variants, so the mouseOver and detail page render a clean
blank line.

- maxAF format: "{:.6g}" -> "{:.6f}" so very small AFs render as
"0.000003" instead of "3.31347e-06".

- autoSql `table varFreqs` -> `table varFreqsAll` (matches the bigBed
filename; required for hgIntegrator wiring).
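A minimal Python sketch of the normalize_consequence() fix, assuming the named buckets are exactly the tokens listed in filterValues.consequence in varFreqs.ra. `matches_filter` is a hypothetical helper that models multipleListOr as "keep the record if any comma-separated token is selected". That is the behavior this change relies on, but it is not kent's filter code.

```python
# Assumed to mirror the named tokens in filterValues.consequence (varFreqs.ra).
# "others" is deliberately absent: it is the fallback bucket.
NAMED_BUCKETS = {
    "missense", "synonymous", "stop_gained", "frameshift",
    "splice_donor", "splice_acceptor", "intron",
    "3_prime_utr", "5_prime_utr", "non_coding", ".",
}


def normalize_consequence(csq: str) -> str:
    """Turn a bcftools csq term such as 'stop_gained&frameshift' into a
    comma-separated token list, adding 'others' when no named bucket applies."""
    tokens = [t for t in csq.replace("&", ",").split(",") if t]
    if not any(t in NAMED_BUCKETS for t in tokens):
        tokens.append("others")
    return ",".join(tokens)


def matches_filter(field: str, selected: set) -> bool:
    """Hypothetical model of a multipleListOr filter: any shared token matches."""
    return any(t in selected for t in field.split(","))


norm = normalize_consequence("stop_gained&frameshift")
print(norm)                                        # stop_gained,frameshift
print(matches_filter(norm, {"frameshift"}))        # True
print(normalize_consequence("inframe_deletion"))   # inframe_deletion,others
```

Under the old exact-match filter, the raw string "stop_gained&frameshift" equals none of the eight values. Selecting either bucket therefore hid the record.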
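For the source-attribution fix, here is a hedged sketch of "scan per-population slots before declaring no data". It assumes a row is a dict of field name to string value, with keys that follow the trackDb field names (AllOfUsAF, AllOfUsAF_AFR, ...). `has_data` and `sources_for` are hypothetical names, and the population lists in the example are illustrative.

```python
# Values treated as "no data" in a unified or per-population slot.
MISSING = {"", "."}


def has_data(row: dict, db: str, pops: list) -> bool:
    """True if `db` has any AC/AF value in this row: unified slots first,
    then every per-population slot."""
    keys = [f"{db}AC", f"{db}AF"]
    for pop in pops:
        keys += [f"{db}AC_{pop}", f"{db}AF_{pop}"]
    return any(row.get(k, ".") not in MISSING for k in keys)


def sources_for(row: dict, db_pops: dict) -> str:
    """Comma-separated list of databases that contribute data to this row."""
    return ",".join(db for db, pops in db_pops.items() if has_data(row, db, pops))


# AllOfUs-style row: unified slots are ".", but one per-population AF is set.
row = {"AllOfUsAC": ".", "AllOfUsAF": ".", "AllOfUsAF_AFR": "0.0123",
       "TOPMedAC": "5", "TOPMedAF": "0.2"}
db_pops = {"AllOfUs": ["AFR", "EUR"], "TOPMed": [], "NPM": ["Chinese"]}
print(sources_for(row, db_pops))  # AllOfUs,TOPMed
```

With an empty population list, `has_data` reduces to the old unified-slot-only check. That is why AllOfUs rows used to come back unattributed.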
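For the parse_bcsq() change, here is a sketch of the "" convention. It assumes the standard bcftools csq BCSQ layout Consequence|gene|transcript|biotype|strand|amino_acid_change|dna_change and uses only the first comma-separated annotation. The (consequence, aaChange, dnaChange) return triple is hypothetical, because the real function's signature is not shown in this commit. The gene and transcript names below are placeholders.

```python
# Assumed BCSQ layout: Consequence|gene|transcript|biotype|strand|amino_acid_change|dna_change
BCSQ_FIELDS = 7


def parse_bcsq(bcsq: str) -> tuple:
    """Return (consequence, aaChange, dnaChange) from the first BCSQ annotation.

    Missing or '.' change fields come back as '' so the mouseOver and detail
    page show an empty value instead of a literal '.'.
    """
    parts = bcsq.split(",")[0].split("|")
    parts += [""] * (BCSQ_FIELDS - len(parts))  # non-coding entries are shorter

    def clean(value: str) -> str:
        return "" if value == "." else value

    # The consequence itself keeps "." so intergenic records still hit the "." bucket.
    return parts[0], clean(parts[5]), clean(parts[6])


print(parse_bcsq("intron|GENE1|ENST0001|protein_coding"))  # ('intron', '', '')
```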
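The maxAF format change, shown with Python's standard format specifiers. `format_max_af` is an illustrative wrapper, not a function from vcfToBigBed.py.

```python
def format_max_af(af: float) -> str:
    """Render maxAF as fixed-point with six decimals (the new "{:.6f}")."""
    return "{:.6f}".format(af)


old = "{:.6g}".format(3.31347e-06)  # six significant digits, switches to exponent form
new = format_max_af(3.31347e-06)
print(old, new)  # 3.31347e-06 0.000003
```

The trade-off is precision: "{:.6f}" keeps only one significant digit near 1e-6, and any AF below 0.0000005 (for example 4e-7) prints as "0.000000".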

NPM Singapore (SG10K_Health, 9.7k WGS) added to databases.tsv,
files.txt, populations.tsv (SgChinese / SgMalay / SgIndian) and the
trackDb filter UI. The individual NPM subtrack keeps tableBrowser off
(license restriction); its data is folded into varFreqsAll, the same
as finngen / kova / mgrb / swefreq / tishkoff180.
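The per-group NPM filters added to varFreqs.ra follow a fixed naming pattern. The field is <db><AF|AC>_<group> and the label is "<db label> <group> <AF|AC>", with all AF entries before all AC entries. The hypothetical helper below is not part of the kent tree; it reproduces those twelve lines without indentation. The top-level NPMAF / NPMAC filters are separate and are not generated here.

```python
def population_filter_lines(db: str, label: str, groups: list) -> list:
    """Emit filterByRange/filterLabel pairs for per-group AF, then AC, fields."""
    lines = []
    for stat in ("AF", "AC"):
        for group in groups:
            field = f"{db}{stat}_{group}"
            lines.append(f"filterByRange.{field} on")
            lines.append(f"filterLabel.{field} {label} {group} {stat}")
    return lines


for line in population_filter_lines("NPM", "NPM Singapore", ["Chinese", "Malay", "Indian"]):
    print(line)
```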

varFreqsAll bigBed rebuild is in progress at
/hive/data/genomes/hg38/bed/varFreqs/all/; it will land in /gbdb when
the bedToBigBed step completes.

refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/varFreqs.ra src/hg/makeDb/trackDb/human/varFreqs.ra
index 9ed3c9f546f..7b805e500b9 100644
--- src/hg/makeDb/trackDb/human/varFreqs.ra
+++ src/hg/makeDb/trackDb/human/varFreqs.ra
@@ -8,34 +8,35 @@
 
         track varFreqsAll
         shortLabel All Databases Combined
         longLabel Variant Frequencies: All Databases Combined with Consequence Annotations
         type bigBed 9 +
         parent varFreqs on
         bigDataUrl /gbdb/$D/varFreqs/varFreqsAll.bb
         visibility pack
         itemRgb on
         maxWindowToDraw 5000000
         priority 0.1
         mouseOver <b>Var:</b> $name<br><b>AA change:</b> $aaChange<br><b>Var type:</b> $varType<br><b>Conseq:</b> $consequence<br><b>Max AF:</b> $maxAF<br><b>Total AC:</b> $totalAC<br><b>Sources:</b> $sources
         # Variant type and consequence filters
         filterValues.varType SNV|SNV,INS|Insertion,DEL|Deletion,MNV|MNV
         filterLabel.varType Variant Type
-        filterValues.consequence missense|Missense,synonymous|Synonymous,stop_gained|Stop Gained,frameshift|Frameshift,splice_donor|Splice Donor,splice_acceptor|Splice Acceptor,intron|Intron,.|Intergenic
+        filterValues.consequence missense|Missense,synonymous|Synonymous,stop_gained|Stop Gained,frameshift|Frameshift,splice_donor|Splice Donor,splice_acceptor|Splice Acceptor,intron|Intron,3_prime_utr|3' UTR,5_prime_utr|5' UTR,non_coding|Non-coding,.|Intergenic,others|Other
+        filterType.consequence multipleListOr
         filterLabel.consequence Consequence
         # Source database filter
-        filterValues.sources AllOfUs|AllOfUs,SPARK|SPARK WES,SFARI_WGS|SFARI WGS,GenomeAsia|GenomeAsia SNVs,GenomeAsiaIndel|GenomeAsia Indels,KOVA|KOVA Korea,ToMMo|ToMMo Japan,IndiGen|IndiGenomes India,FinnGen|FinnGen Finland,Saudi|Saudi,SweGen|SweGen Sweden,TOPMed|TOPMed,ABraOM|ABraOM Brazil,ALFA|ALFA,MGRB|MGRB Australia,HRC|HRC,MexBB|Mexico Biobank,SGDP|SGDP,HGDP1kG|gnomAD HGDP+1kG,GREGoR|GREGoR,SCHEMA|SCHEMA,GA4K|GA4K PacBio LR,CoLoRSdb|CoLoRSdb PacBio LR,SVatalog|SVatalog 101 10XG SR,Tishkoff180|Tishkoff 180 African WGS
+        filterValues.sources AllOfUs|AllOfUs,SPARK|SPARK WES,SFARI_WGS|SFARI WGS,GenomeAsia|GenomeAsia SNVs,GenomeAsiaIndel|GenomeAsia Indels,KOVA|KOVA Korea,ToMMo|ToMMo Japan,IndiGen|IndiGenomes India,FinnGen|FinnGen Finland,Saudi|Saudi,SweGen|SweGen Sweden,TOPMed|TOPMed,ABraOM|ABraOM Brazil,ALFA|ALFA,MGRB|MGRB Australia,HRC|HRC,MexBB|Mexico Biobank,SGDP|SGDP,HGDP1kG|gnomAD HGDP+1kG,GREGoR|GREGoR,SCHEMA|SCHEMA,GA4K|GA4K PacBio LR,CoLoRSdb|CoLoRSdb PacBio LR,SVatalog|SVatalog 101 10XG SR,Tishkoff180|Tishkoff 180 African WGS,NPM|NPM Singapore
         filterType.sources multipleListOr
         filterLabel.sources Source Database
         # Length filters
         filterByRange.refLen on
         filterLabel.refLen Reference Length
         filterByRange.altLen on
         filterLabel.altLen Alternate Length
         filterByRange.varLen on
         filterLabel.varLen Length Change
         # Max AF filter
         filterByRange.maxAF on
         filterLabel.maxAF Max Allele Frequency
         filterLimits.maxAF 0:1
         # Total AC filter
         filterByRange.totalAC on
@@ -79,30 +80,32 @@
         filterLabel.SGDPAF SGDP AF
         filterByRange.HGDP1kGAF on
         filterLabel.HGDP1kGAF gnomAD HGDP+1kG AF (4k cohort)
         filterByRange.GREGoRAF on
         filterLabel.GREGoRAF GREGoR AF
         filterByRange.SCHEMAAF on
         filterLabel.SCHEMAAF SCHEMA AF
         filterByRange.GA4KAF on
         filterLabel.GA4KAF GA4K PacBio LR AF
         filterByRange.CoLoRSdbAF on
         filterLabel.CoLoRSdbAF CoLoRSdb PacBio LR AF
         filterByRange.SVatalogAF on
         filterLabel.SVatalogAF SVatalog 101 10XG SR AF
         filterByRange.Tishkoff180AF on
         filterLabel.Tishkoff180AF Tishkoff 180 African WGS AF
+        filterByRange.NPMAF on
+        filterLabel.NPMAF NPM Singapore AF
         # Per-database AC filters
         filterByRange.AllOfUsAC on
         filterLabel.AllOfUsAC AllOfUs AC
         filterByRange.SPARKAC on
         filterLabel.SPARKAC SPARK WES AC
         filterByRange.SFARI_WGSAC on
         filterLabel.SFARI_WGSAC SFARI WGS AC
         filterByRange.GenomeAsiaAC on
         filterLabel.GenomeAsiaAC GenomeAsia SNVs AC
         filterByRange.GenomeAsiaIndelAC on
         filterLabel.GenomeAsiaIndelAC GenomeAsia Indels AC
         filterByRange.KOVAAC on
         filterLabel.KOVAAC KOVA Korea AC
         filterByRange.ToMMoAC on
         filterLabel.ToMMoAC ToMMo Japan AC
@@ -130,30 +133,32 @@
         filterLabel.SGDPAC SGDP AC
         filterByRange.HGDP1kGAC on
         filterLabel.HGDP1kGAC gnomAD HGDP+1kG AC (4k cohort)
         filterByRange.GREGoRAC on
         filterLabel.GREGoRAC GREGoR AC
         filterByRange.SCHEMAAC on
         filterLabel.SCHEMAAC SCHEMA AC
         filterByRange.GA4KAC on
         filterLabel.GA4KAC GA4K PacBio LR AC
         filterByRange.CoLoRSdbAC on
         filterLabel.CoLoRSdbAC CoLoRSdb PacBio LR AC
         filterByRange.SVatalogAC on
         filterLabel.SVatalogAC SVatalog 101 10XG SR AC
         filterByRange.Tishkoff180AC on
         filterLabel.Tishkoff180AC Tishkoff 180 African WGS AC
+        filterByRange.NPMAC on
+        filterLabel.NPMAC NPM Singapore AC
         # Population-specific AF filters
         # AllOfUs local-ancestry populations
         # NB: these are local-ancestry-stratified frequencies (per-position, per-haplotype-class),
         # NOT the AllOfUs paper's global Rye ancestry categories. See varFreqs.html for details.
         filterByRange.AllOfUsAF_AFR on
         filterLabel.AllOfUsAF_AFR AllOfUs African AF (local ancestry)
         filterByRange.AllOfUsAF_AMR on
         filterLabel.AllOfUsAF_AMR AllOfUs Indigenous American AF (local ancestry)
         filterByRange.AllOfUsAF_EAS on
         filterLabel.AllOfUsAF_EAS AllOfUs East Asian AF (local ancestry)
         filterByRange.AllOfUsAF_EUR on
         filterLabel.AllOfUsAF_EUR AllOfUs European AF (local ancestry)
         filterByRange.AllOfUsAF_OCE on
         filterLabel.AllOfUsAF_OCE AllOfUs Oceanian AF (local ancestry)
         filterByRange.AllOfUsAF_SAS on
@@ -243,30 +248,43 @@
         filterByRange.HGDP1kGAC_sas on
         filterLabel.HGDP1kGAC_sas gnomAD v3.1.2 South Asian AC (full release)
         # GREGoR populations
         filterByRange.GREGoRAF_AFF on
         filterLabel.GREGoRAF_AFF GREGoR Affected AF
         filterByRange.GREGoRAF_UNA on
         filterLabel.GREGoRAF_UNA GREGoR Unaffected AF
         filterByRange.GREGoRAF_UNK on
         filterLabel.GREGoRAF_UNK GREGoR Unknown AF
         filterByRange.GREGoRAC_AFF on
         filterLabel.GREGoRAC_AFF GREGoR Affected AC
         filterByRange.GREGoRAC_UNA on
         filterLabel.GREGoRAC_UNA GREGoR Unaffected AC
         filterByRange.GREGoRAC_UNK on
         filterLabel.GREGoRAC_UNK GREGoR Unknown AC
+        # NPM Singapore ancestry groups
+        filterByRange.NPMAF_Chinese on
+        filterLabel.NPMAF_Chinese NPM Singapore Chinese AF
+        filterByRange.NPMAF_Malay on
+        filterLabel.NPMAF_Malay NPM Singapore Malay AF
+        filterByRange.NPMAF_Indian on
+        filterLabel.NPMAF_Indian NPM Singapore Indian AF
+        filterByRange.NPMAC_Chinese on
+        filterLabel.NPMAC_Chinese NPM Singapore Chinese AC
+        filterByRange.NPMAC_Malay on
+        filterLabel.NPMAC_Malay NPM Singapore Malay AC
+        filterByRange.NPMAC_Indian on
+        filterLabel.NPMAC_Indian NPM Singapore Indian AC
         skipEmptyFields on
        
         track allofus
         shortLabel AllOfUs v7 245k WGS
         longLabel Variant Frequencies: AllOfUs v7 - 245k WGS, local-ancestry-stratified, AC>=20
         type vcfTabix
         parent varFreqs on
         bigDataUrl /gbdb/$D/varFreqs/allofus/allOfUs.locAncFreq.vcf.gz
         dataVersion V7
         visibility dense
         priority 0.5
 
         #track me
         #shortLabel Regeneron Million Exomes 983k WES
         #longLabel Variant Frequencies: Regeneron One Million Exomes (ME) Project - 983k WGS