65da29c9d74d4dd832ab7f16899ad3b209b92da4 max Wed May 6 08:43:57 2026 -0700
varFreqs: 5 vcfToBigBed.py fixes + add NPM Singapore to combined track

vcfToBigBed.py and mergeAndAnnotate.sh moved into kent (they were hive-only); the build is now reproducible from a fresh kent checkout.

Five vcfToBigBed.py fixes (all caught by Lou's QA pass on #36642):

- normalize_consequence(): bcftools csq emits "&"-joined compound terms like "stop_gained&frameshift", which failed to exact-match the old 8-bucket consequence filter and orphaned ~8.5M records. The function now rewrites "&" to "," so a single record can match multiple buckets, and appends ",others" to any token list with no named-filter token. trackDb gains 4 buckets (3' UTR, 5' UTR, Non-coding, Other) and switches to filterType.consequence multipleListOr.
- Source-attribution bug: the old check only inspected the unified AC/AF slot. AllOfUs ships only per-population fields ("." in the unified slot), so all 67M+ AllOfUs variants got no source attribution -- ~43M rows in the previous bigBed had an empty "sources" column. The fix scans per-population slots before declaring "no data".
- parse_bcsq() returns "" instead of "." for aaChange/dnaChange on non-coding variants, so the mouseOver and detail page render a clean blank line.
- maxAF format: "{:.6g}" -> "{:.6f}" so very small AFs render as "0.000003" instead of "3.31347e-06".
- autoSql `table varFreqs` -> `table varFreqsAll` (matches the bigBed filename; required for hgIntegrator wiring).

NPM Singapore (SG10K_Health, 9.7k WGS) added to databases.tsv, files.txt, populations.tsv (SgChinese / SgMalay / SgIndian) and the trackDb filter UI. The individual NPM subtrack stays tableBrowser off (license); its data is folded into varFreqsAll the same way as finngen / kova / mgrb / swefreq / tishkoff180.

The varFreqsAll bigBed rebuild is in progress at /hive/data/genomes/hg38/bed/varFreqs/all/; it will land in /gbdb when the bedToBigBed step completes.
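The two logic fixes described above (consequence normalization and per-population source attribution) can be sketched in Python, the script's own language. This is a hypothetical reconstruction from the commit message, not the code in kent: the `NAMED_TOKENS` set, the helper name `source_has_data()`, and treating "." (Intergenic) as a named token are assumptions. Only the behavior (rewrite "&" to ",", append "others" when no named bucket matches, fall back to per-population slots when the unified slot is ".") comes from the description.

```python
from typing import Iterable

# Hypothetical: the consequence keys listed in filterValues.consequence.
# Including "." (Intergenic) here is an assumption.
NAMED_TOKENS = frozenset({
    "missense", "synonymous", "stop_gained", "frameshift",
    "splice_donor", "splice_acceptor", "intron",
    "3_prime_utr", "5_prime_utr", "non_coding", ".",
})

MISSING = {"", "."}  # VCF-style "no value" markers


def normalize_consequence(csq: str) -> str:
    """Sketch: turn a bcftools csq term into a comma-separated token list.

    "stop_gained&frameshift" -> "stop_gained,frameshift", so the record can
    match several filter buckets; lists with no named token get "others".
    """
    tokens = [t for t in csq.replace("&", ",").split(",") if t]
    if not any(t in NAMED_TOKENS for t in tokens):
        tokens.append("others")
    return ",".join(tokens)


def source_has_data(unified: str, per_population: Iterable[str]) -> bool:
    """Sketch: a source counts if its unified AC/AF slot or any
    per-population slot holds a value (AllOfUs fills only the latter)."""
    if unified not in MISSING:
        return True
    return any(v not in MISSING for v in per_population)


print(normalize_consequence("stop_gained&frameshift"))  # stop_gained,frameshift
print(normalize_consequence("inframe_deletion"))         # inframe_deletion,others
print(source_has_data(".", [".", "0.0123", "."]))        # True
```

The appended "others" token is what the new `others|Other` entry in filterValues.consequence keys on, so records whose only terms are outside the named buckets remain reachable from the filter UI.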
refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/varFreqs.ra src/hg/makeDb/trackDb/human/varFreqs.ra
index 9ed3c9f546f..7b805e500b9 100644
--- src/hg/makeDb/trackDb/human/varFreqs.ra
+++ src/hg/makeDb/trackDb/human/varFreqs.ra
@@ -8,34 +8,35 @@ track varFreqsAll
 shortLabel All Databases Combined
 longLabel Variant Frequencies: All Databases Combined with Consequence Annotations
 type bigBed 9 +
 parent varFreqs on
 bigDataUrl /gbdb/$D/varFreqs/varFreqsAll.bb
 visibility pack
 itemRgb on
 maxWindowToDraw 5000000
 priority 0.1
 mouseOver <b>Var:</b> $name<br><b>AA change:</b> $aaChange<br><b>Var type:</b> $varType<br><b>Conseq:</b> $consequence<br><b>Max AF:</b> $maxAF<br><b>Total AC:</b> $totalAC<br><b>Sources:</b> $sources
 # Variant type and consequence filters
 filterValues.varType SNV|SNV,INS|Insertion,DEL|Deletion,MNV|MNV
 filterLabel.varType Variant Type
-filterValues.consequence missense|Missense,synonymous|Synonymous,stop_gained|Stop Gained,frameshift|Frameshift,splice_donor|Splice Donor,splice_acceptor|Splice Acceptor,intron|Intron,.|Intergenic
+filterValues.consequence missense|Missense,synonymous|Synonymous,stop_gained|Stop Gained,frameshift|Frameshift,splice_donor|Splice Donor,splice_acceptor|Splice Acceptor,intron|Intron,3_prime_utr|3' UTR,5_prime_utr|5' UTR,non_coding|Non-coding,.|Intergenic,others|Other
+filterType.consequence multipleListOr
 filterLabel.consequence Consequence
 # Source database filter
-filterValues.sources AllOfUs|AllOfUs,SPARK|SPARK WES,SFARI_WGS|SFARI WGS,GenomeAsia|GenomeAsia SNVs,GenomeAsiaIndel|GenomeAsia Indels,KOVA|KOVA Korea,ToMMo|ToMMo Japan,IndiGen|IndiGenomes India,FinnGen|FinnGen Finland,Saudi|Saudi,SweGen|SweGen Sweden,TOPMed|TOPMed,ABraOM|ABraOM Brazil,ALFA|ALFA,MGRB|MGRB Australia,HRC|HRC,MexBB|Mexico Biobank,SGDP|SGDP,HGDP1kG|gnomAD HGDP+1kG,GREGoR|GREGoR,SCHEMA|SCHEMA,GA4K|GA4K PacBio LR,CoLoRSdb|CoLoRSdb PacBio LR,SVatalog|SVatalog 101 10XG SR,Tishkoff180|Tishkoff 180 African WGS
+filterValues.sources AllOfUs|AllOfUs,SPARK|SPARK WES,SFARI_WGS|SFARI WGS,GenomeAsia|GenomeAsia SNVs,GenomeAsiaIndel|GenomeAsia Indels,KOVA|KOVA Korea,ToMMo|ToMMo Japan,IndiGen|IndiGenomes India,FinnGen|FinnGen Finland,Saudi|Saudi,SweGen|SweGen Sweden,TOPMed|TOPMed,ABraOM|ABraOM Brazil,ALFA|ALFA,MGRB|MGRB Australia,HRC|HRC,MexBB|Mexico Biobank,SGDP|SGDP,HGDP1kG|gnomAD HGDP+1kG,GREGoR|GREGoR,SCHEMA|SCHEMA,GA4K|GA4K PacBio LR,CoLoRSdb|CoLoRSdb PacBio LR,SVatalog|SVatalog 101 10XG SR,Tishkoff180|Tishkoff 180 African WGS,NPM|NPM Singapore
 filterType.sources multipleListOr
 filterLabel.sources Source Database
 # Length filters
 filterByRange.refLen on
 filterLabel.refLen Reference Length
 filterByRange.altLen on
 filterLabel.altLen Alternate Length
 filterByRange.varLen on
 filterLabel.varLen Length Change
 # Max AF filter
 filterByRange.maxAF on
 filterLabel.maxAF Max Allele Frequency
 filterLimits.maxAF 0:1
 # Total AC filter
 filterByRange.totalAC on
@@ -79,30 +80,32 @@ filterLabel.SGDPAF SGDP AF
 filterByRange.HGDP1kGAF on
 filterLabel.HGDP1kGAF gnomAD HGDP+1kG AF (4k cohort)
 filterByRange.GREGoRAF on
 filterLabel.GREGoRAF GREGoR AF
 filterByRange.SCHEMAAF on
 filterLabel.SCHEMAAF SCHEMA AF
 filterByRange.GA4KAF on
 filterLabel.GA4KAF GA4K PacBio LR AF
 filterByRange.CoLoRSdbAF on
 filterLabel.CoLoRSdbAF CoLoRSdb PacBio LR AF
 filterByRange.SVatalogAF on
 filterLabel.SVatalogAF SVatalog 101 10XG SR AF
 filterByRange.Tishkoff180AF on
 filterLabel.Tishkoff180AF Tishkoff 180 African WGS AF
+filterByRange.NPMAF on
+filterLabel.NPMAF NPM Singapore AF
 # Per-database AC filters
 filterByRange.AllOfUsAC on
 filterLabel.AllOfUsAC AllOfUs AC
 filterByRange.SPARKAC on
 filterLabel.SPARKAC SPARK WES AC
 filterByRange.SFARI_WGSAC on
 filterLabel.SFARI_WGSAC SFARI WGS AC
 filterByRange.GenomeAsiaAC on
 filterLabel.GenomeAsiaAC GenomeAsia SNVs AC
 filterByRange.GenomeAsiaIndelAC on
 filterLabel.GenomeAsiaIndelAC GenomeAsia Indels AC
 filterByRange.KOVAAC on
 filterLabel.KOVAAC KOVA Korea AC
 filterByRange.ToMMoAC on
 filterLabel.ToMMoAC ToMMo Japan AC
@@ -130,30 +133,32 @@ filterLabel.SGDPAC SGDP AC
 filterByRange.HGDP1kGAC on
 filterLabel.HGDP1kGAC gnomAD HGDP+1kG AC (4k cohort)
 filterByRange.GREGoRAC on
 filterLabel.GREGoRAC GREGoR AC
 filterByRange.SCHEMAAC on
 filterLabel.SCHEMAAC SCHEMA AC
 filterByRange.GA4KAC on
 filterLabel.GA4KAC GA4K PacBio LR AC
 filterByRange.CoLoRSdbAC on
 filterLabel.CoLoRSdbAC CoLoRSdb PacBio LR AC
 filterByRange.SVatalogAC on
 filterLabel.SVatalogAC SVatalog 101 10XG SR AC
 filterByRange.Tishkoff180AC on
 filterLabel.Tishkoff180AC Tishkoff 180 African WGS AC
+filterByRange.NPMAC on
+filterLabel.NPMAC NPM Singapore AC
 # Population-specific AF filters
 # AllOfUs local-ancestry populations
 # NB: these are local-ancestry-stratified frequencies (per-position, per-haplotype-class),
 # NOT the AllOfUs paper's global Rye ancestry categories. See varFreqs.html for details.
 filterByRange.AllOfUsAF_AFR on
 filterLabel.AllOfUsAF_AFR AllOfUs African AF (local ancestry)
 filterByRange.AllOfUsAF_AMR on
 filterLabel.AllOfUsAF_AMR AllOfUs Indigenous American AF (local ancestry)
 filterByRange.AllOfUsAF_EAS on
 filterLabel.AllOfUsAF_EAS AllOfUs East Asian AF (local ancestry)
 filterByRange.AllOfUsAF_EUR on
 filterLabel.AllOfUsAF_EUR AllOfUs European AF (local ancestry)
 filterByRange.AllOfUsAF_OCE on
 filterLabel.AllOfUsAF_OCE AllOfUs Oceanian AF (local ancestry)
 filterByRange.AllOfUsAF_SAS on
@@ -243,30 +248,43 @@ filterByRange.HGDP1kGAC_sas on
 filterLabel.HGDP1kGAC_sas gnomAD v3.1.2 South Asian AC (full release)
 # GREGoR populations
 filterByRange.GREGoRAF_AFF on
 filterLabel.GREGoRAF_AFF GREGoR Affected AF
 filterByRange.GREGoRAF_UNA on
 filterLabel.GREGoRAF_UNA GREGoR Unaffected AF
 filterByRange.GREGoRAF_UNK on
 filterLabel.GREGoRAF_UNK GREGoR Unknown AF
 filterByRange.GREGoRAC_AFF on
 filterLabel.GREGoRAC_AFF GREGoR Affected AC
 filterByRange.GREGoRAC_UNA on
 filterLabel.GREGoRAC_UNA GREGoR Unaffected AC
 filterByRange.GREGoRAC_UNK on
 filterLabel.GREGoRAC_UNK GREGoR Unknown AC
+# NPM Singapore ancestry groups
+filterByRange.NPMAF_Chinese on
+filterLabel.NPMAF_Chinese NPM Singapore Chinese AF
+filterByRange.NPMAF_Malay on
+filterLabel.NPMAF_Malay NPM Singapore Malay AF
+filterByRange.NPMAF_Indian on
+filterLabel.NPMAF_Indian NPM Singapore Indian AF
+filterByRange.NPMAC_Chinese on
+filterLabel.NPMAC_Chinese NPM Singapore Chinese AC
+filterByRange.NPMAC_Malay on
+filterLabel.NPMAC_Malay NPM Singapore Malay AC
+filterByRange.NPMAC_Indian on
+filterLabel.NPMAC_Indian NPM Singapore Indian AC
 skipEmptyFields on
 track allofus
 shortLabel AllOfUs v7 245k WGS
 longLabel Variant Frequencies: AllOfUs v7 - 245k WGS, local-ancestry-stratified, AC>=20
 type vcfTabix
 parent varFreqs on
 bigDataUrl /gbdb/$D/varFreqs/allofus/allOfUs.locAncFreq.vcf.gz
 dataVersion V7
 visibility dense
 priority 0.5
 #track me
 #shortLabel Regeneron Million Exomes 983k WES
 #longLabel Variant Frequencies: Regeneron One Million Exomes (ME) Project - 983k WGS
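The reason the "&" rewrite and the new `filterType.consequence multipleListOr` line go together can be shown with a small model of the two matching rules. This is an illustrative assumption about the filter semantics (a record passes a multipleListOr filter if any of its comma-separated values is among the selected values; the old filter required the whole field to equal a selection), not code from hgTracks; both function names are hypothetical.

```python
from typing import Iterable


def passes_exact_match(field_value: str, selected: Iterable[str]) -> bool:
    """Model of the old single-value filter: the whole field must equal a selection."""
    return field_value in set(selected)


def passes_multiple_list_or(field_value: str, selected: Iterable[str]) -> bool:
    """Model of multipleListOr: any comma-separated value may match a selection."""
    values = {v for v in field_value.split(",") if v}
    return not values.isdisjoint(selected)


chosen = {"frameshift"}
print(passes_exact_match("stop_gained&frameshift", chosen))            # False: orphaned
print(passes_multiple_list_or("stop_gained,frameshift", chosen))       # True
print(passes_multiple_list_or("inframe_deletion,others", {"others"}))  # True
```

Under this model, a compound record such as "stop_gained&frameshift" matches no bucket with exact matching, while its normalized form "stop_gained,frameshift" appears under both the Stop Gained and Frameshift selections.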