65da29c9d74d4dd832ab7f16899ad3b209b92da4 max Wed May 6 08:43:57 2026 -0700

varFreqs: 5 vcfToBigBed.py fixes + add NPM Singapore to combined track

vcfToBigBed.py and mergeAndAnnotate.sh moved into kent (they were
hive-only); the build is now reproducible from a fresh kent checkout.

Five vcfToBigBed.py fixes (all caught by Lou's QA pass on #36642):

- normalize_consequence(): bcftools csq emits "&"-joined compound terms
  like "stop_gained&frameshift" which exact-match-failed the old
  8-bucket consequence filter and orphaned ~8.5M records. Rewrites "&"
  to "," so a single record can match multiple buckets, and appends
  ",others" to any token list with no named-filter token. Trackdb gains
  4 buckets (3' UTR, 5' UTR, Non-coding, Other) and switches to
  filterType.consequence multipleListOr.
- Source-attribution bug: the old check only inspected the unified
  AC/AF slot. AllOfUs ships only per-population fields ("." in the
  unified slot), so all 67M+ AllOfUs variants got no source
  attribution -- ~43M rows in the previous bigBed had an empty
  "sources" column. Fix scans per-population slots before declaring
  "no data".
- parse_bcsq() returns "" instead of "." for aaChange/dnaChange on
  non-coding variants, so the mouseOver and detail page render a clean
  blank line.
- maxAF format: "{:.6g}" -> "{:.6f}" so very small AFs render as
  "0.000003" instead of "3.31347e-06".
- autoSql `table varFreqs` -> `table varFreqsAll` (matches the bigBed
  filename; required for hgIntegrator wiring).

NPM Singapore (SG10K_Health, 9.7k WGS) added to databases.tsv,
files.txt, populations.tsv (SgChinese / SgMalay / SgIndian) and the
trackDb filter UI. NPM individual subtrack stays tableBrowser off
(license); folded into varFreqsAll same as finngen / kova / mgrb /
swefreq / tishkoff180.

varFreqsAll bigBed rebuild is in progress at
/hive/data/genomes/hg38/bed/varFreqs/all/; will land in /gbdb when the
bedToBigBed step completes.
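
For reviewers, the consequence normalization and the maxAF format change described above can be sketched roughly as follows. This is a minimal illustration, not the code in vcfToBigBed.py: the NAMED_BUCKETS set is a hypothetical stand-in for the track's real filter values, and format_max_af() is a helper name invented here.

```python
# Hypothetical bucket names; the real track defines its own filter values.
NAMED_BUCKETS = {"missense", "stop_gained", "frameshift", "synonymous"}


def normalize_consequence(csq: str) -> str:
    """Turn an "&"-joined compound term into a comma-separated token list.

    If none of the tokens is a named filter bucket, "others" is appended so
    the record still matches one bucket in a multipleListOr filter.
    """
    tokens = [t for t in csq.replace("&", ",").split(",") if t]
    if not any(t in NAMED_BUCKETS for t in tokens):
        tokens.append("others")
    return ",".join(tokens)


def format_max_af(af: float) -> str:
    """Fixed-point rendering, so tiny AFs avoid scientific notation."""
    return "{:.6f}".format(af)


if __name__ == "__main__":
    print(normalize_consequence("stop_gained&frameshift"))
    print(normalize_consequence("intergenic"))
    print(format_max_af(3.31347e-06))
```

With "{:.6g}" the last value would have been printed as 3.31347e-06, which is the rendering the fix removes.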
refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/scripts/varFreqs/populations.tsv src/hg/makeDb/scripts/varFreqs/populations.tsv
index 86333218b55..b4e2d61e7db 100644
--- src/hg/makeDb/scripts/varFreqs/populations.tsv
+++ src/hg/makeDb/scripts/varFreqs/populations.tsv
@@ -1,32 +1,36 @@
 # Population breakdown configuration for varFreqsAll combined track
 # db_key	pop_key	pop_name	ac_field	af_field
 # AllOfUs local ancestry populations
 AllOfUs	AFR	African	AC_AFR	AF_AFR
 AllOfUs	AMR	Indigenous American	AC_AMR	AF_AMR
 AllOfUs	EAS	East Asian	AC_EAS	AF_EAS
 AllOfUs	EUR	European	AC_EUR	AF_EUR
 AllOfUs	OCE	Oceanian	AC_OCE	AF_OCE
 AllOfUs	SAS	South Asian	AC_SAS	AF_SAS
 # GenomeAsia populations (7 groups in source VCF)
 GenomeAsia	NEA	Northeast Asian	AC_NEA	AF_NEA
 GenomeAsia	SEA	Southeast Asian	AC_SEA	AF_SEA
 GenomeAsia	SAS	South Asian	AC_SAS	AF_SAS
 GenomeAsia	OCE	Oceanian	AC_OCE	AF_OCE
 GenomeAsia	AMR	American	AC_AMR	AF_AMR
 GenomeAsia	AFR	African	AC_AFR	AF_AFR
 GenomeAsia	WER	Western European Ref	AC_WER	AF_WER
 # gnomAD HGDP+1kG continental groups
 HGDP1kG	afr	African	gnomad_AC_afr	gnomad_AF_afr
 HGDP1kG	ami	Amish	gnomad_AC_ami	gnomad_AF_ami
 HGDP1kG	amr	Latino	gnomad_AC_amr	gnomad_AF_amr
 HGDP1kG	asj	Ashkenazi Jewish	gnomad_AC_asj	gnomad_AF_asj
 HGDP1kG	eas	East Asian	gnomad_AC_eas	gnomad_AF_eas
 HGDP1kG	fin	Finnish	gnomad_AC_fin	gnomad_AF_fin
 HGDP1kG	mid	Middle Eastern	gnomad_AC_mid	gnomad_AF_mid
 HGDP1kG	nfe	Non-Finnish European	gnomad_AC_nfe	gnomad_AF_nfe
 HGDP1kG	oth	Other	gnomad_AC_oth	gnomad_AF_oth
 HGDP1kG	sas	South Asian	gnomad_AC_sas	gnomad_AF_sas
 # GREGoR affected/unaffected breakdown
 GREGoR	AFF	Affected	AC_AFFECTED	.
 GREGoR	UNA	Unaffected	AC_UNAFFECTED	.
 GREGoR	UNK	Unknown	AC_UNKNOWN	.
+# NPM Singapore (SG10K_Health) ancestry groups
+NPM	Chinese	Singapore Chinese	AC_SgChinese	AF_SgChinese
+NPM	Malay	Singapore Malay	AC_SgMalay	AF_SgMalay
+NPM	Indian	Singapore Indian	AC_SgIndian	AF_SgIndian
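
A rough sketch of how the five-column table above could be loaded and used for the per-population source check described in the commit message. It rests on several assumptions: the file is tab-separated, "." marks a missing field, and INFO values arrive as a plain dict of strings. load_populations() and has_source_data() are hypothetical names and are not taken from vcfToBigBed.py.

```python
import csv
import io


def load_populations(handle):
    """Group (pop_key, pop_name, ac_field, af_field) rows by db_key.

    Lines starting with "#" are treated as comments and skipped.
    """
    pops = {}
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue
        db_key, pop_key, pop_name, ac_field, af_field = row[:5]
        pops.setdefault(db_key, []).append((pop_key, pop_name, ac_field, af_field))
    return pops


def has_source_data(info, unified_ac, pop_rows):
    """True if the unified AC slot or any per-population AC slot has a value.

    Checking the per-population slots as well covers sources that leave
    the unified slot as ".".
    """
    fields = [unified_ac] + [ac for _, _, ac, _ in pop_rows]
    return any(info.get(f, ".") not in (".", "") for f in fields)


if __name__ == "__main__":
    sample = (
        "# NPM Singapore (SG10K_Health) ancestry groups\n"
        "NPM\tChinese\tSingapore Chinese\tAC_SgChinese\tAF_SgChinese\n"
    )
    pops = load_populations(io.StringIO(sample))
    # Unified slot is missing, but a per-population count is present.
    print(has_source_data({"AC": ".", "AC_SgChinese": "12"}, "AC", pops["NPM"]))
```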