65da29c9d74d4dd832ab7f16899ad3b209b92da4 max Wed May 6 08:43:57 2026 -0700

varFreqs: 5 vcfToBigBed.py fixes + add NPM Singapore to combined track

vcfToBigBed.py and mergeAndAnnotate.sh moved into kent (they were
hive-only); the build is now reproducible from a fresh kent checkout.

Five vcfToBigBed.py fixes (all caught by Lou's QA pass on #36642):

- normalize_consequence(): bcftools csq emits "&"-joined compound terms
  like "stop_gained&frameshift" which exact-match-failed the old
  8-bucket consequence filter and orphaned ~8.5M records. Rewrites "&"
  to "," so a single record can match multiple buckets, and appends
  ",others" to any token list with no named-filter token. Trackdb gains
  4 buckets (3' UTR, 5' UTR, Non-coding, Other) and switches to
  filterType.consequence multipleListOr.

- Source-attribution bug: the old check only inspected the unified
  AC/AF slot. AllOfUs ships only per-population fields ("." in the
  unified slot), so all 67M+ AllOfUs variants got no source
  attribution -- ~43M rows in the previous bigBed had an empty
  "sources" column. Fix scans per-population slots before declaring
  "no data".

- parse_bcsq() returns "" instead of "." for aaChange/dnaChange on
  non-coding variants, so the mouseOver and detail page render a clean
  blank line.

- maxAF format: "{:.6g}" -> "{:.6f}" so very small AFs render as
  "0.000003" instead of "3.31347e-06".

- autoSql `table varFreqs` -> `table varFreqsAll` (matches the bigBed
  filename; required for hgIntegrator wiring).

NPM Singapore (SG10K_Health, 9.7k WGS) added to databases.tsv,
files.txt, populations.tsv (SgChinese / SgMalay / SgIndian) and the
trackDb filter UI. NPM individual subtrack stays tableBrowser off
(license); folded into varFreqsAll same as finngen / kova / mgrb /
swefreq / tishkoff180.

varFreqsAll bigBed rebuild is in progress at
/hive/data/genomes/hg38/bed/varFreqs/all/; will land in /gbdb when the
bedToBigBed step completes.
refs #36642
Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/scripts/varFreqs/databases.tsv src/hg/makeDb/scripts/varFreqs/databases.tsv
index 142589ff57b..66ec3032f7b 100644
--- src/hg/makeDb/scripts/varFreqs/databases.tsv
+++ src/hg/makeDb/scripts/varFreqs/databases.tsv
@@ -1,23 +1,24 @@
 # Database configuration for varFreqsAll combined track
 # key	name	vcf	ac_field	af_field
 # Use "." for fields that don't exist in the VCF
 AllOfUs	AllOfUs	/gbdb/hg38/varFreqs/allofus/allOfUs.locAncFreq.vcf.gz	.	.
 SPARK	SPARK WES	/gbdb/hg38/varFreqs/sfari/SPARK.iWES_v3.2024_08.deepvariant.norm.vcf.gz	AC	AF
 SFARI_WGS	SFARI WGS	/gbdb/hg38/varFreqs/sfari/wgs_12519_genome.deepvariant.norm.vcf.gz	AC	AF
 GenomeAsia	GenomeAsia SNVs	/gbdb/hg38/varFreqs/ga100k/ga100k.subst.vcf.gz	AC	AF
 GenomeAsiaIndel	GenomeAsia Indels	/gbdb/hg38/varFreqs/ga100k/ga100k.indels.vcf.gz	AC	AF
+NPM	NPM Singapore	/gbdb/hg38/varFreqs/npm/SG10K_Health_r5.3.2.sites.vcf.bgz	AC	AF
 KOVA	KOVA Korea	/gbdb/hg38/varFreqs/kova/kova.v7.vcf.gz	AC	AF
 ToMMo	ToMMo Japan	/gbdb/hg38/varFreqs/tommo61kjpn/tommo-61kjpn-20250616-GRCh38-snvindel-af-autosome.vcf.gz	AC	AF
 IndiGen	IndiGenomes India	/gbdb/hg38/varFreqs/indigenomes/IndiGenomes_Variants.vcf.gz	AC	AF
 FinnGen	FinnGen Finland	/gbdb/hg38/varFreqs/finngen/finnge_R12_annotated_variants_v1.vcf.gz	AC	AF
 Saudi	Saudi	/gbdb/hg38/varFreqs/saudi/saudi.vcf.gz	AC	AF
 SweGen	SweGen Sweden	/gbdb/hg38/varFreqs/swefreq/swegen_frequencies_fixploidy_GRCh38_20190204.vcf.gz	AC	AF
 TOPMed	TOPMed	/gbdb/hg38/varFreqs/topmed/topmed10.vcf.gz	AC	AF
 ABraOM	ABraOM Brazil	/gbdb/hg38/varFreqs/abraom/abraom.vcf.gz	.	AF
 ALFA	ALFA	/gbdb/hg38/varFreqs/alfa/ALFA.vcf.gz	.	AF_GLB
 MGRB	MGRB Australia	/gbdb/hg38/varFreqs/mgrb/MGRB.phase3.GRCh38.norm.vcf.gz	AC	AF
 HRC	HRC	/gbdb/hg38/varFreqs/hrc/hrc.vcf.gz	AC	AF
 MexBB	Mexico Biobank	/gbdb/hg38/varFreqs/mxb/mxb.freq.vcf.gz	AC	AF
 SGDP	SGDP	/gbdb/hg38/varFreqs/sgdpFreq/sgdp.freq.vcf.gz	AC	AF
 HGDP1kG	gnomAD HGDP+1kG	/gbdb/hg38/varFreqs/hgdp1kFreq/hgdp1k.freq.vcf.gz	AC	AF
 GREGoR	GREGoR	/gbdb/hg38/varFreqs/gregor/gregor.vcf.gz	AC	AF
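
Reviewer note: the three behavioural fixes described in the commit message (consequence normalization, per-population source attribution, maxAF formatting) can be sketched roughly as below. This is not the code from vcfToBigBed.py. Only the name normalize_consequence() and the two format strings come from the commit message; NAMED_BUCKETS, has_frequency_data() and format_max_af() are hypothetical names, and the bucket tokens are assumed placeholders for whatever the trackDb filter actually lists.

```python
# Illustrative sketch only; not the vcfToBigBed.py implementation.

# ASSUMPTION: placeholder tokens standing in for the named trackDb filter buckets.
NAMED_BUCKETS = frozenset({
    "missense", "synonymous", "stop_gained", "frameshift",
    "splice_acceptor", "splice_donor", "3_prime_utr", "5_prime_utr",
    "non_coding",
})


def normalize_consequence(csq, named=NAMED_BUCKETS):
    """Turn "a&b" into "a,b"; append "others" if no token is a named bucket."""
    tokens = [t for t in csq.replace("&", ",").split(",") if t]
    if not any(t in named for t in tokens):
        tokens.append("others")
    return ",".join(tokens)


def has_frequency_data(unified, per_population):
    """Hypothetical helper: a source counts as present if the unified AC/AF
    slot OR any per-population slot holds a value other than "."."""
    values = list(unified) + list(per_population)
    return any(v not in (".", "", None) for v in values)


def format_max_af(af):
    """Fixed-point formatting, so tiny AFs do not fall into exponent notation."""
    return "{:.6f}".format(af)


if __name__ == "__main__":
    print(normalize_consequence("stop_gained&frameshift"))
    print(normalize_consequence("intergenic"))
    print(has_frequency_data([".", "."], ["0.01", "."]))
    print("{:.6g}".format(3.31347e-06), "->", format_max_af(3.31347e-06))
```

Emitting a comma-separated token list is what lets a multipleListOr filter match one record against several buckets, and the trailing "others" token keeps records with no named consequence selectable instead of orphaned.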