3180d71425ab40bc022712bb95868bfe80747375 max Fri May 29 08:52:38 2026 -0700

[Claude] varFreqs: split SPARK+SCHEMA by phenotype, add disease + array combined tracks, drop array cohorts from varFreqsAll

#Preview2 week - bugs introduced now will need a build patch to fix

Split SFARI SPARK WES and WGS by autism status using fill-tags -S with
the SPARK individuals_registration TSV (AC_AUT / AN_AUT / AF_AUT plus
AC_NON_AUT / AN_NON_AUT / AF_NON_AUT). Added matching SCHEMA case/control
sums (AC_CASE etc.).

Two new combined bigBed tracks: varFreqsDisease (SPARK, SFARI WGS, TOPMed,
SCHEMA, GREGoR, GA4K) and varFreqsArray (TPMI, MexBB, UKBB). TPMI and MexBB
are removed from varFreqsAll so the main combined track is purely WGS/WES.

Build scripts parameterized so the same code drives all three combined
builds: mergeAndAnnotate.sh gains --databases / --tag, vcfToBigBed.py gains
--databases-file / --populations-file and a per-track autoSql table name.

mergeAndAnnotate.sh now pins /cluster/software/src/bcftools-1.22 in PATH
(--unify-chr-names is a 1.22 feature; conda's 1.14 silently fails).

refs #36642

diff --git src/hg/makeDb/scripts/varFreqs/databases.tsv src/hg/makeDb/scripts/varFreqs/databases.tsv
index 2202b7722c4..a0fccdd9da7 100644
--- src/hg/makeDb/scripts/varFreqs/databases.tsv
+++ src/hg/makeDb/scripts/varFreqs/databases.tsv
@@ -1,36 +1,36 @@
 # Database configuration for varFreqsAll combined track
 # key name vcf ac_field af_field
 # Use "." for fields that don't exist in the VCF
 AllOfUs AllOfUs /gbdb/hg38/varFreqs/_allofus/allOfUs.locAncFreq.vcf.gz . .
 SPARK SPARK WES /gbdb/hg38/varFreqs/_sfari/SPARK.iWES_v3.2024_08.deepvariant.norm.vcf.gz AC AF
 SFARI_WGS SFARI WGS /gbdb/hg38/varFreqs/_sfari/wgs_12519_genome.deepvariant.norm.vcf.gz AC AF
 GenomeAsia GenomeAsia SNVs /gbdb/hg38/varFreqs/ga100k/ga100k.subst.vcf.gz AC AF
 GenomeAsiaIndel GenomeAsia Indels /gbdb/hg38/varFreqs/ga100k/ga100k.indels.vcf.gz AC AF
 NPM NPM Singapore /gbdb/hg38/varFreqs/_npm/SG10K_Health_r5.3.2.sites.vcf.bgz AC AF
 KOVA KOVA Korea /gbdb/hg38/varFreqs/_kova/kova.v7.vcf.gz AC AF
 ToMMo ToMMo Japan /gbdb/hg38/varFreqs/tommo61kjpn/tommo-61kjpn-20250616-GRCh38-snvindel-af-autosome.vcf.gz AC AF
 # IndiGen dropped: the IGIB IndiGenomes release ships only a VRT variation-type
 # bit per record (no AC, AF, or AN in INFO), so it cannot contribute counts to
 # the combined track. Re-add only if a future release exposes allele counts.
 FinnGen FinnGen Finland /gbdb/hg38/varFreqs/_finngen/finnge_R12_annotated_variants_v1.vcf.gz AC AF
 Saudi Saudi /gbdb/hg38/varFreqs/saudi/saudi.vcf.gz AC AF
 SweGen SweGen Sweden /gbdb/hg38/varFreqs/_swefreq/swegen_frequencies_fixploidy_GRCh38_20190204.vcf.gz AC AF
 TOPMed TOPMed /gbdb/hg38/varFreqs/_topmed/topmed10.vcf.gz AC AF
 ABraOM ABraOM Brazil /gbdb/hg38/varFreqs/abraom/abraom.vcf.gz . AF
 ALFA ALFA /gbdb/hg38/varFreqs/alfa/ALFA.vcf.gz . AF_GLB
 MGRB MGRB Australia /gbdb/hg38/varFreqs/_mgrb/MGRB.phase3.GRCh38.norm.vcf.gz AC .
 HRC HRC /gbdb/hg38/varFreqs/hrc/hrc.vcf.gz AC AF
-MexBB Mexico Biobank /gbdb/hg38/varFreqs/_mxb/mxb.freq.vcf.gz AC AF
+# MexBB and TPMI moved to the array-based track (databases_array.tsv): both are
+# genotyping-array cohorts and are kept out of the WGS/WES varFreqsAll track.
 SGDP SGDP /gbdb/hg38/varFreqs/sgdpFreq/sgdp.freq.vcf.gz AC AF
 HGDP1kG gnomAD HGDP+1kG /gbdb/hg38/varFreqs/hgdp1kFreq/hgdp1k.freq.vcf.gz AC AF
 GREGoR GREGoR /gbdb/hg38/varFreqs/gregor/gregor.vcf.gz AC AF
 SCHEMA SCHEMA /gbdb/hg38/varFreqs/schema/SCHEMA_variant_results_withAF.vcf.gz AC AF
 GA4K GA4K PacBio LR /gbdb/hg38/varFreqs/ga4k/ga4kSnv.vcf.gz AC AF
 CoLoRSdb CoLoRSdb PacBio LR /gbdb/hg38/varFreqs/colorsDb/colorsDbSnv.vcf.gz AC AF
 SVatalog SVatalog 101 10XG SR /gbdb/hg38/varFreqs/svatalog/svatalog.vcf.gz AC AF
 Tishkoff180 Tishkoff 180 African WGS /gbdb/hg38/varFreqs/_tishkoff/tishkoff180.vcf.gz AC AF
 WBBC WBBC China /gbdb/hg38/varFreqs/wbbc/wbbc.vcf.gz AC AF
-TPMI TPMI Taiwan /gbdb/hg38/varFreqs/_tpmi/tpmi.vcf.gz AC AF
 ChinaMAP China ChinaMAP /gbdb/hg38/varFreqs/_chinamap/chinamap.vcf.gz AC AF
 GenomeIndia GenomeIndia 9.7k WGS /gbdb/hg38/varFreqs/_genomeindia/genomeindia.vcf.gz AC AF
 GoNL GoNL Netherlands ~13x SR /gbdb/hg38/varFreqs/gonl/gonl.vcf.gz AC AF
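
The databases.tsv layout in the hunk above (five columns, "#" comment lines, "." for an absent INFO field) could be read roughly as in the sketch below. This is not the code in vcfToBigBed.py; it assumes the columns are tab-separated in the real file, and the names Database, parse_databases, and SAMPLE are hypothetical, introduced only for illustration.

```python
from typing import List, NamedTuple, Optional


class Database(NamedTuple):
    """One row of a databases.tsv-style file (hypothetical container)."""
    key: str
    name: str
    vcf: str
    ac_field: Optional[str]  # None when the file has "."
    af_field: Optional[str]  # None when the file has "."


def parse_databases(text: str) -> List[Database]:
    """Parse tab-separated rows, skipping blank lines and '#' comments.

    Assumption: exactly five tab-separated columns per data row.
    """
    rows = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) != 5:
            raise ValueError(
                f"line {lineno}: expected 5 columns, got {len(fields)}"
            )
        key, name, vcf, ac, af = fields
        rows.append(
            Database(
                key,
                name,
                vcf,
                None if ac == "." else ac,
                None if af == "." else af,
            )
        )
    return rows


# Rows copied from the hunk above, with tabs restored by assumption.
SAMPLE = (
    "# key\tname\tvcf\tac_field\taf_field\n"
    "ABraOM\tABraOM Brazil\t/gbdb/hg38/varFreqs/abraom/abraom.vcf.gz\t.\tAF\n"
    "MGRB\tMGRB Australia\t"
    "/gbdb/hg38/varFreqs/_mgrb/MGRB.phase3.GRCh38.norm.vcf.gz\tAC\t.\n"
)

if __name__ == "__main__":
    for db in parse_databases(SAMPLE):
        print(db.key, db.ac_field, db.af_field)
```

Mapping "." to None keeps the "field does not exist" case explicit, so a caller has to decide what to do for cohorts such as ABraOM (no AC) or MGRB (no AF) instead of looking up a literal "." tag in the VCF.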