3180d71425ab40bc022712bb95868bfe80747375 max Fri May 29 08:52:38 2026 -0700
[Claude] varFreqs: split SPARK+SCHEMA by phenotype, add disease + array combined tracks, drop array cohorts from varFreqsAll

#Preview2 week - bugs introduced now will need a build patch to fix

Split SFARI SPARK WES and WGS by autism status using fill-tags -S with
the SPARK individuals_registration TSV (AC_AUT / AN_AUT / AF_AUT plus
AC_NON_AUT / AN_NON_AUT / AF_NON_AUT). Added matching SCHEMA case/control
sums (AC_CASE etc.).

Two new combined bigBed tracks: varFreqsDisease (SPARK, SFARI WGS, TOPMed,
SCHEMA, GREGoR, GA4K) and varFreqsArray (TPMI, MexBB, UKBB). TPMI and MexBB
are removed from varFreqsAll so the main combined track is purely WGS/WES.

Build scripts parameterized so the same code drives all three combined
builds: mergeAndAnnotate.sh gains --databases / --tag, vcfToBigBed.py gains
--databases-file / --populations-file and a per-track autoSql table name.

mergeAndAnnotate.sh now pins /cluster/software/src/bcftools-1.22 in PATH
(--unify-chr-names is a 1.22 feature; conda's 1.14 silently fails).
refs #36642

diff --git src/hg/makeDb/scripts/varFreqs/populations.tsv src/hg/makeDb/scripts/varFreqs/populations.tsv
index d1c9dedb64b..4928a7d455a 100644
--- src/hg/makeDb/scripts/varFreqs/populations.tsv
+++ src/hg/makeDb/scripts/varFreqs/populations.tsv
@@ -1,24 +1,32 @@
 # Population breakdown configuration for varFreqsAll combined track
 # db_key pop_key pop_name ac_field af_field
 # AllOfUs local ancestry populations
 AllOfUs AFR African AC_AFR AF_AFR
 AllOfUs AMR Indigenous American AC_AMR AF_AMR
 AllOfUs EAS East Asian AC_EAS AF_EAS
 AllOfUs EUR European AC_EUR AF_EUR
 AllOfUs OCE Oceanian AC_OCE AF_OCE
 AllOfUs SAS South Asian AC_SAS AF_SAS
+# SFARI SPARK autism phenotype split (asd column of individuals_registration)
+SPARK AUT ASD proband AC_AUT AF_AUT
+SPARK NON_AUT Non-ASD family AC_NON_AUT AF_NON_AUT
+SFARI_WGS AUT ASD proband AC_AUT AF_AUT
+SFARI_WGS NON_AUT Non-ASD family AC_NON_AUT AF_NON_AUT
+# SCHEMA schizophrenia case/control split (summed across analysis groups)
+SCHEMA CASE Schizophrenia case AC_CASE AF_CASE
+SCHEMA CTRL Control AC_CTRL AF_CTRL
 # GenomeAsia populations (7 groups in source VCF)
 GenomeAsia NEA Northeast Asian AC_NEA AF_NEA
 GenomeAsia SEA Southeast Asian AC_SEA AF_SEA
 GenomeAsia SAS South Asian AC_SAS AF_SAS
 GenomeAsia OCE Oceanian AC_OCE AF_OCE
 GenomeAsia AMR American AC_AMR AF_AMR
 GenomeAsia AFR African AC_AFR AF_AFR
 GenomeAsia WER Western European Ref AC_WER AF_WER
 # gnomAD HGDP+1kG continental groups
 HGDP1kG afr African gnomad_AC_afr gnomad_AF_afr
 HGDP1kG ami Amish gnomad_AC_ami gnomad_AF_ami
 HGDP1kG amr Latino gnomad_AC_amr gnomad_AF_amr
 HGDP1kG asj Ashkenazi Jewish gnomad_AC_asj gnomad_AF_asj
 HGDP1kG eas East Asian gnomad_AC_eas gnomad_AF_eas
 HGDP1kG fin Finnish gnomad_AC_fin gnomad_AF_fin