64a3f9e7813e823cf724ea188c3928a911578286 max Thu Jun 4 00:32:22 2026 -0700

varFreqs: replace All Databases Combined with two phenotype-split tracks

Replace the single varFreqsAll combined track (and drop the varFreqsDisease track) with two matched tracks for visual case-vs-background comparison:

  varFreqsAffected - variants seen in the affected/case arms of disease cohorts (SFARI SPARK WES/WGS ASD probands, SCHEMA cases, GREGoR affected, GA4K); ~130,000 individuals
  varFreqsBackground - population reference cohorts + the unaffected/control arms of disease cohorts ("all other variants"); ~1.5 million individuals

A variant seen in both groups appears in both tracks. Genotyping-array cohorts stay out of both (varFreqsArray unchanged).

vcfToBigBed.py gains --split-affected to emit both tracks in one pass; it reads phenotype tags (affected/unaffected/unknown) from populations.tsv and is_disease/disease_role from databases.tsv, and derives the length-filter ranges from the observed data.

TOPMed reclassified as a population cohort. SPARK WGS display name changed to SFARI SPARK WGS for consistency with the standalone subtracks. Fixed the trackDb mouseOver $-substitution prefix collision by wrapping fields in ${}. New description pages for both tracks.

refs #36642

diff --git src/hg/makeDb/scripts/varFreqs/populations.tsv src/hg/makeDb/scripts/varFreqs/populations.tsv
index 4928a7d455a..da279c3a887 100644
--- src/hg/makeDb/scripts/varFreqs/populations.tsv
+++ src/hg/makeDb/scripts/varFreqs/populations.tsv
@@ -1,49 +1,52 @@
 # Population breakdown configuration for varFreqsAll combined track
-# db_key	pop_key	pop_name	ac_field	af_field
+# db_key	pop_key	pop_name	ac_field	af_field	[phenotype]
+# Optional 6th column "phenotype" (affected|unaffected|unknown) tags a disease cohort's
+# case/control arms so the build can aggregate an affected-vs-unaffected summary across
+# cohorts. Ancestry/region populations leave it blank.
 # AllOfUs local ancestry populations
 AllOfUs	AFR	African	AC_AFR	AF_AFR
 AllOfUs	AMR	Indigenous American	AC_AMR	AF_AMR
 AllOfUs	EAS	East Asian	AC_EAS	AF_EAS
 AllOfUs	EUR	European	AC_EUR	AF_EUR
 AllOfUs	OCE	Oceanian	AC_OCE	AF_OCE
 AllOfUs	SAS	South Asian	AC_SAS	AF_SAS
 # SFARI SPARK autism phenotype split (asd column of individuals_registration)
-SPARK	AUT	ASD proband	AC_AUT	AF_AUT
-SPARK	NON_AUT	Non-ASD family	AC_NON_AUT	AF_NON_AUT
-SFARI_WGS	AUT	ASD proband	AC_AUT	AF_AUT
-SFARI_WGS	NON_AUT	Non-ASD family	AC_NON_AUT	AF_NON_AUT
+SPARK	AUT	ASD proband	AC_AUT	AF_AUT	affected
+SPARK	NON_AUT	Non-ASD family	AC_NON_AUT	AF_NON_AUT	unaffected
+SFARI_WGS	AUT	ASD proband	AC_AUT	AF_AUT	affected
+SFARI_WGS	NON_AUT	Non-ASD family	AC_NON_AUT	AF_NON_AUT	unaffected
 # SCHEMA schizophrenia case/control split (summed across analysis groups)
-SCHEMA	CASE	Schizophrenia case	AC_CASE	AF_CASE
-SCHEMA	CTRL	Control	AC_CTRL	AF_CTRL
+SCHEMA	CASE	Schizophrenia case	AC_CASE	AF_CASE	affected
+SCHEMA	CTRL	Control	AC_CTRL	AF_CTRL	unaffected
 # GenomeAsia populations (7 groups in source VCF)
 GenomeAsia	NEA	Northeast Asian	AC_NEA	AF_NEA
 GenomeAsia	SEA	Southeast Asian	AC_SEA	AF_SEA
 GenomeAsia	SAS	South Asian	AC_SAS	AF_SAS
 GenomeAsia	OCE	Oceanian	AC_OCE	AF_OCE
 GenomeAsia	AMR	American	AC_AMR	AF_AMR
 GenomeAsia	AFR	African	AC_AFR	AF_AFR
 GenomeAsia	WER	Western European Ref	AC_WER	AF_WER
 # gnomAD HGDP+1kG continental groups
 HGDP1kG	afr	African	gnomad_AC_afr	gnomad_AF_afr
 HGDP1kG	ami	Amish	gnomad_AC_ami	gnomad_AF_ami
 HGDP1kG	amr	Latino	gnomad_AC_amr	gnomad_AF_amr
 HGDP1kG	asj	Ashkenazi Jewish	gnomad_AC_asj	gnomad_AF_asj
 HGDP1kG	eas	East Asian	gnomad_AC_eas	gnomad_AF_eas
 HGDP1kG	fin	Finnish	gnomad_AC_fin	gnomad_AF_fin
 HGDP1kG	mid	Middle Eastern	gnomad_AC_mid	gnomad_AF_mid
 HGDP1kG	nfe	Non-Finnish European	gnomad_AC_nfe	gnomad_AF_nfe
 HGDP1kG	oth	Other	gnomad_AC_oth	gnomad_AF_oth
 HGDP1kG	sas	South Asian	gnomad_AC_sas	gnomad_AF_sas
 # GREGoR affected/unaffected breakdown
-GREGoR	AFF	Affected	AC_AFFECTED	.
-GREGoR	UNA	Unaffected	AC_UNAFFECTED	.
-GREGoR	UNK	Unknown	AC_UNKNOWN	.
+GREGoR	AFF	Affected	AC_AFFECTED	.	affected
+GREGoR	UNA	Unaffected	AC_UNAFFECTED	.	unaffected
+GREGoR	UNK	Unknown	AC_UNKNOWN	.	unknown
 # NPM Singapore (SG10K_Health) ancestry groups
 NPM	Chinese	Singapore Chinese	AC_SgChinese	AF_SgChinese
 NPM	Malay	Singapore Malay	AC_SgMalay	AF_SgMalay
 NPM	Indian	Singapore Indian	AC_SgIndian	AF_SgIndian
 # WBBC Westlake BioBank for Chinese regional Han groups (AC not present, will be synthesized from AF*AN at build time)
 WBBC	North	North Han	.	North_AF
 WBBC	Central	Central Han	.	Central_AF
 WBBC	South	South Han	.	South_AF
 WBBC	Lingnan	Lingnan Han	.	Lingnan_AF
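
The phenotype split that vcfToBigBed.py --split-affected performs can be illustrated with a minimal Python sketch. It is not the script's actual code: PopRow, parse_populations, allele_count, assign_tracks and the cohort_role mapping are hypothetical names, and cohort_role only stands in for the is_disease/disease_role columns of databases.tsv, whose format is not shown in this diff. Two further assumptions: rows tagged "unknown" (such as GREGoR UNK) are counted as background, and the AN used to synthesize AC for AF-only rows such as WBBC is supplied by the caller.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional, Set, Tuple

AFFECTED_TRACK = "varFreqsAffected"
BACKGROUND_TRACK = "varFreqsBackground"


@dataclass(frozen=True)
class PopRow:
    """One data row of populations.tsv (hypothetical in-memory form)."""
    db_key: str
    pop_key: str
    pop_name: str
    ac_field: Optional[str]   # None when the column is "." (e.g. WBBC)
    af_field: Optional[str]   # None when the column is "." (e.g. GREGoR)
    phenotype: Optional[str]  # "affected", "unaffected", "unknown", or None


def parse_populations(text: str) -> List[PopRow]:
    """Parse tab-separated rows; the 6th (phenotype) column is optional."""
    rows: List[PopRow] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        cols = line.split("\t")
        if len(cols) < 5:
            raise ValueError(f"expected at least 5 tab-separated columns: {line!r}")
        cols += [""] * (6 - len(cols))  # pad a missing phenotype column
        db_key, pop_key, pop_name, ac_f, af_f, pheno = cols[:6]
        rows.append(PopRow(db_key, pop_key, pop_name,
                           None if ac_f == "." else ac_f,
                           None if af_f == "." else af_f,
                           pheno.strip() or None))
    return rows


def allele_count(row: PopRow, info: Mapping[str, str], an: Optional[int] = None) -> int:
    """Read AC from INFO; if only AF is available, synthesize AC as round(AF * AN)."""
    if row.ac_field is not None and row.ac_field in info:
        return int(info[row.ac_field])
    if row.af_field is not None and row.af_field in info and an:
        return round(float(info[row.af_field]) * an)
    return 0  # no usable count for this population


def assign_tracks(counts: Mapping[Tuple[str, str], int],
                  rows: List[PopRow],
                  cohort_role: Mapping[str, str]) -> Set[str]:
    """Return the track(s) a variant belongs in.

    counts maps (db_key, pop_key) to that population's allele count.
    cohort_role (hypothetical stand-in for databases.tsv) maps db_key to
    "affected" (whole cohort is cases), "split" (use per-row phenotype tags),
    "array" (excluded), or "population" (the default).
    """
    tags = {(r.db_key, r.pop_key): r.phenotype for r in rows}
    tracks: Set[str] = set()
    for (db_key, pop_key), ac in counts.items():
        if ac <= 0:
            continue
        role = cohort_role.get(db_key, "population")
        if role == "array":
            continue  # genotyping-array cohorts stay out of both tracks
        tag = tags.get((db_key, pop_key))
        if tag is None:
            if role == "split":
                continue  # an untagged total of a split cohort is ambiguous
            tag = role
        # ASSUMPTION: "unaffected", "unknown" and population rows are background.
        tracks.add(AFFECTED_TRACK if tag == "affected" else BACKGROUND_TRACK)
    return tracks
```

Because a variant is added to every track for which it has a nonzero count, one seen in both a case arm and a control or population cohort lands in both varFreqsAffected and varFreqsBackground, matching the behavior the commit message describes.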