64a3f9e7813e823cf724ea188c3928a911578286
max
Thu Jun 4 00:32:22 2026 -0700
varFreqs: replace All Databases Combined with two phenotype-split tracks
Replace the single varFreqsAll combined track (and drop the varFreqsDisease
track) with two matched tracks for visual case-vs-background comparison:
  varFreqsAffected   - variants seen in the affected/case arms of disease
                       cohorts (SFARI SPARK WES/WGS ASD probands, SCHEMA cases,
                       GREGoR affected, GA4K); ~130,000 individuals
  varFreqsBackground - population reference cohorts + the unaffected/control
                       arms of disease cohorts ("all other variants");
                       ~1.5 million individuals
A variant seen in both groups appears in both tracks. Genotyping-array cohorts
stay out of both (varFreqsArray unchanged).
vcfToBigBed.py gains --split-affected to emit both tracks in one pass; it reads
phenotype tags (affected/unaffected/unknown) from populations.tsv and
is_disease/disease_role from databases.tsv, and derives the length-filter
ranges from the observed data. TOPMed reclassified as a population cohort.
SPARK WGS display name changed to SFARI SPARK WGS for consistency with the
standalone subtracks. Fix the trackDb mouseOver $-substitution prefix
collision by wrapping fields in ${}. Add description pages for both tracks.
refs #36642
diff --git src/hg/makeDb/scripts/varFreqs/populations.tsv src/hg/makeDb/scripts/varFreqs/populations.tsv
index 4928a7d455a..da279c3a887 100644
--- src/hg/makeDb/scripts/varFreqs/populations.tsv
+++ src/hg/makeDb/scripts/varFreqs/populations.tsv
@@ -1,49 +1,52 @@
# Population breakdown configuration for varFreqsAll combined track
-# db_key pop_key pop_name ac_field af_field
+# db_key pop_key pop_name ac_field af_field [phenotype]
+# Optional 6th column "phenotype" (affected|unaffected|unknown) tags a disease cohort's
+# case/control arms so the build can aggregate an affected-vs-unaffected summary across
+# cohorts. Ancestry/region populations leave it blank.
# AllOfUs local ancestry populations
AllOfUs AFR African AC_AFR AF_AFR
AllOfUs AMR Indigenous American AC_AMR AF_AMR
AllOfUs EAS East Asian AC_EAS AF_EAS
AllOfUs EUR European AC_EUR AF_EUR
AllOfUs OCE Oceanian AC_OCE AF_OCE
AllOfUs SAS South Asian AC_SAS AF_SAS
# SFARI SPARK autism phenotype split (asd column of individuals_registration)
-SPARK AUT ASD proband AC_AUT AF_AUT
-SPARK NON_AUT Non-ASD family AC_NON_AUT AF_NON_AUT
-SFARI_WGS AUT ASD proband AC_AUT AF_AUT
-SFARI_WGS NON_AUT Non-ASD family AC_NON_AUT AF_NON_AUT
+SPARK AUT ASD proband AC_AUT AF_AUT affected
+SPARK NON_AUT Non-ASD family AC_NON_AUT AF_NON_AUT unaffected
+SFARI_WGS AUT ASD proband AC_AUT AF_AUT affected
+SFARI_WGS NON_AUT Non-ASD family AC_NON_AUT AF_NON_AUT unaffected
# SCHEMA schizophrenia case/control split (summed across analysis groups)
-SCHEMA CASE Schizophrenia case AC_CASE AF_CASE
-SCHEMA CTRL Control AC_CTRL AF_CTRL
+SCHEMA CASE Schizophrenia case AC_CASE AF_CASE affected
+SCHEMA CTRL Control AC_CTRL AF_CTRL unaffected
# GenomeAsia populations (7 groups in source VCF)
GenomeAsia NEA Northeast Asian AC_NEA AF_NEA
GenomeAsia SEA Southeast Asian AC_SEA AF_SEA
GenomeAsia SAS South Asian AC_SAS AF_SAS
GenomeAsia OCE Oceanian AC_OCE AF_OCE
GenomeAsia AMR American AC_AMR AF_AMR
GenomeAsia AFR African AC_AFR AF_AFR
GenomeAsia WER Western European Ref AC_WER AF_WER
# gnomAD HGDP+1kG continental groups
HGDP1kG afr African gnomad_AC_afr gnomad_AF_afr
HGDP1kG ami Amish gnomad_AC_ami gnomad_AF_ami
HGDP1kG amr Latino gnomad_AC_amr gnomad_AF_amr
HGDP1kG asj Ashkenazi Jewish gnomad_AC_asj gnomad_AF_asj
HGDP1kG eas East Asian gnomad_AC_eas gnomad_AF_eas
HGDP1kG fin Finnish gnomad_AC_fin gnomad_AF_fin
HGDP1kG mid Middle Eastern gnomad_AC_mid gnomad_AF_mid
HGDP1kG nfe Non-Finnish European gnomad_AC_nfe gnomad_AF_nfe
HGDP1kG oth Other gnomad_AC_oth gnomad_AF_oth
HGDP1kG sas South Asian gnomad_AC_sas gnomad_AF_sas
# GREGoR affected/unaffected breakdown
-GREGoR AFF Affected AC_AFFECTED .
-GREGoR UNA Unaffected AC_UNAFFECTED .
-GREGoR UNK Unknown AC_UNKNOWN .
+GREGoR AFF Affected AC_AFFECTED . affected
+GREGoR UNA Unaffected AC_UNAFFECTED . unaffected
+GREGoR UNK Unknown AC_UNKNOWN . unknown
# NPM Singapore (SG10K_Health) ancestry groups
NPM Chinese Singapore Chinese AC_SgChinese AF_SgChinese
NPM Malay Singapore Malay AC_SgMalay AF_SgMalay
NPM Indian Singapore Indian AC_SgIndian AF_SgIndian
# WBBC Westlake BioBank for Chinese regional Han groups (AC not present, will be synthesized from AF*AN at build time)
WBBC North North Han . North_AF
WBBC Central Central Han . Central_AF
WBBC South South Han . South_AF
WBBC Lingnan Lingnan Han . Lingnan_AF
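
The optional sixth column introduced in populations.tsv above can be read roughly as follows. This is a minimal sketch, not the code in vcfToBigBed.py: it assumes the file is tab-separated (population names contain spaces), and the helper names parse_populations and group_by_phenotype, as well as the SAMPLE rows, are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Phenotype tags named in the populations.tsv header comment.
VALID_PHENOTYPES = {"affected", "unaffected", "unknown"}


def parse_populations(text):
    """Parse populations.tsv-style text into a list of row dicts.

    Assumes tab-separated fields. Comment lines (starting with '#') and
    blank lines are skipped. The 6th column is optional; a missing or
    empty value becomes an empty string.
    """
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"expected at least 5 columns: {line!r}")
        db_key, pop_key, pop_name, ac_field, af_field = fields[:5]
        phenotype = fields[5].strip() if len(fields) > 5 else ""
        if phenotype and phenotype not in VALID_PHENOTYPES:
            raise ValueError(f"unknown phenotype tag {phenotype!r}: {line!r}")
        rows.append({
            "db_key": db_key,
            "pop_key": pop_key,
            "pop_name": pop_name,
            "ac_field": ac_field,
            "af_field": af_field,
            "phenotype": phenotype,
        })
    return rows


def group_by_phenotype(rows):
    """Map each phenotype tag to the (db_key, pop_key) pairs that carry it.

    Rows with a blank phenotype (ancestry/region populations) are left out.
    """
    groups = defaultdict(list)
    for row in rows:
        if row["phenotype"]:
            groups[row["phenotype"]].append((row["db_key"], row["pop_key"]))
    return dict(groups)


# Hypothetical sample in the same column layout as the file above.
SAMPLE = "\n".join([
    "# db_key\tpop_key\tpop_name\tac_field\taf_field\t[phenotype]",
    "AllOfUs\tAFR\tAfrican\tAC_AFR\tAF_AFR",
    "SPARK\tAUT\tASD proband\tAC_AUT\tAF_AUT\taffected",
    "SCHEMA\tCTRL\tControl\tAC_CTRL\tAF_CTRL\tunaffected",
    "GREGoR\tUNK\tUnknown\tAC_UNKNOWN\t.\tunknown",
])

if __name__ == "__main__":
    for tag, pops in group_by_phenotype(parse_populations(SAMPLE)).items():
        print(tag, pops)
```

Treating a missing column the same as an empty one keeps the untouched five-column rows (AllOfUs, GenomeAsia, HGDP1kG, NPM, WBBC) valid without edits, which matches how the diff leaves them unchanged.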