64a3f9e7813e823cf724ea188c3928a911578286
max
  Thu Jun 4 00:32:22 2026 -0700
varFreqs: replace All Databases Combined with two phenotype-split tracks

Replace the single varFreqsAll combined track (and drop the varFreqsDisease
track) with two matched tracks for visual case-vs-background comparison:
  varFreqsAffected   - variants seen in the affected/case arms of disease
                       cohorts (SFARI SPARK WES/WGS ASD probands, SCHEMA cases,
                       GREGoR affected, GA4K); ~130,000 individuals
  varFreqsBackground - population reference cohorts + the unaffected/control
                       arms of disease cohorts ("all other variants");
                       ~1.5 million individuals
A variant seen in both groups appears in both tracks. Genotyping-array cohorts
stay out of both (varFreqsArray unchanged).

vcfToBigBed.py gains --split-affected to emit both tracks in one pass; it reads
phenotype tags (affected/unaffected/unknown) from populations.tsv and
is_disease/disease_role from databases.tsv, and derives the length-filter
ranges from the observed data. TOPMed reclassified as a population cohort.
SPARK WGS display name changed to SFARI SPARK WGS for consistency with the
standalone subtracks. Fixed the trackDb mouseOver $-substitution prefix
collision by wrapping fields in ${}. New description pages for both tracks.
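
For illustration only, a minimal sketch of how the optional 6th "phenotype"
column of populations.tsv could be read and grouped. This is not the code in
vcfToBigBed.py: the function names (read_population_tags, fields_by_phenotype)
and the choice to reject unrecognized tags are assumptions made for this
example. Only the column layout and the three tag values come from the
populations.tsv header in this commit.

```python
import io
from collections import defaultdict

# Tag values documented in the populations.tsv header comment.
VALID_PHENOTYPES = {"affected", "unaffected", "unknown"}


def read_population_tags(handle):
    """Yield (db_key, pop_key, ac_field, af_field, phenotype) per data row.

    phenotype is "" when the optional 6th column is absent or blank,
    which is the case for ancestry/region populations.
    Hypothetical helper, not the actual vcfToBigBed.py implementation.
    """
    for line_no, raw in enumerate(handle, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"line {line_no}: expected at least 5 columns")
        db_key, pop_key, _pop_name, ac_field, af_field = fields[:5]
        phenotype = fields[5].strip() if len(fields) > 5 else ""
        # Assumption: an unrecognized tag is treated as an error.
        if phenotype and phenotype not in VALID_PHENOTYPES:
            raise ValueError(f"line {line_no}: bad phenotype {phenotype!r}")
        yield db_key, pop_key, ac_field, af_field, phenotype


def fields_by_phenotype(rows):
    """Group (db_key, ac_field) pairs by phenotype tag; untagged rows are skipped."""
    grouped = defaultdict(list)
    for db_key, _pop_key, ac_field, _af_field, phenotype in rows:
        if phenotype:
            grouped[phenotype].append((db_key, ac_field))
    return dict(grouped)


# A few rows in the same layout as populations.tsv.
SAMPLE = (
    "# db_key\tpop_key\tpop_name\tac_field\taf_field\t[phenotype]\n"
    "AllOfUs\tAFR\tAfrican\tAC_AFR\tAF_AFR\n"
    "SPARK\tAUT\tASD proband\tAC_AUT\tAF_AUT\taffected\n"
    "SPARK\tNON_AUT\tNon-ASD family\tAC_NON_AUT\tAF_NON_AUT\tunaffected\n"
    "GREGoR\tUNK\tUnknown\tAC_UNKNOWN\t.\tunknown\n"
)

grouped = fields_by_phenotype(read_population_tags(io.StringIO(SAMPLE)))
```

Here the untagged AllOfUs row is left out of every group, while the SPARK and
GREGoR rows land under their respective tags.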

refs #36642

diff --git src/hg/makeDb/scripts/varFreqs/populations.tsv src/hg/makeDb/scripts/varFreqs/populations.tsv
index 4928a7d455a..da279c3a887 100644
--- src/hg/makeDb/scripts/varFreqs/populations.tsv
+++ src/hg/makeDb/scripts/varFreqs/populations.tsv
@@ -1,49 +1,52 @@
 # Population breakdown configuration for varFreqsAll combined track
-# db_key	pop_key	pop_name	ac_field	af_field
+# db_key	pop_key	pop_name	ac_field	af_field	[phenotype]
+# Optional 6th column "phenotype" (affected|unaffected|unknown) tags a disease cohort's
+# case/control arms so the build can aggregate an affected-vs-unaffected summary across
+# cohorts. Ancestry/region populations leave it blank.
 # AllOfUs local ancestry populations
 AllOfUs	AFR	African	AC_AFR	AF_AFR
 AllOfUs	AMR	Indigenous American	AC_AMR	AF_AMR
 AllOfUs	EAS	East Asian	AC_EAS	AF_EAS
 AllOfUs	EUR	European	AC_EUR	AF_EUR
 AllOfUs	OCE	Oceanian	AC_OCE	AF_OCE
 AllOfUs	SAS	South Asian	AC_SAS	AF_SAS
 # SFARI SPARK autism phenotype split (asd column of individuals_registration)
-SPARK	AUT	ASD proband	AC_AUT	AF_AUT
-SPARK	NON_AUT	Non-ASD family	AC_NON_AUT	AF_NON_AUT
-SFARI_WGS	AUT	ASD proband	AC_AUT	AF_AUT
-SFARI_WGS	NON_AUT	Non-ASD family	AC_NON_AUT	AF_NON_AUT
+SPARK	AUT	ASD proband	AC_AUT	AF_AUT	affected
+SPARK	NON_AUT	Non-ASD family	AC_NON_AUT	AF_NON_AUT	unaffected
+SFARI_WGS	AUT	ASD proband	AC_AUT	AF_AUT	affected
+SFARI_WGS	NON_AUT	Non-ASD family	AC_NON_AUT	AF_NON_AUT	unaffected
 # SCHEMA schizophrenia case/control split (summed across analysis groups)
-SCHEMA	CASE	Schizophrenia case	AC_CASE	AF_CASE
-SCHEMA	CTRL	Control	AC_CTRL	AF_CTRL
+SCHEMA	CASE	Schizophrenia case	AC_CASE	AF_CASE	affected
+SCHEMA	CTRL	Control	AC_CTRL	AF_CTRL	unaffected
 # GenomeAsia populations (7 groups in source VCF)
 GenomeAsia	NEA	Northeast Asian	AC_NEA	AF_NEA
 GenomeAsia	SEA	Southeast Asian	AC_SEA	AF_SEA
 GenomeAsia	SAS	South Asian	AC_SAS	AF_SAS
 GenomeAsia	OCE	Oceanian	AC_OCE	AF_OCE
 GenomeAsia	AMR	American	AC_AMR	AF_AMR
 GenomeAsia	AFR	African	AC_AFR	AF_AFR
 GenomeAsia	WER	Western European Ref	AC_WER	AF_WER
 # gnomAD HGDP+1kG continental groups
 HGDP1kG	afr	African	gnomad_AC_afr	gnomad_AF_afr
 HGDP1kG	ami	Amish	gnomad_AC_ami	gnomad_AF_ami
 HGDP1kG	amr	Latino	gnomad_AC_amr	gnomad_AF_amr
 HGDP1kG	asj	Ashkenazi Jewish	gnomad_AC_asj	gnomad_AF_asj
 HGDP1kG	eas	East Asian	gnomad_AC_eas	gnomad_AF_eas
 HGDP1kG	fin	Finnish	gnomad_AC_fin	gnomad_AF_fin
 HGDP1kG	mid	Middle Eastern	gnomad_AC_mid	gnomad_AF_mid
 HGDP1kG	nfe	Non-Finnish European	gnomad_AC_nfe	gnomad_AF_nfe
 HGDP1kG	oth	Other	gnomad_AC_oth	gnomad_AF_oth
 HGDP1kG	sas	South Asian	gnomad_AC_sas	gnomad_AF_sas
 # GREGoR affected/unaffected breakdown
-GREGoR	AFF	Affected	AC_AFFECTED	.
-GREGoR	UNA	Unaffected	AC_UNAFFECTED	.
-GREGoR	UNK	Unknown	AC_UNKNOWN	.
+GREGoR	AFF	Affected	AC_AFFECTED	.	affected
+GREGoR	UNA	Unaffected	AC_UNAFFECTED	.	unaffected
+GREGoR	UNK	Unknown	AC_UNKNOWN	.	unknown
 # NPM Singapore (SG10K_Health) ancestry groups
 NPM	Chinese	Singapore Chinese	AC_SgChinese	AF_SgChinese
 NPM	Malay	Singapore Malay	AC_SgMalay	AF_SgMalay
 NPM	Indian	Singapore Indian	AC_SgIndian	AF_SgIndian
 # WBBC Westlake BioBank for Chinese regional Han groups (AC not present, will be synthesized from AF*AN at build time)
 WBBC	North	North Han	.	North_AF
 WBBC	Central	Central Han	.	Central_AF
 WBBC	South	South Han	.	South_AF
 WBBC	Lingnan	Lingnan Han	.	Lingnan_AF