3180d71425ab40bc022712bb95868bfe80747375
max
  Fri May 29 08:52:38 2026 -0700
[Claude] varFreqs: split SPARK+SCHEMA by phenotype, add disease + array combined tracks, drop array cohorts from varFreqsAll

#Preview2 week - bugs introduced now will need a build patch to fix
Split SFARI SPARK WES and WGS by autism status using bcftools +fill-tags
-S with the SPARK individuals_registration TSV (AC_AUT / AN_AUT / AF_AUT
plus AC_NON_AUT / AN_NON_AUT / AF_NON_AUT). Added matching SCHEMA
case/control sums (AC_CASE etc.). Two new combined bigBed tracks: varFreqsDisease
(SPARK, SFARI WGS, TOPMed, SCHEMA, GREGoR, GA4K) and varFreqsArray (TPMI,
MexBB, UKBB). TPMI and MexBB are removed from varFreqsAll so the main
combined track is purely WGS/WES.
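
For illustration only, a minimal sketch of how the sample-to-group file for
fill-tags -S could be derived from the registration TSV. This is not the
committed build code. The ID column name (subject_sp_id), the accepted asd
values (True/1 and False/0) and the function name are assumptions; only the
asd column and the AUT / NON_AUT group names come from this commit.

```python
import csv
import io


def write_fill_tags_groups(registration_tsv, out,
                           id_col="subject_sp_id", asd_col="asd"):
    """Write 'sample<TAB>group' lines, the layout fill-tags -S reads.

    id_col and the accepted asd values are assumptions for this sketch;
    check them against the real individuals_registration header.
    """
    counts = {"AUT": 0, "NON_AUT": 0}
    for row in csv.DictReader(registration_tsv, delimiter="\t"):
        status = row[asd_col].strip().lower()
        if status in ("true", "1"):
            group = "AUT"
        elif status in ("false", "0"):
            group = "NON_AUT"
        else:
            # Unknown status: keep the sample out of both groups.
            continue
        out.write(f"{row[id_col]}\t{group}\n")
        counts[group] += 1
    return counts


if __name__ == "__main__":
    # Tiny in-memory example; real input would be the registration TSV.
    demo = io.StringIO("subject_sp_id\tasd\nSP1\tTrue\nSP2\tFalse\nSP3\t\n")
    groups = io.StringIO()
    print(write_fill_tags_groups(demo, groups))
    print(groups.getvalue(), end="")
```

The resulting file would then be passed along the lines of
"bcftools +fill-tags in.vcf.gz -Oz -o out.vcf.gz -- -S groups.txt -t AN,AC,AF"
(file names are placeholders), which yields the per-group AC_/AN_/AF_ tags.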

Build scripts parameterized so the same code drives all three combined
builds: mergeAndAnnotate.sh gains --databases / --tag, vcfToBigBed.py
gains --databases-file / --populations-file and a per-track autoSql table
name. mergeAndAnnotate.sh now pins /cluster/software/src/bcftools-1.22 in
PATH (--unify-chr-names is a 1.22 feature; conda's 1.14 silently fails).
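
As a sketch of what a --populations-file reader might look like, assuming
the five-column layout documented in the header of populations.tsv
(db_key, pop_key, pop_name, ac_field, af_field). The function and class
names here are hypothetical and not taken from vcfToBigBed.py.

```python
import io
from collections import defaultdict
from typing import NamedTuple


class Population(NamedTuple):
    pop_key: str
    pop_name: str
    ac_field: str
    af_field: str


def load_populations(handle, databases=None):
    """Group population rows by db_key, skipping comments and blank lines.

    If 'databases' is given, only rows for those db_keys are kept, so one
    populations file could serve several combined tracks.
    """
    by_db = defaultdict(list)
    for line_no, line in enumerate(handle, 1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(
                f"line {line_no}: expected 5 tab-separated fields, "
                f"got {len(fields)}")
        db_key, *rest = fields
        if databases is not None and db_key not in databases:
            continue
        by_db[db_key].append(Population(*rest))
    return dict(by_db)


if __name__ == "__main__":
    demo = io.StringIO(
        "# db_key\tpop_key\tpop_name\tac_field\taf_field\n"
        "SPARK\tAUT\tASD proband\tAC_AUT\tAF_AUT\n"
        "SCHEMA\tCASE\tSchizophrenia case\tAC_CASE\tAF_CASE\n"
    )
    print(load_populations(demo, databases={"SPARK"}))
```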

refs #36642

diff --git src/hg/makeDb/scripts/varFreqs/populations.tsv src/hg/makeDb/scripts/varFreqs/populations.tsv
index d1c9dedb64b..4928a7d455a 100644
--- src/hg/makeDb/scripts/varFreqs/populations.tsv
+++ src/hg/makeDb/scripts/varFreqs/populations.tsv
@@ -1,24 +1,32 @@
 # Population breakdown configuration for varFreqsAll combined track
 # db_key	pop_key	pop_name	ac_field	af_field
 # AllOfUs local ancestry populations
 AllOfUs	AFR	African	AC_AFR	AF_AFR
 AllOfUs	AMR	Indigenous American	AC_AMR	AF_AMR
 AllOfUs	EAS	East Asian	AC_EAS	AF_EAS
 AllOfUs	EUR	European	AC_EUR	AF_EUR
 AllOfUs	OCE	Oceanian	AC_OCE	AF_OCE
 AllOfUs	SAS	South Asian	AC_SAS	AF_SAS
+# SFARI SPARK autism phenotype split (asd column of individuals_registration)
+SPARK	AUT	ASD proband	AC_AUT	AF_AUT
+SPARK	NON_AUT	Non-ASD family	AC_NON_AUT	AF_NON_AUT
+SFARI_WGS	AUT	ASD proband	AC_AUT	AF_AUT
+SFARI_WGS	NON_AUT	Non-ASD family	AC_NON_AUT	AF_NON_AUT
+# SCHEMA schizophrenia case/control split (summed across analysis groups)
+SCHEMA	CASE	Schizophrenia case	AC_CASE	AF_CASE
+SCHEMA	CTRL	Control	AC_CTRL	AF_CTRL
 # GenomeAsia populations (7 groups in source VCF)
 GenomeAsia	NEA	Northeast Asian	AC_NEA	AF_NEA
 GenomeAsia	SEA	Southeast Asian	AC_SEA	AF_SEA
 GenomeAsia	SAS	South Asian	AC_SAS	AF_SAS
 GenomeAsia	OCE	Oceanian	AC_OCE	AF_OCE
 GenomeAsia	AMR	American	AC_AMR	AF_AMR
 GenomeAsia	AFR	African	AC_AFR	AF_AFR
 GenomeAsia	WER	Western European Ref	AC_WER	AF_WER
 # gnomAD HGDP+1kG continental groups
 HGDP1kG	afr	African	gnomad_AC_afr	gnomad_AF_afr
 HGDP1kG	ami	Amish	gnomad_AC_ami	gnomad_AF_ami
 HGDP1kG	amr	Latino	gnomad_AC_amr	gnomad_AF_amr
 HGDP1kG	asj	Ashkenazi Jewish	gnomad_AC_asj	gnomad_AF_asj
 HGDP1kG	eas	East Asian	gnomad_AC_eas	gnomad_AF_eas
 HGDP1kG	fin	Finnish	gnomad_AC_fin	gnomad_AF_fin