64a3f9e7813e823cf724ea188c3928a911578286
max
Thu Jun 4 00:32:22 2026 -0700
varFreqs: replace All Databases Combined with two phenotype-split tracks
Replace the single varFreqsAll combined track (and drop the varFreqsDisease
track) with two matched tracks for visual case-vs-background comparison:
  varFreqsAffected   - variants seen in the affected/case arms of disease
                       cohorts (SFARI SPARK WES/WGS ASD probands, SCHEMA cases,
                       GREGoR affected, GA4K); ~130,000 individuals
  varFreqsBackground - population reference cohorts + the unaffected/control
                       arms of disease cohorts ("all other variants");
                       ~1.5 million individuals
A variant seen in both groups appears in both tracks. Genotyping-array cohorts
stay out of both (varFreqsArray unchanged).
vcfToBigBed.py gains --split-affected to emit both tracks in one pass; it reads
phenotype tags (affected/unaffected/unknown) from populations.tsv and
is_disease/disease_role from databases.tsv, and derives the length-filter
ranges from the observed data. TOPMed reclassified as a population cohort.
SPARK WGS display name changed to SFARI SPARK WGS for consistency with the
standalone subtracks. Fixed the trackDb mouseOver $-substitution prefix
collision by wrapping fields in ${}. New description pages for both tracks.
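The prefix collision mentioned above can be illustrated with a small stand-alone sketch. This is not the Genome Browser's substitution code; the field names (AC, AC_affected) and both helper functions are hypothetical, chosen only to show why a bare $name token is ambiguous when one field name is a prefix of another and why ${name} is not.

```python
import re

def naive_substitute(template, fields):
    """Replace each bare $name by plain string replacement, in dict order.

    Deliberately buggy: "$AC" also matches the start of "$AC_affected".
    """
    out = template
    for name, value in fields.items():
        out = out.replace("$" + name, str(value))
    return out

def braced_substitute(template, fields):
    """Replace only ${name} tokens; the closing brace ends the name unambiguously."""
    return re.sub(r"\$\{(\w+)\}", lambda m: str(fields[m.group(1)]), template)

# Hypothetical field values, for illustration only.
example_fields = {"AC": 12, "AC_affected": 3}

print(naive_substitute("AC=$AC affected=$AC_affected", example_fields))
# prints: AC=12 affected=12_affected   (the shorter name clobbered the longer one)

print(braced_substitute("AC=${AC} affected=${AC_affected}", example_fields))
# prints: AC=12 affected=3
```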
refs #36642
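
For reviewers, the routing rule described in the message and in the databases.tsv comments can be sketched roughly as follows. This is a minimal illustration, not the code in vcfToBigBed.py: the function names, the tab-delimited parsing, and the handling of "unknown" phenotype tags (left out of both tracks here) are assumptions. The per-population phenotype tag is passed in directly because the layout of populations.tsv is not shown in this commit.

```python
# Column order taken from the databases.tsv header comment in this commit.
DB_COLUMNS = ["key", "name", "vcf", "ac_field", "af_field",
              "is_disease", "disease_role"]

def parse_databases(text):
    """Parse databases.tsv-style text (assumed tab-delimited) into dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = [f.strip() for f in line.split("\t")]
        # disease_role is blank for most rows, so pad missing trailing columns
        fields += [""] * (len(DB_COLUMNS) - len(fields))
        rows.append(dict(zip(DB_COLUMNS, fields)))
    return rows

def assign_track(db, phenotype=None):
    """Return the track a cohort (or one of its populations) feeds, or None.

    phenotype is the per-population tag: affected, unaffected or unknown.
    """
    if db["is_disease"] != "1":
        return "varFreqsBackground"      # population reference cohort
    tag = db["disease_role"] or phenotype  # whole-cohort role wins if set
    if tag == "affected":
        return "varFreqsAffected"
    if tag == "unaffected":
        return "varFreqsBackground"
    return None  # assumption: unknown/untagged populations feed neither track

EXAMPLE = "\n".join([
    "# key\tname\tvcf\tac_field\taf_field\tis_disease\tdisease_role",
    "TOPMed\tTOPMed\t/gbdb/hg38/varFreqs/_topmed/topmed10.vcf.gz\tAC\tAF\t0",
    "SCHEMA\tSCHEMA\t/gbdb/hg38/varFreqs/schema/SCHEMA_variant_results_withAF.vcf.gz\tAC\tAF\t1",
    "GA4K\tGA4K PacBio LR\t/gbdb/hg38/varFreqs/ga4k/ga4kSnv.vcf.gz\tAC\tAF\t1\taffected",
])

dbs = {row["key"]: row for row in parse_databases(EXAMPLE)}
print(assign_track(dbs["TOPMed"]))                # varFreqsBackground
print(assign_track(dbs["GA4K"]))                  # varFreqsAffected
print(assign_track(dbs["SCHEMA"], "unaffected"))  # varFreqsBackground
```

Under this rule a cohort such as SCHEMA contributes its case populations to varFreqsAffected and its control populations to varFreqsBackground, which is why a variant seen in both groups can appear in both tracks.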
diff --git src/hg/makeDb/scripts/varFreqs/databases.tsv src/hg/makeDb/scripts/varFreqs/databases.tsv
index a0fccdd9da7..dccc7af0731 100644
--- src/hg/makeDb/scripts/varFreqs/databases.tsv
+++ src/hg/makeDb/scripts/varFreqs/databases.tsv
@@ -1,36 +1,42 @@
# Database configuration for varFreqsAll combined track
-# key name vcf ac_field af_field
+# key name vcf ac_field af_field is_disease disease_role
# Use "." for fields that don't exist in the VCF
-AllOfUs AllOfUs /gbdb/hg38/varFreqs/_allofus/allOfUs.locAncFreq.vcf.gz . .
-SPARK SPARK WES /gbdb/hg38/varFreqs/_sfari/SPARK.iWES_v3.2024_08.deepvariant.norm.vcf.gz AC AF
-SFARI_WGS SFARI WGS /gbdb/hg38/varFreqs/_sfari/wgs_12519_genome.deepvariant.norm.vcf.gz AC AF
-GenomeAsia GenomeAsia SNVs /gbdb/hg38/varFreqs/ga100k/ga100k.subst.vcf.gz AC AF
-GenomeAsiaIndel GenomeAsia Indels /gbdb/hg38/varFreqs/ga100k/ga100k.indels.vcf.gz AC AF
-NPM NPM Singapore /gbdb/hg38/varFreqs/_npm/SG10K_Health_r5.3.2.sites.vcf.bgz AC AF
-KOVA KOVA Korea /gbdb/hg38/varFreqs/_kova/kova.v7.vcf.gz AC AF
-ToMMo ToMMo Japan /gbdb/hg38/varFreqs/tommo61kjpn/tommo-61kjpn-20250616-GRCh38-snvindel-af-autosome.vcf.gz AC AF
+# is_disease=1: cohort assembled to study a disease (autism, schizophrenia, rare disease).
+# disease_role: for a disease cohort with NO affected/unaffected population split, what is
+# the whole cohort? "affected" (e.g. GA4K rare-disease probands) feeds the affected
+# summary; blank means use the per-population phenotype tags in populations.tsv instead.
+# TOPMed is is_disease=0: it is an NHLBI population/biobank reference (used like gnomAD),
+# not an affected-disease case cohort, and ships no affected/unaffected label.
+AllOfUs AllOfUs /gbdb/hg38/varFreqs/_allofus/allOfUs.locAncFreq.vcf.gz . . 0
+SPARK SFARI SPARK WES /gbdb/hg38/varFreqs/_sfari/SPARK.iWES_v3.2024_08.deepvariant.norm.vcf.gz AC AF 1
+SFARI_WGS SFARI SPARK WGS /gbdb/hg38/varFreqs/_sfari/wgs_12519_genome.deepvariant.norm.vcf.gz AC AF 1
+GenomeAsia GenomeAsia SNVs /gbdb/hg38/varFreqs/ga100k/ga100k.subst.vcf.gz AC AF 0
+GenomeAsiaIndel GenomeAsia Indels /gbdb/hg38/varFreqs/ga100k/ga100k.indels.vcf.gz AC AF 0
+NPM NPM Singapore /gbdb/hg38/varFreqs/_npm/SG10K_Health_r5.3.2.sites.vcf.bgz AC AF 0
+KOVA KOVA Korea /gbdb/hg38/varFreqs/_kova/kova.v7.vcf.gz AC AF 0
+ToMMo ToMMo Japan /gbdb/hg38/varFreqs/tommo61kjpn/tommo-61kjpn-20250616-GRCh38-snvindel-af-autosome.vcf.gz AC AF 0
# IndiGen dropped: the IGIB IndiGenomes release ships only a VRT variation-type
# bit per record (no AC, AF, or AN in INFO), so it cannot contribute counts to
# the combined track. Re-add only if a future release exposes allele counts.
-FinnGen FinnGen Finland /gbdb/hg38/varFreqs/_finngen/finnge_R12_annotated_variants_v1.vcf.gz AC AF
-Saudi Saudi /gbdb/hg38/varFreqs/saudi/saudi.vcf.gz AC AF
-SweGen SweGen Sweden /gbdb/hg38/varFreqs/_swefreq/swegen_frequencies_fixploidy_GRCh38_20190204.vcf.gz AC AF
-TOPMed TOPMed /gbdb/hg38/varFreqs/_topmed/topmed10.vcf.gz AC AF
-ABraOM ABraOM Brazil /gbdb/hg38/varFreqs/abraom/abraom.vcf.gz . AF
-ALFA ALFA /gbdb/hg38/varFreqs/alfa/ALFA.vcf.gz . AF_GLB
-MGRB MGRB Australia /gbdb/hg38/varFreqs/_mgrb/MGRB.phase3.GRCh38.norm.vcf.gz AC .
-HRC HRC /gbdb/hg38/varFreqs/hrc/hrc.vcf.gz AC AF
+FinnGen FinnGen Finland /gbdb/hg38/varFreqs/_finngen/finnge_R12_annotated_variants_v1.vcf.gz AC AF 0
+Saudi Saudi /gbdb/hg38/varFreqs/saudi/saudi.vcf.gz AC AF 0
+SweGen SweGen Sweden /gbdb/hg38/varFreqs/_swefreq/swegen_frequencies_fixploidy_GRCh38_20190204.vcf.gz AC AF 0
+TOPMed TOPMed /gbdb/hg38/varFreqs/_topmed/topmed10.vcf.gz AC AF 0
+ABraOM ABraOM Brazil /gbdb/hg38/varFreqs/abraom/abraom.vcf.gz . AF 0
+ALFA ALFA /gbdb/hg38/varFreqs/alfa/ALFA.vcf.gz . AF_GLB 0
+MGRB MGRB Australia /gbdb/hg38/varFreqs/_mgrb/MGRB.phase3.GRCh38.norm.vcf.gz AC . 0
+HRC HRC /gbdb/hg38/varFreqs/hrc/hrc.vcf.gz AC AF 0
# MexBB and TPMI moved to the array-based track (databases_array.tsv): both are
# genotyping-array cohorts and are kept out of the WGS/WES varFreqsAll track.
-SGDP SGDP /gbdb/hg38/varFreqs/sgdpFreq/sgdp.freq.vcf.gz AC AF
-HGDP1kG gnomAD HGDP+1kG /gbdb/hg38/varFreqs/hgdp1kFreq/hgdp1k.freq.vcf.gz AC AF
-GREGoR GREGoR /gbdb/hg38/varFreqs/gregor/gregor.vcf.gz AC AF
-SCHEMA SCHEMA /gbdb/hg38/varFreqs/schema/SCHEMA_variant_results_withAF.vcf.gz AC AF
-GA4K GA4K PacBio LR /gbdb/hg38/varFreqs/ga4k/ga4kSnv.vcf.gz AC AF
-CoLoRSdb CoLoRSdb PacBio LR /gbdb/hg38/varFreqs/colorsDb/colorsDbSnv.vcf.gz AC AF
-SVatalog SVatalog 101 10XG SR /gbdb/hg38/varFreqs/svatalog/svatalog.vcf.gz AC AF
-Tishkoff180 Tishkoff 180 African WGS /gbdb/hg38/varFreqs/_tishkoff/tishkoff180.vcf.gz AC AF
-WBBC WBBC China /gbdb/hg38/varFreqs/wbbc/wbbc.vcf.gz AC AF
-ChinaMAP China ChinaMAP /gbdb/hg38/varFreqs/_chinamap/chinamap.vcf.gz AC AF
-GenomeIndia GenomeIndia 9.7k WGS /gbdb/hg38/varFreqs/_genomeindia/genomeindia.vcf.gz AC AF
-GoNL GoNL Netherlands ~13x SR /gbdb/hg38/varFreqs/gonl/gonl.vcf.gz AC AF
+SGDP SGDP /gbdb/hg38/varFreqs/sgdpFreq/sgdp.freq.vcf.gz AC AF 0
+HGDP1kG gnomAD HGDP+1kG /gbdb/hg38/varFreqs/hgdp1kFreq/hgdp1k.freq.vcf.gz AC AF 0
+GREGoR GREGoR /gbdb/hg38/varFreqs/gregor/gregor.vcf.gz AC AF 1
+SCHEMA SCHEMA /gbdb/hg38/varFreqs/schema/SCHEMA_variant_results_withAF.vcf.gz AC AF 1
+GA4K GA4K PacBio LR /gbdb/hg38/varFreqs/ga4k/ga4kSnv.vcf.gz AC AF 1 affected
+CoLoRSdb CoLoRSdb PacBio LR /gbdb/hg38/varFreqs/colorsDb/colorsDbSnv.vcf.gz AC AF 0
+SVatalog SVatalog 101 10XG SR /gbdb/hg38/varFreqs/svatalog/svatalog.vcf.gz AC AF 0
+Tishkoff180 Tishkoff 180 African WGS /gbdb/hg38/varFreqs/_tishkoff/tishkoff180.vcf.gz AC AF 0
+WBBC WBBC China /gbdb/hg38/varFreqs/wbbc/wbbc.vcf.gz AC AF 0
+ChinaMAP China ChinaMAP /gbdb/hg38/varFreqs/_chinamap/chinamap.vcf.gz AC AF 0
+GenomeIndia GenomeIndia 9.7k WGS /gbdb/hg38/varFreqs/_genomeindia/genomeindia.vcf.gz AC AF 0
+GoNL GoNL Netherlands ~13x SR /gbdb/hg38/varFreqs/gonl/gonl.vcf.gz AC AF 0