68c5b3b5dfc4053ff78a6b1d236bd1ac90251cfa lrnassar Mon Jun 1 14:40:45 2026 -0700
varFreqs: description pages for the three combined tracks and "SNV" rename sweep. Add varFreqsDisease.html and varFreqsArray.html so the two new combined tracks have full Description/Display/Methods/Data Access/References. Add a Caveats section on varFreqsArray about chip-data quality vs sequencing. Update varFreqsAll.html and the supertrack varFreqs.html to reflect the three-combined-track family (cross-links between siblings, new "Combined Tracks" section, new table rows, and updated source/variant counts). Add a GoNL row to the supertrack table. Sweep 37 subtrack longLabels and four cross-referencing description pages (colorsDbSnv.html, mei.html, meiSwegen.html, phasedVars.html) from "Variant Frequencies:" to "SNV Frequencies:" to match the supertrack shortLabel. refs #36642

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index fa9d6dbb231..bb8288f2744 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -1,76 +1,99 @@

Description

This supertrack collects variant allele frequencies from population-scale sequencing and genotyping projects worldwide, from a total of ~1.7 million genomes/exomes/arrays.
-The data was not reprocessed in a harmonized way; the variant VCFs were collected from the projects as-is.
-The goal is a single place to compare how common
-a variant is across different populations, ancestries, and cohorts, for
-projects that cannot be recomputed by gnomAD soon. The main
-combined track merges all databases into one summary track,
-with filters, summed population frequencies and recalculated protein-effect annotations.
-There is also one subtrack per project with the original VCF data and all the annotations that the project provides.
-The different projects use different pipelines and sequencing technologies. Click any of the projects
-above or below for a summary of their sample selection, sequencing assay and software pipeline.
-Many projects do not allow us to distribute the data, but we document how to request it
-and provide all converters.

+The data was not reprocessed in a harmonized way; the variant VCFs were collected from the
+projects as-is. The goal is a single place to compare how common a variant is across
+different populations, ancestries, and cohorts, for projects that cannot be recomputed by
+gnomAD soon. Three combined tracks aggregate the source data along different lines, and
+there is also one subtrack per project with the original VCF data and all the annotations
+that the project provides. The different projects use different pipelines and sequencing
+technologies. Click any of the projects above or below for a summary of their sample
+selection, sequencing assay and software pipeline. Many projects do not allow us to
+distribute the data, but we document how to request it and provide all converters.
+

Data from projects that provide haplotype-phased genotypes can also be found elsewhere: 1000 Genomes is also available as a separate track, and the phased genotypes from HGDP, SGDP, HGDP+1000 Genomes and Mexico Biobank can also be found in the "Phased Variants" track. Their VCF versions below show only the isolate frequency per variant.

Please contact us (genome@soe.ucsc.edu) if you know of a project that we should add. So far, we have requested data from Regeneron's Million Exomes and Mexico City Studies (request rejected) and from Taiwan Biobank (pending).

Combined Track (All Databases)

Combined Tracks

-The "All Databases Combined" track merges variants from all individual databases into a single
-bigBed file with consequence annotations, totaling 1.17 billion variants from ~1.7 million individuals.
-The track supports filtering by variant type
-(SNV, insertion, deletion, MNV), predicted consequence (missense, synonymous, stop gained,
-frameshift, splice, intron, intergenic), source database, allele frequency (overall maximum
-and per-database), and allele count (total or per-database). The track is useful in dense mode
-to get a quick overview of variant density across all projects, or with filters to find
-variants present in specific databases or within certain frequency ranges. With the "clone track"
-feature you can clone this track and keep multiple versions, each with different filters activated.
-The "Density mode" checkbox on the track configuration page shows a plot of the
-density of variants passing a filter, one per track clone.
+Three combined tracks merge variants from the individual subtracks into single bigBed files
+with predicted protein consequences and cross-database filtering. All three use the same
+filter conventions (variant type, consequence, source database, allele frequency, allele
+count, and per-database AF/AC).

+All Databases Combined — 1.34
+ billion variants from 28 sequencing-based cohorts (WGS, WES, long-read). The default
+ summary view of the supertrack. Excludes the genotyping-array cohorts.
+Disease-related Databases Combined
+ — 932 million variants from six disease-focused cohorts (SPARK, SFARI WGS,
+ TOPMed, SCHEMA, GREGoR, GA4K), with phenotype-stratified AC/AF where the source
+ provides it.
+Genotyping Array Databases Combined
+ — 14.7 million variants from three array cohorts (TPMI Taiwan, Mexico Biobank,
+ UK Biobank imputed). Kept separate because chip data has different per-variant
+ confidence than sequencing.

Available Datasets

@@ -122,30 +145,39 @@

Database	Region	N	Data Type	Cohort	Sub-populations	Downloadable from UCSC
-All Databases combined	All below	1.7mil	WGS/WES/imputed		
+All Databases Combined	Sequencing-based, all below	~1.7mil	WGS/WES/long-read	1.34B variants	Phenotype splits for SPARK, SFARI WGS, GREGoR	No
Disease-related Databases Combined	SPARK, SFARI WGS, TOPMed, SCHEMA, GREGoR, GA4K	~300k	WGS/WES/long-read	932M variants	SPARK ASD/Non-ASD, SFARI WGS ASD/Non-ASD, SCHEMA case/control, GREGoR aff/unaff/unknown	No
Genotyping Array Databases Combined	TPMI, MexBB, UKBB	~530k	Array / imputed	14.7M variants	—	No
AllOfUs v7	USA	245k	WGS	General population, diverse	African, Indigenous American, East Asian, European, Oceanian, South Asian (local ancestry; see Notes below)	Yes
TOPMED Freeze 10	USA	361k	Imputed array (HRC+UK10K+1KGp3 ref panel)	White British subset of UK Biobank, Neale Lab Round 2 GWAS	—	Yes
SweGen	Sweden	1k	WGS	Cross-section of Swedish population	—	No
GoNL	Netherlands	498	WGS (~13x)	250 unrelated Dutch trios (parents only)	—	Yes
SCHEMA	Multi-national	121k	WES	Schizophrenia: 24k cases, 97k controls	—	Yes
Japan ToMMO 61k	Japan	61k	WGS	General population