68c5b3b5dfc4053ff78a6b1d236bd1ac90251cfa lrnassar Mon Jun 1 14:40:45 2026 -0700

varFreqs: description pages for the three combined tracks and "SNV" rename sweep.

Add varFreqsDisease.html and varFreqsArray.html so the two new combined tracks have full Description/Display/Methods/Data Access/References. Add a Caveats section on varFreqsArray about chip-data quality vs sequencing. Update varFreqsAll.html and the supertrack varFreqs.html to reflect the three-combined-track family (cross-links between siblings, new "Combined Tracks" section, new table rows, and updated source/variant counts). Add a GoNL row to the supertrack table. Sweep 37 subtrack longLabels and four cross-referencing description pages (colorsDbSnv.html, mei.html, meiSwegen.html, phasedVars.html) from "Variant Frequencies:" to "SNV Frequencies:" to match the supertrack shortLabel. refs #36642

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index fa9d6dbb231..bb8288f2744 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -1,76 +1,99 @@

Description

This supertrack collects variant allele frequencies from population-scale sequencing and genotyping projects worldwide, from a total of ~1.7 million genomes/exomes/arrays.
-The data was not reprocessed in a harmonized way; the variant VCFs were collected from the projects as-is.
-The goal is a single place to compare how common
-a variant is across different populations, ancestries, and cohorts, for
-projects that cannot be recomputed by gnomAD soon. The main
-combined track merges all databases into one summary track,
-with filters, summed population frequencies and recalculated protein-effect annotations.
-There is also one subtrack per project with the original VCF data and all the annotations that the project provides.
-The different projects use different pipelines and sequencing technologies. Click any of the projects
-above or below for a summary of their sample selection, sequencing assay and software pipeline.
-Many projects do not allow us to distribute the data, but we document how to request it
-and provide all converters.

+The data was not reprocessed in a harmonized way; the variant VCFs were collected from the
+projects as-is. The goal is a single place to compare how common a variant is across
+different populations, ancestries, and cohorts, for projects that cannot be recomputed by
+gnomAD soon. Three combined tracks aggregate the source data along different lines, and
+there is also one subtrack per project with the original VCF data and all the annotations
+that the project provides. The different projects use different pipelines and sequencing
+technologies. Click any of the projects above or below for a summary of their sample
+selection, sequencing assay and software pipeline. Many projects do not allow us to
+distribute the data, but we document how to request it and provide all converters.
+

Data from projects that provide haplotype-phased genotypes can also be found elsewhere: 1000 Genomes is also a separate track, and the phased genotypes HGDP, SGDP, HGDP+1000 Genomes and Mexico Biobank can also be found in the "Phased Variants" track. Their VCF versions below show only the isolate frequency per variant.

Please contact us (genome@soe.ucsc.edu) if you know of a project that we should add. So far, Regeneron's Million Exomes and Mexico City Studies (request rejected) and Taiwan Biobank (pending).

-Combined Track (All Databases)

+Combined Tracks

-The "All Databases Combined" track merges variants from all individual databases into a single
-bigBed file with consequence annotations, totaling 1.17 billion variants from ~1.7 million individuals.
-The track supports filtering by variant type
-(SNV, insertion, deletion, MNV), predicted consequence (missense, synonymous, stop gained,
-frameshift, splice, intron, intergenic), source database, allele frequency (overall maximum
-and per-database), and allele count (total or per-database). The track is useful in dense mode
-to get a quick overview of variant density across all projects, or with filters to find
-variants present in specific databases or within certain frequency ranges. With the "clone track"
-feature you can clone this track and keep multiple versions, each with different filters activated.
-The "Density mode" checkbox on the track configuration page shows a plot of the
-density of variants passing a filter, one per track clone.
+Three combined tracks merge variants from the individual subtracks into single bigBed files
+with predicted protein consequences and cross-database filtering. All three use the same
+filter conventions (variant type, consequence, source database, allele frequency, allele
+count, and per-database AF/AC).

+

Available Datasets

- - - - - - + + + + + + + + + + + + + + + + + + + + + + + + @@ -122,30 +145,39 @@ + + + + + + + + +
Database Region N Data Type Cohort Sub-populations Downloadable from UCSC
-All Databases combined All below 1.7mil WGS/WES/imputed
+All Databases Combined Sequencing-based, all below ~1.7mil WGS/WES/long-read 1.34B variants Phenotype splits for SPARK, SFARI WGS, GREGoR No
+Disease-related Databases Combined SPARK, SFARI WGS, TOPMed, SCHEMA, GREGoR, GA4K ~300k WGS/WES/long-read 932M variants SPARK ASD/Non-ASD, SFARI WGS ASD/Non-ASD, SCHEMA case/control, GREGoR aff/unaff/unknown No
+Genotyping Array Databases Combined TPMI, MexBB, UKBB ~530k Array / imputed 14.7M variants No
AllOfUs v7 USA 245k WGS General population, diverse African, Indigenous American, East Asian, European, Oceanian, South Asian (local ancestry; see Notes below) Yes
TOPMED Freeze 10 USA361k Imputed array (HRC+UK10K+1KGp3 ref panel) White British subset of UK Biobank, Neale Lab Round 2 GWAS Yes
SweGen Sweden 1k WGS Cross-section of Swedish population No
+GoNL Netherlands 498 WGS (~13x) 250 unrelated Dutch trios (parents only) Yes
SCHEMA Multi-national 121k WES Schizophrenia: 24k cases, 97k controls Yes
Japan ToMMO 61k Japan 61k WGS General population