9a49afb16653363a70d8e4d205513008b7b08df5 max Wed May 13 06:18:42 2026 -0700
varFreqs: add UK Biobank subtrack from Neale Lab Round 2 imputed-v3 variant manifest (13.7M variants, 361k white British samples). TSV → VCF conversion + CrossMap hg19→hg38, refs #36642

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index dbdb68feb5e..0f2e88374dd 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -10,32 +10,31 @@
with filters, summed population frequencies and recalculated protein-effect annotations. In addition, there is one subtrack per project with the original VCF data and all the annotations that the project provides. The different projects use different pipelines and sequencing technologies, click any of the projects above or below for a summary of their sample selection, sequencing assay and software pipeline. Many projects do not allow us to distribute the data but we document how the data can be requested and provide all converters.

Data from projects that provide haplotype-phased genotypes can also be found elsewhere: 1000 Genomes is also a separate track, and the phased genotypes HGDP, SGDP, HGDP+1000 Genomes and Mexico Biobank can also be found in the "Phased Variants" track. Their VCF versions below show only the isolate frequency per variant.

Please contact us (genome@soe.ucsc.edu), if you know a project that we should add. So far,
-we already requested these: UK Biobank (pending for a year),
-Regeneron's Million Exomes and Mexico City Studies (request rejected), Taiwan Biobank (pending).
+Regeneron's Million Exomes and Mexico City Studies (request rejected) and Taiwan Biobank (pending).

Combined Track (All Databases)

The "All Databases Combined" track merges variants from all individual databases into a single bigBed file with consequence annotations, totaling 1.17 billion variants from ~1.7 million individuals. The track supports filtering by variant type (SNV, insertion, deletion, MNV), predicted consequence (missense, synonymous, stop gained, frameshift, splice, intron, intergenic), source database, allele frequency (overall maximum and per-database), and allele count (total or per-database). This track is either useful in dense mode for getting a quick overview of variant density across all projects, or with filters to find variants present in specific databases or within certain frequency ranges. Note that with the "clone track" feature you can clone this track and have multiple versions, each with different filters activated. You can also use our "Density mode" checkbox on the track configuration page to show a plot with the density of variants passing a filter, one per track clone.
@@ -105,30 +104,39 @@
408k
WGS/WES/array mix
Aggregated dbGaP studies, mixed phenotypes
—
Yes

FinnGen R12
Finland
500k
Imputed (8.5k WGS ref panel)
National biobank, ~10% of population
—
Yes

+
+UK Biobank (Neale Lab v3)
+UK
+361k
+Imputed array (HRC+UK10K+1KGp3 ref panel)
+White British subset of UK Biobank, Neale Lab Round 2 GWAS
+—
+Yes
+
SweGen
Sweden
1k
WGS
Cross-section of Swedish population
—
No

SCHEMA
Multi-national
121k
WES
Schizophrenia: 24k cases, 97k controls