9a49afb16653363a70d8e4d205513008b7b08df5 max Wed May 13 06:18:42 2026 -0700 varFreqs: add UK Biobank subtrack from Neale Lab Round 2 imputed-v3 variant manifest (13.7M variants, 361k white British samples). TSV → VCF conversion + CrossMap hg19→hg38, refs #36642 Co-Authored-By: Claude Opus 4.7 (1M context) diff --git src/hg/makeDb/trackDb/human/ukbb.html src/hg/makeDb/trackDb/human/ukbb.html new file mode 100644 index 00000000000..4a9ba780122 --- /dev/null +++ src/hg/makeDb/trackDb/human/ukbb.html @@ -0,0 +1,118 @@ +

Description

+

+This track shows allele frequencies and imputation quality scores for
+13,743,085 variants observed in 361,194 UK Biobank participants of white
+British ancestry. The
+UK Biobank
+is a prospective study of around 500,000 adults aged 40-69 at recruitment
+in the UK, with linked genotype, imaging and health-record data. The
+allele counts shown here are taken from the Neale Lab's open release of
+imputed-v3 GWAS results, which the Lab made freely available as a
+companion to their large phenotype-wide GWAS of UK Biobank (Round 2 of
+the
+Neale Lab
+UK Biobank GWAS).

+ +

+The Neale Lab pipeline restricts to white British ancestry to limit
+population-stratification confounding in the GWAS. As a consequence, the
+frequencies in this track are not representative of the multi-ancestry UK
+Biobank cohort; they describe a single population subset. The
+gnomAD HGDP+1kG,
+ToMMo Japan,
+AllOfUs and other tracks in this
+collection provide complementary frequencies from other populations.

+ +

Display

+

+The track uses the standard UCSC VCF display. Hover over a variant to see
+the allele frequency, imputation INFO score, HWE p-value, hom-ref / het /
+hom-alt sample counts and the most-severe VEP consequence reported by
+the Neale Lab.

+ +

Methods

+

+UK Biobank participants were genotyped on the UK Biobank Axiom and UK
+BiLEVE Axiom arrays. The Wellcome Trust Centre for Human Genetics imputed
+the array data against a combined reference panel of the Haplotype
+Reference Consortium, UK10K and 1000 Genomes Phase 3, producing
+approximately 90 million imputed SNPs. The Neale Lab Round 2 (imputed-v3)
+analysis started from the 487,409 individuals with phased and imputed
+genotype data and filtered them to 361,194 unrelated samples of white
+British ancestry. It retained variants with an imputation INFO score
+above 0.8, a minor allele frequency above 0.001 (or above 1e-6 for coding
+variants) and an HWE p-value above 1e-10, yielding 13.7 million SNPs and
+short indels on chromosomes 1-22 and X. Variant consequences are from
+Ensembl VEP. See
+the Neale Lab
+data
+processing blog post and the
+UK_Biobank_GWAS
+GitHub repository for the full pipeline.
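The variant-level thresholds described above can be written as a small predicate. This is only a minimal sketch for illustration, not the Neale Lab's actual code: the `Variant` class, its field names and the `is_coding` flag are hypothetical; only the numeric thresholds come from the text.

```python
from dataclasses import dataclass

# Thresholds as stated in the Methods text.
INFO_MIN = 0.8          # imputation INFO score must exceed this
MAF_MIN = 0.001         # minor allele frequency, non-coding variants
MAF_MIN_CODING = 1e-6   # relaxed MAF threshold for coding variants
HWE_P_MIN = 1e-10       # Hardy-Weinberg p-value must exceed this


@dataclass
class Variant:
    """Hypothetical container; field names are assumptions for this sketch."""
    info: float
    maf: float
    p_hwe: float
    is_coding: bool


def passes_filters(v: Variant) -> bool:
    """Return True if the variant clears all three thresholds."""
    maf_min = MAF_MIN_CODING if v.is_coding else MAF_MIN
    return v.info > INFO_MIN and v.maf > maf_min and v.p_hwe > HWE_P_MIN
```

For example, a rare variant with MAF 0.0005 would be kept only if it is annotated as coding.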

+ +

+The variant manifest
+variants.tsv.bgz was downloaded from the Neale Lab
+UK Biobank
+GWAS results page. The Neale Lab release uses GRCh37 coordinates and
+provides chromosome, position, reference and alternate alleles, dbSNP
+rsID, VEP consequence, imputation INFO score, allele count and
+frequency, Hardy-Weinberg p-value and per-genotype sample counts. We
+converted the TSV to a sites-only VCF using a custom Python script and
+lifted the coordinates to GRCh38 with CrossMap and the UCSC
+hg19ToHg38.over.chain file. We dropped 39,659 rows with allele count
+zero (variants present only in the imputation panel). A further 6,889
+variants failed the lift-over and 1,834 mapped to alt/random/fix contigs,
+leaving 13,743,085 variants in the final file. The VCF AN value was set
+to twice the
+n_called field, following the Neale Lab convention.
+The full pipeline is documented in the
+makeDoc
+file of the track, and the conversion script is available from
+our
+GitHub repository.
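The conversion step and the variant accounting can be sketched as follows. This is not the conversion script referenced above. Apart from `n_called`, which the text names, the column names (`chr`, `pos`, `ref`, `alt`, `rsid`, `AC`, `AF`) are assumptions, and the example row is made up. The lift-over to GRCh38 is a separate step and is not shown.

```python
from typing import Optional


def manifest_row_to_vcf_line(row: dict) -> Optional[str]:
    """Turn one manifest row into a sites-only VCF line (GRCh37 coordinates).

    Returns None for rows with allele count zero, which are dropped.
    """
    if float(row["AC"]) == 0:
        return None
    # AN is twice the number of called samples (diploid allele number).
    an = 2 * int(row["n_called"])
    info = f"AC={row['AC']};AN={an};AF={row['AF']}"
    fields = [row["chr"], row["pos"], row["rsid"] or ".",
              row["ref"], row["alt"], ".", ".", info]
    return "\t".join(fields)


# Made-up example row, for illustration only.
example = {"chr": "21", "pos": "1000", "rsid": "rs123", "ref": "G",
           "alt": "A", "AC": "120", "AF": "0.000166", "n_called": "361194"}
vcf_line = manifest_row_to_vcf_line(example)

# Accounting check using only the counts given in the text.
FINAL_VARIANTS = 13_743_085
DROPPED_AC_ZERO = 39_659
FAILED_LIFTOVER = 6_889
NON_PRIMARY_CONTIGS = 1_834
implied_input_rows = (FINAL_VARIANTS + DROPPED_AC_ZERO
                      + FAILED_LIFTOVER + NON_PRIMARY_CONTIGS)
```

Summing the final count and the three excluded groups gives an implied input of 13,791,467 manifest rows.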

+ +

Data Access

+

+The variant frequencies can be explored interactively using the
+Table Browser or the
+Data Integrator, and exported to
+spreadsheet or tab-separated tables. From scripts, data can be accessed
+via our REST API
+with track=ukbb.
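A minimal sketch of scripted access through the REST API, assuming the public `api.genome.ucsc.edu` endpoint `/getData/track` and its `genome`, `track`, `chrom`, `start` and `end` parameters. The helper names and the example region are illustrative, and the shape of the returned JSON is not assumed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the UCSC REST API track endpoint.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(chrom: str, start: int, end: int,
                    genome: str = "hg38", track: str = "ukbb") -> str:
    """Build the request URL for one region of the track."""
    query = urlencode({"genome": genome, "track": track,
                       "chrom": chrom, "start": start, "end": end})
    return f"{API_BASE}?{query}"


def fetch_track(chrom: str, start: int, end: int) -> dict:
    """Fetch one region and return the decoded JSON (needs network access)."""
    with urlopen(build_track_url(chrom, start, end), timeout=30) as resp:
        return json.load(resp)
```

Calling `fetch_track("chr21", 10000000, 10100000)` would request that region of chr21. Keeping regions small avoids very large responses.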

+

+The VCF file is also available from
+our
+download server as ukbb.vcf.gz. Individual regions can be
+extracted with tabix, for example
+tabix http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/ukbb/ukbb.vcf.gz chr21:1-100000000.
+The original Neale Lab manifest variants.tsv.bgz is linked from
+the
+Neale Lab UK
+Biobank GWAS results page and is distributed under UK Biobank's
+data-access conditions.

+ +

Credits

+

+Thanks to the UK Biobank participants and to Benjamin Neale, Liam
+Abbott, Raymond Walters, Duncan Palmer and the rest of the Neale Lab for
+making the Round 2 imputed-v3 GWAS results, including the variant
+manifest used here, publicly available.

+ +

References

+

+Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O,
+O'Connell J et al.
+
+The UK Biobank resource with deep phenotyping and genomic data.
+Nature. 2018 Oct;562(7726):203-209.
+PMID: 30305743; PMC: PMC6786975

+