9bfd58221b1539193cb7f0a317b4e959c1c7e49a
max
Thu May 21 01:00:45 2026 -0700
varFreqs: AI generated text sounds bad, hard to read, so remove typical AI language. "humanizer" pass on all 31 varFreqs description pages — cut em dashes, copula avoidance ("serves as", "stands as"), "-ing" puffery, and boilerplate filler ("We provide documentation that indicates how..."). Title-case headings and meaningful emphasis preserved. No facts/URLs/counts/versions changed. tpmi.html added as a new file (was previously uncommitted). refs #36642
Co-Authored-By: Claude Sonnet 4.6
 The Neale Lab pipeline restricts to white British ancestry to limit
 population-stratification confounding in the GWAS. As a consequence the
 frequencies in this track are not representative of the multi-ancestry UK
-Biobank cohort - they describe a single population subset. The
+Biobank cohort. They describe a single population subset. The
 gnomAD HGDP+1kG, ToMMo Japan, AllOfUs and other tracks in this collection
 provide complementary frequencies from other populations.
The track uses the standard UCSC VCF display. Hover over a variant to see the allele frequency, imputation INFO score, HWE p-value, hom-ref / het / hom-alt sample counts and the most-severe VEP consequence reported by the Neale Lab.
 UK Biobank participants were genotyped on the UK Biobank Axiom and UK BiLEVE
 Axiom arrays. The Wellcome Trust Centre for Human Genetics imputed the array
 data against a combined reference panel of the Haplotype
-Reference Consortium, UK10K and 1000 Genomes Phase 3, producing
+Reference Consortium, UK10K and 1000 Genomes Phase 3. This produced
 approximately 90 million imputed SNPs. The Neale Lab Round 2 (imputed-v3)
 analysis started from the 487,409 individuals with phased and imputed
 genotype data, filtered to 361,194 unrelated samples of white British
 ancestry, and retained variants with imputation INFO score above 0.8, minor
 allele frequency above 0.001 (or 1e-6 for coding variants) and
-HWE p-value above 1e-10, yielding 13.7 million SNPs and short indels on
-chromosomes 1-22 and X. Variant consequences are from Ensembl VEP. See
+HWE p-value above 1e-10. The final set has 13.7 million SNPs and short
+indels on chromosomes 1-22 and X. Variant consequences are from Ensembl VEP. See
 the Neale Lab data processing blog post and the UK_Biobank_GWAS GitHub
 repository for the full pipeline.
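The variant filter described in this hunk can be sketched as a small predicate. This is only an illustration of the stated thresholds, not the Neale Lab code: the `Variant` class, its field names and the `is_coding` flag are hypothetical, and how the real pipeline decides that a variant is coding is not stated here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record; field names are illustrative only.
    info: float       # imputation INFO score
    maf: float        # minor allele frequency
    p_hwe: float      # Hardy-Weinberg equilibrium p-value
    is_coding: bool   # assumed flag for the relaxed coding-variant MAF floor


def passes_qc(v: Variant) -> bool:
    """Apply the thresholds quoted in the text (all strict 'above' comparisons)."""
    maf_floor = 1e-6 if v.is_coding else 0.001
    return v.info > 0.8 and v.maf > maf_floor and v.p_hwe > 1e-10
```

Under these assumptions a rare variant with MAF 0.0005 would be kept only if it is flagged as coding.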
 The variant manifest variants.tsv.bgz was downloaded from the Neale Lab UK
 Biobank GWAS results page. The Neale Lab release uses GRCh37 coordinates and
 provides chromosome, position, reference and alternate alleles, dbSNP rsID,
 VEP consequence, imputation INFO score, allele count and frequency,
 Hardy-Weinberg p-value and per-genotype sample counts. We converted the TSV
 to a sites-only VCF using a custom Python script and lifted the coordinates
 to GRCh38 with CrossMap and the UCSC hg19ToHg38.over.chain. 39,659 rows with
 allele count zero (variants present only in the imputation panel) were
 dropped, 6,889 failed liftOver and 1,834 mapped to alt/random/fix contigs,
 leaving 13,743,085 variants in the final file. AN was set to twice the
-n_called field, following the Neale Lab convention.
+n_called field, per the Neale Lab convention.
 The full pipeline is documented in the makeDoc file of the track, and the
 conversion script is available from our GitHub repository.
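As a rough illustration of the conversion step in this hunk, the sketch below turns one manifest row into a sites-only VCF line, drops rows with allele count zero and sets AN to twice `n_called`. It is not the actual conversion script: the function name is hypothetical, the column names (`chr`, `pos`, `ref`, `alt`, `rsid`, `AC`, `AF`, `n_called`) are assumptions about the manifest header, and the real script also writes further INFO fields and is followed by the CrossMap liftover.

```python
from typing import Optional


def row_to_vcf_line(row: dict) -> Optional[str]:
    """Convert one manifest row (column name -> string) to a sites-only VCF line.

    Returns None for rows with allele count zero, which the text says were dropped.
    Column names are assumed, not taken from the real manifest.
    """
    if float(row["AC"]) == 0:
        return None
    an = 2 * int(row["n_called"])  # AN = twice the number of called samples
    info = f"AC={row['AC']};AN={an};AF={row['AF']}"
    fields = [
        row["chr"],
        row["pos"],
        row["rsid"] or ".",   # missing ID becomes "."
        row["ref"],
        row["alt"],
        ".",                  # QUAL
        ".",                  # FILTER
        info,
    ]
    return "\t".join(fields)
```

For example, a row with `n_called` of 10 and `AC` of 10 would get `AN=20` in its INFO column.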
-The variant frequencies can be explored interactively using the
+The variant frequencies can be explored with the
 Table Browser or the Data Integrator, and exported to spreadsheet or
 tab-separated tables. From scripts, data can be accessed via our REST API
 with track=ukbb.
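For the scripted access mentioned here, a minimal sketch of building a request URL for the UCSC REST API `getData/track` endpoint follows. Only `track=ukbb` comes from the text. The `hg38` default is inferred from the download path, the helper name is hypothetical, and the shape of the response for this track is not shown because it is not described here.

```python
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(chrom: str, start: int, end: int,
              genome: str = "hg38", track: str = "ukbb") -> str:
    """Build a getData/track URL; parameters are joined with semicolons."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,   # 0-based start, as in other UCSC coordinates
        "end": end,
    }
    query = ";".join(f"{key}={value}" for key, value in params.items())
    return f"{API_BASE}?{query}"
```

The resulting URL can be fetched with `urllib.request.urlopen` and parsed with `json.load`. The network call is left out of the sketch so that it stays self-contained.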
The VCF file is also available from our download server as ukbb.vcf.gz. Individual regions can be extracted with tabix, for example tabix http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/ukbb/ukbb.vcf.gz chr21:1-100000000. The original Neale Lab manifest variants.tsv.bgz is linked from the Neale Lab UK