d7817fcecf25ab8669176afc941cadd468729f4a max Tue Nov 25 08:57:14 2025 -0800
adding Singapore to variant frequencies track

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index c3678d7ace9..053f7ef112c 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -1,95 +1,31 @@

Description

This container shows results from projects where the variant frequencies, aka allele frequencies, are publicly available. The tracks were collected from the projects listed below. Projects that provide haplotype-phased genotypes/variants can be found elsewhere: 1000 Genomes is a separate track, and the projects HGDP, SGDP,
-HGDP+1k and MXB can be found in the "Phased Variants" track.
+HGDP+1000 Genomes and Mexico Biobank can be found in the "Phased Variants" track.

If you want us to add other projects, please contact us. We asked and were
-unable to obtain variant frequencies from the following projects: NPM Singapore
-(10k, Chinese/Indian/Malay, free, but requires login at NPM.
-UK Biobank (request pending).
+unable to obtain variant frequencies from the following projects: UK Biobank (request pending), All of Us (granted),
+SFARI SPARK (in process).

-Available on hg19 and hg38:

+The following projects were added: -

-Available only on hg38:

- +

Display Conventions

Most tracks only show the variant and allele frequencies on mouseover or clicks. When zoomed in, tracks display alleles with base-specific coloring. Homozygote data are shown as one letter, while heterozygotes will be displayed with both letters.
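The letter convention above can be illustrated with a minimal Python sketch. This is not the Genome Browser's drawing code; the function name `genotype_label` is hypothetical, and the sketch assumes VCF-style genotype strings with no missing calls.

```python
def genotype_label(alleles, genotype):
    """Return the letters to display for one VCF-style genotype.

    alleles:  list of allele strings, REF first (e.g. ["A", "G"])
    genotype: genotype string such as "0/1" or "1|1"
    Assumption: no missing calls ("."), which a real parser must handle.
    """
    # Treat phased ("|") and unphased ("/") separators the same way.
    indexes = genotype.replace("|", "/").split("/")
    letters = [alleles[int(i)] for i in indexes]
    if len(set(letters)) == 1:
        # Homozygote: a single letter.
        return letters[0]
    # Heterozygote: both letters, in genotype order (an assumption).
    return "".join(letters)


if __name__ == "__main__":
    print(genotype_label(["A", "G"], "0/1"))
    print(genotype_label(["A", "G"], "1|1"))
```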

-

-Full haplotype display - only for the MXB and HGDP tracks:
-In "pack" mode, this track sorts the haplotypes. This can be
-useful for determining the similarity between the samples and inferring
-inheritance at a particular locus.
-Each sample's phased and/or homozygous genotypes are split into haplotypes,
-clustered by similarity around a central variant (in pink), and sorted for
-display by their position in the clustering tree. Click a variant to center on it.
-The tree (as space allows) is drawn in the label area next to the track image.
-Leaf clusters, in which all haplotypes are identical (at least for the variants
-used in clustering), are colored purple.
-

-

-For a full description of how the display works, please see our
-Haplotype Display help page.
-

For NCBI ALFA: This track has no single VCF with INFO fields, but uses multiple subtracks instead, one per ancestry.

Data Access

Most of the data in these tracks are not available for download from UCSC.
-Only individual variants can be browsed on our website.
+Data can be browsed on our website.
But the data can be downloaded for free from the original projects. Accessing the
-data usually requires a click-through license on the respectice websites:
+data usually requires a click-through license on the respective websites; links are either
+provided above in the project description or, with more details, here:

MXB: Allele frequencies by geographical state and ancestry are available via the MexVar platform. Raw genotype data are available under controlled access at the EGA (Study: EGAS00001005797; Dataset: EGAD00010002361). For the VCFs, email andres.moreno@cinvestav.mx.

MCPS: VCFs with summarized allele frequencies are available from the MCPS website.

Regeneron one million exomes: VCFs with summarized allele frequencies are available from
@@ -257,30 +187,47 @@
processing included GenomeStudio → PLINK conversion, strand alignment, removal of duplicates, update of map positions using dbSNP Build 151 and low-quality variants/individuals, and relatedness filtering.

SGDP: The version used was https://sharehost.hms.harvard.edu/genetics/reich_lab/sgdp/vcf_variants/, merged with bcftools and lifted to hg38 with CrossMap.

KOVA: V7 of the TSV.gz was obtained from the KOVA staff and converted to VCF. It is not available for download from our site but can be requested from the KOVA website.

+

Finngen: R12 was downloaded from https://finngen.gitbook.io/documentation/data-download and converted to VCF with a Python script.
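The actual conversion script is not part of this diff. As a rough sketch of what such a TSV-to-VCF conversion might look like, the following assumes hypothetical column names (`chrom`, `pos`, `ref`, `alt`, `af`); the real FinnGen and KOVA files may use different columns and carry more fields.

```python
import csv
import io

# Minimal VCF header with a single AF INFO field.
VCF_HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def tsv_to_vcf(tsv_text):
    """Convert tab-separated variant rows to VCF text.

    The column names chrom/pos/ref/alt/af are assumptions made for
    this sketch, not the layout of any specific project file.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = list(VCF_HEADER)
    for row in reader:
        info = "AF=" + row["af"]
        # ID, QUAL and FILTER are unknown here, so they are set to ".".
        fields = [row["chrom"], row["pos"], ".", row["ref"], row["alt"], ".", ".", info]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = "chrom\tpos\tref\talt\taf\nchr1\t12345\tA\tG\t0.25\n"
    print(tsv_to_vcf(example), end="")
```

A production script would also need to sort the records, add contig header lines, and compress and index the output before it can be loaded as a track.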

+ +

NPM Singapore: Whole Genome Sequencing (WGS) data processing followed
+GATK4 best practices. The GATK4 germline variant analysis workflow, written in WDL,
+was adapted to use Nextflow and deployed at the National Supercomputing Centre,
+Singapore (NSCC). In short, WGS reads were aligned against GRCh38 using the
+BWA-MEM algorithm and used as input to GATK HaplotypeCaller to produce
+single-sample gVCFs. The gVCF files were joint-called, then loaded into Hail, an
+open-source Python-based data analysis library suited to working with
+population-scale genomic data collections. Low-quality WGS libraries and
+low-quality variants were removed. QC-ed variants were functionally annotated
+using Ensembl Variant Effect Predictor (VEP) (version 95). Functional
+annotations for variants impacting protein-coding regions were also complemented with
+information on the potential alteration to their cognate protein's 3D structure
+and drug binding ability.
+

+

Credits

MXB: We thank the Center for Research and Advanced Studies (Cinvestav) of Mexico for generating and providing the frequency data, the National Institute of Medical Sciences and Nutrition (INCMNSZ) for DNA extraction, and the Ministry of Health together with the National Institute of Public Health (INSP) for the design and implementation of the National Health Survey 2000 (ENSA 2000). We also thank the ENSA-Genomics Consortium for their contributions to sample collection and data processing that made possible the construction of the MXB genomic resource.

MCPS: Data produced by Regeneron RGC and collaborators, which are the University of Oxford, Universidad Nacional Autónoma de México (UNAM) and National Institute of Genomic Medicine in Mexico.
@@ -295,30 +242,50 @@
Regeneron Million Exomes: The Regeneron Genetics Center, and its collaborators (collectively, the "Collaborators") bear no responsibility for the analyses or interpretations of the data presented here. Any opinions, insights, or conclusions presented herein are those of the authors and not of the Collaborators. This research has been conducted using the UK Biobank Resource under application number 26041.

SGDP: This project was funded by the Simons Foundation. Thanks to David Reich and Swapan Mallick for help with importing the data.

KOVA: Thanks to Insu Jang and the KOVA director for providing variant frequencies in TSV format.

+

+Finngen: We want to acknowledge the participants and investigators of the FinnGen study.
+

+ +

+NPM Singapore: Thanks to the NPM Data Access Committee and Eleanor for granting our data request.
+By browsing the data, you agree to use the data only for academic, non-commercial
+research to improve human health (biology/disease). We request that all data users
+agree to protect the
+confidentiality of the data subjects in any research papers or publications
+that they may prepare, by taking all reasonable care to limit the possibility
+of identification. In particular, the data users shall not use, or attempt
+to use, the data to deliberately compromise or otherwise infringe the
+confidentiality of information on data subjects and their right to privacy.
+If you use any of the data obtained from the CHORUS variant browser, we request
+that you cite the NPM flagship paper (Wong et al., 2023). All users of the
+data must take note that the data provider and relevant SG10K_Health cohort
+owners bear no responsibility for the further analysis or interpretation of the
+data.

+

Thanks to Alex Ioannidis, UCSC, and Andreas Lahner, MGZ, for feedback on this track.

References

Barberena-Jonas, C. et al. (2025). MexVar database: Clinical genetic variation beyond the Hispanic label in the Mexican Biobank. Nature Medicine (in press).

Sohail M, Moreno-Estrada A. The Mexican Biobank Project promotes genetic discovery, inclusive science and local capacity building. Dis Model Mech. 2024 Jan 1;17(1). PMID: 38299665; PMC: Nature. 2016 Oct 13;538(7624):201-206. PMID: 27654912; PMC: PMC5161557

Lee J, Lee J, Jeon S, Lee J, Jang I, Yang JO, Park S, Lee B, Choi J, Choi BO et al. A database of 5305 healthy Korean individuals reveals genetic and clinical implications for an East Asian population. Exp Mol Med. 2022 Nov;54(11):1862-1871. PMID: 36323850; PMC: PMC9628380

+

+Kurki MI, Karjalainen J, Palta P, Sipilä TP, Kristiansson K, Donner KM, Reeve MP, Laivuori H,
+Aavikko M, Kaunisto MA et al.
+
+FinnGen provides genetic insights from a well-phenotyped isolated population.
+Nature. 2023 Jan;613(7944):508-518.
+PMID: 36653562; PMC: PMC9849126
+

+ +

+Wong E, Bertin N, Hebrard M, Tirado-Magallanes R, Bellis C, Lim WK, Chua CY, Tong PML, Chua R, Mak K
+et al.
+
+The Singapore National Precision Medicine Strategy.
+Nat Genet. 2023 Feb;55(2):178-186.
+PMID: 36658435
+

+