c3c037cd89b6095049807b367371fc559a07ee95 max Mon Nov 3 17:43:57 2025 -0800 docs update for var freqs track diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html index b68cab22aa6..dcd29543118 100644 --- src/hg/makeDb/trackDb/human/varFreqs.html +++ src/hg/makeDb/trackDb/human/varFreqs.html @@ -1,71 +1,119 @@ <h2>Description</h2> <p> This container track contains annotation tracks with variant frequencies, aka allele frequencies, from these projects: </p> <ul> <li> - <b>Mexico Biobank (MXB)</b>: This track displays alleles and their haplotype linkage from the Mexico Biobank - (MXB), based on genotyping of 6,011 individuals sampled across all 32 states of Mexico during + <b><a href="https://www.mxbiobank.org/" target=_blank>Mexico Biobank (MXB)</a></b>: This track displays + alleles and their haplotypes from the Mexico Biobank + (MXB), based on array genotyping of 6,011 individuals sampled across all 32 states of Mexico during the 2000 National Health Survey (ENSA 2000) conducted by the National Institute of Public Health (INSP). + Frequencies can be plotted onto a map on <a href="https://morenolab.shinyapps.io/mexvar/" target=_blank>MexVar</a>. + (Publication?) </li> <li> - <b>Mexico City Prospective Study (MCPS)</b>: This track displays only the allele frequencies from the Mexico City - Prospective Study (MXB). From 9,950 whole genome sequenced individuals and 141,046 exome sequenced and genotyped individuals from the Mexico City Prospective Study (MCPS). For details see Ziyatdinov A, Nature 2023 in the reference section of this page. 
+ <b><a href="https://rgc-mcps.regeneron.com/home" target=_blank>Mexico City Prospective Study (MCPS)</a></b>: + 9,950 whole-genome sequenced individuals + and 141,046 exome-sequenced and genotyped individuals from the Mexico + City Prospective Study (MCPS), a collaboration between the Regeneron Genetics + Center, University of Oxford, Universidad Nacional Autónoma de México (UNAM), + National Institute of Genomic Medicine in Mexico, AbbVie Inc. and AstraZeneca + UK. For details, see Ziyatdinov A, Nature 2023, in the References section. </li> <li> - <b>Million Exomes Project (ME)</b>: Variant frequencies from whole-exome sequencing data from 983,578 individuals sequenced by the Regeneron Genetics Center (RGC). These data span dozens of collaborations including large biobanks and health systems. All data were generated by the RGC on a single, harmonized sequencing and informatics protocol. The dataset includes individuals across diverse ancestral populations, encompassing outbred and founder populations and cohorts with high rates of consanguinity. - </li> + <b><a href="https://rgc-research.regeneron.com/me/home" + target=_blank>Regeneron Million Exomes Project (ME)</a></b>: + Whole exomes of + 983,578 individuals sequenced by the Regeneron Genetics Center (RGC). + These data span dozens of collaborations, including large biobanks and + health systems. All data were generated by the RGC on a single, harmonized +sequencing and informatics protocol. The dataset includes individuals across +diverse ancestral populations, encompassing outbred and founder populations and +cohorts with high rates of consanguinity. See Sun et al., Nature 2024, for details. </li> <li> - <b>NHLBI TOPMED Freeze 8</b>: NHLBI TOPMed (Trans-Omics for Precision Medicine) program, launched by the U.S. National Heart, Lung, and Blood Institute, integrates whole-genome sequencing with molecular, clinical, and environmental data from large, well-phenotyped cohorts.
Its goal is to uncover the biological mechanisms underlying heart, lung, blood, and sleep disorders to advance precision medicine and improve population health. Freeze 10 contains 868,581,653 variants from 150,899 whole genomes - </li> + <b><a href="https://topmed.nhlbi.nih.gov/" target=_blank>NHLBI TOPMed + Freeze 10</a></b>: The NHLBI TOPMed (Trans-Omics for Precision + Medicine) program, launched by the U.S. National Heart, Lung, and Blood + Institute, integrates whole-genome sequencing with molecular, clinical, + and environmental data from large, well-phenotyped cohorts. Its goal is to + uncover the biological mechanisms underlying heart, lung, blood, and sleep +disorders to advance precision medicine and improve population health. Freeze +10 contains 868,581,653 variants from 150,899 whole genomes. VCFs were +downloaded from <a href="https://bravo.sph.umich.edu/terms.html" + target=_blank>BRAVO</a>. </li> + <li> + <b>GenomeAsia Pilot (GAsP)</b>: Whole-genome sequencing data of 1,739 + individuals from 219 population groups across Asia. See GenomeAsia100K + Consortium, Nature 2019, for details. </li> </ul> <h2>Display Conventions</h2> +<p>Most tracks show variant and allele frequencies only on mouseover or click. +When zoomed in, tracks display alleles with base-specific coloring. Homozygous +genotypes are shown as one letter, while heterozygous genotypes are shown with both +letters. +</p> + <p> -In "pack" mode, the MXB track sorts the haplotypes. This can be useful for determining the -similarity between the samples and inferring inheritance at a particular locus. +For the MXB track: in "pack" mode, this track sorts the haplotypes. This can be +useful for determining the similarity between the samples and inferring +inheritance at a particular locus. For a full description of how the display works, please see our <a href="../goldenpath/help/hgVcfTrackHelp.html">Haplotype Display help page</a>.
Briefly, each sample's phased and/or homozygous genotypes are split into haplotypes, clustered by similarity around a central variant (in pink), and sorted for display by their position in the clustering tree. Click a variant to center on it. The tree (as space allows) is drawn in the label area next to the track image. Leaf clusters, in which all haplotypes are identical (at least for the variants used in clustering), are colored purple. </p> -<p> -When zoomed it, it display alleles with base-specific coloring. Homozygote -data are shown as one letter, while heterozygotes will be displayed with both -letters. -</p> <h2>Data Access</h2> +<p>Most of the data in these tracks are not available for download from UCSC. +On our website, variants can only be browsed individually. +However, the data can be downloaded +for free from the original projects. Accessing the +data usually requires a click-through license on the respective websites: +</p> + <p> MXB: Allele frequencies by geographical state and ancestry are available via the <a target=_blank href="https://morenolab.shinyapps.io/mexvar/">MexVar platform</a>. Raw genotype data are available under controlled access at the -EGA (Study: EGAS00001005797; Dataset: EGAD00010002361). +EGA (Study: EGAS00001005797; Dataset: EGAD00010002361). To obtain the VCFs, email andres.moreno@cinvestav.mx. </p> <p> -MCPS: Summarized allele frequencies are available from +MCPS: VCFs with summarized allele frequencies are available from the <a target=_blank href="https://rgc-mcps.regeneron.com/">MCPS website</a>. </p> +<p> +Regeneron Million Exomes: VCFs with summarized allele frequencies are available from +the <a target=_blank href="https://rgc-research.regeneron.com/me/resources">RGC ME website</a>. +</p> +<p> +TOPMed: VCFs with summarized allele frequencies are available from +the <a target=_blank href="https://bravo.sph.umich.edu/">TOPMed BRAVO website</a>. A login is required.
+</p> +<p> +GenomeAsia Pilot: VCFs are available from UCSC and also from +the <a target=_blank href="https://browser.genomeasia100k.org/#tid=download">GenomeAsia 100K website</a>. Neither a license nor a login is required. +</p> <h2>Methods</h2> <p> MXB: Genotyping was performed with the Illumina Multi-Ethnic Global Array (MEGA, ~1.8M SNPs), optimized for admixed populations and enriched for ancestry-informative and medically relevant variants. Only autosomal, biallelic SNPs passing quality control are included. Samples were selected from 898 recruitment sites, with prioritization of indigenous language speakers. Data processing included GenomeStudio → PLINK conversion, strand alignment, removal of duplicates, update of map positions using dbSNP Build 151 and low-quality variants/individuals, and relatedness filtering. </p> <h2>Credits</h2> <p> @@ -77,30 +125,39 @@ the ENSA-Genomics Consortium for their contributions to sample collection and data processing that made possible the construction of the MXB genomic resource. </p> <p> MCPS: Data produced by Regeneron RGC and collaborators, which are the University of Oxford, Universidad Nacional Autónoma de México (UNAM) and National Institute of Genomic Medicine in Mexico. The Regeneron Genetics Center, University of Oxford, Universidad Nacional Autónoma de México (UNAM), National Institute of Genomic Medicine in Mexico, Abbvie Inc. and AstraZeneca UK Limited (collectively, the “Collaborators”) bear no responsibility for the analyses or interpretations of the data presented here. Any opinions, insights, or conclusions presented herein are those of the authors and not of the Collaborators. </p> </p> +<p> +Regeneron Million Exomes: The Regeneron Genetics Center and its collaborators +(collectively, the “Collaborators”) bear no responsibility for the analyses or +interpretations of the data presented here. Any opinions, insights, or +conclusions presented herein are those of the authors and not of the +Collaborators.
This research has been conducted using the UK Biobank Resource +under application number 26041. +</p> + <h2>References</h2> <p> Barberena-Jonas, C. et al. (2025). MexVar database: Clinical genetic variation beyond the Hispanic label in the Mexican Biobank. <em>Nature Medicine (in press)</em>. </p> <p> Sohail M, Moreno-Estrada A. <a href="https://journals.biologists.com/dmm/article-lookup/doi/10.1242/dmm.050522" target="_blank"> The Mexican Biobank Project promotes genetic discovery, inclusive science and local capacity building</a>. <em>Dis Model Mech</em>. 2024 Jan 1;17(1). PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/38299665" target="_blank">38299665</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10855211/" target="_blank">PMC10855211</a> @@ -114,15 +171,34 @@ <em>Nature</em>. 2023 Oct;622(7984):775-783. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37821706" target="_blank">37821706</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10600006/" target="_blank">PMC10600006</a> </p> <p> Ziyatdinov A, Torres J, Alegre-Díaz J, Backman J, Mbatchou J, Turner M, Gaynor SM, Joseph T, Zou Y, Liu D <em>et al</em>. <a href="https://doi.org/10.1038/s41586-023-06595-3" target="_blank"> Genotyping, sequencing and analysis of 140,000 adults from Mexico City</a>. <em>Nature</em>. 2023 Oct;622(7984):784-793. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37821707" target="_blank">37821707</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10600010/" target="_blank">PMC10600010</a> </p> +<p> +GenomeAsia100K Consortium. +<a href="https://doi.org/10.1038/s41586-019-1793-z" target="_blank"> +The GenomeAsia 100K Project enables genetic discoveries across Asia</a>. +<em>Nature</em>. 2019 Dec;576(7785):106-111. 
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/31802016" target="_blank">31802016</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7054211/" target="_blank">PMC7054211</a> +</p> + +<p> +Sun KY, Bai X, Chen S, Bao S, Zhang C, Kapoor M, Backman J, Joseph T, Maxwell E, Mitra G <em>et +al</em>. +<a href="https://doi.org/10.1038/s41586-024-07556-0" target="_blank"> +A deep catalogue of protein-coding variation in 983,578 individuals</a>. +<em>Nature</em>. 2024 Jul;631(8021):583-592. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/38768635" target="_blank">38768635</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11254753/" target="_blank">PMC11254753</a> +</p> +
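The Display Conventions text in this diff says that homozygous genotypes are drawn as one letter, heterozygous genotypes as both letters, and that the tracks report allele frequencies. The Python sketch below illustrates that letter convention and the basic allele-frequency arithmetic for a single VCF site. It is not UCSC's implementation, and the function names are hypothetical. It assumes diploid VCF `GT` strings such as `0|1` or `1/1` with no missing (`.`) alleles, and it assumes heterozygous letters follow the order of the alleles in `GT`. It does not reproduce the haplotype clustering used in "pack" mode.

```python
from collections import Counter


def split_haplotypes(gt: str) -> list[int]:
    """Split a diploid VCF GT field ('0|1' phased, '1/1' unphased) into allele indices.

    Hypothetical helper; missing alleles ('.') are not handled in this sketch.
    """
    sep = "|" if "|" in gt else "/"
    return [int(a) for a in gt.split(sep)]


def display_alleles(gt: str, ref: str, alts: list[str]) -> str:
    """Return one letter for a homozygous genotype, both letters for a heterozygous one."""
    alleles = [ref, *alts]  # index 0 is REF, 1..n are ALT alleles, as in VCF
    bases = [alleles[i] for i in split_haplotypes(gt)]
    if len(set(bases)) == 1:
        return bases[0]  # homozygous: a single letter
    return "".join(bases)  # heterozygous: both letters, in GT order (assumption)


def allele_frequencies(gts: list[str], ref: str, alts: list[str]) -> dict[str, float]:
    """Fraction of all called haplotypes at this site that carry each allele."""
    alleles = [ref, *alts]
    counts = Counter(alleles[i] for gt in gts for i in split_haplotypes(gt))
    total = sum(counts.values())
    return {a: counts[a] / total for a in alleles}


if __name__ == "__main__":
    genotypes = ["0|0", "0|1", "0/0", "1|1"]  # made-up genotypes for four samples
    for gt in genotypes:
        print(gt, display_alleles(gt, "A", ["G"]))
    print(allele_frequencies(genotypes, "A", ["G"]))
```

With the four made-up genotypes, five of the eight haplotypes carry `A` and three carry `G`, so the sketch reports frequencies of 0.625 and 0.375.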