98407c7ac7f8fda3aec01535e17b92e3a5de8177
angie
Fri Nov 1 14:53:38 2019 -0700
Improving upon dbSNP's uneven descriptions of frequency-submitting projects. refs #23283

diff --git src/hg/makeDb/trackDb/human/dbSnp153Composite.html src/hg/makeDb/trackDb/human/dbSnp153Composite.html
index 1666146..7e84654 100644
--- src/hg/makeDb/trackDb/human/dbSnp153Composite.html
+++ src/hg/makeDb/trackDb/human/dbSnp153Composite.html
@@ -80,88 +80,92 @@
 Variants are colored according to functional effect on genes annotated by dbSNP.
 Protein-altering variants and splice site variants are red,
 synonymous codon variants are green, and
 non-coding transcript or Untranslated Region (UTR) variants are blue.
 </p>
 <p>
 On the track controls page, several variant properties can be included or excluded from
 the item labels: rs# identifier assigned by dbSNP, reference/alternate alleles, major/minor alleles
 (when available) and minor allele frequency (when available).
-Allele frequencies are reported independently by twelve projects, as described by dbSNP:
+Allele frequencies are reported independently by twelve projects:
 <ul>
   <li><a href="https://www.internationalgenome.org/" target=_blank>1000Genomes</a>:
-    The 1000 Genomes dataset contains data for 2,504 individuals from 26 populations.
+    The 1000 Genomes Phase 3 dataset contains data for 2,504 individuals from 26 populations.
   </li>
-  <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD_exomes</a>:
-    The GnomAD exome data set (release v2.1).
+  <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD exomes</a>:
+    The gnomAD
+    <a href="https://macarthurlab.org/2018/10/17/gnomad-v2-1/" target=_blank>v2.1</a>
+    exome dataset comprises a total of 16 million SNVs and 1.2 million indels from 125,748 exomes
+    in 14 populations.
   </li>
   <li><a href="https://www.nhlbiwgs.org/" target=_blank>TOPMED</a>:
     The TOPMED dataset contains phase 3 data from freeze 5 panel that include over 60,000 individuals.
     The approximate ethnic breakdown is European(52%), African (31%), Hispanic or Latino (10%),
     and East Asian (7%) ancestry.
   </li>
   <li><a href="http://exac.broadinstitute.org/" target=_blank>ExAC</a>:
     The Exome Aggregation Consortium (ExAC) dataset contains 60,706 unrelated individuals
     sequenced as part of various disease-specific and population genetic studies.
     Individuals affected by severe pediatric disease have been removed.
   </li>
-  <li><a href="https://www.pagestudy.org/" target=_blank>PAGE_STUDY</a>:
+  <li><a href="https://www.pagestudy.org/" target=_blank>PAGE STUDY</a>:
     The PAGE Study: How Genetic Diversity Improves Our Understanding of the Architecture of
     Complex Traits.
   </li>
-  <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD</a>:
-    gnomAD v2.1 comprises a total of 16mln SNVs and 1.2mln indels from 125,748 exomes,
-    and 229mln SNVs and 33mln indels from 15,708 genomes. In addition to the 7 populations
-    already present in gnomAD 2.0.2, this release now breaks down the non-Finnish Europeans
-    and East Asian populations further into sub-populations.
+  <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD genomes</a>:
+    The gnomAD
+    <a href="https://macarthurlab.org/2018/10/17/gnomad-v2-1/" target=_blank>v2.1</a>
+    genome dataset includes 229 million SNVs and 33 million indels from 15,708 genomes
+    in 9 populations.
   </li>
   <li><a href="https://esp.gs.washington.edu/" target=_blank>GoESP</a>:
     The NHLBI Grand Opportunity Exome Sequencing Project (GO-ESP) dataset contains 6503 samples
     drawn from multiple ESP cohorts and represents all of the ESP exome variant data.
   </li>
   <li><a href="https://www.geenivaramu.ee/en" target=_blank>Estonian</a>:
     Genetic variation in the Estonian population: pharmacogenomics study of
     adverse drug effects using electronic health records.
   </li>
   <li><a href="http://www.bris.ac.uk/alspac/participants/genome/" target=_blank>ALSPAC</a>:
     The UK10K - Avon Longitudinal Study of Parents and Children project contains 1927 sample
     including individuals obtained from the
     <a href="http://www.bristol.ac.uk/alspac/" target=_blank>ALSPAC population</a>.
     This population contains more than 14,000 mothers enrolled during pregnancy in 1991 and 1992.
   </li>
   <li><a href="https://twinsuk.ac.uk/" target=_blank>TWINSUK</a>:
     The UK10K - TwinsUK project contains 1854 samples from the
     <a href="http://www.twinsuk.ac.uk/"
     target=_blank>Department of Twin Research and Genetic Epidemiology (DTR)</a>.
-    The dataset contains data obtained from the 11,000 identical and non-identical twins
+    The DTR dataset contains data obtained from the 11,000 identical and non-identical twins
     between the ages of 16 and 85 years old.
   </li>
   <li><a href="https://swefreq.nbis.se/dataset/SweGen" target=_blank>NorthernSweden</a>:
     Whole-genome sequenced control population in northern Sweden reveals subregional
     genetic differences. This population consists of 300 whole genome sequenced human samples
     selected from the county of Vasterbotten in northern Sweden. To be selected for inclusion
     into the population, the individuals had to have reached at least 80 years of age and
     have no diagnosed cancer.
   </li>
   <li><a href="https://genomes.vn" target=_blank>Vietnamese</a>:
-    A Vietnamese Genetic Variation Database.
+    The Vietnamese Genetic Variation Database includes about 25 million variants (SNVs and indels)
+    from 406 genomes and 305 exomes of unrelated healthy Kinh Vietnamese (KHV) people.
   </li>
 </ul>
 The project from which to take allele frequency data defaults to 1000 Genomes
 but can be set to any of those projects.
 </p>
 <p>
 Using the track controls, variants can be filtered by
 <ul>
   <li>minimum minor allele frequency (MAF) </li>
   <li>variation class/type (e.g. SNV, insertion, deletion) </li>
   <li>functional effect on a gene (e.g. synonymous, frameshift, intron, upstream) </li>