src/hg/makeDb/trackDb/human/dbSnp153Composite.html 98407c7ac7f8fda3aec01535e17b92e3a5de8177

98407c7ac7f8fda3aec01535e17b92e3a5de8177
angie
  Fri Nov 1 14:53:38 2019 -0700
Improving upon dbSNP's uneven descriptions of frequency-submitting projects.  refs #23283

diff --git src/hg/makeDb/trackDb/human/dbSnp153Composite.html src/hg/makeDb/trackDb/human/dbSnp153Composite.html
index 1666146..7e84654 100644
--- src/hg/makeDb/trackDb/human/dbSnp153Composite.html
+++ src/hg/makeDb/trackDb/human/dbSnp153Composite.html
@@ -80,88 +80,92 @@
 Variants are colored according to functional effect on genes annotated by dbSNP.
 Protein-altering variants and splice site variants are
 red,
 synonymous codon variants are
 green,
 and non-coding transcript or Untranslated Region (UTR) variants are
 blue.
 </p>
 <p>
 On the track controls page, several variant properties can be included or excluded from
 the item labels:
 rs# identifier assigned by dbSNP,
 reference/alternate alleles,
 major/minor alleles (when available) and
 minor allele frequency (when available).
-Allele frequencies are reported independently by twelve projects, as described by dbSNP:
+Allele frequencies are reported independently by twelve projects:
   <ul>
     <li><a href="https://www.internationalgenome.org/" target=_blank>1000Genomes</a>:
-      The 1000 Genomes dataset contains data for 2,504 individuals from 26 populations.
+      The 1000 Genomes Phase 3 dataset contains data for 2,504 individuals from 26 populations.
     </li>
-    <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD_exomes</a>:
-      The GnomAD exome data set (release v2.1).
+    <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD exomes</a>:
+      The gnomAD
+      <a href="https://macarthurlab.org/2018/10/17/gnomad-v2-1/" target=_blank>v2.1</a>
+      exome dataset comprises a total of 16 million SNVs and 1.2 million indels from 125,748 exomes
+      in 14 populations.
     </li>
     <li><a href="https://www.nhlbiwgs.org/" target=_blank>TOPMED</a>:
       The TOPMED dataset contains phase 3 data from the freeze 5 panel, which includes over
       60,000 individuals. The approximate breakdown by ancestry is European (52%), African (31%),
       Hispanic or Latino (10%), and East Asian (7%).
     </li>
     <li><a href="http://exac.broadinstitute.org/" target=_blank>ExAC</a>:
       The Exome Aggregation Consortium (ExAC) dataset contains 60,706 unrelated individuals
       sequenced as part of various disease-specific and population genetic studies.
       Individuals affected by severe pediatric disease have been removed.
     </li>
-    <li><a href="https://www.pagestudy.org/" target=_blank>PAGE_STUDY</a>:
+    <li><a href="https://www.pagestudy.org/" target=_blank>PAGE STUDY</a>:
       The PAGE Study: How Genetic Diversity Improves Our Understanding of the Architecture of
       Complex Traits.
     </li>
-    <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD</a>:
-      gnomAD v2.1 comprises a total of 16mln SNVs and 1.2mln indels from 125,748 exomes,
-      and 229mln SNVs and 33mln indels from 15,708 genomes. In addition to the 7 populations
-      already present in gnomAD 2.0.2, this release now breaks down the non-Finnish Europeans
-      and East Asian populations further into sub-populations.
+    <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD genomes</a>:
+      The gnomAD
+      <a href="https://macarthurlab.org/2018/10/17/gnomad-v2-1/" target=_blank>v2.1</a>
+      genome dataset includes 229 million SNVs and 33 million indels from 15,708 genomes
+      in 9 populations.
     </li>
     <li><a href="https://esp.gs.washington.edu/" target=_blank>GoESP</a>:
       The NHLBI Grand Opportunity Exome Sequencing Project (GO-ESP) dataset contains 6503 samples
       drawn from multiple ESP cohorts and represents all of the ESP exome variant data.
     </li>
     <li><a href="https://www.geenivaramu.ee/en" target=_blank>Estonian</a>:
       Genetic variation in the Estonian population: pharmacogenomics study of
       adverse drug effects using electronic health records.
     </li>
     <li><a href="http://www.bris.ac.uk/alspac/participants/genome/" target=_blank>ALSPAC</a>:
       The UK10K - Avon Longitudinal Study of Parents and Children project contains 1927 samples
       including individuals obtained from the
       <a href="http://www.bristol.ac.uk/alspac/" target=_blank>ALSPAC population</a>.
       This population contains more than 14,000 mothers enrolled during pregnancy in 1991 and 1992.
     </li>
     <li><a href="https://twinsuk.ac.uk/" target=_blank>TWINSUK</a>:
       The UK10K - TwinsUK project contains 1854 samples from the
       <a href="http://www.twinsuk.ac.uk/" target=_blank>Department of Twin Research and
       Genetic Epidemiology (DTR)</a>.
-      The dataset contains data obtained from the 11,000 identical and non-identical twins
+      The DTR dataset contains data obtained from the 11,000 identical and non-identical twins
       between the ages of 16 and 85 years old.
     </li>
     <li><a href="https://swefreq.nbis.se/dataset/SweGen" target=_blank>NorthernSweden</a>:
       Whole-genome sequenced control population in northern Sweden reveals subregional
       genetic differences.  This population consists of 300 whole genome sequenced human samples
       selected from the county of Vasterbotten in northern Sweden. To be selected for inclusion
       into the population, the individuals had to have reached at least 80 years of age and have
       no diagnosed cancer.
     </li>
     <li><a href="https://genomes.vn" target=_blank>Vietnamese</a>:
-      A Vietnamese Genetic Variation Database.
+      The Vietnamese Genetic Variation Database includes about 25 million variants (SNVs and indels)
+      from 406 genomes and 305 exomes of unrelated healthy Kinh Vietnamese (KHV) people.
     </li>
   </ul>
 The project from which to take allele frequency data defaults to 1000 Genomes
 but can be set to any of those projects.
 </p>
 <p>
 Using the track controls, variants can be filtered by
 
   <ul>
     <li>minimum minor allele frequency (MAF)
     </li>
     <li>variation class/type (e.g. SNV, insertion, deletion)
     </li>
     <li>functional effect on a gene (e.g. synonymous, frameshift, intron, upstream)
     </li>