18bc9c0ef9b603af3303c4c8c04d36a5b4d97361
gperez2
  Fri Jul 22 15:08:09 2022 -0700
Updating the descriptions of the project studies for dbSNP 155,  refs #27751

diff --git src/hg/makeDb/trackDb/human/dbSnp155Composite.html src/hg/makeDb/trackDb/human/dbSnp155Composite.html
index b8f4da8..94f2b71 100644
--- src/hg/makeDb/trackDb/human/dbSnp155Composite.html
+++ src/hg/makeDb/trackDb/human/dbSnp155Composite.html
@@ -93,156 +93,176 @@
 <p><b><font color=red>Protein-altering variants and splice site variants are
 red</font></b>.
 <br><b><font color=green>Synonymous codon variants are
 green</font></b>.
 <br><b><font color=blue>
 Non-coding transcript or Untranslated Region (UTR) variants are
 blue</font></b>.
 </p>
 <p>
 On the track controls page, several variant properties can be included or excluded from
 the item labels:
 rs# identifier assigned by dbSNP,
 reference/alternate alleles,
 major/minor alleles (when available) and
 minor allele frequency (when available).
-Allele frequencies are reported independently by thirty one projects
+Allele frequencies are reported independently by the following projects
 (some of which may have overlapping sets of samples):
 <ul>
 <li>
 <a href="https://www.internationalgenome.org/" target=_blank>1000Genomes</a>:
 The 1000 Genomes dataset contains data for 2,504 individuals from 26 populations.
 </li>
 <li>
 <a href="https://www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/" target=_blank>dbGaP_PopFreq</a>:
-A new source of dbGaP aggregated frequency data (&gt;1 Million Subjects) provided by dbSNP.
+The new source of dbGaP aggregated frequency data (&gt;1 Million Subjects) provided by dbSNP.
 </li>
 <li>
 <a href="https://www.nhlbiwgs.org/" target=_blank>TOPMED</a>:
 The TOPMED dataset contains freeze 8 panel that includes about 158,000 individuals. The approximate ethnic breakdown is European(41%), African (31%), Hispanic or Latino (15%), East Asian (9%), and unknown (4%) ancestry.
 </li>
 <li>
 <a href="https://academic.oup.com/database/article/doi/10.1093/database/baz146/5775747" target=_blank>KOREAN</a>:
-1465 Korean individuals
+The Korean Reference Genome Database contains data for 1,465 Korean individuals.
 </li>
 <li>
 <a href="https://www.simonsfoundation.org/simons-genome-diversity-project/" target=_blank>SGDP_PRJ</a>:
-263 C-panel fully public samples and 16 B-panel fully public samples for a total of 279 samples.
+The Simons Genome Diversity Project dataset contains 263 C-panel fully public samples and 16 B-panel
+fully public samples for a total of 279 samples.
 </li>
 <li>
 <a href="https://geneticmedicine.weill.cornell.edu/research/population-genetics" target=_blank>Qatari</a>:
-Initial mappings of the genomes of more than 1,000 Qatari nationals
+The dataset contains initial mappings of the genomes of more than 1,000 Qatari nationals.
 </li>
 <li>
 <a href="https://swefreq.nbis.se/dataset/SweGen" target=_blank>NorthernSweden</a>:
-whole-genome sequenced control population in northern Sweden reveals subregional genetic differences.  This population consists of 300 whole genome sequenced human samples selected from the county of Vasterbotten in northern Sweden. To be selected for inclusion into the population, the individuals had to have reached at least 80 years of age and have no diagnosed cancer.
+The dataset contains 300 whole-genome sequenced human samples from the county of Vasterbotten in
+northern Sweden.
 </li>
 <li>
 <a href="https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA267856" target=_blank>Siberian</a>:
-This project contains paired-end whole-genome sequencing data of 28 modern-day humans from Siberia and Western Russia. The genomes were sequenced in high coverage (&gt;30x, mean coverage = 39x) using Illumina HiSeq platform.
+The dataset contains paired-end whole-genome sequencing data of 28 modern-day humans from Siberia
+and Western Russia.
 </li>
 <li>
 <a href="https://twinsuk.ac.uk/" target=_blank>TWINSUK</a>:
 The UK10K - TwinsUK project contains 1854 samples from the Department of Twin Research and Genetic Epidemiology (DTR). The dataset contains data obtained from the 11,000 identical and non-identical twins between the ages of 16 and 85 years old.
 </li>
 <li>
 <a href="https://jmorp.megabank.tohoku.ac.jp/201905/downloads/" target=_blank>TOMMO</a>:
-an allele frequency panel of 3552 Japanese individuals including the X chromosome
+The Tohoku Medical Megabank Project contains an allele frequency panel of 3552 Japanese individuals,
+including the X chromosome.
 </li>
 <li>
 <a href="https://www.bristol.ac.uk/alspac/" target=_blank>ALSPAC</a>:
 The UK10K - Avon Longitudinal Study of Parents and Children project contains 1927 sample including individuals obtained from the ALSPAC population. This population contains more than 14,000 mothers enrolled during pregnancy in 1991 and 1992.
 </li>
 <li>
 <a href="https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB19794" target=_blank>GENOME_DK</a>:
-Sequencing of Danish parent-offspring trios to determine genomic variation within the Danish population. First release comprises of ten trios sequenced to 50X using libraries of insert sizes from 180nt to 800nt.
+The dataset contains sequencing data from Danish parent-offspring trios, used to determine genomic
+variation within the Danish population.
 </li>
 <li>
 <a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD</a>:
-a catalog containing 602M SNVs and 105M indels based on the whole-genome sequencing of 71,702 samples mapped to the GRCh38 build of the human reference genome. By increasing the number of whole genomes almost 5-fold from gnomAD v2.1, this release represents a massive leap in analysis power for anyone interested in non-coding regions of the genome or in coding regions poorly captured by exome sequencing.  In addition, gnomAD v3 adds new diversity -- for instance, by almost doubling the number of African-American samples we had in gnomAD v2 (exomes and genomes combined), and also including our first set of allele frequencies for the Amish population.
+The gnomAD genome dataset includes a catalog containing 602M SNVs and 105M indels based on the
+whole-genome sequencing of 71,702 samples mapped to the GRCh38 build of the human reference genome.
 </li>
 <li>
 <a href="https://www.rug.nl/research/genetics/databases/genomeofthenetherlands/" target=_blank>GoNL</a>:
-The Genome of the Netherlands (GoNL) Project characterizes DNA sequence variation, common and rare, for SNVs and short insertions and deletions (indels) and large deletions in 769 individuals of Dutch ancestry selected from five biobanks under the auspices of the Dutch hub of the Biobanking and Biomolecular Research Infrastructure (BBMRI-NL). The samples come from a representative sample of 250 trio-families from all provinces in the Netherlands. The parent-offspring trios include adult individuals ranging in age from 19 to 87 years (mean=53 years; SD=16 years) from birth cohorts 1910-1994.
+The Genome of the Netherlands (GoNL) Project characterizes DNA sequence variation, common and rare,
+for SNVs and short insertions and deletions (indels) and large deletions in 769 individuals of Dutch
+ancestry selected from five biobanks under the auspices of the Dutch hub of the Biobanking and
+Biomolecular Research Infrastructure (BBMRI-NL).
 </li>
 <li>
 <a href="https://www.geenivaramu.ee/en" target=_blank>Estonian</a>:
-Genetic variation in the Estonian population: pharmacogenomics study of adverse drug effects using electronic health records
+The dataset contains genetic variation in the Estonian population from a pharmacogenomics study of
+adverse drug effects using electronic health records.
 </li>
 <li>
 <a href="http://genomes.vn/" target=_blank>Vietnamese</a>:
-A Vietnamese Genetic Variation Database
+The Kinh Vietnamese database contains 24.81 million variants (22.47 million single nucleotide
+polymorphisms (SNPs) and 2.34 million indels), of which 0.71 million variants are novel.
 </li>
 <li>
 <a href="https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA609628" target=_blank>Korea1K</a>:
-1,094 Korean personal genomes with clinical information
+The dataset contains 1,094 Korean personal genomes with clinical information.
 </li>
 <li>
 <a href="https://hapmap.ncbi.nlm.nih.gov/" target=_blank>HapMap</a>:
-(HapMap is being retired.) The goal of the International HapMap Project is to develop a haplotype map of the human genome, the HapMap, which will describe the common patterns of human DNA sequence variation. The project used DNA samples from African, Asian, or European populations. The HapMap is expected to be a key resource for researchers to use to find genes affecting health, disease, and responses to drugs and environmental factors. The International HapMap Project is a partnership of scientists and funding agencies from Canada, China, Japan, Nigeria, the United Kingdom and the United States to develop a public resource that will help researchers find genes associated with human disease and response to pharmaceuticals.
+(HapMap is being retired.) The International HapMap Project contains samples from African, Asian,
+or European populations.
 </li>
 <li>
 <a href="https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB36033" target=_blank>PRJEB36033</a>:
-This project was the generation and analysis of 1240k capture data from 70 ancient Sardinians. This work was done in collaboration between the research groups of John Novembre (data analysis), Johannes Krause (aDNA generation) and Francesco Cucca, resulting in a publication in 2019.
+The dataset contains genome-wide 1240k capture data from 70 ancient Sardinians.
 </li>
 <li>
 <a href="https://www.hagsc.org/hgdp/" target=_blank>HGDP_Stanford</a>:
-Genotypes (flat files) for ~ 660,918 tag SNPs (Illumina HuHap 650k), in autosomes, chromosome X and Y, the pseudoautosomal region and mitochondrial DNA, typed across 1043 individuals from all panel populations (Li JZ et al. Science 319: 1100-4, 2008).
+The Stanford HGDP SNP genotyping data consists of ~660,918 tag SNPs in autosomes, chromosome X and
+Y, the pseudoautosomal region, and mitochondrial DNA, typed across 1043 individuals from all panel
+populations.
 </li>
 <li>
 <a href="https://www.ncbi.nlm.nih.gov/bioproject/576826" target=_blank>Daghestan</a>:
-Extensive genome-wide autozygosity in the population isolates of Daghestan.
+The dataset contains genotypes of &gt;550,000 autosomal single-nucleotide polymorphisms (SNPs) in a
+set of 14 population isolates speaking Nakh-Daghestanian (ND) languages.
 </li>
 <li>
 <a href="https://www.pagestudy.org/" target=_blank>PAGE_STUDY</a>:
 The PAGE Study: How Genetic Diversity Improves Our Understanding of the Architecture of Complex Traits.
 </li>
 <li>
 <a href="https://www.ncbi.nlm.nih.gov/bioproject/577585" target=_blank>Chileans</a>:
-Genetic structure characterization of Chileans reflects historical immigration patterns.
+The dataset describes genetic variation in Chileans using genotype data on ~685,944 SNPs from
+313 individuals from across the whole continental country.
 </li>
 <li>
 <a href="https://www.clinbioinfosspa.es/content/medical-genome-project" target=_blank>MGP</a>:
 MGP contains aggregated information on 267 healthy individuals, representative of the Spanish population that were used as controls in the MGP (Medical Genome Project).
 </li>
 <li>
 <a href="https://www.ncbi.nlm.nih.gov/bioproject/PRJEB37584" target=_blank>PRJEB37584</a>:
-Genome-wide genotyping analysis identified copy number variations in cranial meningiomas in Chinese patients, and demonstrated diverse CNV burdens among individuals with diverse clinical features.
+The dataset contains genome-wide genotype analysis that identified copy number variations in cranial
+meningiomas in Chinese patients, and demonstrated diverse CNV burdens among individuals with diverse clinical features.
 </li>
 <li>
 <a href="https://esp.gs.washington.edu/" target=_blank>GoESP</a>:
 The NHLBI Grand Opportunity Exome Sequencing Project (GO-ESP) dataset contains 6503 samples drawn from multiple ESP cohorts and represents all of the ESP exome variant data.
 </li>
 <li>
 <a href="https://exac.broadinstitute.org" target=_blank>ExAC</a>:
 The Exome Aggregation Consortium (ExAC) dataset contains 60,706 unrelated individuals sequenced as part of various disease-specific and population genetic studies. Individuals affected by severe pediatric disease have been removed.
 </li>
 <li>
 <a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD_exomes</a>:
-The GnomAD exome data set (release v2.1).
+The gnomAD v2.1 exome dataset comprises a total of 16 million SNVs and 1.2 million indels from
+125,748 exomes in 14 populations.
 </li>
 <li>
 <a href="https://thl.fi/en/web/thlfi-en/research-and-development/research-and-projects/the-national-finrisk-study" target=_blank>FINRISK</a>:
-The FINRISK cohorts comprise the respondents of representative, cross-sectional population surveys that are carried out every 5 years since 1972, to assess the risk factors of chronic diseases (e.g. CVD, diabetes, obesity, cancer) and health behavior in the working age population, in 3-5 large study areas of Finland. DNA samples were collected in the following survey years: 1987, 1992, 1997, 2002, 2007, and 2012.
+The FINRISK cohorts comprise the respondents of representative, cross-sectional population surveys
+that have been carried out every 5 years since 1972, to assess the risk factors of chronic diseases (e.g.
+CVD, diabetes, obesity, cancer) and health behavior in the working age population.
 </li>
 <li>
 <a href="https://www.pharmgkb.org" target=_blank>PharmGKB</a>:
-Aggregated frequency data for all PharmGKB submissions
+The dataset contains aggregated frequency data for all PharmGKB submissions.
 </li>
 <li>
 <a href="https://www.ncbi.nlm.nih.gov/bioproject/?term=PRJEB37766" target=_blank>PRJEB37766</a>:
-Mexican Genomic Database for Addiction Research
+The dataset is the Mexican Genomic Database for Addiction Research.
 </li>
 </ul>
 
 The project from which to take allele frequency data defaults to 1000 Genomes
 but can be set to any of those projects.
 </p>
 <p>
 Using the track controls, variants can be filtered by
 
   <ul>
     <li>minimum minor allele frequency (MAF)
     </li>
     <li>variation class/type (e.g. SNV, insertion, deletion)
     </li>
     <li>functional effect on a gene (e.g. synonymous, frameshift, intron, upstream)