src/hg/makeDb/trackDb/human/dbSnp153Composite.html 9a31f94233618b5b6c16814d550efddc226d5340

9a31f94233618b5b6c16814d550efddc226d5340
kuhn
  Thu Oct 28 16:29:22 2021 -0700
minor wording changes

diff --git src/hg/makeDb/trackDb/human/dbSnp153Composite.html src/hg/makeDb/trackDb/human/dbSnp153Composite.html
index 09a717b..5398446 100644
--- src/hg/makeDb/trackDb/human/dbSnp153Composite.html
+++ src/hg/makeDb/trackDb/human/dbSnp153Composite.html
@@ -1,34 +1,34 @@
 <h2>Description</h2>
 <p>
 This track shows short genetic variants
 (up to approximately 50 base pairs) from
 <A HREF="https://www.ncbi.nlm.nih.gov/SNP/" target=_blank>dbSNP</A>
 build 153:
 single-nucleotide variants (SNVs),
-small insertions, deletions, and complex deletion/insertions,
+small insertions, deletions, and complex deletion/insertions (indels),
 relative to the reference genome assembly.
 Most variants in dbSNP are rare, not true polymorphisms,
 and some variants are known to be pathogenic.
 </p><p>
 For hg38 (GRCh38), approximately 667 million distinct variants
 (RefSNP clusters with rs# ids)
-have been mapped to over 702 million genomic locations
+have been mapped to more than 702 million genomic locations
 including alternate haplotype and fix patch sequences.
 dbSNP remapped variants from hg38 to hg19 (GRCh37);
 approximately 658 million distinct variants were mapped to
-over 683 million genomic locations
+more than 683 million genomic locations
 including alternate haplotype and fix patch sequences (not
 all of which are included in UCSC's hg19).
 </p>
 <p>
 This track includes four subtracks of variants:
   <ul>
     <li><b>All dbSNP (153)</b>: the entire set (683 million for hg19, 702 million for hg38)
     </li>
     <li><b>Common dbSNP (153)</b>: approximately 15 million variants with a minor allele
       frequency (MAF) of at least 1% (0.01) in the 1000 Genomes Phase 3 dataset.
       Variants in the Mult. subset (below) are excluded.
     </li>
     <li><b>ClinVar dbSNP (153)</b>: approximately 455,000 variants mentioned in ClinVar.
       <b>Note:</b> that includes both benign and pathogenic (as well as uncertain) variants.
       Variants in the Mult. subset (below) are excluded.
@@ -65,37 +65,37 @@
 <p>
 SNVs and pure deletions are displayed as boxes covering the affected base(s).
 Pure insertions are drawn as single-pixel tickmarks between
 the base before and the base after the insertion.
 </p><p>
 Insertions and/or deletions in repetitive regions may be represented by a half-height box
 showing uncertainty in placement, followed by a full-height box showing the number of deleted
 bases, or a full-height tickmark to indicate an insertion.
 When an insertion or deletion falls in a repetitive region, the placement may be ambiguous.
 For example, if the reference genome contains "TAAAG" but some
 individuals have "TAAG" at the same location, then the variant is a deletion of a single
 A relative to the reference genome.
 However, which A was deleted?  There is no way to tell whether the first, second or third A
 was removed.
 Different variant mapping tools may place the deletion at different bases in the reference genome.
-In order to reduce errors in merging variant calls made with different left vs. right biases,
+To reduce errors in merging variant calls made with different left vs. right biases,
 dbSNP made a major change in its representation of deletion/insertion variants in build 152.
 Now, instead of assigning a single-base genomic location at one of the A's,
 dbSNP expands the coordinates to encompass the whole repetitive region,
 so the variant is represented as a deletion of 3 A's combined with an insertion of 2 A's.
 In the track display, there will be a half-height box covering the first two A's,
-followed by a full-height box covering the third A, in order to show a net loss of one base
+followed by a full-height box covering the third A, to show a net loss of one base
 but an uncertain placement within the three A's.
 </p>
 <p>
 Variants are colored according to functional effect on genes annotated by dbSNP:
 </p>
 
 <p><b><font color=red>Protein-altering variants and splice site variants are
 red</font></b>.
 <br><b><font color=green>Synonymous codon variants are
 green</font></b>.
 <br><b><font color=blue>
 Non-coding transcript or Untranslated Region (UTR) variants are
 blue</font></b>.
 </p>
 <p>
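Note on the hunk above: the "TAAAG" vs. "TAAG" example can be checked with a small sketch. It is only an illustration of the placement ambiguity described in the page text, not dbSNP's actual normalization code; the function names are hypothetical, and it assumes a single-base deletion inside a run of identical bases.

```python
from typing import List, Tuple


def equivalent_deletion_positions(ref: str, alt: str) -> List[int]:
    """Return every 0-based index whose single-base deletion turns ref into alt."""
    return [i for i in range(len(ref)) if ref[:i] + ref[i + 1:] == alt]


def expanded_representation(ref: str, alt: str) -> Tuple[int, int, str, str]:
    """Expand an ambiguous single-base deletion to the whole run of equivalent positions.

    Returns (start, end, deleted, inserted) using 0-based, half-open coordinates.
    Hypothetical helper: it only handles the single-base, same-base-run case.
    """
    positions = equivalent_deletion_positions(ref, alt)
    if not positions:
        raise ValueError("alt is not a single-base deletion of ref")
    start, end = positions[0], positions[-1] + 1
    deleted = ref[start:end]   # the whole repetitive run
    inserted = deleted[1:]     # the same run, one base shorter
    return start, end, deleted, inserted


positions = equivalent_deletion_positions("TAAAG", "TAAG")
start, end, deleted, inserted = expanded_representation("TAAAG", "TAAG")
print(positions, start, end, deleted, inserted)
```

Under these assumptions, any of the three A's can be the deleted base, and the expanded form is a deletion of the three A's combined with an insertion of two, matching the description in the hunk.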
@@ -106,31 +106,31 @@
 major/minor alleles (when available) and
 minor allele frequency (when available).
 Allele frequencies are reported independently by twelve projects
 (some of which may have overlapping sets of samples):
   <ul>
     <li><a href="https://www.internationalgenome.org/" target=_blank>1000Genomes</a>:
       The 1000 Genomes Phase 3 dataset contains data for 2,504 individuals from 26 populations.
     </li>
     <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD exomes</a>:
       The gnomAD
       <a href="https://macarthurlab.org/2018/10/17/gnomad-v2-1/" target=_blank>v2.1</a>
       exome dataset comprises a total of 16 million SNVs and 1.2 million indels from 125,748 exomes
       in 14 populations.
     </li>
     <li><a href="https://www.nhlbiwgs.org/" target=_blank>TOPMED</a>:
-      The TOPMED dataset contains phase 3 data from freeze 5 panel that include over 60,000
+      The TOPMED dataset contains phase 3 data from the freeze 5 panel, which includes more than 60,000
       individuals. The approximate ethnic breakdown is European(52%), African (31%),
       Hispanic or Latino (10%), and East Asian (7%) ancestry.
     </li>
     <li><a href="http://exac.broadinstitute.org/" target=_blank>ExAC</a>:
       The Exome Aggregation Consortium (ExAC) dataset contains 60,706 unrelated individuals
       sequenced as part of various disease-specific and population genetic studies.
       Individuals affected by severe pediatric disease have been removed.
     </li>
     <li><a href="https://www.pagestudy.org/" target=_blank>PAGE STUDY</a>:
       The PAGE Study: How Genetic Diversity Improves Our Understanding of the Architecture of
       Complex Traits.
     </li>
     <li><a href="https://gnomad.broadinstitute.org/" target=_blank>GnomAD genomes</a>:
       The gnomAD
       <a href="https://macarthurlab.org/2018/10/17/gnomad-v2-1/" target=_blank>v2.1</a>
@@ -394,43 +394,43 @@
     <td>At least one other mapping of this variant has erroneous coordinates.
       The mapping(s) with erroneous coordinates are excluded from this track
       and are included in the Map Err subtrack.  Sometimes despite this mapping
       having legal coordinates, there may still be an issue with this mapping's
       coordinates and alleles; you may want to click through to dbSNP to compare
       the initial submission's coordinates and alleles.
       In hg19, 55454 distinct rsIDs are affected; in hg38, 86636.
   </tr>
 </table>
 
 
 <h2>Data Sources and Methods</h2>
 <p>
 dbSNP has collected genetic variant reports from researchers worldwide for 
 <a href="https://ncbiinsights.ncbi.nlm.nih.gov/2019/10/07/dbsnp-celebrates-20-years/"
-   target=_blank>over 20 years</a>.
+   target=_blank>more than 20 years</a>.
 Since the advent of next-generation sequencing methods and the population sequencing efforts
 that they enable, dbSNP has grown exponentially, requiring a new data schema, computational pipeline,
 web infrastructure, and download files.
 (Holmes <em>et al.</em>)
 The same challenges of exponential growth affected UCSC's presentation of dbSNP variants,
 so we have taken the opportunity to change our internal representation and import pipeline.
 Most notably, flanking sequences are no longer provided by dbSNP,
-since most submissions have been genomic variant calls in VCF format as opposed to
+because most submissions have been genomic variant calls in VCF format as opposed to
 independent sequences.
 </p>
 <p>
-We downloaded dbSNP's JSON files available from
+We downloaded JSON files available from dbSNP at
 <a href="ftp://ftp.ncbi.nlm.nih.gov/snp/archive/b153/JSON/"
 target=_blank>ftp://ftp.ncbi.nlm.nih.gov/snp/archive/b153/JSON/</a>,
 extracted a subset of the information about each variant, and collated
 it into a bigBed file using the
 <a href="https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/lib/bigDbSnp.as"
 target=_blank>bigDbSnp.as</a> schema with the information
 necessary for filtering and displaying the variants,
 as well as a separate file containing more detailed information to be
 displayed on each variant's details page
 (<a href="https://genome-source.gi.ucsc.edu/gitlist/kent.git/blob/master/src/hg/lib/dbSnpDetails.as"
 target=_blank>dbSnpDetails.as</a> schema).
 
 <h2>Data Access</h2>
 <p>
 Since dbSNP has grown to include approximately 700 million variants, the size of the All dbSNP (153)
@@ -498,31 +498,31 @@
   </tr>
   <tr>
     <td colspan=3>
       <a href="http://hgdownload.soe.ucsc.edu/gbdb/hgFixed/dbSnp/dbSnp153Details.tab.gz"
          target=_blank>dbSnp153Details.tab.gz</a>
     </td>
     <td>gzip-compressed tab-separated text</td>
     <td>Detailed variant properties, independent of genome assembly version</td>
   </tr>
 </table>
 </p>
 <p>
 Several utilities for working with bigBed-formatted binary files can be downloaded
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads"
    target=_blank>here</a>.
-Run a utility with no arguments in order to see a brief description of the utility and its options.
+Run a utility with no arguments to see a brief description of the utility and its options.
 <ul>
   <li><b>bigBedInfo</b> provides summary statistics about a bigBed file including the number of
     items in the file.  With the <b>-as</b> option, the output includes an
     autoSql
     definition of data columns, useful for interpreting the column values.</li>
   <li><b>bigBedToBed</b> converts the binary bigBed data to tab-separated text.
     Output can be restricted to a particular region by using the -chrom, -start
     and -end options.</li>
   <li><b>bigBedNamedItems</b> extracts rows for one or more rs# IDs.</li>
 </ul>
 </p>
 
 <h4>Example: retrieve all variants in the region chr1:200001-200400</h4>
 
 <pre><tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb -chrom=chr1 -start=200000 -end=200400 stdout</tt></pre>
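Note on the example above: the heading gives the region as chr1:200001-200400, while the command passes -start=200000. That is consistent with the 0-based, half-open coordinates used by BED-style files. A minimal sketch of the conversion follows; the helper name is hypothetical, and only the bigBedToBed options already shown in the page (-chrom, -start, -end) are used. The command is built but not run.

```python
import re
from typing import List

# URL taken from the example in the page text.
DBSNP_BB_URL = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/snp/dbSnp153.bb"


def region_to_bigbed_args(region: str, url: str = DBSNP_BB_URL) -> List[str]:
    """Turn a 1-based 'chrom:start-end' position into a bigBedToBed argument list."""
    match = re.fullmatch(r"(\w+):(\d+)-(\d+)", region.replace(",", ""))
    if match is None:
        raise ValueError("expected a region like chr1:200001-200400")
    chrom = match.group(1)
    start, end = int(match.group(2)), int(match.group(3))
    if start < 1 or end < start:
        raise ValueError("start must be >= 1 and <= end")
    # 1-based inclusive start becomes 0-based; the end value is unchanged.
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start - 1}",
        f"-end={end}",
        "stdout",
    ]


cmd = region_to_bigbed_args("chr1:200001-200400")
print(" ".join(cmd))
```

If bigBedToBed is installed, the resulting list could be passed to `subprocess.run(cmd)`.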
@@ -553,31 +553,31 @@
   <li><a href="https://esp.gs.washington.edu/" target=_blank>GoESP</a></li>
   <li><a href="https://www.geenivaramu.ee/en" target=_blank>Estonian</a></li>
   <li><a href="http://www.bris.ac.uk/alspac/participants/genome/" target=_blank>ALSPAC</a></li>
   <li><a href="https://twinsuk.ac.uk/" target=_blank>TWINSUK</a></li>
   <li><a href="https://swefreq.nbis.se/dataset/SweGen" target=_blank>NorthernSweden</a></li>
   <li><a href="https://genomes.vn" target=_blank>Vietnamese</a></li>
 </ol>
 </p><p>
 UCSC also has an
 <a href="../goldenPath/help/api.html"
    target=_blank>API</a>
 that can be used to retrieve values from a particular chromosome range.
 </p><p>
 A list of rs# IDs can be pasted/uploaded in the
 <a href="hgVai" target=_blank>Variant Annotation Integrator</a>
-tool in order to find out which genes (if any) the variants are located in,
+tool to find out which genes (if any) the variants are located in,
 as well as functional effect such as intron, coding-synonymous, missense, frameshift, etc.
 </p><p>
 Please refer to our searchable
 <A HREF="https://groups.google.com/a/soe.ucsc.edu/forum/?hl=en&fromgroups#!search/download+snps"
 target=_blank>mailing list archives</a>
 for more questions and example queries, or our
 <a HREF="../FAQ/FAQdownloads.html#download36" target=_blank>Data Access FAQ</a>
 for more information.
 </p>
 
 <h2>References</h2>
 
 <p>
 Holmes JB, Moyer E, Phan L, Maglott D, Kattman B.
 <a href="https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btz856"