src/hg/makeDb/trackDb/uniprot.html 2593718ac23f10aabf08572b67c20c3db436ef03

2593718ac23f10aabf08572b67c20c3db436ef03
max
  Thu Apr 1 02:09:59 2021 -0700
clarifying the uniprot docs a bit, refs #27308

diff --git src/hg/makeDb/trackDb/uniprot.html src/hg/makeDb/trackDb/uniprot.html
index 69006ce..5d47dbf 100644
--- src/hg/makeDb/trackDb/uniprot.html
+++ src/hg/makeDb/trackDb/uniprot.html
@@ -1,41 +1,49 @@
 <h2>Description</h2>
 
 <p>
-This track shows protein sequence annotations from the <a
+This track shows protein sequences and annotations on them from the <a
 href="https://www.uniprot.org/" target="_blank">UniProt/SwissProt</A> database,
-mapped to genomic coordinates. It also shows how the protein sequences in this database 
-map to the genome.
-The data has been curated from scientific publications by the UniProt/SwissProt staff.
-The annotations are divided into multiple subtracks, based on their &quot;feature type&quot; in UniProt:
+mapped to genomic coordinates. 
+</p>
+<p>
+UniProt/SwissProt data has been curated from scientific publications by the UniProt staff;
+UniProt/TrEMBL data has been predicted by various computational algorithms.
+The annotations are divided into multiple subtracks, based on their &quot;feature type&quot; in UniProt.
+The first two subtracks below - one for SwissProt, one for TrEMBL - show the
+alignments of protein sequences to the genome. All other tracks below are the protein annotations
+mapped through these alignments to the genome.
 </p> 
 
 <table class="stdTbl">
   <tr>
     <th>Track Name</th>
     <th>Description</th>
   </tr>
   <tr>
-    <td>UCSC Alignment, SwissProt</td>
+    <td>UCSC Alignment, SwissProt = curated protein sequences</td>
     <td>Protein sequences from SwissProt mapped onto the genome. All other
-        tracks are (start,end) annotations mapped using this track.</td> </tr>
+        tracks are (start,end) SwissProt annotations on these sequences mapped
+        using this track. Protein sequences without a single curated
+        annotation were not added to this track.</td> </tr>
 <tr>
-    <td>UCSC Alignment, TrEMBL</td>
+    <td>UCSC Alignment, TrEMBL = predicted protein sequences</td>
     <td>Protein sequences from TrEMBL mapped onto the genome. All other tracks
-        are (start,end) annotations mapped using this track. This track is
-hidden by default. To show it, click its checkbox on the track description
-page.</td> </tr>
+        below are (start,end) TrEMBL annotations mapped to the genome using
+        this track. This track is hidden by default. To show it, click its
+        checkbox on the track configuration page. Protein sequences without a single 
+        predicted annotation on them were not added to this track.</td></tr>
   <tr>
     <td>UniProt Signal Peptides</td>
     <td>Regions found in proteins destined to be secreted, generally cleaved from mature protein.</td>
   </tr>
   <tr>
     <td>UniProt Extracellular Domains</td>
     <td>Protein domains with the comment &quot;Extracellular&quot;.</td>
   </tr>
   <tr>
     <td>UniProt Transmembrane Domains</td>
     <td>Protein domains of the type &quot;Transmembrane&quot;.</td>
   </tr>
   <tr>
     <td>UniProt Cytoplasmic Domains</td>
     <td>Protein domains with the comment &quot;Cytoplasmic&quot;.</td>
@@ -121,34 +129,34 @@
 mutated amino acids as a SwissProt annotation, it is not shown again. Two
 annotations mapped through different transcripts but with the same genome
 coordinates are only shown once.  </p>
 
 <p>Note that only for the human hg38 assembly and SwissProt annotations, there
 also is a <a
 href="hgTracks?db=hg38&hubUrl=ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/genome_annotation_tracks/UP000005640_9606_hub/hub.txt">public
 track hub</a> prepared by UniProt itself, with 
 genome annotations maintained by UniProt using their own mapping
 method based on those Gencode/Ensembl gene models that are annotated in UniProt
 for a given protein.</p>
 
 <h2>Methods</h2>
 
 <p>
-UniProt sequences were aligned to UCSC/Gencode transcript sequences first with
+UniProt sequences were aligned to transcript sequences from one of UCSC, Gencode, Ensembl or Augustus, first with
 BLAT, filtered with pslReps (93% query coverage, within top 1% score), lifted
 to genome positions with pslMap and filtered again.  UniProt annotations were
-obtained from the UniProt XML file.  The annotations were then mapped to the
+obtained from the UniProt XML file.  The UniProt annotations were then mapped to the
 genome through the alignment using the pslMap program.  This mapping approach
 draws heavily on the <A HREF="http://modbase.compbio.ucsf.edu/LS-SNP/"
 TARGET="_BLANK">LS-SNP</A> pipeline by Mark Diekhans. For human and mouse, the
 alignments were filtered by retaining only proteins annotated with
 a given transcript in the Genome Browser table kgXref. Like all Genome Browser
 source code, the main script used to build this track can be found on 
 <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/utils/otto/uniprot/doUniprot">github</a>.
 </p>
 
 <h2>Data Access</h2>
 
 <p>
 The raw data can be explored interactively with the
 <a href="../cgi-bin/hgTables">Table Browser</a>, or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>.