src/hg/htdocs/FAQ/FAQgenes.html 55817ba932466fcc3142bfcd3261a7d91bed1c3a

55817ba932466fcc3142bfcd3261a7d91bed1c3a
max
  Thu Mar 21 11:20:56 2019 +0100
adding to faq genes page, refs #22696

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 8a7151d..3a455f1 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -94,30 +94,62 @@
 human and mouse transcripts. NCBI has added an automated prediction software (Gnomon)
 which we show in the &quot;<a target=_blank 
 href=../cgi-bin/hgTrackUi?db=hg38&g=refSeqComposite>RefSeq Predicted</a>&quot; track.</p>
 
 <p>There are many other tracks in the group &quot;Genes and Gene Predictions&quot;.
 <a target=_blank href="../cgi-bin/hgTrackUi?db=hg38&g=genscan">Genscan</a> and <a target=_blank 
 href="../cgi-bin/hgTrackUi?db=hg19&g=nscanGene">N-Scan</a> are older 
 transcript predictor algorithms that are based on the genome sequence alone. 
 <a target=_blank href="../cgi-bin/hgTrackUi?db=hg38&g=augustusGene">Augustus</a> and <a 
 target=_blank href="../cgi-bin/hgTrackUi?db=hg19&g=acembly">AceView</a> are automated 
 gene-predictors that use cDNA and EST data. These and similar gene
 tracks are only relevant when you are working on a particular locus where you
 think that the manually curated gene models (Ensembl and RefSeq) have
 errors.</p>
 
+<p>To illustrate these differences, here is an overview of several gene tracks for human (hg38) and the number of transcripts each contained as of March 2019:</p>
+  <table> 
+    <tr> 
+      <th nowrap><strong>Track name</strong></th> 
+      <th nowrap><strong>Number of transcripts</strong></th> 
+    </tr> <tr> 
+      <td>Known Gene (Gencode comprehensive V29)</td>
+      <td>226,811</td> 
+    </tr> 
+    <tr> 
+      <td>Known Gene (Gencode basic V29)</td>
+      <td>112,634</td> 
+    </tr> 
+    <tr> 
+      <td>NCBI RefSeq Predicted Transcripts</td>
+      <td>94,389</td> 
+    </tr> 
+    <tr> 
+      <td>UCSC RefSeq (Curated)</td>
+      <td>80,694</td> 
+    </tr> 
+    <tr> 
+      <td>NCBI RefSeq Curated</td>
+      <td>73,080</td> 
+    </tr> 
+    <tr> 
+      <td>CCDS</td>
+      <td>32,506</td> 
+    </tr> 
+  </table>
+
+
 <a name="genename"></a>
 <h6>What is a gene or transcript accession? </h6>
 
 <p>
 Gene symbols like BRCA1 are easy to remember but sometimes change and are not
 specific to an organism.  Therefore most databases internally use unique
 identifiers to refer to sequences and some journals require authors to use
 these in manuscripts.<br>
 
 The most common accession numbers encountered by users are either from Ensembl,
 GENCODE or RefSeq.  Human Ensembl/GENCODE gene accession numbers start with
 ENSG, e.g. &quot;ENSG00000012048&quot; for BRCA1.  Every ENSG-gene has at least
 one transcript assigned to it. The transcript identifiers start with ENST
 followed by a number, e.g.  &quot;ENST00000619216.1&quot;. NCBI refers to genes
 with plain numbers, e.g.  672 for BRCA1. Manually curated RefSeq transcript