9b780ead5d124e051e12c1be725eadc6ca7d76ae
max
Thu Mar 21 11:25:19 2019 +0100
adding counts to genes faq, refs #22696

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index f7a5542..142b611 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -130,30 +130,67 @@ human and mouse transcripts. NCBI has added an automated prediction software (Gnomon) which we show in the "<a target=_blank href=../cgi-bin/hgTrackUi?db=hg38&g=refSeqComposite>RefSeq Predicted</a>" track.</p>
<p>There are many other tracks in the group "Genes and Gene Predictions". <a target=_blank href="../cgi-bin/hgTrackUi?db=hg38&g=genscan">Genscan</a> and <a target=_blank href="../cgi-bin/hgTrackUi?db=hg19&g=nscanGene">N-Scan</a> are older transcript predictor algorithms that are based on the genome sequence alone. <a target=_blank href="../cgi-bin/hgTrackUi?db=hg38&g=augustusGene">Augustus</a> and <a target=_blank href="../cgi-bin/hgTrackUi?db=hg19&g=acembly">AceView</a> are automated gene-predictors that use cDNA and EST data.
These and similar gene tracks are only relevant when you are working on a particular locus where you think that the manually curated gene models (Ensembl and RefSeq) have errors.</p>
+<p>
+To illustrate differences between the most common gene tracks, here is an
+overview of a few different tracks on human (hg38) and how many transcripts
+they contain as of March 2019:
+</p>
+
+<table>
+  <tr>
+    <th nowrap><strong>Track name</strong></th>
+    <th nowrap><strong>Number of transcripts</strong></th>
+  <tr>
+    <td>Known Gene (Gencode comprehensive V29)</td>
+    <td>226,811</td>
+  </tr>
+  <tr>
+    <td>Known Gene (Gencode basic V29)</td>
+    <td>112,634</td>
+  </tr>
+  <tr>
+    <td>NCBI RefSeq Predicted Transcripts</td>
+    <td>94,389</td>
+  </tr>
+  <tr>
+    <td>UCSC RefSeq (Curated)</td>
+    <td>80,694</td>
+  </tr>
+  <tr>
+    <td>NCBI RefSeq Curated</td>
+    <td>73,080</td>
+  </tr>
+  <tr>
+    <td>CCDS</td>
+    <td>32,506</td>
+  </tr>
+ </table>
+
+
 <a name="ens"></a>
 <h2>The differences</h2>
 Some of our gene tracks look similar and contain very similar information which can be confusing.
 <h6>What are Ensembl and GENCODE and is there a difference?</h6>
 <p>
 Officially, the Ensembl and GENCODE gene models are the same. On the latest human and mouse genome assemblies (hg38 and mm10), the identifiers, transcript sequences, and exon coordinates are almost identical between equivalent Ensembl and GENCODE versions (excluding <a target=_blank href="FAQdownloads.html#downloadAlt">alternative sequences</a> or <a target=_blank href="FAQdownloads.html#downloadFix">fix sequences</a>).</p>
 <p>GENCODE uses the UCSC convention of prefixing chromosome names with "chr", e.g.