src/hg/htdocs/FAQ/FAQgenes.html b851c545e4701e24096f34a522318e3d22054493

b851c545e4701e24096f34a522318e3d22054493
lrnassar
  Wed Mar 20 13:33:09 2019 -0700
Final edits to FAQgenes page ref#22696

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index ea9a494..f7a5542 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -32,31 +32,31 @@
 <hr>
 <p>
 <a href="index.html">Return to FAQ Table of Contents</a></p>
 
 <a name="gene"></a>
 <h2>The basics</h2>
 
 The genome browser contains many gene annotation tracks. Our users 
 often wonder what these contain and where the information that we present comes
 from.
 
 <h6>What is a gene?</h6>
 <p>
 The exact definition of &quot;gene&quot; depends on the context. In the context of 
 genome annotation, a gene has at least a name and is defined by a collection of
-related mRNA transcript sequences (&quot;isoforms&quot;). The naming of genes and the
+related RNA transcript sequences (&quot;isoforms&quot;). The naming of genes and the
 assignment of the most important transcript sequences is often done manually by
 a group of biological literature curators.  For human, genes names are created
 by the <a target=_blank href="https://www.genenames.org/">Human Gene
 Nomenclature Committee (HGNC, formerly HUGO)</a>.  Non-human species have
 similar annotation groups, e.g. Mouse Genome Informatics, Wormbase, Flybase,
 etc.
 </p>
 
 <a name="genestrans"></a>
 <h6>What is a transcript and how is it related to a gene? </h6>
 <p>
 Transcripts are defined as RNA molecules that are made from a DNA template.
 Databases like the ones at the National Library of Medicine's NCBI or the
 European Bioinformatics Institute (EBI) collect these transcript sequences from
 biologists working on a gene. Every transcript has a 
@@ -80,47 +80,56 @@
 On the version hg38/GRCh38 of the human genome, these exons cover the DNA
 nucleotides 43044295 to 43125483.</p>
 
 <a name="genename"></a>
 <h6>What is a gene or transcript accession? </h6>
 
 <p>
 Gene symbols like BRCA1 are easy to remember but sometimes change and are not
 specific to an organism.  Therefore most databases internally use unique
 identifiers to refer to sequences and some journals require authors to use
 these in manuscripts.</p>
 
 <p>
 The most common accession numbers encountered by users are either from Ensembl,
 GENCODE or RefSeq.  Human Ensembl/GENCODE gene accession numbers start with
-ENSG, e.g. &quot;ENSG00000012048&quot for BRCA1.  Every ENSG-gene has at least
+ENSG followed by a number and a version number separated by a dot, e.g. 
+&quot;ENSG00000012048.21&quot; for the latest version of BRCA1.  Every ENSG gene has at least
 one transcript assigned to it. The transcript identifiers start with with ENST
-followed by a number, e.g.  &quot;ENST00000619216.1&quot;. NCBI refers to genes
+and are likewise followed by a version number, e.g. 
+&quot;ENST00000619216.1&quot;. Additional details on Ensembl IDs can be found
+on the <a target="_blank" 
+href="https://uswest.ensembl.org/Help/Faq?id=488">Ensembl FAQ page</a>.</p>
+
+<p>
+NCBI refers to genes
 with plain numbers, e.g.  672 for BRCA1. Manually curated RefSeq transcript
 identifiers start with NM_ (coding) or NR_ (non-coding), followed by a number and version
 number separated by a dot, e.g. &quot;NR_046018.2&quot;.  If the transcript was
 predicted by the NCBI Gnomon software, the prefix is XM_ but these are rare in human.
 A table of these and other RefSeq prefixes can be
 found on the <a target=_blank
 href="https://www.ncbi.nlm.nih.gov/books/NBK21091/table/ch18.T.refseq_accession_numbers_and_mole/?report=objectonly">
 NCBI website</a>.
 </p>
 
 <a name="mostCommon"></a>
 <h6>What are the most common gene transcript tracks?</h6>
 <p>
-Researchers sequence cDNA sequences and send these to NCBI Genbank. The
+Researchers determine <a target="_blank" 
+href="https://en.wikipedia.org/wiki/Complementary_DNA">cDNA sequences</a> 
+and send these to NCBI GenBank. The
 Genome Browser shows these sequences in the Genbank or the <a target=_blank 
 href="../cgi-bin/hgTrackUi?db=hg38&g=est">EST track</a> (if the cDNA is just
 a single read from the 5' or 3' end). From the alignment of the cDNAs and ESTs, 
 the NCBI RefSeq group manually creates a smaller set of representative transcripts 
 which we display as the <a target=_blank 
 href=../cgi-bin/hgTrackUi?db=hg38&g=refSeqComposite>RefSeq Curated</a> track.
 Automated programs like UCSC's or Ensembl's gene build software do the same, just
 in software, which is more systematic but also more error-prone.
 With the arrival of GENCODE, Ensembl added a manual curation to their
 human and mouse transcripts. NCBI has added an automated prediction software (Gnomon)
 which we show in the &quot;<a target=_blank 
 href=../cgi-bin/hgTrackUi?db=hg38&g=refSeqComposite>RefSeq Predicted</a>&quot; track.</p>
 
 <p>There are many other tracks in the group &quot;Genes and Gene Predictions&quot;.
 <a target=_blank href="../cgi-bin/hgTrackUi?db=hg38&g=genscan">Genscan</a> and <a target=_blank