src/hg/htdocs/FAQ/FAQgenes.html fc450d4031fd9494b3a5bd8ee0fbeba83905a61a

fc450d4031fd9494b3a5bd8ee0fbeba83905a61a
brianlee
  Thu Mar 3 16:48:59 2022 -0800
Edits to new coding/non-coding gene FAQ  refs #29030

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 77ad926..a30ec1e 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -617,56 +617,56 @@
 out of GENCODE (or other gene) tables?</h2>
 <h3>Coding genes</h3>
 <p>
 One option for GENCODE is to use the Public MySQL server and the following query:
 <pre>
  mysql --user=genome --host=genome-mysql.soe.ucsc.edu -Ne 'select * from wgEncodeGencodeBasicV39 where cdsStartStat = "cmpl" and cdsEndStat = "cmpl";' hg38 
 </pre></p>
 <p>
 This query accesses the hg38 database and selects from the wgEncodeGencodeBasicV39 table
 only those entries whose cdsStartStat and cdsEndStat fields both have the value cmpl,
 meaning &quot;CDS is complete&quot; at the start and end. These entries are
 protein-coding, so non-coding RNA genes are excluded.</p>
 <p>
 For the manually curated RefSeq gene set, transcript identifiers start with NM_ for coding
 or NR_ for non-coding, followed by a number and version number separated by a dot, e.g.
-&quot;NR_046018.2&quot; for a RNA pseudogene. For RefSeq one can select coding genes by
+&quot;NR_046018.2&quot; for an RNA pseudogene. For RefSeq one can select coding genes by
 filtering for NM identifiers.</p>
 <p>If using the UCSC knownGene table, one can filter for where the coding start
 and coding end fields of the table are not equivalent, e.g.
 <code>knownGene.cdsStart != knownGene.cdsEnd</code>, which would ensure the selected
 entries are coding genes.</p>
 <p>
 You can also <a href="https://groups.google.com/u/1/a/soe.ucsc.edu/g/genome/search?q=only%20coding%20genes"
 target="_blank">search our mailing-list archives</a> to read further details about only
 obtaining coding genes from the UCSC Genome Browser.</p>
 <a name="nonCoding"></a>
 <h3>Non-coding genes</h3>
 <p>
 The steps for selecting non-coding genes are essentially the opposite of the steps to select
 only coding genes. One option for GENCODE is to use the Public MySQL server
 and the following query:</p>
 <p>
 <pre>
  mysql --user=genome --host=genome-mysql.soe.ucsc.edu -Ne 'select * from wgEncodeGencodeBasicV39 where cdsStartStat != "cmpl" and cdsEndStat != "cmpl";' hg38
 </pre></p>
 <p>
 This query accesses the hg38 database and selects from the wgEncodeGencodeBasicV39 table
 only those entries whose cdsStartStat and cdsEndStat fields are both not cmpl, the value
-showing &quot;CDS is complete&quot; at the start and end, so that these are genes that are
-protein-coding entries, thereby including only non-coding RNA genes.</p>
+showing &quot;CDS is complete&quot; at the start and end, so that this removes genes that are
+protein-coding entries, thereby selecting only non-coding RNA genes.</p>
 <p>
 For the manually curated RefSeq gene set, transcript identifiers start with NM_ for coding
 or NR_ for non-coding, followed by a number and version number separated by a dot, e.g.
-&quot;NR_046018.2&quot; for an RNA pseudogene. For RefSeq, one can select coding genes by
+&quot;NR_046018.2&quot; for an RNA pseudogene. For RefSeq, one can select non-coding genes by
 filtering for NR identifiers.</p>
 <p>If using the UCSC knownGene table, one can filter for where the coding start
 and coding end fields of the table are equivalent, e.g.
 <code>knownGene.cdsStart = knownGene.cdsEnd</code>, which would ensure the selected
 entries are non-coding genes.</p>
 <p>
 You can also <a href="https://groups.google.com/u/1/a/soe.ucsc.edu/g/genome/search?q=only%20non-coding%20genes"
 target="_blank">search our mailing-list archives</a> to read further details about only
 obtaining non-coding genes from the UCSC Genome Browser.</p>
 
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->
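
The RefSeq and knownGene rules described in the FAQ text above can be sketched in Python for rows that have already been downloaded. This is a minimal sketch and not part of the commit: the function names, the dictionary layout, and the sample coordinates are hypothetical, and only the NR_046018.2 accession comes from the FAQ text. It assumes each row carries integer cdsStart and cdsEnd values as in the knownGene table.

```python
def is_coding_refseq(accession):
    """RefSeq rule from the FAQ: NM_ marks coding, NR_ marks non-coding."""
    return accession.startswith("NM_")


def is_noncoding_refseq(accession):
    """Counterpart check for the NR_ prefix."""
    return accession.startswith("NR_")


def split_by_cds(rows):
    """knownGene rule from the FAQ: cdsStart != cdsEnd means coding,
    cdsStart == cdsEnd means non-coding.

    `rows` is an iterable of dicts with "cdsStart" and "cdsEnd" keys
    (a hypothetical layout chosen for this sketch).
    """
    coding, noncoding = [], []
    for row in rows:
        if row["cdsStart"] != row["cdsEnd"]:
            coding.append(row)
        else:
            noncoding.append(row)
    return coding, noncoding


# Made-up sample rows; the names and coordinates are illustrative only.
sample_rows = [
    {"name": "exampleCoding", "cdsStart": 1000, "cdsEnd": 2500},
    {"name": "exampleNonCoding", "cdsStart": 4000, "cdsEnd": 4000},
]

coding_rows, noncoding_rows = split_by_cds(sample_rows)
```

The two prefix checks are kept separate so that an identifier with neither prefix is not silently counted as coding or non-coding.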