0da0030a65de00ed1a63da1213e9ca8c47cb82b4 kuhn Mon Mar 28 11:36:53 2022 -0700
reworded a line that sounded awkward

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index f922529..52d3c97 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -615,31 +615,31 @@
The best approach to get protein-coding genes out of GENCODE is to join data with a related attributes table, and specifically name the desired biotype(s).
Here is an introductory example using the Public MySQL server to access the wgEncodeGencodeBasicV39 table of all genes and the wgEncodeGencodeAttrsV39 related table to find the transcriptType for each entry and to select those that are annotated as protein-coding genes. There are a number of biotypes that can be accessed by looking at the table scheme and clicking the values link for the transcriptType field. These terms are also more fully described on the GENCODE biotypes page.
-The below example will attempt to make a simple example to select
+The example below will attempt to make a simple example to select
all types that have "protein_coding" in this transcriptType field:
mysql -u genome -h genome-mysql.soe.ucsc.edu hg38 -e 'select g.name,a.transcriptType from wgEncodeGencodeBasicV39 g, wgEncodeGencodeAttrsV39 a where (g.name = a.transcriptId) and (a.transcriptType = "protein_coding");'
This query accesses the hg38 database. From the wgEncodeGencodeBasicV39 table it takes the name field (g.name), looks in the related wgEncodeGencodeAttrsV39 table for a matching transcriptId field (g.name = a.transcriptId), and keeps only the entries whose transcriptType in wgEncodeGencodeAttrsV39 equals protein_coding (a.transcriptType = "protein_coding"). The result is every entry annotated as protein-coding. Please note that this selection will also return some unusual protein-coding cases that one might not expect and may or may not want, such as Immunoglobulin and T-cell receptor components.
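The join described above can be tried offline before querying the public server. The following is a minimal sketch, not part of the FAQ itself: it builds two tiny in-memory SQLite tables that borrow the real table and column names (name, transcriptId, transcriptType) and runs the same join and filter. The transcript IDs (TX_A.1 and so on) are made up for illustration, the real tables have many more columns, and the string literal uses single quotes because that is the portable SQL form that SQLite expects.

```python
import sqlite3

# Toy stand-ins for the two UCSC tables, holding only the columns the
# query touches. All transcript IDs below are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE wgEncodeGencodeBasicV39 (name TEXT);
    CREATE TABLE wgEncodeGencodeAttrsV39 (transcriptId TEXT, transcriptType TEXT);
    """
)
conn.executemany(
    "INSERT INTO wgEncodeGencodeBasicV39 VALUES (?)",
    [("TX_A.1",), ("TX_B.1",), ("TX_C.1",)],
)
conn.executemany(
    "INSERT INTO wgEncodeGencodeAttrsV39 VALUES (?, ?)",
    [
        ("TX_A.1", "protein_coding"),
        ("TX_B.1", "lncRNA"),
        ("TX_C.1", "protein_coding"),
        ("TX_D.1", "protein_coding"),  # absent from the Basic table
    ],
)

# Same shape as the FAQ query: match g.name to a.transcriptId, then keep
# only rows whose transcriptType is exactly 'protein_coding'.
QUERY = """
    SELECT g.name, a.transcriptType
    FROM wgEncodeGencodeBasicV39 g, wgEncodeGencodeAttrsV39 a
    WHERE (g.name = a.transcriptId)
      AND (a.transcriptType = 'protein_coding')
    ORDER BY g.name
"""
rows = conn.execute(QUERY).fetchall()

for name, transcript_type in rows:
    print(name, transcript_type)
```

In this toy data, TX_B.1 is dropped by the transcriptType filter, and TX_D.1 is dropped because it has no matching name in the Basic table, which mirrors how the join restricts results to transcripts present in both tables.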