b714e3aa69abe5d109c72ca504240dece8d42666 mspeir Wed Jan 14 14:00:02 2026 -0800 adding faq about exonFrame info in bed output, refs #24545 diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html index b6acdff0764..d7eaecbb263 100755 --- src/hg/htdocs/FAQ/FAQgenes.html +++ src/hg/htdocs/FAQ/FAQgenes.html @@ -24,30 +24,31 @@ <li><a href="#hg38">For the human assembly hg38/GRCh38: What are the differences between the "GENCODE" and "All GENCODE" tracks?</a></li> <li><a href="#gencode">What is the difference between GENCODE comprehensive and basic?</a></li> <li><a href="#ncbiRefseq">What is the difference between "NCBI RefSeq" and "UCSC RefSeq"?</a></li> <li><a href="#mito">What is the best gene track for mitochondrial gene annotations?</a></li> <li><a href="#report">How shall I report a gene transcript in a manuscript?</a></li> <li><a href="#ccds">What is CCDS?</a></li> <li><a href="#justsingle">How can I show a single transcript per gene?</a></li> <li><a href="#singledownload">How can I download a file with a single transcript per gene?</a></li> <li><a href="#bioTypeFilter">How can I filter by bioType from GENCODE/RefSeq/Ensembl?</a></li> <li><a href="#whatdo">This is rather complicated. Can you tell me which gene transcript track I should use?</a></li> <li><a href="#gtfDownload">Does UCSC provide GTF/GFF files for gene models?</a></li> <li><a href="#coding">What is the best way to get only coding genes (or only non-coding genes) out of GENCODE (or other gene) tables?</a></li> +<li><a href="#exonFrame">How do I interpret the exon frame information in the BED per-exon output for gene tracks?</a></li> </ul> <hr> <p> <a href="index.html">Return to FAQ Table of Contents</a></p> <a name="gene"></a> <h2>The basics</h2> The genome browser contains many gene annotation tracks. Our users often wonder what these contain and where the information that we present comes from. <h6>What is a gene?</h6> <p> The exact definition of "gene" depends on the context. 
In the context of @@ -855,17 +856,73 @@ For the manually curated RefSeq gene set, transcript identifiers start with NM_ for coding or NR_ for non-coding, followed by a number and version number separated by a dot, e.g. "NR_046018.2" for an RNA pseudogene. For RefSeq, one can select non-coding genes by filtering for NR identifiers. Note that a pseudogene of mRNA is not an unambiguous concept, and there may be a desire to look further to select certain subset types as mentioned above.</p> <p> If using the UCSC knownGene table, one can filter for where the coding start and coding end fields of the table are equivalent, e.g. <code>knownGene.cdsStart = knownGene.cdsEnd</code>, which would ensure the selected entries are non-coding genes.</p> <p> You can also <a href="https://groups.google.com/u/1/a/soe.ucsc.edu/g/genome/search?q=only%20non-coding%20genes" target="_blank">search our mailing-list archives</a> to read further details about only obtaining non-coding genes from the UCSC Genome Browser.</p> +<a name="exonFrame"></a> +<h2>How do I interpret the exon frame information in the BED per-exon output for gene tracks?</h2> +<p> +The per-exon option for BED output in the Table Browser produces lines like the following +when using a gene track: +</p> +<pre><code>chr1 1046829 1047018 NM_001077977_utr3_2_0_chr1_1046830_f 0 + +chr1 1099124 1099325 NM_001077124_utr3_0_0_chr1_1099125_r 0 - +</code></pre> +<p> +The name column contains several pieces of information separated by underscores: +<code>NM_001077124_utr3_0_0_chr1_1099125_r</code>. Here's a breakdown of that information: +<ol> + <li><code>NM_001077124</code> - Transcript accession + <li><code>utr3</code> - Feature type: <code>cds</code>, <code>utr3</code>, or <code>utr5</code> + <li><code>0</code> - exon inFrame - This indicates the offset at the beginning + of a feature (such as an exon) to reach the first base of the next complete + codon. + <li><code>0</code> - exon outFrame - This indicates the offset remaining at the end of a feature, after its last complete codon. 
+ <li><code>chr1_1099125</code> - chromosome and 1-based start position of the exon (one more than the 0-based BED start column) + <li><code>r</code> - strand: "r" for reverse ("-") and "f" + for forward ("+") +</ol> +The rest of this answer focuses on the inFrame and outFrame values. They +typically range from 0 to 2 and indicate where in the reading +frame the exon starts and ends. +<table> + <thead> + <tr> + <th>Value</th> + <th>Meaning</th> + </tr> + </thead> + <tbody> + <tr> + <td>0</td> + <td>The feature starts/ends exactly on a codon boundary. No offset is required.</td> + </tr> + <tr> + <td>1</td> + <td>There is 1 "extra" nucleotide before the first complete codon (inFrame) or after the last complete codon (outFrame).</td> + </tr> + <tr> + <td>2</td> + <td>There are 2 "extra" nucleotides before the first complete codon (inFrame) or after the last complete codon (outFrame).</td> + </tr> + </tbody> +</table> +<p> +In the name broken down above (the second example line), both inFrame and outFrame are "0" because +it is a UTR exon.</p> + +<p> +Finally, note that when amino acid output is split per exon, a codon split across two exons cannot be shown in both, +so its amino acid is placed in the exon that contains most of the codon's bases.</p> + <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->
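A minimal sketch of how the name column described in this FAQ entry could be split into its parts. It is not part of the commit or of any UCSC tool: the function name `parse_exon_name`, the `ExonName` tuple, and its field names are hypothetical, and the sketch assumes the field meanings exactly as this FAQ describes them, with the feature type limited to `cds`, `utr3`, or `utr5`. A regular expression is used instead of a plain split because the accession (and possibly the chromosome name) can contain underscores.

```python
import re
from typing import NamedTuple


class ExonName(NamedTuple):
    """Hypothetical container for the fields of the per-exon BED name column."""
    accession: str   # e.g. "NM_001077124"
    feature: str     # "cds", "utr3" or "utr5" (assumed to be the only values)
    in_frame: int    # third field, called inFrame in the FAQ text
    out_frame: int   # fourth field, called outFrame in the FAQ text
    chrom: str       # e.g. "chr1"
    start: int       # start position as written in the name (1-based)
    strand: str      # "+" for "f", "-" for "r"


# Lazy accession and greedy chromosome, so underscores inside either are kept.
_NAME_RE = re.compile(
    r"^(?P<accession>.+?)_(?P<feature>cds|utr3|utr5)"
    r"_(?P<in_frame>\d+)_(?P<out_frame>\d+)"
    r"_(?P<chrom>.+)_(?P<start>\d+)_(?P<strand>[fr])$"
)


def parse_exon_name(name: str) -> ExonName:
    """Split a name such as 'NM_001077124_utr3_0_0_chr1_1099125_r' into fields."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized per-exon name: {name!r}")
    return ExonName(
        accession=match["accession"],
        feature=match["feature"],
        in_frame=int(match["in_frame"]),
        out_frame=int(match["out_frame"]),
        chrom=match["chrom"],
        start=int(match["start"]),
        strand="+" if match["strand"] == "f" else "-",
    )
```

For example, `parse_exon_name("NM_001077124_utr3_0_0_chr1_1099125_r")` yields the accession `NM_001077124`, feature `utr3`, chromosome `chr1`, start `1099125`, and strand `-`. Names that do not follow the assumed layout raise `ValueError` rather than being silently misparsed.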