b714e3aa69abe5d109c72ca504240dece8d42666 mspeir Wed Jan 14 14:00:02 2026 -0800
adding faq about exonFrame info in bed output, refs #24545

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index b6acdff0764..d7eaecbb263 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -24,30 +24,31 @@
Return to FAQ Table of Contents
The exact definition of "gene" depends on the context. In the context of
@@ -855,17 +856,73 @@
For the manually curated RefSeq gene set, transcript identifiers start with NM_ for coding or NR_ for non-coding, followed by a number and version number separated by a dot, e.g. "NR_046018.2" for an RNA pseudogene. For RefSeq, one can select non-coding genes by filtering for NR identifiers. Note that a pseudogene of mRNA is not an unambiguous concept, and there may be a desire to look further to select certain subset types as mentioned above.
If using the UCSC knownGene table, one can filter for where the coding start
and coding end fields of the table are equivalent, e.g.
knownGene.cdsStart = knownGene.cdsEnd, which would ensure the selected
entries are non-coding genes.
You can also search our mailing-list archives to read further details about only obtaining non-coding genes from the UCSC Genome Browser.
+The per-exon option for BED output in the Table Browser produces lines like the following when using a gene track:
+
chr1 1046829 1047018 NM_001077977_utr3_2_0_chr1_1046830_f 0 +
+chr1 1099124 1099325 NM_001077124_utr3_0_0_chr1_1099125_r 0 -
+
+
+The name column contains several pieces of information separated by underscores:
+NM_001077124_utr3_0_0_chr1_1099125_r. Here's a breakdown of that information:
+
NM_001077124 - Transcript accession
+ utr3 - feature type: cds, utr3, or utr5
+ 0 - exon inFrame - This indicates the offset at the beginning
+ of a feature (like an exon) to reach the first base of the next complete
+ codon.
+ 0 - exon outFrame - This indicates the offset remaining at the end of a feature.
+ chr1_1099125 - chromosome and start position of the exon
+ r - strand: "r" for reverse ("-") or "f" for forward ("+")
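As a rough illustration of the breakdown above, the following Python sketch splits a per-exon name into its parts. The names `parse_exon_name` and `ExonName` are hypothetical, and the pattern assumes the feature label is exactly `cds`, `utr3`, or `utr5`; this is not code from the Genome Browser itself.

```python
import re
from typing import NamedTuple

# Layout taken from the breakdown above:
#   <accession>_<feature>_<inFrame>_<outFrame>_<chrom>_<start>_<strand>
# Accessions (NM_001077124) and some chromosome names contain underscores,
# so a regular expression is used rather than a plain split("_").
# Assumption: the feature label is one of cds, utr3, utr5.
NAME_PATTERN = re.compile(
    r"^(?P<accession>.+)_(?P<feature>cds|utr3|utr5)"
    r"_(?P<in_frame>\d+)_(?P<out_frame>\d+)"
    r"_(?P<chrom>.+)_(?P<start>\d+)_(?P<strand>[fr])$"
)


class ExonName(NamedTuple):
    accession: str
    feature: str
    in_frame: int
    out_frame: int
    chrom: str
    start: int   # start position exactly as written in the name
    strand: str  # "+" or "-", translated from "f" / "r"


def parse_exon_name(name: str) -> ExonName:
    """Split a per-exon BED name into its underscore-separated parts."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized per-exon name: {name!r}")
    return ExonName(
        accession=match["accession"],
        feature=match["feature"],
        in_frame=int(match["in_frame"]),
        out_frame=int(match["out_frame"]),
        chrom=match["chrom"],
        start=int(match["start"]),
        strand="+" if match["strand"] == "f" else "-",
    )


if __name__ == "__main__":
    print(parse_exon_name("NM_001077124_utr3_0_0_chr1_1099125_r"))
```

In the example lines, the start embedded in the name (1099125) is one greater than the BED chromStart column (1099124), so the sketch keeps the value as written instead of converting it.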
| Value | Meaning |
|---|---|
| 0 | The feature starts/ends exactly on a codon boundary. No offset is required. |
| 1 | There is 1 "extra" nucleotide before the first complete codon (inFrame) or after the last complete codon (outFrame). |
| 2 | There are 2 "extra" nucleotides before the first complete codon (inFrame) or after the last complete codon (outFrame). |
+In the example name broken down above (NM_001077124_utr3_0_0_chr1_1099125_r), both inFrame and outFrame are "0" because it is a UTR exon.
+Finally, note that when amino acid output is split per exon, a codon that spans two exons cannot be shown as split, so its amino acid is placed in the exon that contains most of its bases.
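The "most of the bases" rule can be pictured with a small Python sketch. It assumes the coding length of each exon is known and listed in transcription order. The function name `codons_per_exon` is hypothetical. How a tie is resolved (for example, a codon spread one base each over three exons) is an arbitrary choice here, since the text above does not specify it. This is an illustration of the rule, not the Table Browser's implementation.

```python
from collections import Counter
from typing import List


def codons_per_exon(cds_lengths: List[int]) -> List[int]:
    """Count the codons assigned to each exon.

    cds_lengths holds the number of coding bases in each exon, in
    transcription order. A codon that spans an exon boundary is assigned
    to the exon holding most of its three bases.
    """
    # Label every coding base with the index of the exon it sits in.
    exon_of_base = [i for i, n in enumerate(cds_lengths) for _ in range(n)]

    counts = [0] * len(cds_lengths)
    # Walk the concatenated coding sequence one complete codon at a time;
    # trailing bases that do not fill a whole codon are ignored.
    for start in range(0, len(exon_of_base) - 2, 3):
        codon_exons = exon_of_base[start:start + 3]
        # Exon contributing the most bases to this codon. On a tie, the
        # earliest exon wins (an assumption of this sketch).
        owner = Counter(codon_exons).most_common(1)[0][0]
        counts[owner] += 1
    return counts


if __name__ == "__main__":
    # Exon 1 has 4 coding bases, exon 2 has 5: the middle codon has
    # 1 base in exon 1 and 2 bases in exon 2, so it goes to exon 2.
    print(codons_per_exon([4, 5]))
```

With lengths `[5, 4]` the split codon has two bases in the first exon, so that exon receives it instead.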