b714e3aa69abe5d109c72ca504240dece8d42666 mspeir Wed Jan 14 14:00:02 2026 -0800 adding faq about exonFrame info in bed output, refs #24545 diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html index b6acdff0764..d7eaecbb263 100755 --- src/hg/htdocs/FAQ/FAQgenes.html +++ src/hg/htdocs/FAQ/FAQgenes.html @@ -24,30 +24,31 @@
  • For the human assembly hg38/GRCh38: What are the differences between the "GENCODE" and "All GENCODE" tracks?
  • What is the difference between GENCODE comprehensive and basic?
  • What is the difference between "NCBI RefSeq" and "UCSC RefSeq"?
  • What is the best gene track for mitochondrial gene annotations?
  • How shall I report a gene transcript in a manuscript?
  • What is CCDS?
  • How can I show a single transcript per gene?
  • How can I download a file with a single transcript per gene?
  • How can I filter by bioType from GENCODE/RefSeq/Ensembl?
  • This is rather complicated. Can you tell me which gene transcript track I should use?
  • Does UCSC provide GTF/GFF files for gene models?
  • What is the best way to get only coding genes (or only non-coding genes) out of GENCODE (or other gene) tables?
  • How do I interpret the exon frame information in the BED per-exon output for gene tracks?

  • Return to FAQ Table of Contents

    The basics

    The genome browser contains many gene annotation tracks. Our users often wonder what these contain and where the information that we present comes from.
    What is a gene?

    The exact definition of "gene" depends on the context. In the context of @@ -855,17 +856,73 @@ For the manually curated RefSeq gene set, transcript identifiers start with NM_ for coding or NR_ for non-coding, followed by a number and version number separated by a dot, e.g. "NR_046018.2" for an RNA pseudogene. For RefSeq, one can select non-coding genes by filtering for NR identifiers. Note that a pseudogene of mRNA is not an unambiguous concept, and there may be a desire to look further to select certain subset types as mentioned above.

    If using the UCSC knownGene table, one can filter for rows where the coding start and coding end fields are equal, i.e. knownGene.cdsStart = knownGene.cdsEnd; this ensures the selected entries are non-coding genes.
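    The same filter can be applied to a downloaded copy of the table. The Python sketch below is illustrative only. It assumes a tab-separated dump whose first ten columns follow the standard genePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds) with no leading bin column. The function name noncoding_rows is hypothetical.

```python
# Standard genePred columns; the knownGene dump is assumed to start with these
# ten fields (any extra trailing columns, e.g. proteinID, are ignored).
GENEPRED_FIELDS = ["name", "chrom", "strand", "txStart", "txEnd",
                   "cdsStart", "cdsEnd", "exonCount", "exonStarts", "exonEnds"]


def noncoding_rows(lines):
    """Yield rows whose cdsStart equals cdsEnd, i.e. rows with no coding region."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.rstrip("\n").split("\t")
        row = dict(zip(GENEPRED_FIELDS, fields))
        if int(row["cdsStart"]) == int(row["cdsEnd"]):
            yield row
```

    Any iterable of text lines works as input, for example an open handle from gzip.open("knownGene.txt.gz", "rt") (local file name assumed); each yielded row is a dict keyed by the column names above.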

    You can also search our mailing-list archives to read further details about only obtaining non-coding genes from the UCSC Genome Browser.


    How do I interpret the exon frame information in the BED per-exon output for gene tracks?


    When used with a gene track, the per-exon option for BED output in the Table Browser produces lines like these:


    chr1 1046829 1047018 NM_001077977_utr3_2_0_chr1_1046830_f 0 +
    chr1 1099124 1099325 NM_001077124_utr3_0_0_chr1_1099125_r 0 -

    The name column contains several pieces of information separated by underscores, for example NM_001077124_utr3_0_0_chr1_1099125_r. Here is a breakdown of that information:

    1. NM_001077124 - transcript accession
    2. utr3 - feature type: cds, utr5, or utr3
    3. 0 - exon inFrame: the offset at the beginning of the feature (such as an exon) needed to reach the first base of the next complete codon
    4. 0 - exon outFrame: the number of bases remaining at the end of the feature after the last complete codon
    5. chr1_1099125 - chromosome and start position of the exon; this position is 1-based, so it is one more than the 0-based chromStart in the same BED line (1099124)
    6. r - strand: "r" for reverse (-) or "f" for forward (+)
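    Because both the accession and some chromosome names (for example unplaced contigs such as chr1_KI270706v1_random) can contain underscores, splitting the name on every underscore is not enough. The Python sketch below instead locates the feature-type token and splits around it, following the field order listed above. It assumes the feature type is always one of cds, utr5, or utr3; parse_exon_name and ExonName are hypothetical names.

```python
from dataclasses import dataclass

# Feature-type tokens listed in this FAQ (assumed to be the only ones present).
FEATURE_TYPES = {"cds", "utr5", "utr3"}


@dataclass
class ExonName:
    accession: str
    feature: str
    in_frame: int
    out_frame: int
    chrom: str
    start: int   # 1-based start position, as written in the name
    strand: str  # "+" or "-"


def parse_exon_name(name: str) -> ExonName:
    """Split a per-exon BED name such as NM_001077124_utr3_0_0_chr1_1099125_r."""
    tokens = name.split("_")
    # The first token that is a feature type separates the accession (which
    # may itself contain underscores) from the remaining fields.
    idx = next((i for i, t in enumerate(tokens) if t in FEATURE_TYPES), None)
    if idx is None or len(tokens) < idx + 6:
        raise ValueError(f"unexpected name format: {name!r}")
    return ExonName(
        accession="_".join(tokens[:idx]),
        feature=tokens[idx],
        in_frame=int(tokens[idx + 1]),
        out_frame=int(tokens[idx + 2]),
        # Chromosome names can contain underscores too, so take everything
        # between the frame values and the start position.
        chrom="_".join(tokens[idx + 3:-2]),
        start=int(tokens[-2]),
        strand={"f": "+", "r": "-"}[tokens[-1]],
    )
```

    For NM_001077124_utr3_0_0_chr1_1099125_r this returns accession NM_001077124, feature utr3, inFrame 0, outFrame 0, chromosome chr1, start 1099125, and strand "-"; subtracting 1 from the start gives the BED chromStart (1099124).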
    Here we focus specifically on inFrame and outFrame. The values typically range from 0 to 2 and describe where in the reading frame the feature starts and ends:

    Value   Meaning
    0       The feature starts/ends exactly on a codon boundary. No offset is required.
    1       There is 1 "extra" nucleotide before the first complete codon (inFrame) or after the last complete codon (outFrame).
    2       There are 2 "extra" nucleotides before the first complete codon (inFrame) or after the last complete codon (outFrame).
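    The relationship between the two values can be sketched as arithmetic on the coding length of each exon. The Python function below illustrates the definitions in this answer and is not the Table Browser's own code. It assumes the CDS exons are given in transcript order (5' to 3') as counts of coding bases, and exon_frames is a hypothetical name.

```python
def exon_frames(cds_lengths):
    """Return an (inFrame, outFrame) pair for each CDS exon, in transcript order.

    inFrame:  bases at the start of the exon that finish a codon begun upstream.
    outFrame: bases at the end of the exon that begin a codon finished downstream.
    Assumes every exon is long enough to hold its split-codon bases.
    """
    frames = []
    done = 0  # coding bases in all upstream exons
    for length in cds_lengths:
        in_frame = (3 - done % 3) % 3
        done += length
        out_frame = done % 3
        frames.append((in_frame, out_frame))
    return frames
```

    For hypothetical CDS exon lengths of 100, 50, and 60 bases, exon_frames([100, 50, 60]) returns [(0, 1), (2, 0), (0, 0)]: the first exon ends with 1 extra base, and the second exon completes that codon with 2 extra bases at its start. Two checks follow from the definitions: for every exon, its length minus inFrame minus outFrame is a multiple of 3, and at each junction the outFrame of one exon plus the inFrame of the next is either 0 or 3.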

    In the second example line above, the exon has "0" for both inFrame and outFrame because it is a UTR exon.


    Finally, note that when the amino acid output is split per exon (where a split codon cannot be represented), the amino acid for a split codon is placed in the exon that contains most of its bases.
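    To make the placement rule concrete, the Python sketch below counts how many amino acids each exon would receive in per-exon output. It assumes a complete CDS, exons listed in transcript order as counts of coding bases, and inFrame/outFrame values computed from the cumulative CDS length as defined in this answer. amino_acids_per_exon is a hypothetical name, and this is an illustration rather than the Table Browser's implementation.

```python
def amino_acids_per_exon(cds_lengths):
    """Count amino acids attributed to each CDS exon when output is split per exon.

    Complete codons stay with their exon; a codon split across a junction goes
    to the exon holding 2 of its 3 bases (the "most of the bases" rule).
    Assumes a complete CDS and exons long enough to hold their split-codon bases.
    """
    counts = []
    done = 0  # coding bases in all upstream exons
    for length in cds_lengths:
        in_frame = (3 - done % 3) % 3   # bases finishing an upstream codon
        done += length
        out_frame = done % 3            # bases starting a downstream codon
        whole = (length - in_frame - out_frame) // 3
        # A split codon counts here only if this exon holds 2 of its bases.
        counts.append(whole + (in_frame == 2) + (out_frame == 2))
    return counts
```

    For hypothetical CDS exon lengths of 100, 50, and 60 bases it returns [33, 17, 20]: the codon split 1 + 2 between the first two exons is counted in the second exon, which holds two of its three bases, and the total of 70 amino acids matches the 210 coding bases.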