a86827f92ed8795e87539b2829821c0d824b5f36
lrnassar
Tue Mar 19 15:07:51 2019 -0700
More work on unreleased FAQgenes page ref#22696

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index f33ca1e..0a1fc49 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -4,32 +4,32 @@

Frequently Asked Questions: Gene tracks

Topics

What is a gene?
What is the difference between a gene and a transcript?
What are the most common gene transcript tracks?
What is a gene name?
What are Ensembl and GENCODE and is there a difference?
What are the differences among GENCODE, Ensembl and RefSeq?
-For the human assembly hg19/GRCh37: What is the difference between UCSC
-Genes track, the "GENCODE" track and the "Ensembl Genes" track?
+For the human assembly hg19/GRCh37: What is the difference between "UCSC
+Genes" track, the "GENCODE" track and the "Ensembl Genes" track?
For the human assembly hg38/GRCh38: What are the differences between the "GENCODE" and "All GENCODE" tracks?
What is the difference between GENCODE comprehensive and basic?
What is the difference between "NCBI RefSeq" and "UCSC RefSeq"?
What is CCDS?
How can I just show a single transcript per gene?
This is rather complicated. Can you tell me which gene transcript track I should use?

Return to FAQ Table of Contents

The basics

@@ -196,32 +196,36 @@

What is the difference between "GENCODE Comprehensive" and "GENCODE Basic"?

The "GENCODE" track offers a "basic" gene set, and a "comprehensive" gene set. The "basic" gene set represents a subset of transcripts that GENCODE believes will be useful to the majority of users. The "basic" gene set is defined as follows in the GENCODE FAQ:

"Identifies a subset of representative transcripts for each gene; prioritises full-length protein coding transcripts over partial or non-protein coding transcripts within the same gene, and intends to highlight those transcripts that will be useful to the majority of users."

By default, the track displays only the "basic" set. In order to display the complete
-"comprehensive" set, the box can be tickets at the top of the GENCODE track description page.
+"comprehensive" set, the box can be ticked at the top of the GENCODE track description page.

+ +

What is the difference between "NCBI RefSeq" and "UCSC RefSeq"?

RefSeq gene transcripts, unlike GENCODE/Ensembl/UCSC Genes, are sequences that can differ from the genome. They need to be aligned to the genome to create transcript models. Traditionally, UCSC has aligned RefSeq with BLAT (UCSC RefSeq sub-track) and NCBI has aligned with splign. The advantages of the UCSC alignments are that they are updated more frequently and are available for older assemblies (like GRCh37/hg19), but they are less stable and they are not the official alignments. Therefore we recommend working with the NCBI annotations. When an assembly has an "NCBI RefSeq" track, we show it by default and hide the "UCSC RefSeq" track.

@@ -261,53 +265,63 @@

What is CCDS?

The Consensus Coding Sequence Project is a list of transcript coding sequence (CDS) genomic regions that are identically annotated by RefSeq and Ensembl/GENCODE. CCDS undergoes extensive manual review and you can consider these a subset of either gene track, filtered for high quality. The CCDS identifiers are very stable and allow you to link easily between the different databases. As the name implies, it does not cover UTR regions or non-coding transcripts.

How can I just show a single transcript per gene?

For the tracks "UCSC Genes" (hg19) or "GENCODE Genes" (hg38), click on their title and on the configuration page, uncheck the box "Show splice variants". Only a single transcript will be shown. The method for how this transcript is selected is described in the track documentation below the configuration settings.

+ +

For the track NCBI RefSeq (hg38), you can activate the subtrack "RefSeq HGMD". It contains only the transcripts that are part of the Human Gene Mutation Database.

This is rather complicated. Can you tell me which gene transcript track I should use?

For automated analysis, if you are doing NGS analysis and you need to capture all possible transcripts, GENCODE provides a comprehensive gene set. For human genetics or variant annotation, a more restricted transcript set is usually sufficient and "NCBI RefSeq" is the standard. If you are only interested in protein-coding annotations, CCDS or UniProt may be an option, but this is rather unusual.

For manual inspection of exon boundaries of a single gene, and especially if it is a transcript that is repetitive or hard to align (e.g. very small exons), look at the UCSC RefSeq track and watch for differences between the NCBI and UCSC exon placement. You can also BLAT the transcript sequence. Manually look at ESTs, mRNAs, TransMap and possibly Augustus, Genscan, SIB, SGP or GeneId in obscure cases where you are looking for hints on what an
-alternative splicing could look like.
+alternative splicing could look like.

+You may also find the Gene Support public session
+helpful. This session is a collection of tracks centered around supporting evidence
+for genes.