a86827f92ed8795e87539b2829821c0d824b5f36
lrnassar
Tue Mar 19 15:07:51 2019 -0700
More work on unreleased FAQgenes page ref#22696

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index f33ca1e..0a1fc49 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -4,32 +4,32 @@
Return to FAQ Table of Contents
The "GENCODE" track offers a "basic" gene set, and a "comprehensive" gene set. The "basic" gene set represents a subset of transcripts that GENCODE believes will be useful to the majority of users. The "basic" gene set is defined as follows in the GENCODE FAQ:
"Identifies a subset of representative transcripts for each gene; prioritises full-length protein coding transcripts over partial or non-protein coding transcripts within the same gene, and intends to highlight those transcripts that will be useful to the majority of users."
By default, the track displays only the "basic" set. In order to display the complete
-"comprehensive" set, the box can be tickets at the top of the GENCODE track description page.
+"comprehensive" set, the box can be ticked at the top of the GENCODE track description page.
+
+RefSeq gene transcripts, unlike GENCODE/Ensembl/UCSC Genes, are sequences that can differ from the genome. They need to be aligned to the genome to create transcript models. Traditionally, UCSC has aligned RefSeq with BLAT (UCSC RefSeq sub-track) and NCBI has aligned with splign. The advantages of the UCSC alignments are that they are updated more frequently and are available for older assemblies (like GRCh37/hg19), but they are less stable and they are not the official alignments. Therefore we recommend working with the NCBI annotations. When an assembly has an "NCBI RefSeq" track, we show it by default and hide the "UCSC RefSeq" track.
@@ -261,53 +265,63 @@
The Consensus Coding Sequence Project is a list of transcript coding sequence (CDS) genomic regions that are identically annotated by RefSeq and Ensembl/GENCODE. CCDS undergoes extensive manual review and you can consider these a subset of either gene track, filtered for high quality. The CCDS identifiers are very stable and allow you to link easily between the different databases. As the name implies, it does not cover UTR regions or non-coding transcripts.
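The idea of CDS regions "identically annotated" by two gene sets can be sketched in a few lines of Python. This is only an illustration of the concept, not the CCDS pipeline (which, as noted above, also involves extensive manual review). The function name `shared_cds`, the tuple layout used for a CDS, and all transcript IDs and coordinates below are hypothetical and invented for the example.

```python
from typing import Dict, List, Tuple

# Assumed model: a CDS is (chromosome, strand, tuple of (start, end) coding blocks).
Cds = Tuple[str, str, Tuple[Tuple[int, int], ...]]


def shared_cds(
    set_a: Dict[str, Cds], set_b: Dict[str, Cds]
) -> Dict[Cds, Tuple[List[str], List[str]]]:
    """Return the CDS structures present in both sets.

    Each shared CDS maps to the transcript IDs that carry it in set_a and set_b.
    Two CDS annotations count as identical only if chromosome, strand and every
    coding block boundary match exactly.
    """
    def group_by_cds(transcripts: Dict[str, Cds]) -> Dict[Cds, List[str]]:
        grouped: Dict[Cds, List[str]] = {}
        for tx_id, cds in transcripts.items():
            grouped.setdefault(cds, []).append(tx_id)
        return grouped

    a_by_cds = group_by_cds(set_a)
    b_by_cds = group_by_cds(set_b)
    return {
        cds: (sorted(a_by_cds[cds]), sorted(b_by_cds[cds]))
        for cds in a_by_cds
        if cds in b_by_cds
    }


# Made-up example data: IDs and coordinates are placeholders, not real annotations.
refseq_like = {
    "NM_HYPO1": ("chr1", "+", ((100, 200), (300, 400))),
    "NM_HYPO2": ("chr1", "+", ((100, 200), (300, 450))),
}
ensembl_like = {
    "ENST_HYPO1": ("chr1", "+", ((100, 200), (300, 400))),
    "ENST_HYPO2": ("chr2", "-", ((10, 50),)),
}

consensus = shared_cds(refseq_like, ensembl_like)
```

In this sketch only the CDS carried by both `NM_HYPO1` and `ENST_HYPO1` ends up in `consensus`; the transcript whose last coding block ends at a different position is excluded, which mirrors why CCDS is a subset of either gene track.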
For the tracks "UCSC Genes" (hg19) or "GENCODE Genes" (hg38), click on their title and on the configuration page, uncheck the box "Show splice variants". Only a single transcript will be shown. The method for how this transcript is selected is described in the track documentation below the configuration settings.
+
+
+
For the track NCBI RefSeq (hg38), you can activate the subtrack "RefSeq HGMD". It contains only the transcripts that are part of the Human Gene Mutation Database.
For automated analysis, if you are doing NGS analysis and you need to capture all possible transcripts, GENCODE provides a comprehensive gene set. For human genetics or variant annotation, a more restricted transcript set is usually sufficient and "NCBI RefSeq" is the standard. If you are only interested in protein-coding annotations, CCDS or UniProt may be an option, but this is rather unusual.
For manual inspection of exon boundaries of a single gene, and especially if it is a transcript that is repetitive or hard to align (e.g. very small exons), look at the UCSC RefSeq track and watch for differences between the NCBI and UCSC exon placement. You can also BLAT the transcript sequence. Manually look at ESTs, mRNAs, TransMap and possibly Augustus, Genscan, SIB, SGP or GeneId in obscure cases where you are looking for hints on what an
-alternative splicing could look like.
+alternative splicing could look like.
+
+You may also find the Gene Support public session
+helpful. This session is a collection of tracks centered around supporting evidence
+for genes.