59bde4b3cbc32dfab310226c474cf5dabc1dd383 max Wed Dec 4 05:30:07 2019 -0800
adding help on refseq annotation release, refs #24574

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 72fbb88..6fbc91b 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -10,30 +10,31 @@

Topics


Return to FAQ Table of Contents

The basics

The genome browser contains many gene annotation tracks. Our users often wonder what these contain and where the information that we present comes
@@ -319,39 +320,65 @@

An anecdotal and rare example is SHANK2 and SHANK3 in hg19. It is impossible for either NCBI or BLAT to get the correct alignment and gene model because the genome sequence is missing for part of the gene. NCBI and BLAT find slightly different exon boundaries at the edge of the problematic region. NCBI's aligner tries very hard to find exons that align to any transcript sequence, so it calls a few small dubious "exons" in the affected genomic region. GENCODE V19 also used an aligner that tried very hard to find exons, but it found small dubious "exons" in different places than NCBI. The RefSeq Alignments subtrack makes the problematic region very clear with double lines indicating unalignable transcript sequence.

+ +
How shall I report a gene transcript in a manuscript?
+ +

+When reporting on GENCODE/Ensembl transcripts, please specify the ENST
+identifier. It is often helpful to also specify the Ensembl release,
+which is shown on the details page, when you click onto a transcript.
+

+

-When reporting results as RefSeq coordinates, e.g. as HGVS, in research
-articles, please specify the RefSeq annotation release and also the
-RefSeq transcript ID with version (e.g. NM_012309.4 not NM_012309).
-Different RefSeq transcript versions have different sequence (for example,
-more sequence may be added to the UTRs or even the CDS), and so the transcript coordinates
-often change from one version to the next.
+When reporting RefSeq transcripts, e.g. in HGVS, prefer the "NCBI RefSeq" track
+over the "UCSC RefSeq track". Please specify the RefSeq transcript ID and
+also the RefSeq annotation release.
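
The removed wording above distinguishes a versioned RefSeq ID (NM_012309.4) from an unversioned one (NM_012309). As a minimal sketch, not part of this commit, a manuscript-checking script could flag unversioned IDs roughly as follows. The function name `has_version` and the accepted prefixes (NM, NR, XM, XR) are assumptions chosen for illustration; they are not a complete description of RefSeq accession formats.

```python
import re

# Assumed pattern (illustrative only): a transcript accession made of a
# two-letter prefix, an underscore, digits, and an optional ".version".
REFSEQ_TRANSCRIPT_RE = re.compile(r"^(NM|NR|XM|XR)_(\d+)(?:\.(\d+))?$")


def has_version(accession: str) -> bool:
    """Return True if the accession carries a version suffix such as '.4'.

    Raises ValueError if the string does not match the assumed pattern.
    """
    match = REFSEQ_TRANSCRIPT_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not a recognized RefSeq transcript ID: {accession!r}")
    # Group 3 holds the digits after the dot, or None when no version is given.
    return match.group(3) is not None


if __name__ == "__main__":
    for acc in ("NM_012309.4", "NM_012309"):
        status = "versioned" if has_version(acc) else "missing version"
        print(f"{acc}: {status}")
```

A check like this only confirms that a version is present. It cannot tell which RefSeq annotation release the coordinates refer to, so that still has to be stated separately.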

+ +
What is CCDS?

The Consensus Coding Sequence Project is a list of transcript coding sequence (CDS) genomic regions that are identically annotated by RefSeq and Ensembl/GENCODE. CCDS undergoes extensive manual review and you can consider these a subset of either gene track, filtered for high quality. The CCDS identifiers are very stable and allow you to link easily between the different databases. As the name implies, it does not cover UTR regions or non-coding transcripts.

How can I show a single transcript per gene?