223c03855d75f770eaa40b5d30d2c631f0f9df71
max
Mon Jul 13 14:09:11 2020 -0700
adding multi-placement example to genes faq, refs #25883

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 1d86cb9..5ea3d01 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -302,37 +302,49 @@
 "comprehensive" set, the box can be ticked at the top of the
 <a target=_blank href="../cgi-bin/hgTrackUi?db=hg38&g=knownGene">GENCODE track description page</a>.</p>
 <p class='text-center'>
 <img class='text-center' src="../images/ComprehensiveSet.png" alt="Turning on comprehensive gene set" width="750">
 <a name="ncbiRefseq"></a>
 <h6>What is the difference between "NCBI RefSeq" and "UCSC RefSeq"?</h6>
 <p>
 RefSeq gene transcripts, unlike GENCODE/Ensembl/UCSC Genes, are sequences that can
 differ from the genome. They need to be aligned to the genome to create transcript
 models. Traditionally, UCSC has aligned RefSeq with BLAT (UCSC RefSeq sub-track) and NCBI has
 aligned with splign. The advantages of the UCSC alignments are that they are updated
 more frequently and are available for older assemblies (like
-GRCh37/hg19), but they are less stable and they are not the official alignments.
-Therefore we recommend working with the NCBI annotations.
+GRCh37/hg19), but they are not placed manually to a chromosome location and are not the official alignments.
+Therefore, we recommend working with the NCBI annotations.
 When an assembly has an "NCBI RefSeq" track, we show it by default and hide the "UCSC RefSeq" track.
 </p>
 <p>
+NCBI transcripts are manually tied to a chromosome band or location. The advantage is that
+when there are two almost-identical transcripts in RefSeq, each one will be
+placed to the official reference location in the NCBI annotations. For example,
+the transcript NM_001012276 has three almost-identical possible
+placements to the genome in the UCSC RefSeq track (as it is entirely alignment-based),
+but NM_001012276.3 is shown at a single location in the NCBI RefSeq track. It
+may be good to know about the almost-identical alignments when doing genomics
+analysis, but for clinical reporting purposes, it is preferable to use the NCBI
+RefSeq track.
+</p>
+
+<p>
 In some rare cases, the NCBI and UCSC exon boundaries differ. Activating both RefSeq and
 UCSC RefSeq tracks helps you investigate the differences. Activating the RefSeq Alignments
 track shows NCBI's splign alignments in more detail, including double lines where both
 transcript and genomic sequence are skipped in the alignment. When available, the RefSeq Diffs
 subtrack may be helpful too. The upcoming
 <a target=_blank href=https://ncbiinsights.ncbi.nlm.nih.gov/2018/10/11/matched-annotation-by-ncbi-and-embl-ebi-mane-a-new-joint-venture-to-define-a-set-of-representative-transcripts-for-human-protein-coding-genes/>MANE gene set</a>
 will contain a set of high-quality transcripts that are 100% alignable to the genome and are
 part of both RefSeq and Ensembl/GENCODE but at the time of writing this project is at an early stage.
 </p>
 <p>
 An anecdotal and rare example is SHANK2 and SHANK3 in hg19. It is impossible for either NCBI or
 BLAT to get the correct alignment and gene model because the genome sequence is missing for
 part of the gene. NCBI and BLAT find slightly different exon