ed1738b3b8296a6b7e7376336f054eae0b2c88c6 max Wed Jul 27 03:09:02 2022 -0700
adding clearer section header to genes faq, refs #29778
diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 52d3c97..b0d238a 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -323,47 +323,47 @@
What is the difference between "NCBI RefSeq" and "UCSC RefSeq"?

RefSeq gene transcripts, unlike GENCODE/Ensembl/UCSC Genes, are sequences that can differ from the genome. They need to be aligned to the genome to create annotations, and UCSC and NCBI create these alignments with different software (BLAT and splign, respectively). The advantage of the UCSC alignments is that they are updated constantly, even for older assemblies such as GRCh37/hg19. The advantage of the NCBI alignments is that they are placed manually to a chromosome location and are the official alignments, e.g. for databases and manuscripts. Therefore, we recommend working with the NCBI annotations, and when an assembly has an "NCBI RefSeq" track, we show it by default and hide the "UCSC RefSeq" track. The only exception may be hg19 (see the note at the end of this section).

The UCSC alignments can differ from the NCBI alignments for two reasons:

-Very similar transcripts:
+Very similar transcripts resulting in transcript location swaps or duplicated transcripts:
Let's take the case of two almost-identical transcript sequences in RefSeq, with two genes in the genome where they could be placed. NCBI has a rule to place every transcript only once, and transcripts are manually tied to a chromosome band or location by NCBI, so each gene will get one and only one transcript of two. NCBI RefSeq will have two genes with one transcript each. UCSC RefSeq though places all transcripts where they align at very high identity, so both genes will get
-annotated with both transcripts. For example,
+annotated with both transcripts, creating duplicates. For example,
the transcript NM_001012276 has three almost-identical possible
-placements to the genome in the UCSC RefSeq track, as it is entirely alignment-based without any manual filtering,
-but NM_001012276.3 is shown at a single location in the NCBI RefSeq track, as the NCBI
-software will only retain the alignment at the manually annotated location. It
-may be good to know about almost-identical alignments when doing genomic
-analysis or manual inspection of NGS read alignments, but for clinical
-reporting purposes or other automated analyses, we strongly recommend to use
-the NCBI RefSeq track.
+placements to the genome in the UCSC RefSeq track, as it is entirely alignment-based without any manual filtering.
+The same transcript NM_001012276.3 is shown at a single location in the NCBI
+RefSeq track, as the NCBI software will only retain the alignment at the
+manually annotated location. It may be good to know about almost-identical
+alignments when doing genomic analysis or manual inspection of NGS read
+alignments, but for clinical reporting purposes or other automated analyses, we
+strongly recommend to use the NCBI RefSeq track.

Unclear exon boundaries: In some rare cases, the NCBI and UCSC exon boundaries differ. This happens especially when sequence deletions in the genome make the placement very difficult. Activating both the RefSeq and UCSC RefSeq tracks helps you investigate the differences. Activating the RefSeq Alignments track shows NCBI's splign alignments in more detail, including double lines where both transcript and genomic sequence are skipped in the alignment. When available, the RefSeq Diffs subtrack may be helpful too. The upcoming MANE gene set will contain a set of high-quality transcripts that are 100% alignable to the genome and are part of both RefSeq and Ensembl/GENCODE, but at the time of writing, this project is at an early stage.
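
The two kinds of difference described above can be checked programmatically once both annotation sets have been exported. The following is only a minimal sketch, not UCSC's or NCBI's own tooling: it assumes each annotation has already been loaded into simple (name, chrom, start, exons) tuples, and the accessions, coordinates, and function names are all hypothetical toy values made up for illustration. It reports transcripts placed more than once within a set, and exon boundary differences for transcripts placed exactly once in both sets.

```python
from collections import defaultdict


def base_accession(name):
    """Drop the version suffix, e.g. 'NM_001012276.3' -> 'NM_001012276'."""
    return name.split(".", 1)[0]


def multi_placed(records):
    """Return {accession: sorted placements} for accessions placed more than once."""
    placements = defaultdict(set)
    for name, chrom, start, _exons in records:
        placements[base_accession(name)].add((chrom, start))
    return {acc: sorted(locs) for acc, locs in placements.items() if len(locs) > 1}


def single_placements(records):
    """Return {accession: (chrom, exons)} for accessions placed exactly once."""
    by_acc = defaultdict(list)
    for name, chrom, _start, exons in records:
        by_acc[base_accession(name)].append((chrom, tuple(exons)))
    return {acc: hits[0] for acc, hits in by_acc.items() if len(hits) == 1}


def exon_boundary_diffs(records_a, records_b):
    """Return {accession: (exons_a, exons_b)} where the exon boundaries disagree.

    Only accessions placed once in both sets, on the same chromosome, are compared.
    """
    a = single_placements(records_a)
    b = single_placements(records_b)
    return {
        acc: (a[acc][1], b[acc][1])
        for acc in a.keys() & b.keys()
        if a[acc][0] == b[acc][0] and a[acc][1] != b[acc][1]
    }


# Hypothetical toy records: (name, chrom, start, exons). Not real annotations.
ucsc_toy = [
    ("NM_TOY1.1", "chr1", 1000, ((1000, 1200), (1500, 1700))),
    ("NM_TOY1.1", "chr1", 9000, ((9000, 9200), (9500, 9700))),
    ("NM_TOY2.1", "chr2", 500, ((500, 650), (800, 900))),
]
ncbi_toy = [
    ("NM_TOY1.1", "chr1", 1000, ((1000, 1200), (1500, 1700))),
    ("NM_TOY2.1", "chr2", 500, ((500, 640), (800, 900))),
]

print(multi_placed(ucsc_toy))
print(multi_placed(ncbi_toy))
print(exon_boundary_diffs(ucsc_toy, ncbi_toy))
```

Stripping the version suffix before grouping is a deliberate choice here, since the text above refers to the same transcript both as NM_001012276 and as NM_001012276.3.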