81c2c4b94d02441ac67de2fe14d47c03b4ff1f31 kuhn Thu May 13 16:24:02 2021 -0700
grammar nit
diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index e924873..839a57c 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -75,31 +75,31 @@
href="https://www.ncbi.nlm.nih.gov/gene/672#">BRCA1</a> has 5 protein-coding transcripts or isoforms. The first transcript has the NCBI accession number <a target=_blank href="https://www.ncbi.nlm.nih.gov/nuccore/NM_007294.3">NM_007294.3</a> which produces the protein with the accession<a target=_blank href="https://www.ncbi.nlm.nih.gov/protein/NP_009225.1"> NP_009225.1</a>. In the human genome, it is located on chromosome 17, where it is comprised of <a target=_blank href="https://www.ncbi.nlm.nih.gov/nuccore/U14680">23 exons</a>. On the version hg38/GRCh38 of the human genome, these exons cover the DNA nucleotides 43044295 to 43125483.</p>
<a name="genename"></a>
<h6>What is a gene or transcript accession? </h6>
<p>
-Gene symbols like BRCA1 are easy to remember but sometimes change and are not
+Gene symbols such as BRCA1 are easy to remember but sometimes change and are not
specific to an organism. Therefore most databases internally use unique identifiers to refer to sequences and some journals require authors to use these in manuscripts.</p>
<p>
The most common accession numbers encountered by users are either from Ensembl, GENCODE or RefSeq. Human Ensembl/GENCODE gene accession numbers start with ENSG followed by a number and version number separated by a dot, e.g. "ENSG00000012048.21" for latest BRCA1. Every ENSG-gene has at least one transcript assigned to it. The transcript identifiers start with with ENST and are likewise followed by a version number, e.g. "ENST00000619216.1".
Additional details on Ensembl IDs can be found on the <a target="_blank" href="https://uswest.ensembl.org/Help/Faq?id=488">Ensembl FAQ page</a>.</p>
@@ -301,31 +301,31 @@
display the complete "comprehensive" set, the box can be ticked at the top of the <a target=_blank href="../cgi-bin/hgTrackUi?db=hg38&g=knownGene">GENCODE track description page</a>.</p>
<p class='text-center'>
<img class='text-center' src="../images/ComprehensiveSet.png" alt="Turning on comprehensive gene set" width="750">
<a name="ncbiRefseq"></a>
<h6>What is the difference between "NCBI RefSeq" and "UCSC RefSeq"?</h6>
<p>
RefSeq gene transcripts, unlike GENCODE/Ensembl/UCSC Genes, are sequences that can differ from the genome. They need to be aligned to the genome to create annotations and UCSC and NCBI create alignments with different software (BLAT and splign, respectively). The advantages of the UCSC alignments are that
-they are updated constantly even for older assemblies, like GRCh37/hg19.
+they are updated constantly even for older assemblies, such as GRCh37/hg19.
The advantage of NCBI alignments are that they are placed manually to a chromosome location and are the official alignments, e.g. for databases and manuscripts. Therefore, we recommend working with the NCBI annotations and when an assembly has an "NCBI RefSeq" track, we show it by default and hide the "UCSC RefSeq" track. </p>
<p>The UCSC alignments can differ from the NCBI alignments for two reasons:</p>
<p><b>Very similar transcripts:</b> Let's take the case of two almost-identical transcripts sequences in RefSeq, with two genes in the genome where they could be placed. NCBI has a rule to place every transcript only once, and transcripts are manually tied to a chromosome band or location by NCBI, so each gene will get one and only one transcript of two. NCBI RefSeq will have two genes with one transcript each. UCSC RefSeq though places all