b3ee4097a9c129570ba2f8f7d7b376834d9ef4d0 lrnassar Wed Mar 20 11:20:23 2019 -0700
Minor modifications to FAQgenes ref#22696
diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 0de5271..ea9a494 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -30,33 +30,33 @@ I should use?</a></li>
 </ul>
 <hr>
 <p>
 <a href="index.html">Return to FAQ Table of Contents</a></p>
 <a name="gene"></a>
 <h2>The basics</h2>
 The genome browser contains many gene annotation tracks. Our users
 often wonder what these contain and where the information that we present comes
 from.
 <h6>What is a gene?</h6>
 <p>
-The exact definition of "gene" depends on the context. In the context of
+The exact definition of "gene" depends on the context. In the context of
 genome annotation, a gene has at least a name and is defined by a collection of
-related mRNA transcript sequences ("isoforms"). The naming of genes and the
+related mRNA transcript sequences ("isoforms"). The naming of genes and the
 assignment of the most important transcript sequences is often done manually by
 a group of biological literature curators. For human, genes names are created by the
 <a target=_blank href="https://www.genenames.org/">Human Gene Nomenclature Committee (HGNC,
 formerly HUGO)</a>. Non-human species have similar annotation groups, e.g. Mouse Genome
 Informatics, Wormbase, Flybase, etc.
 </p>
 <a name="genestrans"></a>
 <h6>What is a transcript and how is it related to a gene? </h6>
 <p>
 Transcripts are defined as RNA molecules that are made from a DNA template.
 Databases like the ones at the National Library of Medicine's NCBI or the
 European Bioinformatics Institute (EBI) collect these transcript sequences from
 biologists working on a gene. Every transcript has a
@@ -65,31 +65,31 @@ Usually every transcript is assigned to only a single gene. In the Genome
 Browser, transcript tracks often end with the word "Genes", e.g.
 "Ensembl Genes", "NCBI RefSeq Genes" or "UCSC Genes", but they really represent
 transcripts on chromosomes of a genome assembly.</p>
 <p>
 For example, using the databases by NCBI, the gene with the gene symbol
 <a target=_blank href="https://www.ncbi.nlm.nih.gov/gene/672#">BRCA1</a>
 has 5 protein-coding transcripts or isoforms. The first transcript has the
 NCBI accession number
 <a target=_blank href="https://www.ncbi.nlm.nih.gov/nuccore/NM_007294.3">NM_007294.3</a>
 which produces the protein with the accession<a target=_blank href="https://www.ncbi.nlm.nih.gov/protein/NP_009225.1"> NP_009225.1</a>.
 In the human genome, it is located on chromosome 17, where it is comprised of
 <a target=_blank href="https://www.ncbi.nlm.nih.gov/nuccore/U14680">23 exons</a>.
-On the version GRCh38 of the human genome, these exons cover the DNA
+On the version hg38/GRCh38 of the human genome, these exons cover the DNA
 nucleotides 43044295 to 43125483.</p>
 <a name="genename"></a>
 <h6>What is a gene or transcript accession? </h6>
 <p>
 Gene symbols like BRCA1 are easy to remember but sometimes change and are not
 specific to an organism. Therefore most databases internally use unique
 identifiers to refer to sequences and some journals require authors to use these
 in manuscripts.</p>
 <p>
 The most common accession numbers encountered by users are either from Ensembl,
 GENCODE or RefSeq. Human Ensembl/GENCODE gene accession numbers start with ENSG,
 e.g. "ENSG00000012048" for BRCA1. Every ENSG-gene has at least