7e0c9e2676d69bfb086eaa1fa8c8c02274fbf347
max
Fri Jul 2 06:40:08 2021 -0700
changes after doc review by Jonathan, refs #27778

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 84eb142..8dd1306 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -48,52 +48,51 @@
 The exact definition of "gene" depends on the context. In the context of genome annotation, a gene has at least a name and is defined by a collection of related RNA transcript sequences ("isoforms"). The naming of genes and the assignment of the most important transcript sequences is often done manually by a group of biological literature curators. For human, genes names are created by the <a target=_blank href="https://www.genenames.org/">Human Gene Nomenclature Committee (HGNC, formerly HUGO)</a>. Non-human species have similar annotation groups, e.g. Mouse Genome Informatics, Wormbase, Flybase, etc.
 </p>
 <a name="genestrans"></a>
 <h6>What is a transcript and how is it related to a gene? </h6>
 <p>
 In the Genome Browser, transcript tracks often end with the word
-"Genes", e.g. "Ensembl Genes", "NCBI RefSeq Genes" or "UCSC
-Genes", but they really represent transcripts on chromosomes of a genome assembly.</p>
+"Genes", e.g. "Ensembl Genes", "NCBI RefSeq Genes", or "UCSC
+Genes". Despite the name, items in these tracks actually represent
+transcripts on chromosomes of a genome assembly</p>
 <p>
 Transcripts are defined as RNA molecules that are made from a DNA template. Databases like the ones at the National Library of Medicine's NCBI or the European Bioinformatics Institute (EBI) collect these transcript sequences from biologists working on a gene. Every transcript has a unique identifier (accession), a gene that it is assigned to, a sequence, and a list of exon chrom/start/end coordinates on a chromosome.
 </p>
-<p>A gene usually has multiple transcripts. Some of these differ in only the
-"untranslated region" (UTR), and the coding sequence and protein stay the same.
-Some of the transcripts may stop in the middle of a coding exon, so they
-change the protein.
-Some transcripts of the same gene differ in the way the exons are put together,
-and some exons are skipped entirely, so the transcript contains parts of the
-coding sequence of other transcripts, as a new combination in the same order.
-</p>
+<p>Most genes have multiple transcripts associated with them. Some of these transcripts
+differ only in the "untranslated regions" (UTRs), while the coding sequence and resulting
+protein stay the same. Some transcripts may instead stop in the middle of a coding exon,
+which changes the protein. Some transcripts may even put the exons together in a different
+way or skip some exons entirely.</p>
 <p>
-So almost every human gene has multiple transcripts, but, at least in
-databases, every transcript is assigned to only a single gene. For example,
+While most genes are associated with multiple transcripts, however, each transcript is
+only assigned to a single gene (at least in databases). In other words, different genes
+never share the same transcript. For example,
 using the databases by NCBI, the gene with the gene symbol <a target=_blank href="https://www.ncbi.nlm.nih.gov/gene/672#">BRCA1</a> has 5 protein-coding transcripts or isoforms. The first transcript has the NCBI accession number <a target=_blank href="https://www.ncbi.nlm.nih.gov/nuccore/NM_007294.3">NM_007294.3</a> which produces the protein with the accession<a target=_blank href="https://www.ncbi.nlm.nih.gov/protein/NP_009225.1"> NP_009225.1</a>. In the human genome, it is located on chromosome 17, where it is comprised of <a target=_blank href="https://www.ncbi.nlm.nih.gov/nuccore/U14680">23 exons</a>. On the version hg38/GRCh38 of the human genome, these exons cover the DNA nucleotides 43044295 to 43125483.</p>
 <a name="genename"></a>