0c195e055727ab6baf38b80224f7cfd522c621ab
max
Thu Jun 17 02:30:09 2021 -0700
adding a bit of text for the user in ML 2;2D#7703 to explain transcripts

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 839a57c..84eb142 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -47,42 +47,55 @@
 <p>
 The exact definition of "gene" depends on the context. In the context of genome annotation,
 a gene has at least a name and is defined by a collection of related RNA transcript sequences
 ("isoforms"). The naming of genes and the assignment of the most important transcript sequences
 is often done manually by a group of biological literature curators. For human, genes names are
 created by the <a target=_blank href="https://www.genenames.org/">Human Gene Nomenclature
 Committee (HGNC, formerly HUGO)</a>. Non-human species have similar annotation groups, e.g.
 Mouse Genome Informatics, Wormbase, Flybase, etc. </p>
 <a name="genestrans"></a>
 <h6>What is a transcript and how is it related to a gene? </h6>
 <p>
+In the Genome Browser, transcript tracks often end with the word
+"Genes", e.g. "Ensembl Genes", "NCBI RefSeq Genes" or "UCSC
+Genes", but they really represent transcripts on chromosomes of a genome assembly.</p>
+<p>
 Transcripts are defined as RNA molecules that are made from a DNA template. Databases like the
 ones at the National Library of Medicine's NCBI or the European Bioinformatics Institute (EBI)
 collect these transcript sequences from biologists working on a gene. Every transcript has a
 unique identifier (accession), a gene that it is assigned to, a sequence, and
-a list of exon chrom/start/end coordinates on a chromosome.
-Usually every transcript is assigned to only a single gene. In the Genome Browser, transcript
-tracks often end with the word
-"Genes", e.g. "Ensembl Genes", "NCBI RefSeq Genes" or "UCSC
-Genes", but they really represent transcripts on chromosomes of a genome assembly.</p>
+a list of exon chrom/start/end coordinates on a chromosome. </p>
+
+<p>A gene usually has multiple transcripts. Some of these differ in only the
+"untranslated region" (UTR), and the coding sequence and protein stay the same.
+Some of the transcripts may stop in the middle of a coding exon, so they
+change the protein.
+Some transcripts of the same gene differ in the way the exons are put together,
+and some exons are skipped entirely, so the transcript contains parts of the
+coding sequence of other transcripts, as a new combination in the same order.
+</p>
+
 <p>
-For example, using the databases by NCBI, the gene
+So almost every human gene has multiple transcripts, but, at least in
+databases, every transcript is assigned to only a single gene. For example,
+using the databases
+by NCBI, the gene with the gene symbol
 <a target=_blank href="https://www.ncbi.nlm.nih.gov/gene/672#">BRCA1</a> has 5 protein-coding
 transcripts or isoforms. The first transcript has the NCBI accession number
 <a target=_blank href="https://www.ncbi.nlm.nih.gov/nuccore/NM_007294.3">NM_007294.3</a>
 which produces the protein with the accession<a target=_blank
 href="https://www.ncbi.nlm.nih.gov/protein/NP_009225.1"> NP_009225.1</a>. In the human genome,
 it is located on chromosome 17, where it is comprised of
 <a target=_blank href="https://www.ncbi.nlm.nih.gov/nuccore/U14680">23 exons</a>.
 On the version hg38/GRCh38 of the human genome, these exons cover the DNA nucleotides 43044295 to
 43125483.</p>
 <a name="genename"></a>
 <h6>What is a gene or transcript accession? </h6>