580478f2ae0ce19b376e7285f4d6dd5638a37e41 kuhn Tue Jul 13 10:47:31 2021 -0700
minor language adjustments. Refs #27807

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index c931a0e..ecfe592 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -66,31 +66,31 @@
 <p>
 Transcripts are defined as RNA molecules that are made from a DNA template. Databases like the ones at the National Library of Medicine's NCBI or the European Bioinformatics Institute (EBI) collect these transcript sequences from biologists working on a gene. Every transcript has a unique identifier (accession), a gene that it is assigned to, a sequence, and a list of exon chrom/start/end coordinates on a chromosome.
 </p>
 <p>Most genes have multiple transcripts associated with them. Some of these transcripts differ only in the "untranslated regions" (UTRs), while the coding sequence and resulting protein stay the same. Some transcripts may instead stop in the middle of a coding exon, which changes the protein. Some transcripts may even put the exons together in a different way or skip some exons entirely.</p>
 <p>
-While most genes are associated with multiple transcripts, however, each transcript is
+While most genes are associated with multiple transcripts, each transcript is
 only assigned to a single gene (at least in databases). In other words, different genes never share the same transcript. For example, using the databases by NCBI, the gene with the gene symbol <a target=_blank href="https://www.ncbi.nlm.nih.gov/gene/672#">BRCA1</a> has 5 protein-coding transcripts or isoforms. The first transcript has the NCBI accession number <a target=_blank href="https://www.ncbi.nlm.nih.gov/nuccore/NM_007294.3">NM_007294.3</a> which produces the protein with the accession<a target=_blank href="https://www.ncbi.nlm.nih.gov/protein/NP_009225.1"> NP_009225.1</a>.
 In the human genome, it is located on chromosome 17, where it is comprised of <a target=_blank href="https://www.ncbi.nlm.nih.gov/nuccore/U14680">23 exons</a>. On the version hg38/GRCh38 of the human genome, these exons cover the DNA nucleotides 43044295 to 43125483.</p>
@@ -530,31 +530,31 @@
 <b>NCBI RefSeq (hg19/hg38)</b>: On this track, there are three subtracks with slightly different aims, all of which show only a single transcript (or less) for each gene.
 <ul>
 <li> RefSeq Select: NCBI manually selects few, usually one, transcript per gene called "RefSeq Select", based on <a target=_blank href="https://www.ncbi.nlm.nih.gov/refseq/refseq_select/">various criteria</a>. Example use cases are comparative genomics and variant reporting. This subset is available in the RefSeq Select track under NCBI RefSeq.
 <li>MANE: RefSeq and the EBI also select one transcript for every protein coding gene that is annotated exactly the same in both Gencode and RefSeq, a project called <a href="https://ncbiinsights.ncbi.nlm.nih.gov/2019/03/12/mane-select-v0-5/" target=_blank>"MANE select"</a>, which is another subtrack of NCBI RefSeq.
 <li>HGMD: For the special case of clinical diagnostics where an even more reduced number of transcripts simplifies visual inspection,
-we also provide another subtrack, "RefSeq HGMD". It contains
+we provide another subtrack, "RefSeq HGMD". It contains
 (usually) a single transcript only for genes known to cause human genetic diseases and the transcript is the one to which all reported HGMD clinical variants can be mapped to.
 </ul>
 <a name="whatdo"></a>
 <h2>This is rather complicated. Can you tell me which gene transcript track I should use?</h2>
 <p>
 For automated analysis, if you are doing NGS analysis and you need to capture all possible transcripts, GENCODE provides one of the most comprehensive gene sets. For human genetics or variant annotation, a more restricted transcript set is usually sufficient and "NCBI RefSeq" is the standard.
 If you are only interested in protein-coding annotations, CCDS or UniProt may be an option, but this is rather unusual. If you are interested in the best splice site coverage, AceView is worth a look.
 </p>