dd22c78b55e57f9e6495a2a836b241e94d051a5c gperez2 Tue Feb 28 10:00:01 2023 -0800
code review edits, refs #30712
diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 32a4376..86fad5b 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -103,31 +103,31 @@
 <p>
 Gene symbols such as BRCA1 are easy to remember but sometimes change and are not specific to an
 organism. Therefore most databases internally use unique identifiers to refer to sequences and
 some journals require authors to use these in manuscripts.</p>
 <p>
 The most common accession numbers encountered by users are either from Ensembl, GENCODE or RefSeq.
 Human Ensembl/GENCODE gene accession numbers start with ENSG followed by a number and version
 number separated by a dot, e.g. "ENSG00000012048.21" for latest BRCA1.
 Every ENSG-gene has at least one transcript assigned to it. The transcript identifiers start with
 with ENST and are likewise followed by a version number, e.g. "ENST00000619216.1".
 Additional details on Ensembl IDs can be found on the <a target="_blank"
-href="https://useast.ensembl.org/Help/Faq?id=488">Ensembl FAQ page</a>.</p>
+href="https://www.ensembl.org/Help/Faq?id=488">Ensembl FAQ page</a>.</p>
 <p>
 NCBI refers to genes with plain numbers, e.g. 672 for BRCA1.
 Manually curated RefSeq transcript identifiers start with NM_ (coding) or NR_ (non-coding),
 followed by a number and version number separated by a dot, e.g. "NR_046018.2".
 If the transcript was predicted by the NCBI Gnomon software, the prefix is XM_ but these are rare
 in human. A table of these and other RefSeq prefixes can be found on the <a target=_blank
 href="https://www.ncbi.nlm.nih.gov/books/NBK21091/table/ch18.T.refseq_accession_numbers_and_mole/?report=objectonly">
 NCBI website</a>.
 </p>
 <a name="mostCommon"></a>
 <h6>What are the most common gene transcript tracks?</h6>
@@ -298,31 +298,31 @@
 <a name="gencode"></a>
 <h6>What is the difference between "GENCODE Comprehensive" and "GENCODE Basic"?</h6>
 <p>
 The "<a target=_blank href="../cgi-bin/hgTrackUi?db=hg38&g=knownGene">GENCODE</a>" track
 offers a "basic" gene set, and a "comprehensive" gene set. The "basic" gene set represents a
 subset of transcripts that GENCODE believes will be useful to the majority of users. The
 "basic" gene set is defined as follows in the
 <a target=_blank href="https://www.gencodegenes.org/pages/tags.html">GENCODE FAQ</a>:</p>
 <p><i>
 "Identifies a subset of representative transcripts for each gene; prioritises full-length
 protein coding transcripts over partial or non-protein coding transcripts within the same
 gene, and intends to highlight those transcripts that will be useful to the majority of
 users."</i></p>
 <p>
 A more comprehensive definition can also be found in the <a target=_blank
-href="https://useast.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#basic">
+href="https://www.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#basic">
 Ensembl FAQ</a>. By default, the track displays only the "basic" set. In order to display the
 complete "comprehensive" set, the box can be ticked at the top of the <a target=_blank
 href="../cgi-bin/hgTrackUi?db=hg38&g=knownGene">GENCODE track description page</a>.</p>
 <p class='text-center'>
 <img class='text-center' src="../images/ComprehensiveSet.png"
 alt="Turning on comprehensive gene set" width="750">
 <a name="ncbiRefseq"></a>
 <h6>What is the difference between "NCBI RefSeq" and "UCSC RefSeq"?</h6>
 <p>
 RefSeq gene transcripts, unlike GENCODE/Ensembl/UCSC Genes, are sequences that can differ
 from the genome. They need to be aligned to the genome to create annotations and UCSC and
 NCBI create alignments with different software (BLAT and splign, respectively).