bbabbd5d2566d47d923d51dbe350634783455999 mspeir Sun Oct 26 12:14:52 2025 -0700 change soe to gi, refs #35031 diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html index e43effd5f95..b6acdff0764 100755 --- src/hg/htdocs/FAQ/FAQgenes.html +++ src/hg/htdocs/FAQ/FAQgenes.html @@ -454,31 +454,31 @@ Mitomap.org, which contains detailed documentation about the history of this sequence. We also have a Mitomap track with gene annotations and variant information on both hg19 (chrMT) and hg38 (chrM).

Why chrMT? The hg19 assembly has two mitochondrial genomes, chrM (old) and chrMT (current). The reason is that the GRCh37 sequence file contained no mitochondrial sequence. When making hg19, the UCSC Genome Browser originally added a chrM sequence that was not the mitochondrial genome sequence later selected by NCBI for GRCh37. This is why the current hg19 version contains two mitochondrial sequences: the old one, called "chrM", and the current GRCh37 reference, called "chrMT". The issue is described in detail in our hg19 sequence download instructions. If you use hg19 today, chrMT should be considered the current mitochondrial sequence; chrM is only supported for backwards compatibility and legacy annotation files. Our hg19.fa.gz in the "current" download directory contains both sequences. The old hg19.fa.gz in the top-level download directory has only chrM, for backwards compatibility with old pipelines. Our analysisSet fasta file for aligners contains only chrMT. For most purposes when using hg19, we recommend using the analysis set fasta file.

For hg38, there is no such issue: it has only chrM, and all mitochondrial annotations are present on chrM.
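To check which mitochondrial sequence names a given FASTA file actually contains, one option is to scan its header lines. The following is a minimal sketch, not a UCSC tool: the function name `mito_sequences` is hypothetical, and it only assumes that each sequence name is the first whitespace-delimited token after `>`.

```python
from typing import Iterable, Set

# Names taken from the FAQ text: "chrM" (legacy sequence in hg19, the only
# mitochondrial sequence in hg38) and "chrMT" (GRCh37 reference in hg19).
MITO_NAMES = {"chrM", "chrMT"}


def mito_sequences(fasta_lines: Iterable[str]) -> Set[str]:
    """Return the mitochondrial sequence names found in FASTA header lines.

    Hypothetical helper for illustration; accepts any iterable of text
    lines, such as an open file handle.
    """
    found = set()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence data, only headers carry names
        fields = line[1:].split()
        if fields and fields[0] in MITO_NAMES:
            found.add(fields[0])
    return found
```

Passing a text-mode handle from `gzip.open(path, "rt")` lets this run directly on a compressed file. Based on the description above, a result containing both names would suggest the "current" hg19.fa.gz, only chrM the old top-level file, and only chrMT the analysis set.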

How shall I report a gene transcript in a manuscript?

@@ -566,50 +566,50 @@ gene track and every selection method has a different aim. For the knownGene tracks (UCSC genes on hg19, Gencode on hg38 and mm10), data tables called "knownCanonical" were built at UCSC. For both Gencode/Ensembl and RefSeq, the NCBI/EBI project MANE selects for each gene the most relevant transcript, as long as these are identical between Gencode and RefSeq. For NCBI RefSeq, the track RefSeqSelect also selects the most relevant transcript(s) for each gene and is not limited to transcripts that are identical between RefSeq and Ensembl. Therefore, the following gene tracks have "best-transcripts" tracks:

UCSC Genes on hg19: For hg19, the knownCanonical table is a subset of the UCSC Genes track. It was generated at UCSC by identifying a canonical isoform for each cluster ID, or gene. Generally, this is the longest isoform. It can be downloaded directly from the hg19 downloads database (http://hgdownload.gi.ucsc.edu/goldenPath/hg19/database/) or by using the Table Browser.

Gencode on hg38/mm10 - knownCanonical: For hg38, the knownCanonical table is a subset of the GENCODE v29 track. It was generated at UCSC. As opposed to the hg19 knownCanonical table, which used computationally generated gene clusters and generally chose the longest isoform as the canonical isoform, the hg38 table uses ENSEMBL gene IDs to define clusters (that is to say, one canonical isoform per ENSEMBL gene ID). The method of choosing the isoform is described as follows:

knownCanonical identifies the canonical isoform of each cluster ID or gene using the ENSEMBL gene IDs to define each cluster. The canonical transcript is chosen using the APPRIS principal transcript when available. If no APPRIS tag exists for any transcript associated with the cluster, then a transcript in the BASIC set is chosen. If no BASIC transcript exists, then the longest isoform is used.

It can be downloaded directly from the hg38 downloads database (http://hgdownload.gi.ucsc.edu/goldenPath/hg38/database/) or by using the Table Browser.
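The selection order quoted above for the hg38 table (APPRIS principal first, then BASIC, then longest isoform) can be illustrated with a short sketch. This is not the UCSC build code. The `Transcript` record, its field names, and the two function names are hypothetical. Breaking ties within a tier by length is an assumption, since the description does not say how multiple APPRIS or BASIC transcripts are resolved, and what "length" measures is left to the caller.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record; the field names are illustrative only."""
    transcript_id: str
    gene_id: str                    # ENSEMBL gene ID, defines the cluster
    length: int                     # isoform length, as defined by the caller
    appris_principal: bool = False  # carries an APPRIS principal tag
    basic: bool = False             # member of the BASIC set


def pick_canonical(transcripts: Iterable[Transcript]) -> Optional[Transcript]:
    """Pick one transcript for a single cluster: APPRIS, then BASIC, then longest."""
    candidates = list(transcripts)
    if not candidates:
        return None
    for flag in ("appris_principal", "basic"):
        tier = [t for t in candidates if getattr(t, flag)]
        if tier:
            # Assumption: within a tier, prefer the longest transcript.
            return max(tier, key=lambda t: t.length)
    return max(candidates, key=lambda t: t.length)


def canonical_by_gene(transcripts: Iterable[Transcript]) -> Dict[str, Transcript]:
    """Group transcripts by ENSEMBL gene ID and pick one canonical per gene."""
    clusters: Dict[str, list] = {}
    for t in transcripts:
        clusters.setdefault(t.gene_id, []).append(t)
    return {gene: pick_canonical(group) for gene, group in clusters.items()}
```

Under these assumptions, a shorter transcript with an APPRIS principal tag wins over a longer untagged one, which is the main practical difference from the hg19 table's longest-isoform rule.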

NCBI RefSeq (hg19/hg38): This track collection contains three subtracks that select the most relevant transcript for all or a subset of genes, with slightly different aims: