6120d02a6aa7faf052299967d8bace2d2d706ad3
hiram
Mon Dec 15 13:06:28 2025 -0800
one additional color code definition for paralogous annotations refs #35776

diff --git src/hg/makeDb/trackDb/TOGAv2.html src/hg/makeDb/trackDb/TOGAv2.html
index 995cce59d39..40a83b03d08 100644
--- src/hg/makeDb/trackDb/TOGAv2.html
+++ src/hg/makeDb/trackDb/TOGAv2.html
@@ -1,134 +1,139 @@
TOGA2
(Tool to infer Orthologs from Genome Alignments 2) [1]
is the next-generation version of the original TOGA method [2].
TOGA2 is a homology-based method that integrates gene annotation with
ortholog inference and the classification of genes as intact or lost.
Like TOGA, TOGA2 takes as input the gene annotation of a well-annotated reference species and a pairwise whole-genome alignment (alignment chains) between the reference and query genomes. Orthologous genomic loci are inferred primarily from alignments of intronic and intergenic regions, using machine learning to accurately distinguish orthologous loci from paralogous or processed pseudogene loci.
To annotate genes, CESAR 2.0 [3] is used to determine the positions and boundaries of coding exons of a reference transcript in the orthologous genomic locus in the query species.
TOGA2 differs from the original TOGA in the following major aspects.
For placental mammals, TOGA2 uses as references
Each annotated transcript is named after the reference transcript, the gene symbol, and the chain identifier: transcriptID#geneID#chainID.
Transcripts ending with #retro are retrogene candidates (processed pseudogenes retaining an intact reading frame).
Transcripts ending with #paralog are classified as paralogous by TOGA2’s machine learning classifier; they are only annotated if the respective query locus does not have an orthologous projection.
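The naming scheme above can be split into its parts programmatically. The following is a minimal sketch, not part of TOGA2: the function name `parse_toga2_name`, the `Toga2Name` type, and the example identifiers are hypothetical. It assumes that `#retro` and `#paralog` appear as the last `#`-separated field of the name and that the chain identifier may or may not precede them; the exact layout of suffixed names should be checked against the actual track data.

```python
from typing import NamedTuple, Optional


class Toga2Name(NamedTuple):
    """Parts of a TOGA2 transcript name (hypothetical helper type)."""
    transcript_id: str
    gene_id: str
    chain_id: Optional[str]  # None if the name carries no chain field
    tag: Optional[str]       # "retro", "paralog", or None


# Suffixes described in the track documentation.
SUFFIX_TAGS = {"retro", "paralog"}


def parse_toga2_name(name: str) -> Toga2Name:
    """Split a name of the form transcriptID#geneID#chainID.

    Assumption: a trailing '#retro' or '#paralog' is an extra final field.
    """
    fields = name.split("#")
    tag = None
    if fields[-1] in SUFFIX_TAGS:
        tag = fields.pop()
    if len(fields) not in (2, 3) or not all(fields):
        raise ValueError(f"unexpected TOGA2 transcript name: {name!r}")
    chain_id = fields[2] if len(fields) == 3 else None
    return Toga2Name(fields[0], fields[1], chain_id, tag)


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    for example in ("ENST0000001#GENE1#42", "ENST0000001#GENE1#42#paralog"):
        print(parse_toga2_name(example))
```

Returning the suffix as a separate `tag` field keeps the retrogene and paralog classifications distinct from the chain identifier, so downstream filtering does not depend on string matching against the full name.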
Each annotated transcript is shown in a color-coded classification as
Clicking on a transcript provides additional information about the orthology classification, inactivating mutations, the query's nucleotide/protein sequence, and protein/exon alignments.
These data were prepared by Michael Hiller's Lab
The TOGA2 software is available from github.com/hillerlab/TOGA2
[1] Malovichko Y, Bein B, Hilgers L, Stephens A, Yi X, Stadager T, Hoppach L, Koch L, Maschiner M, Hiller M. TOGA2 improves speed and accuracy of comparative gene annotation and orthology inference. In preparation
[2] Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed AW, Kontopoulos DG, Hilgers L, Lindblad-Toh K, Karlsson EK, Zoonomia Consortium, Hiller M. Integrating gene annotation with orthology inference at scale. Science. 2023 Apr 28;380(6643):eabn3107. PMID: 37104600; PMC: PMC10193443
[3] Sharma V, Schwede P, Hiller M. CESAR 2.0 substantially improves speed and accuracy of comparative gene annotation. Bioinformatics. 2017 Dec 15;33(24):3985-3987. PMID: 28961744
[4] Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB, Chow ED, Kanterakis E, Gao H, Kia A, Batzoglou S, Sanders SJ, Farh KK-H. Predicting splicing from primary sequence with deep learning. Cell. 2019 Jan 24;176(3):535-548.e24. PMID: 30661751