ad4e4b66fe4aeec66d13d5ce01bbaf03f148c9f8 hiram Fri Dec 12 17:46:52 2025 -0800
updated text for version 2 refs #35776

diff --git src/hg/makeDb/trackDb/TOGAv2.html src/hg/makeDb/trackDb/TOGAv2.html
index 88e73527fc1..995cce59d39 100644
--- src/hg/makeDb/trackDb/TOGAv2.html
+++ src/hg/makeDb/trackDb/TOGAv2.html
@@ -1,85 +1,134 @@
-TOGA version 2.0
-(Tool to infer Orthologs from Genome Alignments)
-is a homology-based method that integrates gene annotation, inferring
+TOGA2
+(Tool to infer Orthologs from Genome Alignments 2) [1]
+is the next-generation version of the original TOGA method [2].
+TOGA2 is a homology-based method that integrates gene annotation, inferring
orthologs and classifying genes as intact or lost.
-As input, TOGA uses a gene annotation of a reference species
-(human/hg38 for mammals, chicken/galGal6 for birds) and
-a whole genome alignment between the reference and query genome.
+Like TOGA, TOGA2 takes as input the gene annotation of a well-annotated reference species and
+a pairwise whole genome alignment (alignment chains) between the reference and query genome.
+Orthologous genomic loci are inferred primarily from alignments of intronic
+and intergenic regions, using machine learning to accurately distinguish
+orthologous from paralogous or processed pseudogene loci.
-TOGA implements a novel paradigm that relies on alignments of intronic
-and intergenic regions and uses machine learning to accurately distinguish
-orthologs from paralogs or processed pseudogenes.
+To annotate genes, CESAR 2.0 [3] is used to determine the positions and boundaries of coding exons of a
+reference transcript in the orthologous genomic locus in the query species.
+
-To annotate genes,
-CESAR 2.0
-is used to determine the positions and boundaries of coding exons of a
-reference transcript in the orthologous genomic locus in the query species.
+TOGA2 differs from TOGA1 in the following major aspects.
+
For placental mammals, TOGA2 uses as references +
+Each annotated transcript is named after the reference transcript, gene symbol and the chain identifier: transcriptID#geneID#chainID.
+Transcripts ending with #retro are retrogene candidates (processed pseudogenes retaining an intact reading frame).
+Transcripts ending with #paralog are classified as paralogous by TOGA2’s machine learning classifier; they are only annotated if the respective query locus does not have an orthologous projection.
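The naming scheme described above can be read programmatically. The following is a minimal sketch, not part of TOGA2 itself: it assumes that the `#retro` and `#paralog` markers appear as an extra trailing `#`-separated field after `transcriptID#geneID#chainID`, which the text implies but does not spell out. The class `ProjectionName`, the function `parse_projection_name`, and the identifiers in the usage note are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Suffixes named in the track description; treating them as a trailing
# '#'-separated field is an assumption of this sketch.
KNOWN_TAGS = {"retro", "paralog"}


@dataclass(frozen=True)
class ProjectionName:
    transcript_id: str
    gene_id: str
    chain_id: str
    tag: Optional[str]  # "retro", "paralog", or None for a plain projection


def parse_projection_name(name: str) -> ProjectionName:
    """Split a transcriptID#geneID#chainID name, with an optional trailing tag."""
    fields = name.split("#")
    tag = None
    # Peel off a trailing "retro" or "paralog" marker if one is present.
    if fields[-1] in KNOWN_TAGS:
        tag = fields.pop()
    # Exactly three non-empty fields must remain.
    if len(fields) != 3 or not all(fields):
        raise ValueError(f"unexpected transcript name: {name!r}")
    return ProjectionName(fields[0], fields[1], fields[2], tag)
```

For example, `parse_projection_name("ENST00000000001#GENE1#42#retro")` would yield a record whose `tag` is `"retro"` and whose `chain_id` is `"42"`; these identifiers are made up for illustration. A chain identifier that is literally `retro` or `paralog` would be ambiguous under this assumption.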
+
Each annotated transcript is shown in a color-coded classification as
Clicking on a transcript provides additional information about the orthology
-classification, inactivating mutations, the protein sequence and protein/exon
+classification, inactivating mutations, the query's nucleotide/protein sequence, and protein/exon
alignments.
-This data was prepared by the Michael Hiller Lab
+This data was prepared by Michael Hiller's Lab
-The TOGA software is available from
-github.com/hillerlab/TOGA
+The TOGA2 software is available from
+github.com/hillerlab/TOGA2
-Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed AW, Kontopoulos
-DG, Hilgers L et al.
+[1] Malovichko Y, Bein B, Hilgers L, Stephens A, Yi X, Stadager T, Hoppach L, Koch L, Maschiner M, Hiller M. TOGA2 improves speed and accuracy of comparative gene annotation and orthology inference. In preparation
+
+
+[2] Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed AW, Kontopoulos DG, Hilgers L, Lindblad-Toh K, Karlsson EK, Zoonomia Consortium, Hiller M. Integrating gene annotation with orthology inference at scale. Science. 2023 Apr 28;380(6643):eabn3107. PMID: 37104600; PMC: PMC10193443
+
+[3]
+Sharma V, Schwede P, Hiller M. CESAR 2.0 substantially improves speed and accuracy of comparative gene annotation. Bioinformatics. 2017 Dec 15;33(24):3985-3987. PMID: 28961744
+
+[4]
+Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB, Chow ED, Kanterakis E, Gao H, Kia A, Batzoglou S, Sanders SJ, Farh KK-H. Predicting splicing from primary sequence with deep learning. Cell. 2019 Jan 24;176(3):535-548.e24. PMID: 30661751
+
+ +