dc7e4fa869733e45e0abdb1f2b9e4508ef7b802a
hiram
Tue Sep 20 12:10:34 2022 -0700
correct bioRxiv reference for the paper in preprint status refs #29982

diff --git src/hg/makeDb/trackDb/TOGAannotation.html src/hg/makeDb/trackDb/TOGAannotation.html
index c201476..8d1c6d0 100644
--- src/hg/makeDb/trackDb/TOGAannotation.html
+++ src/hg/makeDb/trackDb/TOGAannotation.html
@@ -1,82 +1,82 @@
 <h2>Description</h2>
 <p>
 <b>TOGA</b>
 (<b>T</b>ool to infer <b>O</b>rthologs from <b>G</b>enome <b>A</b>lignments)
 is a homology-based method that integrates gene annotation, inferring orthologs
 and classifying genes as intact or lost.
 </p>
 <h2>Methods</h2>
 <p>
 As input, <b>TOGA</b> uses a gene annotation of a reference species
 (human/hg38 for mammals, chicken/galGal6 for birds) and
 a whole genome alignment between the reference and query genome.
 </p>
 <p>
 <b>TOGA</b> implements a novel paradigm that relies on alignments of intronic
 and intergenic regions and uses machine learning to accurately distinguish
 orthologs from paralogs or processed pseudogenes.
 </p>
 <p>
 To annotate genes,
 <a href='https://academic.oup.com/bioinformatics/article/33/24/3985/4095639'
 target=blank>CESAR 2.0</a>
 is used to determine the positions and boundaries of coding exons of a
 reference transcript in the orthologous genomic locus in the query species.
 </p>
 <h2>Display Conventions and Configuration</h2>
 <p>
 Each annotated transcript is shown in a color-coded classification as
 <ul>
 <li><span style='display:inline-block; width:40px; height:15px; background-color:blue;'> </span>
 <span style='color:blue'>"intact"</span>: middle 80% of the CDS
 (coding sequence) is present and exhibits no gene-inactivating mutation.
 These transcripts likely encode functional proteins.</li>
 <li><span style='display:inline-block; width:40px; height:15px; background-color:lightblue;'> </span>
 <span style='color:#7193a0'>"partially intact"</span>: 50% of the CDS
 is present in the query and the middle 80% of the CDS exhibits no
 inactivating mutation. These transcripts may also encode functional proteins,
 but the evidence is weaker as parts of the CDS are missing,
 often due to assembly gaps.</li>
 <li><span style='display:inline-block; width:40px; height:15px; background-color:grey;'> </span>
 <span style='color:grey'>"missing"</span>: <50% of the CDS is present in the
 query and the middle 80% of the CDS exhibits no inactivating mutation.</li>
 <li><span style='display:inline-block; width:40px; height:15px; background-color:orange;'> </span>
 <span style='color:orange'>"uncertain loss"</span>: there is 1 inactivating
 mutation in the middle 80% of the CDS, but evidence is not strong enough to
 classify the transcript as lost. These transcripts may or may not encode a
 functional protein.</li>
 <li><span style='display:inline-block; width:40px; height:15px; background-color:red;'> </span>
 <span style='color:red'>"lost"</span>: typically several inactivating
 mutations are present, thus there is strong evidence that the transcript is
 unlikely to encode a functional protein.</li>
 </ul>
 </p>
 <p>
 Clicking on a transcript provides additional information about the orthology
 classification, inactivating mutations, the protein sequence and protein/exon
 alignments.
 </p>
 <h2>Credits</h2>
 <p>
 This data was prepared by the
 <a href='https://tbg.senckenberg.de/hillerlab/' target=_blank>Michael Hiller Lab</a>
 </p>
 <h2>References</h2>
 <p>
 The <b>TOGA</b> software is available from
 <a href='https://github.com/hillerlab/TOGA' target=_blank>github.com/hillerlab/TOGA</a>
 </p>
 <p>
 Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales A,
 Ahmed AW, Kontopoulos DG, Hilgers L, Zoonomia Consortium, Hiller M.
-Integrating gene annotation with orthology inference at scale.
-<a href='https://math.mit.edu/seminars/compbiosem/spring22/hiller_michael.pdf'
-target=_blank><em>Under Review</em></a>
+<a href='https://www.biorxiv.org/content/10.1101/2022.09.08.507143v1'
+target=_blank>TOGA integrates gene annotation with orthology inference
+at scale</a>. <em>bioRxiv preprint September 2022</em>
 </p>
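
The color-coded classes listed under "Display Conventions and Configuration" can be read as a small decision rule. The sketch below is illustrative only and is not taken from the TOGA code base: the function name, its parameters, and the exact cut-offs are assumptions. In particular, it treats "50% of the CDS" as "at least 50%", treats exactly one inactivating mutation as "uncertain loss", and treats two or more as "lost", whereas the page only says that lost transcripts "typically" carry several mutations and that the call depends on the strength of the evidence.

```python
# Hypothetical sketch of the display classes described in the track page.
# Names and thresholds are assumptions made for illustration, not TOGA's API.

# Colors as given in the page's legend.
CLASS_COLORS = {
    "intact": "blue",
    "partially intact": "lightblue",
    "missing": "grey",
    "uncertain loss": "orange",
    "lost": "red",
}


def classify_transcript(cds_fraction_present, middle80_present,
                        inactivating_mutations):
    """Return a display class for one annotated transcript.

    cds_fraction_present: fraction (0.0 to 1.0) of the CDS found in the query.
    middle80_present: True if the middle 80% of the CDS is present.
    inactivating_mutations: count of inactivating mutations in the middle 80%.
    """
    if not 0.0 <= cds_fraction_present <= 1.0:
        raise ValueError("cds_fraction_present must be between 0 and 1")
    if inactivating_mutations < 0:
        raise ValueError("inactivating_mutations must be non-negative")

    # Assumption: "several" mutations is simplified to "two or more".
    if inactivating_mutations >= 2:
        return "lost"
    if inactivating_mutations == 1:
        return "uncertain loss"

    # No inactivating mutation in the middle 80% of the CDS.
    if middle80_present:
        return "intact"
    # Assumption: "50% of the CDS is present" is read as "at least 50%".
    if cds_fraction_present >= 0.5:
        return "partially intact"
    return "missing"


if __name__ == "__main__":
    label = classify_transcript(0.6, False, 0)
    print(label, CLASS_COLORS[label])
```

A real classifier would also need the orthology evidence that the page mentions but does not quantify, so this rule should be taken as a reading aid for the legend rather than a reimplementation.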