6d7150d4b11cf16b7b58d4c1eae3dbf6d240dffd max Wed Jan 31 09:25:51 2024 -0800 fixing track refs, email from M Hiller

diff --git src/hg/makeDb/trackDb/TOGAannotation.html src/hg/makeDb/trackDb/TOGAannotation.html
index 6237b21..0b73ea6 100644
--- src/hg/makeDb/trackDb/TOGAannotation.html
+++ src/hg/makeDb/trackDb/TOGAannotation.html
@@ -1,82 +1,85 @@
 <h2>Description</h2>
 <p>
 <b>TOGA</b> (<b>T</b>ool to infer <b>O</b>rthologs from <b>G</b>enome <b>A</b>lignments)
 is a homology-based method that integrates gene annotation, inferring orthologs and
 classifying genes as intact or lost.
 </p>

 <h2>Methods</h2>
 <p>
 As input, <b>TOGA</b> uses a gene annotation of a reference species
 (human/hg38 for mammals, chicken/galGal6 for birds) and
 a whole genome alignment between the reference and query genome.
 </p>
 <p>
 <b>TOGA</b> implements a novel paradigm that relies on alignments of intronic and intergenic
 regions and uses machine learning to accurately distinguish orthologs from paralogs or
 processed pseudogenes.
 </p>
 <p>
 To annotate genes,
 <a href="https://academic.oup.com/bioinformatics/article/33/24/3985/4095639" target="blank">CESAR 2.0</a>
 is used to determine the positions and boundaries of coding exons of a
 reference transcript in the orthologous genomic locus in the query species.
 </p>

 <h2>Display Conventions and Configuration</h2>
 <p>
 Each annotated transcript is shown in a color-coded classification as
 <ul>
 <li><span style='display:inline-block; width:40px; height:15px; background-color:blue;'> </span>
 <span style='color:blue'>"intact"</span>: middle 80% of the CDS
 (coding sequence) is present and exhibits no gene-inactivating mutation.
 These transcripts likely encode functional proteins.</li>
 <li><span style='display:inline-block; width:40px; height:15px; background-color:lightblue;'> </span>
 <span style='color:#7193a0'>"partially intact"</span>: 50% of the CDS
 is present in the query and the middle 80% of the CDS exhibits no inactivating mutation.
 These transcripts may also encode functional proteins, but the evidence is weaker as
 parts of the CDS are missing, often due to assembly gaps.</li>
 <li><span style='display:inline-block; width:40px; height:15px; background-color:grey;'> </span>
 <span style='color:grey'>"missing"</span>: <50% of the CDS is present in the query
 and the middle 80% of the CDS exhibits no inactivating mutation.</li>
 <li><span style='display:inline-block; width:40px; height:15px; background-color:orange;'> </span>
 <span style='color:orange'>"uncertain loss"</span>: there is 1 inactivating mutation in the
 middle 80% of the CDS, but evidence is not strong enough to classify the transcript as lost.
 These transcripts may or may not encode a functional protein.</li>
 <li><span style='display:inline-block; width:40px; height:15px; background-color:red;'> </span>
 <span style='color:red'>"lost"</span>: typically several inactivating mutations are present,
 thus there is strong evidence that the transcript is unlikely to encode a functional protein.</li>
 </ul>
 </p>
 <p>
 Clicking on a transcript provides additional information about the orthology classification,
 inactivating mutations, the protein sequence and protein/exon alignments.
 </p>

 <h2>Credits</h2>
 <p>
 This data was prepared by the
 <a href="https://tbg.senckenberg.de/hillerlab/" target="_blank">Michael Hiller Lab</a>
 </p>

 <h2>References</h2>
 <p>
 The <b>TOGA</b> software is available from
 <a href="https://github.com/hillerlab/TOGA" target="_blank">github.com/hillerlab/TOGA</a>
 </p>
 <p>
-Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales A,
-Ahmed AW, Kontopoulos DG, Hilgers L, Zoonomia Consortium, Hiller M.
-<a href="https://www.biorxiv.org/content/10.1101/2022.09.08.507143v1"
-target="_blank">TOGA integrates gene annotation with orthology inference
-at scale</a>. <em>bioRxiv preprint September 2022</em>
+Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed AW, Kontopoulos
+DG, Hilgers L <em>et al</em>.
+<a href="https://www.science.org/doi/abs/10.1126/science.abn3107?url_ver=Z39.88-2003&rfr_id=ori:
+rid:crossref.org&rfr_dat=cr_pub%20%200pubmed" target="_blank">
+Integrating gene annotation with orthology inference at scale</a>.
+<em>Science</em>. 2023 Apr 28;380(6643):eabn3107.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37104600" target="_blank">37104600</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10193443/" target="_blank">PMC10193443</a>
 </p>