src/hg/makeDb/trackDb/TOGAannotation.html d2ac08231c388dcae68ce7031d5a7dddb91beb97

d2ac08231c388dcae68ce7031d5a7dddb91beb97
hiram
  Mon Sep 12 10:55:19 2022 -0700
expanded documentation from Michael refs #29982

diff --git src/hg/makeDb/trackDb/TOGAannotation.html src/hg/makeDb/trackDb/TOGAannotation.html
index 839dafa..c201476 100644
--- src/hg/makeDb/trackDb/TOGAannotation.html
+++ src/hg/makeDb/trackDb/TOGAannotation.html
@@ -1,34 +1,82 @@
 <h2>Description</h2>
 <p>
-<b>T</b>ool to infer <b>O</b>rthologs from <b>G</b>enome <b>A</b>lignments
+<b>TOGA</b>
+(<b>T</b>ool to infer <b>O</b>rthologs from <b>G</b>enome <b>A</b>lignments)
+is a homology-based method that integrates gene annotation with orthology
+inference, classifying genes as intact or lost.
 </p>
+
+<h2>Methods</h2>
 <p>
-<b>TOGA</b> is a new method that integrates gene annotation, inferring orthologs
-and classifying genes as intact or lost.
+As input, <b>TOGA</b> uses a gene annotation of a reference species
+(human/hg38 for mammals, chicken/galGal6 for birds) and
+a whole-genome alignment between the reference and query genomes.
 </p>
 <p>
-<b>TOGA</b> implements a novel machine learning based paradigm to infer
-orthologous genes between related species and to accurately distinguish
+<b>TOGA</b> implements a novel paradigm that relies on alignments of intronic
+and intergenic regions and uses machine learning to accurately distinguish
 orthologs from paralogs or processed pseudogenes.
 </p>
+<p>
+To annotate genes, <b>TOGA</b> uses
+<a href='https://academic.oup.com/bioinformatics/article/33/24/3985/4095639'
+target=_blank>CESAR 2.0</a>
+to determine the positions and boundaries of coding exons of a
+reference transcript in the orthologous genomic locus in the query species.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+Each annotated transcript is color-coded according to its classification:
+<ul>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:blue;'>&nbsp;</span>
+    <span style='color:blue'>"intact"</span>: middle 80% of the CDS
+    (coding sequence) is present and exhibits no gene-inactivating mutation.
+    These transcripts likely encode functional proteins.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:lightblue;'>&nbsp;</span>
+    <span style='color:#7193a0'>"partially intact"</span>: at least 50% of the CDS
+     is present in the query and the middle 80% of the CDS exhibits no
+     inactivating mutation. These transcripts may also encode functional
+     proteins, but the evidence is weaker as parts of the CDS are missing,
+     often due to assembly gaps.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:grey;'>&nbsp;</span>
+    <span style='color:grey'>"missing"</span>: &lt;50% of the CDS is present
+     in the query and the middle 80% of the CDS exhibits no inactivating
+     mutation.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:orange;'>&nbsp;</span>
+    <span style='color:orange'>"uncertain loss"</span>: there is 1
+     inactivating mutation in the middle 80% of the CDS, but evidence is not
+     strong enough to classify the transcript as lost. These transcripts may
+     or may not encode a functional protein.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:red;'>&nbsp;</span>
+    <span style='color:red'>"lost"</span>: typically several inactivating
+     mutations are present, providing strong evidence that the transcript
+     is unlikely to encode a functional protein.</li>
+</ul>
+</p>
+<p>
+Clicking on a transcript provides additional information about the orthology
+classification, inactivating mutations, the protein sequence and protein/exon
+alignments.
+</p>
 
 <h2>Credits</h2>
 <p>
 This data was prepared by the <a href='https://tbg.senckenberg.de/hillerlab/'
 target=_blank>Michael Hiller Lab</a>
 </p>
 
 <h2>References</h2>
 <p>
 The <b>TOGA</b> software is available from
 <a href='https://github.com/hillerlab/TOGA'
 target=_blank>github.com/hillerlab/TOGA</a>
 </p>
 
 <p>
 Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales A,
 Ahmed AW, Kontopoulos DG, Hilgers L, Zoonomia Consortium, Hiller M.
 Integrating gene annotation with orthology inference at scale.
 <a href='https://math.mit.edu/seminars/compbiosem/spring22/hiller_michael.pdf'
 target=_blank><em>Under Review</em></a>
 </p>
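
The five classes in the new "Display Conventions and Configuration" legend amount to a small decision procedure. The Python below is a minimal sketch of one reading of that legend, not TOGA's implementation: the `TranscriptEvidence` fields, the `classify` function, and the choice to treat two or more inactivating mutations as "lost" (the text only says "typically several") are hypothetical simplifications. The color names are copied from the legend swatches.

```python
from dataclasses import dataclass

# Swatch colors from the track legend, keyed by class name.
COLORS = {
    "intact": "blue",
    "partially intact": "lightblue",
    "missing": "grey",
    "uncertain loss": "orange",
    "lost": "red",
}


@dataclass
class TranscriptEvidence:
    """Hypothetical per-transcript summary; field names are illustrative."""
    cds_fraction_present: float   # fraction of the reference CDS found in the query (0.0 to 1.0)
    middle80_fully_present: bool  # True if the middle 80% of the CDS is entirely present
    middle80_inactivating: int    # inactivating mutations detected in the middle 80% of the CDS


def classify(ev: TranscriptEvidence) -> str:
    """Simplified reading of the legend; not TOGA's actual decision rules."""
    if ev.middle80_inactivating >= 2:
        # Assumption: "typically several" mutations approximated as two or more.
        return "lost"
    if ev.middle80_inactivating == 1:
        return "uncertain loss"
    # From here on, the middle 80% of the CDS has no inactivating mutation.
    if ev.middle80_fully_present:
        return "intact"
    if ev.cds_fraction_present >= 0.5:
        return "partially intact"
    return "missing"


if __name__ == "__main__":
    examples = [
        TranscriptEvidence(0.95, True, 0),   # intact
        TranscriptEvidence(0.60, False, 0),  # partially intact
        TranscriptEvidence(0.30, False, 0),  # missing
        TranscriptEvidence(0.90, True, 1),   # uncertain loss
        TranscriptEvidence(0.90, True, 3),   # lost
    ]
    for ev in examples:
        label = classify(ev)
        print(f"{label:17} {COLORS[label]}")
```

The mutation count is checked before CDS coverage because the legend's "intact", "partially intact", and "missing" classes all require that the middle 80% of the CDS shows no inactivating mutation; coverage only separates those three.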