d2ac08231c388dcae68ce7031d5a7dddb91beb97
hiram
  Mon Sep 12 10:55:19 2022 -0700
expanded documentation from Michael refs #29982

diff --git src/hg/makeDb/trackDb/TOGAannotation.html src/hg/makeDb/trackDb/TOGAannotation.html
index 839dafa..c201476 100644
--- src/hg/makeDb/trackDb/TOGAannotation.html
+++ src/hg/makeDb/trackDb/TOGAannotation.html
@@ -1,28 +1,76 @@
 <h2>Description</h2>
 <p>
-<b>T</b>ool to infer <b>O</b>rthologs from <b>G</b>enome <b>A</b>lignments
+<b>TOGA</b>
+(<b>T</b>ool to infer <b>O</b>rthologs from <b>G</b>enome <b>A</b>lignments)
+is a homology-based method that integrates gene annotation, ortholog
+inference, and classification of genes as intact or lost.
 </p>
+
+<h2>Methods</h2>
 <p>
-<b>TOGA</b> is a new method that integrates gene annotation, inferring orthologs
-and classifying genes as intact or lost.
+As input, <b>TOGA</b> uses a gene annotation of a reference species
+(human/hg38 for mammals, chicken/galGal6 for birds) and
+a whole genome alignment between the reference and query genome.
 </p>
 <p>
-<b>TOGA</b> implements a novel machine learning based paradigm to infer
-orthologous genes between related species and to accurately distinguish
+<b>TOGA</b> implements a novel paradigm that relies on alignments of intronic
+and intergenic regions and uses machine learning to accurately distinguish
 orthologs from paralogs or processed pseudogenes.
 </p>
+<p>
+To annotate genes,
+<a href='https://academic.oup.com/bioinformatics/article/33/24/3985/4095639'
+target=_blank>CESAR 2.0</a>
+is used to determine the positions and boundaries of coding exons of a
+reference transcript in the orthologous genomic locus in the query species.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+Each annotated transcript is color-coded by its classification:
+<ul>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:blue;'>&nbsp;</span>
+    <span style='color:blue'>"intact"</span>: the middle 80% of the CDS
+    (coding sequence) is present and exhibits no gene-inactivating mutation.
+    These transcripts likely encode functional proteins.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:lightblue;'>&nbsp;</span>
+    <span style='color:#7193a0'>"partially intact"</span>: &ge;50% of the CDS
+     is present in the query and the middle 80% of the CDS exhibits no
+     inactivating mutation. These transcripts may also encode functional
+     proteins, but the evidence is weaker as parts of the CDS are missing,
+     often due to assembly gaps.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:grey;'>&nbsp;</span>
+    <span style='color:grey'>"missing"</span>: &lt;50% of the CDS is present
+     in the query and the middle 80% of the CDS exhibits no inactivating
+     mutation.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:orange;'>&nbsp;</span>
+    <span style='color:orange'>"uncertain loss"</span>: there is 1
+     inactivating mutation in the middle 80% of the CDS, but evidence is not
+     strong enough to classify the transcript as lost. These transcripts may
+     or may not encode a functional protein.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:red;'>&nbsp;</span>
+    <span style='color:red'>"lost"</span>: typically several inactivating
+     mutations are present, thus there is strong evidence that the transcript
+     is unlikely to encode a functional protein.</li>
+</ul>
+</p>
+<p>
+Clicking on a transcript provides additional information about the orthology
+classification, inactivating mutations, the protein sequence and protein/exon
+alignments.
+</p>
 
 <h2>Credits</h2>
 <p>
 This data was prepared by the <a href='https://tbg.senckenberg.de/hillerlab/'
 target=_blank>Michael Hiller Lab</a>
 </p>
 
 <h2>References</h2>
 <p>
 The <b>TOGA</b> software is available from
 <a href='https://github.com/hillerlab/TOGA'
 target=_blank>github.com/hillerlab/TOGA</a>
 </p>
 
 <p>