18327e879f9df894313dd3100084c94cc6dd4136
hiram
  Fri Dec 5 11:03:40 2025 -0800
track definitions for TOGA version 2 refs #35776

diff --git src/hg/makeDb/trackDb/TOGAv2.html src/hg/makeDb/trackDb/TOGAv2.html
new file mode 100644
index 00000000000..88e73527fc1
--- /dev/null
+++ src/hg/makeDb/trackDb/TOGAv2.html
@@ -0,0 +1,85 @@
+<h2>Description</h2>
+<p>
+<b>TOGA version 2.0</b>
+(<b>T</b>ool to infer <b>O</b>rthologs from <b>G</b>enome <b>A</b>lignments)
+is a homology-based method that integrates gene annotation, ortholog
+inference, and the classification of genes as intact or lost.
+</p>
+
+<h2>Methods</h2>
+<p>
+As input, <b>TOGA</b> uses a gene annotation of a reference species
+(human/hg38 for mammals, chicken/galGal6 for birds) and
+a whole genome alignment between the reference and query genome.
+</p>
+<p>
+<b>TOGA</b> implements a novel paradigm that relies on alignments of intronic
+and intergenic regions and uses machine learning to accurately distinguish
+orthologs from paralogs or processed pseudogenes.
+</p>
+<p>
+To annotate genes,
+<a href="https://academic.oup.com/bioinformatics/article/33/24/3985/4095639"
+target="_blank">CESAR 2.0</a>
+is used to determine the positions and boundaries of coding exons of a
+reference transcript in the orthologous genomic locus in the query species.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+Each annotated transcript is color-coded according to its classification:
+<ul>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:blue;'>&nbsp;</span>
+    <span style='color:blue'>"intact"</span>: middle 80% of the CDS
+    (coding sequence) is present and exhibits no gene-inactivating mutation.
+    These transcripts likely encode functional proteins.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:lightblue;'>&nbsp;</span>
+    <span style='color:#7193a0'>"partially intact"</span>: at least 50% of the CDS
+     is present in the query and the middle 80% of the CDS exhibits no
+     inactivating mutation. These transcripts may also encode functional
+     proteins, but the evidence is weaker as parts of the CDS are missing,
+     often due to assembly gaps.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:grey;'>&nbsp;</span>
+    <span style='color:grey'>"missing"</span>: &lt;50% of the CDS is present
+     in the query and the middle 80% of the CDS exhibits no inactivating
+     mutation.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:orange;'>&nbsp;</span>
+    <span style='color:orange'>"uncertain loss"</span>: there is 1
+     inactivating mutation in the middle 80% of the CDS, but evidence is not
+     strong enough to classify the transcript as lost. These transcripts may
+     or may not encode a functional protein.</li>
+<li><span style='display:inline-block; width:40px; height:15px; background-color:red;'>&nbsp;</span>
+    <span style='color:red'>"lost"</span>: typically several inactivating
+     mutations are present, thus there is strong evidence that the transcript
+     is unlikely to encode a functional protein.</li>
+</ul>
+</p>
+<p>
+Clicking on a transcript provides additional information about the orthology
+classification, inactivating mutations, the protein sequence and protein/exon
+alignments.
+</p>
+
+<h2>Credits</h2>
+<p>
+This data was prepared by the <a href="https://tbg.senckenberg.de/hillerlab/"
+target="_blank">Michael Hiller Lab</a>.
+</p>
+
+<h2>References</h2>
+<p>
+The <b>TOGA</b> software is available from
+<a href="https://github.com/hillerlab/TOGA"
+target="_blank">github.com/hillerlab/TOGA</a>.
+</p>
+
+<p>
+Kirilenko BM, Munegowda C, Osipova E, Jebb D, Sharma V, Blumer M, Morales AE, Ahmed AW, Kontopoulos
+DG, Hilgers L <em>et al</em>.
+<a href="https://www.science.org/doi/abs/10.1126/science.abn3107?url_ver=Z39.88-2003&amp;rfr_id=ori:rid:crossref.org&amp;rfr_dat=cr_pub%20%200pubmed"
+target="_blank">
+Integrating gene annotation with orthology inference at scale</a>.
+<em>Science</em>. 2023 Apr 28;380(6643):eabn3107.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37104600" target="_blank">37104600</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10193443/" target="_blank">PMC10193443</a>
+</p>