src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html cef070ed6c72e0247012855fa88e3e82144a7a9b

cef070ed6c72e0247012855fa88e3e82144a7a9b
markd
  Fri Nov 4 13:45:47 2022 -0700
added all gencode transcript ranks to the track descriptions

diff --git src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
index d1999be..7cf2a6a 100644
--- src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
+++ src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
@@ -32,30 +32,38 @@
         chromosomal coordinates.  When multiple PseudoPipe
         predictions map to a single RetroFinder prediction, only one match is kept
         for the 2-way consensus set.
     </li>
 </ul>
 
 <dl>
     <dt><i>PolyA</i></dt>
 </dl>
 <ul>
 <li><em>GENCODE PolyA</em> contains polyA signals and sites manually annotated on
     the genome based on transcribed evidence (ESTs and cDNAs) of 3' end of
     transcripts containing at least 3 A's not matching the genome.</li>
 </ul>
 
+<p>
+<b>Maximum number of transcripts to display</b>
+is available for the items in the GENCODE Basic, Comprehensive and Pseudogene tracks.
+Starting with the GENCODE human V42 and mouse VM31 releases, 
+transcripts are assigned a rank within their gene. The ranks may be used to limit the number of transcripts
+displayed in a principled manner.  Transcript ranking is not available in the <em>lift37</em> releases.
+See <a href="#Methods">Methods</a> for details of rank assignment.
+</p>
 
 <p><b>Filtering</b> is available for the items in the GENCODE Basic, Comprehensive and Pseudogene tracks
 using the following criteria:</p>
 <ul>
   <li> Transcript class: filter by the basic biological function of a transcript
     annotation
     <ul>
      <li> All - don't filter by transcript class</li>
      <li> coding - display protein coding transcripts, including polymorphic pseudogenes</li>
      <li> nonCoding - display non-protein coding transcripts</li>
      <li> pseudo - display pseudogene transcript annotations</li>
      <li> problem - display problem transcripts (Biotypes of <em>retained_intron</em>, <em>TEC</em>, or <em>disrupted_domain</em>)
    </ul>
   </li>
 
@@ -75,31 +83,31 @@
   <li> Transcript Biotype: filter transcripts by
        <a href="https://www.gencodegenes.org/pages/biotypes.html" target="_blank">Biotype</a></li>
   <li> Support Level: filter transcripts by <a href="#tsl">transcription support level</a></li>
 </ul>
 
 <p><b>Coloring</b> for the gene annotations is based on the annotation type: </p>
 <ul>
   <li><font color="#0c0c78"><b>coding</b></font> 
   <li><font color="#006400"><b>non-coding</b></font> 
   <li><font color="#ff33ff"><b>pseudogene</b></font> 
   <li><font color="#fe0000"><b>problem</b></font>
   <li><font color="#ff33ff"><b>all 2-way pseudogenes</b></font>
   <li><font color="#000000"><b>all polyA annotations</b></font>
 </ul>
 
-<h2>Methods</h2>
+<h2 id="Methods">Methods</h2>
 
 <p>
 The GENCODE project aims to annotate all evidence-based gene features on the 
 human and mouse reference sequence with high accuracy by integrating 
 computational approaches (including comparative methods), manual
 annotation and targeted experimental verification. This goal includes identifying 
 all protein-coding loci with associated alternative variants, non-coding
 loci which have transcript evidence, and pseudogenes. 
 For a detailed description of the methods and references used, see
 Harrow <em>et al.</em> (2006).
 </p>
 
 <p>
 <b><a name="basicSetSelection">GENCODE <em>Basic Set</em> selection:</a></b>
 The GENCODE <em>Basic Set</em> is intended to provide a simplified subset of
@@ -132,32 +140,65 @@
     problem transcript is included.
   </li>
 </ul>
 
 <P>
 <b>Non-coding transcript categorization:</b> 
 Non-coding transcripts are categorized using
 their <a href="https://www.gencodegenes.org/gencode_biotypes.html" target="_blank">Biotype</a>
 and the following criteria:
 </p>
 <ul>
   <li> well characterized: <em>antisense, Mt_rRNA, Mt_tRNA, miRNA, rRNA, snRNA, snoRNA</em></li>
   <li> poorly characterized: <em>3prime_overlapping_ncrna, lincRNA, misc_RNA, non_coding, processed_transcript, sense_intronic, sense_overlapping</em></li>
 </ul>
 
+<p><b>Transcript ranking:</b>
+Within each gene, transcripts have been ranked according to the 
+following criteria.  The ranking approach is preliminary and will
+change in future releases.
+</p>
+
+<ul>
+  <li> Protein_coding genes
+    <ol>
+      <li> MANE or Ensembl canonical<br>
+        -1st: MANE Select / Ensembl canonical<br>
+        -2nd: MANE Plus Clinical<br>
+      <li>Coding biotypes<br>
+        -1st: protein_coding and protein_coding_LoF<br>
+        -2nd: NMDs and NSDs<br>
+        -3rd: retained intron and protein_coding_CDS_not_defined<br>
+      <li>Completeness<br>
+        -1st: full length<br>
+        -2nd: CDS start/end not found<br>
+      <li> CARS score (only for coding transcripts)<br>
+      <li> Transcript genomic span and length (only for non-coding transcripts)<br>
+    </ol>
+<li> Non-coding genes
+  <ol>
+    <li> Transcript biotype<br>
+      -1st: transcript biotype identical to gene biotype
+    <li> Ensembl canonical
+    <li> GENCODE basic
+    <li> Transcript genomic span
+    <li> Transcript length
+  </ol>
+</ul>
+
 <p>
-<b><a name="tsl">Transcription Support Level (TSL):</a></b>
+<a name="tsl"><b>Transcription Support Level (TSL):</b></a>
 It is important that users understand how to assess transcript annotations
 that they see in GENCODE. While some transcript models have a high level of
 support through the full length of their exon structure, there are also
 transcripts that are poorly supported and that should be considered
 speculative. The Transcription Support Level (TSL) is a method to highlight the
 well-supported and poorly-supported transcript models for users. The method
 relies on the primary data that can support full-length transcript
 structure: mRNA and EST alignments supplied by UCSC and Ensembl.</p>
 
 <p>The mRNA and EST alignments are compared to the GENCODE transcripts and the
 transcripts are scored according to how well the alignment matches over its
 full length. 
 The GENCODE TSL provides a consistent method of evaluating the
 level of support that a GENCODE transcript annotation is
 actually expressed in mouse.  Mouse transcript sequences from the