36c22bace06f07f8d4f59ef8a18659e926f9d410
jcasper
  Thu Apr 10 10:30:30 2025 -0700
Gencode V48 knownGene trackDb

diff --git src/hg/makeDb/trackDb/human/hg38/knownGeneV48.html src/hg/makeDb/trackDb/human/hg38/knownGeneV48.html
new file mode 100644
index 00000000000..4e1462b1675
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hg38/knownGeneV48.html
@@ -0,0 +1,168 @@
+<h2>Description</h2>
+<p>
+The GENCODE Genes track (version 48, April 2025) shows high-quality manual
+annotations merged with evidence-based automated annotations across the entire
+human genome generated by the
+<a href="https://www.gencodegenes.org/" target="_blank">GENCODE project</a>.
+By default, only the basic gene set is
+displayed, which is a subset of the comprehensive gene set. The basic set represents transcripts
+that GENCODE believes will be useful to the majority of users.</p>
+
+<p>
+The track includes protein-coding genes, non-coding RNA genes, and pseudogenes, though pseudogenes
+are not displayed by default. It contains annotations on the reference chromosomes as well as
+assembly patches and alternative loci (haplotypes).</p>
+
+<p>
+The v48 release was derived from the GTF file that contains annotations only on the main
+chromosomes. Statistics for this build and information on how they were generated can be found on
+the <a target="_blank"
+href="https://www.gencodegenes.org/human/stats_48.html">GENCODE site</a>.</p>
+
+<p>
+For more information on the different gene tracks, see our <a target="_blank"
+href="/FAQ/FAQgenes.html">Genes FAQ</a>.</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+By default, this track displays only the basic GENCODE set, splice variants, and non-coding genes.
+It includes options to display the entire GENCODE set and pseudogenes. To customize these
+options, check or uncheck the respective boxes at the top of this description page.</p>
+
+<p>
+This track also includes a variety of labels which identify the transcripts when visibility is set
+to &quot;full&quot; or &quot;pack&quot;. Gene symbols (e.g. NIPA1) are displayed by default, but
+additional options include GENCODE Transcript ID (ENST00000561183.5), UCSC Known Gene ID
+(uc001yve.4), and UniProt Display ID (Q7RTP0). Additional information about gene
+and transcript names can be found in our
+<a target="_blank" href="/FAQ/FAQgenes.html#genename">FAQ</a>.</p>
+
+<p>
+This track, in general, follows the display conventions for <a target="_blank"
+href="../goldenPath/help/hgTracksHelp.html#GeneDisplay">gene prediction tracks</a>. The exons for
+putative non-coding genes and untranslated regions are represented by relatively thin blocks, while
+those for coding open reading frames are thicker.</p>
+<p><b>Coloring</b> for the gene annotations is mostly based on the annotation type:</p>
+<ul>
+  <li><font color="#0C6DAD"><b>MANE</b></font>: MANE Select Plus Clinical transcripts.
+       For non-MANE transcripts, the following conventions apply.
+  <li><font color="#0c0c78"><b>coding</b></font>: protein coding transcripts, including polymorphic
+       pseudogenes
+  <li><font color="#006400"><b>non-coding</b></font>: non-protein coding transcripts
+  <li><font color="#ff33ff"><b>pseudogene</b></font>: pseudogene transcript annotations
+  <li><font color="#fe0000"><b>problem</b></font>: problem transcripts (biotypes of
+       retained_intron, TEC, or disrupted_domain)</li>
+</ul>
+
+<p>
+This track contains an optional <a target="_blank"
+href="../goldenPath/help/hgCodonColoring.html">codon coloring feature</a> that allows users to
+quickly validate and compare gene predictions. There is also an option to display the data as
+a <a target="_blank" href="../goldenPath/help/hgWiggleTrackHelp.html">density graph</a>, which
+can be helpful for visualizing the distribution of items over a region.</p>
+
+<a name="squishyPack"></a>
+<h3>Squishy-pack Display</h3>
+<p>
+Within a gene using the <b>pack</b> display mode, transcripts below a specified rank will be
+condensed into a view similar to <b>squish</b> mode. The <b>transcript ranking</b> approach is
+preliminary and will change in future releases. The transcript rankings are defined by the
+following criteria for protein-coding and non-coding genes:</p>
+<b>Protein-coding genes</b>
+<ol>
+  <li>MANE or Ensembl canonical
+    <ul>
+      <li>1st: MANE Select / Ensembl canonical</li>
+      <li>2nd: MANE Plus Clinical</li>
+    </ul>
+  </li>
+  <li>Coding biotypes
+    <ul>
+      <li>1st: protein_coding and protein_coding_LoF</li>
+      <li>2nd: NMDs and NSDs</li>
+      <li>3rd: retained intron and protein_coding_CDS_not_defined</li>
+    </ul>
+  </li>
+  <li>Completeness
+    <ul>
+      <li>1st: full length</li>
+      <li>2nd: CDS start/end not found</li>
+    </ul>
+  </li>
+  <li>CARS score (only for coding transcripts)</li>
+  <li>Transcript genomic span and length (only for non-coding transcripts)</li>
+</ol>
+<b>Non-coding genes</b>
+<ol>
+  <li>Transcript biotype
+    <ul>
+      <li>1st: transcript biotype identical to gene biotype</li>
+    </ul>
+  </li>
+  <li>Ensembl canonical</li>
+  <li>GENCODE basic</li>
+  <li>Transcript genomic span</li>
+  <li>Transcript length</li>
+</ol>
+
+
+<h2>Methods</h2>
+<p>
+The GENCODE v48 track was built from the <a href="https://www.gencodegenes.org/human/"
+target="_blank">GENCODE downloads</a> file 
+<code>gencode.v48.chr_patch_hapl_scaff.annotation.gff3.gz</code>. Data from other sources
+were correlated with the GENCODE data to build association tables.</p>
+
+<h2>Related Data</h2>
+<p>
+The GENCODE Genes transcripts are annotated in numerous tables, each of which is also available as a
+<a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/" target="_blank">downloadable
+file</a>.</p>
+
+<p>
+To see a full list of the associated tables, open the <a href="/cgi-bin/hgTables"
+target="_blank">Table Browser</a> and select GENCODE Genes from the <b>track</b> menu; the list
+is then available on the <b>table</b> menu.
+</p>
+
+<h2>Data Access</h2>
+<p>
+GENCODE Genes and its associated tables can be explored interactively using the
+<a href="../goldenPath/help/api.html" target="_blank">REST API</a>, the
+<a href="/cgi-bin/hgTables" target="_blank">Table Browser</a>, or the
+<a href="/cgi-bin/hgIntegrator" target="_blank">Data Integrator</a>. 
+The genePred format files for hg38 are available from our 
+<a target="_blank" href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/">
+downloads directory</a> or in our
+<a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/" target="_blank">
+GTF download directory</a>. 
+All the tables can also be queried directly from our public MySQL
+servers, with more information available on our
+<a target="_blank" href="/goldenPath/help/mysql.html">help page</a> as well as on
+<a target="_blank" href="http://genome.ucsc.edu/blog/tag/mysql/">our blog</a>.</p>
+
+<h2>Credits</h2>
+<p>
+The GENCODE Genes track was produced at UCSC from the GENCODE comprehensive gene set using a
+computational pipeline developed by Jim Kent and Brian Raney.  This version of the track was
+generated by Jonathan Casper.</p>
+
+<h2>References</h2>
+
+<p>
+Mudge JM, Carbonell-Sala S, Diekhans M, Martinez JG, Hunt T, Jungreis I, Loveland JE, Arnan C,
+Barnes I, Bennett R <em>et al</em>.
+<a href="https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkae1078" target="_blank">
+GENCODE 2025: reference gene annotation for human and mouse</a>.
+<em>Nucleic Acids Res</em>. 2025 Jan 6;53(D1):D966-D975.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39565199" target="_blank">39565199</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11701607/" target="_blank">PMC11701607</a>
+</p>
+
+<p>A full list of GENCODE publications is available
+at <a href="https://www.gencodegenes.org/pages/publications.html" target="_blank">The GENCODE
+Project web site</a>.
+</p>
+
+<h2>Data Release Policy</h2>
+<p>GENCODE data are available for use without restrictions.</p>