75711ca6eda92d718879304fde2251bcdedd8534 jcasper Sat Apr 12 17:18:29 2025 -0700 V48lift37 knownGene trackDb, refs #35187 diff --git src/hg/makeDb/trackDb/human/hg19/knownGeneV48lift37.html src/hg/makeDb/trackDb/human/hg19/knownGeneV48lift37.html new file mode 100644 index 00000000000..d5b6260d9ca --- /dev/null +++ src/hg/makeDb/trackDb/human/hg19/knownGeneV48lift37.html @@ -0,0 +1,167 @@ +<h2>Description</h2> +<p> +The GENCODE Genes track (version 48, April 2025) shows high-quality manual +annotations merged with evidence-based automated annotations across the entire +human genome generated by the +<a href="https://www.gencodegenes.org/" target="_blank">GENCODE project</a>. +By default, only the basic gene set is +displayed, which is a subset of the comprehensive gene set. The basic set represents transcripts +that GENCODE believes will be useful to the majority of users.</p> + +<p> +The track includes protein-coding genes, non-coding RNA genes, and pseudogenes, though pseudogenes +are not displayed by default. It contains annotations on the reference chromosomes as well as +assembly patches and alternative loci (haplotypes).</p> + +<p> +Statistics for the v48 release can be found on the +<a target="_blank" href="https://www.gencodegenes.org/human/stats_48.html">GENCODE site</a> for this build.</p> + +<p> +For more information on the different gene tracks, see our <a target="_blank" +href="/FAQ/FAQgenes.html">Genes FAQ</a>.</p> + +<h2>Display Conventions and Configuration</h2> +<p> +By default, this track displays only the basic GENCODE set, splice variants, and non-coding genes. +It includes options to display the entire GENCODE set and pseudogenes. To customize these +options, the respective boxes can be checked or unchecked at the top of this description page.</p> + +<p> +This track also includes a variety of labels which identify the transcripts when visibility is set +to "full" or "pack". Gene symbols (e.g. 
NIPA1) are displayed by default, but +additional options include GENCODE Transcript ID (ENST00000561183.5), UCSC Known Gene ID +(uc001yve.4), and UniProt Display ID (Q7RTP0). Additional information about gene +and transcript names can be found in our +<a target="_blank" href="/FAQ/FAQgenes.html#genename">FAQ</a>.</p> + +<p> +This track, in general, follows the display conventions for <a target="_blank" +href="../goldenPath/help/hgTracksHelp.html#GeneDisplay">gene prediction tracks</a>. The exons for +putative non-coding genes and untranslated regions are represented by relatively thin blocks, while +those for coding open reading frames are thicker.</p> +<p><b>Coloring</b> for the gene annotations is based on the annotation type: </p> +<ul> + <li><font color="#0c0c78"><b>coding</b></font>: protein-coding transcripts, including polymorphic + pseudogenes</li> + <li><font color="#006400"><b>non-coding</b></font>: non-protein coding transcripts</li> + <li><font color="#ff33ff"><b>pseudogene</b></font>: pseudogene transcript annotations</li> + <li><font color="#fe0000"><b>problem</b></font>: problem transcripts (biotypes of + retained_intron, TEC, or disrupted_domain)</li> +</ul> + +<p> +This track contains an optional <a target="_blank" +href="../goldenPath/help/hgCodonColoring.html">codon coloring feature</a> that allows users to +quickly validate and compare gene predictions. There is also an option to display the data as +a <a target="_blank" href="../goldenPath/help/hgWiggleTrackHelp.html">density graph</a>, which +can be helpful for visualizing the distribution of items over a region.</p> + +<a name="squishyPack"></a> +<h3>Squishy-pack Display</h3> +<p> +Within a gene using the <b>pack</b> display mode, transcripts below a specified rank will be +condensed into a view similar to <b>squish</b> mode. The <b>transcript ranking</b> approach is +preliminary and will change in future releases. 
The transcript rankings are defined by the +following criteria for protein-coding and non-coding genes:</p> +<b>Protein-coding genes</b> +<ol> + <li>MANE or Ensembl canonical + <ul> + <li>1st: MANE Select / Ensembl canonical</li> + <li>2nd: MANE Plus Clinical</li> + </ul> + </li> + <li>Coding biotypes + <ul> + <li>1st: protein_coding and protein_coding_LoF</li> + <li>2nd: NMDs and NSDs</li> + <li>3rd: retained intron and protein_coding_CDS_not_defined</li> + </ul> + </li> + <li>Completeness + <ul> + <li>1st: full length</li> + <li>2nd: CDS start/end not found</li> + </ul> + </li> + <li>CARS score (only for coding transcripts)</li> + <li>Transcript genomic span and length (only for non-coding transcripts)</li> +</ol> +<b>Non-coding genes</b> +<ol> + <li>Transcript biotype + <ul> + <li>1st: transcript biotype identical to gene biotype</li> + </ul> + </li> + <li>Ensembl canonical</li> + <li>GENCODE basic</li> + <li>Transcript genomic span</li> + <li>Transcript length</li> +</ol> + + +<h2>Methods</h2> +<p> +The GENCODE v48 track was built from the <a href="https://www.gencodegenes.org/human/" +target="_blank">GENCODE downloads</a> file +<code>gencode.v48.chr_patch_hapl_scaff.annotation.gff3.gz</code>. Data from other sources +were correlated with the GENCODE data to build association tables. The lift to GRCh37/hg19 +made use of the lift mechanism described +<a href="ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/_README_GRCh37_mapping.txt" +target="_blank">here</a>.</p> + +<h2>Related Data</h2> +<p> +The GENCODE Genes transcripts are annotated in numerous tables, each of which is also available as a +<a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/" target="_blank">downloadable +file</a>.</p> + +<p> +One can see a full list of the associated tables in the <a href="/cgi-bin/hgTables" +target="_blank">Table Browser</a> by selecting GENCODE Genes from the <b>track</b> menu; this list +is then available on the <b>table</b> menu. 
+</p> + +<h2>Data Access</h2> +<p> +GENCODE Genes and its associated tables can be explored interactively using the +<a href="../goldenPath/help/api.html" target="_blank">REST API</a>, the +<a href="/cgi-bin/hgTables" target="_blank">Table Browser</a> or the +<a href="/cgi-bin/hgIntegrator" target="_blank">Data Integrator</a>. +The genePred format files for hg38 are available from our +<a target="_blank" href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/database/"> +downloads directory</a> or in our +<a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/genes/" target="_blank"> +GTF download directory</a>. +All the tables can also be queried directly from our public MySQL +servers, with more information available on our +<a target="_blank" href="/goldenPath/help/mysql.html">help page</a> as well as on +<a target="_blank" href="http://genome.ucsc.edu/blog/tag/mysql/">our blog</a>.</p> + +<h2>Credits</h2> +<p> +The GENCODE Genes track was produced at UCSC from the GENCODE comprehensive gene set using a +computational pipeline developed by Jim Kent and Brian Raney. This version of the track was +generated by Jonathan Casper.</p> + +<h2>References</h2> + +<p> +Mudge JM, Carbonell-Sala S, Diekhans M, Martinez JG, Hunt T, Jungreis I, Loveland JE, Arnan C, +Barnes I, Bennett R <em>et al</em>. +<a href="https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkae1078" target="_blank"> +GENCODE 2025: reference gene annotation for human and mouse</a>. +<em>Nucleic Acids Res</em>. 2025 Jan 6;53(D1):D966-D975. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39565199" target="_blank">39565199</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11701607/" target="_blank">PMC11701607</a> +</p> + +<p>A full list of GENCODE publications is available +at <a href="https://www.gencodegenes.org/pages/publications.html" target="_blank">The GENCODE +Project web site</a>. 
+</p> + +<h2>Data Release Policy</h2> +<p>GENCODE data are available for use without restrictions.</p>