194efd4a95a8ac1704d5283bfd70fb616002bed8
dschmelt
Wed Jul 10 17:13:21 2019 -0700
Making html edits for gencode VM21 #23792
diff --git src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
index 7acb3ff..cc5b884 100644
--- src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
+++ src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
@@ -11,36 +11,38 @@
 <dd> The gene annotations in this view are divided into three subtracks:</dd>
 </dl>
 <ul>
 <li><em>GENCODE Basic set</em> is a subset of the <em>Comprehensive set</em>.
 The selection criteria are described in the <a href="#basicSetSelection">methods section</a>.</li>
 <li><em>GENCODE Comprehensive set</em> contains all GENCODE coding and non-coding transcript annotations,
 including polymorphic pseudogenes. This includes both manual and automatic annotations.
 This is a super-set of the <em>Basic set</em>.</li>
 <li><em>GENCODE Pseudogenes</em> include all annotations except polymorphic pseudogenes.</li>
 </ul>
 <dl>
 <dt><i>2-way</i></dt>
 </dl>
 <ul>
- <li><em>GENCODE 2-way Pseudogenes</em> contains pseudogenes predicted by both the Yale
- Pseudopipe and UCSC Retrofinder pipelines.
- The set was derived by looking for 50 base pairs
+ <li><em>GENCODE 2-way Pseudogenes</em> contains pseudogenes predicted by both the
+ <a href="https://academic.oup.com/bioinformatics/article-abstract/22/12/1437/207326">Yale
+ PseudoPipe</a> and
+ <a href="https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-9-466">
+ UCSC RetroFinder</a> pipelines. The set was derived by looking for 50 base pairs
 of overlap between pseudogenes derived from both sets based on their
- chromosomal coordinates. When multiple Pseudopipe
- predictions map to a single Retrofinder prediction, only one match is kept
+ chromosomal coordinates. When multiple PseudoPipe
+ predictions map to a single RetroFinder prediction, only one match is kept
 for the 2-way consensus set.
 </li>
 </ul>
 <dl>
 <dt><i>PolyA</i></dt>
 </dl>
 <ul>
 <li><em>GENCODE PolyA</em> contains polyA signals and sites manually annotated on
 the genome based on transcribed evidence (ESTs and cDNAs) of 3' end of transcripts
 containing at least 3 A's not matching the genome.</li>
 </ul>
 <p><b>Filtering</b> is available for the items in the GENCODE Basic, Comprehensive and Pseudogene
 tracks
@@ -59,31 +61,31 @@
 <li> Transcript Annotation Method: filter by the method used to create the annotation
 <ul>
 <li> All - don't filter by transcript class</li>
 <li> manual - display manually created annotations, including those that are also created
 automatically</li>
 <li> automatic - display automatically created annotations, including those that are also created
 manually</li>
 <li> manual_only - display manually created annotations that were not annotated by the automatic
 method</li>
 <li> automatic_only - display automatically created annotations that were not annotated by the manual
 method</li>
 </ul>
 </li>
 <li> Transcript Biotype: filter transcripts by
- <a href="http://www.gencodegenes.org/gencode_biotypes.html" target="_blank">biotype</a></li>
+ <a href="https://www.gencodegenes.org/pages/biotypes.html" target="_blank">Biotype</a></li>
 <li> Support Level: filter transcripts by <a href="#tsl">transcription support level</a></li>
 </ul>
 <p><b>Coloring</b> for the gene annotations is based on the annotation type: </p>
 <ul>
 <li><font color="#0c0c78"><b>coding</b></font>
 <li><font color="#006400"><b>non-coding</b></font>
 <li><font color="#ff33ff"><b>pseudogene</b></font>
 <li><font color="#fe0000"><b>problem</b></font>
 <li><font color="#ff33ff"><b>all 2-way pseudogenes</b></font>
 <li><font color="#000000"><b>all polyA annotations</b></font>
 </ul>
 <h2>Methods</h2>
@@ -100,53 +102,53 @@
 <p>
 <b><a name="basicSetSelection">GENCODE <em>Basic Set</em> selection:</a></b>
 The GENCODE <em>Basic Set</em> is intended to provide a simplified subset of
 the GENCODE transcript annotations that will be useful
 to the majority of users. The goal was to have a high-quality basic set that also covered all loci.
 Selection of GENCODE annotations for inclusion in the <em>basic set</em>
 was determined independently for the coding and non-coding transcripts at each
 gene locus.
 </p>
 <ul>
 <li> Criteria for selection of coding transcripts (including polymorphic pseudogenes) at a given locus:
 <ul>
 <li> All full-length coding transcripts (except problem transcripts or transcripts that are
- nonsense-mediated decay) was included in the basic set.</li>
+ nonsense-mediated decay) were included in the basic set.</li>
 <li> If there were no transcripts meeting the above criteria, then the partial coding
 transcript with the largest CDS was included in the basic set (excluding problem transcripts).</li>
 </ul>
 </li>
 <li> Criteria for selection of non-coding transcripts at a given locus:
 <ul>
 <li> All full-length non-coding transcripts (except problem transcripts)
- with a well characterized biotype (see below) were included in the
+ with a well characterized Biotype (see below) were included in the
 basic set.</li>
 <li> If there were no transcripts meeting the above criteria, then the largest non-coding
 transcript was included in the basic set (excluding problem transcripts).</li>
 </ul>
 </li>
- <li> If no transcripts were included by either the above criteria, the longest
+ <li> If no transcripts were included by either of the above criteria, the longest
 problem transcript is included.
 </li>
 </ul>
 <P>
 <b>Non-coding transcript categorization:</b>
 Non-coding transcripts are categorized using
-their <a href="http://www.gencodegenes.org/gencode_biotypes.html" target="_blank">biotype</a>
+their <a href="http://www.gencodegenes.org/gencode_biotypes.html" target="_blank">Biotype</a>
 and the following criteria:
 </p>
 <ul>
 <li> well characterized: <em>antisense, Mt_rRNA, Mt_tRNA, miRNA, rRNA, snRNA, snoRNA</em></li>
 <li> poorly characterized: <em>3prime_overlapping_ncrna, lincRNA, misc_RNA, non_coding,
 processed_transcript, sense_intronic, sense_overlapping</em></li>
 </ul>
 <p>
 <b><a name="tsl">Transcription Support Level (TSL):</a></b>
 It is important that users understand how to assess transcript annotations
 that they see in GENCODE. While some transcript models have a high level of
 support through the full length of their exon structure, there are also
 transcripts that are poorly supported and that should be considered
 speculative. The Transcription Support Level (TSL) is a method to highlight the
 well-supported and poorly-supported transcript models for users. The method