e863249eae18325e041c2948722b98cb5a5643e4 markd Fri Nov 4 14:47:29 2022 -0700 Dropped GENCODE 2-way pseudogenes from latest releases. These should have never been included diff --git src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html index 7cf2a6a..e86dba6 100644 --- src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html +++ src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html @@ -8,46 +8,30 @@ <b>Views</b> available on this track are: <dl> <dt><i>Genes</i></dt> <dd> The gene annotations in this view are divided into three subtracks:</dd> </dl> <ul> <li><em>GENCODE Basic set</em> is a subset of the <em>Comprehensive set</em>. The selection criteria are described in the <a href="#basicSetSelection">methods section</a>.</li> <li><em>GENCODE Comprehensive set</em> contains all GENCODE coding and non-coding transcript annotations, including polymorphic pseudogenes. This includes both manual and automatic annotations. This is a super-set of the <em>Basic set</em>.</li> <li><em>GENCODE Pseudogenes</em> include all annotations except polymorphic pseudogenes.</li> </ul> <dl> - <dt><i>2-way</i></dt> -</dl> -<ul> - <li><em>GENCODE 2-way Pseudogenes</em> contains pseudogenes predicted by both the - <a href="https://academic.oup.com/bioinformatics/article-abstract/22/12/1437/207326">Yale - PseudoPipe</a> and - <a href="https://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-9-466"> - UCSC RetroFinder</a> pipelines. The set was derived by looking for 50 base pairs - of overlap between pseudogenes derived from both sets based on their - chromosomal coordinates. When multiple PseudoPipe - predictions map to a single RetroFinder prediction, only one match is kept - for the 2-way consensus set. - </li> -</ul> - -<dl> <dt><i>PolyA</i></dt> </dl> <ul> <li><em>GENCODE PolyA</em> contains polyA signals and sites manually annotated on the genome based on transcribed evidence (ESTs and cDNAs) of 3' end of transcripts containing at least 3 A's not matching the genome.</li> </ul> <p> <b>Maximum number of transcripts to display</b> is available for the items in the GENCODE Basic, Comprehensive and Pseudogene tracks. Starting with the GENCODE human V42 and mouse VM31 releases, transcripts are assigned rank within the gene. The ranks may be used to filter the number of transcripts displayed in a principled manner. Transcript ranking is not available in the <em>lift37</em> releases. See <a href="#Methods">Methods</a> for details of rank assignment. @@ -79,31 +63,30 @@ <li> automatic_only - display automatically created annotations that were not annotated by the manual method</li> </ul> </li> <li> Transcript Biotype: filter transcripts by <a href="https://www.gencodegenes.org/pages/biotypes.html" target="_blank">Biotype</a></li> <li> Support Level: filter transcripts by <a href="#tsl">transcription support level</a></li> </ul> <p><b>Coloring</b> for the gene annotations is based on the annotation type: </p> <ul> <li><font color="#0c0c78"><b>coding</b></font> <li><font color="#006400"><b>non-coding</b></font> <li><font color="#ff33ff"><b>pseudogene</b></font> <li><font color="#fe0000"><b>problem</b></font> - <li><font color="#ff33ff"><b>all 2-way pseudogenes</b></font> <li><font color="#000000"><b>all polyA annotations</b></font> </ul> <h2 id="Methods">Methods</h2> <p> The GENCODE project aims to annotate all evidence-based gene features on the human and mouse reference sequence with high accuracy by integrating computational approaches (including comparative methods), manual annotation and targeted experimental verification. This goal includes identifying all protein-coding loci with associated alternative variants, non-coding loci which have transcript evidence, and pseudogenes. For a detailed description of the methods and references used, see Harrow <em>et al.</em> (2006). </p>