194efd4a95a8ac1704d5283bfd70fb616002bed8
dschmelt
Wed Jul 10 17:13:21 2019 -0700
Making html edits for gencode VM21 #23792

diff --git src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
index 7acb3ff..cc5b884 100644
--- src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
+++ src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
@@ -11,36 +11,38 @@

The gene annotations in this view are divided into three subtracks:

GENCODE Basic set is a subset of the Comprehensive set. The selection criteria are described in the methods section.
GENCODE Comprehensive set contains all GENCODE coding and non-coding transcript annotations, including polymorphic pseudogenes. This includes both manual and automatic annotations. This is a super-set of the Basic set.
GENCODE Pseudogenes include all annotations except polymorphic pseudogenes.

2-way

- GENCODE 2-way Pseudogenes contains pseudogenes predicted by both the Yale
- Pseudopipe and UCSC Retrofinder pipelines.
- The set was derived by looking for 50 base pairs
+ GENCODE 2-way Pseudogenes contains pseudogenes predicted by both the
+ Yale
+ PseudoPipe and
+
+ UCSC RetroFinder pipelines. The set was derived by looking for 50 base pairs
of overlap between pseudogenes derived from both sets based on their
- chromosomal coordinates. When multiple Pseudopipe
- predictions map to a single Retrofinder prediction, only one match is kept
+ chromosomal coordinates. When multiple PseudoPipe
+ predictions map to a single RetroFinder prediction, only one match is kept
for the 2-way consensus set.
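
The 2-way consensus rule described above (50 base pairs of coordinate overlap, one match kept per RetroFinder prediction) can be sketched roughly as follows. This is an illustration only, not the pipeline's actual code: the `Interval` type, the function names, the 0-based half-open coordinates, reading "50 base pairs" as "at least 50", and the tie-break of keeping the PseudoPipe prediction with the largest overlap are all assumptions. The track description only states that one match is kept.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    """Hypothetical prediction record; coordinates are 0-based, half-open."""
    chrom: str
    start: int
    end: int
    name: str


MIN_OVERLAP_BP = 50  # threshold named in the track description


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of bases shared by two intervals (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def two_way_consensus(
    pseudopipe: List[Interval],
    retrofinder: List[Interval],
    min_overlap: int = MIN_OVERLAP_BP,
) -> List[Tuple[Interval, Interval]]:
    """Return at most one (PseudoPipe, RetroFinder) pair per RetroFinder prediction."""
    pairs = []
    for retro in retrofinder:
        candidates = [p for p in pseudopipe if overlap_bp(p, retro) >= min_overlap]
        if candidates:
            # Assumption: keep the candidate with the largest overlap.
            best = max(candidates, key=lambda p: overlap_bp(p, retro))
            pairs.append((best, retro))
    return pairs
```

The quadratic scan is kept for readability; a real implementation over whole-genome prediction sets would more likely sort the intervals or use an interval index.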

PolyA

GENCODE PolyA contains polyA signals and sites manually annotated on the genome based on transcribed evidence (ESTs and cDNAs) of 3' end of transcripts containing at least 3 A's not matching the genome.

Filtering is available for the items in the GENCODE Basic, Comprehensive and Pseudogene tracks
@@ -59,31 +61,31 @@

Transcript Annotation Method: filter by the method used to create the annotation

All - don't filter by transcript class
manual - display manually created annotations, including those that are also created automatically
automatic - display automatically created annotations, including those that are also created manually
manual_only - display manually created annotations that were not annotated by the automatic method
automatic_only - display automatically created annotations that were not annotated by the manual method
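
The five Transcript Annotation Method options listed above can be read as a small truth table over two flags per transcript. The sketch below assumes each transcript carries hypothetical `is_manual` and `is_automatic` booleans; the function name and data model are illustrative and are not taken from the browser's source.

```python
def passes_method_filter(is_manual: bool, is_automatic: bool, choice: str) -> bool:
    """Return True if a transcript should be displayed under the given filter choice."""
    rules = {
        "All": True,  # no filtering
        "manual": is_manual,  # includes transcripts that are also automatic
        "automatic": is_automatic,  # includes transcripts that are also manual
        "manual_only": is_manual and not is_automatic,
        "automatic_only": is_automatic and not is_manual,
    }
    if choice not in rules:
        raise ValueError(f"unknown filter choice: {choice!r}")
    return rules[choice]
```

Under this reading, a transcript annotated by both methods is shown for `manual` and `automatic` but hidden for `manual_only` and `automatic_only`.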

Transcript Biotype: filter transcripts by
- biotype
+ Biotype

Support Level: filter transcripts by transcription support level

Coloring for the gene annotations is based on the annotation type:

coding
non-coding
pseudogene
problem
all 2-way pseudogenes
all polyA annotations

Methods

@@ -100,53 +102,53 @@

GENCODE Basic Set selection: The GENCODE Basic Set is intended to provide a simplified subset of the GENCODE transcript annotations that will be useful to the majority of users. The goal was to have a high-quality basic set that also covered all loci. Selection of GENCODE annotations for inclusion in the basic set was determined independently for the coding and non-coding transcripts at each gene locus.

Criteria for selection of coding transcripts (including polymorphic pseudogenes) at a given locus:
- All full-length coding transcripts (except problem transcripts or transcripts that are - nonsense-mediated decay) was included in the basic set.
- If there were no transcripts meeting the above criteria, then the partial coding transcript with the largest CDS was included in the basic set (excluding problem transcripts).
Criteria for selection of non-coding transcripts at a given locus:
- All full-length non-coding transcripts (except problem transcripts) - with a well characterized biotype (see below) were included in the + with a well characterized Biotype (see below) were included in the basic set.
- If there were no transcripts meeting the above criteria, then the largest non-coding transcript was included in the basic set (excluding problem transcripts).
- If no transcripts were included by either the above criteria, the longest
+ If no transcripts were included by either of the above criteria, the longest
problem transcript is included.

Non-coding transcript categorization: Non-coding transcripts are categorized using
- their biotype
+ their Biotype
and the following criteria:

well characterized: antisense, Mt_rRNA, Mt_tRNA, miRNA, rRNA, snRNA, snoRNA
poorly characterized: 3prime_overlapping_ncrna, lincRNA, misc_RNA, non_coding, processed_transcript, sense_intronic, sense_overlapping
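
Taken together, the Basic Set selection criteria and the well characterized biotype list above amount to a per-locus selection procedure, sketched below. This is a reading of the prose, not GENCODE's implementation. The `Transcript` fields are hypothetical, polymorphic pseudogenes and nonsense-mediated decay transcripts are assumed to be flagged as `coding`, "partial" is assumed to mean not full-length, and "largest" non-coding transcript is assumed to mean greatest length.

```python
from dataclasses import dataclass
from typing import List

# Biotypes the track description lists as well characterized.
WELL_CHARACTERIZED = frozenset(
    {"antisense", "Mt_rRNA", "Mt_tRNA", "miRNA", "rRNA", "snRNA", "snoRNA"}
)


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript record for one gene locus."""
    name: str
    coding: bool        # assumed True for polymorphic pseudogenes as well
    full_length: bool
    problem: bool
    nmd: bool           # nonsense-mediated decay
    biotype: str
    length: int
    cds_length: int = 0


def select_basic(transcripts: List[Transcript]) -> List[Transcript]:
    """Pick the Basic set members for a single locus."""
    usable = [t for t in transcripts if not t.problem]
    coding = [t for t in usable if t.coding]
    noncoding = [t for t in usable if not t.coding]

    # Coding: all full-length, non-NMD transcripts; else the partial one with the largest CDS.
    picked_coding = [t for t in coding if t.full_length and not t.nmd]
    if not picked_coding:
        partial = [t for t in coding if not t.full_length]
        if partial:
            picked_coding = [max(partial, key=lambda t: t.cds_length)]

    # Non-coding: all full-length, well characterized transcripts; else the largest one.
    picked_nc = [
        t for t in noncoding if t.full_length and t.biotype in WELL_CHARACTERIZED
    ]
    if not picked_nc and noncoding:
        picked_nc = [max(noncoding, key=lambda t: t.length)]

    picked = picked_coding + picked_nc
    if not picked:
        # Last resort: the longest problem transcript, so that every locus is covered.
        problems = [t for t in transcripts if t.problem]
        if problems:
            picked = [max(problems, key=lambda t: t.length)]
    return picked
```

Coding and non-coding transcripts are evaluated independently, as the description states, so a locus can contribute transcripts from both branches.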

Transcription Support Level (TSL): It is important that users understand how to assess transcript annotations that they see in GENCODE. While some transcript models have a high level of support through the full length of their exon structure, there are also transcripts that are poorly supported and that should be considered speculative. The Transcription Support Level (TSL) is a method to highlight the well-supported and poorly-supported transcript models for users. The method