194efd4a95a8ac1704d5283bfd70fb616002bed8
dschmelt
Wed Jul 10 17:13:21 2019 -0700
Making html edits for gencode VM21 #23792
diff --git src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
index 7acb3ff..cc5b884 100644
--- src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
+++ src/hg/makeDb/trackDb/wgEncodeGencodeDisplay1.shared.html
@@ -11,36 +11,38 @@
The gene annotations in this view are divided into three subtracks:
- GENCODE Basic set is a subset of the Comprehensive set.
The selection criteria are described in the methods section.
- GENCODE Comprehensive set contains all GENCODE coding and non-coding transcript annotations,
including polymorphic pseudogenes. This includes both manual and
automatic annotations. This is a super-set of the Basic set.
- GENCODE Pseudogenes include all pseudogene annotations except polymorphic pseudogenes.
- 2-way
- - GENCODE 2-way Pseudogenes contains pseudogenes predicted by both the Yale
- Pseudopipe and UCSC Retrofinder pipelines.
- The set was derived by looking for 50 base pairs
+
- GENCODE 2-way Pseudogenes contains pseudogenes predicted by both the
+ Yale
+ PseudoPipe and
+
+ UCSC RetroFinder pipelines. The set was derived by looking for 50 base pairs
of overlap between pseudogenes derived from both sets based on their
- chromosomal coordinates. When multiple Pseudopipe
- predictions map to a single Retrofinder prediction, only one match is kept
+ chromosomal coordinates. When multiple PseudoPipe
+ predictions map to a single RetroFinder prediction, only one match is kept
for the 2-way consensus set.
- PolyA
- GENCODE PolyA contains polyA signals and sites manually annotated on
the genome based on transcribed evidence (ESTs and cDNAs) of the 3' end of
transcripts containing at least 3 A's not matching the genome.
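The 2-way consensus rule described above (at least 50 base pairs of coordinate overlap, one match kept per RetroFinder prediction) can be sketched roughly as follows. This is an illustrative sketch only, not the pipeline's actual code: the `Pseudogene` record, the function names, the half-open coordinate convention, and the tie-breaking rule (keep the PseudoPipe prediction with the largest overlap) are all assumptions, since the description does not say which match is kept.

```python
from typing import NamedTuple


class Pseudogene(NamedTuple):
    """Hypothetical record for one predicted pseudogene."""
    chrom: str
    start: int  # assumed 0-based, inclusive
    end: int    # assumed exclusive
    name: str


MIN_OVERLAP = 50  # base pairs, as stated in the track description


def overlap_bp(a, b):
    """Number of overlapping bases between two predictions (0 if none)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def two_way_consensus(pseudopipe, retrofinder):
    """Pair each RetroFinder prediction with at most one PseudoPipe prediction.

    Assumption: when several PseudoPipe predictions overlap the same
    RetroFinder prediction, the one with the largest overlap is kept.
    """
    consensus = []
    for retro in retrofinder:
        scored = [(overlap_bp(pp, retro), pp) for pp in pseudopipe]
        scored = [item for item in scored if item[0] >= MIN_OVERLAP]
        if scored:
            best = max(scored, key=lambda item: item[0])
            consensus.append((best[1], retro))
    return consensus
```

For example, a RetroFinder prediction spanning chr1:100-400 that is overlapped by two qualifying PseudoPipe predictions would contribute a single pair to the consensus set.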
Filtering is available for the items in the GENCODE Basic, Comprehensive and Pseudogene tracks
@@ -59,31 +61,31 @@
Transcript Annotation Method: filter by the method used to create the annotation
- All - don't filter by annotation method
- manual - display manually created annotations, including those that are
also created automatically
- automatic - display automatically created annotations, including those that are
also created manually
- manual_only - display manually created annotations that were
not annotated by the automatic method
- automatic_only - display automatically created annotations that were
not annotated by the manual method
Transcript Biotype: filter transcripts by
- biotype
+ Biotype
Support Level: filter transcripts by transcription support level
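The difference between the inclusive and the "_only" annotation method filters listed above can be illustrated with a small sketch. It assumes each transcript carries the set of methods that produced it; the function name and this set representation are hypothetical and do not reflect how the browser stores the attribute.

```python
def passes_method_filter(methods, choice):
    """Return True if a transcript annotated by `methods` passes `choice`.

    `methods` is assumed to be a collection drawn from
    {"manual", "automatic"}; `choice` is one of the filter names
    from the track description.
    """
    methods = set(methods)
    if choice == "All":
        return True
    if choice == "manual":
        # Includes transcripts that were also created automatically.
        return "manual" in methods
    if choice == "automatic":
        # Includes transcripts that were also created manually.
        return "automatic" in methods
    if choice == "manual_only":
        return methods == {"manual"}
    if choice == "automatic_only":
        return methods == {"automatic"}
    raise ValueError("unknown filter choice: " + choice)
```

Under these assumptions, a transcript annotated by both methods passes the manual and automatic filters but neither of the "_only" filters.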
Coloring for the gene annotations is based on the annotation type:
- coding
- non-coding
- pseudogene
- problem
- all 2-way pseudogenes
- all polyA annotations
Methods
@@ -100,53 +102,53 @@
GENCODE Basic Set selection:
The GENCODE Basic Set is intended to provide a simplified subset of
the GENCODE transcript annotations that will be useful to the majority of
users. The goal was to have a high-quality basic set that also covered all loci.
Selection of GENCODE annotations for inclusion in the basic set
was determined independently for the coding and non-coding transcripts at each
gene locus.
- Criteria for selection of coding transcripts (including polymorphic pseudogenes) at a given
locus:
- All full-length coding transcripts (except problem transcripts or transcripts that are
- nonsense-mediated decay) was included in the basic set.
+ nonsense-mediated decay) were included in the basic set.
- If there were no transcripts meeting the above criteria, then the partial coding
transcript with the largest CDS was included in the basic set (excluding problem transcripts).
Criteria for selection of non-coding transcripts at a given locus:
- All full-length non-coding transcripts (except problem transcripts)
- with a well characterized biotype (see below) were included in the
+ with a well characterized Biotype (see below) were included in the
basic set.
- If there were no transcripts meeting the above criteria, then the largest non-coding
transcript was included in the basic set (excluding problem transcripts).
- If no transcripts were included by either the above criteria, the longest
+ If no transcripts were included by either of the above criteria, the longest
problem transcript was included.
Non-coding transcript categorization:
Non-coding transcripts are categorized using
-their biotype
+their Biotype
and the following criteria:
- well characterized: antisense, Mt_rRNA, Mt_tRNA, miRNA, rRNA, snRNA, snoRNA
- poorly characterized: 3prime_overlapping_ncrna, lincRNA, misc_RNA, non_coding, processed_transcript, sense_intronic, sense_overlapping
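The Basic Set selection criteria and the well characterized biotype list above can be summarized in a short sketch. It is an illustration under stated assumptions, not the GENCODE implementation: the `Transcript` fields, the boolean `problem` and `nmd` flags, and the reading of "largest" non-coding transcript as the longest one are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Biotypes listed as "well characterized" in the description above.
WELL_CHARACTERIZED = {
    "antisense", "Mt_rRNA", "Mt_tRNA", "miRNA", "rRNA", "snRNA", "snoRNA",
}


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript record; field names are assumptions."""
    name: str
    biotype: str
    coding: bool        # True for coding transcripts and polymorphic pseudogenes
    full_length: bool
    problem: bool = False
    nmd: bool = False   # nonsense-mediated decay
    cds_length: int = 0
    length: int = 0


def select_coding(transcripts):
    usable = [t for t in transcripts if t.coding and not t.problem]
    full = [t for t in usable if t.full_length and not t.nmd]
    if full:
        return full
    # Fallback: the partial coding transcript with the largest CDS.
    partial = [t for t in usable if not t.full_length]
    return [max(partial, key=lambda t: t.cds_length)] if partial else []


def select_noncoding(transcripts):
    usable = [t for t in transcripts if not t.coding and not t.problem]
    full = [t for t in usable
            if t.full_length and t.biotype in WELL_CHARACTERIZED]
    if full:
        return full
    # Fallback: the largest non-coding transcript (assumed: longest).
    return [max(usable, key=lambda t: t.length)] if usable else []


def basic_set(locus):
    """Apply the coding and non-coding criteria independently at one locus."""
    chosen = select_coding(locus) + select_noncoding(locus)
    if chosen:
        return chosen
    # Last resort: the longest problem transcript.
    problems = [t for t in locus if t.problem]
    return [max(problems, key=lambda t: t.length)] if problems else []
```

In this sketch a locus whose only non-coding transcripts are lincRNAs still contributes one of them, because lincRNA is poorly characterized and the fallback rule picks the longest.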
Transcription Support Level (TSL):
It is important that users understand how to assess transcript annotations
that they see in GENCODE. While some transcript models have a high level of
support through the full length of their exon structure, there are also
transcripts that are poorly supported and that should be considered
speculative. The Transcription Support Level (TSL) is a method to highlight the
well-supported and poorly-supported transcript models for users. The method