src/hg/makeDb/trackDb/human/encodeEgaspFull.html 8c2f7318d8d821de9b2a25750586a94ab5e8c1bb

8c2f7318d8d821de9b2a25750586a94ab5e8c1bb
lrnassar
  Fri Nov 15 18:50:19 2024 -0800
Giving the UI link cronjob some love by fixing all the 301 redirects. These are the bulk of the items listed on the cron. No RM.

diff --git src/hg/makeDb/trackDb/human/encodeEgaspFull.html src/hg/makeDb/trackDb/human/encodeEgaspFull.html
index bc1708c..f8e1a08 100644
--- src/hg/makeDb/trackDb/human/encodeEgaspFull.html
+++ src/hg/makeDb/trackDb/human/encodeEgaspFull.html
@@ -1,286 +1,286 @@
 <H2>Description</H2>
 <P>
 This track shows full sets of gene predictions covering all 44 ENCODE regions 
 originally submitted for the ENCODE Gene Annotation Assessment Project 
 (<A HREF="https://genome.crg.es/gencode/workshop2005.html"
 TARGET=_blank>EGASP</A>) Gene Prediction Workshop 2005. 
 The following gene predictions are included:
 <UL>
 <LI>
 <A HREF="https://www.ncbi.nlm.nih.gov/ieb/research/acembly/" 
 TARGET=_blank>AceView</A></LI>
 <LI>
 DOGFISH-C</LI>
 <LI>
 <A HREF="http://www.ensembl.org" TARGET=_blank>Ensembl</A></LI>
 <LI>
 <A HREF="https://genomebiology.biomedcentral.com/articles/10.1186/gb-2006-7-s1-s7"
 TARGET=_blank>Exogean</A></LI>
 <LI>
 <A HREF="http://compbio.fmph.uniba.sk/exonhunter/"
 TARGET=_blank>ExonHunter</A></LI>
 <LI>
 <A HREF="http://linux1.softberry.com/berry.phtml" TARGET=_blank>Fgenesh Pseudogenes</A></LI>
 <LI>
 <A HREF="http://linux1.softberry.com/berry.phtml" TARGET=_blank>Fgenesh++</A></LI>
 <LI>
 <A HREF="https://genome.crg.es/software/geneid/index.html" 
 TARGET=_blank>GeneID-U12</A> </LI>
 <LI>
 <A HREF="http://opal.biology.gatech.edu/GeneMark/" 
 TARGET=_blank>GeneMark</A></LI>
 <LI>
-<A HREF="http://www.cbcb.umd.edu/software/jigsaw/" TARGET=_blank>JIGSAW</A></LI>
+<A HREF="https://www.cbcb.umd.edu/software/jigsaw/" TARGET=_blank>JIGSAW</A></LI>
 <LI>
 <A HREF="https://mblab.wustl.edu/software/" TARGET=_blank>Pairagon/N-SCAN</A></LI>
 <LI>
 <A HREF="https://genome.crg.es/software/sgp2/index.html" 
 TARGET=_blank>SGP2-U12</A></LI>
 <LI>
 SPIDA</LI>
 <LI>
 <A HREF="https://mblab.wustl.edu/" TARGET=_blank>Twinscan-MARS</A></LI>
 </UL>
 The EGASP Partial companion track shows original gene prediction submissions 
 for a partial set of the 44 ENCODE regions; the EGASP Update track 
 shows updated versions of the submitted predictions.  These annotations
 were originally produced using the hg17 assembly.  </P>
 
 <H2>Display Conventions and Configuration</H2>
 <P>
 Data for each gene prediction method within this composite annotation track 
 are displayed in a separate subtrack.  See the top of the track description 
 page for configuration options allowing display of selected subsets of gene
 predictions.  To remove a subtrack from the display,
 uncheck the appropriate box.
 <P>
 The individual subtracks within this annotation follow the display conventions 
 for <A HREF="../goldenPath/help/hgTracksHelp.html#GeneDisplay">gene prediction
 tracks</A>. Display characteristics specific to individual subtracks are 
 described in the Methods section. The track description page offers the option 
 to color and label codons in a zoomed-in display of the subtracks to facilitate 
 validation and comparison of gene predictions. To enable this feature, select 
 the <em>genomic codons</em> option from the &quot;Color track by codons&quot;
 menu. Click the
 <A HREF="../goldenPath/help/hgCodonColoring.html">Help on codon coloring</A>
 link for more information about this feature. </P>
 <P>
 Color differences among the subtracks are arbitrary. They provide a
 visual cue for distinguishing the different gene prediction methods.</P>
 
 <H2>Methods</H2>
 
 <H3>AceView</H3>
 <P>
 These annotations were generated using AceView. All mRNAs
 and cDNAs available in GenBank, excluding NMs, were co-aligned on the Gencode
 sections. The results were then examined and filtered to resemble Havana. 
 The very restrictive view of Havana on CDS was not reproduced, due to a lack of
 experimental data. </P>
 
 <H3>DOGFISH-C</H3>
 <P>
 Candidate splice sites and coding starts/stops were evaluated using DNA
 alignments between the human assembly and seven other vertebrate species 
 (UCSC multiz alignments, adding the frog and removing the chimp). Genes
 (single transcripts only) were then predicted using dynamic programming.</P>
 
 <H3>Ensembl</H3>
 <P>
 The Ensembl annotation includes two types of predictions: protein-coding 
 genes  (the Ensembl Gene Predictions subtrack)
 and  pseudogenes of protein-coding genes 
 (the Ensembl Pseudogene Predictions subtrack). 
 The Ensembl Pseudo track is not intended as a comprehensive annotation of 
 pseudogenes, but rather
 an attempt to identify and label those gene predictions made by the Ensembl 
 pipeline that have pseudogene characteristics. Exons that lie partially outside 
 the ENCODE region are not included in the data set. The &quot;Alternate 
 Name&quot; field on the subtrack details page shows the Ensembl ID for the 
 selected gene or transcript.  </P>
 
 <H3>ExonHunter</H3>
 <P>
 ExonHunter is a comprehensive gene-finder based on hidden Markov models (HMMs)
 allowing the use of a variety of additional sources of information (ESTs, 
 proteins, genome-genome comparisons). </P>
 
 <H3>Exogean</H3>
 <P>
 Exogean annotates protein coding genes by combining mRNA and cross-species
 protein alignments in directed acyclic colored multigraphs where nodes and
 edges respectively represent biological objects and human expertise.
 Additional predictions and methods for this subtrack are available in the
 EGASP Updates track.</P>
 
 <H3>Fgenesh Pseudogenes</H3>
 <P>
 Fgenesh is an HMM gene structure prediction program.
 This data set shows predictions of potential pseudogenes.</P> 
 
 <H3>Fgenesh++</H3>
 <P>
 These gene predictions were generated by Fgenesh++, a gene-finding program that
 uses both HMMs and protein similarity to find 
 genes in a completely automated manner. </P>
 
 <H3>GeneID-U12</H3>
 <P>
 The GeneID-U12 gene prediction set, generated using a version of GeneID modified
 to detect U12-dependent introns (both GT-AG and AT-AC subtypes) when present,
 employs a single-genome <em>ab initio</em> method.
 This modified version of GeneID uses matrices for U12 donor,
 acceptor and branch sites constructed from examples of published U12 
 intron splice junctions 
 (both experimentally confirmed and expressed-sequence-validated predictions). 
 Two GeneID-U12 subtracks are 
 included: GeneID Gene Predictions and GeneID U12 Intron Predictions. The U12
 splice sites for features in the U12 Intron Predictions track are displayed
 on the track details pages. 
 Additional predictions and methods for this subtrack are available in the
 EGASP Updates track.</P>
 
 <H3>GeneMark</H3>
 <P>
 The eukaryotic version of the GeneMark.hmm (release 2.2) gene prediction
 program utilizes the HMM statistical model with duration or hidden
 semi-Markov model (HSMM). The HMM includes hidden states for initial, 
 internal and terminal exons, introns, intergenic regions and single exon genes. 
 It also includes the &quot;border&quot; states, such as start site (initiation 
 codon), stop site (termination codons), and donor and acceptor splice sites. 
 Sequences of all protein-coding regions were modeled by three periodic 
 inhomogeneous Markov chains; sequences of non-coding regions were modeled by 
 homogeneous Markov chains.  Nucleotide sequences corresponding to the site 
 states were modeled by position-specific inhomogeneous Markov chains. 
 Parameters of the gene models were derived from the set of genes obtained by 
 cDNA mapping to genomic DNA. To reflect variations in G+C composition of the
 genome, the gene model parameters were estimated separately for the three G+C 
 regions. </P>
 
 <H3>JIGSAW</H3>
 <P>
 JIGSAW uses the output from gene-finders, splice-site prediction programs and 
 sequence alignments to predict gene models. Annotation data downloaded from 
 the UCSC Genome Browser and TIGR gene-finder output was used as input for these
 predictions. JIGSAW predicts both partial and complete genes. 
 Additional predictions and methods for this subtrack are available in the
 EGASP Updates track.</P>
 
 <H3>Pairagon/N-SCAN</H3>
 <P>
 The pairHMM-based alignment program, Pairagon, was used to align
 high-quality mRNA sequences to the ENCODE regions.  These were
 supplemented with N-SCAN EST predictions which are displayed in the
 Pairgn/NSCAN-E subtrack, and extended further with additional
 transcripts from the Brent Lab to produce the predictions
 displayed as the Pairgn/NSCAN-E/+ subtrack. The NSCAN subtrack 
 contains only predictions from the N-SCAN program. 
 </P> 
 
 <H3>SGP2-U12</H3>
 <P>
 The SGP2-U12 gene prediction set, generated using a version of GeneID modified 
 to detect U12-dependent introns (both AT-AC and GT-AG subtypes) when present,
 employs a dual-genome method (SGP2) that utilizes similarity (tblastx) to 
 mouse genomic sequence syntenic to the ENCODE regions (Oct. 2004 MSA freeze). 
 This modified version of GeneID uses matrices for U12 
 donor, acceptor and branch sites constructed from examples of published U12 
 intron splice junctions (both experimentally confirmed and 
 expressed-sequence-validated predictions). Two SGP2-U12 subtracks are 
 included: SGP2 Gene Predictions and SGP2 U12 Intron Predictions.
 The U12 splice sites for features in the U12 Intron Predictions track are 
 displayed on the track details pages. 
 Additional predictions and methods for this subtrack are available in the
 EGASP Updates track.</P>
 
 <H3>SPIDA</H3>
 <P>
 This exon-only prediction set was produced using SPIDA (Substitution Periodicity
 Index and Domain Analysis). Exons derived by mapping ESTs to the genome were
 validated by seeking periodic substitution patterns in the aligned informant 
 DNA sequences. First, all
 available ESTs were mapped to the genome using Exonerate. The resulting
 transcript structures were &quot;flattened&quot; to remove redundancy. Each 
 exon of the flattened transcripts was subjected to SPI analysis, which involves
 identifying periodicity in the pattern of mutations occurring between the human
 and an informant species DNA sequence (the informant sequences and their TBA
 alignments were provided by  Elliott Margulies). SPI was calculated for all 
 available human-informant pairs for whole exons and in a sliding 48 bp window. 
 SPI analysis requires that a threshold level of periodicity be identified in at
 least two of the informant species if the exon is to be accepted. If accepted,
 SPI provides the correct frame for translation of the exon. This exon was used 
 as a starting point for extending the ORF coding region of the flattened
 transcript from which it came. This gave a full or partial CDS; different exons
 may give different CDSs. The CDSs were translated and searched for domains using
 hmmpfam and Pfam_fs. Only transcripts with a domain hit with e &lt; 1.0 were
 retained. Heuristics were applied to the retained CDSs to identify problems with
 the transcript structure, particularly frame-shifts. Many transcripts may
 identify the same exon, but only a single instance of each exon has been 
 retained. </P>
 
 <H3>Twinscan-MARS</H3>
 <P>
 This gene prediction set was produced by a version of Twinscan that employs 
 multiple pairwise genome comparisons to identify protein-coding genes (including
 alternative splices) using nucleotide homology information. No expression or 
 protein data were used.</P>
 
 <H2>Credits</H2>
 <P>
 The following individuals and institutions provided the data for the subtracks 
 in this annotation:
 <UL>
 <LI>
 <B>AceView:</B> Danielle and Jean Thierry-Mieg, 
 <A HREF="https://www.ncbi.nlm.nih.gov/" TARGET=_blank>NCBI</A>, National 
 Institutes of Health.
 <LI>
 <B>DOGFISH-C:</B> David Carter, Informatics Dept., 
 <A HREF="https://www.sanger.ac.uk/" TARGET=_blank>Wellcome Trust Sanger 
 Institute</A>.
 <LI>
 <B>Ensembl:</B> Stephen Searle, Wellcome Trust Sanger Institute (joint 
 <A HREF="https://www.sanger.ac.uk/" 
 TARGET=_blank>Sanger</A>/<A HREF="https://www.ebi.ac.uk/"
 TARGET=_blank>EBI</A> project).
 <LI>
 <B>Exogean:</B> Sarah Djebali, Dyogen Lab, 
 <A HREF="https://www.ens.psl.eu/?lang=en" TARGET=_blank>Ecole Normale 
 Supérieure</A> (Paris, France). 
 <LI>
 <B>ExonHunter:</B> Tomas Vinar, <A HREF="https://cs.uwaterloo.ca/research/research-areas/bioinformatics"
 TARGET=_blank>Waterloo Bioinformatics</A>, School of Computer Science, 
 University of Waterloo.
 <LI>
 <B>Fgenesh, Fgenesh++:</B> Victor Solovyev, 
 <A HREF="https://www.royalholloway.ac.uk/computerscience/home.aspx"
 TARGET=_blank>Department of Computer Science</A>,
 Royal Holloway, London University.
 <LI>
 <B>GeneID-U12, SGP2-U12:</B> Tyler Alioto, 
 Grup de Recerca en Informàtica Biomèdica 
-(<A HREF="http://grib.imim.es" TARGET=_blank>GRIB</A>) at 
+(<A HREF="https://grib.upf.edu/" TARGET=_blank>GRIB</A>) at 
 the Institut Municipal d'Investigació Mèdica (IMIM), Barcelona.
 <LI>
 <B>GeneMark:</B> Mark Borodovsky, Alex Lomsadze and Alexander Lukashin, 
 <A HREF="https://biosciences.gatech.edu/" TARGET=_blank>Department of 
 Biology</A>, Georgia Institute of Technology.
 <LI>
 <B>JIGSAW:</B>  Jonathan Allen,  Steven Salzberg group, The Institute for 
 Genomic Research (<A HREF="https://www.jcvi.org/" TARGET=_blank>TIGR</A>)
 and the Center for Bioinformatics and Computational Biology 
-(<A HREF="http://www.cbcb.umd.edu" TARGET=_blank>CBCB</A>) at the 
+(<A HREF="https://www.cbcb.umd.edu/" TARGET=_blank>CBCB</A>) at the 
 University of Maryland, College Park.
 <LI>
-<B>Pairagon/N-SCAN:</B> Randall Brown, <A HREF="http://genetics.wustl.edu/"
+<B>Pairagon/N-SCAN:</B> Randall Brown, <A HREF="https://genetics.wustl.edu/"
 TARGET=_blank>Laboratory for Computational Genomics</A>, Washington University
 in St. Louis.
 <LI>
 <B>SPIDA:</B> Damian Keefe, Birney Group, <A HREF="https://www.ebi.ac.uk/"
 TARGET=_blank>EMBL-EBI</A>.
 <LI>
 <B>Twinscan:</B> Paul Flicek, Brent Lab, 
 <A HREF="https://mblab.wustl.edu/" TARGET=_blank>Washington University
 in St. Louis</A>.
 </UL></P>