src/hg/makeDb/trackDb/transMapTailerV5.html bb98a72134bf91feda97ed3cb163a7b6880f2eeb

bb98a72134bf91feda97ed3cb163a7b6880f2eeb
lrnassar
  Fri Jul 5 10:41:19 2019 -0700
Adding versioning files for transMap track #23729

diff --git src/hg/makeDb/trackDb/transMapTailerV5.html src/hg/makeDb/trackDb/transMapTailerV5.html
new file mode 100644
index 0000000..2d96382
--- /dev/null
+++ src/hg/makeDb/trackDb/transMapTailerV5.html
@@ -0,0 +1,139 @@
+
+<H2>Display Conventions and Configuration</H2>
+
+<P>
+This track follows the display conventions for 
+<A HREF="../goldenPath/help/hgTracksHelp.html#PSLDisplay" 
+TARGET=_blank>PSL alignment tracks</A>. </P>
+<P>
+This track can also be configured to display codon coloring, a feature that
+allows the user to quickly compare cDNAs against the genomic sequence. For more
+information about this option, see the
+<A HREF="../goldenPath/help/hgCodonColoringMrna.html" TARGET=_blank>codon coloring help page</A>.
+Several types of alignment gaps may also be colored; for more information, see the
+<A HREF="../goldenPath/help/hgIndelDisplay.html" TARGET=_blank>alignment insertion/deletion display help page</A>.</P>
+
+<H2>Methods</H2>
+
+<P>
+  <ol>
+    <li> Source transcript alignments were obtained from vertebrate organisms
+    in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank 
+    mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes,
+    were used as available.
+    <li> For all vertebrate assemblies that had BLASTZ alignment chains and
+      nets to the $organism ($db) genome, a subset of the alignment chains was
+      selected as follows:
+      <ul>
+      <li> For organisms whose branch distance was no more than 0.5
+        (as computed by <tt>phyloFit</tt>; see the Conservation track description for details),
+        syntenic filtering was used.  Reciprocal best nets were used if available;
+        otherwise, nets were selected with the <tt>netFilter -syn</tt> command.
+        The chains corresponding to the selected nets were used for mapping.
+      <li> For more distant species, where the determination of synteny is difficult,
+        the full set of chains was used for mapping. This allows more genes to
+        map, at the expense of some genes mapping to paralogous regions. The
+        post-alignment filtering step removes some of these duplicate mappings.
+      </ul>
+    <li> The <tt>pslMap</tt> program was used to do a base-level projection of
+      the source transcript alignments via the selected chains
+      to the $organism genome, resulting in pairwise alignments of the source transcripts to
+      the genome.
+    <li> The resulting alignments were filtered with <tt>pslCDnaFilter</tt>
+      using a global near-best criterion of 0.5% in finished genomes
+      (human and mouse) and 1.0% in other genomes.  Alignments
+      in which less than 20% of the transcript mapped were discarded.
+  </ol>
+</P>
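+<p>
+The chain selection and filtering rules above can be restated as a small Python sketch.
+It only encodes the thresholds given in the text (branch distance 0.5, near-best tolerance
+of 0.5% or 1.0%, at least 20% of the transcript mapped). The <tt>MappedAlignment</tt>
+record, its single <tt>score</tt> field, and applying the coverage cutoff before the
+near-best test are simplifying assumptions; this is not the actual <tt>pslCDnaFilter</tt>
+scoring.
+</p>

```python
from dataclasses import dataclass

# Thresholds quoted in the Methods description.
MAX_SYNTENIC_BRANCH_DISTANCE = 0.5  # phyloFit branch distance cutoff
MIN_MAPPED_FRACTION = 0.20          # alignments mapping less than 20% are dropped


def select_chains(branch_distance, has_reciprocal_best_nets):
    """Name the chain set used for one source assembly."""
    if branch_distance <= MAX_SYNTENIC_BRANCH_DISTANCE:
        # Close species: syntenic filtering, preferring reciprocal best nets.
        return "reciprocal best nets" if has_reciprocal_best_nets else "netFilter -syn"
    # Distant species: keep every chain and accept some paralogous mappings.
    return "all chains"


def near_best_tolerance(finished_genome):
    """0.5% for finished genomes (human, mouse), 1.0% otherwise."""
    return 0.005 if finished_genome else 0.010


@dataclass
class MappedAlignment:
    # Hypothetical record; the real pslCDnaFilter works on PSL rows.
    name: str
    query: str              # source transcript the alignment came from
    score: float            # simplified single score (assumption)
    mapped_fraction: float  # fraction of the transcript that mapped


def filter_alignments(alignments, finished_genome):
    """Keep alignments that pass the coverage cutoff and are near-best per query."""
    # Assumption: the coverage cutoff is applied before the near-best test.
    covered = [a for a in alignments if a.mapped_fraction >= MIN_MAPPED_FRACTION]
    best = {}
    for a in covered:
        best[a.query] = max(best.get(a.query, a.score), a.score)
    tolerance = near_best_tolerance(finished_genome)
    # Keep every alignment whose score is within the tolerance of its query's best.
    return [a for a in covered if a.score >= best[a.query] * (1.0 - tolerance)]
```

+<p>
+With a best score of 1000 for a transcript, the sketch keeps other mappings scoring at
+least about 995 in a finished genome and at least about 990 elsewhere.
+</p>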
+
+<P>
+To ensure unique identifiers for each alignment, cDNA and gene accessions were
+made unique by appending a suffix for each location in the source genome and
+again for each mapped location in the destination genome.  The format is:
+<pre>
+   accession.version-srcUniq.destUniq
+</pre>
+
+where <tt>srcUniq</tt> is a number added to make each source alignment unique and
+<tt>destUniq</tt> is a number added to give each resulting TransMap alignment a unique
+identifier.
+</P>
+<P>
+For example, in the cow genome, there are two alignments of mRNA <tt>BC149621.1</tt>.
+These are assigned the identifiers <tt>BC149621.1-1</tt> and <tt>BC149621.1-2</tt>.
+When these are mapped to the human genome, <tt>BC149621.1-1</tt> maps to a single
+location and is given the identifier <tt>BC149621.1-1.1</tt>.  However, <tt>BC149621.1-2</tt>
+maps to two locations, resulting in <tt>BC149621.1-2.1</tt> and <tt>BC149621.1-2.2</tt>.  Note
+that multiple TransMap mappings are usually the result of tandem duplications, where both
+chains are identified as syntenic.
+</P>
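+<p>
+The identifier scheme can be illustrated with a short Python sketch that numbers repeated
+names in order of appearance and splits an identifier back into its parts. The helper
+names <tt>number_occurrences</tt> and <tt>parse_transmap_id</tt> are hypothetical, and
+numbering in input order is an assumption; the sketch reproduces the
+<tt>BC149621.1</tt> example above.
+</p>

```python
from collections import Counter


def number_occurrences(names, sep):
    """Append sep plus 1, 2, ... to each name, counting repeats of the same name.

    Hypothetical helper: assumes suffixes are assigned in input order.
    """
    counts = Counter()
    numbered = []
    for name in names:
        counts[name] += 1
        numbered.append(f"{name}{sep}{counts[name]}")
    return numbered


def parse_transmap_id(ident):
    """Split 'accession.version-srcUniq.destUniq' into its three parts."""
    src_id, dest_uniq = ident.rsplit(".", 1)      # peel off destUniq
    accession, src_uniq = src_id.rsplit("-", 1)   # peel off srcUniq
    return accession, int(src_uniq), int(dest_uniq)


if __name__ == "__main__":
    # Two source alignments of BC149621.1 in the cow genome.
    src_ids = number_occurrences(["BC149621.1", "BC149621.1"], "-")
    print(src_ids)  # ['BC149621.1-1', 'BC149621.1-2']
    # BC149621.1-1 maps once; BC149621.1-2 maps to two locations.
    dest_ids = number_occurrences([src_ids[0], src_ids[1], src_ids[1]], ".")
    print(dest_ids)  # ['BC149621.1-1.1', 'BC149621.1-2.1', 'BC149621.1-2.2']
    print(parse_transmap_id(dest_ids[2]))  # ('BC149621.1', 2, 2)
```

+<p>
+Splitting from the right keeps the <tt>accession.version</tt> part intact, since the
+version's own period and any earlier characters are left untouched.
+</p>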
+
+<h2>Data Access</h2>
+
+<p>
+The raw data for these tracks can be accessed interactively through the
+<a href="hgTables">Table Browser</a> or the
+<a href="hgIntegrator">Data Integrator</a>.
+For automated analysis, the annotations are stored in
+<a href="../goldenPath/help/bigPsl.html">bigPsl</a> files (containing a
+number of extra columns) and can be downloaded from our
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/transMap/">download server</a>, 
+or queried using our <a href="../goldenPath/help/api.html">API</a>. For more 
+information on accessing track data see our 
+<a href="../FAQ/FAQdownloads.html#download36">Track Data Access FAQ</a>.
+The files are associated with these tracks in the following way:
+<ul>
+<li>TransMap Ensembl - <tt>$db.ensembl.transMapV5.bigPsl</tt></li>
+<li>TransMap RefGene - <tt>$db.refseq.transMapV5.bigPsl</tt></li>
+<li>TransMap RNA - <tt>$db.rna.transMapV5.bigPsl</tt></li>
+<li>TransMap ESTs - <tt>$db.est.transMapV5.bigPsl</tt></li>
+</ul>
+Individual regions or the whole genome annotation can be obtained using our tool
+<tt>bigBedToBed</tt>, which can be compiled from the source code or downloaded as
+a precompiled binary for your system. Instructions for downloading source code and
+binaries can be found on our
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">utilities download page</a>.
+For example, to obtain only the features within a given range:
+<pre>
+bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/$db/transMap/V5/$db.refseq.transMapV5.bigPsl \
+    -chrom=chr6 -start=0 -end=1000000 stdout
+</pre>
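+<p>
+The range query can also be scripted. The Python sketch below builds the same
+<tt>bigBedToBed</tt> argument list for a given assembly name and runs it with
+<tt>subprocess</tt>. It assumes <tt>bigBedToBed</tt> is installed on the <tt>PATH</tt>
+and that all four bigPsl files sit in the same <tt>transMap/V5/</tt> directory as the
+RefSeq file in the example; the function names are hypothetical.
+</p>

```python
import shutil
import subprocess

# Layout taken from the example command; using the V5 subdirectory for every
# track set is an assumption.
GBDB_URL = "http://hgdownload.soe.ucsc.edu/gbdb"
TRACK_SETS = ("ensembl", "refseq", "rna", "est")


def bigpsl_url(db, track_set):
    """URL of one TransMap V5 bigPsl file, e.g. db='hg38', track_set='refseq'."""
    if track_set not in TRACK_SETS:
        raise ValueError(f"unknown track set: {track_set!r}")
    return f"{GBDB_URL}/{db}/transMap/V5/{db}.{track_set}.transMapV5.bigPsl"


def region_command(db, track_set, chrom, start, end):
    """Argument list for a bigBedToBed range query written to stdout."""
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end")
    return [
        "bigBedToBed",
        bigpsl_url(db, track_set),
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        "stdout",
    ]


def fetch_region(db, track_set, chrom, start, end):
    """Run bigBedToBed (must be on PATH) and return the rows as lists of fields."""
    if shutil.which("bigBedToBed") is None:
        raise FileNotFoundError("bigBedToBed not found on PATH")
    result = subprocess.run(
        region_command(db, track_set, chrom, start, end),
        capture_output=True, text=True, check=True,
    )
    # Each output line is one tab-separated bigPsl record.
    return [line.split("\t") for line in result.stdout.splitlines()]
```

+<p>
+Keeping the command construction separate from execution makes it easy to inspect or
+log the exact query before any network access is made.
+</p>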
+
+<H2>Credits</H2>
+
+<P>
+This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data
+submitted to the international public sequence databases by
+scientists worldwide and from annotations produced by the RefSeq,
+Ensembl, and GENCODE annotation projects.</P>
+
+<H2>References</H2>
+<p>
+Siepel A, Diekhans M, Brejov&#225; B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S,
+Lau C <em>et al</em>.
+<a href="https://genome.cshlp.org/content/17/12/1763.long" target="_blank">
+Targeted discovery of novel human exons by comparative genomics</a>.
+<em>Genome Res</em>. 2007 Dec;17(12):1763-73.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/17989246" target="_blank">17989246</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2099585/" target="_blank">PMC2099585</a>
+</p>
+
+<p>
+Stanke M, Diekhans M, Baertsch R, Haussler D.
+<a href="https://academic.oup.com/bioinformatics/article/24/5/637/202844/Using-native-and-syntenically-mapped-cDNA"
+target="_blank">
+Using native and syntenically mapped cDNA alignments to improve de novo gene finding</a>.
+<em>Bioinformatics</em>. 2008 Mar 1;24(5):637-44.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/18218656" target="_blank">18218656</a>
+</p>
+
+<p>
+Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D.
+<a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030247"
+target="_blank">
+Comparative genomics search for losses of long-established genes on the human lineage</a>.
+<em>PLoS Comput Biol</em>. 2007 Dec;3(12):e247.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/18085818" target="_blank">18085818</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2134963/" target="_blank">PMC2134963</a>
+</p>