bb98a72134bf91feda97ed3cb163a7b6880f2eeb lrnassar Fri Jul 5 10:41:19 2019 -0700
Adding versioning files for transMap track #23729

diff --git src/hg/makeDb/trackDb/transMapTailerV5.html src/hg/makeDb/trackDb/transMapTailerV5.html
new file mode 100644
index 0000000..2d96382
--- /dev/null
+++ src/hg/makeDb/trackDb/transMapTailerV5.html
@@ -0,0 +1,139 @@
+
+<H2>Display Conventions and Configuration</H2>
+
+<P>
+This track follows the display conventions for
+<A HREF="../goldenPath/help/hgTracksHelp.html#PSLDisplay"
+TARGET=_blank>PSL alignment tracks</A>. </P>
+<P>
+This track may also be configured to display codon coloring, a feature that
+allows the user to quickly compare cDNAs against the genomic sequence. For more
+information about this option, click
+<A HREF="../goldenPath/help/hgCodonColoringMrna.html" TARGET=_blank>here</A>.
+Several types of alignment gap may also be colored;
+for more information, click
+<A HREF="../goldenPath/help/hgIndelDisplay.html" TARGET=_blank>here</A>.
+
+<H2>Methods</H2>
+
+<P>
+  <ol>
+    <li> Source transcript alignments were obtained from vertebrate organisms
+    in the UCSC Genome Browser Database. BLAT alignments of RefSeq Genes, GenBank
+    mRNAs, and GenBank Spliced ESTs to the cognate genome, along with UCSC Genes,
+    were used as available.
+    <li> For all vertebrate assemblies that had BLASTZ alignment chains and
+    nets to the $organism ($db) genome, a subset of the alignment chains was
+    selected as follows:
+    <ul>
+      <li> For organisms whose branch distance was no more than 0.5
+      (as computed by <tt>phyloFit</tt>, see Conservation track description for details),
+      syntenic filtering was used. Reciprocal best nets were used if available;
+      otherwise, nets were selected with the <tt>netFilter -syn</tt> command.
+      The chains corresponding to the selected nets were used for mapping.
+      <li> For more distant species, where the determination of synteny is difficult,
+      the full set of chains was used for mapping. This allows more genes to
+      map, at the expense of some mapping to paralogous regions. The
+      post-alignment filtering step removes some of the duplications.
+    </ul>
+    <li> The <tt>pslMap</tt> program was used to do a base-level projection of
+    the source transcript alignments via the selected chains
+    to the $organism genome, resulting in pairwise alignments of the source transcripts to
+    the genome.
+    <li> The resulting alignments were filtered with <tt>pslCDnaFilter</tt>
+    with a global near-best criterion of 0.5% in finished genomes
+    (human and mouse) and 1.0% in other genomes. Alignments
+    where less than 20% of the transcript mapped were discarded.
+  </ol>
+</P>
+
+<P>
+To ensure unique identifiers for each alignment, cDNA and gene accessions were
+made unique by appending a suffix for each location in the source genome and
+again for each mapped location in the destination genome. The format is:
+<pre>
+   accession.version-srcUniq.destUniq
+</pre>
+
+where <tt>srcUniq</tt> is a number added to make each source alignment unique, and
+<tt>destUniq</tt> is added to give the subsequent TransMap alignments unique
+identifiers.
+</P>
+<P>
+For example, in the cow genome, there are two alignments of mRNA <tt>BC149621.1</tt>.
+These are assigned the identifiers <tt>BC149621.1-1</tt> and <tt>BC149621.1-2</tt>.
+When these are mapped to the human genome, <tt>BC149621.1-1</tt> maps to a single
+location and is given the identifier <tt>BC149621.1-1.1</tt>. However, <tt>BC149621.1-2</tt>
+maps to two locations, resulting in <tt>BC149621.1-2.1</tt> and <tt>BC149621.1-2.2</tt>. Note
+that multiple TransMap mappings are usually the result of tandem duplications, where both
+chains are identified as syntenic.
+</P>
+
+<h2>Data Access</h2>
+
+<p>
+The raw data for these tracks can be accessed interactively through the
+<a href="hgTables">Table Browser</a> or the
+<a href="hgIntegrator">Data Integrator</a>.
+For automated analysis, the annotations are stored in
+<a href="../goldenPath/help/bigPsl.html">bigPsl</a> files (containing a
+number of extra columns) and can be downloaded from our
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/transMap/">download server</a>,
+or queried using our <a href="../goldenPath/help/api.html">API</a>. For more
+information on accessing track data, see our
+<a href="../FAQ/FAQdownloads.html#download36">Track Data Access FAQ</a>.
+The files are associated with these tracks in the following way:
+<ul>
+<li>TransMap Ensembl - <tt>$db.ensembl.transMapV5.bigPsl</tt></li>
+<li>TransMap RefGene - <tt>$db.refseq.transMapV5.bigPsl</tt></li>
+<li>TransMap RNA - <tt>$db.rna.transMapV5.bigPsl</tt></li>
+<li>TransMap ESTs - <tt>$db.est.transMapV5.bigPsl</tt></li>
+</ul>
+Individual regions or the whole genome annotation can be obtained using our tool
+<tt>bigBedToBed</tt>, which can be compiled from the source code or downloaded as
+a precompiled binary for your system. Instructions for downloading source code and
+binaries can be found
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>.
+The tool can also be used to obtain only features within a given range, for example:
+<p><tt>
+bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/$db/transMap/V5/$db.refseq.transMapV5.bigPsl
+-chrom=chr6 -start=0 -end=1000000 stdout
+</tt>
+
+<H2>Credits</H2>
+
+<P>
+This track was produced by Mark Diekhans at UCSC from cDNA and EST sequence data
+submitted to the international public sequence databases by
+scientists worldwide and annotations produced by the RefSeq,
+Ensembl, and GENCODE annotation projects.</P>
+
+<H2>References</H2>
+<p>
+Siepel A, Diekhans M, Brejová B, Langton L, Stevens M, Comstock CL, Davis C, Ewing B, Oommen S,
+Lau C <em>et al</em>.
+<a href="https://genome.cshlp.org/content/17/12/1763.long" target="_blank">
+Targeted discovery of novel human exons by comparative genomics</a>.
+<em>Genome Res</em>. 2007 Dec;17(12):1763-73.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/17989246" target="_blank">17989246</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2099585/" target="_blank">PMC2099585</a>
+</p>
+
+<p>
+Stanke M, Diekhans M, Baertsch R, Haussler D.
+<a href="https://academic.oup.com/bioinformatics/article/24/5/637/202844/Using-native-and-syntenically-mapped-cDNA"
+target="_blank">
+Using native and syntenically mapped cDNA alignments to improve de novo gene finding</a>.
+<em>Bioinformatics</em>. 2008 Mar 1;24(5):637-44.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/18218656" target="_blank">18218656</a>
+</p>
+
+<p>
+Zhu J, Sanborn JZ, Diekhans M, Lowe CB, Pringle TH, Haussler D.
+<a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.0030247"
+target="_blank">
+Comparative genomics search for losses of long-established genes on the human lineage</a>.
+<em>PLoS Comput Biol</em>. 2007 Dec;3(12):e247.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/18085818" target="_blank">18085818</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2134963/" target="_blank">PMC2134963</a>
+</p>
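
Reviewer note on the identifier scheme in the Methods section of this file: the format <tt>accession.version-srcUniq.destUniq</tt> can be split mechanically, which may help when joining TransMap records back to their source alignments. The following is only a minimal Python sketch. The names <tt>TransMapId</tt> and <tt>parse_transmap_id</tt> and the regular expression are hypothetical and are not part of the UCSC tools. It assumes the accession contains no dot and that the version, <tt>srcUniq</tt>, and <tt>destUniq</tt> are plain integers, as in the cow/human example.

```python
import re
from typing import NamedTuple, Optional


class TransMapId(NamedTuple):
    """Parts of an identifier of the form accession.version-srcUniq[.destUniq]."""
    accession: str
    version: int
    src_uniq: int
    dest_uniq: Optional[int]  # None for a source alignment not yet mapped


# Assumption: the accession has no dot; the numeric fields are integers.
_ID_RE = re.compile(
    r"^(?P<acc>[^.\s]+)\.(?P<ver>\d+)-(?P<src>\d+)(?:\.(?P<dest>\d+))?$"
)


def parse_transmap_id(identifier: str) -> TransMapId:
    """Split a TransMap-style identifier into its components."""
    match = _ID_RE.match(identifier)
    if match is None:
        raise ValueError(f"not a TransMap-style identifier: {identifier!r}")
    dest = match.group("dest")
    return TransMapId(
        accession=match.group("acc"),
        version=int(match.group("ver")),
        src_uniq=int(match.group("src")),
        dest_uniq=int(dest) if dest is not None else None,
    )


if __name__ == "__main__":
    # Identifiers taken from the example in the Methods section.
    for name in ("BC149621.1-1", "BC149621.1-2.1", "BC149621.1-2.2"):
        print(name, parse_transmap_id(name))
```

Grouping parsed records by <tt>(accession, version, src_uniq)</tt> would collect all destination mappings that derive from the same source alignment, such as <tt>BC149621.1-2.1</tt> and <tt>BC149621.1-2.2</tt>.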