src/hg/makeDb/trackDb/t2g.html 1.3
1.3 2010/05/21 23:25:39 hiram
fixing error
Index: src/hg/makeDb/trackDb/t2g.html
===================================================================
RCS file: /projects/compbio/cvsroot/kent/src/hg/makeDb/trackDb/t2g.html,v
retrieving revision 1.2
retrieving revision 1.3
diff -b -B -U 1000000 -r1.2 -r1.3
--- src/hg/makeDb/trackDb/t2g.html 21 May 2010 23:18:25 -0000 1.2
+++ src/hg/makeDb/trackDb/t2g.html 21 May 2010 23:25:39 -0000 1.3
@@ -1,61 +1,62 @@
<H2>Description</H2>
<P>
This track indicates the location of sequences in publications
-mapped back to the genome. It is based on data from >22.000 articles with DNA
-sequences from the <A HREF="http://www.pubmedcentral.com/"
-TARGET=_blank>Pubmed Central</A> <A HREF="http://www.ncbi.nlm.nih.gov/pmc/about/openftlist.html"
+mapped back to the genome. It is based on data from more than 22.000 articles
+with DNA sequences from the <A HREF="http://www.pubmedcentral.com/"
+TARGET=_blank>Pubmed Central</A>
+<A HREF="http://www.ncbi.nlm.nih.gov/pmc/about/openftlist.html"
TARGET=_blank>Open-Access archive</A>, which consists of ~130.000 free
research articles (as of Feb 2010).</P>
<H2>Methods</H2>
<P>
Articles were downloaded from PubMed Central. Depending on availability, XML,
raw ASCII or text extracted from PDFs was used. The results were then
processed by the <A HREF="http://sourceforge.net/projects/text2genome/"
TARGET=_blank>text2genome.org</A> software. It searches for stretches of
separated nucleotide-like letters that are longer than 19bp or words that
contain more than 40% nucleotide-like letters. The resulting DNA sequences
were mapped with BLAST to all genomes that are part of
Ensembl/EnsemblGenomes version 56 and filtered with the text2genome pipeline:
<UL>
<LI>Hits to NCBI UniVec are completely removed</LI>
<LI>Only matches on the most plausible genome are kept. This is the
genome with the most matching sequences that is either mentioned in the
text and recognized by <A HREF="http://www.sf.net/projects/linnaeus/"
TARGET=_blank>LINNAEUS</A> or is a well-known model organism.</LI>
<LI>Hits from the same paper that are closer than 50kbp are
chained (shown as exon-blocks on the browser)</LI>
<LI>Non-unique hits are only kept in the chain with the most members</LI>
</UL>
</P>
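<P>
As a rough illustration of two of the steps above, the following Python sketch
flags words with more than 40% nucleotide-like letters and chains hits that lie
closer than 50 kbp to each other. It is not the text2genome implementation: the
set of "nucleotide-like" letters, the function names, the <TT>(start, end)</TT>
hit representation, and the way the gap between hits is measured are all
assumptions made for this example.
</P>

```python
from typing import List, Tuple

# Assumption: the text does not say which letters count as "nucleotide-like";
# here we use A, C, G, T and N in either case.
NUCLEOTIDE_LIKE = set("ACGTNacgtn")

# Hypothetical hit representation: (start, end) on one chromosome, one paper.
Hit = Tuple[int, int]


def nucleotide_fraction(word: str) -> float:
    """Return the fraction of characters in `word` that are nucleotide-like."""
    if not word:
        return 0.0
    return sum(ch in NUCLEOTIDE_LIKE for ch in word) / len(word)


def looks_like_dna(word: str, min_fraction: float = 0.4) -> bool:
    """True if more than `min_fraction` of the letters are nucleotide-like."""
    return nucleotide_fraction(word) > min_fraction


def chain_hits(hits: List[Hit], max_gap: int = 50_000) -> List[List[Hit]]:
    """Group hits whose gap to the current chain is smaller than `max_gap`.

    Assumption: the gap is measured from the largest end coordinate seen in
    the current chain to the start of the next hit.
    """
    chains: List[List[Hit]] = []
    chain_end = 0
    for start, end in sorted(hits):
        if chains and start - chain_end < max_gap:
            chains[-1].append((start, end))
            chain_end = max(chain_end, end)
        else:
            chains.append([(start, end)])
            chain_end = end
    return chains
```

<P>
Calling <TT>chain_hits</TT> on the hits of one paper on one chromosome returns
a list of chains, each of which would correspond to one set of exon-blocks in
the browser display. A plain letter-fraction test like <TT>looks_like_dna</TT>
also accepts some ordinary English words, which is one reason the downstream
BLAST mapping and filtering steps matter.
</P>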
<H2>Credits</H2>
<P>
Data was processed by Maximilian Haussler, Martin Gerner and Casey Bergman.
The import into UCSC was done by Hiram Clawson. For questions or feedback on
this data track, please send an email to
<A HREF="mailto:text2genome@manchester.ac.uk">
text2genome@manchester.ac.uk</A>.
</P>
<H2>References</H2>
<P>
Haeussler M, Bergman CM. Annotating genes and genomes with sequences
extracted from biomedical articles, <em>in prep.</em>, see also
<A HREF="http://text2genome.org/" TARGET=_blank>www.text2genome.org</A>
</P>
<P>
Aerts S, Haeussler M, van Vooren S, Griffith OL, Hulpiau P, Jones SJM,
Montgomery SB, Bergman CM, The Open Regulatory Annotation Consortium.
<A HREF="http://www.ncbi.nlm.nih.gov/pubmed/18271954"
TARGET=_blank>Text-mining assisted regulatory annotation</A>.
Genome Biol. 2008;9(2):R31.
</P>