src/hg/makeDb/trackDb/t2g.html 1.3

1.3 2010/05/21 23:25:39 hiram
fixing error
Index: src/hg/makeDb/trackDb/t2g.html
===================================================================
RCS file: /projects/compbio/cvsroot/kent/src/hg/makeDb/trackDb/t2g.html,v
retrieving revision 1.2
retrieving revision 1.3
diff -b -B -U 1000000 -r1.2 -r1.3
--- src/hg/makeDb/trackDb/t2g.html	21 May 2010 23:18:25 -0000	1.2
+++ src/hg/makeDb/trackDb/t2g.html	21 May 2010 23:25:39 -0000	1.3
@@ -1,61 +1,62 @@
 <H2>Description</H2>
 <P>
 This track indicates the location of sequences in publications
-mapped back to the genome. It is based on data from &gt;22.000 articles with DNA
-sequences from the <A HREF="http://www.pubmedcentral.com/"
-TARGET=_blank>Pubmed Central</A> <A HREF="http://www.ncbi.nlm.nih.gov/pmc/about/openftlist.html"
+mapped back to the genome. It is based on data from more than 22.000 articles
+with DNA sequences from the <A HREF="http://www.pubmedcentral.com/"
+TARGET=_blank>Pubmed Central</A>
+<A HREF="http://www.ncbi.nlm.nih.gov/pmc/about/openftlist.html"
 TARGET=_blank>Open-Access archive</A>, which consists of ~130.000 free
 research articles (Feb 2010).</P>
 
 <H2>Methods</H2>
 <P>
 Articles were downloaded from PubMed Central. Depending on availability, XML,
 raw ASCII or text extracted from PDFs was used.  The results were then
 processed by the <A HREF="http://sourceforge.net/projects/text2genome/"
 TARGET=_blank>text2genome.org</A> software.  It searches for stretches of
 separated nucleotide-like letters that are longer than 19bp or words that
 contain more than 40% nucleotide-like letters. The resulting DNA sequences
 were mapped with BLAST to all genomes that are part of
 Ensembl/EnsemblGenomes version 56 and filtered with the text2genome pipeline: 
 
 <UL>
 <LI>Hits to NCBI Univec are completely removed</LI>
 <LI>Only matches on the most plausible genome are kept. This is the
 genome with the most matching sequences which either is mentioned in the
 text and recognized by <A HREF="http://www.sf.net/projects/linnaeus/"
 TARGET=_blank>LINNAEUS</A> or a well-known model organism.</LI>
 <LI>Hits from the same paper that are closer than 50kbp are
 chained (shown as exon-blocks on the browser)</LI>
 <LI>Non-unique hits are only kept in the chain with the most members</LI>
 </UL>
 
 </P>
 	
 <H2>Credits</H2>
 <P>
 Data was processed by Maximilian Haussler, Martin Gerner and Casey Bergman.
 Import into UCSC by Hiram Clawson. For questions or feedback on this data
 track, please send an email to
 <A HREF="mailto:&#116;&#101;x&#116;&#50;g&#101;&#110;o&#109;&#101;&#64;&#109;&#97;&#110;c&#104;&#101;&#115;te&#114;.
 &#97;&#99;.
 &#117;&#107;">
 &#116;&#101;x&#116;&#50;g&#101;&#110;o&#109;&#101;&#64;&#109;&#97;&#110;c&#104;&#101;&#115;te&#114;.
 &#97;&#99;.
 &#117;&#107;</A>.
 </P>
 
 <H2>References</H2>
 
 <P> 
 Haeussler M, Bergman CM. Annotating genes and genomes with sequences
 extracted from biomedical articles, <em>in prep.</em>, see also
 <A HREF="http://text2genome.org/" TARGET=_blank>www.text2genome.org</A>
 </P>
 
 <P> 
 Aerts S, Haeussler M, van Vooren S, Griffith OL, Hulpiau P, Jones SJM,
 Montgomery SB, Bergman CM, The Open Regulatory Annotation Consortium.
 <A HREF="http://www.ncbi.nlm.nih.gov/pubmed/18271954"
 TARGET=_blank>Text-mining assisted regulatory annotation</A>.
 Genome Biol. 2008;9(2):R31.
 </P>
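
Note on the Methods text in the hunk above: the sequence-detection and chaining steps it describes can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the text2genome implementation. The function names, the choice of A/C/G/T/U/N as "nucleotide-like" letters, the set of allowed separators (whitespace, digits, hyphens), and the (chrom, start, end) hit layout are all hypothetical. Only the thresholds (longer than 19 bp, more than 40%, closer than 50 kbp) come from the track description.

```python
import re

# Assumed "nucleotide-like" alphabet; the track description does not define it.
NUCLEOTIDES = set("ACGTUNacgtun")

# A run of nucleotide letters that may be broken up by whitespace, digits or
# hyphens (assumed separators), e.g. "ACGTACGTAC GTACGTACGT".
_STRETCH_RE = re.compile(r"[ACGTUNacgtun](?:[\s\d\-]*[ACGTUNacgtun])*")


def find_stretches(text, min_len=20):
    """Return separated nucleotide stretches longer than 19 letters."""
    found = []
    for match in _STRETCH_RE.finditer(text):
        # Drop the separators and keep only the nucleotide letters.
        seq = "".join(ch for ch in match.group() if ch in NUCLEOTIDES).upper()
        if len(seq) >= min_len:
            found.append(seq)
    return found


def is_nucleotide_rich(word, min_fraction=0.4):
    """True if more than min_fraction of the word's letters are nucleotide-like."""
    letters = [ch for ch in word if ch.isalpha()]
    if not letters:
        return False
    hits = sum(ch in NUCLEOTIDES for ch in letters)
    return hits / len(letters) > min_fraction


def chain_hits(hits, max_gap=50_000):
    """Group (chrom, start, end) hits from one paper into chains.

    A hit joins the current chain when it lies on the same chromosome and
    starts less than max_gap bases after the end of that chain.
    """
    chains = []
    chain_end = None
    for chrom, start, end in sorted(hits):
        if chains and chains[-1][0][0] == chrom and start - chain_end < max_gap:
            chains[-1].append((chrom, start, end))
            chain_end = max(chain_end, end)
        else:
            chains.append([(chrom, start, end)])
            chain_end = end
    return chains
```

For example, find_stretches("primer ACGTACGTAC GTACGTACGT was used") joins the two 10-letter blocks into one 20-letter candidate, and chain_hits would place two hits on the same chromosome that lie 30 kbp apart in a single chain. The 40% word test on its own also accepts short ordinary words made of these letters, so a real pipeline would need further filtering, such as the BLAST mapping and genome filters listed under Methods.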