539de8508531fab6a834a300830e17b6b8e64afa dschmelt Tue Sep 24 15:11:14 2019 -0700 Adding a first draft of the searchable Track hub documentaion, still needing to correct paths #20881 diff --git src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html new file mode 100755 index 0000000..1287d5f --- /dev/null +++ src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html @@ -0,0 +1,173 @@ +<!DOCTYPE html> +<!--#set var="TITLE" value="Track Hub Quick Start" --> +<!--#set var="ROOT" value="../.." --> + +<!-- Relative paths to support mirror sites with non-standard GB docs install --> +<!--#include virtual="$ROOT/inc/gbPageStart.html" --> + +<h1>Searchable Track Hub Quick Start Guide</h1> +<p> +Track Hubs are a method of displaying remotely-hosted annotation data quickly and flexibly on any +UCSC assembly or remotely-hosted sequence with Assembly Hubs. Making your annotation data searchable +is an important improvement to the usability of your hub, especially if your annotations are not +otherwise represented on the Browser. This Quick Start Guide will +go through making a searchable track hub from a GFF3 file, converting to a genePred, bed, and +bigBed, then creating a trix search index file. 
This example uses the new +"useOneFile" feature to avoid any need for separate genomes.txt and trackDb.txt files.</p> +<p> +<strong>STEP 1: Downloads</strong> In a publicly-accessible directory (such as a university server, +CyVerse, or GitHub), copy the hub.txt file using the following command:</p> +<pre><code>wget http://genome.ucsc.edu/goldenPath/help/examples/ADD PATH HERE/</code></pre> +<p> +Alternatively, you can use curl, or copy and paste the hub.txt file manually in a text editor:</p> +<pre><code>curl -O http://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/PATH</code></pre> +<p>Download some example gene data from GENCODE:</p> +<pre><code>wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_31/gencode.v31.basic.annotation.gff3.gz</code></pre> +<p> +Finally, you will need to download four Genome Browser utilities to convert the GFF3 file to a +binary indexed bigBed format and run the search index command.</p> +<table> + <tr> + <th>Utility Name</th> + <th>MacOS Download</th> + <th>Linux Download</th> + </tr> + <tr> + <td>gff3ToGenePred</td> + <td><a href="http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/gff3ToGenePred">MacOS Download</a></td> + <td><a href="http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/gff3ToGenePred">Linux Download</a></td> + </tr> + <tr> + <td>genePredToBed</td> + <td><a href="http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/genePredToBed">MacOS Download</a></td> + <td><a href="http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/genePredToBed">Linux Download</a></td> + </tr> + <tr> + <td>bedToBigBed</td> + <td><a href="http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/bedToBigBed">MacOS Download</a></td> + <td><a href="http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bedToBigBed">Linux Download</a></td> + </tr> + <tr> + <td>ixIxx</td> + <td><a href="http://hgdownload.soe.ucsc.edu/admin/exe/macOSX.x86_64/ixIxx">MacOS Download</a></td> + <td><a
href="http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/ixIxx">Linux Download</a></td> + </tr> +</table> + +<p> +<strong>STEP 2: Format Data</strong> +Before formatting the data, make the downloaded utilities executable: +<pre><code>chmod +x gff3ToGenePred genePredToBed bedToBigBed ixIxx</code></pre> +Convert the GFF3 file to genePred format, using the gene symbol instead of ID number, and sorting by chromosome and position: +<pre><code>gff3ToGenePred -geneNameAttr=gene_name gencode.v31.basic.annotation.gff3.gz stdout \ +| sort -k2,2 -k4n,4n > gencode.v31.basic.genePred </code></pre> + +Convert that genePred file to a bed file with the following command: +<pre><code>genePredToBed gencode.v31.basic.genePred gencode.v31.basic.bed</code></pre> + +Compress and index that bed file into a bigBed format, adding the extraIndex to allow name +(gene symbol) searches: +<pre><code>bedToBigBed -extraIndex=name gencode.v31.basic.bed https://genome.ucsc.edu/goldenPath/help/hg38.chrom.sizes gencode.v31.basic.bb</code></pre> + +<strong>STEP 3: Create Search Index</strong> +This step is only necessary if you want to link your annotation names to anything other than +what was mentioned in the extraIndex command, in this case name (gene symbol). +We will make an index file which will link one identifier in the file with search terms +composed of gene IDs and partial versions of the gene symbols. The following command creates the input file for the +search indexing command: +<pre><code>cat gencode.v31.basic.genePred | awk '{print $1, " " substr ($12, 1, 3), substr ($12, 1, 4), substr ($12, 1, 5), substr ($12, 1, 6), substr ($12, 1, 7), substr ($12, 1, 8)}' > index.txt</code></pre> +To examine this file or to skip this step, you can click the following link. Note that the first +word on each line is the key referenced in the bed file and the following terms are the associated values that +you want to be searchable and to lead to the location of the key.
+<a href="PATH TO index.txt">index.txt</a> +Finally, you will make the index file (.ix) and the index of that index (.ixx), which helps +return search results quickly even in large files. +<pre><code>ixIxx index.txt out.ix out.ixx</code></pre> + +<strong>STEP 4: View and Search</strong> Enter the URL to your hub on the My Hubs tab of the +<a href="../../cgi-bin/hgHubConnect#unlistedHubs">Track Data Hubs</a> page. Alternatively, you can +add your hub.txt URL to the following URL: +LINK +If you would like to look at an already-made example, click the following link: +LINK + +IMAGE + +Once your hub displays, you should be able to type in a gene symbol or ENST ID and scroll down the results +page until you see your search results. + + +<p> +If you are having problems, be sure all your files are publicly accessible and that your server +accepts byte-ranges. You can use the following command to verify that "Accept-Ranges: bytes" displays:</p> +<pre><code>curl -IL http://yourURL/hub.txt</code></pre> + +<p> +Note that the Browser waits 5 minutes before checking for any changes to these files.
<strong>When +editing hub.txt, genomes.txt, and trackDb.txt, you can shorten this delay by adding +<code>udcTimeout=1</code> to your URL.</strong> For more information, see the +<a href="hgTrackHubHelp.html#Debug" target="_blank">Debugging and Updating Track Hubs</a> section of +the <a href="hgTrackHubHelp.html" target="_blank">Track Hub User Guide</a>.</p> +<p> +<strong>For more detailed instructions on setting up a hub, refer to the +<a href="hgTrackHubHelp.html#Setup" target="_blank">Setting Up Your Own Track Hub</a> section of the +Track Hub User Guide.</strong></p> + + +<!-- ========== hub.txt ============================== --> +<a name="hub.txt"></a> +<h2>Understanding hub.txt with useOneFile</h2> +<p> +The hub.txt file is a configuration file with names, descriptions, and paths to other files. +The example below uses the setting "useOneFile on" to indicate that all the settings and paths +appear in only the hub.txt file as opposed to having two additional settings files (genomes.txt and +trackDb.txt).</p> +<br> +<p> +The most important settings to make the hub searchable appear in the third section, in what would +formerly have been the trackDb.txt file.
The settings searchIndex and searchTrix indicate which fields +are indexed in the bigBed file and where to find the .ix file, respectively.</p> + +<pre><code><strong>hub</strong> <em>MyHubsNameWithoutSpaces</em> +<strong>shortLabel</strong> <em>My Hub's Name</em> +<strong>longLabel</strong> <em>Name up to 80 characters versus shortLabel limited to 17 characters</em> +<strong>genomesFile</strong> <em>genomes.txt</em> +<strong>email</strong> <em>myEmail@address</em> +<strong>descriptionUrl</strong> <em>aboutMyHub.html</em> +<strong>useOneFile</strong> <em>on</em> +<br> +<strong>genome</strong> <em>assembly_database_2</em> +<br> +<strong>track</strong> <em>uniqueNameNoSpacesOrDots</em> +<strong>type</strong> <em>track_type</em> +<strong>bigDataUrl</strong> <em>track_data_url</em> +<strong>shortLabel</strong> <em>label 17 chars</em> +<strong>longLabel</strong> <em>long label up to 80 chars</em> +<strong>visibility</strong> <em>hide/dense/squish/pack/full</em> +<strong>searchIndex</strong> <em>field,field2</em> +<strong>searchTrix</strong> <em>path to .ix file</em></code></pre> + + +<h2>Additional Resources</h2> +<ul> + <li> + <strong><a href="hgTrackHubHelp.html" target="_blank">Track Hub User +Guide</a></strong></li> + <li> + <strong><a href="trackDb/trackDbHub.html" target="_blank">Track Database (trackDb) Definition + Document</a></strong></li> + <li> + <strong><a href="http://genomewiki.ucsc.edu/index.php/Assembly_Hubs" target="_blank">Assembly Hubs + Wiki</a></strong></li> + <li> + <strong><a href="http://genomewiki.ucsc.edu/index.php/Public_Hub_Guidelines" + target="_blank">Public Hub Guidelines Wiki</a></strong></li> + <li> + <strong><a href="hubQuickStartGroups.html" target="_blank">Quick Start Guide to Organizing Track + Hubs into Groupings</a></strong></li> + <li> + <strong><a href="hubQuickStartAssembly.html" target="_blank">Quick Start Guide to Assembly Track + Hubs</a></strong></li> +</ul> + +<!--#include virtual="$ROOT/inc/gbPageEnd.html" -->