539de8508531fab6a834a300830e17b6b8e64afa dschmelt Tue Sep 24 15:11:14 2019 -0700 Adding a first draft of the searchable Track hub documentaion, still needing to correct paths #20881 diff --git src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html new file mode 100755 index 0000000..1287d5f --- /dev/null +++ src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html @@ -0,0 +1,173 @@ + + + + + + + +

Searchable Track Hub Quick Start Guide

+

+Track Hubs are a method of displaying remotely-hosted annotation data quickly and flexibly on any +UCSC assembly or remotely-hosted sequence with Assembly Hubs. Making your annotation data searchable +is an important improvement to the usability of your hub, especially if your annotations are not +otherwise represented on the Browser. This Quick Start Guide will +go through making a searchable track hub from a GFF3 file, converting to a genePred, bed, and +bigBed, then creating a trix search index file. This example will be made with the new +"useOneFile" feature to avoid any need for separate genomes.txt and trackDb.txt files.

+

+STEP 1: Downloads In a publicly-accessible directory (such as a university server, +CyVerse, or GitHub) copy the hub.txt file using the following command: +

wget http://genome.ucsc.edu/goldenPath/help/examples/ADD PATH HERE/
+

+Alternatively, you can use curl or copy and paste the hub.txt file manually in a text editor:
+

curl -O http://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/PATH
+Download some example gene data from Gencode: +
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_31/gencode.v31.basic.annotation.gff3.gz
+

+Finally, you will need to download four Genome Browser utilities to convert the GFF3 file to a +binary indexed bigBed format and run the search index command.

+ + + + + + + + + + + + + + + + + + + + + + + + + + +
Utility Name      MacOS Download    Linux Download
gff3ToGenePred    MacOS Download    Linux Download
genePredToBed     MacOS Download    Linux Download
bedToBigBed       MacOS Download    Linux Download
ixIxx             MacOS Download    Linux Download
+ +

+STEP 2: Format Data +Before formatting the data, make the downloaded utilities executable: +

chmod +x gff3ToGenePred genePredToBed bedToBigBed ixIxx
+Convert the GFF3 file to genePred format, using the gene symbol instead of ID number, and sorting by chromosome and position: +
gff3ToGenePred -geneNameAttr=gene_name gencode.v31.basic.annotation.gff3.gz stdout \
+| sort -k2,2 -k4n,4n > gencode.v31.basic.genePred 
+ +Convert that genePred file to a bed file with the following command: +
genePredToBed gencode.v31.basic.genePred gencode.v31.basic.bed
+ +Compress and index that bed file into a bigBed format, adding the extraIndex to allow name +(gene symbol) searches: +
bedToBigBed -extraIndex=name gencode.v31.basic.bed https://genome.ucsc.edu/goldenPath/help/hg38.chrom.sizes gencode.v31.basic.bb
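bedToBigBed expects its input to be sorted by chromosome and then by start position. A quick sanity check before conversion could look like the sketch below. The file name sample.bed and its two rows are made up for illustration; in practice you would point the check at gencode.v31.basic.bed.

```shell
# Hypothetical two-line BED file, used only to demonstrate the check.
printf 'chr1\t100\t200\tgeneA\nchr1\t300\t400\tgeneB\n' > sample.bed

# sort -c only checks the ordering; it exits non-zero if the file is out of order.
if LC_ALL=C sort -c -k1,1 -k2,2n sample.bed; then
    sorted=yes
else
    sorted=no
fi
echo "sorted: $sorted"
```

If the check fails, re-sorting the bed file with the same keys (sort -k1,1 -k2,2n) before running bedToBigBed is the usual remedy.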
+ +STEP 3: Create Search Index +This step is only necessary if you want to link your annotation names to anything other than +what was mentioned in the extraIndex command, in this case name (gene symbol). +We will make an index file which will link one identifier in the file with search terms +composed of gene IDs and partial versions of the gene symbols. This is the input file for the +search indexing command: +
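# awk's substr() is 1-based, so start at position 1 to take the first 3 to 8 characters of the gene symbol (field 12)
awk '{print $1, substr($12, 1, 3), substr($12, 1, 4), substr($12, 1, 5), substr($12, 1, 6), substr($12, 1, 7), substr($12, 1, 8)}' gencode.v31.basic.genePred > index.txt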
+To examine this file or to skip this step, you can click the following link. Note that the first +word on each line is the key referenced in the bed file, and the terms that follow are the values +that a search will resolve to the location of that key. +index.txt +Finally, make the index file (.ix) and the index of that index (.ixx), which helps +return search results quickly even in large files. +
ixIxx index.txt out.ix out.ixx
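To make the shape of the index input concrete, the sketch below builds a single index line from a made-up record. The transcript ID and gene symbol are placeholders, and the two-column file stands in for fields 1 and 12 of the genePred file; prefix lengths of 3 through 8 follow the lengths requested in the awk command above.

```shell
# Hypothetical record: transcript ID (the key) and gene symbol, tab-separated.
printf 'ENST00000000001.1\tEXAMPLEGENE\n' > sample.tsv

# Emit the key followed by prefixes of the symbol, 3 to 8 characters long.
# awk's substr() is 1-based, so the start position is 1.
awk -F'\t' '{
    line = $1
    for (n = 3; n <= 8; n++) line = line " " substr($2, 1, n)
    print line
}' sample.tsv > sample_index.txt

cat sample_index.txt
```

Each resulting line starts with the key, so a search for any of the listed prefixes can be resolved to that key's item in the bigBed file.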
+ +STEP 4: View and Search Enter the URL to your hub on the My Hubs tab of the +Track Data Hubs page. Alternatively, you can +enter your hub.txt URL in the following URL: +LINK +If you would like to look at an already-made example, click the following link: +LINK + +IMAGE + +Once your hub displays, you should be able to type in a gene symbol or ENST ID and scroll down the results +page until you see your search results. + + +

+If you are having problems, be sure all your files are publicly-accessible and that your server +accepts byte-ranges. You can check using the following command to verify "Accept-Ranges: bytes" displays:

+
curl -IL http://yourURL/hub.txt
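The header check can also be scripted. The following is a minimal sketch, assuming the response headers are piped in on standard input; the function name accepts_byte_ranges is made up for this example, and the canned headers stand in for a real server response so the snippet runs offline.

```shell
# Return success if the response headers on stdin advertise byte-range support.
accepts_byte_ranges() {
    grep -qi '^accept-ranges:[[:space:]]*bytes'
}

# Offline demonstration with canned headers (a hypothetical response).
if printf 'HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\n\r\n' | accepts_byte_ranges; then
    result="byte ranges supported"
else
    result="byte ranges NOT supported"
fi
echo "$result"

# Against a real server: curl -sIL http://yourURL/hub.txt | accepts_byte_ranges
```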
+ +

+Note that the Browser waits 5 minutes before checking for any changes to these files. When +editing hub.txt, genomes.txt, and trackDb.txt, you can shorten this delay by adding +udcTimeout=1 to your URL. For more information, see the +Debugging and Updating Track Hubs section of +the Track Hub User Guide.
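As an illustration of where udcTimeout=1 goes, the sketch below assembles a Browser URL from placeholder values. The hg38 assembly and the http://yourURL/hub.txt location are assumptions for this example; substitute your own.

```shell
# Placeholder values; substitute your own assembly and hub location.
db="hg38"
hub_url="http://yourURL/hub.txt"

# Append udcTimeout=1 to shorten the delay before the Browser re-checks the hub files.
browser_url="https://genome.ucsc.edu/cgi-bin/hgTracks?db=${db}&hubUrl=${hub_url}&udcTimeout=1"
echo "$browser_url"
```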

+

+For more detailed instructions on setting up a hub, refer to the +Setting Up Your Own Track Hub section of the +Track Hub User Guide. + + + + +

Understanding hub.txt with useOneFile

+

+The hub.txt file is a configuration file with names, descriptions, and paths to other files. +The example below uses the setting "useOneFile on" to indicate that all the settings and paths +appear in only the hub.txt file as opposed to having two additional settings files (genomes.txt and +trackDb.txt).

+
+

+The most important settings to make the hub searchable appear in the third section, in what would +formerly be the trackDb.txt files. The settings searchIndex and searchTrix indicate which fields +are indexed in the bigBed file and where to find the .ix file respectively.

+ +
hub MyHubsNameWithoutSpaces
+shortLabel My Hub's Name
+longLabel Name up to 80 characters versus shortLabel limited to 17 characters
+genomesFile genomes.txt
+email myEmail@address
+descriptionUrl aboutMyHub.html
+useOneFile on
+
+genome assembly_database_2
+
+track uniqueNameNoSpacesOrDots
+type track_type
+bigDataUrl track_data_url
+shortLabel label 17 chars
+longLabel long label up to 80 chars
+visibility hide/dense/squish/pack/full
+searchIndex field,field2
+searchTrix path to .ix file
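For the files built in this guide, a filled-in version of the template might look like the sketch below, written out with a shell here-document. The hub name, track name, labels, and email address are placeholders, not values from the real example hub. The genomesFile line is left out on the assumption that useOneFile makes it unnecessary, and "bigBed 12" assumes the 12-column bed output of genePredToBed.

```shell
# Write a hypothetical single-file hub.txt for the example data.
# All names, labels, and the email address are placeholders.
cat > hub.txt <<'EOF'
hub gencodeSearchExample
shortLabel Gencode Search
longLabel Example searchable hub built from Gencode v31 basic annotations
email myEmail@address
descriptionUrl aboutMyHub.html
useOneFile on

genome hg38

track gencodeV31Basic
type bigBed 12
bigDataUrl gencode.v31.basic.bb
shortLabel Gencode v31
longLabel Gencode v31 basic gene annotations
visibility pack
searchIndex name
searchTrix out.ix
EOF

# Count the search-related settings written to the file.
grep -c '^search' hub.txt
```

Here searchIndex names the field passed to -extraIndex when the bigBed was built, and searchTrix points at the out.ix file produced by ixIxx; both paths are relative to the location of hub.txt.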

Additional Resources

+ + +