539de8508531fab6a834a300830e17b6b8e64afa dschmelt Tue Sep 24 15:11:14 2019 -0700 Adding a first draft of the searchable Track hub documentaion, still needing to correct paths #20881 diff --git src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html new file mode 100755 index 0000000..1287d5f --- /dev/null +++ src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html @@ -0,0 +1,173 @@ + + + + + + + +
+Track Hubs are a method of displaying remotely-hosted annotation data quickly and flexibly on any +UCSC assembly or remotely-hosted sequence with Assembly Hubs. Making your annotation data searchable +is an important improvement to the usability of your hub, especially if your annotations are not +otherwise represented on the Browser. This Quick Start Guide will +go through making a searchable track hub from a GFF3 file, converting to a genePred, bed, and +bigBed, then creating a trix search index file. This example will be made with the new +"useOneFile" feature to avoid any need for separate genomes.txt and trackDb.txt files.
++STEP 1: Downloads In a publicly-accessible directory (such as a university server, +CyVerse, or GitHub) copy the hub.txt file using the following command: +
wget http://genome.ucsc.edu/goldenPath/help/examples/ADD PATH HERE/
+
+Alternatively, you can use curl or copy and paste the hub.txt file manually in a text editor:
+
curl -O http://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/PATH
+Download some example gene data from Gencode:
+wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_31/gencode.v31.basic.annotation.gff3.gz
++Finally, you will need to download four Genome Browser utilities to convert the GFF3 file to a +binary indexed bigBed format and run the search index command.
+Utility Name | +MacOS Download | +Linux Download | +
---|---|---|
gff3ToGenePred | +MacOS Download | +Linux Download | +
genePredToBed | +MacOS Download | +Linux Download | +
bedToBigBed | +MacOS Download | +Linux Download | +
ixIxx | +MacOS Download | +Linux Download | +
+STEP 2: Format Data +Before formatting the data, make the downloaded utilities executable: +
chmod +x gff3ToGenePred genePredToBed bedToBigBed ixIxx
+Convert the GFF3 file to genePred format, using the gene symbol instead of ID number, and sorting by chromosome and position:
+gff3ToGenePred -geneNameAttr=gene_name gencode.v31.basic.annotation.gff3.gz stdout \
+| sort -k2,2 -k4n,4n > gencode.v31.basic.genePred
+
+Convert that genePred file to a bed file with the following command:
+genePredToBed gencode.v31.basic.genePred gencode.v31.basic.bed
+
+Compress and index that bed file into a bigBed format, adding the extraIndex to allow name
+(gene symbol) searches:
+bedToBigBed -extraIndex=name gencode.v31.basic.bed https://genome.ucsc.edu/goldenPath/help/hg38.chrom.sizes gencode.v31.basic.bb
+
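bedToBigBed expects its input to be sorted by chromosome and then by start position. The genePred file was already sorted before conversion, but a quick check on the bed file can save a failed run. The following is a minimal sketch; the helper name bed_is_sorted and the demo files are made up for illustration.

```shell
# Hypothetical helper: succeeds only if a BED file is sorted by
# chromosome (column 1), then numerically by start (column 2).
bed_is_sorted() {
    LC_ALL=C sort -c -k1,1 -k2,2n "$1" 2>/dev/null
}

# Tiny made-up BED files for demonstration only.
printf 'chr1\t100\t200\tA\nchr1\t300\t400\tB\nchr2\t50\t60\tC\n' > sorted_demo.bed
printf 'chr2\t50\t60\tC\nchr1\t100\t200\tA\n' > unsorted_demo.bed

if bed_is_sorted sorted_demo.bed; then echo "sorted_demo.bed: ok"; fi
if ! bed_is_sorted unsorted_demo.bed; then echo "unsorted_demo.bed: needs sorting"; fi
```

To apply the same check to the real data, call the helper on gencode.v31.basic.bed before running bedToBigBed.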
+STEP 3: Create Search Index
+This step is only necessary if you want to link your annotation names to anything other than
+what was mentioned in the extraIndex command, in this case name (gene symbol).
+We will make an index file which links one identifier in the file with search terms
+composed of gene IDs and partial versions of the gene symbols. This is the input file for the
+search indexing command:
+awk '{print $1, " " substr($12, 1, 3), substr($12, 1, 4), substr($12, 1, 5), substr($12, 1, 6), substr($12, 1, 7), substr($12, 1, 8)}' gencode.v31.basic.genePred > index.txt
+To examine this file or to skip this step, you can click the following link. Note that the first
+word on each line is the key referenced in the bed file, and the following words are the search
+terms that should lead to the location of that key.
+index.txt
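To see the key-plus-terms layout on a single record, the sketch below runs the same idea on one made-up genePred line. The transcript name, the gene symbol DEMOGENE12, and the demo file names are invented for illustration. As in the command above, column 12 is taken to hold the gene symbol. The loop is only a compact way of writing the six substr calls.

```shell
# One made-up genePred line (15 tab-separated columns); column 1 is the
# key and column 12 is assumed to hold the gene symbol.
printf 'ENST00000000000.1\tchr1\t+\t1000\t5000\t1200\t4800\t2\t1000,3000,\t2000,5000,\t0\tDEMOGENE12\tcmpl\tcmpl\t0,1,\n' > demo.genePred

# Key first, then prefixes of the symbol from 3 to 8 characters long.
# awk strings are 1-indexed, so the prefixes start at position 1.
awk '{
    line = $1
    for (n = 3; n <= 8; n++) line = line " " substr($12, 1, n)
    print line
}' demo.genePred > demo_index.txt

cat demo_index.txt
# prints: ENST00000000000.1 DEM DEMO DEMOG DEMOGE DEMOGEN DEMOGENE
```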
+Finally, you will make the index file (.ix) and the index of that index (.ixx), which helps the
+Browser return search results quickly even in large files.
+ixIxx index.txt out.ix out.ixx
+
+STEP 4: View and Search Enter the URL to your hub on the My Hubs tab of the
+Track Data Hubs page. Alternatively, you can
+add your hub.txt URL to the following URL:
+LINK
+If you would like to look at an already-made example, click the following link:
+LINK
+
+IMAGE
+
+Once your hub displays, you should be able to type in a gene symbol or ENST ID and scroll down the results
+page until you see your search results.
+
+
++If you are having problems, be sure all your files are publicly-accessible and that your server +accepts byte-ranges. You can check using the following command to verify "Accept-Ranges: bytes" displays:
+curl -IL http://yourURL/hub.txt
+
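If you check several files, it can help to let the shell look for the header instead of reading the curl output by eye. This is a minimal sketch; the helper name has_byte_ranges is made up, and the demonstration uses invented headers so that it runs offline.

```shell
# Hypothetical helper: reads HTTP response headers on stdin and succeeds
# if one of them is "Accept-Ranges: bytes" (header names are case-insensitive).
has_byte_ranges() {
    tr -d '\r' | grep -qi '^accept-ranges:[[:space:]]*bytes[[:space:]]*$'
}

# Offline demonstration with made-up headers.
printf 'HTTP/1.1 200 OK\r\nAccept-Ranges: bytes\r\nContent-Length: 42\r\n\r\n' \
    | has_byte_ranges && echo "byte ranges supported"

# Against a real server (replace the placeholder URL with your own):
# curl -sIL http://yourURL/hub.txt | has_byte_ranges && echo "byte ranges supported"
```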
+
+Note that the Browser waits 5 minutes before checking for any changes to these files. When
+editing hub.txt, genomes.txt, and trackDb.txt, you can shorten this delay by adding
+udcTimeout=1
to your URL. For more information, see the
+Debugging and Updating Track Hubs section of
+the Track Hub User Guide.
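When adding the setting by hand, the only detail to watch is the separator: "&" if the URL already contains a query string, "?" otherwise. A small sketch, where the helper name add_udc_timeout is made up and the Browser URL is only an example:

```shell
# Hypothetical helper: append udcTimeout=1 to a URL, using "&" when the URL
# already has a query string and "?" otherwise.
add_udc_timeout() {
    case "$1" in
        *\?*) printf '%s&udcTimeout=1\n' "$1" ;;
        *)    printf '%s?udcTimeout=1\n' "$1" ;;
    esac
}

# Example URL for illustration only.
add_udc_timeout 'https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38'
# prints: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&udcTimeout=1
```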
+For more detailed instructions on setting up a hub, refer to the +Setting Up Your Own Track Hub section of the +Track Hub User Guide. + + + + +
+The hub.txt file is a configuration file with names, descriptions, and paths to other files. +The example below uses the setting "useOneFile on" to indicate that all the settings and paths +appear in only the hub.txt file as opposed to having two additional settings files (genomes.txt and +trackDb.txt).
+ ++The most important settings to make the hub searchable appear in the third section, in what would +formerly be the trackDb.txt files. The settings searchIndex and searchTrix indicate which fields +are indexed in the bigBed file and where to find the .ix file respectively.
+ +hub MyHubsNameWithoutSpaces
+shortLabel My Hub's Name
+longLabel Name up to 80 characters versus shortLabel limited to 17 characters
+genomesFile genomes.txt
+email myEmail@address
+descriptionUrl aboutMyHub.html
+useOneFile on
+
+genome assembly_database_2
+
+track uniqueNameNoSpacesOrDots
+type track_type
+bigDataUrl track_data_url
+shortLabel label 17 chars
+longLabel long label up to 80 chars
+visibility hide/dense/squish/pack/full
+searchIndex field,field2
+searchTrix path to .ix file
+
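As a rough illustration, the template above might be filled in as follows for the files built in this guide. This is a sketch under stated assumptions, not the hub.txt shipped with the example. The hub name, track name, labels, and email are placeholders. "type bigBed 12" assumes genePredToBed produced a 12-column bed file. The relative paths assume the .bb and .ix files sit in the same directory as hub.txt. The genomesFile and descriptionUrl lines from the template are left out for brevity; add them back if your setup needs them. The file is written to demo_hub.txt so that an existing hub.txt is not overwritten.

```shell
# Write a filled-in example based on the template above.
# All names and labels are placeholders chosen for this sketch.
cat > demo_hub.txt <<'EOF'
hub gencodeSearchDemo
shortLabel Gencode Search
longLabel Searchable Gencode v31 basic annotation example hub
email myEmail@address
useOneFile on

genome hg38

track gencodeV31Basic
type bigBed 12
bigDataUrl gencode.v31.basic.bb
shortLabel Gencode v31
longLabel Gencode v31 basic gene annotations
visibility pack
searchIndex name
searchTrix out.ix
EOF

# Confirm the two search settings made it into the file.
grep -E '^search(Index|Trix) ' demo_hub.txt
```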
+
+Additional Resources
+
+
+