1c9a945ecbd1b022c8d70a3c047c08cc6e9f8f57 dschmelt Fri Sep 27 13:51:34 2019 -0700 Committing final draft of searchable hub guide #20881 diff --git src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html index 1287d5f..facf10d 100755 --- src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html +++ src/hg/htdocs/goldenPath/help/hubQuickStartSearch.html @@ -1,173 +1,212 @@

Searchable Track Hub Quick Start Guide

Track Hubs are a method of displaying remotely-hosted annotation data quickly and flexibly on any -UCSC assembly or remotely-hosted sequence with Assembly Hubs. Making your annotation data searchable +UCSC assembly or remotely-hosted sequence. Making your annotation data searchable is an important improvement to the usability of your hub, especially if your annotations are not otherwise represented on the Browser. This Quick Start Guide will -go through making a searchable track hub from a GFF3 file, converting to a genePred, bed, and +go through making a searchable track hub from a GFF3 file; converting to a genePred, bed, and bigBed, then creating a trix search index file. This example will be made with the new "useOneFile" feature to avoid any need for separate genomes.txt and trackDb.txt files.

-STEP 1: Downloads In a publicly-accessible directory (such as a university server, -CyVerse, or GitHub) copy the hub.txt file using the following command: -

wget http://genome.ucsc.edu/goldenPath/help/examples/ADD PATH HERE/
+

STEP 1: Downloads

-Alternatively, you can use curl or copy and paste the hub.txt file manually in a text editor:
-

curl -O http://genome.ucsc.edu/goldenPath/help/examples/hubDirectory/PATH
-Download some example gene data from Gencode: -
wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_31/gencode.v31.basic.annotation.gff3.gz
+Gather your settings and data files in a publicly-accessible directory (such as a university +web server, CyVerse, or +GitHub). For more information on this, please see the +hosting guide.

-Finally, you will need to download four Genome Browser utilities to convert the GFF3 file to a -binary indexed bigBed format and run the search index command.

+Copy the hub.txt file using + wget, curl, or copy-paste: +
wget http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubSearchable/hub.txt

+

+Download some example GFF3 data from Gencode. This file happens to be long non-coding RNAs (lncRNAs): +

wget ftp://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_32/gencode.v32.long_noncoding_RNAs.gff3.gz

+

+Next, you will need to download four Genome Browser utilities to convert the GFF3 file to +bigBed format and run the search index command. Similar commands exist to convert other file types. +These are operating system specific:
Utility Name      MacOS Download   Linux Download
gff3ToGenePred    Download         Download
genePredToBed     Download         Download
bedToBigBed       Download         Download
ixIxx             Download         Download

+

STEP 2: Format Data

-STEP 2: Format Data -In order to format the data, you will need to run a command to make those commands executable. +In order to format the data, you will need to run a command to make those commands executable:

chmod +x gff3ToGenePred genePredToBed bedToBigBed ixIxx
-gene symbol instead of ID number, and sorting by chromosome and position. -
gff3ToGenePred -geneNameAttr=gene_name gencode.v31.basic.annotation.gff3.gz stdout \
-| sort -k2,2 -k4n,4n > gencode.v31.basic.genePred 
- -Convert that genePred file to a bed file with the following command: -
genePredToBed gencode.v31.basic.genePred gencode.v31.basic.bed
- -Compress and index that bed file into a bigBed format, adding the extraIndex to allow name -(gene symbol) searches: -
bedToBigBed -extraIndex=name gencode.v31.basic.bedSorted  https://genome.ucsc.edu/goldenPath/help/hg38.chrom.sizes gencode.v31.basic.bb
- -STEP 3: Create Search Index -This step is only neccesary if you want to link your annotation names to anything other that -what was mentioned in the extraIndex command, in this case name (gene symbol). -We will make an index file which will link one identifier in the file with search terms -composed of gene IDs and partial versions of the gene symbols. This is the input file for the -search indexing command: -
cat gencode.v31.basic.genePred | awk '{print $1, " " substr ($12, 0, 3), substr ($12, 0, 4), substr ($12, 0, 5), substr ($12, 0, 6), substr ($12, 0, 7), substr ($12, 0, 8)}' > index.txt
-To examine this file or to skip this step, you can click the following link. Note that the first -word is the key referenced in the bed file and the following terms are associated values that -you want to be searchable to the location of the key. -index.txt -Finally you will make the index file (.ix) and the index of that index (.ixx) which helps the -return search results quickly even in large files. -
ixIxx index.txt out.ix out.ixx
-STEP 4: View and Search Enter the URL to your hub on the My Hubs tab of the -Track Data Hubs page. Alternately, you can -enter your hub.txt URL in the following URL: -LINK -If you would like to look at an already-made example, click the following link: -LINK +

+Then run the first conversion from GFF3 to genePred, making sure to include +-geneNameAttr=gene_name so that gene symbol is used as the name2 instead of +ID number, and sorting by chromosome and position:

+
gff3ToGenePred -geneNameAttr=gene_name gencode.v32.long_noncoding_RNAs.gff3.gz stdout | sort -k2,2 -k4n,4n > gencode.v32.lncRNAs.genePred
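The sort keys in the command above can be tried on a tiny input first. This is only a sketch: the three rows are made up and cut down to four tab-separated columns (name, chromosome, strand, start), and `LC_ALL=C` is an assumption added here so the text comparison behaves the same on any machine.

```shell
# Three made-up genePred-like rows (tab-separated): name, chrom, strand, txStart.
# Real genePred rows have more columns; only columns 2 and 4 matter for the sort.
rows=$(printf 't3\tchr2\t+\t500\nt1\tchr1\t+\t2000\nt2\tchr1\t+\t300\n')

# Same sort keys as the guide: chromosome (column 2) as text,
# then transcript start (column 4) as a number.
sorted_names=$(printf '%s\n' "$rows" | LC_ALL=C sort -k2,2 -k4n,4n | cut -f1 | paste -sd, -)
echo "$sorted_names"
```

The `n` on the second key matters: without it, 2000 would sort before 300 because the comparison would be character by character.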
-IMAGE +

+Convert that genePred file to a bed file:

+
genePredToBed gencode.v32.lncRNAs.genePred gencode.v32.lncRNAs.bed
-Once your hub displays, you should be able to type in a gene symbol or Enst ID and scroll down the results -page until you see your search results. +

+Compress and index that bed file into a bigBed format, adding the +-extraIndex=name option to allow EnstID searches:

+
bedToBigBed -extraIndex=name gencode.v32.lncRNAs.bed https://genome.ucsc.edu/goldenPath/help/hg38.chrom.sizes gencode.v32.lncRNAs.bb
+

+If you would like to stop here, you will be able to display your bigBed hub and search for the +names that were indexed into the bigBed file (EnstID). You will not be able to use the +searchIndex and searchTrix trackDb settings, which require creating a +key and value search index for your file as shown below.

+ +

STEP 3: Create Search Index

+

+If you want to link your annotation names to anything other than +the field referenced in the -extraIndex command, you will need to make an index +file. We will make an input file which will link one identifier (EnstID) +with search terms composed of gene symbols and EnstIDs. Below is one example of a command to +create an input file for the search indexing command:

+
cat gencode.v32.lncRNAs.genePred | awk '{print $1, $12, $1}' > in.txt
+

+To examine or download that file, you can click + +here. Note that the first word is the key referenced in the BED file and the following +search terms are associated aliases that will lead a search to the location of the key. +These search terms are case-insensitive and allow partial word searches.
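To see the key-then-aliases layout concretely, the awk command can be run on a single row. This is a sketch under assumptions: the row below is illustrative (its coordinates are made up) and follows the 15-column extended genePred layout, in which column 1 is the transcript name and column 12 is name2, the gene symbol.

```shell
# One illustrative extended-genePred row (tab-separated, coordinates made up).
# Column 1 is the transcript ID (the key); column 12 is name2 (the gene symbol).
sample=$(printf 'ENST00000326734.2\tchr1\t+\t817370\t819837\t819837\t819837\t2\t817370,818722,\t817712,819837,\t0\tFAM87B\tnone\tnone\t-1,-1,\n')

# Same awk program as the guide: key first, then the aliases that should find it.
line=$(printf '%s\n' "$sample" | awk '{print $1, $12, $1}')
echo "$line"
```

The key appears again as its own alias, so a search for the transcript ID is answered by the trix index as well.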

+

+Finally, you will make the index file (.ix) and the index of that index (.ixx), which helps +searches run quickly even in large files.

+
ixIxx in.txt out.ix out.ixx
+

STEP 4: View and Search

+

+Enter the URL to your hub on the My Hubs tab of the +Track Data Hubs page. Alternatively, you can +substitute your hub.txt URL into the following web address:

+
genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=YourUrlHere
+

+If you would like to look at an already-made example, click the following link which includes +hideTracks=1 to hide other tracks:

+
genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubSearchable/hub.txt
+ +

+ A display of the Searchable hub track +

This is an example of what your Track Hub data should look like.

+

+ +

Once your hub displays, you should be able to type in a gene symbol or Enst ID and scroll down the results +page until you see your search results.

+ +

+ Typing a search term in the search box +

You can type your search term (fam87b) in the box above +the ideogram and press go. Note that the search is not case sensitive.

+

+ +

+ Search hit for fam87b + Search results for fam87b

Scrolling to the bottom of the search results page, you will +see your searchable hub keyword that was linked with your search term. Clicking into it will bring +you to the position of your search term.

+

If you are having problems, be sure all your files are publicly-accessible and that your server accepts byte-ranges. You can check using the following command to verify "Accept-Ranges: bytes" displays:

curl -IL http://yourURL/hub.txt
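Rather than reading the headers by eye, the check can be wrapped in a small helper. This is a sketch: `accepts_byte_ranges` is a hypothetical function name, and the headers below are canned text standing in for a real server response.

```shell
# Reads HTTP response headers on stdin; succeeds if they advertise byte-range support.
accepts_byte_ranges() {
  grep -qi '^accept-ranges:[[:space:]]*bytes'
}

# Canned headers standing in for real curl output (hypothetical server response).
sample_headers=$(printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nAccept-Ranges: bytes\r\n')

if printf '%s\n' "$sample_headers" | accepts_byte_ranges; then
  range_status="ok"
else
  range_status="missing"
fi
echo "byte ranges: $range_status"
```

Against a live server, the same helper would be fed with `curl -sIL http://yourURL/hub.txt | accepts_byte_ranges`.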

-Note that the Browser waits 5 minutes before checking for any changes to these files. When +Note that the Browser waits 5 minutes before checking for any changes to these files. When editing hub.txt, genomes.txt, and trackDb.txt, you can shorten this delay by adding -udcTimeout=1 to your URL. For more information, see the +udcTimeout=1 to your URL. For more information, see the Debugging and Updating Track Hubs section of the Track Hub User Guide.

-For more detailed instructions on setting up a hub, refer to the -Setting Up Your Own Track Hub section of the -Track Hub User Guide. -

Understanding hub.txt with useOneFile

The hub.txt file is a configuration file with names, descriptions, and paths to other files. -The example below uses the setting "useOneFile on" to indicate that all the settings and paths +The example below uses the setting useOneFile on to indicate that all the settings and paths appear only in the hub.txt file as opposed to having two additional settings files (genomes.txt and -trackDb.txt).

-
+trackDb.txt). To see the actual hub.txt file for the above example, click here.

The most important settings to make the hub searchable appear in the third section, in what would -formerly be the trackDb.txt files. The settings searchIndex and searchTrix indicate which fields -are indexed in the bigBed file and where to find the .ix file respectively.

+formerly be the trackDb.txt file. The searchIndex and searchTrix settings +indicate which fields are indexed in the bigBed file and where to find the .ix file, respectively. +

hub MyHubsNameWithoutSpaces
 shortLabel My Hub's Name
 longLabel Name up to 80 characters versus shortLabel limited to 17 characters
 genomesFile genomes.txt
 email myEmail@address
 descriptionUrl aboutMyHub.html
 useOneFile on
-
+
 genome assembly_database_2
-
+
 track uniqueNameNoSpacesOrDots
 type track_type
 bigDataUrl track_data_url
 shortLabel label 17 chars
 longLabel long label up to 80 chars
 visibility hide/dense/squish/pack/full
 searchIndex field,field2
-searchTrix path to .ix file
-
+searchTrix path/to/.ix/file
+
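As a rough illustration of how the template might be filled in for the lncRNA example, the sketch below writes a one-file hub.txt into a temporary directory. Everything specific in it is an assumption rather than the contents of the published example hub: the hub and track names, labels, and email are made up, `hg38` is inferred from the chrom.sizes file used earlier, `bigBed 12` assumes genePredToBed wrote a 12-column BED, and the data paths assume the .bb and .ix files sit next to hub.txt.

```shell
# Write a hypothetical single-file hub.txt; names, labels, and email are placeholders.
hub_dir=$(mktemp -d)
cat > "$hub_dir/hub.txt" <<'EOF'
hub lncRNAsearchHub
shortLabel lncRNA search hub
longLabel Searchable hub of Gencode v32 long non-coding RNAs
email myEmail@address
useOneFile on

genome hg38

track gencodeV32lncRNAs
type bigBed 12
bigDataUrl gencode.v32.lncRNAs.bb
shortLabel lncRNAs
longLabel Gencode v32 long non-coding RNAs
visibility pack
searchIndex name
searchTrix out.ix
EOF

# Show the two settings that make the track searchable.
grep '^search' "$hub_dir/hub.txt"
```

Here searchIndex names the field that was passed to -extraIndex, and searchTrix points at the .ix file written by ixIxx; the .ixx file is expected alongside it.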

Additional Resources