55f13dc0c9d9cec2e3b95521e22b191e134273df
gperez2
Mon Mar 16 16:12:26 2026 -0700
Fixing 404 links for the 2026-03-15 Static Page Cronjob, No RM

diff --git src/hg/htdocs/goldenPath/help/assemblyHubGuidelines.html src/hg/htdocs/goldenPath/help/assemblyHubGuidelines.html
index b0dc6588d8c..4c175e32ea4 100755
--- src/hg/htdocs/goldenPath/help/assemblyHubGuidelines.html
+++ src/hg/htdocs/goldenPath/help/assemblyHubGuidelines.html
@@ -186,31 +186,31 @@

The .2bit commands can function with the .2bit file at a URL:

 twoBitInfo -udcDir=. http://genome-test.gi.ucsc.edu/~hiram/hubs/Plants/ricCom1/ricCom1.2bit stdout | sort -k2nr > ricCom1.chrom.sizes
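
As a rough illustration of what the sort in the command above does, the sketch below applies the same sort to a made-up two-column listing of sequence names and sizes. The names and sizes are hypothetical stand-ins for twoBitInfo output and are not taken from ricCom1.

```shell
# Hypothetical stand-in for twoBitInfo output: sequence name, tab, size.
workdir=$(mktemp -d)
printf 'chrCp\t163161\nchr1\t4500000\nscaffold_7\t98000\n' > "$workdir/demo.sizes.unsorted"

# Same sort as above: column 2, numeric, descending (largest sequence first).
sort -k2nr "$workdir/demo.sizes.unsorted" > "$workdir/demo.chrom.sizes"
cat "$workdir/demo.chrom.sizes"
```

Sorting largest first means the longest sequences appear at the top of the chrom.sizes file.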

Sequence can be extracted from the .2bit file with the twoBitToFa command, for example:

 twoBitToFa -seq=chrCp -udcDir=. http://genome-test.gi.ucsc.edu/~hiram/hubs/Plants/ricCom1/ricCom1.2bit stdout > ricCom1.chrCp.fa

groups.txt

The groups.txt file defines the grouping of track controls under the primary genome browser image display. The example referenced here has the usual definitions as found in the UCSC Genome Browser.

Each group is defined, for example the Mapping group:

 name map
 label Mapping
 priority 2
 defaultIsClosed 0

The name is used in the trackDb.txt track definition group, to assign a particular track to this group.
The label is displayed on the genome browser as the title of this group of track controls.
The priority orders this track group with the other track groups.
The defaultIsClosed determines if this track group is expanded or closed by default. Values to use are 0 or 1.
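
To make the four settings concrete, here is a minimal sketch that writes a two-group groups.txt and summarizes it. The second group (genes), its label and its priority are hypothetical, and the blank line between stanzas is an assumption about the file layout.

```shell
workdir=$(mktemp -d)

# Write a groups.txt with the Mapping group from above plus a hypothetical second group.
cat > "$workdir/groups.txt" <<'EOF'
name map
label Mapping
priority 2
defaultIsClosed 0

name genes
label Genes and Gene Predictions
priority 3
defaultIsClosed 1
EOF

# Summarize each group: name, priority, and whether it starts open or closed.
awk '$1=="name"{n=$2} $1=="priority"{p=$2} $1=="defaultIsClosed"{print n, p, ($2==1 ? "closed" : "open")}' "$workdir/groups.txt"
```

A track would then join the hypothetical genes group by setting group genes in its trackDb.txt stanza.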

@@ -228,39 +228,39 @@ group map html ../trackDescriptions/gap

For more information about the syntax of the trackDb.txt file, see UCSC's Hub Track Database Definition page. It helps to have a cluster supercomputer to process the genomes and construct tracks, although small genomes can be processed on a single computer with multiple cores. The process for each track is unique. Please see the continuing document, Browser Track Construction, for a discussion of constructing tracks for your assembly hub.

Cytoband Track

Assembly hubs can have a Cytoband track that can allow for quicker navigation of individual chromosomes and display banding pattern information if known.

A quick version of the track can be built using the existing chrom.sizes files for your assembly (the banding options include gneg, gpos25, gpos50, gpos75, gpos100, acen, gvar, or stalk):

 cat araTha1.chrom.sizes | sort -k1,1 -k2,2n | awk '{print $1,0,$2,$1,"gneg"}' > cytoBandIdeo.bed

The resulting bed file can be converted to bigBed format together with a .as file (example here), which informs the browser that it is not a standard bed file.

 bedToBigBed -type=bed4 cytoBandIdeo.bed -as=cytoBand.as araTha1.chrom.sizes cytoBandIdeo.bigBed

In the trackDb, as long as the track is named cytoBandIdeo (track cytoBandIdeo example) it will load in the assembly hub.

In the trackDb, as long as the track is named cytoBandIdeo (track cytoBandIdeo) it will load in the assembly hub.
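
To show what the quick cytoband step produces, the sketch below runs the same sort and awk pipeline on a small, made-up chrom.sizes file. The sequence names and sizes are hypothetical, and the bedToBigBed conversion is not run here.

```shell
workdir=$(mktemp -d)

# Hypothetical chrom.sizes: sequence name, tab, length.
printf 'chr2\t1000\nchr1\t2500\nchrM\t300\n' > "$workdir/demo.chrom.sizes"

# Same pipeline as above: one band per sequence, spanning its full length,
# with the sequence name as the band name and gneg as the stain.
cat "$workdir/demo.chrom.sizes" | sort -k1,1 -k2,2n | awk '{print $1,0,$2,$1,"gneg"}' > "$workdir/cytoBandIdeo.bed"
cat "$workdir/cytoBandIdeo.bed"
```

Each output line has five space-separated fields: sequence name, start (0), end (the sequence length), band name and stain. Replacing gneg with another of the listed banding options changes how a band is drawn.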

Assembly Hub Resources

There are resources for automatically building assembly hubs available from G-OnRamp and MakeHub.

There is also a collection of Example NCBI assembly hubs that are already working and can either be used or copied as a template to build further hubs.

G-OnRamp

- G-OnRamp is a Galaxy workflow that turns a genome assembly and RNA-Seq data into a Genome Browser with multiple evidence tracks. Because G-OnRamp is based on the Galaxy platform, developing some familiarity with the key concepts and functionalities of Galaxy would be beneficial prior to using G-OnRamp. Here is a link to their instruction page that gives an overview of their process.
+ G-OnRamp is a Galaxy workflow that turns a genome assembly and RNA-Seq data into a Genome Browser with multiple evidence tracks. Because G-OnRamp is based on the Galaxy platform, developing some familiarity with the key concepts and functionalities of Galaxy would be beneficial prior to using G-OnRamp. Visit the G-OnRamp website for an overview of their process.

MakeHub

MakeHub is a command line tool for the fully automatic generation of track data hubs for visualizing genomes with the UCSC genome browser. More information can be found on their GitHub page.

Example loading African bush elephant assembly hub and looking at the related genomes.txt and trackDb.txt

Here are some quick steps to load an example hub from this collection, and an attempt to explain how to look at the files behind the hub.

Click the above Vertebrate Mammalian assembly hub link.
Scroll down and find the "common name" column and click the hyperlink for "African bush elephant" after looking at the other information on that row.
Note that you have arrived at a gateway page that displays "African bush elephant Genome Browser - GCA_000001905.1_Loxafr3.0". It includes a "Download files for this assembly hub:" section with a link, in case you want to access these specific files.
Click "Go" or the top "Genome Browser" blue bar menu to arrive at viewing this assembly hub (note this is on our genome-test site).
To load this hub on our public site, at the earlier step you can copy the hyperlink for "African bush elephant" and paste it in a browser and change the very first "http://genome-test.gi.ucsc.edu/gbdb/..." to "http://genome.ucsc.edu/cgi-bin/..." instead.

@@ -269,45 +269,45 @@

Click the link found in the "Download files for this assembly hub:" section on a loaded assembly hub's gateway page.

Note the "GCA_000001905.1_Loxafr3.0.ncbi2bit" file, this is the binary indexed remote file that is allowing the Browser to display this genome.

Find the "GCA_000001905.1_Loxafr3.0.genomes.ncbi.txt" file and click the link to look at it.

Review this genomes.txt file, which defines each assembly in a hub. The "twoBitPath" line shows where to find the above 2bit, and the "trackDb" line shows where to find the track database used to display data on this genome (the real genomes.txt for this massive hub is up one directory, as this hub has 204 assemblies, and you will find this stanza included there).

From the earlier link to all the files, click the GCA_000001905.1_Loxafr3.0.trackDb.ncbi.txt link.

Review this trackDb.txt file, which defines the tracks to display on this hub. It has "bigDataUrl" lines to tell the Browser where to find the data to display for each track. Some tracks also have "searchIndex" and "searchTrix" lines to help support finding data in the hub, "url" and "urlLabel" lines to help create links out from items in the hub to external resources, and "html" lines pointing to a file with information to display about the data for users who click into tracks.
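
When reviewing a larger trackDb.txt, it can help to list each track next to its data file. The sketch below does this for a small, made-up trackDb.txt. All track names, labels, paths and the example.org URL are hypothetical and are not copied from the elephant hub.

```shell
workdir=$(mktemp -d)

# Hypothetical two-track trackDb.txt using the settings discussed above.
cat > "$workdir/trackDb.txt" <<'EOF'
track exampleGap
shortLabel Gap
longLabel Gaps in the example assembly
type bigBed 4
bigDataUrl bbi/example.gap.bb
html html/example.gap

track exampleGenes
shortLabel Genes
longLabel Example gene annotations
type bigBed 12
bigDataUrl bbi/example.genes.bb
searchIndex name
url https://example.org/lookup?id=$$
urlLabel Example lookup:
html html/example.genes
EOF

# Print each track name with the bigDataUrl that tells the Browser where its data lives.
awk '$1=="track"{t=$2} $1=="bigDataUrl"{print t "\t" $2}' "$workdir/trackDb.txt"
```

The same awk pattern can be pointed at other settings, such as html or searchIndex, by changing the second match.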

Adding BLAT servers

BLAT servers (gfServer) are configured as either dedicated or dynamic servers. Dedicated BLAT servers index a genome when started and remain running in memory to respond quickly to requests. Dynamic BLAT servers pre-index genomes to files and are run on demand to handle a BLAT request and then exit.

Dedicated gfServers are easier to configure and faster to respond. However, the server continually uses memory. A dynamic gfServer is more appropriate with multiple assemblies and infrequent use. Its response time is usually acceptable; however, it varies with the speed of the disk containing the index. With repeated access, the operating system will cache the indexes in memory, improving response time.

Configuring assembly hubs to use a dedicated gfServer

By running your own BLAT server, you can add lines to the genomes.txt file of your assembly hub to enable the browser to access the server and activate blat searches.

Please see Running your own gfServer for details on installing and configuring both dedicated and dynamic gfServers.

Please see the gfServer documentation for details on installing and configuring both dedicated and dynamic gfServers.

Next edit your genomes.txt stanza that references yourAssembly to have two lines to inform the browser of where the blat servers are located and what ports to use. See an example of commented out lines here. Please note the capital "B" in transBlat.
Next edit your genomes.txt stanza that references yourAssembly to have two lines to inform the browser of where the blat servers are located and what ports to use. Please note the capital "B" in transBlat.

 transBlat yourServer.yourInstitution.edu 17777
 blat yourServer.yourInstitution.edu 17779
 isPcr yourServer.yourInstitution.edu 17779
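
Because the capital "B" in transBlat is easy to miss, a quick check of the edited file can help. The following is only a sketch: check_blat_lines is a hypothetical helper, not a UCSC tool, and the genomes.txt stanzas and paths are made up for illustration.

```shell
# Hypothetical helper: succeed only if the file has correctly spelled
# transBlat and blat lines. $1 is the path to a genomes.txt file.
check_blat_lines() {
    # Flag any spelling of transBlat that differs only by letter case.
    if grep -i '^transblat ' "$1" | grep -qv '^transBlat '; then
        echo "warning: found a transBlat line with the wrong capitalization"
        return 1
    fi
    # Require both server lines to be present.
    grep -q '^transBlat ' "$1" && grep -q '^blat ' "$1"
}

workdir=$(mktemp -d)

# Hypothetical stanza with the lines spelled as shown above.
cat > "$workdir/good.genomes.txt" <<'EOF'
genome yourAssembly
trackDb yourAssembly/trackDb.txt
twoBitPath yourAssembly/yourAssembly.2bit
transBlat yourServer.yourInstitution.edu 17777
blat yourServer.yourInstitution.edu 17779
isPcr yourServer.yourInstitution.edu 17779
EOF

# Same stanza with the common lowercase-b mistake.
sed 's/^transBlat /transblat /' "$workdir/good.genomes.txt" > "$workdir/bad.genomes.txt"

check_blat_lines "$workdir/good.genomes.txt" && echo "good file passes"
check_blat_lines "$workdir/bad.genomes.txt" || echo "bad file rejected"
```

This only checks the spelling and presence of the lines. It does not confirm that the servers are reachable on those ports.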

You should now be able to load and perform blat and PCR operations on your assembly. For example, a URL such as the following would bring up the blat CGI and have your assembly listed at the bottom of the "Genome:" drop-down menu: http://genome.ucsc.edu/cgi-bin/hgBlat?hubUrl=http://yourServer.yourInstitution.edu/myHub/hub.txt. Also note the separate isPcr line provides the option to use a different gfServer than the blat host if desired.
Some institutions have firewalls that will prevent the browser from sending multiple inquiries to your blat servers, in which case you may need to request your admins add this IP range as exceptions that are not limited: 128.114.119.* That will cover the U.S. genome.ucsc.edu site. In case you may wish the requests to work from our European Mirror genome-euro.ucsc.edu site, you would want to include 129.70.40.120 also to the exception list.

Please see more about configuring your blat gfServer to replicate the UCSC Browser's settings, which will also have information about optimizing PCR results. The Source Downloads page offers access to utilities with pre-compiled binaries such as gfserver found in a blat/ directory for your machine type here and further blat documentation here, and the gfServer usage statement for further options.

Please see the gfServer documentation for configuring your blat gfServer to replicate the UCSC Browser's settings, which will also have information about optimizing PCR results. The Source Downloads page offers access to utilities with pre-compiled binaries such as gfserver found in a blat/ directory for your machine type, and the gfServer usage statement for further options.

Please also note that you can set up gfServers in Docker and run them locally.

Note: You can stop your instance of gfServer with a command. For example:

 gfServer stop localhost 17860

Troubleshooting BLAT servers

You can see this error if you have the translatedBlat / nucleotideBlat port numbers the wrong way around:

 Expecting 6 words from server got 2

The following is an example of an error message when attempting to run a DNA sequence query via the web-based BLAT tool after loading a hub, after starting a gfServer instance (from the same dir as the 2bit file). For example, a command to start an instance of gfServer:

 gfServer start localhost 17779 -stepSize=5 contigsRenamed.2bit &