e06433e5a691534449ed3cb9ec070b2be6cf07b7 jnavarr5 Mon Dec 28 15:09:39 2020 -0800 Catching up the github README for GBiC to match the gbic.html documentation. refs #23297 diff --git src/hg/htdocs/goldenPath/help/gbic.html src/hg/htdocs/goldenPath/help/gbic.html index 931fc7c..4caf45b 100755 --- src/hg/htdocs/goldenPath/help/gbic.html +++ src/hg/htdocs/goldenPath/help/gbic.html @@ -1,181 +1,187 @@ <!DOCTYPE HTML> <!-- DO NOT EDIT THE HTDOCS VERSION OF THIS FILE. THIS FILE IS AUTOMATICALLY GENERATED FROM A MARKDOWN FILE IN kent/src/product. MAKE ANY EDITS TO THIS PAGE THERE, RUN MAKE, AND FOLLOW THE INSTRUCTIONS TO EDIT THIS PAGE. --> <!--#set var="TITLE" value="GBIC" --> <!--#set var="ROOT" value="../.." --> <!-- Relative paths to support mirror sites with non-standard GB docs install --> <!--#include virtual="$ROOT/inc/gbPageStart.html" --> -<h1>Genome Browser in the Cloud User's Guide</h1> +<h1>Genome Browser in the Cloud User’s Guide</h1> <h2>Contents</h2> <h6><a href='#what-is-genome-browser-in-the-cloud'>What is Genome Browser in the Cloud?</a></h6> <h6><a href='#quick-start-instructions'>Quick Start Instructions</a></h6> <h6><a href='#how-does-the-gbic-program-work'>How does the GBiC program work?</a></h6> -<h6><a href='#gbic-commands'>GBiC commands</a></h6> +<h6><a href='#gbic-commands'>GBiC Commands</a></h6> <h6><a href='#all-gbic-options'>All GBiC options</a></h6> <h6><a href='#credits'>Credits</a></h6> <a name='what-is-genome-browser-in-the-cloud'></a> <h2>What is Genome Browser in the Cloud?</h2> <p> The Genome Browser in the Cloud (GBiC) program is a convenient tool that automates the setup of a UCSC Genome Browser mirror. The GBiC program is for users who want to set up a full mirror of the UCSC Genome Browser on their server/cloud instance, rather than using <a href='gbib.html' title=''>Genome Browser in a Box</a> (GBIB) or our public website. 
Please see the -<a href='mirror.html#considerations-before-installing-a-genome-browser' title=''>Installation of a UCSC Genome Browser on a local machine (mirror)</a> -page for a summary of installation options, including the pros and cons of using a mirror installation -via the GBiC program vs. using GBiB. +<a href='mirror.html#considerations-before-installing-a-genome-browser' title=''>Installation of a +UCSC Genome Browser on a local machine (mirror)</a> +page for a summary of installation options, including the pros and cons of using a mirror +installation via the GBiC program vs. using GBiB. </p> <p> -The program works by setting up MySQL (MariaDB), Apache, and Ghostscript, and then copying the Genome -Browser CGIs onto the machine under <code>/usr/local/apache/</code>. Because it also deactivates the default -Apache htdocs/cgi folders, it is best run on a new machine, or at least a host that is not -already used as a web server. The tool can also download full or partial assembly databases, -update the Genome Browser CGIs, and remove temporary files (aka "trash cleaning"). +The program works by setting up MySQL (MariaDB), Apache, and Ghostscript, and then copying the +Genome Browser CGIs onto the machine under <code>/usr/local/apache/</code>. Because it also +deactivates the default Apache htdocs/cgi folders, it is best run on a new machine, or at least a +host that is not already used as a web server. The tool can also download full or partial assembly +databases, update the Genome Browser CGIs, and remove temporary files (aka "trash +cleaning"). </p> <p> The GBiC program has been tested with Ubuntu 14/16 LTS, Centos 6/6.7/7.2, and Fedora 20. </p> <p> It has also been tested on virtual machines in Amazon EC2 (Centos 6 and Ubuntu 14) and Microsoft Azure (Ubuntu). If you want to load data on the fly from UCSC, you need to select the -data centers "US West (N. California)" (Amazon) or "West US" (Microsoft) for best performance. -Other data centers (e.g. 
East Coast) will require a local copy of the genome assembly, which -requires 2TB-7TB of storage for the hg19 assembly. Note that this exceeds the current maximum -size of a single Amazon EBS volume. +data centers "US West (N. California)" (Amazon) or "West US" (Microsoft) for +best performance. Other data centers (e.g. East Coast) will require a local copy of the genome +assembly, which requires 2TB-7TB of storage for the hg19 assembly. Note that this exceeds the +current maximum size of a single Amazon EBS volume. </p> <a name='quick-start-instructions'></a> <h2>Quick Start Instructions</h2> <p> -Download the GBiC program from the <a href='https://genome-store.ucsc.edu/' title=''>UCSC Genome Browser store</a>. +Download the GBiC program from the <a href='https://genome-store.ucsc.edu/' title=''>UCSC Genome +Browser store</a>. </p> <p> Run the program as root, like this: </p> <pre><code>sudo bash browserSetup.sh install</code></pre> <p> -The <code>install</code> command downloads and configures Apache, MySQL (MariaDB) and Ghostscript, copies the Genome Browser -CGIs, and configures the mirror to load data remotely from UCSC. The <code>install</code> command must be -run before any other command is used. +The <code>install</code> command downloads and configures Apache, MySQL (MariaDB) and Ghostscript, +copies the Genome Browser CGIs, and configures the mirror to load data remotely from UCSC. The +<code>install</code> command must be run before any other command is used. </p> <p> -For mirror-specific help, please contact the Mirror Forum as listed on our <a href='../../contacts.html' title=''>contact page</a>. +For mirror-specific help, please contact the Mirror Forum as listed on our +<a href='../../contacts.html' title=''>contact page</a>. 
</p> <p> -For an installation demonstration, see the <a href='https://www.youtube.com/watch?v=dcJERBVnjio' title=''>Genome Browser in the Cloud (GBiC) Introduction</a> -video: +For an installation demonstration, see the <a href='https://www.youtube.com/watch?v=dcJERBVnjio' +title=''>Genome Browser in the Cloud (GBiC) Introduction</a> video: </p> + <p> -<iframe width="560" height="315" src="https://www.youtube.com/embed/dcJERBVnjio?rel=0" frameborder="0" -allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> +<iframe width="560" height="315" src="https://www.youtube.com/embed/dcJERBVnjio?rel=0" +frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" +allowfullscreen></iframe> </p> <a name='how-does-the-gbic-program-work'></a> <h2>How does the GBiC program work?</h2> <p> -The GBiC program downloads the Genome Browser CGIs and sets up the central MySQL (MariaDB) database. All -potentially destructive steps require confirmation by the user (unless the <code>-b</code> +The GBiC program downloads the Genome Browser CGIs and sets up the central MySQL (MariaDB) database. +All potentially destructive steps require confirmation by the user (unless the <code>-b</code> batch mode option is specified). </p> <p> -In particular, MySQL (MariaDB) and Apache are installed and set up with the right package -manager (yum or apt-get). A default random password is set for the -MySQL (MariaDB) root user and added to the <code>~/.my.cnf</code> file of the Unix root account. -If you have already set up MySQL (MariaDB), you must create the -<code>~/.my.cnf</code> file. The program will detect this and create a template file for you. -The program also performs some minor tasks such as placing symlinks, detecting -MariaDB, deactivating SELinux, finding the correct path for your Apache install -and adapting the MySQL (MariaDB) socket config. 
+In particular, MySQL (MariaDB) and Apache are installed and set up with the right package manager +(yum or apt-get). A default random password is set for the MySQL (MariaDB) root user and added to +the <code>~/.my.cnf</code> file of the Unix root account. If you have already set up MySQL +(MariaDB), you must create the <code>~/.my.cnf</code> file. The program will detect this and create +a template file for you. The program also performs some minor tasks such as placing symlinks, +detecting MariaDB, deactivating SELinux, finding the correct path for your Apache install and +adapting the MySQL (MariaDB) socket config. </p> <p> -This will result in a Genome Browser accessible on localhost that loads its data -through genome-mysql.soe.ucsc.edu:3306 and hgdownload.soe.ucsc.edu:80. If -your geographic location is not on the US West Coast, the performance will be too slow for normal -use, though sufficient to test that the setup is functional. A special MySQL (MariaDB) server is -set up in Germany for users in Europe. You can change the <code>/usr/local/apache/cgi-bin/hg.conf</code> -genome-mysql.soe.ucsc.edu lines to genome-euro-mysql.soe.ucsc.edu in order to get better -performance. You can then use the program to download -assemblies of interest to your local Genome Browser, which will result in performance at least -as fast as the UCSC site. +This will result in a Genome Browser accessible on localhost that loads its data through +genome-mysql.soe.ucsc.edu:3306 and hgdownload.soe.ucsc.edu:80. If your geographic location is not on +the US West Coast, the performance will be too slow for normal use, though sufficient to test that +the setup is functional. A special MySQL (MariaDB) server is set up in Germany for users in Europe. +You can change the <code>/usr/local/apache/cgi-bin/hg.conf</code> genome-mysql.soe.ucsc.edu lines to +genome-euro-mysql.soe.ucsc.edu in order to get better performance. 
You can then use the program to +download assemblies of interest to your local Genome Browser, which will result in performance at +least as fast as the UCSC site. </p> -<h3>Network requirements</h3> +<h3 id="network-requirements">Network requirements</h3> + <p> Your network firewall must allow outgoing connections to the following servers and ports: <ul> <li>MySQL (MariaDB) connections, used to load tracks not local to your computer: <ul> <li>US server: Port 3306 on genome-mysql.soe.ucsc.edu (128.114.119.174)</li> <li>European server: Port 3306 on genome-euro-mysql.soe.ucsc.edu (129.70.40.120)</li> </ul></li> <li>Rsync, used to download track data: <ul> <li>US server: TCP port 873 on hgdownload.soe.ucsc.edu (128.114.119.163)</li> <li>European server: TCP port 873 on hgdownload-euro.soe.ucsc.edu (129.70.40.99)</li> </ul></li> <li>Download HTML descriptions on the fly: <ul> <li>US server: TCP port 80 on hgdownload.soe.ucsc.edu (128.114.119.163)</li> <li>European server: TCP port 80 on hgdownload-euro.soe.ucsc.edu (129.70.40.99)</li> </ul></li> </ul></p> -<a name="partition"></a> -<h3>Root file system too small for all data</h3> +<h3 id="root-file-system-too-small-for-all-data">Root file system too small for all data</h3> + <p> -If you need to move data to another partition because the root file system is too small for all -of the assembly's data, the following steps will help complete the installation. First, do a minimal +If you need to move data to another partition because the root file system is too small for all of +the assembly's data, the following steps will help complete the installation. First, do a minimal installation with the browserSetup.sh script as described below, using just the "install" -argument. Then make symlinks to the directory that will contain the data, e.g. 
if your biggest +filesystem is called "/big": +</p> + +<pre><code>sudo mv /var/lib/mysql /big/ sudo mv /gbdb /big/ sudo ln -s /big/mysql /var/lib/mysql -sudo ln -s /big/gbdb /gbdb -</pre> +sudo ln -s /big/gbdb /gbdb</code></pre> + <p> Then use the "mirror" or "minimal" arguments to browserSetup.sh to rsync over the majority of the data. </p> <a name='gbic-commands'></a> <h2>GBiC Commands</h2> <p> -The first argument of the program is called <code>command</code> in the following section of this document. -The first command that you will need is <code>install</code>, which installs the Genome Browser dependencies, -binary files and basic MySQL (MariaDB) infrastructure: +The first argument of the program is called <code>command</code> in the following section of this +document. The first command that you will need is <code>install</code>, which installs the Genome +Browser dependencies, binary files and basic MySQL (MariaDB) infrastructure: </p> <pre><code>sudo bash browserSetup.sh install</code></pre> <p> There are a number of options supported by the GBiC program. In all cases, options must be specified before the command. </p> <p> The following example correctly specifies the batch mode option to the program: </p> <pre><code>sudo bash browserSetup.sh -b install</code></pre> @@ -203,94 +209,96 @@ the remote on-the-fly loading, specify the option <code>-o</code> (offline) or <code>-f</code> (on-the-fly). If you are planning to keep sensitive data on your mirror, you will want to disable on-the-fly loading, like so: </p> <pre><code>sudo bash browserSetup.sh -o</code></pre> <p> The full assembly download for hg19 is >7TB. Limit this to 2TB or less with the <code>-t</code> option: </p> <pre><code>sudo bash browserSetup.sh -t noEncode mirror hg19</code></pre> <p> -For a full list of <code>-t</code> options, see the <a href='#all-gbic-options' title=''>All GBiC options</a> section or run the -program with no arguments. 
+For a full list of <code>-t</code> options, see the <a href='#all-gbic-options' title=''>All GBiC +options</a> section or run the program with no arguments. </p> <p> To update all CGIs and fully mirrored assemblies, call the tool with the <code>update</code> parameter like this: </p> <pre><code>sudo bash browserSetup.sh update</code></pre> <p> Minimal mirror sites (those that have partially mirrored an assembly) should not -use the <code>update</code> command, but rather just rerun the <code>minimal</code> command, so that only the minimal -tables are updated. For instance, if you have partially mirrored the hg19 and hg38 databases, -you may want to add this command to your crontab, perhaps running it every day, to keep your local -tables in sync with those at UCSC: +use the <code>update</code> command, but rather just rerun the <code>minimal</code> command, so that +only the minimal tables are updated. For instance, if you have partially mirrored the hg19 and hg38 +databases, you may want to add this command to your crontab, perhaps running it every day, to keep +your local tables in sync with those at UCSC: </p> <pre><code>sudo bash browserSetup.sh minimal hg19 hg38</code></pre> <p> To update only the Genome Browser software and not the data, use the <code>cgiUpdate</code> command: </p> <pre><code>sudo bash browserSetup.sh cgiUpdate</code></pre> <p> Software may break or not work correctly if the necessary data is not available. -Thus in most circumstances we recommend you use the <code>mirror</code>, <code>update</code>, or <code>minimal</code> commands instead -of <code>cgiUpdate</code>. +Thus in most circumstances, we recommend you use the <code>mirror</code>, <code>update</code>, or +<code>minimal</code> commands instead of <code>cgiUpdate</code>. </p> <p> You will also want to add a cleaning command to your crontab to remove the temporary files that are created during normal Genome Browser usage. 
These accumulate in <code>/usr/local/apache/trash</code> and can quickly consume significant space. A command like this should be added to your crontab file: </p> <pre><code>sudo bash browserSetup.sh clean</code></pre> <p> If you find that you need the Kent command line utilities in addition to the Genome Browser, the <code>addTools</code> command will install all the utilities into <code>/usr/local/bin</code>: </p> <pre><code>sudo bash browserSetup.sh addTools</code></pre> <p> A majority of these utilities require an <code>.hg.conf</code> file in the user's home directory. For an example of a minimal <code>.hg.conf</code> file, click -<a href='http://genome-source.soe.ucsc.edu/gitlist/kent.git/blob/master/src/product/minimal.hg.conf' title=''>here</a>. +<a href='http://genome-source.soe.ucsc.edu/gitlist/kent.git/blob/master/src/product/minimal.hg.conf' +title=''>here</a>. </p> <p> If you find a bug, or if your Linux distribution is not supported, please contact <a href='mailto:genome-mirror@soe.ucsc.edu' title=''>genome-mirror@soe.ucsc.edu</a>. </p> <p> More details about the Genome Browser installation are available -<a href='http://genome-source.soe.ucsc.edu/gitlist/kent.git/tree/master/src/product' title=''>here</a>. +<a href='http://genome-source.soe.ucsc.edu/gitlist/kent.git/tree/master/src/product' +title=''>here</a>. </p> <a name='all-gbic-options'></a> <h2>All GBiC options</h2> <p> Here is the full listing of commands and options supported by the GBiC program: </p> <pre><code>browserSetup.sh [options] [command] [assemblyList] - UCSC genome browser install script command is one of: install - install the genome browser on this machine. This is usually required before any other commands are run. minimal - download only a minimal set of tables. Missing tables are