be4311c07e14feb728abc6425ee606ffaa611a58 markd Fri Jan 22 06:46:58 2021 -0800 merge with master diff --git src/hg/htdocs/goldenPath/help/gbic.html src/hg/htdocs/goldenPath/help/gbic.html index 931fc7c..d7c83cd 100755 --- src/hg/htdocs/goldenPath/help/gbic.html +++ src/hg/htdocs/goldenPath/help/gbic.html @@ -3,170 +3,179 @@ GENERATED FROM A MARKDOWN FILE IN kent/src/product. MAKE ANY EDITS TO THIS PAGE THERE, RUN MAKE, AND FOLLOW THE INSTRUCTIONS TO EDIT THIS PAGE. --> <!--#set var="TITLE" value="GBIC" --> <!--#set var="ROOT" value="../.." --> <!-- Relative paths to support mirror sites with non-standard GB docs install --> <!--#include virtual="$ROOT/inc/gbPageStart.html" --> <h1>Genome Browser in the Cloud User's Guide</h1> <h2>Contents</h2> <h6><a href='#what-is-genome-browser-in-the-cloud'>What is Genome Browser in the Cloud?</a></h6> <h6><a href='#quick-start-instructions'>Quick Start Instructions</a></h6> <h6><a href='#how-does-the-gbic-program-work'>How does the GBiC program work?</a></h6> -<h6><a href='#gbic-commands'>GBiC commands</a></h6> +<h6><a href='#gbic-commands'>GBiC Commands</a></h6> <h6><a href='#all-gbic-options'>All GBiC options</a></h6> <h6><a href='#credits'>Credits</a></h6> <a name='what-is-genome-browser-in-the-cloud'></a> <h2>What is Genome Browser in the Cloud?</h2> <p> The Genome Browser in the Cloud (GBiC) program is a convenient tool that automates the setup of a UCSC Genome Browser mirror. The GBiC program is for users who want to set up a full mirror of the UCSC Genome Browser on their server/cloud instance, rather than using <a href='gbib.html' title=''>Genome Browser in a Box</a> (GBIB) or our public website. Please see the <a href='mirror.html#considerations-before-installing-a-genome-browser' title=''>Installation of a UCSC Genome Browser on a local machine (mirror)</a> -page for a summary of installation options, including the pros and cons of using a mirror installation -via the GBiC program vs. using GBiB. +page for a summary of installation options, including the pros and cons of using a mirror +installation via the GBiC program vs. using GBiB. </p> <p> -The program works by setting up MySQL (MariaDB), Apache, and Ghostscript, and then copying the Genome -Browser CGIs onto the machine under <code>/usr/local/apache/</code>. Because it also deactivates the default -Apache htdocs/cgi folders, it is best run on a new machine, or at least a host that is not +The program works by setting up MySQL (MariaDB), Apache, and Ghostscript, and then copying the +Genome Browser CGIs onto the machine under <code>/usr/local/apache/</code>. Because it also deactivates the +default Apache htdocs/cgi folders, it is best run on a new machine, or at least a host that is not already used as a web server. The tool can also download full or partial assembly databases, -update the Genome Browser CGIs, and remove temporary files (aka "trash cleaning"). +update the Genome Browser CGIs, and remove temporary files (aka “trash cleaning”). </p> <p> The GBiC program has been tested with Ubuntu 14/16 LTS, Centos 6/6.7/7.2, and Fedora 20. </p> <p> It has also been tested on virtual machines in Amazon EC2 (Centos 6 and Ubuntu 14) and Microsoft Azure (Ubuntu). If you want to load data on the fly from UCSC, you need to select the -data centers "US West (N. California)" (Amazon) or "West US" (Microsoft) for best performance. +data centers “US West (N. California)” (Amazon) or “West US” (Microsoft) for best performance. Other data centers (e.g. East Coast) will require a local copy of the genome assembly, which requires 2TB-7TB of storage for the hg19 assembly. Note that this exceeds the current maximum size of a single Amazon EBS volume. </p> <a name='quick-start-instructions'></a> <h2>Quick Start Instructions</h2> <p> Download the GBiC program from the <a href='https://genome-store.ucsc.edu/' title=''>UCSC Genome Browser store</a>. </p> <p> Run the program as root, like this: </p> <pre><code>sudo bash browserSetup.sh install</code></pre> <p> -The <code>install</code> command downloads and configures Apache, MySQL (MariaDB) and Ghostscript, copies the Genome Browser -CGIs, and configures the mirror to load data remotely from UCSC. The <code>install</code> command must be -run before any other command is used. +The <code>install</code> command downloads and configures Apache, MySQL (MariaDB) and Ghostscript, copies the +Genome Browser CGIs, and configures the mirror to load data remotely from UCSC. The <code>install</code> +command must be run before any other command is used. </p> <p> -For mirror-specific help, please contact the Mirror Forum as listed on our <a href='../../contacts.html' title=''>contact page</a>. +For mirror-specific help, please contact the Mirror Forum as listed on our <a href='https://genome.ucsc.edu/contacts.html' title=''>contact page</a>. </p> <p> For an installation demonstration, see the <a href='https://www.youtube.com/watch?v=dcJERBVnjio' title=''>Genome Browser in the Cloud (GBiC) Introduction</a> video: </p> + <p> -<iframe width="560" height="315" src="https://www.youtube.com/embed/dcJERBVnjio?rel=0" frameborder="0" -allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe> + +<iframe width="560" height="315" src="https://www.youtube.com/embed/dcJERBVnjio?rel=0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen> + +</iframe> + </p> <a name='how-does-the-gbic-program-work'></a> <h2>How does the GBiC program work?</h2> <p> -The GBiC program downloads the Genome Browser CGIs and sets up the central MySQL (MariaDB) database. All -potentially destructive steps require confirmation by the user (unless the <code>-b</code> -batch mode option is specified). +The GBiC program downloads the Genome Browser CGIs and sets up the central MySQL (MariaDB) database. +All potentially destructive steps require confirmation by the user (unless the <code>-b</code> batch mode +option is specified). </p> <p> -In particular, MySQL (MariaDB) and Apache are installed and set up with the right package -manager (yum or apt-get). A default random password is set for the -MySQL (MariaDB) root user and added to the <code>~/.my.cnf</code> file of the Unix root account. -If you have already set up MySQL (MariaDB), you must create the -<code>~/.my.cnf</code> file. The program will detect this and create a template file for you. -The program also performs some minor tasks such as placing symlinks, detecting -MariaDB, deactivating SELinux, finding the correct path for your Apache install -and adapting the MySQL (MariaDB) socket config. +In particular, MySQL (MariaDB) and Apache are installed and set up with the right package manager +(yum or apt-get). A default random password is set for the MySQL (MariaDB) root user and added to +the <code>~/.my.cnf</code> file of the Unix root account. If you have already set up MySQL (MariaDB), you must +create the <code>~/.my.cnf</code> file. The program will detect this and create a template file for you. The +program also performs some minor tasks such as placing symlinks, detecting MariaDB, deactivating +SELinux, finding the correct path for your Apache install and adapting the MySQL (MariaDB) socket +config. </p> <p> -This will result in a Genome Browser accessible on localhost that loads its data -through genome-mysql.soe.ucsc.edu:3306 and hgdownload.soe.ucsc.edu:80. If -your geographic location is not on the US West Coast, the performance will be too slow for normal -use, though sufficient to test that the setup is functional. A special MySQL (MariaDB) server is -set up in Germany for users in Europe. You can change the <code>/usr/local/apache/cgi-bin/hg.conf</code> -genome-mysql.soe.ucsc.edu lines to genome-euro-mysql.soe.ucsc.edu in order to get better -performance. You can then use the program to download -assemblies of interest to your local Genome Browser, which will result in performance at least -as fast as the UCSC site. +This will result in a Genome Browser accessible on localhost that loads its data through +genome-mysql.soe.ucsc.edu:3306 and hgdownload.soe.ucsc.edu:80. If your geographic location is not on +the US West Coast, the performance will be too slow for normal use, though sufficient to test that +the setup is functional. A special MySQL (MariaDB) server is set up in Germany for users in Europe. +You can change the <code>/usr/local/apache/cgi-bin/hg.conf</code> genome-mysql.soe.ucsc.edu lines to +genome-euro-mysql.soe.ucsc.edu in order to get better performance. You can then use the program to +download assemblies of interest to your local Genome Browser, which will result in performance at +least as fast as the UCSC site. </p> -<h3>Network requirements</h3> +<h3 id="network-requirements">Network requirements</h3> + <p> Your network firewall must allow outgoing connections to the following servers and ports: +</p> + <ul> <li>MySQL (MariaDB) connections, used to load tracks not local to your computer: + <ul> <li>US server: Port 3306 on genome-mysql.soe.ucsc.edu (128.114.119.174)</li> <li>European server: Port 3306 on genome-euro-mysql.soe.ucsc.edu (129.70.40.120)</li> </ul></li> <li>Rsync, used to download track data: + <ul> <li>US server: TCP port 873 on hgdownload.soe.ucsc.edu (128.114.119.163)</li> <li>European server: TCP port 873 on hgdownload-euro.soe.ucsc.edu (129.70.40.99)</li> </ul></li> <li>Download HTML descriptions on the fly: + <ul> <li>US server: TCP port 80 on hgdownload.soe.ucsc.edu (128.114.119.163)</li> <li>European server: TCP port 80 on hgdownload-euro.soe.ucsc.edu (129.70.40.99)</li> </ul></li> - </ul></p> - -<a name="partition"></a> -<h3>Root file system too small for all data</h3> -<p> -If you need to move data to another partition because the root file system is too small for all -of the assembly's data, the following steps will help complete the installation. First, do a minimal -installation with the browserSetup.sh script as described below, using just the "install" -argument. Then make symlinks to the directory that will contain the data, e.g. if your biggest -filesystem is called "/big":</p> -<pre> -sudo mv /var/lib/mysql /big/ +</ul> + +<h3 id="root-file-system-too-small-for-all-data">Root file system too small for all data</h3> + +<p> +If you need to move data to another partition because the root file system is too small for all of +the assembly's data, the following steps will help complete the installation. First, do a +minimal installation with the browserSetup.sh script as described below, using just the “install” +argument. Then make symlinks to the directory that will contain the data, e.g. if your +biggest filesystem is called “/big”: +</p> + +<pre><code>sudo mv /var/lib/mysql /big/ sudo mv /gbdb /big/ sudo ln -s /big/mysql /var/lib/mysql -sudo ln -s /big/gbdb /gbdb -</pre> +sudo ln -s /big/gbdb /gbdb</code></pre> + <p> -Then use the "mirror" or "minimal" arguments to browserSetup.sh to rsync over -the majority of the data. +Then use the “mirror” or “minimal” arguments to browserSetup.sh to rsync over the majority of +the data. </p> <a name='gbic-commands'></a> <h2>GBiC Commands</h2> <p> The first argument of the program is called <code>command</code> in the following section of this document. The first command that you will need is <code>install</code>, which installs the Genome Browser dependencies, binary files and basic MySQL (MariaDB) infrastructure: </p> <pre><code>sudo bash browserSetup.sh install</code></pre> <p> There are a number of options supported by the GBiC program. In all cases, options must @@ -233,32 +242,32 @@ you may want to add this command to your crontab, perhaps running it every day, to keep your local tables in sync with those at UCSC: </p> <pre><code>sudo bash browserSetup.sh minimal hg19 hg38</code></pre> <p> To update only the Genome Browser software and not the data, use the <code>cgiUpdate</code> command: </p> <pre><code>sudo bash browserSetup.sh cgiUpdate</code></pre> <p> Software may break or not work correctly if the necessary data is not available. -Thus in most circumstances we recommend you use the <code>mirror</code>, <code>update</code>, or <code>minimal</code> commands instead -of <code>cgiUpdate</code>. +Thus in most circumstances, we recommend you use the <code>mirror</code>, <code>update</code>, or <code>minimal</code> commands +instead of <code>cgiUpdate</code>. </p> <p> You will also want to add a cleaning command to your crontab to remove the temporary files that are created during normal Genome Browser usage. These accumulate in <code>/usr/local/apache/trash</code> and can quickly consume significant space. A command like this should be added to your crontab file: </p> <pre><code>sudo bash browserSetup.sh clean</code></pre> <p> If you find that you need the Kent command line utilities in addition to the Genome Browser, the <code>addTools</code> command will install all the utilities into <code>/usr/local/bin</code>: </p>