src/hg/htdocs/goldenPath/help/gbic.html fbeaf51d42a4e9306db9a5761730af8189300a7e

fbeaf51d42a4e9306db9a5761730af8189300a7e
max
  Fri Nov 24 06:59:47 2023 -0800
chinafying all other video links, refs #32535

diff --git src/hg/htdocs/goldenPath/help/gbic.html src/hg/htdocs/goldenPath/help/gbic.html
index 2e8b122..423e4a4 100755
--- src/hg/htdocs/goldenPath/help/gbic.html
+++ src/hg/htdocs/goldenPath/help/gbic.html
@@ -1,387 +1,396 @@
 <!DOCTYPE HTML>
 <!-- DO NOT EDIT THE HTDOCS VERSION OF THIS FILE. THIS FILE IS AUTOMATICALLY
      GENERATED FROM A MARKDOWN FILE IN kent/src/product. MAKE ANY EDITS TO
      THIS PAGE THERE, RUN MAKE, AND FOLLOW THE INSTRUCTIONS TO EDIT THIS PAGE. -->
 <!--#set var="TITLE" value="GBIC" -->
 <!--#set var="ROOT" value="../.." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 
 
 <h1>Genome Browser in the Cloud User&#39;s Guide</h1>
 <h2>Contents</h2>
 <h6><a href='#what-is-genome-browser-in-the-cloud'>What is Genome Browser in the Cloud?</a></h6>
 <h6><a href='#quick-start-instructions'>Quick Start Instructions</a></h6>
 <h6><a href='#how-does-the-gbic-program-work'>How does the GBiC program work?</a></h6>
 <h6><a href='#gbic-commands'>GBiC Commands</a></h6>
 <h6><a href='#all-gbic-options'>All GBiC options</a></h6>
 <h6><a href='#credits'>Credits</a></h6>
 <a name='what-is-genome-browser-in-the-cloud'></a>
 <h2>What is Genome Browser in the Cloud?</h2>
 
 <p>
 The Genome Browser in the Cloud (GBiC) program is a convenient tool that automates the setup of a
 UCSC Genome Browser mirror. The GBiC program is for users who want to set up a full mirror of the
 UCSC Genome Browser on their server/cloud instance, rather than using
 <a href='gbib.html' title=''>Genome Browser in a Box</a> (GBIB)
 or our public website. Please see the
 <a href='mirror.html#considerations-before-installing-a-genome-browser' title=''>Installation of a UCSC Genome Browser on a local machine (mirror)</a>
 page for a summary of installation options, including the pros and cons of using a mirror
 installation via the GBiC program vs. using GBiB.
 </p>
 
 <p>
 The program works by setting up MySQL (MariaDB), Apache, and Ghostscript, and then copying the
 Genome Browser CGIs onto the machine under <code>/usr/local/apache/</code>. Because it also deactivates the
 default Apache htdocs/cgi folders, it is best run on a new machine, or at least a host that is not
 already used as a web server. The tool can also download full or partial assembly databases,
 update the Genome Browser CGIs, and remove temporary files (aka &ldquo;trash cleaning&rdquo;).
 </p>
 
 <p>
 The GBiC program has been tested with Ubuntu 18/20 LTS, Centos 7.2/8, and Fedora 20.
 </p>
 
 <p>
 It has also been tested on virtual machines in Amazon EC2 (Centos) and Microsoft Azure (Ubuntu).
 </p>
 
 <p>
 If you want to load data on the fly from UCSC, you need to select the
 data centers &ldquo;US West (N. California)&rdquo; (Amazon) or &ldquo;West US&rdquo; (Microsoft) for best performance.
 Other data centers (e.g. East Coast) will require a local copy of the genome assembly, which
 requires 2TB-7TB of storage for the hg19 assembly. Note that if you mirror multiple assemblies, this
 may in rare cases exceed 16TB, the current maximum size of a single Amazon EBS volume. In these
 cases, you may need to use RAID striping of multiple EBS volumes, or use Amazon EFS for the /gbdb
 files and Amazon Aurora for the MySQL tables. We have had one report from a user that the MySQL
 performance is better when served from Aurora instead of being served from the same EC2 instance;
 the cost may be higher in Aurora, depending on the type of usage.
 </p>
 
 <a name='quick-start-instructions'></a>
 <h2>Quick Start Instructions</h2>
 
 <p>
 Download the GBiC program from the <a href='https://genome-store.ucsc.edu/' title=''>UCSC Genome Browser store</a>.
 </p>
 
 <p>
 Run the program as root, like this:
 </p>
 
 <pre><code>sudo bash browserSetup.sh install</code></pre>
 
 <p>
 The <code>install</code> command downloads and configures Apache, MySQL (MariaDB) and Ghostscript, copies the
 Genome Browser CGIs, and configures the mirror to load data remotely from UCSC. The <code>install</code>
 command must be run before any other command is used.
 </p>
 
 <p>
 For mirror-specific help, please contact the Mirror Forum as listed on our <a href='https://genome.ucsc.edu/contacts.html' title=''>contact page</a>.
 </p>
 
 <p>
 For an installation demonstration, see the <a href='https://www.youtube.com/watch?v=dcJERBVnjio' title=''>Genome Browser in the Cloud (GBiC) Introduction</a>
-video:
-</p>
-
-<p>
-
-<iframe width="560" height="315" src="https://www.youtube.com/embed/dcJERBVnjio?rel=0" frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" allowfullscreen>
+video:<br>
 
-</iframe>
+<!--#if expr="${SERVER_NAME} = /-china/" -->
+    <a href='../../../videos/dcJERBVnjio.mp4'>
+    <img src=../../images/videoIcon.png></a>
+    &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
+    Visit our <a href="/videos/" target="_blank">Video Page</a>.
+<!--#else -->
+<iframe width="560" height="350" src="https://www.youtube.com/embed/dcJERBVnjio?rel=0" 
+frameborder="0" allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture" 
+allowfullscreen></iframe> </p>
 
+<p> Visit our <a href="https://www.youtube.com/channel/UCQnUJepyNOw0p8s2otX4RYQ/videos"
+target="_blank">YouTube channel</a> for more videos.
 </p>
+<!--#endif -->
+</p>
+
+<p>
 
 <a name='how-does-the-gbic-program-work'></a>
 <h2>How does the GBiC program work?</h2>
 
 <p>
 The GBiC program downloads the Genome Browser CGIs and sets up the central MySQL (MariaDB) database.
 All potentially destructive steps require confirmation by the user (unless the <code>-b</code> batch mode
 option is specified).
 </p>
 
 <p>
 In particular, MySQL (MariaDB) and Apache are installed and set up with the right package manager
 (yum or apt-get). A default random password is set for the MySQL (MariaDB) root user and added to
 the <code>~/.my.cnf</code> file of the Unix root account. If you have already set up MySQL (MariaDB), you must
 create the <code>~/.my.cnf</code> file. The program will detect this and create a template file for you. The
 program also performs some minor tasks such as placing symlinks, detecting MariaDB, deactivating
 SELinux, finding the correct path for your Apache install and adapting the MySQL (MariaDB) socket
 config.
 </p>
 
 <p>
 This will result in a Genome Browser accessible on localhost that loads its data through
 genome-mysql.soe.ucsc.edu:3306 and hgdownload.soe.ucsc.edu:80. If your geographic location is not on
 the US West Coast, the performance will be too slow for normal use, though sufficient to test that
 the setup is functional. A special MySQL (MariaDB) server is set up in Germany for users in Europe.
 You can change the <code>/usr/local/apache/cgi-bin/hg.conf</code> genome-mysql.soe.ucsc.edu lines to
 genome-euro-mysql.soe.ucsc.edu in order to get better performance. You can then use the program to
 download assemblies of interest to your local Genome Browser, which will result in performance at
 least as fast as the UCSC site.
 </p>
 
 <h3 id="network-requirements">Network requirements</h3>
 
 <p>
 Your network firewall must allow outgoing connections to the following servers and ports:
 </p>
 
 <ul>
 <li>MySQL (MariaDB) connections, used to load tracks not local to your computer:
 
 <ul>
 <li>US server: Port 3306 on genome-mysql.soe.ucsc.edu (128.114.119.174)</li>
 <li>European server: Port 3306 on genome-euro-mysql.soe.ucsc.edu (129.70.40.120)</li>
 </ul></li>
 <li>Rsync, used to download track data:
 
 <ul>
 <li>US server: TCP port 873 on hgdownload.soe.ucsc.edu (128.114.119.163)</li>
 <li>European server: TCP port 873 on hgdownload-euro.soe.ucsc.edu (129.70.40.99)</li>
 </ul></li>
 <li>Download HTML descriptions on the fly:
 
 <ul>
 <li>US server: TCP port 80 on hgdownload.soe.ucsc.edu (128.114.119.163)</li>
 <li>European server: TCP port 80 on hgdownload-euro.soe.ucsc.edu (129.70.40.99)</li>
 </ul></li>
 </ul>
 
 <h3 id="root-file-system-too-small-for-all-data">Root file system too small for all data</h3>
 
 <p>
 If you need to move data to another partition because the root file system is too small for all of
 the assembly&#39;s data, the following steps will help complete the installation. First, do a
 minimal installation with the browserSetup.sh script as described below, using just the &ldquo;install&rdquo;
 argument. Then make symlinks to the directory that will contain the data, e.g. if your
 biggest filesystem is called &ldquo;/big&rdquo;:
 </p>
 
 <pre><code>sudo mv /var/lib/mysql /big/
 sudo mv /gbdb /big/
 sudo ln -s /big/mysql /var/lib/mysql
 sudo ln -s /big/gbdb /gbdb</code></pre>
 
 <p>
 Then use the &ldquo;mirror&rdquo; or &ldquo;minimal&rdquo; arguments to browserSetup.sh to rsync over the majority of
 the data.
 </p>
 
 <a name='gbic-commands'></a>
 <h2>GBiC Commands</h2>
 
 <p>
 The first argument of the program is called <code>command</code> in the following section of this document.
 The first command that you will need is <code>install</code>, which installs the Genome Browser dependencies,
 binary files and basic MySQL (MariaDB) infrastructure:
 </p>
 
 <pre><code>sudo bash browserSetup.sh install</code></pre>
 
 <p>
 There are a number of options supported by the GBiC program. In all cases, options must
 be specified before the command.
 </p>
 
 <p>
 The following example correctly specifies the batch mode option to the program:
 </p>
 
 <pre><code>sudo bash browserSetup.sh -b install</code></pre>
 
 <p>
 To improve the performance of your Genome Browser, the program accepts the command
 <code>minimal</code>. It will download the minimal tables required for reasonable
 performance from places in the US and possibly others, e.g., from
 Japan. Call it like this to trade space for performance and download a few
 of the most used MariaDB tables for hg38:
 </p>
 
 <pre><code>sudo bash browserSetup.sh minimal hg38</code></pre>
 
 <p>
 If the Genome Browser is still too slow, you will have to mirror all tables of a
 genome assembly. By default, rsync is used for the download. Alternatively you can use
 UDR, a UDP-based fast transfer protocol (option: <code>-u</code>).
 </p>
 
 <pre><code>sudo bash browserSetup.sh -u mirror hg38</code></pre>
 
 <p>
 A successful run of <code>mirror</code> will also cut the connection to UCSC: no tables
 or files are downloaded on-the-fly anymore from the UCSC servers. To change
 the remote on-the-fly loading, specify the option <code>-o</code> (offline) or
 <code>-f</code> (on-the-fly). If you are planning to keep sensitive data on your mirror,
 you will want to disable on-the-fly loading, like so:
 </p>
 
 <pre><code>sudo bash browserSetup.sh -o</code></pre>
 
 <p>
 If you notice that BLAT no longer works for your assembly while in offline-mode, you may have to
 update your GBiC mirror to restore functionality. Patch sequences are often added to assemblies
 which may result in errors if your CGIs and 2bit files have not been updated to the latest version.
 </p>
 
 <p>
 The full assembly download for hg19 is &gt;7TB. Limit this
 to 2TB or less with the <code>-t</code> option:
 </p>
 
 <pre><code>sudo bash browserSetup.sh -t noEncode mirror hg19</code></pre>
 
 <p>
 For a full list of <code>-t</code> options, see the <a href='#all-gbic-options' title=''>All GBiC options</a> section or run the
 program with no arguments.
 </p>
 
 <p>
 To update all CGIs and fully mirrored assemblies, call the
 tool with the <code>update</code> parameter like this:
 </p>
 
 <pre><code>sudo bash browserSetup.sh update</code></pre>
 
 <p>
 Minimal mirror sites (those that have partially mirrored an assembly) should not
 use the <code>update</code> command, but rather just rerun the <code>minimal</code> command, so that only the minimal
 tables are updated. For instance, if you have partially mirrored the hg19 and hg38 databases,
 you may want to add this command to your crontab, perhaps running it every day, to keep your local
 tables in sync with those at UCSC:
 </p>
 
 <pre><code>sudo bash browserSetup.sh minimal hg19 hg38</code></pre>
 
 <p>
 To update only the Genome Browser software and not the data, use the
 <code>cgiUpdate</code> command:
 </p>
 
 <pre><code>sudo bash browserSetup.sh cgiUpdate</code></pre>
 
 <p>
 Software may break or not work correctly if the necessary data is not available.
 Thus in most circumstances, we recommend you use the <code>mirror</code>, <code>update</code>, or <code>minimal</code> commands
 instead of <code>cgiUpdate</code>.
 </p>
 
 <p>
 You will also want to add a cleaning command to your crontab to remove
 the temporary files that are created during normal Genome Browser usage. These accumulate
 in <code>/usr/local/apache/trash</code> and can quickly consume significant space. A command like
 this should be added to your crontab file:
 </p>
 
 <pre><code>sudo bash browserSetup.sh clean</code></pre>
 
 <p>
 If you find that you need the Kent command line utilities in addition to the Genome Browser, the
 <code>addTools</code> command will install all the utilities into <code>/usr/local/bin</code>:
 </p>
 
 <pre><code>sudo bash browserSetup.sh addTools</code></pre>
 
 <p>
 A majority of these utilities require an <code>.hg.conf</code> file in the users home directory. For
 an example of a minimal <code>.hg.conf</code> file, click
 <a href='http://genome-source.soe.ucsc.edu/gitlist/kent.git/blob/master/src/product/minimal.hg.conf' title=''>here</a>.
 </p>
 
 <p>
 If you find a bug, or if your Linux distribution is not supported, please contact
 <a href='mailto:&#58&#103&#101&#110&#111&#109&#101&#45&#109&#105&#114&#114&#111&#114&#64&#115&#111&#101&#46&#117&#99&#115&#99&#46&#101&#100&#117' title=''>&#103&#101&#110&#111&#109&#101&#45&#109&#105&#114&#114&#111&#114&#64&#115&#111&#101&#46&#117&#99&#115&#99&#46&#101&#100&#117</a>.
 </p>
 
 <p>
 More details about the Genome Browser installation are available
 <a href='http://genome-source.soe.ucsc.edu/gitlist/kent.git/tree/master/src/product' title=''>here</a>.
 </p>
 
 <a name='all-gbic-options'></a>
 <h2>All GBiC options</h2>
 
 <p>
 Here is the full listing of commands and options supported by the GBiC program:
 </p>
 
 <pre><code>browserSetup.sh [options] [command] [assemblyList] - UCSC genome browser install script
 
 command is one of:
   install    - install the genome browser on this machine. This is usually 
                required before any other commands are run.
   minimal    - download only a minimal set of tables. Missing tables are
                downloaded on-the-fly from UCSC.
   mirror     - download a full assembly (also see the -t option below).
                No data is downloaded on-the-fly from UCSC.
   update     - update the genome browser software and data, updates
                all tables of an assembly, like &quot;mirror&quot;
   cgiUpdate  - update only the genome browser software, not the data. Not 
                recommended, see documentation.
   clean      - remove temporary files of the genome browser older than one 
                day, but do not delete any uploaded custom tracks
   addTools   - copy the UCSC User Tools, e.g. blat, featureBits, overlapSelect,
                bedToBigBed, pslCDnaFilter, twoBitToFa, gff3ToGenePred, 
                bedSort, ... to /usr/local/bin
 
 parameters for &#39;minimal&#39;, &#39;mirror&#39; and &#39;update&#39;:
   &lt;assemblyList&gt;     - download MySQL + /gbdb files for a space-separated
                        list of genomes
 
 examples:
   bash browserSetup.sh install     - install Genome Browser, do not download any genome
                         assembly, switch to on-the-fly mode (see the -f option)
   bash browserSetup.sh minimal hg19 - download only the minimal tables for the hg19 assembly
   bash browserSetup.sh mirror hg19 mm9 - download hg19 and mm9, switch
                         to offline mode (see the -o option)
   bash browserSetup.sh mirror -t noEncode hg19  - install Genome Browser, download hg19 
                         but no ENCODE tables and switch to offline mode 
                         (see the -o option)
   bash browserSetup.sh update hg19 -  update all data and all tables of the hg19 assembly
                          (in total 7TB)
   bash browserSetup.sh cgiUpdate   - update the Genome Browser CGI programs
   bash browserSetup.sh clean       -  remove temporary files older than one day
 
 All options have to precede the command.
 
 options:
   -a   - use alternative download server at SDSC
   -b   - batch mode, do not prompt for key presses
   -t   - only download track selection, requires a value.
          This option is only useful for Human/Mouse assemblies.
          Download only certain tracks, possible values:
          noEncode = do not download any tables with the wgEncode prefix, 
                     except Gencode genes, saves 4TB/7TB for hg19
          bestEncode = our ENCODE recommendation, all summary tracks, saves
                     2TB/7TB for hg19
          main = only RefSeq/Gencode genes and common SNPs, total 5GB for hg19
   -u   - use UDR (fast UDP) file transfers for the download.
          Requires at least one open UDP incoming port 9000-9100.
          (UDR is not available for Mac OSX)
          This option will download a udr binary to /usr/local/bin
   -o   - switch to offline-mode. Remove all statements from hg.conf that allow
          loading data on-the-fly from the UCSC download server. Requires that
          you have downloaded at least one assembly, using the &#39;&quot;download&quot;&#39; 
          command, not the &#39;&quot;mirror&quot;&#39; command.
   -f   - switch to on-the-fly mode. Change hg.conf to allow loading data
          through the internet, if it is not available locally. The default mode
          unless an assembly has been provided during install
   -h   - this help message</code></pre>
 
 <a name='credits'></a>
 <h2>Credits</h2>
 
 <ul>
 <li>Max Haeussler for writing the program.</li>
 <li>Christopher Lee for testing and QA.</li>
 <li>Daniel Vera (bio.fsu.edu) for his RHEL install notes.</li>
 <li>Bruce O&#39;Neill, Malcolm Cook for feedback.</li>
 </ul>
 
 
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->