0f7434df22a1a4481eb046a635ae26bcee370c41 max Wed Jan 21 01:40:23 2026 -0800
small update to mirror docs page, no redmine

diff --git src/product/mirrorManual.txt src/product/mirrorManual.txt
index c0943166c9f..4ab7bdcbd18 100644
--- src/product/mirrorManual.txt
+++ src/product/mirrorManual.txt
@@ -1,19 +1,26 @@
 % Manual installation of the UCSC Genome Browser on a Unix server

 [comment]: <> (QA: When you are done editing this file, cd into mirrorDocs, run 'make' there and follow the instructions)

+We do not recommend following the procedure below anymore, as we provide a docker image and
+the "GBIC" installation bash script. They can setup a genome browser with
+a single linux command line and within a few minutes. This page exists as a
+reference, explains how the installer is structured and helps changing the
+configuration files once your genome browser is running. If you run into
+trouble with any of these steps, do not hesitate to contact us so we can improve this page.
+
 # Overview of the Genome Browser directories and databases

 The genome browser requires only Apache and MariaDB and uses these directories:

 - static html files: we typically keep them under /usr/local/apache/htdocs and configure Apache to load them from there, to avoid conflicts with the distribution of the Linux default location /var/www/html
 - MariaDB databases: most of them are read-only, except the `hgcentral` database which is read-write. Most linux distributions keep these under /var/lib/mysql. (It is possible to get the genome browser to work with MySQL after version 8, but we do highly discourage it, as our download procedures use MyISAM .frm files which MySQL 8 dropped.)
 - static genome data files in /gbdb/
 - binary CGI programs that generate images from the MariaDB and /gbdb files and write them into the `trash` directory (see below). We modify our Apache
@@ -36,31 +43,33 @@
 installed genome assemblies and the current user session from the MariaDB database hgcentral.
 For each genome assembly, there is a separate MariaDB database (e.g. hg38). Some types of data (e.g. raw genome sequences) are kept as indexed binary files outside of MariaDB, they are located in /gbdb, e.g. /gbdb/hg38. The location of the /gbdb directory can be changed with a setting in hg.conf. Some types of data are not specific for a genome, these are kept in the MariaDB databases hgFixed, proteome and visiGene.

 We strongly recommend to follow the default locations, and to place our CGI programs in `/usr/local/apache/cgi-bin`. The htdocs root directory for html files should then be in /usr/local/apache/htdocs.

 All Genome Browser components called from Apache get their settings from the central configuration file `/usr/local/apache/cgi-bin/hg.conf`. Among others, the location and the username/password for the MariaDB server is specified there.

-To load data into the genome browser databases, you need a command line tool
+Today, most tracks are loaded from bigBed or bigWig files, and are referenced from the
+track configuration (trackDb) via a "bigDataUrl" statement. For legacy MariaDB tracks,
+to load data into the genome browser databases, you need a command line tool
 like hgLoadBed. These tools are distributed separately from the CGI programs. Some tools create only MariaDB tables, others write into a /gbdb subdirectory. Most of them require a configuration file ~/.hg.conf in your home directory with the MariaDB connection information, like server name, username and password. The data loading is done from the Unix command line and not dependent on the CGI programs that create the Genome Browser graphics.
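
For orientation, the ~/.hg.conf file mentioned in the paragraph above could be created along these lines. This is a minimal sketch and not part of the commit: the `db.host`, `db.user` and `db.password` keys follow the connection settings used in hg.conf, while the `HG_CONF_PATH` variable, the `./hg.conf.example` scratch path and the credential values are placeholders made up for illustration.

```shell
#!/bin/sh
# Sketch only: writes an example config to a scratch path, not to ~/.hg.conf.
# HG_CONF_PATH and the credential values below are hypothetical placeholders.
conf="${HG_CONF_PATH:-./hg.conf.example}"

cat > "$conf" <<'EOF'
# MariaDB connection used by the command line loaders
db.host=localhost
db.user=browser
db.password=CHANGE_ME
EOF

# The file holds a password, so restrict it to the owner.
chmod 600 "$conf"
echo "wrote $conf"
```

After reviewing the values, the file would be copied to ~/.hg.conf. Keeping it readable only by its owner is a sensible default because it stores a database password.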
 # Software Requirements

 To run our provided binaries:

 * Linux/Ubuntu/CentOS/Unix/MacOSX operating system
 * Apache2.x - http web server -
 * MariaDB development system and libraries -
 (MySQL 8 removed support for MyISAM schema files, which makes downloading
@@ -74,79 +83,77 @@
 * gnu gcc - C code development system -
 * gnu make -

 Optional:

 * 'ghostscript' ps to pdf converter -
 * 'git' source code management:
 * 'gmt' map plotting tools
 * 'pstack' for stack traces
 * 'R' for the GTex track
 * 'python-mysqldb' for the gene interactions track (python2)

 It is best to install these packages with your standard operating system package management tools:

-* Debian/Ubuntu: `apt-get install gcc make ghostscript apache2 mariadb-server gmt r-base uuid-dev libcurl4-openssl-dev libbz2-dev g++`
+* Debian/Ubuntu: `apt-get install gcc make ghostscript apache2 mariadb-server gmt r-base uuid-dev libcurl4-openssl-dev libbz2-dev g++ libmariadb-dev libmariadb-dev`
 * Redhat/Fedora/CentOS: `yum install gcc libpng12 httpd ghostscript GMT hdf5 R libuuid-devel libcurl-devel bzip2-devel gcc-c++`
-
-On newer distributions, python-mysqldb / MySQL-python is not available anymore.
-In this case, install python2, pip for it and then use pip to install the mysql
-library ("pip2 install MySQL-python"). See the file
-installer/browserSetup.sh for the commands.
+The gcc, *-dev or *-devel packages are only needed if you want to build binaries
+yourself, but will not take up more than a few hundred megabytes and give you more
+options in case you want to modify the C code later.

 # Hardware and disk space requirements

 We currently use the following hardware to support our website:

 * 24 CPUs and 128Gb of memory for each of the six machines
 * 16 CPUs, 64 Gb of memory for the MariaDB server

 The UCSC Genome Browser website experiences over one million hits per day. Your hardware requirements may be much less demanding and will depend upon how much traffic you expect for your mirror.
 Annotation database size differs a lot between the assemblies: The full size
-of the hg19 database in 2016 is 6 TB, for ce2 it is 5GB. It also depends on
+of the hg38 database in 2026 is 8 TB, for ce2 it is 5GB. It also depends on
 the tracks: The size of the hg19 annotations can be reduced to 2TB if you do not download any ENCODE tracks. The size of only the main gene and SNP annotations is around 5GB for hg19 and hg38.

 You can use the following command to get the size of the files for all of the assemblies, but it can also be modified to give the size for a particular assembly:

     rsync -hna --stats rsync://hgdownload.soe.ucsc.edu/gbdb/ | egrep "Number of files:|total size is"

 For example, to get the size of all of the files for hg19, you would use the following command:

     rsync -hna --stats rsync://hgdownload.soe.ucsc.edu/gbdb/hg19/ | egrep "Number of files:|total size is"

 After running that command, you should see output like this:

     Number of files: 54886
     total size is 6515.70G speedup is 5181080.38 (DRY RUN)

 The next command will give you the size of the entire mySQL/MariaDB database, but can be changed to get the size for a particular assembly:

     rsync -hna --stats rsync://hgdownload.soe.ucsc.edu/mysql/ | egrep "Number of files:|total size is"

 # Installing the UCSC Genome browser

-**Note:** We offer Genome-Browser-in-the-Cloud (GBIC), an
+**Note:** We offer Genome-Browser-in-the-Cloud (GBIC), a
 shell script that installs a genome browser in most main Linux distributions (Most Debian and Redhat-based ones, like Ubuntu and CentOS). GBIC is also available as a [dockerfile](/goldenPath/help/docker.html). See our [mirror page](https://genome.ucsc.edu/goldenPath/help/mirror.html) for more general information.

 Scripts to perform all of the functions below can be found in the directory . In a git clone of the kent repository, the scripts are located in src/product/scripts.

 Confirm the following:

 1.
Apache web server is installed and working, http://localhost/ provides the Apache default home page from your machine NOTE: The browser static html web pages require the Apache