62b1428915cc2c8b9acbf46a86c1248833257efe mspeir Mon Nov 17 18:50:32 2025 -0800
changing rest of genome-source references to github, refs #34485

diff --git src/product/scripts/README src/product/scripts/README
index f46028f0c0c..4dacd3eb9c0 100644
--- src/product/scripts/README
+++ src/product/scripts/README
@@ -1,150 +1,150 @@
#
# This file can be viewed at the following URL:
-# http://genome-source.soe.ucsc.edu/gitlist/kent.git/raw/master/src/product/scripts/README
+# http://github.com/ucscGenomeBrowser/kent/raw/master/src/product/scripts/README
#

Included here are some scripts to help with the job of constructing
and maintaining a mirror site for the UCSC Genome Browser. Copy the
contents of this directory to a work directory of your choice outside
of the kent source tree so your scripts will remain stable despite
source tree updates.

These scripts are helpful and useful, but they are not the entire
story. You need to understand what they are doing to use them
appropriately. They expect that commands such as rsync, git, and
mysql are installed on your system.

Instructions for the installation of MySQL and the Apache web server
are not included here. Those products have excellent How-To resources
on the internet.
MySQL: http://dev.mysql.com/doc/refman/5.1/en/index.html
Apache: http://httpd.apache.org/

You must copy and edit the specification file browserEnvironment.txt
to determine where the various objects are going to exist on your
system. The examples in that file are sending files to /scratch/tmp/
which is probably not very useful. Please copy this directory to a
directory of your choice outside of the source tree and edit the
browserEnvironment.txt file. The pathname to that file will be used
with these scripts so they understand how to function in your
environment.

What is the difference between "full" and "minimal" in the scripts?
A "minimal" set of files for a given UCSC genome database is
sufficient to get a genome browser up and running with no extra
tracks. This genome browser will have no annotations. The "full" set
of files for a given genome browser is all the files required to
operate that genome browser with all annotation tracks available.
The sizes of the larger databases are becoming more than 1 TB.

The update race condition
==========================================

This problem can be avoided if you rsync the MySQL binary data files
directly into your MySQL database. There isn't a script provided here
to do this since it is just a periodic rsync. A simple shell script
of several lines is enough. For example, to fetch a minimal set of
tables for the hg19 browser directly into the MySQL database:

for TBL in cytoBand cytoBandIdeo chromInfo gold gap grp trackDb hgFindSpec
do
  for EXT in frm MYD MYI
  do
    rsync -a -P rsync://hgdownload.soe.ucsc.edu/mysql/hg19/${TBL}.${EXT} \
        /var/mysql/hg19/
  done
done

A database such as hg18 with individual tables for each chromosome
would have an extra rsync to help pick up those tables:

for TBL in cytoBand cytoBandIdeo chromInfo gold gap grp trackDb hgFindSpec
do
  for EXT in frm MYD MYI
  do
    rsync -a -P rsync://hgdownload.soe.ucsc.edu/mysql/hg18/${TBL}.${EXT} \
        /var/mysql/hg18/
    rsync -a -P rsync://hgdownload.soe.ucsc.edu/mysql/hg18/chr*_${TBL}.${EXT} \
        /var/mysql/hg18/
  done
done

Please note, as of November 2011, there is a second rsync server you
can use in place of the hgdownload mentioned above. Depending upon
your internet location, the second server may provide better transfer
speed. To use the second rsync server, use the name hgdownload-sd in
place of the hgdownload in the above commands.

If you instead need to fetch the goldenPath database text dumps and
load them, please note the following discussion. There is a tricky
coordination problem that must be considered.
The updates of the MySQL text dumps in goldenPath/*/database/ are
uncoordinated with when your database tables were last loaded. An
attempt to coordinate this situation is included with the
tagGoldenPath.sh script and the loadUpdateGoldenPath.sh script.

If you know that all the databases are loaded completely from your
existing set of goldenPath/*/database/ files, you can use the
tagGoldenPath.sh script to tag those database/ directories with a
lastTimeStamp of the newest file existing there. When an update rsync
takes place, any new files will be newer than that lastTimeStamp. The
loadUpdateGoldenPath.sh script will load or reload only the files
that are newer than the lastTimeStamp. If that load is successful,
then tagGoldenPath.sh can be run and a new lastTimeStamp will be
established. As long as the loading and reloading are successful,
there will be no difficulty with this coordination.

In the worst case, if it is not known whether the database tables and
the download dumps are in sync, a full reload of the database should
be undertaken. A full reload of a database can consume a number of
hours.

Brief description of these scripts. Run them with no arguments to
view their full help messages:

1. activeDbList.sh - using MySQL commands to the public UCSC MySQL
   server, will fetch a list of active databases. Lists of databases
   can be used with some of the fetch and load scripts.
2. minimal.db.list.txt - a list of active databases as of mid-March
   2010
3. kentSrcUpdate.sh - fetches and/or updates a local copy of the
   'kent' source tree and builds that source tree with resulting
   binaries into directories of your specification in your copy of
   the browserEnvironment.txt file. Can be run as a cron job about
   every two weeks to keep your source tree, CGI binaries, and kent
   utilities up to date.
4. fetchHgCentral.sh - fetches a copy of the hgcentral.sql definition
   of the hgcentral database
5.
   updateHtml.sh - rsync fetch the static HTML hierarchy of the UCSC
   Genome Browser into your specified documentRoot
6. fetchMinimalGbdb.sh - rsync fetch of a minimal set of files in the
   /gbdb/ hierarchy to operate a genome browser with the smallest set
   of files
7. fetchMinimalGoldenPath.sh - rsync fetch of a minimal set of files
   in the goldenPath/ hierarchy to be used to load a genome database
8. loadDb.sh - can perform an initial load of a database from the
   goldenPath/ database MySQL text dumps
9. fetchFullGbdb.sh - rsync fetch of all files for a given database
   into the local /gbdb/ hierarchy
10. fetchFullGoldenPath.sh - rsync fetch of all files for a given
    database into the local goldenPath/ database MySQL text dump
    directory
11. tagGoldenPath.sh - after everything is loaded successfully in
    goldenPath/*/database this script will touch a lastUpdate file to
    be used by loadUpdateGoldenPath.sh
12. loadUpdateGoldenPath.sh - after the goldenPath/*/database
    downloads are updated, this script will load any new tables since
    the last timeStamp
13. dbTrashCleaner.sh - run as a cron job periodically to clean
    database tables in the customTrash database
14. trashCleaning.sh - run as a cron job periodically to clean files
    in your genome browser .../trash/ directory.
15. allDatabaseUpdate.sh - this attempts to put all db updates
    together into one script. It runs the four different fetch
    scripts and the database update script. It needs to have lists of
    databases to update minimally or fully. The logs of update
    activities need to be checked to verify it is working OK.
16. printEnv.pl - script to verify your cgi-bin directory is
    functioning correctly. Note comments in that script.

====================================================================
This file last updated: 2010/10/27 10:51:59
====================================================================
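
The time stamp coordination used by tagGoldenPath.sh and
loadUpdateGoldenPath.sh can be pictured with the following minimal
sketch. It is not the code of those scripts. The function names, the
marker file name (lastTimeStamp), and the dump file patterns (*.sql
and *.txt.gz) are assumptions made for illustration only. The sketch
tags a dump directory with the time of its newest file and then lists
the files that arrived after that tag; it does not load anything.

```shell
#!/bin/sh
# Hypothetical sketch of the "tag, then reload only newer files" idea.
# All names here are illustrative and are not taken from the real scripts.

# tag_dir DIR: give DIR/lastTimeStamp the modification time of the
# newest dump file currently in DIR. Returns 1 if DIR has no dumps.
tag_dir() {
    dir="$1"
    newest=$(ls -t "$dir"/*.sql "$dir"/*.txt.gz 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    touch -r "$newest" "$dir/lastTimeStamp"
}

# list_new DIR: print the dump files in DIR that were modified after
# DIR/lastTimeStamp, i.e. the candidates for a reload.
list_new() {
    dir="$1"
    find "$dir" -type f \( -name '*.sql' -o -name '*.txt.gz' \) \
        -newer "$dir/lastTimeStamp" | sort
}
```

In this sketch, tag_dir would be called once after a load is known to
be complete, list_new after each update rsync, and tag_dir again only
when the reload of the listed files has succeeded.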