d425930659b77acd38c14eaefda720d52da1ddc1 hiram Mon Dec 15 16:28:32 2025 -0800
    more detail on these push procedures

diff --git src/hg/utils/otto/genArk/README src/hg/utils/otto/genArk/README
index dea3ca14a29..7b26a5c343e 100644
--- src/hg/utils/otto/genArk/README
+++ src/hg/utils/otto/genArk/README
@@ -8,15 +8,126 @@
 ###################################################################
 
 Also using the assemblyList.py script from kent/src/hg/hubApi/assemblyList.py
 
 # scripts used to push out the /gbdb/genark/ hierarchy
 #  from hgwdev to our RR sites, and the pullHgwdev.sh is
 #  running in qateam cron job on the Asia node
 
 pullHgwdev.sh
 pushRR.sh
 
 ### manages the pushing of the beta and public versions of 'contrib'
 ###  tracks in genark assemblies
 
 alphaBetaPush.pl
+
+###################################################################
+# operation procedure
+###################################################################
+
+Listings of files are made on hgwdev, hgwbeta and hgw1 in order to
+determine what needs to be pushed out.  It is done with these listings
+instead of allowing rsync to simply push everything because there is
+a staged alpha, beta, public release procedure that pushes out different
+hub.txt files to hgwbeta and hgw1, and different 'contrib/' directories
+in the GenArk hubs.
+
+1. The script devList.sh runs as an otto cron job on hgwdev:
+     58 18 * * * /hive/data/inside/GenArk/pushRR/devList.sh
+   It constructs listings of files with
+   their timestamps in:
+     /gbdb/GCA and /gbdb/GCF
+   sending the listings to an archive logs directory:
+     /hive/data/inside/GenArk/pushRR/logs/${Y}/${M}/
+   and also a 'daily' list to be used by push scripts:
+     /hive/data/inside/GenArk/pushRR/dev.todayList.gz
+
+   It also makes listings of files with timestamps in /gbdb/*/quickLift/ and
+   /gbdb/*/liftOver/ placing the results into the logs/ directory
+   and also the daily listings:
+     /hive/data/inside/GenArk/pushRR/dev.today.quickLiftList.gz
+     /hive/data/inside/GenArk/pushRR/dev.today.liftOverList.gz
+
+2. The same type of script also runs on all the RR machines,
+   sending their listings back to the otto logs directory:
+     /hive/data/inside/GenArk/pushRR/logs/${Y}/${M}/
+   and on hgwbeta and hgw1 it also sends the listings back to the
+   otto files:
+     /hive/data/inside/GenArk/pushRR/${machName}.today.quickLiftList.gz
+     /hive/data/inside/GenArk/pushRR/${machName}.today.liftOverList.gz
+   to be compared to the lists made by the job on hgwdev to see what
+   might need to go out.
+
+3. After those listings of files are made, the primary push script runs
+   as the otto user cron job:
+
+     03 01 * * * /hive/data/inside/GenArk/pushRR/pushRR.sh
+
+   It runs two scripts:
+     pushNewOnes.sh
+     quickPush.pl
+
+4. The pushNewOnes.sh script runs:
+
+     whatIsNew.sh
+   which joins the listings from hgwdev
+   with the hgwbeta list to determine what files may be new
+   or updated between hgwdev and hgwbeta for the /gbdb/genark/
+   hierarchy.  These joins are done
+   while avoiding any hub.txt files or any contrib/ directories
+   in the assemblies, since those items are special and under control
+   of other operations.  Listings made:
+     new.files.ready.to.beta.txt
+     new.beta.timeStamps.txt
+   It also puts together the listing:
+     rsync.gbdb.toRR.fileList.txt
+   which is used by cluster admin as a push list of files from hgwbeta
+   to the RR machines, avoiding the hub.txt files and the contrib/
+   directories.
+
+   This script also runs:
+     quickLiftNew.sh
+     liftOverNew.sh
+   which do the same type of listing comparisons, but just
+   for /gbdb/*/liftOver/ and /gbdb/*/quickLift/ directories.
+   They make listings:
+     new.quickLift.ready.to.beta.txt
+     new.liftOver.ready.to.go.txt
+     beta.quickLift.timeStamps.txt
+     new.liftOver.timeStamps.txt
+   and add to the cluster admin push list:
+     rsync.gbdb.toRR.fileList.txt
+
+
+
+   pushNewOnes.sh uses the dev.todayList.gz and hgwbeta.todayList.gz lists
+   to push out any new assembly directories in /gbdb/genark/GCx/...
+   This push avoids any hub.txt files or any contrib/ directories
+   since those are under special control elsewhere.
+   It next uses the listing "new.files.ready.to.beta.txt"
+   to push out any new or updated files for existing browsers
+   for /gbdb/genark/ from hgwdev to hgwbeta.
+   It uses the listing "new.quickLift.ready.to.beta.txt" to
+   push any new /gbdb/*/quickLift/ files to hgwbeta from hgwdev.
+   It uses the listing "new.beta.timeStamps.txt" to send out
+   any updated files for assemblies in /gbdb/genark/...
+   And finally, it uses the list "beta.quickLift.timeStamps.txt"
+   to send any updated files from /gbdb/*/quickLift/ directories
+   from hgwdev to hgwbeta.
+
+5. The quickPush.pl script does the special business
+   of getting the appropriate hub.txt and contrib/ directories
+   pushed out.  It uses the source tree files:
+     kent/src/hg/makeDb/trackDb/betaGenArk.txt
+     kent/src/hg/makeDb/trackDb/publicGenArk.txt
+   to find out what 'contrib' tracks are destined for
+   either hgwbeta or out to the RR.  It scans the
+   dev.todayList.gz listing for contrib directories or
+   hub.txt files: zegrep '/contrib/|hub.txt' dev.todayList.gz
+   For 'contrib' track names in the betaGenArk it gets those
+   contrib/ directories out to hgwbeta along with their beta.hub.txt
+   file to become the 'hub.txt' file on hgwbeta.  For the RR
+   push it uses the publicGenArk list and gets the designated
+   contrib/ directories out to 'hgw0' only, and their public.hub.txt
+   file to 'hgw0' only.  The cluster admin rsync systems are responsible
+   for getting the 'hgw0' content out to all the other RR systems.
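
The listing comparison described for whatIsNew.sh can be illustrated with a
small sketch.  This is not the contents of whatIsNew.sh: the listing format
(one "path<TAB>timestamp" line per file, gzipped) and the function names
filterList and newOrUpdated are assumptions made only for this example.
It joins a dev listing against a beta listing and prints the paths that are
missing on beta or whose timestamps differ, leaving out hub.txt files and
contrib/ directories with the same pattern used in the zegrep scan above.

```shell
# Hypothetical helper: decompress a listing, drop hub.txt and contrib/
# entries, and sort by path so that join can be used on the result.
# Assumed listing format: path<TAB>timestamp, one file per line.
filterList() {
  gzip -dc "$1" | grep -Ev '/contrib/|hub\.txt' \
    | LC_ALL=C sort -t "$(printf '\t')" -k1,1
}

# Hypothetical helper: newOrUpdated devList.gz betaList.gz
# Prints each path that is in the dev listing but absent from the beta
# listing, or present in both with a different timestamp.
# The body runs in a subshell so its variables do not leak to the caller.
newOrUpdated() (
  tab=$(printf '\t')
  tmpDir=$(mktemp -d) || exit 1
  filterList "$1" > "$tmpDir/dev.txt"
  filterList "$2" > "$tmpDir/beta.txt"
  # -a 1 keeps dev-only paths; -e fills their absent beta timestamp.
  LC_ALL=C join -t "$tab" -a 1 -e MISSING -o 0,1.2,2.2 \
      "$tmpDir/dev.txt" "$tmpDir/beta.txt" \
    | awk -F '\t' '$2 != $3 { print $1 }'
  rm -rf "$tmpDir"
)
```

Under those assumptions it could be run as
"newOrUpdated dev.todayList.gz hgwbeta.todayList.gz" to get a candidate
list of files to send to hgwbeta.  Sorting and joining both happen in the
C locale so that the two commands agree on the ordering of paths.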