src/hg/makeDb/doc/asmHubs/README.txt 6d9006164b9b861ba1d199c9c81b6ea40ababc1c

6d9006164b9b861ba1d199c9c81b6ea40ababc1c
hiram
  Tue Oct 4 12:26:41 2022 -0700
update instructions to include how to add locally developed tracks to the GenArk hub

diff --git src/hg/makeDb/doc/asmHubs/README.txt src/hg/makeDb/doc/asmHubs/README.txt
index d7c7b3c..db8026f 100644
--- src/hg/makeDb/doc/asmHubs/README.txt
+++ src/hg/makeDb/doc/asmHubs/README.txt
@@ -1,128 +1,202 @@
 #############################################################################
 ### Building the assembly hubs ###
 #############################################################################
+### see below for adding custom/local developed tracks to an existing GenArk hub
+#############################################################################
 
 The build of each assembly takes place in, for example:
 
   /hive/data/genomes/asmHubs/refseqBuild/GCF/000/001/405/GCF_000001405.39_GRCh38.p13/
 
 (There is a corresponding hierarchy for 'genbank' GCA assemblies, i.e.:
 
 /hive/data/genomes/asmHubs/genbankBuild/GCA/902/686/455/GCA_902686455.1_mSciVul1.1
 
 )
 
 I have a 'goto' function in my shell, you can view at:
   ~hiram/.bashrc.hiram
 which I use to move around in this spread out hierarchy.  For example:
 
   $ goto GCF_000001405
 
 will get you to that build directory (when there is only one GCF_000001405)
 
 You should construct any new files in this directory hierarchy.  Maybe
 a subdirectory here if you have a whole category of files,  Note
 subdirectories already here: bbi html ixIxx for example.
 
 (download, sequence, idKeys, trackData are directories for data construction
   during the build)
 
 To deliver files from this build to hgdownload, scripts in:
    ~/kent/src/hg/makeDb/doc/asmHubs/
 construct symlinks from the build directory into the delivery staging directory hierarchy, for example:
    /hive/data/genomes/asmHubs/GCF/000/001/405/GCF_000001405.39/
 
 Nothing but symlinks here and just the deliver files for hgdownload:
   ls -ogLd *
 -rw-rw-r-- 1  945974069 Sep 10  2019 GCF_000001405.39.2bit
 -rw-rw-r-- 1     888417 Sep 10  2019 GCF_000001405.39.agp.gz
 -rw-rw-r-- 1      18097 Sep 10  2019 GCF_000001405.39.chrom.sizes.txt
 -rw-rw-r-- 1      29424 Sep 25 11:58 GCF_000001405.39.chromAlias.txt
 -rw-rw-r-- 1 2915673072 Jul 16 15:16 GCF_000001405.39.trans.gfidx
 -rw-rw-r-- 1 2262217236 Jul 16 15:19 GCF_000001405.39.untrans.gfidx
 drwxrwxr-x 2       4096 Sep 23 15:33 bbi
 -rw-rw-r-- 1        354 Dec  1 12:16 genomes.txt
 -rw-rw-r-- 1        508 Dec  1 12:16 groups.txt
 drwxrwxr-x 2       4096 Dec  1 12:16 html
 -rw-rw-r-- 1        240 Dec  1 12:16 hub.txt
 drwxrwxr-x 2       4096 Sep 23 15:33 ixIxx
 -rw-rw-r-- 1       9910 Sep 23 15:33 trackDb.txt
 
 Note how the names become shorter here, losing the full assembly identifier.
 Don't need that.  There should be only one 'GCF_000001405.39' assembly.
 NCBI has made a couple of mistakes and these names became duplicated for
 a couple of assemblies.  Don't care about that.  Eliminated the garbage.
 
 So, to add the construction of the deliver symlinks for your new files,
 you would add something to: ~/kent/src/hg/makeDb/doc/asmHubs/mkSymLinks.pl
 This is assuming you do want to deliver these files to hgdownload.  I would
 guess you would since external users that want to copy this assembly can copy
 this directory from hgdownload to get everything they need to operate it
 independently from us.
 
 You don't operate the scripts in .../makeDb/doc/asmHubs/ by themselves.
 The are used from makefile rules in each assembly hub definition directory.
 For example, for the primates, in the directory:
    kent/src/hg/makeDb/doc/primatesAsmHub/
 you just type 'make' and it does everything to get these items ready for
 delivery.  This is what makes the symLinks and all other files to make
 the assembly hub function.
 
 (This is *not* the build of the files in the build hierarchy, see below)
 
 Other hub dirctories here in makeDb/doc/
 
 primatesAsmHub
 mammalsAsmHub
 birdsAsmHub
 fishAsmHub
 vertebrateAsmHub
 legacyAsmHub
 plantsAsmHub
 bacteriaAsmHub
 vgpAsmHub
 
 Future work will create:
 
 fungiAsmHub
 invertebrateAsmHub
 viralAsmHub
 protozoaAsmHub
 bacteriaAsmHub
 archaeaAsmHub
 
 #############################################################################
 ### To run up a build of an assembly ###
 #############################################################################
 
 The actual build is taking place with the help of the 'runBuild'
 script (copy here in ~/kent/src/hg/makeDb/doc/asmHubs/runBuild)
 
 The builds are operated from the directory:
 
    /hive/data/genomes/asmHubs/allBuild/
    (a location to accumulate log files, and run lists, thus work history)
 
 The 'runBuild' is operated, for example, a single assembly:
 
   time (./runBuild GCF_000001405.39 GCF_000001405.39_GRCh38.p13 vertebrate_mammalian Homo_sapiens) >> GCF_000001405.39.log 2>&1 &
 
 Or, typically, there may be a whole list of such commands
 ( such as in the master.run.list here:
     ~/kent/src/hg/makeDb/doc/asmHubs/master.run.list
 )
 
 These are run, for example 5 at a time:
   time (kent/src/hg/utils/automation/perlPara.pl 5 master.run.list) \
      >> bigRun.log 2>&1
 
 The 'runBuild' script is usually set up to run all steps from
 'download' to 'trackDb', and it is OK to use it like this even on
 a build that has already taken place (currently it is disabled to
 avoid trying to rebuild an assembly).  There are cases, for example,
 where I want to update all the trackDb files since something has
 been improved for trackDb, in which case I adjust the
 stepStart and stepEnd to run just the trackDb step.  (would have
 to disable the rebuild prevention)
 
 #############################################################################
+### adding custom/local developed tracks to a GenArk hub
+#############################################################################
+
+Work in the trackData/ directory of the assembly hub in a directory
+name of the track, think of this as your /hive/data/genomes/<db>/bed/myTrack/
+usual work directory as if it were a database assembly.
+
+For example, the extra pcrAmplicon track on the Monkeypox browser
+GCF_000857045.1_ViralProj15142
+
+Is developed in:
+/hive/data/genomes/asmHubs/refseqBuild/GCF/014/621/545/GCF_014621545.1_ASM1462154v1/trackData/pcrAmplicon/
+
+When your data is ready, add your big* files, ixIxx and html page description
+files to the browser with symLinks in the bbi, ixIxx and html directories:
+
+/hive/data/genomes/asmHubs/refseqBuild/GCF/014/621/545/GCF_014621545.1_ASM1462154v1/bbi/
+and
+/hive/data/genomes/asmHubs/refseqBuild/GCF/014/621/545/GCF_014621545.1_ASM1462154v1/ixIxx/
+/hive/data/genomes/asmHubs/refseqBuild/GCF/014/621/545/GCF_014621545.1_ASM1462154v1/html/
+
+To get your track added to the GenArk hub, place your trackDb.txt
+definitions in the special named file: <asmId>.userTrackDb.txt
+in the top-level build directory:
+/hive/data/genomes/asmHubs/refseqBuild/GCF/014/621/545/GCF_014621545.1_ASM1462154v1/
+for example: GCF_014621545.1_ASM1462154v1.userTrackDb.txt
+
+Your track will push out to hgdownload with this GenArk hub the next time
+the build is run for the clade this organism is packaged in.
+
+Typical 'build' sequence to do the release of a clade set:
+
+  cd ~/kent/src/hg/makeDb/doc/viralAsmHub
+  # builds symLinks for delivery staging directory, constructs index pages
+  # for this clade set, makes everything available on genome-test
+  time (make) >> dbg 2>&1
+  # when finished, examine the dbg file to see if there are any errors reported
+  # by the scripts.  Then, verify it is looking good in the staging
+  # directory on genome-test:
+  time (make verifyTestDownload) >> test.down.log 2>&1
+  # this testing is performed by the API on hgwdev.
+  # this test.down.log file accumulates each time a build is run, to make sure
+  # it is sane and there are no errors, grep for 'checked' to see lines such as:
+
+  grep checked test.down.log
+# checked 221 hubs, 221 success, 0 fail, total tracks: 4720, 2022-09-25 14:58:55
+# checked 222 hubs, 222 success, 0 fail, total tracks: 4740, 2022-10-04 11:55:28
+
+  # if you wanted to view this clade set on genome-test to see what it
+  # looks like, the URL is:  https://genome-test.gi.ucsc.edu/hubs/viral/
+  # each clade has a different directory here:
+  #  primates mammals birds fish vertebrate plants fungi viral bacteria archaea
+
+  # if it looks good on genome-test and verifyTestDownload runs without errors,
+  # the hub can push to hgdownload:
+
+  time (make sendDownload) >> send.down.log 2>&1
+
+  # there isn't much to see in this send.down.log, it is just for the record
+  # then to verify it is correct on hgdownload:
+
+  time (make verifyDownload) >> verify.down.log 2>&1 &
+  # this testing runs via the API on hgwbeta so that the access
+  # activity logs on the RR won't be disturbed by such testing.
+
+  # to see if it is sane, grep for 'checked' in this log file:
+  grep checked verify.down.log
+
+# checked 221 hubs, 221 success, 0 fail, total tracks: 4720, 2022-09-25 19:42:48
+# checked 222 hubs, 222 success, 0 fail, total tracks: 4740, 2022-10-04 12:23:35
+
+#############################################################################