b7f8218ec50729e9114674e07ac351f6c5a5a16b
hiram
  Tue Mar 19 10:08:46 2024 -0700
updated instructions with refinements for genark assembly request handling refs #29545

diff --git src/hg/makeDb/doc/asmHubs/README.txt src/hg/makeDb/doc/asmHubs/README.txt
index 5b2b399..19c4476 100644
--- src/hg/makeDb/doc/asmHubs/README.txt
+++ src/hg/makeDb/doc/asmHubs/README.txt
@@ -1,31 +1,75 @@
 #############################################################################
 ### Building the GenArk assembly hubs ###
 #############################################################################
+### Requests from the request system:
+
+When a user sends in a request with an accession ID, e.g. GCF_002776525.5,
+the assembly may already exist in some version.  To check whether it
+does, use just the numeric part of the ID, 002776525, and search the
+existing listings in the source tree:
+
+    grep 002776525 ~/kent/src/hg/makeDb/doc/*AsmHub/*.tsv
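+
+That check can be sketched as a pair of shell functions.  This is only a
+sketch: asmNumber and existingListings are hypothetical helper names, not
+commands in the kent tree.  They strip the GCA_/GCF_ prefix and the .N
+version so that any version of either form will match.
+
```shell
# Hypothetical helper (not part of the kent tree): reduce an accession such
# as GCF_002776525.5 to its numeric part, 002776525.
asmNumber() {
  acc="$1"
  acc="${acc#GC[AF]_}"        # drop the GCA_ or GCF_ prefix
  printf '%s\n' "${acc%%.*}"  # drop the .N version suffix
}

# Hypothetical helper: show existing hub listings that mention any version
# of the accession.  With no file arguments it searches the same
# *AsmHub/*.tsv files as the grep above.
existingListings() {
  num=$(asmNumber "$1")
  shift
  if [ "$#" -eq 0 ]; then
    set -- ~/kent/src/hg/makeDb/doc/*AsmHub/*.tsv
  fi
  grep -H -- "$num" "$@"     # -H: always print the file name of each match
}
```
+
+e.g.:  existingListings GCF_002776525.5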
+
+They may be asking for a newer version, or for a GenBank version when
+a RefSeq version already exists.  Decide whether what we already have
+is better than what they asked for.  Sometimes they really do want a
+specific older version; negotiate that with the user by email to see
+whether they would accept a newer version.
+
+If it is not there, check whether it has already been recognized as
+something to build.  Again, use just the number.  For RefSeq assemblies:
+
+    grep 002776525 /hive/data/outside/ncbi/genomes/reports/newAsm/rs.todo.*
+
+for GenBank:
+
+    grep 002776525 /hive/data/outside/ncbi/genomes/reports/newAsm/gb.todo.*
+
+Decide which one is best or most up to date; RefSeq is always the first
+choice.  The matching lines are ready-to-go build commands, to be run in
+the allBuild directory:
+
+    cd /hive/data/genomes/asmHubs/allBuild
+    time (./runBuild GCA_002776525.5_ASM277652v5 primates Piliocolobus_tephrosceles) >> GCA_002776525.5.log 2>&1
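+
+A sketch of that RefSeq-first lookup, assuming the todo lines look like
+the rs.todo example in the details section below (runBuild, accession with
+assembly name, clade, scientific name, then a date).  findBuildCommand is
+a hypothetical name.  It prints every candidate so the final choice stays
+with you, and it drops the trailing date column so the output matches the
+runBuild command as written above; whether runBuild needs that date is
+not stated here.
+
```shell
# Hypothetical helper: print runBuild command(s) for an accession number,
# RefSeq (rs.todo.*) first, GenBank (gb.todo.*) only if RefSeq has none.
# Assumed todo line layout:
#   ./runBuild <accession>_<asmName> <clade> <Scientific_name>   <date>
findBuildCommand() {
  num="$1"
  dir="${2:-/hive/data/outside/ncbi/genomes/reports/newAsm}"
  for kind in rs gb; do                     # RefSeq is always first choice
    hits=$(grep -h -- "$num" "$dir/$kind".todo.* 2>/dev/null)
    if [ -n "$hits" ]; then
      # keep the first four fields, i.e. drop the trailing date column
      printf '%s\n' "$hits" | awk '{print $1, $2, $3, $4}'
      return 0
    fi
  done
  return 1                                  # not in either todo list
}
```
+
+e.g.:  findBuildCommand 002776525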
+
+If it doesn't show up there, check the 'master' listings from NCBI:
+
+    /hive/data/outside/ncbi/genomes/reports/assembly_summary*.txt
+
+which are:
+
+    assembly_summary_genbank.txt             assembly_summary_refseq.txt
+    assembly_summary_genbank_historical.txt  assembly_summary_refseq_historical.txt
+
+These assembly_summary*.txt files can also be searched by scientific
+name when that is all the user supplied.
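+
+A minimal sketch of such a name search.  It assumes NCBI's tab-separated
+assembly_summary layout: header lines start with '#', column 1 is
+assembly_accession, 8 is organism_name, 11 is version_status and 16 is
+asm_name.  findByName is a hypothetical name; it accepts either
+"Genus species" or the Genus_species form used by runBuild.
+
```shell
# Hypothetical helper: case-insensitive substring search of organism_name
# (column 8, assumed) in NCBI assembly_summary*.txt files.  Prints
# accession, assembly name and version status for each hit.
findByName() {
  # accept Genus_species as well as "Genus species"
  name=$(printf '%s' "$1" | tr '_' ' ')
  shift
  awk -F'\t' -v name="$name" '
    /^#/ { next }                          # skip the header lines
    index(tolower($8), tolower(name)) > 0 { print $1, $16, $11 }
  ' "$@"
}
```
+
+e.g.:  findByName Piliocolobus_tephrosceles \
+          /hive/data/outside/ncbi/genomes/reports/assembly_summary_{refseq,genbank}.txt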
+
+#############################################################################
 ###
 ###  To build a single hub:
 #############################################################################
 
 0: Given an accession identifier, e.g. GCF_002776525.5
-a. find build command and designated clade
-b. run the build of the hub
-c. add lines to source tree files master.run.list and clade.orderList.tsv
-d. in the source tree run; time (make) > dbg 2>&1 # check for errors
-e. time (make verifyTestDownload) >> test.down.log 2>&1 # check for errors
-f. time (make sendDownload) >> send.down.log 2>&1 # check for errors
-g. time (make verifyDownload) >> verify.down.log 2>&1 # check for errors
-h. verify the browser functions: https://genome.ucsc.edu/h/GCF_002776525.5
+a. find the build command and designated clade (see the discussion above)
+b. run the build of the hub (see the discussion above)
+c. add lines to the source tree files master.run.list and
+                      doc/<clade>AsmHub/<clade>.orderList.tsv
+d. in the source tree doc/<clade>AsmHub/ directory, run these commands:
+e. make symLinks      # prepares staging directory with symlinks to the build
+f. then: time (make) > dbg 2>&1 # check for errors: egrep "miss|err" dbg
+g. time (make verifyTestDownload) >> test.down.log 2>&1 # check for errors
+                                                   # grep check test.down.log
+h. time (make sendDownload) >> send.down.log 2>&1 # check for errors
+                                                  # grep error send.down.log
+i. time (make verifyDownload) >> verify.down.log 2>&1 # check for errors
+                                                  # grep check verify.down.log
+j. verify the browser functions: https://genome.ucsc.edu/h/GCF_002776525.5
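+
+The error checks for steps f and h can be bundled into one function.
+This is a sketch only: checkHubLogs is a hypothetical name, it assumes
+the dbg and send.down.log files sit in the clade directory where the
+make commands were run, and it covers just those two logs, whose error
+patterns are given above; test.down.log and verify.down.log still need
+their 'grep check' review by eye.
+
```shell
# Hypothetical helper: scan the logs from steps f and h for the error
# patterns listed above.  Prints matching lines; returns non-zero if any.
checkHubLogs() {
  dir="${1:-.}"
  status=0
  if [ -f "$dir/dbg" ] && grep -E "miss|err" "$dir/dbg"; then
    status=1              # step f: missing files or errors from 'make'
  fi
  if [ -f "$dir/send.down.log" ] && grep "error" "$dir/send.down.log"; then
    status=1              # step h: errors from 'make sendDownload'
  fi
  return "$status"
}
```
+
+e.g.:  checkHubLogs ~/kent/src/hg/makeDb/doc/primatesAsmHub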
 
 ### Details of those steps:
 
 1.  Given an accession identifier, e.g. GCF_002776525.5
     Find the build command and designated clade:
 
   grep GCF_002776525.5 \
      /hive/data/outside/ncbi/genomes/reports/newAsm/{rs,gb}.todo.*.txt
 
 Answer: 'primates' clade and command:
 rs.todo.primates.txt:./runBuild GCF_002776525.5_ASM277652v5 primates Piliocolobus_tephrosceles   2019_12_12
 
     If that grep finds nothing, that browser may already be built.
     Grep the source tree file ~/kent/src/hg/makeDb/doc/asmHubs/master.run.list
     for your accession to see whether it has already been done.
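+
+A minimal sketch of that master.run.list check.  alreadyBuilt is a
+hypothetical name; it uses a fixed-string match so the '.' in the
+accession is not treated as a regex wildcard.  It is still a substring
+match, so read the matching line if in doubt.
+
```shell
# Hypothetical helper: succeed (exit status 0) if the exact accession,
# e.g. GCF_002776525.5, already appears in master.run.list.
alreadyBuilt() {
  acc="$1"
  list="${2:-$HOME/kent/src/hg/makeDb/doc/asmHubs/master.run.list}"
  grep -q -F -- "$acc" "$list"   # -F: fixed string, '.' is literal
}
```
+
+e.g.:  alreadyBuilt GCF_002776525.5 && echo "already built"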