1cb08c59e884b67fbfcbf79f9dc416574e5c6239
galt
  Fri Jul 2 11:55:28 2021 -0700
Added a note about extra mouse strains appearing in the mm10 patch6

diff --git src/hg/makeDb/doc/mm10.patchUpdate.6.txt src/hg/makeDb/doc/mm10.patchUpdate.6.txt
index 3e3f35e..7a9beb9 100644
--- src/hg/makeDb/doc/mm10.patchUpdate.6.txt
+++ src/hg/makeDb/doc/mm10.patchUpdate.6.txt
@@ -1,19 +1,42 @@
 # for emacs: -*- mode: sh; -*-
 
 # This file describes how mm10 was extended with patch sequences and annotations from grcM38P6
 
+CONSIDERATIONS FOR ALTS FROM OTHER MOUSE STRAINS IN THE NCBI RELEASE
+
+The original NCBI release grcM38 (which we used for the initial mm10 release)
+had dozens of alt-scaffolds from 14 other mouse strains. Whoever did that assembly manually removed
+the sequences from the other strains. When we built patch6, we went back to the NCBI source
+and ran our standard build tools. We did not realize that 99 of the 108 scaffolds were
+from the 14 other strains; only 9 alt-scaffolds were for the native strain C57BL/6J.
+We did not catch the issue until after QA had already pushed the release and we received a message from a researcher.
+
+Since it would be a lot of work to go back and redo all of patch6 without the extra mouse strain alts,
+we have decided to proceed. We have updated the README and the mm10 main html page
+to reflect these changes and to note the additional non-native strain sequences that appear in the patch 6 release.
+This can be justified since we have to deal with alt scaffolds anyway
+in our increasingly complex world, and this makes our release more similar to NCBI's.
+Those alt-scaffolds were chosen because they can be useful, e.g. they carry genes from other strains
+that are important in medical research.
+
+Currently we have no table mapping the alts to their respective strains,
+but it is easy to tell the native from the non-native alts, since
+all the IDs for the native C57BL/6J alts have the letter K in them.
+Our convention is that the ID follows the name of the chrom the alt is located on,
+so the native alts look like chrN_KK* or chrN_KZ*.
+
 ##############################################################################
 # Extend main database 2bit, chrom.sizes, chromInfo (DONE - 2021-04-08 - Galt)
 
 
     cd /hive/data/genomes/mm10
     # main 2bit
     time faToTwoBit <(twoBitToFa mm10.2bit stdout) \
            <(twoBitToFa /hive/data/genomes/grcM38P6/grcM38P6.2bit stdout) \
            mm10.p6.2bit
 #real    1m52.859s
 
     # unmasked 2bit
     time twoBitMask -type=.bed mm10.p6.2bit /dev/null mm10.p6.unmasked.2bit
 #real    0m3.104s