dac1d9e2d193f587f633dcfb619103d8dd86d47c
galt
  Fri Jul 2 13:05:28 2021 -0700
adding text to mm10 description.html about the inclusion of alternate scaffolds on other mouse strains, part of the NCBI mm38 release.

diff --git src/hg/makeDb/doc/mm10.patchUpdate.6.txt src/hg/makeDb/doc/mm10.patchUpdate.6.txt
index 7a9beb9..d4201ac 100644
--- src/hg/makeDb/doc/mm10.patchUpdate.6.txt
+++ src/hg/makeDb/doc/mm10.patchUpdate.6.txt
@@ -1,42 +1,60 @@
 # for emacs: -*- mode: sh; -*-
 
 # This file describes how mm10 was extended with patch sequences and annotations from grcM38P6
 
 ALTS FROM OTHER MOUSE STRAINS IN NCBI RELEASE CONSIDERATIONS
 
 The original NBCI release grcM38 (which we used for the initial mm10 release)
-had dozens alt-scaffolds on 14 mouse strains. Whoever did that assembly manually removed
-those sequences from other strains. When we ran the patch6, we went back to the NCBI source
-and ran our standard build tools. We did not realize that 99 out of 108 scaffolds were
+has dozens of alt-scaffolds from 14 other mouse strains. Whoever did that assembly manually removed
+those other-strain sequences from the initial UCSC mm10 release.
+When we ran the patch6, we went back to the NCBI source
+and ran our standard assembly build pipeline. We did not realize that 99 out of 108 scaffolds were
 from the other 14 strains. It did have 9 alt-scaffolds for the native strain C57BL/6J too.
 We did not catch the issue until too late when QA had pushed already and we received message from a researcher.
 
 Since it would be a lot of work to go back and re-do all the patch6 without the extra mouse strain alts,
 we have decided to proceed. We have updated README and the mm10 main html page
 to reflect these changes and note the additional non-native strain sequences that appear in patch 6 release.
 This can be justified since we are having to deal with alt scaffolds anyway
-in our increasingly complex world, and this makes our release more similar to NCBIs.
+in our increasingly complex world, and this makes our release more similar to NCBI's release.
 Those alt-scaffolds were chosen because they can be useful, e.g. genes from other strains
 that have important medical research.
 
 Currently we have no table to map the alts to their respective strains,
 but it is easy to tell the native from non-native alts since
 all the IDS for the native C57BL/6J alts have the letter K in them.
 Our convention is that the ID follows the chrom they are located on,
 so the native alts look like chrN_KK* or chrN_KZ*.
 
+ALTs that are native to C57BL/6J, the strain used for mm10:
+[mm10]> select * from chromInfo where chrom like "%_K%_alt";
++--------------------+--------+----------------------+
+| chrom              | size   | fileName             |
++--------------------+--------+----------------------+
+| chr1_KK082441_alt  | 456798 | /gbdb/mm10/mm10.2bit |
+| chr11_KZ289080_alt | 394982 | /gbdb/mm10/mm10.2bit |
+| chr11_KZ289074_alt | 394026 | /gbdb/mm10/mm10.2bit |
+| chr11_KZ289078_alt | 390920 | /gbdb/mm10/mm10.2bit |
+| chr11_KZ289081_alt | 369973 | /gbdb/mm10/mm10.2bit |
+| chr11_KZ289079_alt | 368967 | /gbdb/mm10/mm10.2bit |
+| chr11_KZ289075_alt | 322221 | /gbdb/mm10/mm10.2bit |
+| chr11_KZ289073_alt | 215264 | /gbdb/mm10/mm10.2bit |
+| chr11_KZ289077_alt | 186144 | /gbdb/mm10/mm10.2bit |
++--------------------+--------+----------------------+
+
+
 ##############################################################################
 # Extend main database 2bit, chrom.sizes, chromInfo (DONE - 2021-04-08 - Galt)
 
 
     cd /hive/data/genomes/mm10
     # main 2bit
     time faToTwoBit <(twoBitToFa mm10.2bit stdout) \
            <(twoBitToFa /hive/data/genomes/grcM38P6/grcM38P6.2bit stdout) \
            mm10.p6.2bit
 #real    1m52.859s
 
     # unmasked 2bit
     time twoBitMask -type=.bed mm10.p6.2bit /dev/null mm10.p6.unmasked.2bit
 #real    0m3.104s