19ab7e792d62a1764de8fced4b1adcd2e6ab1ef4 Merge parents fafd6ff 54c4f85 dschmelt Tue Dec 17 11:49:43 2019 -0800
Fixing merge conflict #24595

diff --cc src/hg/htdocs/FAQ/FAQblat.html
index c02b6a3,40fc3a9..e5f429b
--- src/hg/htdocs/FAQ/FAQblat.html
+++ src/hg/htdocs/FAQ/FAQblat.html
@@@ -60,70 -60,79 +60,69 @@@
speed (no queues, response in seconds) at the price of lesser homology depth
BLAT is commonly used to look up the location of a sequence in the genome or determine the exon structure of an mRNA, but expert users can run large batch jobs and make internal parameter
-sensitivity changes by installing command-line Blat on their own Linux server.
+sensitivity changes by installing command-line BLAT on their own Linux server.
-First, check if you are using the correct version of the genome. For example, two versions of the human genome are currently in wide use (hg19 and hg38) and your sequence may be only in one of them. Many published articles do not specify the assembly version so trying both may be necessary.
Very short sequences that go over a splice site in a cDNA sequence can't be found, as they are not in the genome. qPCR primers are a typical example. For these cases, try using In-Silico PCR and selecting a gene set as the target. In general, the In-Silico PCR tool is more sensitive and should be preferred for pairs of primers.
-Another problematic case are sequences in repeats, as BLAT skips the most repetitive
-parts of the query and also limits the number of best matches it finds.
-The online version of Blat masks 11mers from the query that occur more than 1024 times in the
-genome and also stops searching once it has found a certain number of matches on a chromosome
-(repMatch, this is set to 100 on our BLAT web servers). This is done to
-improve speed, but can result in missed hits when you are searching for
-sequences in repeats. In these cases, a small subset of matches is found and
-these are only part of all optimal matches in the genome. Often, you can use
-the self-chain track to find the other matches, but only if the other matches
-are long enough. You can always work around this limitation but adding more
-flanking sequence to your query, to make the query unique enough. You can
-check whether any sequence is indeed present at a particular location
-by using the "Short match" track.
-
-
--If your input sequence is not one of the very repetitive sequences, but still
-present a few dozen times on a chromosome, note that Blat results are limited
-to 16 results per chromosome strand. This means that at most 32 locations
-per chromosome are returned. The returned results are not necessarily the
-highest-scoring ones.
-
--To find all matches for repetitive sequences with the online version of Blat, you can add more flanking sequence to your
-query. If this is not possible, the only alternative is to download the executables of Blat and the
-.2bit file of a genome to your own machine and use BLAT on the command line. See
-Downloading BLAT source and documentation for more information.
-When using the command line version of BLAT, you can set the repMatch option to a large value
-to try to improve finding matches in repetitive regions and do not
-use one of the default 11.ooc masking files.
+Another problematic case is searching for sequences in repeats or transposons.
+BLAT skips the most repetitive parts of the query and limits the number of matches it finds,
+leading to missing matches for these repeat sequences.
+The online version of BLAT masks 11mers from the query that occur more than 1024 times in the
+genome and limits results to 16 matches per chromosome strand. This means that at most 32 locations
+per chromosome are returned. This is done to improve speed, but can result in missed hits when you
- are searching for sequences in repeats.
-
++are searching for sequences in repeats.
++Often, you can use the self-chain track to
+find the other matches, but only if the other matches are long enough. You can check whether
+any sequence is indeed present at a particular location
+by using the "Short match" track
+if your sequence is less than 30 bp.
+You can always work around this limitation but adding more flanking sequence to your query
+to make the query unique enough. If this is not possible, the only alternative is to download
+the executables of BLAT and the .2bit file of a genome to your own machine and use BLAT on
+the command line. See Downloading BLAT source and documentation for
+more information. When using the command line version of BLAT, you can set the repMatch
+option to a large value to try to improve finding matches in repetitive regions and do not
+use one of the default 11.ooc repeat masking files.
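As an illustration of the command-line advice in the added text above, here is a minimal sketch of how such a run might be assembled from Python. It is not part of the FAQ or of this commit. The helper name `build_blat_command`, the file names `hg38.2bit`, `query.fa` and `out.psl`, and the value 1000000 for `repMatch` are all assumptions chosen for the example; only the `blat` executable and its `-repMatch` and `-ooc` options come from the BLAT command-line tool. Leaving out `-ooc` means no over-occurring 11-mer masking file is applied.

```python
from typing import List, Optional


def build_blat_command(
    database: str,
    query: str,
    output: str,
    rep_match: int = 1_000_000,  # assumed "large value"; pick one that suits your data
    ooc_file: Optional[str] = None,
) -> List[str]:
    """Assemble an argument list for a command-line blat run (hypothetical helper).

    The returned list is suitable for subprocess.run(); nothing is executed here.
    """
    if rep_match < 1:
        raise ValueError("rep_match must be a positive integer")
    cmd = ["blat", database, query, output, f"-repMatch={rep_match}"]
    # Only pass -ooc when the caller explicitly asks for a masking file.
    # For repeat-heavy queries the FAQ text suggests not using the default 11.ooc file.
    if ooc_file is not None:
        cmd.append(f"-ooc={ooc_file}")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    print(" ".join(build_blat_command("hg38.2bit", "query.fa", "out.psl")))
```

To actually run the search, the list could be passed to `subprocess.run(cmd, check=True)`, assuming `blat` is installed and on the `PATH`. Expect a much longer run time than with the web server defaults, since far fewer tiles are skipped.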
-This usually occurs on the newer genome assmeblies, such as hg38, when you search a sequence that has an "alternate" or "fix" sequence. To improve the quality of the these assemblies, curators have added multiple versions of some important loci, e.g. the MHC regions. They also add fix sequences to resolve errors without changing the reference. See our patches blog post for more information.
When you BLAT or isPCR a sequence which matches a chromosome location that also has a fix or alt sequence, you will see a match on the reference chromosome (e.g. "chr1") and another match on the patch sequence (e.g. chr1_KN196472v1_fix). In most cases it is safe to ignore the patch hit, as a human genome will not contain both the reference and alternate sequence at the same time. For more information on the specific kinds of patch sequences see our