29468b3e7a861865f59609512f3b9c8f4a961df1
max Tue Dec 17 01:59:28 2019 -0800
blat FAQ update, based on refs #24595

diff --git src/hg/htdocs/FAQ/FAQblat.html src/hg/htdocs/FAQ/FAQblat.html
index 21d0067..40fc3a9 100755
--- src/hg/htdocs/FAQ/FAQblat.html
+++ src/hg/htdocs/FAQ/FAQblat.html
@@ -79,45 +79,48 @@
 <h6>I can't find a sequence with Blat although I'm sure it is in the genome. Am I doing something wrong?</h6>
 <p>
 First, check if you are using the correct version of the genome. For example, two versions of the human genome are currently in wide use (hg19 and hg38) and your sequence may be only in one of them. Many published articles do not specify the assembly version so trying both may be necessary.</p>
 <p>
 Very short sequences that go over a splice site in a cDNA sequence can't be found, as they are not in the genome. qPCR primers are a typical example. For these cases, try using <a href="../cgi-bin/hgPcr">In-Silico PCR</a> and selecting a gene set as the target. In general, the In-Silico PCR tool is more sensitive and should be preferred for pairs of primers.</p>
 <p>
 Another problematic case are sequences in repeats, as BLAT skips the most repetitive parts of the query and also limits the number of best matches it finds. The online version of Blat masks 11mers from the query that occur more than 1024 times in the
-genome and also stops searching once it has found a certain number of optimal matches on a chromosome.
-This is done to improve speed, but can result in missed hits when you are searching for
-sequences in repeats. In these cases, a small subset of matches is found and these
-are only part of all optimal matches in the genome. Often, you can use the self-chain track to
-find the other matches, but only if the other matches are long enough. You can always work around
-this limitation but adding more flanking sequence to your query, to make the query unique enough.
-You can check whether any sequence is indeed present at a particular location
+genome and also stops searching once it has found a certain number of matches on a chromosome
+(repMatch, this is set to 100 on our BLAT web servers). This is done to
+improve speed, but can result in missed hits when you are searching for
+sequences in repeats. In these cases, a small subset of matches is found and
+these are only part of all optimal matches in the genome. Often, you can use
+the self-chain track to find the other matches, but only if the other matches
+are long enough. You can always work around this limitation but adding more
+flanking sequence to your query, to make the query unique enough. You can
+check whether any sequence is indeed present at a particular location
 by using the <a href="../cgi-bin/hgTrackUi?db=hg38&g=oligoMatch">"Short match" track</a>.
 </p>
 <p>
 If your input sequence is not one of the very repetitive sequences, but still present a few dozen times on a chromosome, note that Blat results are limited to 16 results per chromosome strand. This means that at most 32 locations
-per chromosome are returned.
+per chromosome are returned. The returned results are not necessarily the
+highest-scoring ones.
 </p>
 <p>
 To find all matches for repetitive sequences with the online version of Blat, you can add more flanking sequence to your query. If this is not possible, the only alternative is to download the executables of Blat and the .2bit file of a genome to your own machine and use BLAT on the command line. See <a href="#blat3">Downloading BLAT source and documentation</a> for more information.
 When using the command line version of BLAT, you can set the repMatch option to a large value to try to improve finding matches in repetitive regions and do not use one of the default 11.ooc masking files.</p>
 <a name="blat1c"></a>
 <h2>Blat or In-Silico PCR finds multiple matches such as chr_alt or chr_fix even though only one is expected</h2>
 <h6>I am seeing two or more matches in the genome although there should only be one. What are these extra matches?</h6>
 <p>