623066b7ef5424f51400355a62706c2c5bc69283 hiram Mon May 20 09:34:05 2019 -0700 add html page for crisprAll track refs #23514 diff --git src/hg/makeDb/trackDb/crisprAll.html src/hg/makeDb/trackDb/crisprAll.html new file mode 100644 index 0000000..f342f79 --- /dev/null +++ src/hg/makeDb/trackDb/crisprAll.html @@ -0,0 +1,204 @@ +
+This track shows the DNA sequences targetable by CRISPR RNA guides using +the Cas9 enzyme from S. pyogenes (PAM: NGG) over the entire +$organism ($db) genome. CRISPR target sites were annotated with +predicted specificity (off-target effects) and predicted efficiency +(on-target cleavage) by various +algorithms through the tool CRISPOR. +
+ ++The track "CRISPR Regions" shows the regions of the genome where +target sites were analyzed.
+ ++The track "CRISPR Targets" shows the target sites in these regions. +The target sequence of the guide is shown with a thick (exon) bar. The PAM +motif match (NGG) is shown with a thinner bar. Guides +are colored to reflect both predicted specificity and efficiency. Specificity +reflects the "uniqueness" of a 20mer sequence in the genome; the less unique a +sequence is, the more likely it is to cleave other locations of the genome +(off-target effects). Efficiency is the frequency of cleavage at the target +site (on-target efficiency).
+ +Shades of gray stand for sites that are hard to target specifically, as the +20mer is not very unique in the genome:
+impossible to target: target site has at least one identical copy in the genome and was not scored | |
hard to target: so many similar sequences in the genome that alignment was stopped (likely a repeat) | |
hard to target: target site was aligned but results in a low specificity score <= 50 (see below) |
Colors highlight targets that are specific in the genome (MIT specificity > 50) but have different predicted efficiencies:
+unable to calculate Doench/Fusi 2016 efficiency score | |
low predicted cleavage: Doench/Fusi 2016 Efficiency percentile <= 30 | |
medium predicted cleavage: Doench/Fusi 2016 Efficiency percentile > 30 and < 55 | |
high predicted cleavage: Doench/Fusi 2016 Efficiency > 55 |
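The legend above can be read as a small decision rule. The following is a minimal sketch of that rule, not the track's actual coloring code: the function name `guide_category` and the returned label strings are hypothetical, and the legend does not say how a percentile of exactly 55 is handled, so the sketch assumes it counts as high.

```python
from typing import Optional


def guide_category(mit_specificity: Optional[float],
                   efficiency_percentile: Optional[float]) -> str:
    """Map a guide's scores to the legend categories described above.

    Pass None for a score that could not be calculated.
    """
    if mit_specificity is None:
        # Guides with an identical copy in the genome, or with too many
        # similar sequences, were not scored at all.
        return "not scored (impossible or hard to target)"
    if mit_specificity <= 50:
        return "hard to target: low specificity"
    # From here on the guide is specific (MIT specificity > 50).
    if efficiency_percentile is None:
        return "specific: efficiency score unavailable"
    if efficiency_percentile <= 30:
        return "specific: low predicted cleavage"
    if efficiency_percentile < 55:
        return "specific: medium predicted cleavage"
    # Assumption: the legend leaves exactly 55 undefined; treated as high here.
    return "specific: high predicted cleavage"
```

For example, `guide_category(80, 42)` falls into the medium class, while any guide with a specificity of 50 or lower is reported as hard to target regardless of its efficiency.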
+Mouse over a target site to show predicted specificity and efficiency scores:
+
Click on a feature to show all scores and predicted off-targets with up to +four mismatches. The Out-of-Frame score by Bae et al. 2014 +is correlated with +the probability that mutations induced by the guide RNA will disrupt the open +reading frame. The authors recommend out-of-frame scores > 66 to create +knock-outs with a single guide efficiently.
+ +
Off-target sites are sorted by the CFD score (Doench et al. 2016). +The higher the CFD score, the more likely there is off-target cleavage at that site. +Off-targets with a CFD score < 0.023 are not shown on this page, but are available when +following the link to the external CRISPOR tool. +When compared against experimentally validated off-targets by +Haeussler et al. 2016, the large majority of predicted +off-targets with CFD scores < 0.023 were false positives. For storage and performance +reasons, on the level of individual off-targets, only CFD scores are available.
+ ++Like most algorithms, the MIT specificity score is not always a perfect +predictor of off-target effects. Despite low scores, many tested guides +caused little or only weak off-target cleavage when tested with whole-genome assays +(Figure 2 from Haeussler +et al. 2016), as shown below, and the published data contains few data points +with high specificity scores. Overall though, the assays showed that the higher +the specificity score, the lower the off-target effects.
+ +Similarly, efficiency scoring is not very accurate: guides with low +scores can be efficient and vice versa. As a general rule, however, the higher +the score, the less likely that a guide is very inefficient. The +following histograms illustrate, for each type of score, how the share of +inefficient guides drops with increasing efficiency scores: +
+ +When reading this plot, keep in mind that both scores were evaluated on +their own training data. Especially for the Moreno-Mateos score, the +results are too optimistic, due to overfitting. When evaluated on independent +datasets, the correlation of the prediction with other assays was around 25% +lower, see Haeussler et al. 2016. At the time of +writing, there is no independent dataset available yet to determine the +Moreno-Mateos accuracy for each score percentile range.
+ ++The entire $organism ($db) genome was scanned for the -NGG motif. Flanking 20mer +guide sequences were +aligned to the genome with BWA and scored with MIT Specificity scores using the +command-line version of crispor.org. Non-unique guide sequences were skipped. +Flanking sequences were extracted from the genome and input for Crispor +efficiency scoring, available from the Crispor downloads page, which +includes the Doench 2016, Moreno-Mateos 2015 and Bae +2014 algorithms, among others. Note that the Doench 2016 scores were updated by +the Broad Institute in 2017 ("Azimuth" update). As a result, earlier versions of +the track show the old Doench 2016 scores and this version of the track shows new +Doench 2016 scores. Old and new scores are almost identical: they are +correlated to 0.99 and for more than 80% of the guides the difference is below 0.02. +However, for very few guides, the difference can be bigger. In case of doubt, we recommend +the new scores. Crispor.org can display both scores and many more with the +"Show all scores" link.
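To illustrate the first step of the method above, here is a minimal sketch of locating NGG PAM sites and their flanking 20mer guides on both strands of a sequence. It is not the CRISPOR implementation: the function names `revcomp` and `find_ngg_targets` are hypothetical, and the sketch assumes that reverse-strand sites are found by looking for CCN on the forward strand. Alignment with BWA and scoring are not covered.

```python
def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string; unknown letters become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq))


def find_ngg_targets(seq: str, guide_len: int = 20):
    """Return (start, strand, guide, pam) tuples for every NGG site.

    `start` is the 0-based forward-strand position of the leftmost base
    of the whole site (guide plus PAM).
    """
    seq = seq.upper()
    site_len = guide_len + 3
    hits = []
    for i in range(len(seq) - site_len + 1):
        site = seq[i:i + site_len]
        # Forward strand: 20mer guide followed by NGG.
        if site[guide_len + 1:] == "GG":
            hits.append((i, "+", site[:guide_len], site[guide_len:]))
        # Reverse strand: CCN on the forward strand followed by the guide,
        # which reads as guide + NGG after reverse-complementing.
        if site[:2] == "CC":
            rc = revcomp(site)
            hits.append((i, "-", rc[:guide_len], rc[guide_len:]))
    return hits
```

A 23-base input made of twenty A's followed by TGG yields a single forward-strand hit; its reverse complement yields the same guide on the minus strand.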
+ ++The raw data can be explored interactively with the Table Browser. +For automated analysis, the genome annotation is stored in a bigBed file that +can be downloaded from +our download server. +The files for this track are called crispr.bb and crisprDetails.tab. Individual +regions or the whole genome annotation can be obtained using our tool bigBedToBed, +which can be compiled from the source code or downloaded as a precompiled +binary for your system. Instructions for downloading source code and binaries can be found +here. The tool +can also be used to obtain only features within a given range, e.g. bigBedToBed +http://hgdownload.soe.ucsc.edu/gbdb/${db}/${track}/crispr.bb -chrom=chr21 +-start=0 -end=1000000 stdout
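For scripted access, the bigBedToBed call shown above can be wrapped in a few lines of Python. This is a sketch under stated assumptions: the helper names `bigbed_to_bed_cmd` and `fetch_region` are hypothetical, the `db` and `track` arguments stand in for the `${db}` and `${track}` placeholders in the URL, and `bigBedToBed` is assumed to be installed and on the `PATH`. Only the options that appear in the example above are used.

```python
import subprocess


def bigbed_to_bed_cmd(db, track, chrom, start, end, out="stdout"):
    """Build the bigBedToBed argument list for one genomic range."""
    # URL pattern taken from the example above; db and track fill the
    # ${db} and ${track} placeholders.
    url = f"http://hgdownload.soe.ucsc.edu/gbdb/{db}/{track}/crispr.bb"
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        out,
    ]


def fetch_region(db, track, chrom, start, end):
    """Run bigBedToBed and return the BED lines for the range as a list."""
    cmd = bigbed_to_bed_cmd(db, track, chrom, start, end)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout.splitlines()
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before any network access happens.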
+ ++Track created by Maximilian Haeussler, with helpful input +from Jean-Paul Concordet (MNHN Paris) and Alberto Stolfi (NYU). +
+ ++Haeussler M, Schönig K, Eckert H, Eschstruth A, Mianné J, Renaud JB, Schneider-Maunoury S, +Shkumatava A, Teboul L, Kent J et al. +Evaluation of off-target and on-target scoring algorithms and integration into the +guide RNA selection tool CRISPOR. +Genome Biol. 2016 Jul 5;17(1):148. +PMID: 27380939; PMC: PMC4934014 +
+ ++Bae S, Kweon J, Kim HS, Kim JS. + +Microhomology-based choice of Cas9 nuclease target sites. +Nat Methods. 2014 Jul;11(7):705-6. +PMID: 24972169 +
+ ++Doench JG, Fusi N, Sullender M, Hegde M, Vaimberg EW, Donovan KF, Smith I, Tothova Z, Wilen C, +Orchard R et al. + +Optimized sgRNA design to maximize activity and minimize off-target effects of CRISPR-Cas9. +Nat Biotechnol. 2016 Feb;34(2):184-91. +PMID: 26780180; PMC: PMC4744125 +
+ ++Hsu PD, Scott DA, Weinstein JA, Ran FA, Konermann S, Agarwala V, Li Y, Fine EJ, Wu X, Shalem O +et al. + +DNA targeting specificity of RNA-guided Cas9 nucleases. +Nat Biotechnol. 2013 Sep;31(9):827-32. +PMID: 23873081; PMC: PMC3969858 +
+ ++Moreno-Mateos MA, Vejnar CE, Beaudoin JD, Fernandez JP, Mis EK, Khokha MK, Giraldez AJ. + +CRISPRscan: designing highly efficient sgRNAs for CRISPR-Cas9 targeting in vivo. +Nat Methods. 2015 Oct;12(10):982-8. +PMID: 26322839; PMC: PMC4589495 +