bf697abe808e06b482d49306cfaa852fb905968e gperez2 Tue Oct 7 16:03:35 2025 -0700 Updates to the SpliceAI Wildtype description page and updating the shortLabel, refs #35100 diff --git src/hg/makeDb/trackDb/human/spliceImpactSuper.html src/hg/makeDb/trackDb/human/spliceImpactSuper.html index 19be2d8f869..6b4fd94383f 100644 --- src/hg/makeDb/trackDb/human/spliceImpactSuper.html +++ src/hg/makeDb/trackDb/human/spliceImpactSuper.html @@ -20,35 +20,37 @@ </p> <h3>SpliceAI Variants</h3> <p>SpliceAI is an <a href="https://github.com/Illumina/SpliceAI" target="_blank">open-source</a> deep learning splicing prediction algorithm that can predict splicing alterations caused by DNA variations. To score variants, the SpliceAI algorithm is run on the genome sequence itself and scores each nucleotide for the probability that it is a donor or acceptor site, on both the forward and the reverse strand. Then variants are added to the sequence and the new sequence is scored. Variants may activate nearby cryptic splice sites, leading to abnormal transcript isoforms. SpliceAI was developed at Illumina; a <a href="https://spliceailookup.broadinstitute.org" target="_blank">lookup tool</a> is provided by the Broad Institute. </p> <h3>SpliceAI Wildtype</h3> -<p>This "wildtype" container track shows the scores for the genome -sequence itself, without variants. The "wildtype" subtracks are -useful when looking at new transcript models, to evaluate how likely exon -boundaries are and where possible splice acceptor sites are, in combination with the variants track. -</p> +<p> +This SpliceAI "Wildtype" container track shows the scores for the genome sequence itself, +without variants, from predicted splice donor (5' intron boundaries) and splice acceptor +(3' intron boundaries) sites. Predictions are strand-specific, with separate subtracks for the +plus and minus strands. 
These tracks are useful when looking at new transcript models to evaluate +how likely exon boundaries are and where possible splice acceptor sites are, in combination with +the variants track.</p> <b>Why are some variants not scored by SpliceAI?</b> <p> SpliceAI only annotates variants within genes defined by the gene annotation file. Additionally, SpliceAI does not annotate variants if they are close to chromosome ends (5kb on either side), deletions of length greater than twice the input parameter -D, or inconsistent with the reference fasta file. </p> <b>What are the differences between masked and unmasked tracks?</b> <p> The unmasked tracks include splicing changes corresponding to strengthening annotated splice sites and weakening unannotated splice sites, which are typically much less pathogenic than weakening annotated splice sites and strengthening unannotated splice sites. The delta scores of such splicing changes are set to 0 in the masked files. We recommend using the unmasked tracks for alternative @@ -167,37 +169,36 @@ <li>Donor loss score</li> <li>Relative location of affected cryptic acceptor</li> <li>Relative location of affected acceptor</li> <li>Relative location of affected cryptic donor</li> <li>Relative location of affected donor</li> </ul> <p> Since most of the values are 0 or almost 0, we selected only those variants with a score equal to or greater than 0.02. </p> <p> The complete processing of this track can be found in the <a target="_blank" href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/spliceAI/spliceAI.py"> makedoc</a>. </p> -<h3>SpliceAI wildtype</h3> -<p>Data was provided by the Michael Hiller lab. -SpliceAI was run on the entire genome reference chromosomes. -Since the algorithm does not know where transcripts start or end, -the scores can differ from other websites, especially for splice sites -before the last exon or around the first exon. 
-</p> + +<h3>SpliceAI Wildtype</h3> +<p>Data was provided by the Michael Hiller lab. SpliceAI was run on the entire genome reference +chromosomes. Since the algorithm does not know where transcripts start or end, the scores +can differ from those on other websites, especially for splice sites before the last exon or +around the first exon.</p> <h3>SpliceVarDB</h3> <p>The data was converted by Patricia Sullivan from SpliceVarDB to <a href="../../goldenPath/help/bigLolly.html">bigLolly format</a>, and the UCSC Browser staff downloaded it for display. </p> <h2>Data Access</h2> <p>Precomputed AbSplice-DNA scores in all 49 GTEx tissues are available at <a target="_blank" href="https://zenodo.org/search?q=AbSplice-DNA&l=list&p=1&s=10&sort=bestmatch"> Zenodo</a>.</p> <b>License</b> @@ -208,40 +209,47 @@ FOR ACADEMIC AND NOT-FOR-PROFIT RESEARCH USE ONLY. The SpliceAI scores are made available by Illumina only for academic or not-for-profit research only. By accessing the SpliceAI data, you acknowledge and agree that you may only use this data for your own personal academic or not-for-profit research only, and not for any other purposes. You may not use this data for any for-profit, clinical, or other commercial purpose without obtaining a commercial license from Illumina, Inc. </p> <p> The raw data can be explored interactively with the <a href="../cgi-bin/hgTables">Table Browser</a> or the <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For automated analysis, the data may be queried from our <a href="https://genome.ucsc.edu/goldenPath/help/api.html">REST API</a>.</p> <p> -For automated download and analysis, the genome annotation is stored in a bigBed file that -can be downloaded from +For automated download and analysis, the genome annotation is stored in a bigBed or a bigWig file +that can be downloaded from <a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/" target="_blank">our download server</a>. 
-Individual regions or the whole genome annotation can be obtained using our tool -<tt>bigBedToBed</tt> which can be compiled from the source code or downloaded as a precompiled +Individual regions or the whole genome annotation can be obtained using our tools, e.g. +<br> +<br> +<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/splicevardb/SVADB.bb + -chrom=chr21 -start=0 -end=100000000 stdout</tt> +<br> +<tt>bigWigToBedGraph -chrom=chr1 -start=100000 -end=100500 + http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/spliceAi/wildtype/spliceAiAcceptorMinus.bw + stdout</tt> +<br> +<br> +These tools can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found -<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. -The tool can also be used to obtain only features within a given range, e.g. -<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/splicevardb/SVADB.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt></p> -</p> +<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>.</p> <h2>Credits</h2> <p>Thanks to Nils Wagner for helpful comments and suggestions for the AbSplice track.</p> <p>Thanks to the SpliceVarDB team for converting the data into our data formats.</p> <h2>References</h2> <p> Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA, Arbelaez J, Cui W, Schwartz GB <em>et al</em>. <a href="https://linkinghub.elsevier.com/retrieve/pii/S0092-8674(18)31629-5" target="_blank"> Predicting Splicing from Primary Sequence with Deep Learning</a>. <em>Cell</em>. 2019 Jan 24;176(3):535-548.e24. PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30661751" target="_blank">30661751</a>