e581e5a00da4cf4433b462a1f48f68a411861a08
gperez2
  Tue Mar 4 15:31:58 2025 -0800
Adding AbSplice and SpliceAI details to the spliceImpactSuper description page, refs #34823

diff --git src/hg/makeDb/trackDb/human/spliceImpactSuper.html src/hg/makeDb/trackDb/human/spliceImpactSuper.html
index f1239846f02..d1fcc0c65e7 100644
--- src/hg/makeDb/trackDb/human/spliceImpactSuper.html
+++ src/hg/makeDb/trackDb/human/spliceImpactSuper.html
@@ -1,35 +1,113 @@
 <h2>Description</h2>
 
 <p>
 The "Splicing Impact" container track contains tracks showing the predicted or validated effect of variants
 close to splice sites.
 </p>
 
+<h3>AbSplice</h3>
+<p>AbSplice is a method that predicts aberrant splicing across human tissues, as described in Wagner,
+&Ccedil;elik et al., 2023. This track displays precomputed AbSplice scores for all possible
+single-nucleotide variants genome-wide. The scores represent the probability that a given variant
+causes aberrant splicing in a given tissue.
+<a target="_blank" href="https://github.com/gagneurlab/absplice/tree/master">AbSplice</a> scores
+can be computed from VCF files and are based on quantitative tissue-specific splice site annotations
+(<a target="_blank" href="https://github.com/gagneurlab/splicemap">SpliceMaps</a>).
+While SpliceMaps can be generated for any tissue of interest from a cohort of RNA-seq samples, this
+track includes 49 tissues available from the
+<a target="_blank" href="https://www.gtexportal.org/home/samplingSitePage">Genotype-Tissue
+Expression (GTEx) dataset</a>.
+</p>
+
+<h3>SpliceAI</h3>
+<p>SpliceAI is an <a href="https://github.com/Illumina/SpliceAI" target="_blank">open-source</a> deep
+learning splicing prediction algorithm that can predict splicing alterations caused by DNA variations.
+Such variants may activate nearby cryptic splice sites, leading to abnormal transcript isoforms.
+SpliceAI was developed at Illumina; a
+<a href="https://spliceailookup.broadinstitute.org" target="_blank">lookup tool</a>
+is provided by the Broad Institute.
+</p>
+<b>Why are some variants not scored by SpliceAI?</b>
+<p>
+SpliceAI only annotates variants within genes defined by the gene
+annotation file. Additionally, SpliceAI does not annotate variants that are close to chromosome
+ends (within 5 kb of either end), are deletions longer than twice the input parameter -D, or are
+inconsistent with the reference FASTA file.
+</p>
+
+<b>What are the differences between masked and unmasked tracks?</b>
+<p>
+The unmasked tracks include splicing changes corresponding to strengthening annotated splice sites
+and weakening unannotated splice sites, which are typically much less pathogenic than weakening
+annotated splice sites and strengthening unannotated splice sites. The delta scores of such splicing
+changes are set to 0 in the masked files. We recommend using the unmasked tracks for alternative
+splicing analysis and masked tracks for variant interpretation.
+</p>
+
 <h3>SpliceVarDB</h3>
 <p>SpliceVarDB is an online database consolidating over 50,000 variants assayed
 for their effects on splicing in over 8,000 human genes. The authors evaluated
 over 500 published data sources and established a spliceogenicity scale to
 standardize, harmonize, and consolidate variant validation data generated by a
 range of experimental protocols. Genes and variant locations were obtained using
 GENCODE v44. Splice regions were calculated as specific distances from the closest
 canonical exon, including 5&apos; and 3&apos; untranslated regions (UTRs). The
 database is available at
 <a target=_blank href="https://splicevardb.org">splicevardb.org</a>.</p>
 
 <h2>Display Conventions and Configuration</h2>
 
+<h3>AbSplice</h3>
+<p>The AbSplice score is an estimate of the probability that aberrant splicing of some sort takes
+place in a given tissue. The authors <a target="_blank" href="https://github.com/gagneurlab/absplice?tab=readme-ov-file#output"
+>suggest</a> three cutoffs, which are represented by color in the track.
+</p>
+
+<ul>
+<li><b><font color="#FF0000">High (red)</font></b> - <b>
+  An AbSplice score over 0.2</b> indicates a high likelihood of aberrant splicing in at least one tissue.</li>
+<li><b><font color="#FF8000">Medium (orange)</font></b> - <b>
+  A score between 0.05 and 0.2 </b> indicates a medium likelihood.</li>
+<li><b><font color="#0000FF">Low (blue)</font></b> - <b>
+  A score between 0.01 and 0.05 </b> indicates a low likelihood.</li>
+<li><b>Scores below 0.01 are not displayed.</b></li>
+</ul>
+<p>
+Mouseover on items shows the gene name, maximum score, and tissues that had this score. Clicking on
+any item brings up a table with scores for all 49 GTEx tissues.
+</p>
+
+<h3>SpliceAI</h3>
+<p>
+Variants are colored by predicted splicing impact, following Walker et al., 2023:
+</p>
+<ul>
+<li><b><font color="#FF8000">Predicted impact on splicing: Score &gt;&#61; 0.2 </font></b> </li>
+<li><b><font color="#808080">Not informative: Score &lt; 0.2 and &gt; 0.1 </font></b> </li>
+<li><b><font color="#0000FF">No impact on splicing: Score &lt;&#61; 0.1 </font></b> </li>
+</ul>
+<p>
+Mouseover on items shows the variant, gene name, type of change (donor gain/loss, acceptor
+gain/loss), location of the affected cryptic splice site, and SpliceAI score. Clicking on any item brings up
+a table with this information.
+</p>
+<p>
+The scores range from 0 to 1 and can be interpreted as the
+probability of the variant being splice-altering. In the paper, a detailed characterization is
+provided for 0.2 (high recall), 0.5 (recommended), and 0.8 (high precision) cutoffs.</p>
+
 <h3>SpliceVarDB</h3>
 <p>According to the strength of their supporting
 evidence, variants were classified as &quot;splice-altering&quot; (~25%), &quot;not
 splice-altering&quot; (~25%), and &quot;low-frequency splice-altering&quot; (~50%), which
 correspond to weak or indeterminate evidence of spliceogenicity. 55% of the
 splice-altering variants in SpliceVarDB are outside the canonical splice sites
 (5.6% are deep intronic). The data is shown as lollipop plots that can be clicked, 
 the details page then shows a link to SpliceVarDB with full details.
 </p>
 
 <p>The classification thresholds primarily follow those established by the original study.
 However, most studies only defined criteria for splice-altering variants and did not define
 criteria for variants that resulted in normal splicing. The authors implemented stringent
 thresholds to define the normal category and ensure a high-quality set of control variants.
 Variants that did not meet these criteria were classified as low-frequency splice-altering
@@ -39,57 +117,142 @@
 returned splice-altering and another returned normal, the &quot;conflicting&quot; category
 was applied.
 </p>
 
 <P>
 The lollipop plots are color-coded based on the <b>score</b> value, which corresponds
 to the following classifications:
 <ul>
  <li><b>3</b> - <span style="color: rgb(219,61,61);">Splice-altering</span></li>
  <li><b>2</b> - <span style="color: rgb(128,82,160);">Low-frequency</span></li>
  <li><b>1</b> - <span style="color: rgb(57,135,204);">Normal</span></li>
  <li><b>0</b> - <span style="color: rgb(140,140,140);">Conflicting</span></li>
 </ul>
 </P>
 
+<h2>Methods</h2>
+<h3>AbSplice</h3>
+<p>Data was converted from the files (AbSplice_DNA_ $db _snvs_high_scores.zip) provided by the authors
+at <a href="https://zenodo.org/search?q=AbSplice-DNA&l=list&p=1&s=10&sort=bestmatch"
+target="_blank">zenodo.org</a>. Files in the
+score_cutoff=0.01 directory were concatenated. To convert the data to bigBed format, scores and
+their tissues were selected from the AbSplice_DNA fields, and the maximum scores were then calculated
+using a custom Python script, which can be found in the
+<a target="_blank" href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/outside/abSplice/">
+makeDoc</a> in our GitHub repository.</p>
+
+<h3>SpliceAI</h3>
+<p>
+The data were downloaded from <a
+target="_blank" href="https://basespace.illumina.com/s/otSPW8hnhaZR">Illumina</a>.
+The SpliceAI scores are represented in the VCF INFO field as
+<code style="background-color: lightgray;">SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31</code> <br><br>
+Here, the pipe-separated fields contain:</p>
+<ul>
+  <li>ALT allele</li>
+  <li>Gene name</li>
+  <li>Acceptor gain score</li>
+  <li>Acceptor loss score</li>
+  <li>Donor gain score</li>
+  <li>Donor loss score</li>
+  <li>Relative location of affected cryptic acceptor</li>
+  <li>Relative location of affected acceptor</li>
+  <li>Relative location of affected cryptic donor</li>
+  <li>Relative location of affected donor</li>
+</ul>
+<p>
+Since most of the values are 0 or almost 0, we selected only those variants
+with a score equal to or greater than 0.02.
+</p>
+<p>
+The complete processing of this track can be found in the <a target="_blank"
+href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/spliceAI/spliceAI.py">
+makedoc</a>.
+</p>
+
+<h3>SpliceVarDB</h3>
+<p>The data was converted by Patricia Sullivan from SpliceVarDB to
+<a href="../../goldenPath/help/bigLolly.html">bigLolly format</a>, and the UCSC
+Browser staff downloaded it for display.
+</p>
+
 <h2>Data Access</h2>
+
+<p>Precomputed AbSplice-DNA scores in all 49 GTEx tissues are available at
+<a target="_blank" href="https://zenodo.org/search?q=AbSplice-DNA&l=list&p=1&s=10&sort=bestmatch">
+Zenodo</a>.</p>
+
+<b>License</b>
+<p>
+The SpliceAI data is not available for download from the Genome Browser.
+The raw data can be found directly on
+<a target="_blank" href="https://basespace.illumina.com/s/otSPW8hnhaZR">Illumina</a>.
+FOR ACADEMIC AND NOT-FOR-PROFIT RESEARCH USE ONLY. The SpliceAI scores are
+made available by Illumina only for academic or not-for-profit research only.
+By accessing the SpliceAI data, you acknowledge and agree that you may only
+use this data for your own personal academic or not-for-profit research only,
+and not for any other purposes. You may not use this data for any for-profit,
+clinical, or other commercial purpose without obtaining a commercial license
+from Illumina, Inc.
+</p>
+
 <p>
 The raw data can be explored interactively with the <a href="../cgi-bin/hgTables">Table Browser</a>
-or the <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. The data can be
-accessed from scripts through our <a href="https://api.genome.ucsc.edu">API</a>, the track name is
-"splicevardb".
+or the <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For automated analysis, the data may
+be queried from our <a href="https://genome.ucsc.edu/goldenPath/help/api.html">REST API</a>.</p>
 
 <p>
 For automated download and analysis, the genome annotation is stored in a bigBed file that
 can be downloaded from
-<a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/splicevardb/" target="_blank">our download server</a>.
-The file for this track is called <tt>SVADB.bb</tt>. Individual
-regions or the whole genome annotation can be obtained using our tool <tt>bigBedToBed</tt>
-which can be compiled from the source code or downloaded as a precompiled
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/" target="_blank">our download server</a>.
+Individual regions or the whole genome annotation can be obtained using our tool
+<tt>bigBedToBed</tt> which can be compiled from the source code or downloaded as a precompiled
 binary for your system. Instructions for downloading source code and binaries can be found
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>.
-The tool
-can also be used to obtain only features within a given range, e.g. 
+The tool can also be used to obtain only features within a given range, e.g.
 <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/splicevardb/SVADB.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt></p>
 </p>
 
-</p>
+<h2>Credits</h2>
 
-<h2>Methods</h2>
-<p>
-The data was converted by Patricia Sullivan from SpliceVarDB to
-<a href="../../goldenPath/help/bigLolly.html">bigLolly format</a>, and the UCSC
-Browser staff downloaded it for display.
-</p>
+<p>Thanks to Nils Wagner for helpful comments and suggestions for the AbSplice track.</p>
 
-<h2>Credits</h2>
 <p>Thanks to the SpliceVarDB team for converting the data into our data formats.</p>
 
 <h2>References</h2>
+<p>
+Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA,
+Arbelaez J, Cui W, Schwartz GB <em>et al</em>.
+<a href="https://linkinghub.elsevier.com/retrieve/pii/S0092-8674(18)31629-5" target="_blank">
+Predicting Splicing from Primary Sequence with Deep Learning</a>.
+<em>Cell</em>. 2019 Jan 24;176(3):535-548.e24.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30661751" target="_blank">30661751</a>
+</p>
+
 <p>
 Sullivan PJ, Quinn JMW, Wu W, Pinese M, Cowley MJ.
 <a href="https://linkinghub.elsevier.com/retrieve/pii/S0002-9297(24)00288-X" target="_blank">
     SpliceVarDB: A comprehensive database of experimentally validated human splicing variants</a>.
 <em>Am J Hum Genet</em>. 2024 Oct 3;111(10):2164-2175.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39226898" target="_blank">39226898</a>; PMC: <a
     href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11480807/" target="_blank">PMC11480807</a>
 </p>
+
+<p>
+Wagner N, &#199;elik MH, H&#246;lzlwimmer FR, Mertes C, Prokisch H, Y&#233;pez VA, Gagneur J.
+<a href="https://doi.org/10.1038/s41588-023-01373-3" target="_blank">
+Aberrant splicing prediction across human tissues</a>.
+<em>Nat Genet</em>. 2023 May;55(5):861-870.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37142848" target="_blank">37142848</a>
+</p>
+
+<p>
+Walker LC, Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, Canson DM, Bis-Brewer D, Cass A,
+Tchourbanov A <em>et al</em>.
+<a href="https://linkinghub.elsevier.com/retrieve/pii/S0002-9297(23)00203-3" target="_blank">
+Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on
+splicing: Recommendations from the ClinGen SVI Splicing Subgroup</a>.
+<em>Am J Hum Genet</em>. 2023 Jul 6;110(7):1046-1067.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37352859" target="_blank">37352859</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10357475/" target="_blank">PMC10357475</a>
+</p>
+
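Note on the SpliceAI INFO field described in the Methods text of this patch: the field layout and the color thresholds can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the code used to build the track (that lives in the spliceAI.py makedoc linked in the page). The names `FIELDS`, `parse_spliceai`, and `impact_class` are hypothetical, the sketch handles a single annotation per variant, and applying the Walker et al. thresholds to the maximum of the four delta scores is an assumption.

```python
# Hypothetical field names; the order follows the list in the Methods section.
FIELDS = (
    "allele", "gene",
    "acceptor_gain", "acceptor_loss", "donor_gain", "donor_loss",
    "pos_acceptor_gain", "pos_acceptor_loss", "pos_donor_gain", "pos_donor_loss",
)


def parse_spliceai(info_value):
    """Split one 'SpliceAI=...' INFO entry into a dict with typed values."""
    # Drop the 'SpliceAI=' key if it is present.
    value = info_value.split("=", 1)[1] if "=" in info_value else info_value
    parts = value.split("|")
    if len(parts) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(parts)))
    record = dict(zip(FIELDS, parts))
    for key in FIELDS[2:6]:   # the four delta scores
        record[key] = float(record[key])
    for key in FIELDS[6:]:    # the four relative positions
        record[key] = int(record[key])
    # Assumption: summarize the variant by its largest delta score.
    record["max_score"] = max(record[key] for key in FIELDS[2:6])
    return record


def impact_class(score):
    """Map a score to the three color classes listed on the page."""
    if score >= 0.2:
        return "predicted impact on splicing"
    if score > 0.1:
        return "not informative"
    return "no impact on splicing"


record = parse_spliceai("SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31")
print(record["gene"], record["max_score"], impact_class(record["max_score"]))
```

The example string is the one quoted in the page. A real VCF record may carry several comma-separated annotations in one INFO value, which this sketch does not split.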