4ea3d549b2d0aa682aa22e6b31e844fee78d2cfc
gperez2
  Tue Oct 21 13:55:33 2025 -0700
Code review edit, refs #36551

diff --git src/hg/makeDb/trackDb/human/spliceImpactSuper.html src/hg/makeDb/trackDb/human/spliceImpactSuper.html
index c2f3d795d29..0b0748f94bc 100644
--- src/hg/makeDb/trackDb/human/spliceImpactSuper.html
+++ src/hg/makeDb/trackDb/human/spliceImpactSuper.html
@@ -1,298 +1,298 @@
 <h2>Description</h2>
 
 <p>
 The "Splicing Impact" container track contains tracks showing the predicted or validated effect of variants
 close to splice sites.
 </p>
 
 <h3>AbSplice</h3>
 <p>AbSplice is a method that predicts aberrant splicing across human tissues, as described in Wagner,
 &Ccedil;elik et al., 2023. This track displays precomputed AbSplice scores for all possible
 single-nucleotide variants genome-wide. The scores represent the probability that a given variant
 causes aberrant splicing in a given tissue.
 <a target="_blank" href="https://github.com/gagneurlab/absplice/tree/master">AbSplice</a> scores
 can be computed from VCF files and are based on quantitative tissue-specific splice site annotations
 (<a target="_blank" href="https://github.com/gagneurlab/splicemap">SpliceMaps</a>).
 While SpliceMaps can be generated for any tissue of interest from a cohort of RNA-seq samples, this
 track includes 49 tissues available from the
 <a target="_blank" href="https://www.gtexportal.org/home/samplingSitePage">Genotype-Tissue
 Expression (GTEx) dataset</a>.
 </p>
 
 <h3>SpliceAI Variants</h3>
 <p>SpliceAI is an <a href="https://github.com/Illumina/SpliceAI" target="_blank">open-source</a> deep
 learning splicing prediction algorithm that can predict splicing alterations caused by DNA variations.
 To score variants, the SpliceAI algorithm is first run on the reference genome sequence and scores each
 nucleotide for the probability that it is a donor or acceptor site, on both the
 forward and the reverse strand. The variant is then introduced into the sequence, the altered
 sequence is scored, and the change in probability (the delta score) measures the predicted effect.
 Variants may activate nearby cryptic splice sites, leading to abnormal transcript isoforms.
 SpliceAI was developed at Illumina; a
 <a href="https://spliceailookup.broadinstitute.org" target="_blank">lookup tool</a>
 is provided by the Broad Institute.
 </p>
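<p>The reference-versus-variant comparison described above can be sketched as follows. This is a
simplified illustration, not Illumina's implementation: it assumes a single-nucleotide variant, so
the reference and variant probability lists cover the same positions in the scoring window, and it
reports for each site type the largest increase (gain) and largest decrease (loss) at any position.</p>

```python
from typing import Dict, List


def delta_scores(ref_acc: List[float], alt_acc: List[float],
                 ref_don: List[float], alt_don: List[float]) -> Dict[str, float]:
    """Simplified SpliceAI-style delta scores for one strand.

    Each list holds per-position acceptor or donor probabilities within the
    window around the variant, for the reference sequence (ref_*) and for the
    sequence carrying the variant (alt_*). Equal lengths are assumed (SNV case).
    """
    if not ref_acc or not (len(ref_acc) == len(alt_acc) == len(ref_don) == len(alt_don)):
        raise ValueError("probability lists must be non-empty and the same length")
    return {
        # Gain: largest increase in acceptor probability at any position.
        "acceptor_gain": max(a - r for r, a in zip(ref_acc, alt_acc)),
        # Loss: largest decrease in acceptor probability at any position.
        "acceptor_loss": max(r - a for r, a in zip(ref_acc, alt_acc)),
        "donor_gain": max(a - r for r, a in zip(ref_don, alt_don)),
        "donor_loss": max(r - a for r, a in zip(ref_don, alt_don)),
    }
```

<p>For example, an acceptor probability that rises from 0.1 to 0.6 at one position and falls from
0.9 to 0.2 at another gives an acceptor gain of about 0.5 and an acceptor loss of about 0.7.</p>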
 
 <h3>SpliceAI Wildtype</h3>
 <p>
 This SpliceAI &quot;Wildtype&quot; container track shows SpliceAI scores computed on the reference
 genome sequence itself, without variants, for predicted splice donor (5&apos; intron boundaries) and
 splice acceptor (3&apos; intron boundaries) sites. Predictions are strand-specific, with separate
 subtracks for the plus and minus strands. These tracks are useful in combination with the variants
 track for evaluating new transcript models, for example to assess potential exon boundaries or
 candidate splice acceptor sites.</p>
 
 <b>Why are some variants not scored by SpliceAI?</b>
 <p>
 SpliceAI only annotates variants within genes defined by the gene annotation file. In addition,
 SpliceAI does not annotate variants that lie within 5 kb of either chromosome end, deletions longer
 than twice the input parameter -D, or variants that are inconsistent with the reference FASTA file.
 </p>
 
 <b>What are the differences between masked and unmasked tracks?</b>
 <p>
 The unmasked tracks include splicing changes that strengthen annotated splice sites or weaken
 unannotated splice sites. These changes are typically much less pathogenic than weakening annotated
 splice sites or strengthening unannotated splice sites, so their delta scores are set to 0 in the
 masked files. We recommend using the unmasked tracks for alternative splicing analysis and the masked
 tracks for variant interpretation.
 </p>
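<p>The masking rule described above amounts to zeroing two of the four kinds of change. The sketch
below restates that rule for a single delta score; it assumes the caller already knows whether the
score is a gain or a loss and whether the affected site is annotated, and it is an illustration of
the rule rather than the code used to build the masked files.</p>

```python
def masked_score(delta: float, is_gain: bool, site_annotated: bool) -> float:
    """Apply the masking rule to one delta score.

    Strengthening an annotated site (gain at an annotated site) and weakening
    an unannotated site (loss at an unannotated site) are set to 0; the other
    two kinds of change keep their score.
    """
    if is_gain and site_annotated:
        return 0.0
    if not is_gain and not site_annotated:
        return 0.0
    return delta
```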
 
 <h3>SpliceVarDB</h3>
 <p>SpliceVarDB is an online database consolidating over 50,000 variants assayed
 for their effects on splicing in over 8,000 human genes. The authors evaluated
 over 500 published data sources and established a spliceogenicity scale to
 standardize, harmonize, and consolidate variant validation data generated by a
 range of experimental protocols. Genes and variant locations were obtained using
 GENCODE v44. Splice regions were calculated as specific distances from the closest
 canonical exon, including 5&apos; and 3&apos; untranslated regions (UTRs). The
 database is available at
 <a target="_blank" href="https://splicevardb.org">splicevardb.org</a>.</p>
 
 <h2>Display Conventions and Configuration</h2>
 
 <h3>AbSplice</h3>
 <p>The AbSplice score estimates the probability that aberrant splicing of some kind occurs
 in a given tissue. The authors <a target="_blank" href="https://github.com/gagneurlab/absplice?tab=readme-ov-file#output"
 >suggest</a> three cutoffs, which are represented by color in the track.
 </p>
 
 <ul>
 <li><b><font color="#FF0000">High (red)</font></b> - <b>
   An AbSplice score over 0.2</b> indicates a high likelihood of aberrant splicing in at least one tissue.</li>
 <li><b><font color="#FF8000">Medium (orange)</font></b> - <b>
   A score between 0.05 and 0.2 </b> indicates a medium likelihood.</li>
 <li><b><font color="#0000FF">Low (blue)</font></b> - <b>
   A score between 0.01 and 0.05 </b> indicates a low likelihood.</li>
 <li><b>Scores below 0.01 are not displayed.</b></li>
 </ul>
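<p>The color categories above can be sketched as a simple threshold function. How the track
treats scores exactly equal to 0.05 or 0.2 is not stated here, so the boundary handling in this
sketch is an assumption.</p>

```python
from typing import Optional


def absplice_category(score: float) -> Optional[str]:
    """Map an AbSplice score to the display category listed above.

    Returns None for scores below 0.01, which are not displayed.
    Inclusive/exclusive handling of 0.05 and 0.2 is an assumption.
    """
    if score > 0.2:
        return "High (red)"
    if score >= 0.05:
        return "Medium (orange)"
    if score >= 0.01:
        return "Low (blue)"
    return None
```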
 <p>
 Hovering over an item shows the gene name, the maximum score, and the tissues with that score.
 Clicking on any item brings up a table with scores for all 49 GTEx tissues.
 </p>
 
 <h3>SpliceAI</h3>
 <p>
 Variants are colored by splicing impact, following the thresholds of Walker et al., 2023:
 </p>
 <ul>
 <li><b><font color="#FF8000">Predicted impact on splicing: Score &gt;&#61; 0.2 </font></b> </li>
 <li><b><font color="#808080">Not informative: Score &lt; 0.2 and &gt; 0.1 </font></b> </li>
 <li><b><font color="#0000FF">No impact on splicing: Score &lt;&#61; 0.1 </font></b> </li>
 </ul>
 <p>
 Hovering over an item shows the variant, gene name, type of change (donor gain/loss, acceptor
 gain/loss), location of the affected cryptic splice site, and SpliceAI score. Clicking on any item
 brings up a table with this information.
 </p>
 <p>
 The scores range from 0 to 1 and can be interpreted as the
 probability that the variant is splice-altering. The SpliceAI paper (Jaganathan et al., 2019)
 provides a detailed characterization of the 0.2 (high recall), 0.5 (recommended), and 0.8 (high
 precision) cutoffs.</p>
 
 <h3>SpliceAI Wildtype</h3>
 <p>
 These tracks are in bigWig format. The signal height represents the SpliceAI probability score.
 This track may be configured in a variety of ways to highlight different aspects of the displayed
-information. Click the "Graph configuration help" link for an explanation of configuration
+information. Click the &quot;Graph configuration help&quot; link for an explanation of configuration
 options.</p>
 
 <h3>SpliceVarDB</h3>
 <p>According to the strength of their supporting
 evidence, variants were classified as &quot;splice-altering&quot; (~25%), &quot;not
 splice-altering&quot; (~25%), and &quot;low-frequency splice-altering&quot; (~50%), the last
 of which corresponds to weak or indeterminate evidence of spliceogenicity. Of the
 splice-altering variants in SpliceVarDB, 55% lie outside the canonical splice sites
 (5.6% are deep intronic). The data are shown as clickable lollipop plots; the details page
 of each item links to SpliceVarDB for full details.
 </p>
 
 <p>The classification thresholds primarily follow those established by the original validation
 studies. However, most studies defined criteria only for splice-altering variants, not for
 variants that resulted in normal splicing. The authors therefore implemented stringent
 thresholds to define the normal category and ensure a high-quality set of control variants.
 Variants that met neither the normal nor the splice-altering criteria, and thus fell between the
 two classifications with a wide range of sub-optimal scores, were classified as low-frequency
 splice-altering. When a variant was validated multiple times and at least one validation
 returned splice-altering while another returned normal, the variant was assigned the
 &quot;conflicting&quot; category.
 </p>
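<p>The &quot;conflicting&quot; rule for repeatedly validated variants can be sketched as follows.
Only the conflicting rule comes from the text above; keeping a unanimous result as-is and returning
<code>None</code> for other mixtures are assumptions of this sketch, not SpliceVarDB's documented
procedure.</p>

```python
from typing import Iterable, Optional


def combine_validations(results: Iterable[str]) -> Optional[str]:
    """Combine repeated validation outcomes for one variant.

    Labels are "splice-altering", "normal", or "low-frequency" (names chosen here).
    """
    seen = set(results)
    if "splice-altering" in seen and "normal" in seen:
        return "conflicting"  # rule stated in the text
    if len(seen) == 1:
        return next(iter(seen))  # assumption: unanimous results are kept
    return None  # assumption: other mixtures are left undecided here
```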
 
 <p>
 The lollipop plots are color-coded based on the <b>score</b> value, which corresponds
 to the following classifications:
 </p>
 <ul>
  <li><b>3</b> - <span style="color: rgb(219,61,61);">Splice-altering</span></li>
  <li><b>2</b> - <span style="color: rgb(128,82,160);">Low-frequency</span></li>
  <li><b>1</b> - <span style="color: rgb(57,135,204);">Normal</span></li>
  <li><b>0</b> - <span style="color: rgb(140,140,140);">Conflicting</span></li>
 </ul>
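<p>The score-to-class mapping above, with the same RGB colors used in this list, can be kept in a
small lookup table. The dictionary and function names are chosen for this sketch; whether the
bigLolly file stores its colors in this form is not stated here.</p>

```python
from typing import Dict, Tuple

# Score value -> (classification, RGB color), as listed above.
SPLICEVARDB_CLASSES: Dict[int, Tuple[str, Tuple[int, int, int]]] = {
    3: ("Splice-altering", (219, 61, 61)),
    2: ("Low-frequency", (128, 82, 160)),
    1: ("Normal", (57, 135, 204)),
    0: ("Conflicting", (140, 140, 140)),
}


def lolly_style(score: int) -> Tuple[str, str]:
    """Return the classification and a CSS rgb() color string for a score."""
    label, (r, g, b) = SPLICEVARDB_CLASSES[score]
    return label, f"rgb({r},{g},{b})"
```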
 
 <h2>Methods</h2>
 <h3>AbSplice</h3>
 <p>Data was converted from the files (AbSplice_DNA_ $db _snvs_high_scores.zip) provided by the authors
 at <a href="https://zenodo.org/search?q=AbSplice-DNA&l=list&p=1&s=10&sort=bestmatch"
 target="_blank">zenodo.org</a>. Files in the
 score_cutoff=0.01 directory were concatenated. To convert the data to bigBed format, the per-tissue
 scores were extracted from the AbSplice_DNA fields and the maximum score across tissues was
 calculated, using a custom Python script that can be found in the
 <a target="_blank" href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/outside/abSplice/">
 makeDoc</a> in our GitHub repository.</p>
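<p>The per-variant summary shown on mouseover (maximum score and the tissues that reach it) can be
sketched as below. This is not the UCSC script linked above; it assumes the scores for one variant
have already been collected into a tissue-to-score dictionary, and the tissue names in the usage
note are placeholders.</p>

```python
from typing import Dict, List, Tuple


def max_score_tissues(scores: Dict[str, float]) -> Tuple[float, List[str]]:
    """Return the maximum AbSplice score for one variant and the sorted list
    of tissues with that score. Raises ValueError if scores is empty."""
    best = max(scores.values())
    tissues = sorted(t for t, s in scores.items() if s == best)
    return best, tissues
```

<p>For example, <code>max_score_tissues({"Brain": 0.3, "Liver": 0.1, "Lung": 0.3})</code> returns
<code>(0.3, ["Brain", "Lung"])</code>.</p>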
 
 <h3>SpliceAI</h3>
 <p>
 The data were downloaded from <a
 target="_blank" href="https://basespace.illumina.com/s/otSPW8hnhaZR">Illumina</a>.
 The SpliceAI scores are represented in the VCF INFO field as
 <code style="background-color: lightgray;">SpliceAI=G|OR4F5|0.01|0.00|0.00|0.00|-32|49|-40|-31</code>,
 where the pipe-separated fields contain:
 </p>
 <ul>
   <li>ALT allele</li>
   <li>Gene name</li>
   <li>Acceptor gain score</li>
   <li>Acceptor loss score</li>
   <li>Donor gain score</li>
   <li>Donor loss score</li>
   <li>Relative location of affected cryptic acceptor</li>
   <li>Relative location of affected acceptor</li>
   <li>Relative location of affected cryptic donor</li>
   <li>Relative location of affected donor</li>
 </ul>
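<p>The INFO field layout listed above can be parsed into named values as follows. The short field
names are chosen for this sketch, and splitting multiple allele/gene entries on commas is an
assumption based on the usual VCF convention for multi-valued INFO fields.</p>

```python
from typing import Dict, List, Union

# Field order as listed above; the short names are chosen for this sketch.
FIELDS = ["allele", "gene", "ds_ag", "ds_al", "ds_dg", "ds_dl",
          "dp_ag", "dp_al", "dp_dg", "dp_dl"]


def parse_spliceai(info_value: str) -> List[Dict[str, Union[str, float, int]]]:
    """Parse a SpliceAI INFO value into one dict per allele/gene entry."""
    if info_value.startswith("SpliceAI="):
        info_value = info_value[len("SpliceAI="):]
    records = []
    for entry in info_value.split(","):  # assumed separator between entries
        parts = entry.split("|")
        if len(parts) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
        rec: Dict[str, Union[str, float, int]] = dict(zip(FIELDS[:2], parts[:2]))
        # Four delta scores (0 to 1) followed by four relative positions.
        rec.update({k: float(v) for k, v in zip(FIELDS[2:6], parts[2:6])})
        rec.update({k: int(v) for k, v in zip(FIELDS[6:], parts[6:])})
        records.append(rec)
    return records
```

<p>Applied to the example above, the single entry has gene <code>OR4F5</code>, an acceptor gain
score of 0.01, and an acceptor gain position of -32.</p>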
 <p>
 Because most scores are 0 or close to 0, only variants with a score equal to or greater than
 0.02 were kept.
 </p>
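<p>The 0.02 filter can be sketched as below. The page does not say which score the cutoff is
applied to; this sketch assumes it is the largest of the four delta scores in any entry of the INFO
value.</p>

```python
def keep_variant(info_value: str, threshold: float = 0.02) -> bool:
    """Keep a variant if any entry's largest delta score reaches threshold.

    Assumes the INFO layout described above: ALT|gene|DS_AG|DS_AL|DS_DG|DS_DL|...
    """
    value = info_value.split("=", 1)[-1]  # drop an optional "SpliceAI=" prefix
    for entry in value.split(","):
        fields = entry.split("|")
        scores = [float(x) for x in fields[2:6]]  # the four delta scores
        if max(scores) >= threshold:
            return True
    return False
```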
 <p>
 The complete processing of this track can be found in the <a target="_blank"
 href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/spliceAI/spliceAI.py">
 spliceAI.py processing script</a>.
 </p>
 
 <h3>SpliceAI Wildtype</h3>
 <p>Data was provided by the Michael Hiller lab. SpliceAI was run on all reference chromosomes of
 the genome. Since the algorithm does not know where transcripts start or end, the scores
 can differ from those reported on other websites, especially for splice sites before the last exon or
 around the first exon.</p>
 
 
 <h3>SpliceVarDB</h3>
 <p>The data was converted by Patricia Sullivan from SpliceVarDB to
 <a href="../../goldenPath/help/bigLolly.html">bigLolly format</a>, and the UCSC
 Browser staff downloaded it for display.
 </p>
 
 <h2>Data Access</h2>
 
 <p>Precomputed AbSplice-DNA scores in all 49 GTEx tissues are available at
 <a target="_blank" href="https://zenodo.org/search?q=AbSplice-DNA&l=list&p=1&s=10&sort=bestmatch">
 Zenodo</a>.</p>
 
 <b>License</b>
 <p>
 The SpliceAI data is not available for download from the Genome Browser.
 The raw data can be found directly on
 <a target="_blank" href="https://basespace.illumina.com/s/otSPW8hnhaZR">Illumina</a>.
 FOR ACADEMIC AND NOT-FOR-PROFIT RESEARCH USE ONLY. The SpliceAI scores are
 made available by Illumina for academic or not-for-profit research only.
 By accessing the SpliceAI data, you acknowledge and agree that you may
 use this data only for your own personal academic or not-for-profit research,
 and not for any other purposes. You may not use this data for any for-profit,
 clinical, or other commercial purpose without obtaining a commercial license
 from Illumina, Inc.
 </p>
 
 <p>
 The raw data can be explored interactively with the <a href="../cgi-bin/hgTables">Table Browser</a>
 or the <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For automated analysis, the data may
 be queried from our <a href="https://genome.ucsc.edu/goldenPath/help/api.html">REST API</a>.</p>
 
 <p>
 For automated download and analysis, the genome annotation is stored in a bigBed or a bigWig file
 that can be downloaded from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/" target="_blank">our download server</a>.
 Individual regions or the whole genome annotation can be obtained using our tools, e.g.
 <br>
 <br>
 <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg19/splicevardb/SVADB.bb
 	-chrom=chr21 -start=0 -end=100000000 stdout</tt>
 <br>
 <tt>bigWigToBedGraph -chrom=chr1 -start=100000 -end=100500
 	http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/spliceAi/wildtype/spliceAiAcceptorMinus.bw
 	stdout</tt>
 <br>
 <br>
 These tools can be compiled from the source code or downloaded as a precompiled
 binary for your system. Instructions for downloading source code and binaries can be found
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>.</p>
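<p>For scripted analysis, the <tt>bigWigToBedGraph</tt> command shown above can be wrapped in
Python. This is a minimal sketch that assumes the <tt>bigWigToBedGraph</tt> binary is installed and
on your <tt>PATH</tt>; the helper function names are chosen here and are not part of the UCSC
tools.</p>

```python
import subprocess
from typing import List, Tuple


def bigwig_region_cmd(url: str, chrom: str, start: int, end: int) -> List[str]:
    """Build the bigWigToBedGraph command shown above, writing to stdout."""
    return ["bigWigToBedGraph", f"-chrom={chrom}", f"-start={start}",
            f"-end={end}", url, "stdout"]


def parse_bedgraph(text: str) -> List[Tuple[str, int, int, float]]:
    """Parse bedGraph lines (chrom, start, end, value) into tuples."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, value = line.split()[:4]
        rows.append((chrom, int(start), int(end), float(value)))
    return rows


def fetch_region(url: str, chrom: str, start: int, end: int) -> List[Tuple[str, int, int, float]]:
    """Run bigWigToBedGraph (must be on PATH) and parse its output."""
    out = subprocess.run(bigwig_region_cmd(url, chrom, start, end),
                         capture_output=True, text=True, check=True).stdout
    return parse_bedgraph(out)
```

<p>For example, <tt>fetch_region("http://hgdownload.soe.ucsc.edu/gbdb/hg38/bbi/spliceAi/wildtype/spliceAiAcceptorMinus.bw",
"chr1", 100000, 100500)</tt> runs the same command as the second example above and returns one
tuple per bedGraph interval.</p>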
 
 <h2>Credits</h2>
 
 <p>Thanks to Illumina for making SpliceAI available, both the model and the precomputed data files.</p>
 
 <p>Thanks to Francois Lecoquierre from the University of Oxford, Jean-Madeleine de Sainte Agathe
 from Institut Pasteur Paris, and Michael Hiller from the Senckenberg Museum Frankfurt for
 suggesting and then creating the SpliceAI Wildtype annotations.</p>
 
 <p>Thanks to Nils Wagner for helpful comments and suggestions for the AbSplice track.</p>
 
 <p>Thanks to the SpliceVarDB team for converting the data into our data formats.</p>
 
 <h2>References</h2>
 <p>
 Jaganathan K, Kyriazopoulou Panagiotopoulou S, McRae JF, Darbandi SF, Knowles D, Li YI, Kosmicki JA,
 Arbelaez J, Cui W, Schwartz GB <em>et al</em>.
 <a href="https://linkinghub.elsevier.com/retrieve/pii/S0092-8674(18)31629-5" target="_blank">
 Predicting Splicing from Primary Sequence with Deep Learning</a>.
 <em>Cell</em>. 2019 Jan 24;176(3):535-548.e24.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30661751" target="_blank">30661751</a>
 </p>
 
 <p>
 Sullivan PJ, Quinn JMW, Wu W, Pinese M, Cowley MJ.
 <a href="https://linkinghub.elsevier.com/retrieve/pii/S0002-9297(24)00288-X" target="_blank">
     SpliceVarDB: A comprehensive database of experimentally validated human splicing variants</a>.
 <em>Am J Hum Genet</em>. 2024 Oct 3;111(10):2164-2175.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39226898" target="_blank">39226898</a>; PMC: <a
     href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11480807/" target="_blank">PMC11480807</a>
 </p>
 
 <p>
 Wagner N, &#199;elik MH, H&#246;lzlwimmer FR, Mertes C, Prokisch H, Y&#233;pez VA, Gagneur J.
 <a href="https://doi.org/10.1038/s41588-023-01373-3" target="_blank">
 Aberrant splicing prediction across human tissues</a>.
 <em>Nat Genet</em>. 2023 May;55(5):861-870.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37142848" target="_blank">37142848</a>
 </p>
 
 <p>
 Walker LC, Hoya M, Wiggins GAR, Lindy A, Vincent LM, Parsons MT, Canson DM, Bis-Brewer D, Cass A,
 Tchourbanov A <em>et al</em>.
 <a href="https://linkinghub.elsevier.com/retrieve/pii/S0002-9297(23)00203-3" target="_blank">
 Using the ACMG/AMP framework to capture evidence related to predicted and observed impact on
 splicing: Recommendations from the ClinGen SVI Splicing Subgroup</a>.
 <em>Am J Hum Genet</em>. 2023 Jul 6;110(7):1046-1067.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/37352859" target="_blank">37352859</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10357475/" target="_blank">PMC10357475</a>
 </p>