src/hg/makeDb/trackDb/human/constraintSuper.html 6630c8a0c3f887a5f9cea9fdc5dfb3205615b958

6630c8a0c3f887a5f9cea9fdc5dfb3205615b958
lrnassar
  Mon Jun 6 14:33:12 2022 -0700
Adding a yline at the 90th percentile and expanding the documentation on interpretation of JARVIS. Refs #29154

diff --git src/hg/makeDb/trackDb/human/constraintSuper.html src/hg/makeDb/trackDb/human/constraintSuper.html
index bfdb885..38c4e07 100644
--- src/hg/makeDb/trackDb/human/constraintSuper.html
+++ src/hg/makeDb/trackDb/human/constraintSuper.html
@@ -3,71 +3,88 @@
 <p>
 The "Constraint scores" container track includes several subtracks showing the results of
 constraint prediction algorithms. These try to find regions of negative
 selection, where variations likely have functional impact. The algorithms do
 not use multi-species alignments to derive evolutionary constraint, but use
 primarily human variation, usually from variants collected by gnomAD (see the
 gnomAD V2 or V3 tracks on hg19 and hg38) or TOPMED (contained in our dbSNP
 tracks and available as a filter). Another constraint score, gnomAD
 constraint, is not part of this container but can be found in the hg38 gnomAD
 track.
 
 The algorithms covered here are:
 <ol>
     <li><b><a href="https://github.com/astrazeneca-cgr-publications/jarvis" target="_blank">
     JARVIS - "Junk" Annotation genome-wide Residual Variation Intolerance Score</a></b>: 
-    First scan the entire genome with a
+    JARVIS scores were creating by first scanning the entire genome with a
     sliding-window approach (using a 1-nucleotide step), recording the number of
     all TOPMED variants and common variants, irrespective of their predicted effect,
     within each window, to eventually calculate a single-nucleotide resolution
-    genome-wide residual variation intolerance score (gwRVIS). Then combine
-    gwRVIS, primary genomic sequence context, and additional genomic
+    genome-wide residual variation intolerance score (gwRVIS). That score, gwRVIS
+    was then combined with primary genomic sequence context, and additional genomic
     annotations with a multi-module deep learning framework to infer
     pathogenicity of noncoding regions that still remains naive to existing
     phylogenetic conservation metrics. The higher the score, the more deleterious
-    is the prediction.
+    the prediction.
 
     <li><b><a href="https://www.cardiodb.org/hmc/" target="_blank">
     HMC - Homologous Missense Constraint</a></b>:
     Homologous Missense Constraint (HMC) is a amino acid level measure
     of genetic intolerance of missense variants within human populations.
     For all assessable amino-acid positions in Pfam domains, the number of
     missense substitutions directly observed in gnomAD (Observed) was counted
     and compared to the expected value under a neutral evolution
     model (Expected). The upper limit of a 95% confidence interval for the
     Observed/Expected ratio is defined as the HMC score. Missense variants
     disrupting the amino-acid positions with HMC&lt;0.8 are predicted to be
-    likely deleterious
+    likely deleterious.
    
     <li><b><a href="http://biosig.unimelb.edu.au/mtr-viewer/" target="_blank">
     MTR - Missense Tolerance Ratio</a> (hg19 only)</b>:
     Missense Tolerance Ratio (MTR) scores aim to quantify the amount of purifying 
     selection acting specifically on missense variants in a given window of 
     protein-coding sequence. It is estimated across sliding windows of 31 codons 
     (default) and uses observed standing variation data from the WES component of 
     gnomAD / the Exome Aggregation Consortium Database (ExAC), version 2.0. Scores
-    were computed using Ensembl v95 release 
+    were computed using Ensembl v95 release.
 </ol>
 
 <h2>Display Conventions and Configuration</h2>
 
 <h3>JARVIS</h3>
 <p>
-JARVIS scores are the scores as a signal ("wiggle") track, with one score per genome position.
-Mousing over the bars displays the exact values. The scores were downloaded and converted to a single bigWig file.
-See <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg19.txt" target=_blank>hg19 makeDoc</a> and
-<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/jarvis.txt" target=_blank>hg38 makeDoc</a>.</p>
+JARVIS scores are displayed as a signal ("wiggle") track, with one score per genome position.
+Mousing over the bars displays the exact values. A horizontal line exists at the <b>0.733</b>
+value which signifies the 90th percentile.</p>
+<p>
+<b>Interpretation:</b> The authors offer a suggested guideline of <b> > 0.9998</b> for identifying
+higher confidence calls and minimizing false positives. In addition to that strict threshold, the 
+following two more relaxed cutoffs can be used to explore additional hits. Note that these
+thresholds are offered as guidelines and are not necessarily representative of pathogenicity.</p>
+
+<p>
+<table class="stdTbl">
+    <tr align=left>
+        <th>Percentile</th><th>JARVIS score threshold</th></tr>
+    <tr align=left>
+        <td>99th</td><td>0.9998</td></tr>
+    <tr align=left>
+        <td>95th</td><td>0.9826</td></tr>
+    <tr align=left>
+        <td>90th</td><td>0.7338</td></tr>
+</table>
+</p>
 
 <h3>HMC</h3>
 <p>
 HMC scores are displayed as a signal ("wiggle") track, with one score per genome position.
 Mousing over the bars displays the exact values. The highly-constrained cutoff
 of 0.8 is indicated with a line.</p>
 <p>
 The HMC scores were downloaded and converted to .bedGraph files with a
 custom Python script. The bedGraph files were then converted to bigWig files,
 as documented in our <a
 href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg19.txt"
 target=_blank>makeDoc</a> hg19 build log.</p>
 
 <h3>MTR</h3>
 <p>
@@ -96,58 +113,61 @@
 <li><b><font color=blue>Blue items</font></b> No MTR score
 </ul>
 <p>
 <b>Interpretation:</b> Regions with low MTR scores were seen to be enriched with
 pathogenic variants. For example, ClinVar pathogenic variants were seen to
 have an average score of 0.77 whereas ClinVar benign variants had an average score
 of 0.92. Further validation using the FATHMM cancer-associated training dataset saw
 that scores less than 0.5 contained 8.6% of the pathogenic variants while only containing
 0.9% of neutral variants. In summary, lower scores are more likely to represent
 pathogenic variants whereas higher scores could be pathogenic, but have a higher chance
 to be a false positive. For more information see the <a target="_blank"
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6602522/">MTR-Viewer publication</a>.</p>
 
 <h2>Methods</h2>
 
+<h3>JARVIS</h3> 
+<p>
+Scores were downloaded and converted to a single bigWig file. See the
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg19.txt"
+target=_blank>hg19 makeDoc</a> and the
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/jarvis.txt"
+target=_blank>hg38 makeDoc</a> for more info.
+</p>
+
 <h3>HMC</h3>
 <p>
 Scores were downloaded and converted to .bedGraph files with a custom Python 
 script. The bedGraph files were then converted to bigWig files, as documented in our 
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg19.txt" 
 target=_blank>makeDoc</a> hg19 build log.</p>
-<h3>Jarvis</h3> 
-<p>
-Scores were downloaded and converted to a single bigWig file. See 
-<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg19.txt" 
-target=_blank>hg19 makeDoc</a> and  <a 
-href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/jarvis.txt" 
-target=_blank>hg38 makeDoc</a></p>
+
 <h3>MTR</h3> 
 <p>
 <a href="http://biosig.unimelb.edu.au/mtr-viewer/downloads" target="_blank">V2
 file</a> was downloaded and columns were reshuffled as well as itemRgb added for the
 <b>MTR All data</b> track. For the <b>MTR Scores</b> track the file was parsed with a python
 script to pull out the highest possible MTR score for each of the 3 possible mutations
 at each base pair and 4 tracks built out of these values representing each mutation.</p>
 <p>
 See the <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg
 /makeDb/doc/hg19.txt" target=_blank>hg19 makeDoc</a> entry on MTR for more info.</p>
 
 <h2>Credits</h2>
 
 <p>
-Thanks to Jean-Madeleine Desainteagathe (APHP Paris, France) for suggesting the Jarvis, MTR, HMC tracks. Thanks to Xialei Zhang for providing the HMC data file and to Dimitrios Vitsios for helping clean up the hg38 Jarvis files. 
+Thanks to Jean-Madeleine Desainteagathe (APHP Paris, France) for suggesting the JARVIS, MTR, HMC tracks. Thanks to Xialei Zhang for providing the HMC data file and to Dimitrios Vitsios and Slavé Petrovski for helping clean up the hg38 JARVIS files for providing guidance on interpretation. 
 </p>
 
 <h2>References</h2>
 
 <p>
 Vitsios D, Dhindsa RS, Middleton L, Gussow AB, Petrovski S.
 <a href="https://www.ncbi.nlm.nih.gov/pubmed/33686085" target="_blank">
     Prioritizing non-coding regions based on human genomic constraint and sequence context with deep
     learning</a>.
 <em>Nat Commun</em>. 2021 Mar 8;12(1):1504.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/33686085" target="_blank">33686085</a>; PMC: <a
     href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7940646/" target="_blank">PMC7940646</a>
 </p>
 
 <p>