02d07fdf331de9cdf04f74c5d9211800403dfa8b
jeltje.van.baren
Tue Jan 21 10:52:51 2025 -0800
basing alphaMissense html on revel

diff --git src/hg/makeDb/trackDb/human/alphaMissense.html src/hg/makeDb/trackDb/human/alphaMissense.html
index e69de29bb2d..462d7df5305 100644
--- src/hg/makeDb/trackDb/human/alphaMissense.html
+++ src/hg/makeDb/trackDb/human/alphaMissense.html
@@ -0,0 +1,172 @@
+
This track collection shows Rare Exome Variant Ensemble Learner (alphaMissense) scores for predicting +the deleteriousness of each nucleotide change in the genome. +
+ ++alphaMissense is an ensemble method for predicting the pathogenicity of missense variants +based on a combination of scores from 13 individual tools: MutPred, FATHMM v2.3, +VEST 3.0, PolyPhen-2, SIFT, PROVEAN, MutationAssessor, MutationTaster, LRT, GERP++, +SiPhy, phyloP, and phastCons. alphaMissense was trained using recently discovered pathogenic +and rare neutral missense variants, excluding those previously used to train its +constituent tools. The alphaMissense score for an individual missense variant can range +from 0 to 1, with higher scores reflecting greater likelihood that the variant is +disease-causing. +
+ +Most authors of deleteriousness scores argue against using fixed cutoffs in +diagnostics. But to give an idea of the meaning of the score value, the alphaMissense +authors note: "For example, 75.4% of disease mutations but only 10.9% of +neutral variants (and 12.4% of all ESVs) have an alphaMissense score above 0.5, +corresponding to a sensitivity of 0.754 and specificity of 0.891. Selecting a +more stringent alphaMissense score threshold of 0.75 would result in higher specificity +but lower sensitivity, with 52.1% of disease mutations, 3.3% of neutral +variants, and 4.1% of all ESVs being classified as pathogenic" (Figure S1 of +the reference below). +
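The relationship between the quoted percentages and the sensitivity and specificity figures can be checked with a few lines of arithmetic. This is only an illustrative sketch: the function name is hypothetical, and the inputs are the fractions quoted above, not values read from the track data.

```python
def sensitivity_specificity(frac_disease_above, frac_neutral_above):
    """Derive sensitivity and specificity for a score cutoff.

    frac_disease_above: fraction of disease mutations scoring above the cutoff
    frac_neutral_above: fraction of neutral variants scoring above the cutoff
    (hypothetical helper; both inputs come from the quoted text)
    """
    sensitivity = frac_disease_above       # true positive rate
    specificity = 1.0 - frac_neutral_above  # true negative rate
    return sensitivity, specificity


# Cutoff 0.5: 75.4% of disease mutations, 10.9% of neutral variants above it
sens_05, spec_05 = sensitivity_specificity(0.754, 0.109)
# Cutoff 0.75: 52.1% of disease mutations, 3.3% of neutral variants above it
sens_075, spec_075 = sensitivity_specificity(0.521, 0.033)

print(round(sens_05, 3), round(spec_05, 3))
print(round(sens_075, 3), round(spec_075, 3))
```

The stricter cutoff trades sensitivity for specificity, which is the behavior the quoted passage describes.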
+ ++There are five subtracks for this track: +
Four lettered subtracks, one for every nucleotide, showing +scores for mutation from the reference to that +nucleotide. All subtracks show the alphaMissense ensemble score on mouseover. Across the exome, +there are three values per position, one for every possible +nucleotide mutation. The fourth value, "no mutation", representing +the reference allele, e.g. A to A, is always set to zero, "0.0". alphaMissense only +takes into account amino acid changes, so a nucleotide change that results in no +amino acid change (synonymous) also receives the score "0.0". +
+In rare cases, two scores are output for the same variant at a +genome position. This happens when two transcripts have +different splicing patterns: because some input scores for alphaMissense take into account +the sequence context, the same mutation can get two different scores. In these cases, +only the maximum score is shown in the four per-nucleotide subtracks. The complete set of +scores is shown in the Overlaps track. +
+ +One subtrack, Overlaps, shows alternate alphaMissense scores when applicable. +In rare cases (0.05% of genome positions), multiple scores exist for a single variant, +due to multiple, overlapping transcripts. For example, if there are +two transcripts and one covers only half of an exon, then the amino acids +that overlap both transcripts will get two different alphaMissense scores, since some of the underlying +scores (PolyPhen-2, for example) take into account the amino acid sequence context and +this context is different depending on the transcript. +For these cases, this subtrack contains at least two +graphical features for each affected genome position. Each feature is labeled +with the mutation (A, C, T or G). The transcript IDs and resulting scores are +shown when hovering over the feature or clicking +it. For the large majority of the genome, this subtrack has no features. +This is because alphaMissense usually outputs only a single score per nucleotide and +most transcript-derived amino acid sequence contexts are identical. +
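The split between the per-nucleotide subtracks (maximum score) and the Overlaps subtrack (all scores, when they differ) can be sketched as below. This is not the conversion code used for the track; the record layout `(chrom, pos, alt, transcript_id, score)`, the function name, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def split_scores(records):
    """Group per-transcript scores by variant (hypothetical record layout).

    records: iterable of (chrom, pos, alt, transcript_id, score)
    Returns (per_nucleotide, overlaps):
      per_nucleotide maps each variant to its maximum score
      overlaps maps only variants with differing scores to all (transcript, score) pairs
    """
    by_variant = defaultdict(list)
    for chrom, pos, alt, transcript_id, score in records:
        by_variant[(chrom, pos, alt)].append((transcript_id, score))

    # The four lettered subtracks show one value per variant: the maximum.
    per_nucleotide = {
        variant: max(score for _, score in scores)
        for variant, scores in by_variant.items()
    }
    # The Overlaps subtrack lists every score where transcripts disagree.
    overlaps = {
        variant: scores
        for variant, scores in by_variant.items()
        if len({score for _, score in scores}) > 1
    }
    return per_nucleotide, overlaps


# Made-up example: one variant scored on two transcripts, one on a single transcript
example = [
    ("chr1", 1000, "A", "tx1", 0.41),
    ("chr1", 1000, "A", "tx2", 0.63),
    ("chr1", 1001, "G", "tx1", 0.12),
]
per_nucleotide, overlaps = split_scores(example)
```

In this made-up input, only the first variant would appear in Overlaps, and the lettered subtrack would report its higher score.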
++Note that in most diagnostic assays, variants are called using WGS +pipelines, not RNA-seq. As a result, variants are originally located on the +genome, not on transcripts, and the choice of transcript is made by +variant-calling software using a heuristic. In addition, in clinical +practice, some transcripts have been agreed upon as more relevant for a disease, e.g. +because only certain transcripts may be expressed in the relevant tissue. So +the choice of the most relevant transcript, and as such the alphaMissense score, may be +a question of manual curation standards rather than a result of the variant itself. +
++When using this track, zoom in until you can see every basepair at the +top of the display. Otherwise, there are several nucleotides per pixel under +your mouse cursor and no score will be shown on the mouseover tooltip. +
+ +Track colors
++This track is colored according to Table 2 in Pejaver et al. The colors represent the recommended ACMG/AMP score cutoffs. + +
+| Range | Classification |
+|---|---|
+| ≥ .773 | Pathogenic |
+| .772 - .184 | Neutral |
+| ≤ .183 | Benign |
+
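The cutoffs in the table above can be applied to a score as in the following sketch. The function name is hypothetical and the thresholds are copied from the table; this shows how the color classes partition the 0 to 1 score range and is not a diagnostic rule.

```python
# Cutoffs copied from the track color table (assumed to be inclusive, as the table states)
PATHOGENIC_MIN = 0.773
BENIGN_MAX = 0.183


def classify_score(score):
    """Return the color class for a score between 0 and 1 (hypothetical helper)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= PATHOGENIC_MIN:
        return "Pathogenic"
    if score <= BENIGN_MAX:
        return "Benign"
    return "Neutral"


for value in (0.9, 0.5, 0.1):
    print(value, classify_score(value))
```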
For hg38, note that the data was converted from the hg19 data by the alphaMissense authors, using the UCSC +liftOver program. This can lead to missing values or +duplicated values. When an hg38 position is annotated with two scores due to the +lifting, the authors removed all the scores for this position. They did the same when +the reference allele changed from hg19 to hg38. Also, on hg38, the track has +the "lifted" icon to indicate +this. You can double-check if a nucleotide +position is possibly affected by the lifting procedure by activating the track +"Hg19 Mapping" under "Mapping and Sequencing". +
+ ++alphaMissense scores are available at the + +alphaMissense website. +The site provides precomputed alphaMissense scores for all possible human missense variants +to facilitate the identification of pathogenic variants among the large number of +rare variants discovered in sequencing studies. + +
+ +
+The alphaMissense data on the UCSC Genome Browser can be explored interactively with the
+Table Browser or the
+Data Integrator.
+For automated download and analysis, the genome annotation is stored at UCSC in bigWig
+files that can be downloaded from
+our download server.
+The files for this track are called a.bw, c.bw, g.bw, t.bw. Individual
+regions or the whole genome annotation can be obtained using our tool bigWigToWig
+which can be compiled from the source code or downloaded as a precompiled
+binary for your system. Instructions for downloading source code and binaries can be found
+here.
+The tools can also be used to obtain features confined to a given range, e.g.
+
+
+bigWigToBedGraph -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/$db/alphaMissense/a.bw stdout
+
+
+
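For scripted access, the command above can be wrapped and its bedGraph output parsed as in this sketch. It assumes `bigWigToBedGraph` is installed and on the PATH, and that `$db` in the URL stands for an assembly name such as hg19 or hg38. The function names are hypothetical, and `fetch_scores` is defined but not called here because it needs network access.

```python
import subprocess


def build_command(db, nucleotide, chrom, start, end):
    """Build the bigWigToBedGraph argument list for one per-nucleotide file."""
    url = f"http://hgdownload.soe.ucsc.edu/gbdb/{db}/alphaMissense/{nucleotide}.bw"
    return [
        "bigWigToBedGraph",
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        url,
        "stdout",
    ]


def parse_bedgraph(text):
    """Parse bedGraph text into (chrom, start, end, score) tuples."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and optional header lines
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end, value = line.split()[:4]
        rows.append((chrom, int(start), int(end), float(value)))
    return rows


def fetch_scores(db, nucleotide, chrom, start, end):
    """Run the tool and return parsed rows (requires the binary and network access)."""
    cmd = build_command(db, nucleotide, chrom, start, end)
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return parse_bedgraph(result.stdout)


# Parsing a made-up two-line bedGraph fragment
sample = "chr1\t100000\t100001\t0.25\nchr1\t100001\t100002\t0.0\n"
rows = parse_bedgraph(sample)
```

A call such as `fetch_scores("hg38", "a", "chr1", 100000, 100500)` would correspond to the command shown above with `$db` set to hg38.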
+Data were converted from the files provided on +the alphaMissense Downloads website. As with all other tracks, +a full log of all commands used for the conversion is available in our +source repository, for hg19 and hg38. The release used for each assembly is shown on the track description page. + +
+ ++Thanks to the alphaMissense development team for providing precomputed data and fixing duplicated values in the hg38 files. +
+ ++
+