c58182a80ec3c032b1c8613db4bd168aaeb7a543 max Mon May 30 02:36:53 2022 -0700
fixing revel docs, after great QA remarks, refs #29475
diff --git src/hg/makeDb/trackDb/human/revel.html src/hg/makeDb/trackDb/human/revel.html
index ea5411d..d504510 100644
--- src/hg/makeDb/trackDb/human/revel.html
+++ src/hg/makeDb/trackDb/human/revel.html
@@ -39,47 +39,54 @@
 are output for a genome position. This happens when there are two transcripts with different splicing patterns and since some input scores for REVEL take into account the sequence context, the same mutation can get two different scores. In these cases, only the maximum score is shown in the four per-nucleotide subtracks. </p> <p> For single nucleotide variants (SNV), at every genome position, there are three values per position, one for every possible nucleotide mutation. The fourth value, "no mutation", representing the reference allele, e.g. A to A, is always set to zero, "0.0". REVEL only takes into account amino acid changes, so a nucleotide change that results in no amino acid change (synonymous) also receives the score "0.0". </p> <li>
-<p>One subtrack for duplicated scores. For the rare cases where multiple scores are possible
-for a position, this track contains > 2 graphical features for each affected
-genome position. Each feature is labeled with the mutation (A, C, T or G).
-The transcript ID and resulting score for this transcript is shown when
-hovering the mouse over the feature. For the large majority of the genome,
-this subtrack has no features, because REVEL output only a single score
-per nucleotide.
+<p>One subtrack for duplicated scores. There are rare cases where multiple scores are possible
+at a genome position, due to multiple, overlapping transcripts.
For example, if there are
+two transcripts and one covers only half of an exon, then the first amino acids
+of this exon will get two different REVEL scores, since some of the underlying
+scores (PolyPhen, for example) take into account the amino acid sequence context, and
+this context differs depending on the transcript.
+For these cases, the last subtrack contains at least two
+graphical features for each affected genome position. Each feature is labeled
+with the mutation (A, C, T or G), and the transcript ID and resulting score for
+this transcript are shown when hovering the mouse over the feature or clicking
+it. For the large majority of the genome, this subtrack has no features,
+because REVEL usually outputs only a single score per nucleotide, as at most
+genome positions the transcript-derived amino acid sequence context is
+identical.
 </p> <p> Note that in most diagnostic assays, variants are called using WGS pipelines, not RNA-seq. As a result, variants are originally located on the genome, not on transcripts, and a choice of the transcript is possibly made by a variant calling software using a heuristic. In addition, clinically, in the field, some transcripts have been agreed as more relevant for a disease, e.g. because only certain transcripts may be expressed in the relevant tissue. So
-the choice of the most relevant transcript may be a question of manual
-curation standards rather than a result of the assay.
+the choice of the most relevant transcript, and as such the REVEL score, may be
+a question of manual curation standards rather than a result of the variant itself.
 </p> </ul> <p> When using this track, zoom in until you can see every basepair at the top of the display. Otherwise, there are several nucleotides per pixel under your mouse cursor and no score will be shown on the mouseover tooltip. </p> <p>For hg38, note that the data was converted from the hg19 data using the UCSC liftOver program, by the REVEL authors.
This can lead to missing values or duplicated values. When a hg38 position is annotated with two scores due to the lifting, the authors removed all the scores for this position. They did the same when the reference allele has changed from hg19 to hg38. Also, on hg38, the track has the "lifted" icon to indicate