50d8ee9d40c249470a706bfa94dccfca19364707 max Tue Feb 9 09:07:22 2021 -0800 adding docs page for CADD track, refs #18492 diff --git src/hg/makeDb/trackDb/human/cadd.html src/hg/makeDb/trackDb/human/cadd.html new file mode 100644 index 0000000..1717ca2 --- /dev/null +++ src/hg/makeDb/trackDb/human/cadd.html @@ -0,0 +1,98 @@ +<h2>Description</h2> + +<p> This track collection shows <a href="https://cadd.gs.washington.edu/" +target="_blank">Combined Annotation Dependent Depletion</a> (CADD) scores. +CADD is a tool for scoring the deleteriousness of single nucleotide variants as well as insertion/deletion variants in the human genome. +</p> + +<p> +Many mutation annotations +tend to exploit a single information type (e.g. conservation) and/or are +restricted in scope (e.g. to missense changes). Thus, a broadly applicable +metric that objectively weights and integrates diverse information is needed. +Combined Annotation Dependent Depletion (CADD) is a framework that integrates +multiple annotations into one metric by contrasting variants that survived +natural selection with simulated mutations. +</p> + +<p> +C-scores strongly correlate with allelic diversity, pathogenicity of both +coding and non-coding variants, and experimentally measured regulatory effects, +and also highly rank causal variants within individual genome sequences. +Finally, C-scores of complex trait-associated variants from genome-wide +association studies (GWAS) are significantly higher than matched controls and +correlate with study sample size, likely reflecting the increased accuracy of +larger GWAS. +</p> + +<h2>Display Conventions and Configuration</h2> +<p> +There are six subtracks of this track: four for single nucleotide mutations, one for each nucleotide (A, C, G, T) that a position can be mutated to, +one for insertions and one for deletions. All subtracks show the CADD Phred score on mouse over. +</p> + +<p> +For single nucleotide variants, scores are available at every +nucleotide position, with three values per position, one for every possible +mutation. 
+For the single nucleotide variants, please zoom in until you can see every basepair. Otherwise, the mouse over +shows the average over all nucleotides under the cursor, which is indicated by the prefix "~" +in the mouse over text. +</p> + +<p>The scores are also shown on mouse over for a set of insertions and deletions. On hg38, the selected +set has been obtained from gnomAD 3. On hg19, it has been obtained from XXX (TODO: ask CADD authors). +</p> + +<H2>Data access</H2> +<p> +The raw data can be explored interactively with the <a href="../cgi-bin/hgTables">Table Browser</a> +or the <a href="../cgi-bin/hgIntegrator">Data Integrator</a>. +</p> + +<p> +For automated download and analysis, the genome annotation is stored in bigWig and bigBed files that +can be downloaded from +<a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/cadd/" target="_blank">our download server</a>. +The files for this track are called <tt>a.bw, c.bw, g.bw, t.bw, ins.bb and del.bb</tt>. Individual +regions or the whole genome annotation can be obtained using our tools <tt>bigWigToBedGraph</tt> (for the bigWig files) and <tt>bigBedToBed</tt> (for the bigBed files), +which can be compiled from the source code or downloaded as precompiled +binaries for your system. Instructions for downloading source code and binaries can be found +<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. +The tools +can also be used to obtain only features within a given range, e.g. +<tt>bigWigToBedGraph -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/$db/cadd/a.bw stdout</tt> +</p> + +<h2>Methods</h2> + +<p> +Data were converted from the files provided on +<a href="https://cadd.gs.washington.edu/download" target="_blank">the CADD Downloads website</a>, provided by the Kircher lab, +using <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/cadd" target=_BLANK>custom Python scripts</a>, +documented in our <a target=_BLANK href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/cadd.txt">makeDoc</a> files. 
+</p> + +<h2>Credits</h2> +<p> +Thanks to the Kircher lab for providing the data. +</p> + +<h2>References</h2> +<p> +Kircher M, Witten DM, Jain P, O'Roak BJ, Cooper GM, Shendure J. +<a href="http://dx.doi.org/10.1038/ng.2892" target="_blank"> + A general framework for estimating the relative pathogenicity of human genetic variants</a>. +<em>Nat Genet</em>. 2014 Mar;46(3):310-5. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/24487276" target="_blank">24487276</a>; PMC: <a + href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3992975/" target="_blank">PMC3992975</a> +</p> + +<p> +Rentzsch P, Witten D, Cooper GM, Shendure J, Kircher M. +<a href="https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gky1016" target="_blank"> + CADD: predicting the deleteriousness of variants throughout the human genome</a>. +<em>Nucleic Acids Res</em>. 2019 Jan 8;47(D1):D886-D894. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/30371827" target="_blank">30371827</a>; PMC: <a + href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6323892/" target="_blank">PMC6323892</a> +</p> +
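The range queries described under "Data access" can be scripted. The following is a minimal sketch, not part of the track's own tooling: it only assembles the download URL and the command line for one of the six files. The helper names `cadd_url` and `region_command` are hypothetical, the region coordinates are illustrative, and it assumes that `$db` stands for an assembly name such as hg19 or hg38 and that the UCSC utilities `bigWigToBedGraph` and `bigBedToBed` are installed if the command is actually run.

```python
import shlex
from typing import List

# Base of the download server path quoted in the "Data access" section.
BASE_URL = "http://hgdownload.soe.ucsc.edu/gbdb"

# File names as listed in the track documentation.
BIGWIG_FILES = ("a.bw", "c.bw", "g.bw", "t.bw")
BIGBED_FILES = ("ins.bb", "del.bb")


def cadd_url(db: str, filename: str) -> str:
    """Build the URL of one CADD file for an assembly (hypothetical helper)."""
    return f"{BASE_URL}/{db}/cadd/{filename}"


def region_command(db: str, filename: str, chrom: str, start: int, end: int) -> List[str]:
    """Assemble the argument list that extracts one region to stdout.

    bigWig files are read with bigWigToBedGraph, bigBed files with
    bigBedToBed; both accept -chrom, -start and -end options.
    """
    if filename in BIGWIG_FILES:
        tool = "bigWigToBedGraph"
    elif filename in BIGBED_FILES:
        tool = "bigBedToBed"
    else:
        raise ValueError(f"not a CADD track file: {filename}")
    return [
        tool,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        cadd_url(db, filename),
        "stdout",
    ]


if __name__ == "__main__":
    # Illustrative region only; print the command instead of running it.
    print(shlex.join(region_command("hg38", "a.bw", "chr1", 100000, 100500)))
```

To run the query rather than print it, the returned list can be passed to `subprocess.run(..., check=True)` on a system where the tools are on the `PATH`.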