a1cbac0f4ffff0ec3f9f709e48ef04fcc9769aa3 max Fri Jan 24 08:00:44 2020 -0800
adding a do script for clinvar lift track and docs page, refs #24825 (Not sure what to do about makedocs for an automated track like this)

diff --git src/hg/makeDb/trackDb/clinvarLift.html src/hg/makeDb/trackDb/clinvarLift.html
new file mode 100644
index 0000000..0fc9a45
--- /dev/null
+++ src/hg/makeDb/trackDb/clinvarLift.html
@@ -0,0 +1,99 @@
+<h2>Description</h2>
+
+<p>
+This track shows human clinical variants from the
+<a href="https://www.ncbi.nlm.nih.gov/clinvar/" target="_blank">ClinVar database</a>,
+mapped from hg38 to the $db genome. The mapping uses UCSC's whole-genome alignments and the
+tool <a href="https://genome.ucsc.edu/cgi-bin/hgLiftOver" target="_blank">liftOver</a>.
+The annotations are somewhat speculative,
+as liftOver is not meant to be used for cross-organism mapping. Among other limitations,
+liftOver has no notion of phylogenetic trees or protein orthology, so the
+protein to which a variant is mapped may not be the annotated ortholog.
+In regions with protein repeats, a variant may have been mapped to the wrong exon. When the
+genome nucleotide in $db is different from hg38, the corresponding position
+could be several base pairs away. Generally, the more diverged the gene, the harder the
+mapping. Before planning assays on these data, a manual alignment and annotation
+of the human and $db nucleotide or amino acid sequences is recommended.</p>
+
+
+<h2>Display Conventions and Configuration</h2>
+
+<p>
+Genomic locations of ClinVar variants are labeled with the human ClinVar variant
+descriptions. For example, the label "C>G" usually means that in human, the cDNA
+nucleotide change is from C to G. On a transcript on the reverse strand, the human
+genome nucleotide on the forward strand would be G. In $db, the genome may not
+be G at this position.
Zoom in to see the nucleotide in $db, or click the
+variant to show the human position and nucleotide and the $db nucleotide.</p>
+
+<p>All ClinVar information related to each variant is shown on that
+variant's details page. Hover over a feature for more than 2 seconds
+to show the clinical significance of a variant in humans.
+</p>
+
+<p>Only short variants with a length &lt; 10 bp on the human genome were
+lifted. A few variants whose lifted $db annotations are longer than
+30 bp were also filtered out. This can happen in repetitive regions that are
+hard to align.</p>
+
+<p>
+Annotations are shaded by clinical significance:
+<b><font color="red">red for pathogenic</font></b>,
+<b><font color="#888">dark grey for uncertain significance or not provided</font></b> and
+<b><font color="green">green for benign</font></b>.
+</p>
+
+<p>
+The score of each variant is its number of "stars" in ClinVar. On the track configuration page (above), you can filter the track to show only variants with more than a certain number of stars. For more information on the star rating, see the <a href="https://www.ncbi.nlm.nih.gov/clinvar/docs/review_status/"
+target="_blank">ClinVar documentation</a>.
+</p>
+
+<h2>Data updates</h2>
+<p>ClinVar is updated every month, but these mappings are not yet updated on a regular schedule. Please contact us
+if you are interested in regular updates.
+</p>
+
+<h2>Data access</h2>
+<p>
+The raw data can be explored interactively with the <a href="../cgi-bin/hgTables">Table Browser</a>
+or the <a href="../cgi-bin/hgIntegrator">Data Integrator</a>.</p>
+
+<p>
+For automated download and analysis, the genome annotation is stored in a bigBed file that
+can be downloaded from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/" target="_blank">our download server</a>.
+The files for this track are called <tt>clinvarLift.bb</tt>.
Individual
+regions or the whole genome annotation can be obtained using our tool <tt>bigBedToBed</tt>,
+which can be compiled from the source code or downloaded as a precompiled
+binary for your system. Instructions for downloading source code and binaries can be found
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>.
+The tool
+can also be used to obtain only features within a given range, e.g.
+<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/clinvarLift.bb -chrom=chr1 -start=0 -end=100000000 stdout</tt>
+</p>
+
+<h2>Methods</h2>
+
+<p>
+The hg38 ClinvarMain track was annotated with nucleotides and positions, lifted to $db, filtered again for variants &lt; 30 bp
+and annotated with nucleotides again. The output was converted to the <a href="../goldenPath/help/bigBed.html">bigBed</a> format.
+The program that performs the mapping is available on
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/utils/doClinvarLift"
+target="_blank">GitHub</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to NCBI for making the ClinVar data available on their FTP site as a tab-separated file.
+</p>
+
+<h2>References</h2>
+<p>
+Landrum MJ, Lee JM, Benson M, Brown G, Chao C, Chitipiralla S, Gu B, Hart J, Hoffman D, Hoover J
+<em>et al</em>.
+<a href="https://academic.oup.com/nar/article/44/D1/D862/2502702/ClinVar-public-archive-of-interpretations-of" target="_blank">
+ClinVar: public archive of interpretations of clinically relevant variants</a>.
+<em>Nucleic Acids Res</em>. 2016 Jan 4;44(D1):D862-8.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/26582918" target="_blank">26582918</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702865/" target="_blank">PMC4702865</a>
+</p>
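
Note on the "Data access" text in the diff: the page says the BED score holds the number of ClinVar "stars" and that <tt>bigBedToBed</tt> can write features to stdout. A minimal sketch of post-processing that output is shown here. It assumes the standard BED column order (chrom, start, end, name, score) and that the name column carries the human variant label; the helper names and the sample rows are hypothetical and are not part of the commit.

```python
from typing import Iterable, Iterator, List, NamedTuple


class BedFeature(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str   # assumed: human ClinVar variant label, e.g. "C>G"
    score: int  # per the track docs: number of ClinVar "stars"


def parse_bed(lines: Iterable[str]) -> Iterator[BedFeature]:
    """Yield features from tab-separated BED lines, ignoring extra columns."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # no score column, so the line cannot be filtered by stars
        yield BedFeature(fields[0], int(fields[1]), int(fields[2]),
                         fields[3], int(fields[4]))


def filter_by_stars(features: Iterable[BedFeature],
                    min_stars: int) -> List[BedFeature]:
    """Keep features whose score (star count) is at least min_stars."""
    return [f for f in features if f.score >= min_stars]


# Made-up rows for illustration only; these are not real ClinVar records.
SAMPLE = [
    "chr1\t100\t101\tC>G\t2",
    "chr1\t250\t251\tA>T\t0",
    "chr2\t500\t503\tdelAGT\t3",
]

if __name__ == "__main__":
    for feat in filter_by_stars(parse_bed(SAMPLE), min_stars=2):
        print(feat.chrom, feat.start, feat.end, feat.name, feat.score)
```

To use this on real data, the lines could be read from `sys.stdin` instead of `SAMPLE`, with the `bigBedToBed ... stdout` command from the docs page piped into the script.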