a8e19c9657462c53e955bb58de7ef4093fda6d9c hiram Wed Oct 9 11:43:23 2019 -0700 adding data access section and correct reference refs #21784
diff --git src/hg/makeDb/trackDb/human/platinumGenomes.html src/hg/makeDb/trackDb/human/platinumGenomes.html
new file mode 100644
index 0000000..53d8fe5
--- /dev/null
+++ src/hg/makeDb/trackDb/human/platinumGenomes.html
@@ -0,0 +1,57 @@
+<h2>Abstract</h2>
+
+<p>
+Improvement of variant calling in next-generation sequence data requires
+a comprehensive, genome-wide catalog of high-confidence variants called in
+a set of genomes for use as a benchmark. We generated deep, whole-genome
+sequence data of 17 individuals in a three-generation pedigree and called
+variants in each genome using a range of currently available algorithms.
+We used haplotype transmission information to create a phased "Platinum"
+variant catalog of 4.7 million single-nucleotide variants (SNVs)
+plus 0.7 million small (1-50 bp) insertions and deletions (indels) that are
+consistent with the pattern of inheritance in the parents and 11 children
+of this pedigree. Platinum genotypes are highly concordant with the current
+catalog of the National Institute of Standards and Technology for
+both SNVs (>99.99%) and indels (99.92%) and add a validated truth catalog
+that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that
+were consistent between informatics pipelines yet inconsistent with haplotype
+transmission ("nonplatinum") revealed that the majority of these variants
+are de novo and cell-line mutations or reside within previously unidentified
+duplications and deletions. The reference materials from this study are a
+resource for objective assessment of the accuracy of variant calls
+throughout genomes.
+</p>
+
+<p>
+The 'hybrid' truthsets were generated by merging Genome in a Bottle
+high-confidence calls (HG001, v3.3.2) with those from the Platinum
+Genomes truthset for the same sample (NA12878, v2017-1.0). Merged
+records were validated by performing a k-mer test on alignments from
+the lower pedigree CEPH 1463 (11 children). Records with k-mer support
+via haplotype inheritance were added to the hybrid truthset.
+</p>
+
+<h2>Data Access</h2>
+<p>
+The VCF files for this track can be obtained from the download server:
+<a href="https://hgdownload.soe.ucsc.edu/gbdb/$db/platinumGenomes/" target=_blank>
+https://hgdownload.soe.ucsc.edu/gbdb/$db/platinumGenomes/</a>.<br>
+These files were obtained from the Platinum Genomes source archive:
+<a href="https://s3.eu-central-1.amazonaws.com/platinum-genomes/2017-1.0/ReleaseNotes.txt" target=_blank>https://s3.eu-central-1.amazonaws.com/platinum-genomes/2017-1.0/ReleaseNotes.txt</a>.
+</p>
+
+<h2>Reference</h2>
+
+<a href="https://genome.cshlp.org/content/27/1/157" target=_blank>
+A reference data set of 5.4 million phased human variants
+validated by genetic inheritance from sequencing a three-generation
+17-member pedigree</a><br>
+<em>Genome Research</em>. 2017 Jan;27(1):157-164. doi: 10.1101/gr.210500.116. Epub 2016 Nov 30.<br>
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/27903644" target="_blank">27903644</a><br>
+PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5204340/" target=_blank>PMC5204340</a>
+<p>
+Michael A. Eberle, Epameinondas Fritzilas, Peter Krusche, Morten Källberg,
+Benjamin L. Moore, Mitchell A. Bekritsky, Zamin Iqbal, Han-Yu Chuang,
+Sean J. Humphray, Aaron L. Halpern, Semyon Kruglyak, Elliott H. Margulies,
+Gil McVean and David R. Bentley
+</p>