e75ee7807a3856fa9fa49446575701e9c648a3c9
brianlee
Fri Mar 11 14:36:47 2022 -0800
Few extra changes to platinum genomes page.

diff --git src/hg/makeDb/trackDb/human/platinumGenomes.html src/hg/makeDb/trackDb/human/platinumGenomes.html
index da191f9..89e6ef6 100644
--- src/hg/makeDb/trackDb/human/platinumGenomes.html
+++ src/hg/makeDb/trackDb/human/platinumGenomes.html
@@ -1,73 +1,73 @@
-Abstract
+Description

-These tracks shows high-confidence "Platinum Genome" variant calls for two individuals,
+These tracks show high-confidence "Platinum Genome" variant calls for two individuals,
NA12877 and NA12878, part of a sequenced 17 member pedigree for family number 1463, from the Centre d'Etude du Polymorphisme Humain (CEPH). The hybrid track displays a merging of the NA12878 results with variant calls produced by Genome in a Bottle, discussed further below. CEPH is an international genetic research center that provides
-a resource of immortalised cell cultures used to map genetic markers, and pedigree 1463
+a resource of immortalized cell cultures used to map genetic markers, and pedigree 1463
represents a family lineage from Utah of four grandparents, two parents, and 11 children. The whole pedigree was sequenced to 50x depth on a HiSeq 2000 Illumina system, which is considered a platinum standard, where platinum refers to the quality and completeness of the resulting assembly, such as providing full chromosome scaffolds with phasing and haplotypes resolved across the entire genome.

This figure depicts the pedigree of the family sequenced for this study, where the ID for each sample is defined by adding the prefix NA128 to each numbered individual, so that 77 = NA12877 and 78 = NA12878, corresponding to the VCF tracks available in this track set. The dark orange individuals indicate sequences used in the analysis methods, whereas the blue represent the founder generations (grandparents), which were also sequenced and used in validation steps.
-The genomes of the parent child trio on the top right side, 91-92-78, were also sequenced
+The genomes of the parent-child trio on the top right side, 91-92-78, were also sequenced
during Phase I of the 1000 Genomes Project.
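The naming rule in the figure caption (prefix NA128 plus the two-digit figure number) can be sketched in Python. The helper name `sample_id` is hypothetical and does not come from the track page or the commit. It only illustrates the rule as stated.

```python
def sample_id(number: int) -> str:
    """Map a pedigree figure number (e.g. 77) to its sample ID (e.g. NA12877).

    Hypothetical helper: applies the "NA128" prefix rule from the caption.
    """
    if not 0 <= number <= 99:
        raise ValueError("expected a two-digit figure number")
    return f"NA128{number:02d}"


# The parent-child trio mentioned in the caption, 91-92-78.
trio = [sample_id(n) for n in (91, 92, 78)]
```

Here `trio` holds the sample IDs for the three individuals that the caption says were also sequenced during Phase I of the 1000 Genomes Project.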

-These tracks represent a comprehensive genome-wide set of phased small variants that has been
+These tracks represent a comprehensive genome-wide set of phased small variants that have been
validated to high confidence. Sequencing and phasing a larger pedigree, beyond the two parents and one child, increases the ability to detect errors and assess the accuracy of more of the variants compared to a standard trio analysis. The genetic inheritance data enables creating a more comprehensive catalog of "platinum variants" that reflects both high accuracy and completeness. These results are significant as a comprehensive set of valid single-nucleotide variants (SNVs) and insertions and deletions (indels), in both the easy and difficult parts of the genome, provides a vital resource for software developers creating the next generation of variant callers, because these are the areas where the current methods most need training data to improve their methods. Since every one of the variants in this catalog is phased, this data set provides a resource to better assess emerging technologies designed to generate valid phasing information. To generate the calls, six analysis
-pipelines to call SNVs and indels were used, and merged into one catalog, where sensitivity of
+pipelines to call SNVs and indels were used and merged into one catalog, where the sensitivity of
the genetic inheritance aided to detect genotyping errors and maximize the chance of only including true variants, that might otherwise be removed by suboptimal filtering. Read more about the detailed methods in the referenced paper, further describing this variant catalog of 4.7 million SNVs plus 0.7 million small (1-50 bp) indels, that are all consistent with the pattern of inheritance in the parents and 11 children of this pedigree.

-The hybrid track in this set extends the characterisation of NA12878
+The hybrid track in this set extends the characterization of NA12878
by incorporating high confidence calls produced by Genome in a Bottle analysis. The resulting merged files contain more comprehensive coverage of variation than either
-set independently, for instance the hg19 version contains over 80,000 more indels than
+set independently, for instance, the hg19 version contains over 80,000 more indels than
either input set. Read more about the hybrid methods at the following link: https://github.com/Illumina/PlatinumGenomes/wiki/Hybrid-truthset

Data Access

The VCF files for this track can be obtained from the download server: https://hgdownload.soe.ucsc.edu/gbdb/$db/platinumGenomes/.
These files were obtained from the Platinum genomes source archive: https://s3.eu-central-1.amazonaws.com/platinum-genomes/2017-1.0/ReleaseNotes.txt.
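
The download URL above contains the placeholder `$db`. As a minimal sketch, assuming `$db` stands for an assembly name such as the hg19 version mentioned in the description, the directory URL for one assembly could be built like this. The function name `platinum_genomes_url` is hypothetical, and the sketch does not guess at individual VCF file names inside the directory.

```python
from string import Template

# URL pattern copied from the Data Access text; "$db" is its placeholder.
DOWNLOAD_URL = Template("https://hgdownload.soe.ucsc.edu/gbdb/$db/platinumGenomes/")


def platinum_genomes_url(db: str) -> str:
    """Return the download directory URL for one assembly name.

    Assumption: assembly names are plain alphanumeric strings such as "hg19".
    """
    if not db.isalnum():
        raise ValueError(f"unexpected assembly name: {db!r}")
    return DOWNLOAD_URL.substitute(db=db)


print(platinum_genomes_url("hg19"))
```

The alphanumeric check is only a guard against pasting a path fragment where an assembly name is expected. It is not a list of the assemblies that actually carry this track.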

Reference

Eberle MA, Fritzilas E, Krusche P, Källberg M, Moore BL, Bekritsky MA, Iqbal Z, Chuang HY, Humphray SJ, Halpern AL et al. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 2017 Jan;27(1):157-164. PMID: 27903644; PMC: PMC5204340