b783f9e658057de6ade55b8d8b8932c9e6d606c8
brianlee
  Fri Mar 11 14:13:58 2022 -0800
At b0b's request revising the Platinum Genomes Track Description page, no RM

diff --git src/hg/makeDb/trackDb/human/platinumGenomes.html src/hg/makeDb/trackDb/human/platinumGenomes.html
index 0f9efd4..da191f9 100644
--- src/hg/makeDb/trackDb/human/platinumGenomes.html
+++ src/hg/makeDb/trackDb/human/platinumGenomes.html
@@ -1,55 +1,73 @@
 <h2>Abstract</h2>
-
 <p>
-Improvement of variant calling in next-generation sequence data requires
-a comprehensive, genome-wide catalog of high-confidence variants called in
-a set of genomes for use as a benchmark. We generated deep, whole-genome
-sequence data of 17 individuals in a three-generation pedigree and called
-variants in each genome using a range of currently available algorithms.
-We used haplotype transmission information to create a phased "Platinum"
-variant catalog of 4.7 million single-nucleotide variants (SNVs)
-plus 0.7 million small (1-50 bp) insertions and deletions (indels) that are
-consistent with the pattern of inheritance in the parents and 11 children
-of this pedigree. Platinum genotypes are highly concordant with the current
-catalog of the National Institute of Standards and Technology for
-both SNVs (&gt;99.99%) and indels (99.92%) and add a validated truth catalog
-that has 26% more SNVs and 45% more indels. Analysis of 334,652 SNVs that
-were consistent between informatics pipelines yet inconsistent with haplotype
-transmission ("nonplatinum") revealed that the majority of these variants
-are de novo and cell-line mutations or reside within previously unidentified
-duplications and deletions. The reference materials from this study are a
-resource for objective assessment of the accuracy of variant calls
-throughout genomes. 
-</p>
-
+These tracks show high-confidence &quot;Platinum Genome&quot; variant calls for two individuals,
+NA12877 and NA12878, part of a sequenced 17-member pedigree for family number
+<a href="https://catalog.coriell.org/0/Sections/Collections/NIGMS/CEPHFamiliesDetail.aspx?PgId=441&fam=1463"
+target="_blank">1463</a>, from the Centre d'Etude du Polymorphisme Humain (CEPH). The hybrid
+track displays a merging of the NA12878 results with variant calls produced by Genome in a
+Bottle, discussed further below. CEPH is an international genetic research center that provides
+a resource of immortalised cell cultures used to map genetic markers, and pedigree 1463
+represents a family lineage from Utah of four grandparents, two parents, and 11 children.
+The whole pedigree was sequenced to 50x depth on a HiSeq 2000 Illumina system, which is
+considered a platinum standard, where platinum refers to the quality and completeness of
+the resulting assembly, such as providing full chromosome scaffolds with phasing and
+haplotypes resolved across the entire genome.</p>
+<p><img class="text-center" src="/images/platinumTree.jpg" width="400px"></p>
 <p>
-The 'hybrid' truthsets were generated by merging Genome in a Bottle
-high confidence calls (hg001, v3.3.2) with those from the Platinum
-Genomes truthset for the same sample (NA12878, v2017-1.0). Merged
-records were validated by performing a k-mer test on alignments from
-the lower pedigree CEPH 1463 (11 children). Records with k-mer support
-via haplotype inheritance were added to the hybrid truthset.
-</p>
+This figure depicts the pedigree of the family sequenced for this study, where the ID for each
+sample is defined by adding the prefix NA128 to each numbered individual, so that 77 = NA12877
+and 78 = NA12878, corresponding to the VCF tracks available in this track set. The dark orange
+individuals indicate sequences used in the analysis methods, whereas the blue represent the
+founder generations (grandparents), which were also sequenced and used in validation steps.
+The genomes of the parent-child trio on the top right side, 91-92-78, were also sequenced
+during Phase I of the 1000 Genomes Project.</p>
+<p>
+These tracks represent a comprehensive genome-wide set of phased small variants that has been
+validated to high confidence. Sequencing and phasing a larger pedigree, beyond the two parents
+and one child, increases the ability to detect errors and assess the accuracy of more of the
+variants compared to a standard trio analysis. The genetic inheritance data enables creating a more
+comprehensive catalog of &quot;platinum variants&quot; that reflects both high accuracy and
+completeness. These results are significant because a comprehensive set of valid
+single-nucleotide variants (SNVs) and insertions and deletions (indels),
+in both the easy and difficult parts of the genome, provides a vital resource for software
+developers creating the next generation of variant callers; these are the areas where
+current methods most need training data to improve. Since every one of the
+variants in this catalog is phased, this data set provides a resource to better assess emerging
+technologies designed to generate valid phasing information. To generate the calls, six analysis
+pipelines were used to call SNVs and indels, and the results were merged into one catalog. The
+genetic inheritance information helped detect genotyping errors and maximize the chance of
+including true variants that might otherwise be removed by suboptimal filtering. Read more
+about the detailed methods in the referenced paper, which further describes this variant catalog
+of 4.7 million SNVs plus 0.7 million small (1-50 bp) indels that are all consistent with
+the pattern of inheritance in the parents and 11 children of this pedigree.</p>
+<p>
+The hybrid track in this set extends the characterisation of NA12878
+by incorporating high-confidence calls produced by Genome in a Bottle analysis.
+The resulting merged files contain more comprehensive coverage of variation than either
+set independently; for instance, the hg19 version contains over 80,000 more indels than
+either input set. Read more about the hybrid methods at the following link:
+<a href="https://github.com/Illumina/PlatinumGenomes/wiki/Hybrid-truthset"
+target="_blank">https://github.com/Illumina/PlatinumGenomes/wiki/Hybrid-truthset</a></p>
 
 <h2>Data Access</h2>
 <p>
 The VCF files for this track can be obtained from the download server:
 <a href="https://hgdownload.soe.ucsc.edu/gbdb/$db/platinumGenomes/" target=_blank>
 https://hgdownload.soe.ucsc.edu/gbdb/$db/platinumGenomes/</a>.<br>
 These files were obtained from the Platinum genomes source archive:
 <a href="https://s3.eu-central-1.amazonaws.com/platinum-genomes/2017-1.0/ReleaseNotes.txt" target=_blank>https://s3.eu-central-1.amazonaws.com/platinum-genomes/2017-1.0/ReleaseNotes.txt</a>.
 </p>
 
 <h2>Reference</h2>
 
 <p>
 Eberle MA, Fritzilas E, Krusche P, K&#228;llberg M, Moore BL, Bekritsky MA, Iqbal Z, Chuang HY,
 Humphray SJ, Halpern AL <em>et al</em>.
 <a href="https://genome.cshlp.org/content/27/1/157" target="_blank">
 A reference data set of 5.4 million phased human variants validated by genetic inheritance from
 sequencing a three-generation 17-member pedigree</a>.
 <em>Genome Res</em>. 2017 Jan;27(1):157-164.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/27903644" target="_blank">27903644</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5204340/" target="_blank">PMC5204340</a>
 </p>