src/hg/makeDb/trackDb/human/dgvPlus.html 0c7fee255303c1f274b1133ae232d73d14478021

0c7fee255303c1f274b1133ae232d73d14478021
lrnassar
  Fri Apr 7 14:20:49 2023 -0700
Staging the new DGV gold track and adding mouseOvers to other DGV tracks, refs #30908, #30909

diff --git src/hg/makeDb/trackDb/human/dgvPlus.html src/hg/makeDb/trackDb/human/dgvPlus.html
index 74caf39..5ac7f4f 100644
--- src/hg/makeDb/trackDb/human/dgvPlus.html
+++ src/hg/makeDb/trackDb/human/dgvPlus.html
@@ -1,59 +1,67 @@
 <H2>Description</H2>
 <P>
 This track displays copy number variants (CNVs), insertions/deletions (InDels),
 inversions and inversion breakpoints annotated by the
 <A HREF="http://dgv.tcag.ca/dgv/app/home"
 TARGET=_BLANK>Database of Genomic Variants</A> (DGV), which
 contains genomic variations observed in healthy individuals.
 DGV focuses on structural variation, defined as
 genomic alterations that involve segments of DNA that are larger than
 1000 bp.  Insertions/deletions of 50 bp or larger are also included.
 </P>
 
 <H2>Display Conventions</H2>
 <P>
-This track contains two subtracks:
+This track contains three subtracks:
 <p>
 <ul>
 <li>Structural Variant Regions: annotations that have been generated from one or more reported
 structural variants at the same location.
 </li>
 <li>Supporting Structural Variants: the sample-level reported structural variants.
 </li>
+<li>Gold Standard Variants: curated variants from a selected number of studies in DGV.
+</li>
 </ul>
 <P>
 Color is used in both subtracks to indicate the type of variation:
 <UL>
  <LI><B><span style="color:#C000C0;">Inversions</span></B> and
      <B><span style="color:#C000C0;">inversion breakpoints</span></B> are purple.
  </LI>
 
  <LI>CNVs and InDels are blue if there is a
    <B><span style="color:#0000C0;">gain in size</span></B> relative to the reference.
  </LI>
 
  <LI>CNVs and InDels are red if there is a
    <B><span style="color:#C00000;">loss in size</span></B> relative to the reference.
  </LI>
 
  <LI>CNVs and InDels are brown if there are reports of
    <B><span style="color:#8B4513;">both a loss and a gain in size</span></B>
    relative to the reference.
  </LI>
 </UL>
 </P>
+<p>
+The DGV Gold Standard subtrack uses a boxplot-like display to represent the
+merging of records, as explained in the Methods section below. In this subtrack, the
+middle box (where applicable) represents the high-confidence location of the CNV,
+while the thin lines and end boxes represent the possible range of the CNV.
+</p>
 <P>
 Clicking on a variant leads to a page with detailed information about the variant, 
 such as the study reference and PubMed abstract link, the study's method and any
 genes overlapping the variant. Also listed, if available, are the sequencing or array platform
 used for the study, a sample cohort description, sample size, sample ID(s) in which
 the variant was observed, observed gains and observed losses.
 If the particular variant is a merged variant, links to genome browser views of 
 the supporting variants are listed. If the particular variant is a supporting variant,
 a link to the genome browser view of its merged variant is displayed.
 A link to DGV's Variant Details page for each variant is also provided.
 </P>
 <P>
 For most variants, DGV uses <a href="http://dgv.tcag.ca/dgv/data-model/entd.html#Variant"
 target="_blank">accessions</a> from peer archives of structural variation
 (<A HREF="https://www.ncbi.nlm.nih.gov/dbvar/" TARGET=_BLANK>dbVar</A>
@@ -107,30 +115,41 @@
 Published structural variants are imported from peer archives
 <A HREF="https://www.ncbi.nlm.nih.gov/dbvar/" TARGET=_BLANK>dbVar</A> and
 <A HREF="https://www.ebi.ac.uk/dgva" TARGET=_BLANK>DGVa</A>.
 DGV then applies <a href="http://dgv.tcag.ca/dgv/app/faq#q4"
 target="_blank">quality filters</a> and merges overlapping variants.
 </P>
 <P>
 For data sets where the variation calls are reported at a
 sample-by-sample level, DGV merges calls with similar boundaries
 across the sample
 set. Only variants of the same type (i.e. CNVs, Indels, inversions)
 are merged, and gains and losses are merged separately.
 Sample level calls that overlap by &ge; 70% are merged in this
 process.
 </P>
+<p>
+The initial criteria for the Gold Standard set require that a variant
+be found in at least two different studies and in at least two different
+samples. After filtering out low-quality variants, the remaining variants are
+clustered according to a 50% minimum overlap, and each cluster is then merged into a single
+record. Gains and losses are merged separately.</p>
+<p>
+The highest-ranking variant in the cluster defines the inner box, while the
+outer lines define the maximum possible start and stop coordinates of the CNV.
+In this way, the inner box forms a high-confidence CNV location and the
+thin connecting lines indicate confidence intervals for the location of the CNV.</p>
 
 <h2>Data Access</h2>
 <p>
 The raw data can be explored interactively with the <a href="../hgTables">Table Browser</a>, or
 the <a href="../hgIntegrator">Data Integrator</a>. For automated access, this track, like all
 others, is available via our <a href="../goldenPath/help/api.html">API</a>. However, for bulk
 processing, it is recommended to download the dataset. The genome annotation is stored in a bigBed
 file that can be downloaded from the
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/dgv/">download server</a>.
 The exact filenames can be found in the track configuration file. Annotations can be converted to
 ASCII text by our tool <code>bigBedToBed</code> which can be compiled from the source code or
 downloaded as a precompiled binary for your system. Instructions for downloading source code and
 binaries can be found
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. The tool can
 also be used to obtain only features within a given range, for example:</p>