1c1e57e11e060937653c32bed082331dcaa93107 ann Tue Jun 4 14:17:57 2019 -0700 3-way browser genome release agreement diff --git src/hg/htdocs/browserAgreement.html src/hg/htdocs/browserAgreement.html new file mode 100755 index 0000000..e52df18 --- /dev/null +++ src/hg/htdocs/browserAgreement.html @@ -0,0 +1,98 @@ +<!DOCTYPE html> +<!--#set var="TITLE" value="Browser Genome Release Agreement" --> +<!--#set var="ROOT" value="." --> + +<!-- Relative paths to support mirror sites with non-standard GB docs install --> +<!--#include virtual="$ROOT/inc/gbPageStart.html" --> + +<h1>Browser Genome Release Agreement</h1> + +<h2>Purpose</h2> +<p>The purpose of this document is to establish a common set of minimum requirements for public +display of genome data by the Ensembl, NCBI and UCSC browsers/annotation groups. This is a +follow-up document to informal discussions held at the Biology of Genomes meeting at Cold +Spring Harbor, NY in May of 2008.</p> + +<h2>Background</h2> +<p>Previously, the only agreement among the major browsers was to display the same set of +reference coordinates for the human genome reference assembly. This has largely extended to +other organisms as well, but issues remain that can lead to differences in the data provided +by the browsers. The issue that likely causes the largest number of problems is the annotation +and display of genome assembly data prior to deposition of the genome assembly to the +International Nucleotide Sequence Database Collaboration (INSDC), commonly referred to as +DDBJ/EMBL/GenBank. The most common problems are (in increasing order of severity):</p> + +<h6>Inconsistent sequence identifiers amongst browsers.</h6> +<p>Inconsistent sequence identifiers increase the level of difficulty when trying to exchange +annotation sets. This has been apparent as NCBI and Ensembl have tried to exchange gene model +datasets for organisms other than human and mouse. 
Of note, all browsers get these two assemblies +from a single source.</p> + +<h6>Inconsistent assembly identifiers amongst browsers.</h6> +<p>Inconsistent assembly identifiers make it difficult for users to know which coordinate system +is being displayed, regardless of the data source.</p> + +<h6>Different sequence data amongst browsers.</h6> +<p>Upon deposition of the data to the INSDC, quality control exercises will often uncover problems +with the assembly that is initially submitted. In many cases, the submitter is interested in +correcting these errors but the corrections may not get propagated to any browser that has already +picked up the data. An addendum to this item is the inconsistent handling of unplaced sequences. +Some groups choose to concatenate these sequences into a pseudo molecule, while others leave these +as independent sequences. The inconsistent use of sequence identifiers increases the difficulty of +mapping annotations amongst sources.</p> + +<h6>Some assemblies are never submitted to the INSDC.</h6> +<p>Once assemblies get picked up, annotated and displayed in a browser, the initial sequencing and +assembly group may have little incentive to submit this assembly to the INSDC. For example, the +<i>Xenopus tropicalis</i> assembly has been 'available' since August, 2005 but has never been +submitted to the INSDC.</p> + +<h2>Agreement</h2> +<p>Beginning in the spring of 2009, with the release of the Genome Reference Consortium Human +Build 37 release, Ensembl, NCBI and UCSC agree that:</p> + +<ul> + <li>Data will be displayed only after it has been released by the INSDC.</li> + <ul> + <li>This document deals solely with the deposition of the genome assembly (contigs + + scaffolds). Submission of annotation is not a requirement for public display of genome + assembly data in any of the browsers.</li> + <li>It is anticipated that most genome assemblies will be able to be deposited to the INSDC. 
+ However, in the event that a genome assembly does not meet the INSDC criteria for submission, + the genome browsers will be free to show this data. It is anticipated that the browsers will + work together to provide a consistent view of the assembly and its identifiers.</li> + <li>Assembly submitters can use the Hold Until Publication (HUP) mode of submission. Once the + assembly is accessioned, even if it has a HUP status, the submitter can distribute the + assembly to any third party browser/annotation group. However, the data for these assemblies + should not be made public by the browsers until the HUP status has been removed and the + assembly data are public in the INSDC.</li> + </ul> + <li>The sequence identifiers used in the browser and publicly distributed via FTP should be + correlated with the INSDC records.</li> + <ul> + <li>Browsers can use alternate sequence identifiers but it should be clear how these + identifiers map to the INSDC record. Ideally, this will cause minimal disruption to dataflow + but still provide a framework for easy data exchange between the various groups. This implies + that the starting AGP files should use the INSDC accession.version to identify all of the + objects and components describing the assembly. For a reminder of AGP definitions, please see + the specification found + <a href=http://www.ncbi.nlm.nih.gov/projects/genome/assembly/agp/AGP_Specification.shtml target="_blank">here</a>.</li> + </ul> + <li>All browsers will refer to any given assembly by the same name, preferably a submitter + approved name. This should be collected at the time of assembly submission and guidance should + be given to the submitting group in terms of selecting an appropriate name.</li> + <ul> + <li>There are several assemblies submitted that have no real submitter approved name. 
In + these cases, every effort should be made by the browsers to reconcile the names/assemblies + so that it is clear to users what data is being supplied at each browser and that data + exchange between the browsers/annotation groups is facilitated.</li> + <li>Browser-specific assembly names are permitted only as an adjunct to the official, + submitter-approved name, not as a replacement for the official name.</li> + </ul> +</ul> + +<p>The terms of this document are not meant to be retroactive, and data currently displayed in any +of the browsers that do not meet these criteria do not need to be removed. However, we should +endeavor to begin bringing all genome assembly data into compliance moving forward.</p> + +<!--#include virtual="$ROOT/inc/gbPageEnd.html" -->