e107868e195088405fd83fa63173ecc25e4b05b2
ann
Tue Jun 4 14:47:03 2019 -0700
a few small changes

diff --git src/hg/htdocs/browserAgreement.html src/hg/htdocs/browserAgreement.html
index e52df18..fe11284 100755
--- src/hg/htdocs/browserAgreement.html
+++ src/hg/htdocs/browserAgreement.html
@@ -1,98 +1,102 @@
The purpose of this document is to establish a common set of minimum requirements for public display of genome data by the Ensembl, NCBI and UCSC browsers/annotation groups. This is a
-follow up document to informal discussions held at the Biology of Genomes meeting at Cold
+follow-up document to informal discussions held at the Biology of Genomes meeting at Cold
Spring Harbor, NY in May of 2008.
Previously, the only agreement among the major browsers was to display the same set of reference coordinates for the human genome reference assembly. This has largely extended to other organisms as well, but issues remain that can lead to differences in the data provided
-by the browsers. The issue that likely causes the largest number of problems is the annotatio
+by the browsers. The issue that likely causes the largest number of problems is the annotation
and display of genome assembly data prior to deposition of the genome assembly to the International Nucleotide Sequence Database Collaboration (INSDC), commonly referred to as DDBJ/EMBL/GenBank. The most common problems are (in increasing order of severity):
Inconsistent sequence identifiers increase the level of difficulty when trying to exchange annotation sets. This has been apparent as NCBI and Ensembl have tried to exchange gene model datasets for organisms other than human and mouse. Of note, all browsers get these two assemblies from a single source.
Inconsistent assembly identifiers make it difficult for users to know which coordinate system is being displayed, regardless of the data source.
Upon deposition of the data to the INSDC, quality control exercises will often uncover problems with the assembly that is initially submitted. In many cases, the submitter is interested in correcting these errors but the corrections may not get propagated to any browser that has already picked up the data. An addendum to this item is the inconsistent handling of unplaced sequences. Some groups choose to concatenate these sequences into a pseudo molecule, while others leave these as independent sequences. The inconsistent use of sequence identifiers increases the difficulty of mapping annotations amongst sources.
Once assemblies get picked up, annotated and displayed in a browser, the initial sequencing and assembly group may have little incentive to submit this assembly to the INSDC. For example, the
-Xenopus tropicalis assembly has been 'available' since August, 2005 but has never been
+Xenopus tropicalis assembly has been 'available' since August 2005 but has never been
submitted to the INSDC.
Beginning in the spring of 2009, with the release of the Genome Reference Consortium Human Build 37, Ensembl, NCBI and UCSC agree that:
The terms of this document are not meant to be retroactive, and data currently displayed in any of the browsers that do not meet these criteria do not need to be removed. However, we should endeavor to begin bringing all genome assembly data into compliance moving forward.