7fd61a9b8e3db3a18bf285628ed3cd60cdf7731f
lrnassar
Wed Jul 31 10:19:09 2019 -0700
Updating old entries #23611
diff --git src/hg/htdocs/FAQ/FAQreleases.html src/hg/htdocs/FAQ/FAQreleases.html
index 1625e50..8a46eb9 100755
--- src/hg/htdocs/FAQ/FAQreleases.html
+++ src/hg/htdocs/FAQ/FAQreleases.html
@@ -10,31 +10,30 @@
 <h2>Topics</h2>
 <ul>
 <li><a href="#release1">List of UCSC genome releases</a></li>
 <li><a href="#release2">Initial assembly release dates</a></li>
 <li><a href="#release3">UCSC assemblies</a></li>
 <li><a href="#release4">Comparison of UCSC and NCBI human assemblies</a></li>
 <li><a href="#release12">Differences between UCSC and NCBI mouse assemblies</a></li>
 <li><a href="#release5">Accessing older assembly versions</a></li>
 <li><a href="#release6">Frequency of GenBank data updates</a></li>
 <li><a href="#release7">Coordinate changes between assemblies</a></li>
 <li><a href="#release8">Converting positions between assembly versions</a></li>
 <li><a href="#release9">Missing annotation tracks</a></li>
 <li><a href="#release10">What next with the human genome?</a></li>
 <li><a href="#release11">Mouse strain used for mouse genome sequence</a></li>
-<li><a href="#release13">UniProt (Swiss-Prot/TrEMBL) display changes</a></li>
 </ul>
 <hr>
 <p>
 <a href="index.html">Return to FAQ Table of Contents</a></p>
 <a name="release1"></a>
 <h2>List of UCSC genome releases</h2>
 <h6>How do UCSC's release numbers correspond to those of other organizations, such as NCBI?</h6>
 <p>
 The first release of an assembly is given a name using the first three characters of the organism's
 genus and species classification in the format gggSss#, with subsequent assemblies
 incrementing the number.
 Assemblies predating the 2003 introduction of the six-letter naming system were given two-letter
 names in a similar gs# format and human assemblies are named hg# for human genome.</p>
 <table border=1>
@@ -314,74 +313,67 @@
 <a href="#release1">List of UCSC Genome Releases</a>.</p>
 <p>
 The annotations accompanying an assembly are obtained from a variety of sources. The UCSC Genome
 Bioinformatics Group generates several of the tracks; the remainder are contributed by
 collaborators at other sites. Each track has an associated description page that credits the
 authors of the annotation.</p>
 <p>
 For detailed information about the individuals and organizations who contributed to a specific
 assembly, see the <a href="../goldenPath/credits.html">Credits</a> page.</p>
 <a name="release4"></a>
 <h2>Comparison of UCSC and NCBI human assemblies</h2>
 <h6>How do the human assemblies displayed in the UCSC Genome Browser differ from the NCBI human
 assemblies?</h6>
 <p>
-Recent human assemblies displayed in the Genome Browser (hg10 and higher) are identical to the
-NCBI assemblies.</p>
+Human assemblies displayed in the Genome Browser (hg10 and higher) are near identical to the
+NCBI assemblies when it comes to primary sequence. Minor differences may be present, however.
+Sources include:</p>
+<ul>
+ <li>We repeat mask our own genomes</li>
+ <li>All UCSC chroms use the 'chr1' nomenclature, whereas certain NCBI schemas may only
+ have the numbers (chr1 = 1)</li>
+ <li>The mitochondrion for hg19 differs from the one in NCBI (GRCh37)</li>
+</ul>
 <a name="release12"></a>
 <h2>Differences between UCSC and NCBI mouse assemblies</h2>
 <h6>Is the mouse genome assembly displayed in the UCSC Genome Browser the same as the one on the
 NCBI website?</h6>
 <p>
 The mouse genome assemblies featured in the UCSC Genome Browser are the same as those on the NCBI
 web site with one difference: the UCSC versions contain only the reference strain data (C57BL/6J).
 NCBI provides data for several additional strains in their builds.</p>
 <a name="release5"></a>
 <h2>Accessing older assembly versions</h2>
 <h6>I need to access an older version of a genome assembly that's no longer listed in the Genome
 Browser menu. What should I do?</h6>
 <p>
 In addition to the assembly versions currently available in the Genome Browser, you can access the
 data for older assemblies of the browser through our
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html" target="_blank">Downloads</a> page.</p>
 <a name="release6"></a>
 <h2>Frequency of GenBank data updates</h2>
 <h6>How frequently does UCSC update its databases with new data from GenBank?</h6>
 <p>
-Daily and weekly incremental updates of mRNA, RefSeq, and EST data are in place for several of the
-more recent Genome Browser assemblies. Assemblies that are not on an incremental update schedule
+GenBank updates for mRNA, RefSeq, and EST data occur on a semi-quarterly basis, following major
+NCBI releases. These updates are in place for most Genome Browser assemblies. Assemblies that
+are not on an incremental update schedule
 are updated whenever we load a new assembly or make a major revision to a table.</p>
 <p>
-Data are updated on the following schedule:</p>
-<ul>
- <li>
- Native and xeno mRNA and refSeq tracks: updated daily for human and mouse assemblies; updated
- approximately weekly for all other organisms</li>
- <li>
- EST data: updated weekly on Saturday morning</li>
- <li>
- Downloadable data files: updated weekly on Saturday morning</li>
- <li>
- Outdated sequences - removed once per quarter</li>
-</ul>
-<p>
-Mirror sites are not required to use an incremental update process, and should not experience
-problems as a result of these updates.</p>
 <a name="release7"></a>
 <h2>Coordinate changes between assemblies </h2>
 <h6>I noticed that the chromosomal coordinates for a particular gene that I'm looking at have
 changed since the last time I used your browser.
 What happened?</h6>
 <p>
 A common source of confusion for users arises from mixing up different assemblies. It is very
 important to be aware of which assembly you are looking at. Within the Genome Browser display,
 assemblies are labeled by organism and date. To look up the corresponding UCSC database name or
 NCBI build number, use the <a href="#release1">release table</a>.</p>
 <p>
 UCSC database labels are of the form hg<em>#</em>, panTro<em>#</em>, etc. The letters designate
 the organism, e.g. <em>hg</em> for human genome or <em>panTro</em> for <em>Pan troglodytes</em>.
 The number denotes the UCSC assembly version for that organism. For example, ce1 refers to the
 first UCSC assembly of the <em>C. elegans</em> genome.</p>
@@ -423,96 +415,16 @@
 Rest assured that work will continue. There will be updates to the assembly over the next several
 years. This has been the case for all other finished (i.e. essentially complete) genome assemblies
 as gaps are closed. For example, the <em>C. elegans</em> genome has been "finished" for several
 years, but small bits of sequence are still being added and corrections are being made. NCBI will
 continue to coordinate the human genome assemblies in collaboration with the individual chromosome
 coordinators, and UCSC will continue to QC the assembly in conjunction with NCBI (and, to a lesser
 extent, Ensembl). UCSC, NCBI, Ensembl, and others will display the new releases on their sites as
 they become available.</p>
 <a name="release11"></a>
 <h2>Mouse strain used for mouse genome sequence</h2>
 <h6>What strain of mouse was used for the Mus musculus genome?</h6>
 <p>
 C57BL/6J.</p>
-<a name="release13"></a>
-<h2>UniProt (Swiss-Prot/TrEMBL) display changes</h2>
-<h6>What has UCSC done to accommodate the changes to display IDs recently introduced by UniProt
-(aka Swiss-Prot/TrEMBL)?</h6>
-<p>
-Here is a detailed description of the database changes we have made to accommodate the UniProt
-changes.
-If you are using the <em>proteinID</em> field in our knownGene table or the
-Swiss-Prot/TrEMBL display ID for indexing or cross-referencing other data, we strongly suggest you
-transition to the UniProt accession number. These changes will also affect anyone who is mirroring
-our site.</p>
-<ol>
- <li>
- The latest UniProt Knowledgebase (Release 46.0, Feb. 1st, 2005) was parsed and the results were
- stored in a newly created database <em>sp050201</em>.</li>
- <li>
- A corresponding database, <em>proteins050201</em>, was constructed based on data in
- <em>sp050201</em> and other protein data sources.</li>
- <li>
- Two new symbolic database pointers, <em>uniProt</em> and <em>proteome</em>, have been created to
- point to the two new databases mentioned above. Some parts of our programs use the data in these
- two DBs.
- <pre><code>uniProt ---> sp050201
-proteome ---> proteins050201</code></pre></li>
- <li>
- The existing protein symbolic database pointers, <em>swissProt</em> and <em>proteins</em> remain
- unchanged. Some parts of our programs still use these two pointers and the data in their
- associated protein databases.
- <pre><code>swissProt ---> sp041115
-proteins ---> proteins041115</code></pre></li>
- <li>
- Two new tables, <em>spOldNew</em> and <em>uniProtAlias</em>, have been added to the proteome
- database.<br><br>
- The <em>spOldNew</em> table contains three columns:
- <ul>
- <li><em>acc</em> -- primary accession number</li>
- <li><em>oldDisplayId</em> -- old display ID</li>
- <li><em>newDisplayId</em> -- new display ID</li>
- </ul>
- <br>
- The uniProtAlias table contains four columns:
- <ul>
- <li><em>acc</em> -- UniProt accession number</li>
- <li><em>alias</em> -- alias (could be acc, old and new display IDs, etc.)</li>
- <li><em>aliasSrc</em> -- source of the alias type</li>
- <li><em>aliasSrcDate</em> -- date of the source data</li>
- </ul>
- <p>
- The aliases include primary accessions, secondary accessions new display IDs, old display IDs,
- and old display IDs corresponding to new secondary accessions.</p>
- <li>
- Three new functions have been added to <em>kent/src/hg/spDb.c</em>:
- <pre><code>char *oldSpDisplayId(char *newSpDisplayId);
-/* Convert from new Swiss-Prot display ID to old display ID */
-
-char *newSpDisplayId(char *oldSpDisplayId);
-/* Convert from old Swiss-Prot display ID to new display ID */
-
-char *uniProtFindPrimAcc(char *id);
-/* Return primary accession given an alias. */</code></pre>
- The <em>uniProtFindPrimAcc()</em> function is enabled by the new <em>uniProtAlias</em>
- table.</li>
-</ol>
-<p>
-We anticipate additional changes down the road and may eventually merge the two sets of protein DB
-pointers into one set.</p>
-<p>
-Currently, the <em>proteinID</em> field of the knownGene table for existing genome releases (hg15,
-hg16, hg17, mm3, mm4, mm5, rn2, and rn3) uses old Swiss-Prot/TrEMBL display IDs (pre-1 Feb. '05).
-In the future, we may change this field to show the UniProt accession number.
-Should we choose not
-to change the content of the <em>proteinID</em> field, we may consider adding a new field,
-<em>uniProtAcc</em>.</p>
-<p>
-If you have any questions about these changes and their impact on your work, please email us at
-<a href="mailto:genome@soe.ucsc.edu">genome@soe.ucsc.edu</a>. Mirror sites may send questions to
-<a href="mailto:genome-mirror@soe.ucsc.edu">genome-mirror@soe.ucsc.edu</a>.
-<strong><em><span class="gbsWarnText">Messages sent to these addresses will be posted to the
-moderated mailing lists, which are archived on a SEARCHABLE, PUBLIC
-<a HREF="https://groups.google.com/a/soe.ucsc.edu/forum/#!forum/genome-mirror">Google Groups
-forum</a></span></em></strong>.</p>
-
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->