d5f33c3bbb34ad7fd6c08714156b9dfbf292d3bf
jnavarr5
Tue May 5 16:26:38 2020 -0700
Adding a FAQ about the asmEquivalent table, refs #21074

diff --git src/hg/htdocs/FAQ/FAQreleases.html src/hg/htdocs/FAQ/FAQreleases.html
index 59b3caa..c45e048 100755
--- src/hg/htdocs/FAQ/FAQreleases.html
+++ src/hg/htdocs/FAQ/FAQreleases.html
@@ -310,30 +310,64 @@ Browser?

All the assembly data displayed in the UCSC Genome Browser are obtained from external sequencing centers. To determine the data source and version for a given assembly, see the assembly's description on the Genome Browser Gateway page or the List of UCSC Genome Releases.

The annotations accompanying an assembly are obtained from a variety of sources. The UCSC Genome Bioinformatics Group generates several of the tracks; the remainder are contributed by collaborators at other sites. Each track has an associated description page that credits the authors of the annotation.

For detailed information about the individuals and organizations who contributed to a specific assembly, see the Credits page.

+ +
Which UCSC assemblies are equivalent to Ensembl or NCBI assemblies?
+

+The asmEquivalent table in the hgFixed database, available on the public MySQL server, shows which assembly versions are identical (or almost identical) to each other among UCSC, Ensembl, GenBank, and RefSeq assemblies.

+
+mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -e 'desc asmEquivalent;' hgFixed
++----------------------+-------------------------------------------+
+| Field                | Type                                      |
++----------------------+-------------------------------------------+
+| source               | varchar(255)                              |
+| destination          | varchar(255)                              |
+| sourceAuthority      | enum('ensembl','ucsc','genbank','refseq') |
+| destinationAuthority | enum('ensembl','ucsc','genbank','refseq') |
+| matchCount           | bigint(20)                                |
+| sourceCount          | bigint(20)                                |
+| destinationCount     | bigint(20)                                |
++----------------------+-------------------------------------------+
+

+The "Count" fields are counts of individual sequences in the assemblies. When all three counts are equal (matchCount == sourceCount == destinationCount), the two genome assemblies match exactly.
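The exact-match condition above can be written as a query. The following is a minimal sketch, not an official UCSC recipe: the column names and connection options are taken from the `desc` command shown earlier, while the helper names `perfect_match_query` and `run_hgfixed_query` and the example assembly name `hg38` are hypothetical. It also assumes that the `source` column holds assembly names in the form you pass in.

```shell
# Sketch only: helper names and the example assembly are assumptions.
# Column names come from the 'desc asmEquivalent' output.

# Print a query that lists the assemblies exactly equivalent to
# the assembly named in $1 (all three counts equal).
perfect_match_query() {
  printf "%s %s\n" \
    "SELECT destination, destinationAuthority FROM asmEquivalent" \
    "WHERE source='$1' AND matchCount = sourceCount AND matchCount = destinationCount;"
}

# Run a query on the public MySQL server, using the same
# connection options as the 'desc' command.
run_hgfixed_query() {
  mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -e "$1" hgFixed
}

# Usage (needs network access and a mysql client):
#   run_hgfixed_query "$(perfect_match_query hg38)"
```

Building the query text separately from running it makes it easy to inspect the SQL before sending it to the server.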

+

+Imperfect matches can be due to a number of factors:

  1. different or missing chrMT genome sequences in an assembly
  2. identical duplicated sequences present in or absent from an assembly
  3. some smaller contigs not included in an assembly
  4. slight differences between assembly versions, where one contains sequences not in the other
+
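To see how far an inexact match is from exact, the counts can be compared directly. The sketch below assumes that matchCount is the number of sequences shared by both assemblies, which the text implies but does not state. Under that assumption, sourceCount - matchCount and destinationCount - matchCount give the number of sequences found on only one side. The helper name `imperfect_match_query`, the column aliases `sourceOnly` and `destinationOnly`, and the example assembly name `hg38` are hypothetical.

```shell
# Sketch only: the helper name, the aliases, and the meaning given to
# matchCount are assumptions. Column names come from the
# 'desc asmEquivalent' output.

# Print a query that lists inexact equivalents of the assembly named
# in $1, with the number of unmatched sequences on each side.
imperfect_match_query() {
  printf "%s %s %s %s\n" \
    "SELECT destination, destinationAuthority," \
    "sourceCount - matchCount AS sourceOnly, destinationCount - matchCount AS destinationOnly" \
    "FROM asmEquivalent" \
    "WHERE source='$1' AND NOT (matchCount = sourceCount AND matchCount = destinationCount);"
}

# Usage (needs network access and a mysql client):
#   mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A \
#     -e "$(imperfect_match_query hg38)" hgFixed
```

The counts show only how many sequences differ. Working out which of the listed factors is responsible still requires comparing the assemblies themselves.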

+

Comparison of UCSC and NCBI human assemblies

How do the human assemblies displayed in the UCSC Genome Browser differ from the NCBI human assemblies?

Human assemblies displayed in the Genome Browser (hg10 and higher) are nearly identical to the NCBI assemblies in primary sequence. Minor differences may be present, however. Sources of these differences include: