481dd77123df9a0bd00a6dedb595bbcf6546f3e7
gperez2
Wed Apr 12 18:04:06 2023 -0700
Adding FAQ for differences in conservation scores between download file and hgTracks/hgTables, refs #30734

diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html
index a5c3da2..08a4673 100755
--- src/hg/htdocs/FAQ/FAQdownloads.html
+++ src/hg/htdocs/FAQ/FAQdownloads.html
@@ -41,30 +41,31 @@

Data from Evolutionary Conservation Score tracks

Mapping UCSC STS marker IDs to those of other groups

deCODE map data

Direct MariaDB (MySQL) access to data

Name of fourth column in BED output

Track data access

How do I download dbSNP data?

Why doesn't this SNP have two alleles?

Known issues with Table Browser GTF output

Table Browser output file not ordered

'Permission denied' error when trying to use command-line utilities

Restricted Track Data

What is the genome analysis set?

How do I download GenArk data?

Why are the conservation scores different from the ones in the download file?

Return to FAQ Table of Contents

Downloading sequence and annotation data

How do I obtain the sequence and/or annotation data for a release?

Sequence and annotation data downloads are usually made available within the first week of the release of a new assembly. The download directories are automatically updated nightly to incorporate additions and modifications to the data.

You can download sequence and annotation data using our FTP server, but we recommend using rsync, which has the advantage of starting up where it left off

@@ -1206,16 +1207,28 @@

than our goldenPath, SQL, or gbdb file directories. There are two ways to access this data for download. First, you can go to the GenArk page and select your clade (primates, mammals, birds, etc.) and then you will be brought to a page with a table of species and GCA/GCF assembly identifiers. Find your genome and click on the third column, labeled "Scientific name and data download", which will take you to the download directory for that species.

Alternatively, you can enter your GCA/GCF identifier in the URL in groups of three characters, separated by slashes. For example, the identifier "GCA_004027835.1" has data in the following directory:

https://hgdownload.soe.ucsc.edu/hubs/GCA/004/027/835/
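The rule for building that directory URL can be sketched in a few lines of Python. This is only an illustration: the function name `genark_download_url` is hypothetical, and the sketch assumes that the accession always has a nine-digit numeric part, that the version suffix (".1") is not part of the path, and that GCF identifiers follow the same layout as the GCA example shown here.

```python
def genark_download_url(accession: str) -> str:
    """Build a GenArk download directory URL from a GCA/GCF accession.

    Hypothetical helper: assumes a nine-digit numeric part and that
    GCF accessions use the same directory layout as GCA accessions.
    """
    base = "https://hgdownload.soe.ucsc.edu/hubs"
    # "GCA_004027835.1" -> prefix "GCA", rest "004027835.1"
    prefix, _, rest = accession.partition("_")
    # Drop the version suffix: "004027835.1" -> "004027835"
    digits = rest.split(".")[0]
    if prefix not in ("GCA", "GCF") or len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"unexpected accession format: {accession!r}")
    # Split the digits into groups of three: ["004", "027", "835"]
    groups = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "/".join([base, prefix, *groups]) + "/"


print(genark_download_url("GCA_004027835.1"))
```

For the identifier used in the example above, this reproduces the directory URL given in the text.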


Conservation scores downloads

Why are the conservation scores on the UCSC Genome Browser site different from the ones in the download file?

The conservation scores, for both PhastCons and PhyloP, differ because the wiggle database format (from which the details page and Table Browser scores are extracted) uses lossy compression. The compression keeps enough resolution to display the pixelated scores in the browser graphic display but does not reconstruct the true original scores. This is why we make the original score files available for download.
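To see why lossy compression changes the values, here is a minimal Python sketch of one generic lossy scheme: each score in a block is mapped to one of a fixed number of levels between the block's minimum and maximum, and only the level is stored. The scheme, the choice of 128 levels, the function names, and the sample scores are all assumptions made for illustration; this FAQ does not describe the exact encoding the wiggle database format uses.

```python
def quantize(scores, levels=128):
    """Map each score to an integer level in [0, levels - 1].

    Illustrative only: the number of levels and the per-block
    min/range scaling are assumptions, not the documented format.
    """
    lo = min(scores)
    span = max(scores) - lo
    if span == 0:
        return lo, span, [0] * len(scores)
    codes = [round((s - lo) / span * (levels - 1)) for s in scores]
    return lo, span, codes


def reconstruct(lo, span, codes, levels=128):
    """Recover approximate scores from the stored levels."""
    return [lo + span * c / (levels - 1) for c in codes]


# Made-up example scores, not real PhastCons or PhyloP values.
original = [0.001, 0.137, 0.502, 0.998]
lo, span, codes = quantize(original)
approx = reconstruct(lo, span, codes)

for o, a in zip(original, approx):
    print(f"original={o:.6f}  reconstructed={a:.6f}  error={abs(o - a):.6f}")
```

In a scheme like this, the reconstructed values are close enough to draw a graph at screen resolution, but each one can be off by up to half a quantization step, so they will generally not match the original scores digit for digit.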