Question:
How do I learn more about different ENCODE file formats? For example what is the difference between a file.bed and a file.bed9 in the ENCODE methylation data?

Response:
Clicking the File Formats link on the ENCODE portal page brings up a list of the file formats used in ENCODE. Every ENCODE file also has metadata recorded in a "files.txt" file on the related downloads page. For example, from the HudsonAlpha DNA methylation download page, in the files.txt file, a line after the specific bed9 file in question, wgEncodeHaibMethylRrbsAg04449UwstamgrowprotSitesRep1.bed9, reads 'objstatus=replaced'. This metadata indicates that this bed9 file was preliminary data that has since been replaced. A similar note in the automatically displayed README file states: "WARNING - Revoked and replaced data files may be present in this directory."
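
As a rough sketch of how that check could be scripted, the shell function below looks up one file name in a local copy of files.txt and prints its status field. It assumes the metadata sits on the same line as the file name, laid out as the name, a tab, and then semicolon-separated key=value pairs. The function name obj_status is made up for illustration, and the real files.txt may differ in layout or in the capitalization of the key (the match here ignores case).

```shell
# Print the objStatus value recorded for one file in a files.txt listing.
# Assumed layout (not confirmed by the text above):
#   <file name><TAB>key=value; key=value; ...
# Usage: obj_status FILES_TXT FILE_NAME
obj_status() {
    awk -F '\t' -v name="$2" '
        $1 == name {
            # Break the metadata column into individual key=value pairs.
            n = split($2, pairs, /; */)
            for (i = 1; i <= n; i++) {
                if (tolower(pairs[i]) ~ /^objstatus=/) {
                    sub(/^[^=]*=/, "", pairs[i])   # keep only the value
                    print pairs[i]
                }
            }
        }' "$1"
}
```

Calling obj_status files.txt wgEncodeHaibMethylRrbsAg04449UwstamgrowprotSitesRep1.bed9 would print the status if one is recorded under these assumptions, and print nothing otherwise.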

DOWNLOAD ALL ENCODE DATA

Question:
Is there a service providing ENCODE data on a hard drive? What is the total data volume? We have been trying FTP, but it takes too much bandwidth and time.

Response:
The total volume of ENCODE data is greater than 31 TB. Unfortunately, it is not possible for you to obtain a disk copy. However, there is a new protocol to try called UDR (UDT Enabled Rsync), which provides much faster download rates.

Here is an example using UDR, once installed, to download all the mouse mm9 ENCODE information:

$ udr rsync -avP hgdownload.gi.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/

Please read more about the new UDR method here.

For those not downloading large amounts of data, we highly recommend using rsync. For example:

$ rsync -a -P rsync://hgdownload.gi.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeDir/wgEncodeFile ./

Using rsync has the advantage that, when run again after a failure, it resumes where it left off.
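
Because a rerun picks up where the last one stopped, a long transfer can simply be repeated until it succeeds. A minimal sketch of such a wrapper follows. The function name retry, the attempt limit, and the pause length are illustrative choices and not part of any UCSC tooling.

```shell
# Rerun a command until it exits successfully or the attempt limit is reached.
# Usage: retry MAX_ATTEMPTS PAUSE_SECONDS command [args...]
retry() {
    max="$1"; pause="$2"; shift 2
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$pause"   # wait before the next attempt
    done
}

# Example call, reusing the placeholder path from the rsync example above:
# retry 5 60 rsync -a -P rsync://hgdownload.gi.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeDir/wgEncodeFile ./
```

The wrapper only repeats the command; the resuming itself is done by rsync with the -P option, which keeps partially transferred files.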

Question:
For example, downloadable files in the wgEncodeCaltechRnaSeq/ directory have a gene_id format like gene_id "GM12878-rep1.1045777" where the first part is the cell type. Would you know what the last number, 1045777, means?

Response:
At the top of the page for each of the download directories you are visiting there is a README.txt file that is automatically displayed. A link is provided that will bring you to a user interface enabling filtering of files by cell type and other parameters, as well as including additional information such as release status, restriction dates, track description, methods, and metadata that can answer such questions.

For example, in the README.txt file displayed at the top of the page in the Caltech RNA-seq directory you can find the following link: "http://genome.ucsc.edu/cgi-bin/hgFileUi?db=hg19&g=wgEncodeCaltechRnaSeq"

By navigating to the page above, Caltech RNA-seq Downloadable Files, you can scroll to the bottom (or click the "Description" link in the top right corner) and read the track description's "Methods" section. The "Data Processing and Analysis" section explains that the numbers in gene_id, "GM12878-rep1.####", represent de novo identifiers output by Cufflinks software. At the very bottom of the page is a "Credits" section where contacts are listed. You should send remaining process-specific questions about the data you are investigating to the appropriate contact listed.

Question:
What program reads ".bb" TFBS files from ENCODE? I am interested in looking at the AWG TFBS data. I downloaded the files and one is called: spp.optimal.wgEncodeBroadHistoneGm12878CtcfStdAlnRep0_VS_wgEncodeBroadHistoneGm12878ControlStdAlnRep0.bb

However, I do not have a program that can open this file. What is the program for this file and where can I find it?

Response:
Files ending in ".bb" are bigBed files. Click here for extensive information on the bigBed format and how to extract data with different binary utilities located in this directory.
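
One of the UCSC binary utilities, bigBedToBed, converts a bigBed file to plain-text BED that any text tool can read. The sketch below wraps it, assuming the binary has already been downloaded and placed on your PATH. The helper names bed_name_for and convert_bb are hypothetical.

```shell
# Derive the output name by swapping the .bb suffix for .bed.
bed_name_for() {
    printf '%s\n' "${1%.bb}.bed"
}

# Convert one bigBed file to BED next to the input file.
# Usage: convert_bb FILE.bb
convert_bb() {
    in="$1"
    out="$(bed_name_for "$in")"
    if command -v bigBedToBed >/dev/null 2>&1; then
        bigBedToBed "$in" "$out"
    else
        echo "bigBedToBed not found on PATH" >&2
        return 127
    fi
}
```

For the file named in the question, convert_bb would write a file with the same name ending in ".bed" instead of ".bb".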

Question:
I am making a public hub for my paper, is there an example html file to use for my data description?

Response:
Other Examples:

Here are a few good examples of hub structure and configuration from the ENCODE Analysis hub:

http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/hub.txt
http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/genomes.txt
http://ftp.ebi.ac.uk/pub/databases/ensembl/encode/integration_data_jan2011/hg19/trackDb.txt

Note: We recommend a minimal number of default visible tracks in your trackDb.txt to quicken hub loading time and to avoid overwhelming users. For more suggestions on hub structure, please see our Public Hub Guidelines wikipage. Also, for help defining unfamiliar terms, you may want to see the Hub Track Database Definition's table of contents.
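
To illustrate the note above, here is a small sketch that generates trackDb.txt stanzas in which only one track is visible by default. The helper name track_stanza, the track names, and the file names are invented for the example. The settings themselves (track, bigDataUrl, shortLabel, longLabel, type, visibility) are standard trackDb settings, but a real hub will usually need more of them.

```shell
# Print one trackDb.txt stanza for a bigBed track.
# Usage: track_stanza NAME BIGDATA_URL VISIBILITY
track_stanza() {
    cat <<EOF
track $1
bigDataUrl $2
shortLabel $1
longLabel $1 (example track)
type bigBed
visibility $3

EOF
}

# Example: one track shown by default, the other hidden until a user turns it on.
# { track_stanza mainPeaks mainPeaks.bb dense
#   track_stanza extraPeaks extraPeaks.bb hide; } > trackDb.txt
```

Setting visibility to hide keeps a track available in the hub without loading it on the first page view.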
