bbabbd5d2566d47d923d51dbe350634783455999 mspeir Sun Oct 26 12:14:52 2025 -0700
change soe to gi, refs #35031
diff --git src/hg/htdocs/ENCODE/newsarch.html src/hg/htdocs/ENCODE/newsarch.html
index 65a01f3d4a1..6ba8b7c62f8 100755
--- src/hg/htdocs/ENCODE/newsarch.html
+++ src/hg/htdocs/ENCODE/newsarch.html
@@ -163,31 +163,31 @@
12 Sept 2013 - New UDR ENCODE Download Method Available
The UCSC Genome Browser is pleased to offer a new download protocol to use when downloading large sets of files from our download servers: UDR (UDT Enabled Rsync). UDR utilizes rsync as the transport mechanism, but sends the data over the UDT protocol, which enables huge amounts of data to be downloaded efficiently over long distances.
Remember that we now have two identical download servers to better serve your needs. You can use either one:
-http://hgdownload.soe.ucsc.edu
+http://hgdownload.gi.ucsc.edu
Typical TCP-based protocols like http, ftp and rsync have a problem in that the further away the download source is from you, the slower the speed becomes. Protocols like UDT/UDR allow for many UDP packets to be sent in batch, thus allowing for much higher transmit speeds over long distances. UDR will be especially useful for users who are downloading from places that are far away from California. The US East Coast and the international community will likely see much higher download speeds by using UDR rather than rsync, http or ftp.
It should be noted that UDR is not written or managed by UCSC; it was written by the
@@ -195,44 +195,44 @@
under Linux, FreeBSD and Mac OSX, but may work under other UNIX variants. The source code can be obtained here, through GitHub:
https://github.com/LabAdvComp/UDR
If you need help building the UDR binaries or have questions about how UDR functions, please read the documentation on the GitHub page, and if necessary, contact the UDR authors via the GitHub page. We recommend reading the documentation on the UDR GitHub page to better understand how UDR works. UDR is written in C++. UDR is Open Source and is released under the Apache 2.0 License. You must first have rsync installed on your system.
For your convenience, we are offering a binary distribution of UDR for Red Hat Enterprise Linux 6.x (or variants such as CentOS 6 or Scientific Linux 6). You'll find both a 64-bit and 32-bit rpm here:
-http://hgdownload.soe.ucsc.edu/admin/udr
+http://hgdownload.gi.ucsc.edu/admin/udr
Once you have a working UDR binary, either by building from source or by installing the rpm (if you are using RHEL 6.x or other variant), you can download files from either of our download servers in a very similar fashion to rsync. For example, using rsync, you may want to download all of the ENCODE information for the mm9 database using the following command:
-$ rsync -avP rsync://hgdownload.soe.ucsc.edu/goldenPath/mm9/encodeDCC/ /my/local/mm9/
+$ rsync -avP rsync://hgdownload.gi.ucsc.edu/goldenPath/mm9/encodeDCC/ /my/local/mm9/
Using UDR is very similar. The UDR syntax for downloading the same data would be:
-$udr rsync -avP hgdownload.soe.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/
+$udr rsync -avP hgdownload.gi.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/
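The relationship between the two command forms above can be sketched in Python. This is only an illustration: the helper names `rsync_command` and `udr_command` are hypothetical and are not part of rsync or UDR. The sketch simply rearranges the host, path, and destination shown in the examples.

```python
def rsync_command(host, path, dest):
    """Build the plain rsync invocation, which uses an rsync:// URL."""
    return ["rsync", "-avP", f"rsync://{host}/{path}", dest]


def udr_command(host, path, dest):
    """Build the UDR invocation: a 'udr' prefix plus the host::path form."""
    return ["udr", "rsync", "-avP", f"{host}::{path}", dest]


if __name__ == "__main__":
    # Values taken from the mm9 example in the announcement.
    host = "hgdownload.gi.ucsc.edu"
    path = "goldenPath/mm9/encodeDCC/"
    dest = "/my/local/mm9/"
    print(" ".join(rsync_command(host, path, dest)))
    print(" ".join(udr_command(host, path, dest)))
```

The only differences between the two are the leading `udr` and the switch from the `rsync://host/path` URL to the `host::path` form; the rsync options are unchanged.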
If you installed the rpm, use the 'man udr' command for more information via the man page; if you installed from source please refer to the UDR GitHub page for more details on the capabilities of UDR and how to use it.
UDR establishes connections on TCP/9000, then transmits the data stream over UDP/9000-9100. Your institution may need to modify its firewall rules to allow inbound and outbound ports TCP/9000 and UDP/9000-9100 from either of the two download machines.
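Before asking for firewall changes, it may help to check whether the TCP control port is already reachable. The following is a minimal sketch using only the Python standard library; the function name `tcp_port_open` is hypothetical, and the check covers only TCP/9000, not the UDP/9000-9100 data range, which cannot be verified with a simple connect.

```python
import socket


def tcp_port_open(host, port=9000, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    host = "hgdownload.gi.ucsc.edu"
    state = "reachable" if tcp_port_open(host) else "blocked or unreachable"
    print(f"TCP/9000 on {host}: {state}")
```

A failed check does not necessarily mean the firewall is at fault; the server may also be unreachable for other reasons.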
If you decide to install and use UDR, we hope that you experience greatly increased download speeds. If you have difficulties installing UDR on your system, please contact the Laboratory for Advanced Computing through their GitHub page: https://github.com/LabAdvComp/UDR.
@@ -287,31 +287,31 @@
13 May 2013 - Uniform Peaks of Transcription Factor ChIP-seq from ENCODE/Analysis
UCSC has released a new browser track containing 690 datasets of transcription factor ChIP-seq peaks based on data from all five ENCODE TFBS ChIP-seq production groups from the project inception in 2007 through the ENCODE March 2012 data freeze. The track covers 161 unique regulatory factors (generic and sequence-specific factors), spanning 91 human cell types, some under various treatment conditions.
Browser track:
Transcription Factor ChIP-seq Uniform Peaks from ENCODE/Analysis
File downloads:
-Directory
+Directory
File selection tool
This track represents peak calls (regions of enrichment) generated by the ENCODE Analysis Working Group (AWG) using the uniform processing pipeline developed for the ENCODE Integrative Analysis effort and published in a set of coordinated papers in September 2012. Peak calls from that effort (based on datasets from the January 2011 ENCODE data freeze) are available at the ENCODE Analysis Data Hub. The new Uniform TFBS track at UCSC includes newer data, slightly modified processing methods, and improved metadata. Quality metrics are included in metadata, with detailed metrics in a quality spreadsheet linked to the track description. Browser users will see the uniform peaks first when using track search for TFBS, and this track is now the default track shown when the ENCODE TF Binding menu item is selected in the browser.
@@ -1905,31 +1905,31 @@
Note that the Variation and Comparative Genomics data were not lifted during this migration; instead, they will be replaced by new data. The first ENCODE MSA alignment for hg18 (TBA) is currently in progress on the UCSC
- development
+ development
server.
During the migration, ENCODE tracks with whole-genome data were moved into the standard browser track groups. These include the GIS PET and UCSD/LI TAF1 tracks. Future submissions of whole-genome ENCODE data will be loaded directly into the standard track groups.
We have expanded the ENCODE downloads site to include original data for all "wiggle" datasets. These data files now have filename extensions indicating the wiggle input format (fixed step, variable step, or bedGraph).
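As an illustration of how the three wiggle input formats differ, here is a minimal Python sketch that classifies a file by its content rather than by its extension. It assumes the standard UCSC conventions: `fixedStep` and `variableStep` declaration lines, and four-column bedGraph data lines. The function name `wiggle_format` is hypothetical, and the sketch is not the method used to name the download files.

```python
def wiggle_format(lines):
    """Classify wiggle-style text as 'fixedStep', 'variableStep', or 'bedGraph'.

    Returns None if the first data line is not recognizable.
    """
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and browser/track header lines.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        # Fixed-step and variable-step data begin with a declaration line.
        if fields[0] in ("fixedStep", "variableStep"):
            return fields[0]
        # bedGraph data lines have four columns: chrom, start, end, value.
        if len(fields) == 4:
            return "bedGraph"
        return None
    return None


if __name__ == "__main__":
    sample = ["track type=wiggle_0", "fixedStep chrom=chr1 start=1 step=1", "0.5"]
    print(wiggle_format(sample))
```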
@@ -2078,31 +2078,31 @@
7 Oct. 2006 - Comparative Genomics Data Release
Twelve tracks of data produced by the ENCODE Multi-Species Sequence Analysis group have been released to the UCSC public server. These tracks contain multiple sequence alignments, conservation, and conserved (constrained) elements produced by four conservation methods (phastCons, binCons, GERP, SCONE) applied to three sequence alignments (TBA, MLAGAN, MAVID), and also an assessment of the agreement among the alignment methods. The alignments were based on genomic sequence in the ENCODE regions of 28 vertebrate species, as defined in the - MSA September 2005 sequence freeze.
The following tracks can now be found in the ENCODE Comparative Genomics track group on the public ENCODE browser:
Thanks to the following providers of this data:
@@ -2638,32 +2638,32 @@
We'd also like to acknowledge the UCSC team members who worked on these annotation tracks: Angie Hinrichs (track development), Galt Barber and Ali Sultan-Qurraie (QA), and Jim Kent and Donna Karolchik (track documentation).
A new ENCODE MSA sequence data freeze is available on the UCSC downloads server. The latest freeze contains sequences from 23 vertebrates provided by NISC, Baylor, the Broad Institute (2X) and the whole genome shotgun (WGS) assemblies. The data may be downloaded as
- individual data files or a
- directory tarball.
+ individual data files or a
+ directory tarball.
Aligners are encouraged to upload alignments and related data (such as conservation scores and elements) to the UCSC ENCODE ftp site as soon as possible and then notify Kate Rosenbloom. Other data (conservation, trees, etc.) will be generated based on this dataset.
The following is a summary of data updates from the previous release:
Thanks to Chunxu Qu and the Ren lab for providing these data.
The ENCODE download area has been reorganized and updated on our public download server. The downloads access page is now:
- http://hgdownload.soe.ucsc.edu/goldenPath/encode/
+ http://hgdownload.gi.ucsc.edu/goldenPath/encode/
and the annotations are now located in the assembly-specific download directory, currently:
- http://hgdownload.soe.ucsc.edu/goldenPath/hg16/encode/
+ http://hgdownload.gi.ucsc.edu/goldenPath/hg16/encode/
Any web pages referencing the previous UCSC ENCODE downloads will need to be updated. Please contact us if you have any difficulties.
A second set of ENCODE ChIp/Chip data is now available on the July 2003 human genome assembly:
ChIp/Affy Pol2 Pval
ChIp/Affy Pol2 Sites
@@ -2892,34 +2892,34 @@
We are pleased to release the first "official" sequence data freeze for the ENCODE multiple sequence alignment projects. The data formats are described in the - README file, and the sequences and supporting information are collected in the - data directory.
The species included in this freeze are as follows:
_SPECIES_ _SOURCE_
Human hg16
Chimpanzee panTro1
Dog canFam1
Rat rn3
RatB BCM
Mouse mm5
Chicken galGal2
Galago NISC
Baboon NISC
Marmoset NISC
@@ -2977,26 +2977,26 @@
the ENCODE project community, including this home
page to consolidate these resources.
The initial resources include sequences for the current
human assemblies (hg16, hg15, hg13, and hg12), sequence of
the comparative species from
NISC,
tools for coordinate conversion between human
assemblies, format descriptions for data
submission, and contact information for help with
submitting annotation data and analyses.
Bulk downloads of the sequence and annotations may
be obtained from the ENCODE Project
- Downloads
+ Downloads
page. The sequences available here are repeat-masked
versions of the GenBank records.