bbabbd5d2566d47d923d51dbe350634783455999 mspeir Sun Oct 26 12:14:52 2025 -0700
change soe to gi, refs #35031

diff --git src/hg/htdocs/goldenPath/help/cloud.html src/hg/htdocs/goldenPath/help/cloud.html
index a73904f2b0b..33b081f0f80 100755
--- src/hg/htdocs/goldenPath/help/cloud.html
+++ src/hg/htdocs/goldenPath/help/cloud.html
@@ -87,58 +87,58 @@

Where can I learn more about Amazon Tools?

What is the Amazon s3://genome-browser bucket?

S3 stands for Simple Storage Service, and it is the name for cloud storage in Amazon Web Services (AWS). The data available through S3 is essentially stored in a folder called a bucket, and files are called objects. The s3://genome-browser bucket is a copy of the main data available on our
-UCSC Genome Browser Download website: https://hgdownload.soe.ucsc.edu/downloads.html
+UCSC Genome Browser Download website: https://hgdownload.gi.ucsc.edu/downloads.html

By placing our Download server files in an S3 bucket, developers working in the cloud can more easily integrate with UCSC data. You can learn more about how S3-object-based storage works, and its advantages of being accessible anywhere across the world with low latency and high durability by reviewing Amazon's S3 documentation.

What specific files are in the s3://genome-browser bucket?

-The data mirrors our UCSC Genome Browser Download website's main rsync directories:

 UCSC Human Golden Path Downloads             s3://genome-browser/goldenPath
 UCSC Human Genome Browser Gbdb Data Files    s3://genome-browser/gbdb
 UCSC Human Genome Raw Mysql Tables           s3://genome-browser/mysql
 UCSC Human Genome Web Site CGI Binaries      s3://genome-browser/cgi-bin
 UCSC Human Genome Web Site Htdocs            s3://genome-browser/htdocs

The goldenPath directory is organized by assembly name, and represents the file structure on our Download server, which includes README.txt files. For instance, the sequence data for the human hg38 assembly would be found in this location with an instructive README.txt: goldenPath/hg38/bigZips/README.txt. The README.txt, also -available on the Download website, informs that the most recent patch-inclusive sequence is found in goldenPath/hg38/bigZips/latest/.
The gbdb directory, also organized by assembly name, provides access to genome browser database files in binary format used by the browser software. For instance, the underlying binary indexed sequence data for the hg38 databases used in the display in the UCSC Genome Browser would be located in the following location, gbdb/hg38/hg38.2bit, matching the file in the goldenPath/hg38/bigZips/latest/ directory, reflecting how these files are operated on by the UCSC Genome Browser software in order to display assembly sequence when browsing.
The mysql directory, also organized by assembly name, provides access to MySQL database tableName.MYD files, and their related tableName.MYI index and tableName.frm format files, providing a copy of the tables used by the main Browser site.
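
Because the bucket is described as a copy of the Download server's rsync directories, a path on one side can be translated to the other. The sketch below is only an illustration: the helper name `rsync_to_s3` is hypothetical, and it assumes that object keys in the bucket line up one-to-one with the rsync paths, which the text implies but does not spell out.

```shell
#!/bin/sh
# Bucket name taken from the text above.
BUCKET="s3://genome-browser"

# Hypothetical helper: turn an rsync:// URL on the Download server into the
# bucket URI it is assumed to correspond to (one-to-one path mapping).
rsync_to_s3() {
  # Strip "rsync://<host>/" and keep the remaining path,
  # e.g. goldenPath/hg38/bigZips/README.txt
  path=${1#rsync://*/}
  printf '%s/%s\n' "$BUCKET" "$path"
}

# Example:
# rsync_to_s3 rsync://hgdownload.gi.ucsc.edu/gbdb/hg38/hg38.2bit
```

The resulting URI can then be handed to whichever S3 client you already use. The function only rewrites strings and never contacts the server or the bucket.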

By reviewing the example data access URLs that demonstrate the list and getData functions, and the further practical example URLs for extracting specific track data items, you can learn more about using the API to extract data.

What is the Download server and how does one use it?

-The UCSC Genome Browser Download website, hgdownload.soe.ucsc.edu, is the source of the data
+The UCSC Genome Browser Download website, hgdownload.gi.ucsc.edu, is the source of the data
hosted in the Amazon s3://genome-browser bucket. It can be viewed in a web browser to access specific download files, or the data can be copied with rsync commands.

Examples

For instance, the following rsync command will show you the various rsync directories available on our Download server:

-$ rsync -a -P rsync://hgdownload.soe.ucsc.edu/ 
+$ rsync -a -P rsync://hgdownload.gi.ucsc.edu/ 
 
 genome         UCSC Human Genome Downloads
 sars           UCSC Human Genome SARS Downloads
 htdocs         UCSC Human Genome Web Site Htdocs
 goldenPath     UCSC Human Golden Path Downloads
 cgi-bin        UCSC Human Genome Web Site CGI Binaries x86_64
 cgi-bin-i386   UCSC Human Genome Web Site CGI Binaries i386
 gbdb           UCSC Human Genome Browser Gbdb Config Files
 archives       UCSC Human Genome Browser Archived Config Files
 mysql          UCSC Human Genome Raw Mysql Tables
 gbib           UCSC Genome Browser in a Box
 hubs           UCSC Genome Browser Public Hubs

For instance, here is an example of accessing a README file in the goldenPath/Downloads directory:
-rsync -a -P rsync://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/README.txt ./
+rsync -a -P rsync://hgdownload.gi.ucsc.edu/goldenPath/hg38/bigZips/README.txt ./

And here is an example link that would access the gbdb/ binary data directory for the human hg38 assembly 2bit file:
-rsync -a -P rsync://hgdownload.soe.ucsc.edu/gbdb/hg38/hg38.2bit ./
+rsync -a -P rsync://hgdownload.gi.ucsc.edu/gbdb/hg38/hg38.2bit ./

And here is an example link that would access our publications html page from the bucket's htdocs/ hypertext document directory:
-rsync -a -P rsync://hgdownload.soe.ucsc.edu/htdocs/goldenPath/pubs.html ./
+rsync -a -P rsync://hgdownload.gi.ucsc.edu/htdocs/goldenPath/pubs.html ./

Many of these rsync directories exist to support the Genome Browser in a Cloud (GBiC) and the Genome Browser in a Box (GBiB) software products discussed below. Also note that there is a mirror of the download server available in Europe, so the above rsync commands can also be pointed to the hgdownload-euro locations.

For instance, here is a command to access data from the Europe location:
rsync -a -P rsync://hgdownload-euro.soe.ucsc.edu/gbdb/hg38/hg38.2bit ./
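
Since the same path can be fetched from either location, the choice of server can be reduced to a single argument. The following is a minimal sketch, not part of the UCSC tooling: `download_cmd` is a hypothetical helper that only prints the rsync command it would run, using the host names that appear in the examples above.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an rsync command for a path on the
# US or Europe Download server. Host names are copied from the examples above.
download_cmd() {
  # $1: "us" or "euro"; $2: path such as gbdb/hg38/hg38.2bit
  case "$1" in
    us)   host="hgdownload.gi.ucsc.edu" ;;
    euro) host="hgdownload-euro.soe.ucsc.edu" ;;
    *)    echo "unknown mirror: $1" >&2; return 1 ;;
  esac
  printf 'rsync -a -P rsync://%s/%s ./\n' "$host" "$2"
}

# Example (prints the command; review it before running it yourself):
# download_cmd euro gbdb/hg38/hg38.2bit
```

Printing the command instead of executing it keeps the sketch safe to try offline and leaves the decision to start a potentially large transfer with the user.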

What is the MySQL server and how does one use it?

The UCSC Genome Browser uses MariaDB (fork of MySQL) as the backend database server and maintains -a public server at genome-mysql.soe.ucsc.edu to allow direct queries.

genome-mysql.gi.ucsc.edu

Examples

For instance, here is an example of accessing the hg38 human assembly database and selecting from the table trackDb all the entries in the group (grp) "genes" and ordering those entries by tableName:
-mysql -h genome-mysql.soe.ucsc.edu -u genome -NBe 'select tableName from trackDb where grp = "genes" order by tableName' hg38
+mysql -h genome-mysql.gi.ucsc.edu -u genome -NBe 'select tableName from trackDb where grp = "genes" order by tableName' hg38
And here is an example of accessing a specific Transcription Factor Binding Site (TFBS) table wgEncodeRegTfbsClusteredV3 on the human hg19 assembly and selecting entries from a 500 base pair region on chr1:
-mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A -Ne 'select chrom,chromStart,chromEnd,name,score
+mysql --user=genome --host=genome-mysql.gi.ucsc.edu -A -Ne 'select chrom,chromStart,chromEnd,name,score
 from wgEncodeRegTfbsClusteredV3 where chrom = "chr1" and chromStart > 10000 and chromEnd < 10500;' hg19
And here is an example query that will pull all the long non-coding entries (lncRNA) from the wgEncodeGencodeBasicV39 table on the hg38 genome:
-mysql -u genome -h genome-mysql.soe.ucsc.edu hg38 -e 'select g.name,a.transcriptType from wgEncodeGencodeBasicV39 g,
+mysql -u genome -h genome-mysql.gi.ucsc.edu hg38 -e 'select g.name,a.transcriptType from wgEncodeGencodeBasicV39 g,
 wgEncodeGencodeAttrsV39 a where (g.name = a.transcriptId) and (a.transcriptType = "lncRNA");'
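
The region query in the TFBS example above follows a fixed pattern (table, chromosome, start, end), so it can be parameterized. This is only a sketch: `region_query` is a hypothetical helper that builds the SQL text, and it assumes the target table has the same `chrom`, `chromStart`, `chromEnd`, `name` and `score` columns as the table used in that example.

```shell
#!/bin/sh
# Hypothetical helper: build the SQL for a region query shaped like the
# TFBS example above. Assumes the table has chrom/chromStart/chromEnd/name/score.
region_query() {
  # $1: table name; $2: chromosome; $3: region start; $4: region end
  printf 'select chrom,chromStart,chromEnd,name,score from %s where chrom = "%s" and chromStart > %s and chromEnd < %s;' \
    "$1" "$2" "$3" "$4"
}

# Example (needs network access and a mysql client, so it is left commented out):
# mysql --user=genome --host=genome-mysql.gi.ucsc.edu -A -Ne \
#   "$(region_query wgEncodeRegTfbsClusteredV3 chr1 10000 10500)" hg19
```

The helper does no input validation, so it is suited to interactive use with trusted arguments rather than to handling untrusted input.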

See the Downloading Data using MariaDB (MySQL) page for more information. Also, there is a mirror of the MariaDB server available in Europe, so commands can also be pointed to the genome-euro-mysql location.

For instance, here is a command to access hg38 data from the Europe location:
mysql -h genome-mysql-euro.soe.ucsc.edu -u genome -NBe 'show tables' hg38