01e223245a85e1ae6d1f2e914c64367cfd006ea9 ccpowell Tue Jul 9 16:00:29 2019 -0700
Changing MySQL to MariaDB in documentation, refs #23597

diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html
index d22d10d..621592b 100755
--- src/hg/htdocs/FAQ/FAQdownloads.html
+++ src/hg/htdocs/FAQ/FAQdownloads.html
@@ -30,31 +30,31 @@
@@ -141,31 +141,31 @@
These tables are also accessible from:
There are two ways to extract genomic sequence in batch from an assembly:
A. Download the appropriate fasta files from our ftp server and extract sequence data using your own tools or the tools from our source tree. This is the recommended method when you have very large sequence datasets or will be extracting data frequently. Sequence data for most assemblies is located in the assembly's "chromosomes" subdirectory on the downloads server. For example,
@@ -247,34 +247,34 @@
Microsoft Word or any program that can handle large text files will do. Some of the chromosomes begin with long blocks of Ns; to get past them, search for the first base that is not an N (for example, an A).
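If you would rather skip the leading N blocks programmatically than search by hand, the following is a minimal Python sketch that streams a single-record FASTA file (such as a decompressed per-chromosome file) and reports the 0-based offset of the first base that is not N. The helper name first_called_base and the file name in the usage comment are hypothetical. Lowercase letters, which UCSC uses for soft-masked repeats, count as called bases here.

```python
def first_called_base(fasta_lines):
    """Return the 0-based offset of the first base that is not N/n.

    fasta_lines: iterable of lines from a single-record FASTA file.
    Returns None if the sequence is entirely N (or empty).
    """
    offset = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip the header line and any blank lines
        for i, base in enumerate(line):
            if base not in "Nn":
                return offset + i
        offset += len(line)  # whole line was N; keep counting
    return None

# Usage (hypothetical local file):
# with open("chr21.fa") as fh:
#     print(first_called_base(fh))
```

Reading line by line keeps memory use flat, which matters for chromosome-sized files.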
Unless you have a particular need to view or use the raw data files, you might find it more interesting to look at the data using the Genome Browser. Type the name of a gene in which you're interested into the position box (or use the default position), then click the submit button. In the resulting Genome Browser display, click the DNA link on the menu bar at the top of the page. Select the Extended case/color options button at the bottom of the next page. Now you can color the DNA sequence to display which portions are repeats, known genes, genetic markers, etc.
-
-Yes. The Genome Browser and Table Browser are both driven by the same underlying MySQL database.
+Yes. The Genome Browser and Table Browser are both driven by the same underlying MariaDB database.
Check that your downloaded tables are from the same assembly version as the one you are viewing in the Genome Browser. If the assembly dates don't match, the coordinates of the data within the tables may differ. In rare cases, you may also be affected by the brief lag between an update to the live databases underlying the Genome Browser and the appearance of the corresponding text dumps in the downloads directory.
The characters most commonly seen in sequence are A, C, G, T, and N, but there are several other valid characters that are used in clones to indicate ambiguity about the identity of certain bases in the sequence. It's not uncommon to see these
@@ -774,40 +774,40 @@
this ID to look it up in the stsMap table where the marker is located. For example, D10S249 has UCSC ID 2880 and is located at chr10:240791-241019.
You can obtain this information by combining two tables. The stsMap table contains the physical position of all STS markers, including those on the deCODE map. This table also contains information about the position on the genome-wide maps, including the deCODE map. A second table, stsInfo2, contains additional information about each marker, including aliases, primer sequence information, etc. The two tables are linked by an ID, the identNo field, which appears in both.
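On one of the public servers this is an SQL join on identNo (select * from stsMap join stsInfo2 on stsMap.identNo = stsInfo2.identNo). If you have downloaded the two tables instead, a minimal Python sketch of the same join follows, assuming each table has already been loaded as a list of dicts keyed by column name. Only identNo comes from the text above; any other column names you see in the rows are whatever your loader produced, and the helper name join_on_ident_no is hypothetical.

```python
from collections import defaultdict

def join_on_ident_no(sts_map_rows, sts_info_rows):
    """Attach stsInfo2 details to each stsMap row that shares its identNo.

    Both arguments are lists of dicts. identNo values are compared as strings
    so rows read from text dumps and rows from a database cursor match alike.
    Yields one merged dict per matching (stsMap row, stsInfo2 row) pair.
    """
    # Index stsInfo2 once so each stsMap lookup is a dict access.
    info_by_id = defaultdict(list)
    for info in sts_info_rows:
        info_by_id[str(info["identNo"])].append(info)
    for sts in sts_map_rows:
        for info in info_by_id.get(str(sts["identNo"]), []):
            merged = dict(info)
            merged.update(sts)  # stsMap fields win when both tables share a column name
            yield merged
```

Building the index on stsInfo2 first keeps the join linear in the size of the two tables rather than quadratic.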
-Yes. See our documentation on Downloading Data using
-MySQL.
+MariaDB.
-Connect to the US MySQL server using the command:
+Connect to the US MariaDB server using the command:
mysql --user=genome --host=genome-mysql.soe.ucsc.edu -A
-Or to the European MySQL server using the command:
+Or to the European MariaDB server using the command:
mysql --user=genome --host=genome-euro-mysql.soe.ucsc.edu -A
The fourth column of the BED output contains a lot of information separated by underscores. For example:
uc009vjk.2_cds_1_0_chr1_324343_f
This information is represented as follows:
ucscId_sequenceType_sequenceTypeNumber_basesAdded_chromosome_positionOfFirstBaseOfItem_strand
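Splitting this field naively on underscores can go wrong, because gene IDs (for example RefSeq accessions such as NM_000014.4) and chromosome names (for example chr1_gl000191_random on hg19) can themselves contain underscores. A minimal Python sketch that instead anchors on the sequenceType keyword is shown below. The set of sequenceType values (cds, exon, utr5, utr3, intron, up, down) is an assumption, so extend it if your output contains others; the helper name parse_bed_name is hypothetical, and the strand letter is returned as-is (f for forward, r for reverse).

```python
import re

# Assumption: these are the sequenceType values that can appear in the name
# field; extend the alternation if your Table Browser output uses others.
_NAME_RE = re.compile(
    r"^(?P<ucscId>.+?)_"                      # shortest ID that still matches
    r"(?P<sequenceType>cds|exon|utr5|utr3|intron|up|down)_"
    r"(?P<sequenceTypeNumber>\d+)_"
    r"(?P<basesAdded>\d+)_"
    r"(?P<chromosome>.+)_"                    # may itself contain underscores
    r"(?P<positionOfFirstBaseOfItem>\d+)_"
    r"(?P<strand>[fr])$"
)

def parse_bed_name(name):
    """Split a Table Browser BED name field into its underscore-separated parts."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized BED name field: {name!r}")
    fields = m.groupdict()
    for key in ("sequenceTypeNumber", "basesAdded", "positionOfFirstBaseOfItem"):
        fields[key] = int(fields[key])
    return fields

# Usage:
# fields = parse_bed_name("uc009vjk.2_cds_1_0_chr1_324343_f")
# fields["chromosome"] == "chr1"; fields["positionOfFirstBaseOfItem"] == 324343
```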
The raw data underlying a track can be explored interactively with the Table Browser, Data Integrator, or Variant Annotation Integrator. For automated analysis, the genome annotation can be downloaded from the downloads server, one of our two
-public MySQL servers, or
+public MariaDB servers, or
using our JSON API.
bigBed data: For bigBed files, individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also retrieve only the features within a given range directly from one of the hgdownload servers, for example:
bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/path/to/file/bigBedfile.bb -chrom=chr21 -start=0 -end=1000000 stdout
bigBedToBed http://hgdownload-euro.soe.ucsc.edu/gbdb/path/to/file/bigBedfile.bb -chrom=chr21 -start=0 -end=1000000 stdout
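In these examples -start=0 -end=1000000 covers the first megabase of chr21, which the Genome Browser would show as chr21:1-1,000,000: browser positions are 1-based and inclusive, while -start and -end take 0-based, half-open coordinates. The following is a minimal Python sketch that does this conversion and builds the command shown above. The helper names are hypothetical, running it assumes bigBedToBed is on your PATH with network access, and the path/to/file part of the URL is still a placeholder you must fill in.

```python
import re
import subprocess

def region_args(position):
    """Turn a browser-style position such as 'chr21:1-1,000,000' into
    bigBedToBed range options. Only the start shifts by one, because
    1-based inclusive [s, e] equals 0-based half-open [s - 1, e)."""
    m = re.fullmatch(r"([^:\s]+):([\d,]+)-([\d,]+)", position.strip())
    if m is None:
        raise ValueError(f"unrecognized position: {position!r}")
    chrom = m.group(1)
    start = int(m.group(2).replace(",", "")) - 1
    end = int(m.group(3).replace(",", ""))
    if start < 0 or end <= start:
        raise ValueError(f"empty or invalid range: {position!r}")
    return [f"-chrom={chrom}", f"-start={start}", f"-end={end}"]

def bigbed_to_bed_command(url, position):
    """Build the argument list for bigBedToBed, writing BED lines to stdout."""
    return ["bigBedToBed", url, *region_args(position), "stdout"]

def run_bigbed_to_bed(url, position):
    """Run bigBedToBed (must be on PATH) and return its BED output as text."""
    result = subprocess.run(bigbed_to_bed_command(url, position),
                            check=True, capture_output=True, text=True)
    return result.stdout
```

Passing the arguments as a list, rather than one shell string, avoids quoting problems with URLs.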
-SNP data: If queries against the SNP table on one of our public MySQL servers or on your
-own MySQL installation are slow, then they can be sped up by using the "bin" field; you
+SNP data: If queries against the SNP table on one of our public MariaDB servers or on your
+own MariaDB installation are slow, then they can be sped up by using the "bin" field; you
can contact us for more information.
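The bin speed-up works because each feature is stored in the smallest bin that fully contains it, so a query for a region only has to examine a few bins at each level. The following is a minimal Python sketch that builds such a WHERE condition. It assumes the standard five-level binning scheme (smallest bins 2^17 = 131,072 bp, each level 8 times larger) as we understand it from binRange.c in the Kent source tree. It does not handle the extended scheme used for coordinates beyond 512 Mb, and the table and column names in the usage comment (snp151, chrom, chromStart, chromEnd) are assumptions to check against the schema you actually query.

```python
# Standard binning scheme (assumed from the Kent source, binRange.c):
# five levels, smallest bins 2**17 bp, each level 2**3 = 8 times larger.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17
BIN_NEXT_SHIFT = 3
MAX_STANDARD_END = 1 << (BIN_FIRST_SHIFT + BIN_NEXT_SHIFT * (len(BIN_OFFSETS) - 1))  # 2**29

def bin_where_clause(start, end, bin_field="bin"):
    """Return an SQL condition matching every bin that can hold a feature
    overlapping the 0-based, half-open range [start, end)."""
    if not 0 <= start < end <= MAX_STANDARD_END:
        raise ValueError("range outside the standard binning scheme")
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    terms = []
    for offset in BIN_OFFSETS:  # smallest bins first, whole-chromosome bin last
        lo, hi = start_bin + offset, end_bin + offset
        if lo == hi:
            terms.append(f"{bin_field}={lo}")
        else:
            terms.append(f"({bin_field}>={lo} and {bin_field}<={hi})")
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    return "(" + " or ".join(terms) + ")"

# Hypothetical usage: keep the ordinary overlap test, and add the bin test so
# the (chrom, bin) index can narrow the search first.
# sql = ("select * from snp151 where chrom='chr1' and "
#        + bin_where_clause(1000000, 1010000)
#        + " and chromStart < 1010000 and chromEnd > 1000000")
```

The bin condition only narrows the candidate rows; the chromStart/chromEnd comparison is still what decides true overlap.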
Read more in our blog about Accessing the Genome Browser Programmatically to acquire data.
Currently, the Table Browser does not have an option to return data as
GTF files. The best method to obtain
GTF files is to use the command-line format conversion utility genePredToGtf.
This can be set up
@@ -897,31 +897,31 @@
includes proper start and stop codons.
The genePredToGtf command-line utility can be used to convert genePred to GTF. Download the genePredToGtf command-line utility for your operating system from the utilities directory.
Please see the Genes in GTF
or GFF Format wiki page for examples and various methods for conversion. The genePredToGtf
utility can convert files from several sources, such as Table Browser output from a genePred table,
a locally downloaded gene set table such as refGene.txt, or the results of querying
-public MySQL tables.
Most of our tables have a special first column called "bin" that helps the Genome Browser display data quickly. These tables are indexed on (chrom, bin), so query results are ordered first by bin and then by chromStart, rather than strictly by chromStart. This index allows us to query and return results more quickly than if they were sorted by chromStart.
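Rows come back grouped by bin because each feature is assigned to the smallest bin that fully contains it. The following is a minimal Python sketch of that assignment, assuming the standard five-level scheme as we understand it from binRange.c in the Kent source tree (coordinates up to 512 Mb). Zero-length features (start equal to end, such as some insertions) are deliberately rejected, and the helper name bin_from_range is hypothetical. Small features get the largest bin numbers, while a feature that straddles a 128 kb boundary is pushed up to the next level.

```python
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
BIN_FIRST_SHIFT = 17  # smallest bins are 2**17 = 131,072 bp
BIN_NEXT_SHIFT = 3    # each level up is 8 times larger

def bin_from_range(start, end):
    """Return the standard-scheme bin for a feature at 0-based, half-open [start, end).

    Zero-length features (start == end) are not handled in this sketch.
    """
    if not 0 <= start < end <= 1 << 29:
        raise ValueError("range outside the standard binning scheme")
    start_bin = start >> BIN_FIRST_SHIFT
    end_bin = (end - 1) >> BIN_FIRST_SHIFT
    for offset in BIN_OFFSETS:  # try the smallest bins first
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= BIN_NEXT_SHIFT
        end_bin >>= BIN_NEXT_SHIFT
    raise AssertionError("unreachable for ranges within 2**29")
```

For instance, under these assumptions the browser position chr10:240791-241019 quoted earlier (0-based start 240790, end 241019) falls in bin 586, while any feature spanning positions 0 to 200,000 lands one level up, in bin 73.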
A quick way to sort a BED file from the Table Browser by position is the following UNIX command:
sort -k1,1 -k2n,2n example.bed > example.sorted.bed
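If you are post-processing BED output in Python rather than in the shell, a minimal sketch of a similar ordering follows. It sorts on the chromosome name as text and then on chromStart as a number, approximating sort -k1,1 -k2n,2n. Note that Python compares strings by code point, which corresponds to running sort with LC_ALL=C rather than with a locale-aware collation, so chromosome order can differ from sort's default in some locales. Keeping track/browser/comment lines at the top is an assumption of this sketch, and the helper name is hypothetical.

```python
def sort_bed_lines(lines):
    """Sort BED lines by chromosome name (as text), then chromStart (numerically).

    Header lines ('track', 'browser', or '#' comments) are kept first, in their
    original order; blank lines are dropped. Ties keep their input order.
    """
    headers, records = [], []
    for line in lines:
        if not line.strip():
            continue
        if line.startswith(("track", "browser", "#")):
            headers.append(line)
        else:
            records.append(line)

    def key(line):
        fields = line.split(None, 2)  # whitespace-separated, like sort's default
        return fields[0], int(fields[1])

    records.sort(key=key)
    return headers + records

# Usage (file names match the shell example):
# with open("example.bed") as src, open("example.sorted.bed", "w") as dst:
#     dst.writelines(sort_bed_lines(src))
```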