198c9b8daecc44fbda6a6494c566c723920f030a lrnassar Wed Mar 11 18:25:21 2026 -0700

Fixing a few hundred clear typos with the help of Claude. Some are less important in code comments, but the majority of them are in user-facing places. I manually approved 60%+ of the changes and didn't see any that were an incorrect suggestion; at worst a change was potentially unnecessary, like a code comment having cant instead of can't. No RM.

diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html
index cee9b1932a6..c177e8038ad 100755
--- src/hg/htdocs/FAQ/FAQdownloads.html
+++ src/hg/htdocs/FAQ/FAQdownloads.html
@@ -38,31 +38,31 @@

RepeatMasker version differences - UCSC vs. Repeatmasker website

Obtaining promoter sequence

Data from Evolutionary Conservation Score tracks

Minus strand coordinates - axtNet files

Mapping UCSC STS marker IDS to those of other groups

deCODE map data

Direct MariaDB (MySQL) access to data

Name of fourth column in BED output

Track data access

How do I download dbSNP data?

Why doesn't this SNP have two alleles?

Known issues with Table Browser GTF output

Table Browser output file not ordered

-'Permisssion denied' error when trying to use command-line utilities
+'Permission denied' error when trying to use command-line utilities

Restricted Track Data

What is the genome analysis set?

How do I download GenArk data?

Why are the conservation scores different from the ones in the download file?

How to make programmatic queries to the Genome Browser

Return to FAQ Table of Contents

Downloading sequence and annotation data

How do I obtain the sequence and/or annotation data for a release?

Sequence and annotation data downloads are usually made available within the first week of the

@@ -77,31 +77,31 @@

Downloads page or our DAS server. To download a specific subset of the data or to configure the output format of the data, use the Table Browser. For information on extracting a large set of sequences from an assembly, see Extracting sequence in batch from an assembly.

For more information on using the UCSC DAS server, see Downloading data from the UCSC DAS server.

Another option for querying sequence and annotation data is the REST API. This interface allows for extraction of sequence and annotations from both UCSC assemblies and from hubs.
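
As a rough sketch of what a REST API request can look like, the shell function below only assembles a request URL and prints it. The host api.genome.ucsc.edu, the /getData/sequence endpoint, and the genome, chrom, start, and end parameters are assumptions taken from the public REST API help page rather than from this FAQ entry, and the function name ucsc_sequence_url is hypothetical.

```shell
# Assemble a sequence-request URL for the REST API (no network access here).
# Assumed, not stated in this FAQ entry: host, endpoint path, parameter names.
ucsc_sequence_url() {
    # $1 genome (e.g. hg38), $2 chromosome, $3 start, $4 end
    printf 'https://api.genome.ucsc.edu/getData/sequence?genome=%s;chrom=%s;start=%s;end=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Print the URL for an illustrative region on chrM of hg38.
ucsc_sequence_url hg38 chrM 4321 5678
```

To send the request, the printed URL could be handed to an HTTP client, for example curl -s "$(ucsc_sequence_url hg38 chrM 4321 5678)". The quotes matter: without them the shell would treat the semicolons in the URL as command separators.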

-To quickly download large volumes of data you can use UDR (UDT Enabled Rysnc): UDR
+To quickly download large volumes of data you can use UDR (UDT Enabled Rsync): UDR
provides users much faster download rates. Here is an example using UDR, once installed, to download all the mouse mm9 ENCODE information that amounts to several terabytes:

$ udr rsync -avP hgdownload.gi.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/

Optional: download from our secondary download server.

$ udr rsync -avP hgdownload2.gi.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/

Please read more about the new UDR method here.

Metadata tables for GenBank and RefSeq moved to hgFixed database

I can no longer find metadata tables like gbCdnaInfo for an assembly.

@@ -639,31 +639,31 @@

Converting genome coordinates between assemblies

I've been researching a specific area of the human genome on the current assembly, and now you've just released a new version. Is there an easy way to locate my area of interest on the new assembly?

You can migrate sequences from one assembly to another by using the Blat alignment tool or by converting assembly coordinates. There are two conversion tools available on the Genome Browser web site: the Convert utility and the LiftOver tool. The Convert utility, which is accessed from the View menu on the Genome Browser annotation tracks page, supports forward, reverse, and cross-species conversions, but does not accept batch input. The LiftOver tool, accessed via the Tools link on the Genome Browser home page, also supports forward, reverse, and cross-species conversions, as well as batch conversions.

-Note: It is not recommeneded to use LiftOver to convert SNPs between assemblies,
+Note: It is not recommended to use LiftOver to convert SNPs between assemblies,
and more information about how to convert SNPs between assemblies can be found on the following FAQ entry.

If you wish to update a large number of coordinates to a different assembly and have access to a Linux platform, you may find it useful to try the command-line version of the LiftOver tool. The executable file for this utility can be downloaded here. LiftOver requires a pre-generated over.chain file as input, available for selected assemblies from the Downloads page. If the desired file is not available, send a request to the genome mailing list and we may be able to provide you with one.

Using liftOver

Here is an example of how to set up and run LiftOver from the command line:

@@ -1187,31 +1187,31 @@

Table Browser output file order

My Table Browser output file is not ordered by position. How is it ordered?

Most of our tables have a special first column called "bin" that helps with quickly displaying data on the Genome Browser. This (chrom,bin) index causes query results to be ordered first by bin, then by chromStart. This allows us to query and return results more quickly than if they were sorted by chromStart.

A quick way to sort an output BED file by position is to use the following UNIX command on our Table Browser output BED file:

sort -k1,1 -k2n,2n example.bed > example.sorted.bed
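
The effect of that command can be seen on a tiny, made-up BED file. This is only a sketch: the feature names and coordinates are invented for illustration, and the files are written to a scratch directory created with mktemp.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)

# Three illustrative BED rows (chrom, chromStart, chromEnd, name),
# deliberately not in positional order.
printf 'chr2\t500\t600\tfeatC\nchr1\t9000\t9100\tfeatB\nchr1\t200\t300\tfeatA\n' \
    > "$workdir/example.bed"

# Sort by chromosome name (column 1, as text), then by chromStart
# (column 2, as a number).
sort -k1,1 -k2n,2n "$workdir/example.bed" > "$workdir/example.sorted.bed"

cat "$workdir/example.sorted.bed"
```

Because column 1 is compared as text, a name such as chr10 sorts before chr2. If a natural chromosome order is needed, a different key for column 1 would have to be chosen.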

-'Permisssion denied' error when trying to use command-line utilities
+'Permission denied' error when trying to use command-line utilities

Why do I get a 'Permission denied' error when I try to run command-line utilities?

In order for your computer to run a freshly downloaded utility, you will need to update the file system permissions to allow your operating system to run the program.
To make a utility usable, turn on its 'executable' bit:

 $ chmod +x ./filePath/utility_name

 $ ./filePath/utility_name

Example:

$ chmod +x /home/user/liftover/liftOver

See also: http://en.wikipedia.org/wiki/Chmod
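
The following sketch reproduces the situation with a stand-in script instead of a real download. The file name my_utility is hypothetical, and the example assumes the scratch directory created by mktemp allows programs to be executed.

```shell
# Create a stand-in for a freshly downloaded utility in a scratch directory.
# "my_utility" is a hypothetical name, not a real UCSC tool.
workdir=$(mktemp -d)
printf '#!/bin/sh\necho "utility ran"\n' > "$workdir/my_utility"

# A newly written file normally has no executable bit set.
if [ -x "$workdir/my_utility" ]; then before=executable; else before=not-executable; fi
echo "before chmod: $before"

# Turn on the executable bit, then check again.
chmod +x "$workdir/my_utility"
if [ -x "$workdir/my_utility" ]; then after=executable; else after=not-executable; fi
echo "after chmod: $after"

# The file can now be run directly by its path.
result=$("$workdir/my_utility")
echo "$result"
```

The test [ -x file ] is a quick way to check whether the executable bit is the cause before changing any permissions.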

@@ -1229,31 +1229,31 @@

Analysis set

Some genomes in the download server also reference an analysis set. What is the difference?

For certain genomes (GRCm38/mm10, GRCh37/hg19, GRCh38/hg38), NCBI provides an analysis set in addition to the standard genome files. These are FASTA files with modified sequence identifiers and index files convenient for analysis with Next Generation Sequencing tools. These files are particularly helpful for NGS pipelines including variant calling and RNA-Seq analysis.

Though not all analysis sets contain the same information, features include:

Removal of alternate and fix sequences which can interfere with read alignment programs
Hard masking of duplicate copies of the pseudo-autosomal regions (PARs) and centromeric
-arrays
+arrays
Addition of "decoy" sequences
Index files generated by BWA, Samtools, Bowtie and HISAT2

For more information on analysis sets, see the NCBI FAQ. Information on what is contained in each specific assembly analysis set can be found in the README by clicking the Genome sequence files link for the assembly of interest in our Downloads page.

GenArk Downloads

How do I download GenArk assembly hub data for my species?

@@ -1264,31 +1264,31 @@

number. You can also access the browsers for these species directly with links in the following format:

https://genome.ucsc.edu/h/GCF_000951035.1

The downloads data for these assemblies is stored in a different location than our goldenPath, SQL, or gbdb file directories. There are two ways to access this data for download. First, you can go to the GenArk page and select your clade (primates, mammals, birds, etc.) and then you will be brought to a page with a table of species and GCA/GCF assembly identifiers. Find your genome and click on the third column, labeled "Scientific name and data download", which will take you to the download directory for that species.

Alternatively, you can enter your GCA/GCF identifier
-in the URL in groups of three characters, seperated by slashes. For example, the
+in the URL in groups of three characters, separated by slashes. For example, the
identifier "GCA_004027835.1" has data in the following directory:

https://hgdownload.gi.ucsc.edu/hubs/GCA/004/027/835/
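
The identifier-to-directory mapping described above can be sketched as a small shell function. The function name genark_dir_url is hypothetical, and the sketch assumes an identifier of the form GCA_ or GCF_ followed by nine digits and a version suffix, as in the example; it builds only the directory level shown in this entry.

```shell
# Build the download directory URL for a GCA/GCF identifier by splitting
# its nine digits into groups of three separated by slashes.
# Assumes input shaped like GCA_004027835.1 (prefix, underscore, 9 digits, version).
genark_dir_url() {
    acc=$1
    prefix=${acc%%_*}        # text before the underscore, e.g. GCA
    digits=${acc#*_}         # text after the underscore, e.g. 004027835.1
    digits=${digits%%.*}     # drop the version suffix, leaving 004027835
    p1=$(printf '%s' "$digits" | cut -c1-3)
    p2=$(printf '%s' "$digits" | cut -c4-6)
    p3=$(printf '%s' "$digits" | cut -c7-9)
    printf 'https://hgdownload.gi.ucsc.edu/hubs/%s/%s/%s/%s/\n' \
        "$prefix" "$p1" "$p2" "$p3"
}

genark_dir_url GCA_004027835.1
```

Called with the identifier from the example, the function prints the directory URL given above.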

Conservation scores downloads

Why are the conservation scores on the UCSC Genome Browser site different from the ones in the download file?

The difference in the conservation scores, for both PhastCons and PhyloP, is that the wiggle database format (from which the details page and Table Browser scores are extracted) uses lossy compression that keeps enough resolution to display the pixelated scores in the browser graphic display but does not reconstruct the true original scores. This is why we make the original score files available for download.
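
To illustrate why a lossy encoding cannot return the original scores, the sketch below rounds each score to the nearest of a fixed number of evenly spaced levels and prints the original next to the reconstructed value. The level count of 128, the score range of 0 to 1, and the function name quantize are assumptions chosen for illustration only; they are not a statement of the wiggle format's actual parameters.

```shell
# Round each input score to the nearest of n evenly spaced levels between
# lo and hi, then print the original and the reconstructed value.
# All parameters here are illustrative assumptions.
quantize() {
    # $1 lower bound, $2 upper bound, $3 number of levels; one score per input line
    awk -v lo="$1" -v hi="$2" -v n="$3" '{
        step = (hi - lo) / (n - 1)          # spacing between levels
        bin  = int(($1 - lo) / step + 0.5)  # index of the nearest level
        printf "%s\t%.4f\n", $1, lo + bin * step
    }'
}

# Made-up scores: column 1 is the original, column 2 is what survives.
printf '0.1234\n0.4321\n0.9876\n' | quantize 0 1 128
```

The reconstructed values are close enough for a graphic display, but several distinct original scores can collapse onto the same level, which is why exact values have to come from the original score files.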