198c9b8daecc44fbda6a6494c566c723920f030a lrnassar Wed Mar 11 18:25:21 2026 -0700
Fixing a few hundred clear typos with the help of Claude. Some are less important in code comments, but majority of them are in user-facing places. I manually approved 60%+ of the changes and didn't see any that were an incorrect suggestion, at worst it was potentially uncessesary, like a code comment having cant instead of can't. No RM.
diff --git src/hg/htdocs/FAQ/FAQdownloads.html src/hg/htdocs/FAQ/FAQdownloads.html
index cee9b1932a6..c177e8038ad 100755
--- src/hg/htdocs/FAQ/FAQdownloads.html
+++ src/hg/htdocs/FAQ/FAQdownloads.html
@@ -38,31 +38,31 @@
Return to FAQ Table of Contents
Sequence and annotation data downloads are usually made available within the first week of the
@@ -77,31 +77,31 @@
Downloads page or our DAS server. To download a specific subset of the data or to configure the output format of the data, use the Table Browser. For information on extracting a large set of sequences from an assembly, see Extracting sequence in batch from an assembly.
For more information on using the UCSC DAS server, see Downloading data from the UCSC DAS server.
Another option for querying sequence and annotation data is the REST API. This interface allows for extraction of sequence and annotations from both UCSC assemblies and from hubs.
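As a rough illustration of the REST API option, the sketch below builds a sequence request URL. The endpoint `https://api.genome.ucsc.edu/getData/sequence` and its `genome`, `chrom`, `start`, and `end` parameters are assumptions that do not appear in this FAQ text, and `ucsc_sequence_url` is a hypothetical helper name, so check the REST API help page before relying on them.

```shell
# Hypothetical helper (not a UCSC tool): build a REST API URL that requests
# the sequence of one region. The endpoint and parameter names are
# assumptions to verify against the REST API documentation.
ucsc_sequence_url() {
    # $1 = genome (e.g. hg38), $2 = chromosome, $3 = start, $4 = end
    printf 'https://api.genome.ucsc.edu/getData/sequence?genome=%s;chrom=%s;start=%s;end=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Print the URL only; no network request is made here.
ucsc_sequence_url hg38 chrM 4321 5678
```

To fetch the data, the printed URL could be passed to a client such as `curl -s "$(ucsc_sequence_url hg38 chrM 4321 5678)"`, which is expected to return JSON.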
-To quickly download large volumes of data you can use UDR (UDT Enabled Rysnc): UDR
+To quickly download large volumes of data you can use UDR (UDT Enabled Rsync): UDR
provides users much faster download rates. Here is an example using UDR, once installed, to download all the mouse mm9 ENCODE information that amounts to several terabytes:
$ udr rsync -avP hgdownload.gi.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/
Optional: download from our secondary download server.
$ udr rsync -avP hgdownload2.gi.ucsc.edu::goldenPath/mm9/encodeDCC/ /my/local/mm9/
Please read more about the new UDR method here.
You can migrate sequences from one assembly to another by using the Blat alignment tool or by converting assembly coordinates. There are two conversion tools available on the Genome Browser web site: the Convert utility and the LiftOver tool. The Convert utility, which is accessed from the View menu on the Genome Browser annotation tracks page, supports forward, reverse, and cross-species conversions, but does not accept batch input. The LiftOver tool, accessed via the Tools link on the Genome Browser home page, also supports forward, reverse, and cross-species conversions, as well as batch conversions.
-Note: It is not recommeneded to use LiftOver to convert SNPs between assemblies,
+Note: It is not recommended to use LiftOver to convert SNPs between assemblies,
and more information about how to convert SNPs between assemblies can be found on the following FAQ entry.
If you wish to update a large number of coordinates to a different assembly and have access to a Linux platform, you may find it useful to try the command-line version of the LiftOver tool. The executable file for this utility can be downloaded here. LiftOver requires a pre-generated over.chain file as input, available for selected assemblies from the Downloads page. If the desired file is not available, send a request to the genome mailing list and we may be able to provide you with one.
Here is an example on how to set up and run LiftOver from the command line:
@@ -1187,31 +1187,31 @@
Most of our tables have a special first column called "bin" that helps with quickly displaying data on the Genome Browser. This (chrom,bin) index causes query results to be ordered first by bin, then by chromStart. This allows us to query and return results more quickly than if they were sorted by chromStart.
A quick way to sort an output BED file by position is to use the following UNIX command on our Table Browser output BED file:
sort -k1,1 -k2n,2n example.bed > example.sorted.bed
-
In order for your computer to run a freshly downloaded utility, you will need to update the file
system permissions to allow your operating system to run the program.
To make utilities usable, turn on its 'executable' bit:
$ chmod +x ./filePath
$ ./filePath/utility_name
Example:
$ chmod +x /home/user/liftover/liftOver
See also: http://en.wikipedia.org/wiki/Chmod
@@ -1229,31 +1229,31 @@
For certain genomes (GRCm38/mm10, GRCh37/hg19, GRCh38/hg38), NCBI provides an analysis set in addition to the standard genome files. These are FASTA files with modified sequence identifiers and index files convenient for analysis with Next Generation Sequencing tools. These files are particularly helpful for NGS pipelines including variant calling and RNA-Seq analysis.
Though not all analysis sets contain the same information, features include:
For more information on analysis sets, see the NCBI FAQ. Information on what is contained in each specific assembly analysis set can be found in the README by clicking the Genome sequence files link for the assembly of interest in our Downloads page.
https://genome.ucsc.edu/h/GCF_000951035.1
The downloads data for these assemblies is stored in a different location than our goldenPath, SQL, or gbdb file directories. There are two ways to access this data for download. First, you can go to the GenArk page and select your clade (primates, mammals, birds, etc.) and then you will be brought to a page with a table of species and GCA/GCF assembly identifiers. Find your genome and click on the third column, labeled "Scientific name and data download", which will take you to the download directory for that species.
Alternatively, you can enter your GCA/GCF identifier
-in the URL in groups of three characters, seperated by slashes. For example, the
+in the URL in groups of three characters, separated by slashes. For example, the
identifier "GCA_004027835.1" has data in the following directory:
https://hgdownload.gi.ucsc.edu/hubs/GCA/004/027/835/
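The identifier-to-directory rule described above can be sketched as a small POSIX shell function. `genark_dir_url` is a hypothetical helper, not part of any UCSC tool. It assumes the accession has the form `GCA_` or `GCF_` followed by nine digits and a version suffix, as in the example identifier, and it reproduces only the directory pattern shown in this FAQ entry.

```shell
# Hypothetical helper: map a GCA/GCF accession to the download directory
# pattern shown above (the digits split into groups of three).
genark_dir_url() {
    acc="$1"
    prefix="${acc%%_*}"     # text before the underscore: GCA or GCF
    digits="${acc#*_}"      # text after the underscore, e.g. 004027835.1
    digits="${digits%%.*}"  # drop the version suffix, e.g. 004027835

    case "$prefix" in
        GCA|GCF) ;;
        *) echo "unexpected accession prefix: $acc" >&2; return 1 ;;
    esac
    case "$digits" in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "expected nine digits in accession: $acc" >&2; return 1 ;;
    esac

    # Split the nine digits into three groups of three characters.
    a=$(printf '%s' "$digits" | cut -c1-3)
    b=$(printf '%s' "$digits" | cut -c4-6)
    c=$(printf '%s' "$digits" | cut -c7-9)

    printf 'https://hgdownload.gi.ucsc.edu/hubs/%s/%s/%s/%s/\n' \
        "$prefix" "$a" "$b" "$c"
}

genark_dir_url "GCA_004027835.1"
# prints https://hgdownload.gi.ucsc.edu/hubs/GCA/004/027/835/
```

The function rejects input that does not match the assumed accession shape instead of guessing, so a mistyped identifier produces an error message and a non-zero exit status.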
The difference in the conservation scores, for both PhastCons and PhyloP, is that the wiggle database format (from which the details page and Table Browser scores are extracted) uses lossy compression that keeps enough resolution to display the pixelated scores in the browser graphic display but does not reconstruct the true original scores. This is why we make the original score files available for download.