cb718a9f9f9d2c9d43a92a618da91be0de85ed3d
hiram
Mon Jul 21 16:32:16 2025 -0700
document the findGenome endpoint and reveal the /list/genarkGenomes endpoint function refs #35468
diff --git src/hg/htdocs/goldenPath/help/api.html src/hg/htdocs/goldenPath/help/api.html
index b712d0a1358..95bda4c5860 100755
--- src/hg/htdocs/goldenPath/help/api.html
+++ src/hg/htdocs/goldenPath/help/api.html
@@ -96,30 +96,31 @@
What is the access URL?
This access URL: https://api.genome.ucsc.edu/ is used to access
the endpoint functions. For example:
wget -O- 'https://api.genome.ucsc.edu/list/publicHubs'
What type of data can be accessed?
The following data sets can be accessed at this time:
+- Find a genome in the UCSC browser with a search string
- List of available public hubs
- List of available UCSC Genome Browser genome assemblies
- List of files available for download for UCSC Browser genome assemblies
- List of genomes from a specified assembly or track hub
- List of available data tracks from a specified hub or UCSC Genome Browser genome assembly
(see also: track definition help)
- List of chromosomes contained in an assembly hub or UCSC Genome Browser genome assembly
- List of chromosomes contained in a specific track of an assembly or track hub, or UCSC Genome
Browser genome assembly
- Return DNA sequence from an assembly hub 2bit file, or UCSC Genome Browser assembly
- Return track data from a specified assembly or track hub, or UCSC Genome Browser assembly
- Return search matches to words in track data, track names, track descriptions, public hub
track names, and public hub descriptions within a UCSC Genome Browser genome assembly
@@ -127,100 +128,107 @@
BLAT FAQ for more info.
Endpoint functions to return data
The URL https://api.genome.ucsc.edu/ is used to access
the endpoint functions. For example:
curl -L 'https://api.genome.ucsc.edu/list/ucscGenomes'
+- /findGenome - search for a genome in the UCSC browser
- /list/publicHubs - list public hubs
- /list/ucscGenomes - list UCSC Genome Browser database genomes from database host
-
- /list/hubGenomes - list genomes from specified hub
- /list/files - list download files available for specified genome
- /list/tracks - list data tracks available in specified hub or database genome
(see also: track definition help)
- /list/chromosomes - list chromosomes from a data track in specified hub or database
- /list/schema - list the schema for a data track in specified hub or database
genome
- /getData/sequence - return sequence from specified hub or database genome
- /getData/track - return data from specified track in hub or database genome
- /search - return search matches within a UCSC Genome Browser genome assembly
Parameters to endpoint functions
+- maxItemsOutput=1000000 - limit number of items to output, default: 1,000,000, maximum limit:
+1,000,000 (use -1 to get maximum output)
- hubUrl=<url> - specify track hub or assembly hub URL
- genome=<name> - specify genome assembly in UCSC Genome Browser or track/assembly hub. Use with /list/genarkGenomes to test for existence.
- track=<trackName> - specify data track in track/assembly hub or UCSC database genome
assembly
- chrom=<chrN> - specify chromosome name for sequence or track data
- start=<123> - specify start coordinate (0 relative) for data from track or sequence
retrieval (start and end required together). See also: UCSC browser coordinate counting systems
- end=<456> - specify end coordinate (1 relative) for data from track or sequence
retrieval (start and end required together). See also: UCSC browser coordinate counting systems
-- revComp=1 - on /getData/sequence function, return reverse complement of sequence data
-- maxItemsOutput=1000 - limit number of items to output, default: 1,000, maximum limit:
-1,000,000 (use -1 to get maximum output)
+- q=<search word(s)> - used with /findGenome, a search string
+- browser=<mustExist|mayExist|notExist> - used with /findGenome: mustExist returns only assemblies that are in the UCSC browser, mayExist returns assemblies whether or not they are in the UCSC browser, notExist returns only assemblies not yet available in the browser. Default is mustExist
+- statsOnly=1 - on /findGenome function, only show statistics about search result
+- year=<2025> - on /findGenome function, only show search result for given year, default is any year
+- category=<reference|representative> - on /findGenome function, show search result only for given NCBI category of assembly
+- status=<reference|representative> - on /findGenome function, show search result only for given NCBI status of assembly
+- level=<complete|chromosome|scaffold|contig> - on /findGenome function, show search result only for given NCBI level of assembly
- trackLeavesOnly=1 - on /list/tracks function, only show tracks, do not show
composite container information
+- revComp=1 - on /getData/sequence function, return reverse complement of sequence data
- jsonOutputArrays=1 - on /getData/track function, JSON format is array type
for each item of data, instead of the default object type
- format=text - on /list/files function, return plain text listing
of download files instead of JSON format output (which includes more meta-data information). Text output contains less meta-data in comment lines prefixed by the '#' hash character.
- search=<term>&genome=<name> - on /search function, specify term to be
searched within a UCSC Genome Browser genome assembly
- categories=helpDocs - on /search?search=<term>&genome=<name> function, restrict the search
within the UCSC Genome Browser help documentation
- categories=publicHubs - on /search?search=<term>&genome=<name> function, restrict the search
within the UCSC Genome Browser Public Hubs
- categories=trackDb - on /search?search=<term>&genome=<name> function, restrict the search
within the track database (trackDb) settings
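The start and end parameters described above use different origins (start is 0 relative, end is 1 relative). As a minimal sketch, assuming the caller begins with a 1-based, inclusive range such as the positions shown in the browser, the conversion could look like this. The helper name to_api_range is hypothetical and not part of the API.

```python
def to_api_range(first_base, last_base):
    """Convert a 1-based, inclusive range to start/end parameter values.

    start is 0 relative, so it is one less than the first base;
    end is 1 relative, so it equals the last base.
    """
    if first_base < 1 or last_base < first_base:
        raise ValueError("expected 1 <= first_base <= last_base")
    return {"start": first_base - 1, "end": last_base}


# Bases 4322 through 5678 (1-based, inclusive) as start/end values.
api_range = to_api_range(4322, 5678)
print(api_range)
```

With this convention the number of bases requested is simply end minus start.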
The parameters are added to the endpoint URL beginning with a
question mark ?, and multiple parameters are separated with
the semi-colon ;. For example:
https://api.genome.ucsc.edu/getData/sequence?genome=hg38;chrom=chrM
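The same URL can be assembled programmatically. The sketch below only builds the string and makes no network request; the helper name build_api_url is hypothetical, and percent-encoding the values is an assumption made here so that characters such as spaces or semi-colons inside a value cannot be confused with the separators.

```python
from urllib.parse import quote

API_BASE = "https://api.genome.ucsc.edu"


def build_api_url(endpoint, params=None):
    """Append parameters to an endpoint using '?' and ';' as separators."""
    url = API_BASE + endpoint
    if params:
        # Assumption: values are percent-encoded; names are used as given.
        pairs = [f"{name}={quote(str(value), safe='')}"
                 for name, value in params.items()]
        url += "?" + ";".join(pairs)
    return url


example_url = build_api_url("/getData/sequence",
                            {"genome": "hg38", "chrom": "chrM"})
print(example_url)
```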
Required and optional parameters
| Endpoint function | Required | Optional |
+| /findGenome | q | statsOnly, browser, year, category, status, level, maxItemsOutput |
| /list/publicHubs | (none) | (none) |
| /list/ucscGenomes | (none) | (none) |
| /list/genarkGenomes | (none) | genome, maxItemsOutput |
| /list/hubGenomes | hubUrl | (none) |
| /list/files | genome | format=text, maxItemsOutput |
| /list/tracks | genome or (hubUrl and genome) | trackLeavesOnly=1 |
| /list/chromosomes | genome or (hubUrl and genome) | track |
| /list/schema | (genome or (hubUrl and genome)) and track | (none) |
| /getData/sequence | (genome or (hubUrl and genome)) and chrom | start, end, revComp=1 |
| /getData/track | (genome or (hubUrl and genome)) and track | chrom, (start and end), maxItemsOutput, jsonOutputArrays |
| /search | search and genome | categories=helpDocs, categories=publicHubs, categories=trackDb |
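A client could check a request against the table above before sending it. The following is a rough sketch under stated assumptions: the REQUIRED mapping is transcribed by hand from the Required column, "genome or (hubUrl and genome)" is reduced to genome because both alternatives need it, and the function name missing_parameters is hypothetical. It does not reproduce the server's own validation.

```python
# Required parameters per endpoint, transcribed from the table above.
REQUIRED = {
    "/findGenome": ("q",),
    "/list/publicHubs": (),
    "/list/ucscGenomes": (),
    "/list/genarkGenomes": (),
    "/list/hubGenomes": ("hubUrl",),
    "/list/files": ("genome",),
    "/list/tracks": ("genome",),
    "/list/chromosomes": ("genome",),
    "/list/schema": ("genome", "track"),
    "/getData/sequence": ("genome", "chrom"),
    "/getData/track": ("genome", "track"),
    "/search": ("search", "genome"),
}


def missing_parameters(endpoint, params):
    """Return the names of required parameters absent from params."""
    if endpoint not in REQUIRED:
        raise KeyError(f"unknown endpoint: {endpoint}")
    missing = [name for name in REQUIRED[endpoint] if name not in params]
    # start and end must be given together.
    if ("start" in params) != ("end" in params):
        missing.append("end" if "start" in params else "start")
    return missing


print(missing_parameters("/getData/sequence", {"genome": "hg38", "start": 4321}))
```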
@@ -236,30 +244,39 @@
to the single specified chromosome. To limit the request to a specific
position, both start=4321 and end=5678 must be given together.
Using the revComp=1 parameter returns the reverse complement.
Use the genome argument with the /list/genarkGenomes function
to test for the existence of a specific genome assembly in the
Genark set
of assembly hubs.
The /list/files endpoint only works for UCSC hosted genome assemblies,
not for external hosted assembly hubs.
+The /findGenome endpoint can find genome assemblies in the browser, or
+any other assembly available at NCBI even when it is not in the browser. Note that
+there are almost 4 million assemblies available at NCBI. All searches are
+case insensitive. Force inclusion: put a + sign before a word (+word) to ensure
+it appears in the result. Exclude words: put a - sign before a word (-word) to
+exclude it from the search result. Wildcard search: add an * (asterisk) at
+the end of a word (word*) to search for all terms starting with that prefix.
+
+
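The search operators described for /findGenome can be combined in a single q value. The sketch below composes such a string and the matching URL without contacting the server. The helper name find_genome_query and the example words are hypothetical, and the percent-encoding is an assumption made here: a literal + in a URL query is commonly decoded as a space, so it is sent as %2B.

```python
from urllib.parse import quote

API_BASE = "https://api.genome.ucsc.edu"


def find_genome_query(required=(), plain=(), prefixes=(), excluded=()):
    """Compose a /findGenome search string from the documented operators."""
    terms = [f"+{word}" for word in required]    # force inclusion
    terms += list(plain)                         # ordinary search words
    terms += [f"{word}*" for word in prefixes]   # wildcard prefix search
    terms += [f"-{word}" for word in excluded]   # exclude from results
    return " ".join(terms)


q = find_genome_query(required=["white"], prefixes=["rhino"],
                      excluded=["southern"])
# browser=mayExist also returns assemblies that are not in the browser.
url = f"{API_BASE}/findGenome?q={quote(q, safe='')};browser=mayExist"
print(q)
print(url)
```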
Any extra parameters not allowed in a function will be flagged as an error.
Supported track types for getData functions