7f7928d7115d32d6018254b9f8f241cc6e6c7716
dschmelt
Tue Nov 2 16:03:21 2021 -0700
Adding doc about Batch Queries using positions refs #28436

diff --git src/hg/htdocs/goldenPath/help/hgTablesHelp.html src/hg/htdocs/goldenPath/help/hgTablesHelp.html
index 72a3fa5..c9c48e3 100755
--- src/hg/htdocs/goldenPath/help/hgTablesHelp.html
+++ src/hg/htdocs/goldenPath/help/hgTablesHelp.html
@@ -1,36 +1,37 @@

Table Browser User's Guide

Introduction

Correlating data from two tables

@@ -117,60 +118,59 @@
(2) genome-euro-mysql.soe.ucsc.edu (Europe). More information can be found on our MariaDB Access page. Alternatively, the database may be downloaded to a local computer for MariaDB access. See the mirror site documentation for information on setting up a local copy of the database.

About the Table Browser databases and tables

The Table Browser is built on top of the Genome Browser database, which actually consists of several separate databases, one for each genome assembly.

Tables within the databases may be differentiated by whether the data are based on genome start-stop coordinates (positional tables) or are independent of position (non-positional tables). Some output formats and query options are applicable only to positional tables, hence the distinction.


Non-positional tables

-Non-positional tables contain data not tied to genomic location, for example a table that correlates
-a Known Gene ID with a RefSeq accession ID. Some non-positional tables relate internal numeric mRNA
-IDs to extended information such as author, tissue, or keyword. Some "meta" tables in
-this category contain information about the structure of the database itself or describe external
-files containing sequence data.

Positional tables

Positional tables contain data associated with specific locations in the genome, such as mRNA alignments, gene predictions, cross-species alignments, and other annotations. Each of the annotation tracks displayed in the Genome Browser is based on a positional table. In some instances, data from other positional and non-positional tables may also be incorporated into the track. Data associated with custom annotation tracks active within the user's Table Browser session are also available as positional tables.

Positional tables can be further subdivided into several categories based on the type of data they describe. Alignment data can be best described by using a block structure to represent each element. Other tables require only start and end coordinate data for each element. Some tables specify a translation start and end in addition to the transcription start and end. Some tables contain strand information, others don't. Most tables, but not all, specify a name for each element. Based on the format of the data described by a table, different query and output formatting options may be offered.


Non-positional tables

+Non-positional tables contain data not tied to genomic location, for example a table that correlates
+a Known Gene ID with a RefSeq accession ID. Some non-positional tables relate internal numeric mRNA
+IDs to extended information such as author, tissue, or keyword. Some "meta" tables in
+this category contain information about the structure of the database itself or describe external
+files containing sequence data.

Getting started - simple queries

In its most basic form, the Table Browser can be used to retrieve a specific subset of records from a track or positional table in a selected genome assembly. The query may be based on a specific position or a set of one or more identifiers.

@@ -273,35 +273,35 @@
Select the RefSeq Genes option in the track list.

Type chr7:26906938-26940301 in the position box (the Table Browser will automatically select the position option button).

Click the Get Output button.

The Table Browser will display the records for the RefSeq accessions NM_005522, NM_153620, NM_006735, NM_153632, NM_030661, and NM_153631.

Batch query using identifiers

-In many cases, you may want to retrieve data based on a list of one or more accessions or names,
+In many cases, you may want to retrieve data based on a list of one or more accessions, IDs, or names,
 rather than querying by genomic position. Many tracks in the Table Browser, such as those in the
-Genes and Gene Prediction track group, support identifier queries. The identifier type used
+Genes and Gene Prediction or Variation track groups, support identifier queries. The identifier type used
 in the query must match the kind of identifiers present in the track data, e.g., mRNA accession IDs
-must be used to query the mRNA table.
+must be used to query the mRNA table and rsIDs must match those in the dbSNP table.

Follow these steps to display a list of records that correspond to a set of accessions or names entered as query input.

Step 1. Pick the genome assembly, track, and table

Step 2. Select the genome region setting

Step 3. Load the identifiers into the browser
Click the Paste List button to type or paste in the identifiers or the Upload List button to load the data from a file existing on your local computer.

If you are loading multiple identifiers, entries must be separated by a space, tab, or line.
@@ -310,53 +310,74 @@
The Table Browser will retain the identifier list until you delete the information by clicking the Clear List button.
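The separator rule in Step 3 (space, tab, or line) can be checked before pasting or uploading a list. The following is a minimal sketch, not part of the Table Browser itself: the helper name `normalize_identifiers` is hypothetical, and dropping duplicates is a convenience assumed here rather than something the Table Browser requires.

```python
def normalize_identifiers(raw: str) -> list[str]:
    """Split text on spaces, tabs, or newlines and drop repeated entries.

    Hypothetical helper: deduplication is an assumption made for tidiness,
    not a documented Table Browser requirement.
    """
    seen = set()
    result = []
    # str.split() with no argument splits on any run of whitespace.
    for token in raw.split():
        if token not in seen:
            seen.add(token)
            result.append(token)
    return result


# Mixed separators, as they might appear in text copied from a spreadsheet.
raw_list = "NM_005522 NM_153620\tNM_006735\nNM_153632\n\nNM_005522"
identifiers = normalize_identifiers(raw_list)

# One identifier per line is a valid form for Paste List or Upload List.
print("\n".join(identifiers))
```

Writing one identifier per line keeps the list easy to inspect and satisfies the separator rule stated above.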

Step 4. Click the Get Output button
See the Output formats section for information about configuring the query output.

Batch query from positions

+If you have a list of genomic positions and want to retrieve information
+about their properties, you can use the Define Regions button to input
+multiple positions to query a chosen table. In this example, you want to determine
+the dbSNP rsID names for your list of positions.

Step 1. Select genome assembly and track
+To determine dbSNP rsIDs we will be using Human genome hg38 and dbSNP153.


Step 2. Select the define regions button, enter regions
+You can find the define regions button under the Define region of
+interest section. Upload, type, or paste in your regions of interest, making sure they are
+in the desired 0/1 base notation. They will only be accepted in BED or positional format.
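The "0/1 base notation" point in Step 2 is easy to get wrong, so a small conversion sketch may help. It assumes the general UCSC convention that position strings such as `chr7:26906938-26940301` are 1-based and fully closed, while BED lines are 0-based and half-open; whether the define regions box interprets every input exactly this way is an assumption here. The helper names `position_to_bed` and `bed_to_position` are hypothetical.

```python
import re

# chrom:start-end, allowing thousands separators in the coordinates.
POSITION_RE = re.compile(r"^([^:\s]+):([\d,]+)-([\d,]+)$")


def position_to_bed(position: str) -> str:
    """Convert a 1-based, fully closed position string to a 3-column BED line."""
    match = POSITION_RE.match(position.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {position!r}")
    # Only the start shifts by one; the half-open BED end equals the closed end.
    return f"{chrom}\t{start - 1}\t{end}"


def bed_to_position(bed_line: str) -> str:
    """Convert the first three BED columns back to a 1-based position string."""
    chrom, start, end = bed_line.split()[:3]
    return f"{chrom}:{int(start) + 1}-{int(end)}"


bed_line = position_to_bed("chr7:26906938-26940301")
print(bed_line)
print(bed_to_position(bed_line))
```

Under this convention both forms describe the same bases, so a list should use one form consistently rather than mixing shifted and unshifted starts.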


Step 3. Select output format and get output
+If you want all data from a table, you need not change the output format from the default.
+If you want only particular columns from the table, you can change it to selected fields
+from primary and related tables. Once you hit the get output button,
+you will be redirected to a column selection page or, if you did not change the output format,
+directly to the output data.

Get gene symbols in a query

Follow the example below to obtain gene symbols in your query:

1. Select the clade, genome, assembly, group, table, and region as desired.
2. Change the output format to selected fields from primary and related tables.
3. Click get output to go to the next step of selecting fields from related tables.
4. Select the fields you would like from your primary table.
5. On the same Select Fields form, find the table for the related kgXref table. For example, look for the hg38.kgXref table, and then check the checkbox next to Gene Symbol to add gene symbols to your query results.
6. Click get output again to get the final query output.


Filtering output by constraining field values

The Table Browser filter option can be used to:

apply constraints on table field values to restrict which records should appear in the query output
conduct batch queries using wildcards
include fields from multiple tables in the query output

@@ -420,30 +441,35 @@

Step 5. Click the Submit button to apply the filter

Note: In the current implementation of the Table Browser, the selected fields from primary and related tables output format option must be used when including fields from multiple tables in a filter. Check the boxes for all tables in the Linked Tables list on which filter constraints have been applied, then click the Allow Selection From Checked Tables button to include them in the output.

Filter constraints

Strings
Text fields are compared to words or patterns containing wildcard characters. Valid wildcards are "*" (matches 0 or more characters) and "?" (matches a single character). Each space-separated word or pattern in a text field box is matched against the value of that field in each record. If any word or pattern matches the value, then the record meets the constraint on that field.
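The matching rule for string constraints can be imitated locally to preview which values a pattern list would accept. This is only an approximation using Python's standard `fnmatch` module, not the Table Browser's own matcher: `fnmatchcase` is case-sensitive and also treats square brackets as character classes, and whether the Table Browser behaves the same way on those two points is an assumption. The function name `matches_constraint` is hypothetical.

```python
from fnmatch import fnmatchcase


def matches_constraint(value: str, constraint: str) -> bool:
    """Return True if any space-separated pattern matches the whole value.

    "*" matches zero or more characters and "?" matches exactly one.
    """
    return any(fnmatchcase(value, pattern) for pattern in constraint.split())


accessions = ["NM_005522", "NM_153620", "NM_006735", "NR_046018"]

# Two patterns in one field box: a record passes if either one matches.
selected = [a for a in accessions if matches_constraint(a, "NM_15* NR_*")]
print(selected)
```

Because the patterns are combined with "any", adding a pattern to the box can only widen the set of matching records.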

Numbers
Numeric fields are compared to table data using an operator such as <, >, != (not equals) followed by a number. To specify a range, enter two numbers (start and end) separated by white space and/or a comma.

Free-form queries
When the filters on individual fields aren't sufficiently flexible, the free-form query text box allows the application of more complex constraints that typically relate two or more field names of the selected table. Valid free-form queries use the syntax of the SQL where clause
