7f7928d7115d32d6018254b9f8f241cc6e6c7716 dschmelt Tue Nov 2 16:03:21 2021 -0700 Adding doc about Batch Queries using positions refs #28436 diff --git src/hg/htdocs/goldenPath/help/hgTablesHelp.html src/hg/htdocs/goldenPath/help/hgTablesHelp.html index 72a3fa5..c9c48e3 100755 --- src/hg/htdocs/goldenPath/help/hgTablesHelp.html +++ src/hg/htdocs/goldenPath/help/hgTablesHelp.html @@ -1,36 +1,37 @@ <!DOCTYPE html> <!--#set var="TITLE" value="Table Browser Help" --> <!--#set var="ROOT" value="../.." --> <!-- Relative paths to support mirror sites with non-standard GB docs install --> <!--#include virtual="$ROOT/inc/gbPageStart.html" --> <h1>Table Browser User's Guide </h1> <h2>Contents</h2> <h6><a href="#Introduction">Introduction</a></h6> <h6><a href="#Tables">About the Table Browser databases and tables</a></h6> <ul> - <li><a href="#NonPositional"><strong>Non-positional tables</strong></a></li> <li><a href="#Positional"><strong>Position-oriented tables</strong></a></li> + <li><a href="#NonPositional"><strong>Non-positional tables</strong></a></li> </ul> <h6><a href="#GettingStarted">Getting started - simple queries</a></h6> <ul> <li><a href="#PositionQuery"><strong>Simple position-based query</strong></a></li> <li><a href="#BatchQuery"><strong>Batch query using identifiers</strong></a></li> + <li><a href="#GetInfoFromPositions"><strong>Batch query using positions</strong></a></li> <li><a href="#GetGeneSymbols"><strong>Query to get gene symbols</strong></a></li> </ul> <h6><a href="#Filter">Filtering output by constraining field values</a></h6> <ul> <li><a href="#FilterSingle"><strong>Filtering on fields from a single table</strong></a></li> <li><a href="#FilterMultiple"><strong>Filtering on fields from multiple tables</strong></a></li> <li><a href="#FilterConstraints"><strong>Filter constraints</strong></a></li> </ul> <h6><a href="#Intersection">Intersecting data from multiple tables</a></h6> <ul> <li><a href="#SimpleIntersection"><strong>Intersecting data from two 
tables</strong></a></li> <li><a href="#MultiIntersection"><strong>Intersecting data from multiple tables</strong></a></li> <li><a href="#IntersectionOptions"><strong>Intersection options</strong></a></li> </ul> <h6><a href="#Correlation">Correlating data from two tables</a></h6> @@ -117,60 +118,59 @@ (2) genome-euro-mysql.soe.ucsc.edu (Europe). More information can be found on our <a href="mysql.html">MariaDB Access</a> page. Alternatively, the database may be downloaded to a local computer for MariaDB access. See the <a href="mirror.html">mirror site</a> documentation for information on setting up a local copy of the database.</p> <!-- ====Tables======================== --> <a name="Tables"></a> <h2>About the Table Browser databases and tables</h2> <p> The Table Browser is built on top of the Genome Browser database</a>, which actually consists of several separate databases, one for each genome assembly.</p> <p> Tables within the databases may be differentiated by whether the data are based on genome start-stop coordinates (positional tables) or are independent of position (non-positional tables).Some output formats and query options are applicable only to positional tables, hence the distinction.</p> - -<!-- ====Non-positional======================= --> -<a name="NonPositional"></a> - -<h3>Non-positional tables</h3> -<p> -Non-positional tables contain data not tied to genomic location, for example a table that correlates -a Known Gene ID with a RefSeq accession ID. Some non-positional tables relate internal numeric mRNA -IDs to extended information such as author, tissue, or keyword. 
Some "meta" tables in -this category contain information about the structure of the database itself or describe external -files containing sequence data.</p> - <!-- ====Positional========================== --> <a name="Positional"></a> <h3>Positional tables</h3> <p> Positional tables contain data associated with specific locations in the genome, such as mRNA alignments, gene predictions, cross-species alignments, and other annotations. Each of the annotation tracks displayed in the Genome Browser is based on a positional table. In some instances, data from other positional and non-positional tables may also be incorporated into the track. Data associated with custom annotation tracks active within the user's Table Browser session are also available as positional tables.</p> <p> Positional tables can be further subdivided into several categories based on the type of data they describe. Alignment data can be best described by using a block structure to represent each element. Other tables require only start and end coordinate data for each element. Some tables specify a translation start and end in addition to the transcription start and end. Some tables contain strand information, others don't. Most tables, but not all, specify a name for each element. Based on the format of the data described by a table, different query and output formatting options may be offered.</p> +<!-- ====Non-positional======================= --> +<a name="NonPositional"></a> + +<h3>Non-positional tables</h3> +<p> +Non-positional tables contain data not tied to genomic location, for example a table that correlates +a Known Gene ID with a RefSeq accession ID. Some non-positional tables relate internal numeric mRNA +IDs to extended information such as author, tissue, or keyword. 
Some "meta" tables in +this category contain information about the structure of the database itself or describe external +files containing sequence data.</p> + <!-- REPLACE OR REMOVE <p> For descriptions of the Genome Browser database tables, see the <a href="/goldenPath/gbdDescriptions.html">annotation database</a> documentation. --> <!-- ====Getting Started===================== --> <a name="GettingStarted"></a> <h2>Getting started - simple queries</h2> <p> In its most basic form, the Table Browser can be used to retrieve a specific subset of records from a track or positional table in a selected genome assembly. The query may be based on a specific position or a set of one or more identifiers.</p> <p> @@ -273,35 +273,35 @@ Select the <em>RefSeq Genes</em> option in the <code>track</code> list.</li> <li> Type <em>chr7:26906938-26940301</em> in the <code>position</code> box (the Table Browser will automatically select the <code>position</code> option button).</li> <li> Click the <code>Get Output</code> button.</li> </ol> <p> The Table Browser will display the records for the RefSeq accessions NM_005522, NM_153620, NM_006735, NM_153632, NM_030661, and NM_153631.</p> <!-- ====Batch Query========================= --> <a name="BatchQuery"></a> <h3>Batch query using identifiers</h3> <p> -In many cases, you may want to retrieve data based on a list of one or more accessions or names, +In many cases, you may want to retrieve data based on a list of one or more accessions, IDs, or names, rather than querying by genomic position. Many tracks in the Table Browser, such as those in the -<em>Genes and Gene Prediction</em> track group, support identifier queries. The identifier type used +<em>Genes and Gene Prediction</em> or <em>Variation</em> track groups, support identifier queries. 
The identifier type used in the query must match the kind of identifiers present in the track data, e.g., mRNA accession IDs -must be used to query the mRNA table.</p> +must be used to query the mRNA table, and rsIDs must match those in the dbSNP table.</p> <p> Follow these steps to display a list of records that correspond to a set of accessions or names entered as query input.</p> <p> <strong>Step 1. Pick the genome assembly, track, and table</strong></p> <p> <strong>Step 2. Select the <em>genome</em> <code>region</code> setting</strong></p> <p> <strong>Step 3. Load the identifiers into the browser</strong><br> Click the <code>Paste List</code> button to type or paste in the identifiers or the <code>Upload List</code> button to load the data from a file existing on your local computer.</p> <ul> <li> If you are loading multiple identifiers, entries must be separated by a space, tab, or line.</li> <li> @@ -310,53 +310,74 @@ <li> The Table Browser will retain the identifier list until you delete the information by clicking the <code>Clear List</code> button.</li> </ul> <p> <strong>Step 4. Click the <code>Get Output</code> button</strong><br> See the <a href="#OutputFormats">Output formats</a> section for information about configuring the query output. <!-- <p> <em><strong>Example:</strong></em><br> [FIXME - add example] --> <!-- ====Query to get gene symbols========================== --> +<!-- ====Batch Query with positions========================== --> +<a name="GetInfoFromPositions"></a> +<h3>Batch query using positions</h3> +<p> +If you have a list of genomic positions and want to retrieve information +about their properties, you can use the <code>Define Regions</code> button to input +multiple positions to query a chosen table. In this example, you want to determine +the dbSNP rsID names for your list of positions.</p> + +<p><strong>Step 1. 
Select genome assembly and track</strong><br> +To determine dbSNP rsIDs, we will use the human genome assembly hg38 and dbSNP153.</p> + +<p><strong>Step 2. Click the <code>define regions</code> button and enter regions</strong><br> +You can find the <code>define regions</code> button under the <code>Define region of +interest</code> section. Upload, type, or paste in your regions of interest, making sure the coordinates +use the intended convention (0-based starts for BED, 1-based for position format). Regions are accepted only in BED or position format.</p> + +<p><strong>Step 3. Select output format and <code>get output</code></strong><br> +If you want all data from a table, you need not change the output format from the default. +If you want only particular columns from the table, you can change it to <code>selected fields +from primary and related tables</code>. Once you click the <code>get output</code> button, +you will be taken to a column selection page or, if you did not change the output format, +directly to your output data.</p> <a name="GetGeneSymbols"></a> <h3>Get gene symbols in a query</h3> <p> Follow the example below to obtain gene symbols in your query: <p> <ul> <li>1. Select the clade, genome, assembly, group, table, and region as desired.</li> <li>2. Change the <code>output format</code> to <em>selected fields from primary and related tables</em>.</li> <li>3. Click <code> get output</code> to go to the next step of selecting fields from related tables.</li> <li>4. Select the fields you would like from your primary table.</li> <li>5. On the same <em>Select Fields</em> form, find the table for the related <em>kgXref</em> table. For example, look for the <em>hg38.kgXref</em> table, and then check the checkbox next to <em>Gene Symbol</em> to add gene symbols to your query results.</li> <li>6. 
Click <code> get output</code> again to get the final query output.</li> </ul> </p> - - <!-- ====Filtering Output================== --> <a name="Filter"></a> <h2>Filtering output by constraining field values</h2> <p> The Table Browser <code>filter</code> option can be used to:</p> <ul> <li> apply constraints on table field values to restrict which records should appear in the query output</li> <li> conduct batch queries using wildcards</li> <li> include fields from multiple tables in the query output</li> </ul> @@ -420,30 +441,35 @@ <p> <strong>Step 5. Click the Submit button to apply the filter</strong></p> <p> <strong>Note:</strong> In the current implementation of the Table Browser, the <em>selected fields from primary and related tables</em> output format option must be used when including fields from multiple tables in a filter. Check the boxes for all tables in the <code>Linked Tables</code> list on which filter constraints have been applied, then click the <code>Allow Selection From Checked Tables</code> button to include them in the output.</p> <!-- ====Filter Constraints================ --> <a name="FilterConstraints"></a> <h3>Filter constraints</h3> <p> <strong>Strings</strong><br> Text fields are compared to words or patterns containing wildcard characters. Valid wildcards are +i + + + + "*" (matches 0 or more characters) and "?" (matches a single character). Each space-separated word or pattern in a text field box is matched against the value of that field in each record. If any word or pattern matches the value, then the record meets the constraint on that field.</p> <p> <strong>Numbers</strong><br> Numeric fields are compared to table data using an operator such as <, >, != (not equals) followed by a number. 
To specify a range, enter two numbers (start and end) separated by white space and/or a comma.</p> <p> <strong>Free-form queries</strong><br> When the filters on individual fields aren't sufficiently flexible, the <code>free-form query</code> text box allows the application of more complex constraints that typically relate two or more field names of the selected table. Valid free-form queries use the syntax of the SQL <em><a href="http://www.w3schools.com/sql/sql_where.asp" target="_blank">where</a></em> clause