73e5ffe87841338f4519f9a88558d9d7e7aa6bc8 kuhn Wed Feb 12 12:33:24 2020 -0800 expanded descripiton of doWiggle a little diff --git src/hg/htdocs/goldenPath/help/bam.html src/hg/htdocs/goldenPath/help/bam.html index b4c6cc6..1d86685 100755 --- src/hg/htdocs/goldenPath/help/bam.html +++ src/hg/htdocs/goldenPath/help/bam.html @@ -1,197 +1,197 @@ <!DOCTYPE html> <!--#set var="TITLE" value="Genome Browser BAM Track Format" --> <!--#set var="ROOT" value="../.." --> <!-- Relative paths to support mirror sites with non-standard GB docs install --> <!--#include virtual="$ROOT/inc/gbPageStart.html" --> <h1>BAM Track Format</h1> <p> BAM is the compressed binary version of the <a href="http://samtools.sourceforge.net/" target="_blank">Sequence Alignment/Map (SAM)</a> format, a compact and index-able representation of nucleotide sequence alignments. Many <a href="http://samtools.sourceforge.net/swlist.shtml" target="_blank">next-generation sequencing and analysis tools</a> work with SAM/BAM. For custom track display, the main advantage of indexed BAM over PSL and other human-readable alignment formats is that only the portions of the files needed to display a particular region are transferred to UCSC. This makes it possible to display alignments from files that are so large that the connection to UCSC would time out when attempting to upload the whole file to UCSC. Both the BAM file and its associated index file remain on your web-accessible server (http, https, or ftp), not on the UCSC server. UCSC temporarily caches the accessed portions of the files to speed up interactive display. If you do not have access to a web-accessible server and need hosting space for your BAM files, please see the <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the Track Hub Help documentation.</p> <p> The typical workflow for generating a BAM custom track is this:</p> <ol> <li> If you haven't done so already, <a href="http://sourceforge.net/projects/samtools/files/" target="_blank">download</a> and build the <a href="http://samtools.sourceforge.net/" target="_blank">samtools</a> program. Test your installation by running <code>samtools</code> with no command line arguments; it should print a brief usage message. For help with <code>samtools</code>, please contact the <a href="http://sourceforge.net/mail/?group_id=246254" target="_blank">SAM tools mailing list</a>.</li> <li> Align sequences using a tool that outputs SAM directly, or outputs a format that can be converted to SAM. (See <a href="http://samtools.sourceforge.net/swlist.shtml" target="_blank">list of tools and converters</a>)</li> <li> Convert SAM to BAM using the samtools program: <pre><code> samtools view -S -b -o my.bam my.sam</code></pre> If converting a SAM file that does not have a proper header, the <code>-t</code> or <code>-T</code> option is necessary. For more information about the command, run <code>samtools view</code> with no other arguments.</li> <li> Sort and create an index for the BAM: <pre><code> samtools sort my.bam my.sorted samtools index my.sorted.bam</code></pre> The sort command appends <code>.bam</code> to <code>my.sorted</code>, creating a BAM file of alignments ordered by leftmost position on the reference assembly. The index command generates a new file, <code>my.sorted.bam.bai</code>, with which genomic coordinates can quickly be translated into file offsets in <code>my.sorted.bam</code>.</li> <li> Move both the BAM file and index file (<code>my.sorted.bam</code> and <code>my.sorted.bam.bai</code>) to an http, https, or ftp location.</li> <li> If the file URL ends with .bam, you can paste the URL directly into the <a href="../../cgi-bin/hgCustom">custom track</a> management page, click submit and view in the Genome Browser. The track name will then be the name of the file. If you want to configure the track name and descriptions, you will need to create a <a href="hgTracksHelp.html#TRACK">track line</a>, as shown below in step 7.</li> <li> Construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single <a href="hgTracksHelp.html#TRACK">track line</a>. The most basic version of the "track" line will look something like this: <pre><code> track type=bam name="My BAM" bigDataUrl=<em>http://myorg.edu/mylab/my.sorted.bam</em></code></pre> Again, in addition to <em>http://myorg.edu/mylab/my.sorted.bam</em>, the associated index file <em>http://myorg.edu/mylab/my.sorted.bam.bai</em> must also be available at the same location.</li> <li> Paste the custom track line into the text box in the <a href="../../cgi-bin/hgCustom" target="_blank">custom track management page</a>, click submit and view in the Genome Browser.</li> </ol> <h3>Parameters for BAM custom track definition lines</h3> <p> All options are placed in a single line separated by spaces. In the example below, the lines are broken only for readability. If you copy/paste this example, you must remove the line breaks. Click <a href="examples/bamExample.txt">here</a> for a text version that you can paste without editing.</p> <pre><code> <strong>track type=bam bigDataUrl=</strong><em>http://...</em> <strong>pairEndsByName=</strong><em>.</em> <strong>pairSearchRange=</strong><em>N</em> <strong>bamColorMode=</strong><em>strand|gray|tag|off</em> <strong>bamGrayMode=</strong><em>aliQual|baseQual|unpaired</em> <strong>bamColorTag=</strong><em>XX</em> <strong>minAliQual=</strong><em>N</em> <strong>showNames=</strong><em>on|off</em> <strong>name=</strong><em>track_label</em> <strong>description=</strong><em>center_label</em> <strong>visibility=</strong><em>display_mode</em> <strong>priority=</strong><em>priority</em> <strong>db=</strong><em>db</em> <strong>maxWindowToDraw=</strong><em>N</em> <strong>chromosomes=</strong><em>chr1,chr2,...</em></code></pre> <p> The track type and bigDataUrl are REQUIRED:</p> <pre><code> <strong>type=bam bigDataUrl=</strong><em>http://myorg.edu/mylab/my.sorted.bam</em></strong></code></pre> <p> The remaining settings are OPTIONAL. Some are specific to BAM:</p> <pre><code> <strong>pairEndsByName </strong><em>any value </em> # presence indicates paired-end alignments <strong>pairSearchRange </strong><em>N </em> # max distance between paired alignments, default 20,000 bases <strong>bamColorMode </strong><em>strand|gray|tag|off </em> # coloring method, default is strand <strong>bamGrayMode </strong><em>aliQual|baseQual|unpaired </em> # grayscale metric, default is aliQual <strong>bamColorTag </strong><em>XX </em> # optional tag for RGB color, default is "YC" <strong>minAliQual </strong><em>N </em> # display only items with alignment quality at least N, default 0 <strong>showNames </strong><em>on|off </em> # if off, don't display query names, default is on </code></pre> Other optional settings are not specific to BAM, but relevant: <pre><code> <strong>name </strong><em>track label </em> # default is "User Track" <strong>description </strong><em>center label </em> # default is "User Supplied Track" <strong>visibility </strong><em>squish|pack|full|dense|hide</em> # default is hide (will also take numeric values 4|3|2|1|0) <strong>priority </strong><em>N </em> # default is 100 <strong>db </strong><em>genome database </em> # e.g. hg18 for Human Mar. 2006 <strong>maxWindowToDraw </strong><em>N </em> # don't display track when viewing more than N bases <strong>chromosomes </strong><em>chr1,chr2,... </em> # track contains data only on listed reference assembly sequences - <strong>doWiggle </strong><em>on|off </em> # if on, show density graph, default is off + <strong>doWiggle </strong><em>on|off </em> # if on, show data as density graph, default is off </code></pre> <p> The <a href="hgBamTrackHelp.html" target="_blank">BAM track configuration</a> help page describes the BAM track configuration page options corresponding to <code>pairEndsByName</code>, <code>minAliQual</code>, <code>bamColorMode</code>, <code>bamGrayMode</code> and <code>bamColorTag</code> in more detail.</p> <p> <code>pairSearchRange</code> applies only when <code>pairEndsByName</code> is given. It allows for a tradeoff of display speed vs. completeness of pairing the paired-end alignments. When paired ends are split or separated by large gaps or introns, but one is viewing a small genomic region, it is necessary to search a large number of bases upstream and downstream of the viewed region in order to find mates of the alignments in the viewed region. However, searching a very large region can be slow, especially when the alignments have deep coverage of the genome. To ensure that all properly paired mates will be found, <code>pairSearchRange</code> should be set to the largest genomic size of a mapped pair. However, it can be set to a smaller size if necessary to speed up the display, at the cost of some items being displayed as unpaired when the mate is too far outside the viewed window.</p> <h3>Example #1</h3> <p> In this example, you will create a custom track for an indexed BAM file that is already on a public server — alignments of sequence generated by the <a href="http://1000genomes.org/" target="_blank">1000 Genomes Project</a>.</p> <p> You can paste the URL <code>http://genome.ucsc.edu/goldenPath/help/examples/bamExample.bam</code> directly into the <a href="../../cgi-bin/hgCustom" target="_blank">custom track management page</a> for the human assembly hg18 (May 2006), then press the <em>submit</em> button. On the following page, press the <em>chr21</em> link in the custom track and navigate to position chr21:33,038,946-33,039,092 to see the reads in the new BAM track.</p> <p> Alternatively, you can specify more visualization options by creating a "track" line. The line breaks inserted here for readability must be removed before submitting the track line:</p> <pre><code> track type=bam name="BAM Example One" description="Bam Ex. 1: 1000 Genomes read alignments (individual NA12878)" pairEndsByName=. pairSearchRange=10000 chromosomes=chr21 bamColorMode=gray maxWindowToDraw=200000 db=hg18 visibility=pack bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bamExample.bam</code></pre> <p> Include the following "browser" line to view a small region of chromosome 21 with alignments from the .bam file:</p> <pre><code> browser position chr21:33,038,946-33,039,092</code></pre> <p> Note if you copy/paste the above example, you must remove the line breaks (or, click <a href="examples/bamExampleOne.txt">here</a> for a text version that you can paste without editing).</p> <p> Paste the "browser" line and "track" line into the <a href="../../cgi-bin/hgCustom" target="_blank">custom track management page</a> for the human assembly hg18 (May 2006), then press the "submit" button. On the following page, press the <em>chr21</em> link in the custom track listing to view the BAM track in the Genome Browser.</p> <h3>Example #2</h3> <p> In this example, you will create indexed BAM from an existing SAM file. First, save this SAM file <a href="examples/samExample.sam" target="_blank">samExample.sam</a> to your machine. Perform steps 1 and 3-7 in the workflow described above, but substituting <code>samExample.sam</code> for <code>my.sam</code>. On the <a href="../../cgi-bin/hgCustom" target="_blank">custom track management page</a>, click the "add custom tracks" button if necessary and make sure that the genome is set to Human and the assembly is set to Mar. 2006 (hg18) before pasting the track line and submitting. This track line is a little nicer than the one shown in step 6, but remember to remove the line breaks that have been added to the track line for readability (or, click <a href="examples/bamExampleTwo.txt">here</a> for a text version that you can paste without editing):</p> <pre><code> track type=bam name="BAM Example Two" bigDataUrl=<em>http://myorg.edu/mylab/my.sorted.bam</em> description="Bam Ex. 2: Simulated RNA-seq read alignments" visibility=squish db=hg18 chromosomes=chr21 browser position chr21:33,037,317-33,038,137 browser pack mrna</code></pre> <h3>Sharing Your Data with Others</h3> <p> If you would like to share your BAM data track with a colleague, learn how to create a sharable URL by looking at <a href="customTrack.html#SHARE">this page</a>.</p> <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->