478ca956c630cc79efc533d1cc19f765faea38e1
brianlee
Wed May 4 14:19:47 2022 -0700
Improving the bigMaf page to have hints about the frames and summary options, and the autoSql about those files (inspired by doc work on bigRmsk). refs #29372

diff --git src/hg/htdocs/goldenPath/help/bigMaf.html src/hg/htdocs/goldenPath/help/bigMaf.html
index 9c5e19c..3d5de5e 100755
--- src/hg/htdocs/goldenPath/help/bigMaf.html
+++ src/hg/htdocs/goldenPath/help/bigMaf.html
@@ -19,39 +19,97 @@
 those portions of the file needed to display a particular region are transferred to the Genome Browser server.
 Because of this, bigMaf files have considerably faster display performance than regular MAF files when working with large data sets.
 The bigMaf file remains on your local web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for the currently displayed chromosomal position is locally cached as a "sparse file".
 If you do not have access to a web-accessible server and need hosting space for your bigMaf files, please see the <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the Track Hub Help documentation.</p>
 <h2 id="bigMaf">bigMaf file definition</h2>
 <p>
 The following autoSql definition is used to specify bigMaf multiple alignment files.
 This definition, contained in the file <a href="examples/bigMaf.as"><em>bigMaf.as</em></a>, is pulled in when the <code>bedToBigBed</code> utility is run with the <code>-as=bigMaf.as</code> option.</p>
+<h6>bigMaf.as</h6>
 <pre><code>table bedMaf
 "Bed3 with MAF block"
 (
 string chrom; "Reference sequence chromosome or scaffold"
 uint chromStart; "Start position in chromosome"
 uint chromEnd; "End position in chromosome"
 lstring mafBlock; "MAF block"
 )</code></pre>
 <p>
+An example: <code>bedToBigBed -type=bed3+1 -as=bigMaf.as -tab bigMaf.txt
+hg38.chrom.sizes bigMaf.bb</code></p>
+
+<h3>Supporting <code>frame</code> and <code>summary</code> definitions</h3>
+<p>
+Alongside the bigMaf file, two other summary and frame bigBeds are created. The
+following autoSql definition is used to create the first file, pointed to online
+with <code>summary <url></code>, rather than the standard
+<code>bigDataUrl <url></code> used with bigMaf. The file
+<a href="examples/mafSummary.as"><em>mafSummary.as</em></a>, is pulled in when
+the <code>bedToBigBed</code> utility is run with the <code>-as=mafSummary.as</code>
+option.</p>
+<h6>mafSummary.as</h6>
+<pre><code>table mafSummary
+"Positions and scores for alignment blocks"
+ (
+ string chrom; "Reference sequence chromosome or scaffold"
+ uint chromStart; "Start position in chromosome"
+ uint chromEnd; "End position in chromosome"
+ string src; "Sequence name or database of alignment"
+ float score; "Floating point score."
+ char[1] leftStatus; "Gap/break annotation for preceding block"
+ char[1] rightStatus; "Gap/break annotation for following block"
+ )</code></pre>
+<p>
+An example, <code>bedToBigBed -type=bed3+4 -as=mafSummary.as
+-tab bigMafSummary.bed hg38.chrom.sizes bigMafSummary.bb</code>.
+Another tool, <code>hgLoadMafSummary</code> generates the input
+<code>bigMafSummary.bed</code> file.</p>
+<p>
+The following autoSql definition is used to create the second file,
+pointed to online with <code>frames <url></code>.
The file
+<a href="examples/mafFrames.as"><em>mafFrames.as</em></a>, is pulled in when
+the <code>bedToBigBed</code> utility is run with the <code>-as=mafFrames.as</code>
+option.</p>
+<h6>mafFrames.as</h6>
+<pre><code>table mafFrames
+"codon frame assignment for MAF components"
+ (
+ string chrom; "Reference sequence chromosome or scaffold"
+ uint chromStart; "Start range in chromosome"
+ uint chromEnd; "End range in chromosome"
+ string src; "Name of sequence source in MAF"
+ ubyte frame; "frame (0,1,2) for first base(+) or last bast(-)"
+ char[1] strand; "+ or -"
+ string name; "Name of gene used to define frame"
+ int prevFramePos; "target position of the previous base (in transcription direction) that continues this frame, or -1 if none, or frame not contiguous"
+ int nextFramePos; "target position of the next base (in transcription direction) that continues this frame, or -1 if none, or frame not contiguous"
+ ubyte isExonStart; "does this start the CDS portion of an exon?"
+ ubyte isExonEnd; "does this end the CDS portion of an exon?"
+ )</code></pre>
+<p>
+An example, <code>bedToBigBed -type=bed3+8 -as=mafFrames.as
+-tab bigMafFrames.txt hg38.chrom.sizes bigMafFrames.bb</code>. Another tool,
+<code>genePredToMafFrames</code> generates the input
+<code>bigMafFrames.txt</code> file.</p>
+<p>
 Note that the <code>bedToBigBed</code> utility uses a substantial amount of memory: approximately 25% more RAM than the uncompressed BED input file.</p>
 <h2 id="steps">Creating a bigMaf track</h2>
 <p>
 To create a bigMaf track, follow these steps:
 <p>
 <strong>Step 1.</strong> If you already have a MAF file you would like to convert to a bigMaf, skip to <em>Step 3</em>.
 Otherwise, download <a href="examples/chr22_KI270731v1_random.maf">this example MAF file</a> for the human GRCh38 (hg38) assembly.</p>
 <p>
 <strong>Step 2.</strong> If you would like to include optional reading frame and block summary information, download the <a href="examples/chr22_KI270731v1_random.gp">chr22_KI270731v1_random.gp</a> genePred file.</p>
@@ -106,76 +164,76 @@
 hgLoadMafSummary -minSeqSize=1 -test hg38 bigMafSummary chr22_KI270731v1_random.maf
 cut -f2- bigMafSummary.tab | sort -k1,1 -k2,2n > bigMafSummary.bed
 bedToBigBed -type=bed3+4 -as=mafSummary.as -tab bigMafSummary.bed hg38.chrom.sizes bigMafSummary.bb
 </code></pre>
 <p>
 <strong>Step 7.</strong> Move the newly created bigMaf file (<em>bigMaf.bb</em>) to a web-accessible http, https or ftp location.
 If you generated the <em>bigMafSummary.bb</em> and/or <em>bigMafFrames.bb</em> files, move those to a web accessible location, likely same location as the <em>bigMaf.bb</em> file.</p>
 <p>
 <strong>Step 8.</strong> Construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single <a href="hgTracksHelp.html#TRACK">track line</a>.
 Note that any of the track attributes listed <a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed.
 The most basic version of the track line will look something like this:</p>
-<pre><code>track type=bigMaf name="My Big MAF" description="A Multiple Alignment" bigDataUrl=http://myorg.edu/mylab/bigMaf.bb</code></pre>
+<pre><code>track type=bigMaf name="My Big MAF" description="A Multiple Alignment" bigDataUrl=http://myorg.edu/mylab/bigMaf.bb summary=http://myorg.edu/mylab/bigMafSummary.bb frames=http://myorg.edu/mylab/bigMafFrames.bb</code></pre>
 <p>
 <strong>Step 9.</strong> Paste the custom track line into the text box on the <a href="../../cgi-bin/hgCustom">custom track management page</a>.
 Navigate to chr22_KI270731v1_random to see the example data for this track.</p>
 <p>
 The <code>bedToBigBed</code> program can be run with several additional options.
 For a full list of the available options, type <code>bedToBigBed</code> (with no arguments) on the command line to display the usage message. </p>
 <h2>Examples</h2>
 <h3 id="example1">Example #1</h3>
 <p>
 In this example, you will create a bigMaf custom track using an existing bigMaf file, <em>bigMaf.bb</em>, located on the UCSC Genome Browser http server.
 This file contains data for the hg38 assembly.</p>
 <p>
 To create a custom track using this bigMaf file:
 <ol>
 <li>
 Construct a track line that references the file:</p>
- <pre><code>track type=bigMaf name="bigMaf Example One" description="A bigMaf file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb frames=http://genome.ucsc.edu/goldenPath/help/examples/bigMafFrames.bb</code></pre>
+ <pre><code>track type=bigMaf name="bigMaf Example One" description="A bigMaf file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb frames=http://genome.ucsc.edu/goldenPath/help/examples/bigMafFrames.bb summary=http://genome.ucsc.edu/goldenPath/help/examples/bigMafSummary.bb </code></pre>
 <li>
 Paste the track line into the <a href="../../cgi-bin/hgCustom?db=hg38">custom track management page</a> for the human assembly hg38 (Dec. 2013).</li>
 <li>
 Click the "submit" button.</li>
 </ol>
 <p>
 Note that additional track line options exist that are specific to the <a href="../../FAQ/FAQformat.html#format5">MAF format</a>.
 For instance, adding the parameter setting <code>speciesOrder="panTro4 rheMac3 mm10 rn5 canFam3 monDom5"</code> to the above example will specify the order of sequences by species.</p>
 <p>
 Custom tracks can also be loaded via one URL line.
 <a href="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigMaf%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb%20visibility=pack%20frames=http://genome.ucsc.edu/goldenPath/help/examples/bigMafFrames.bb" target="_blank">This link</a> loads the same <em>bigMaf.bb</em> track and sets additional display parameters in the URL:</p>
 <pre><code>http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigMaf%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigMaf.bb%20visibility=pack</code></pre>
 <p>
 After this example bigMaf is loaded in the Genome Browser, click into an alignment on the browser's track display.
 Note that the details page displays information about the individual alignments, similar to that which is available for a standard MAF track.</p>
-<h3 id="example2">Example #2</h2>
+<h3 id="example2">Example #2</h3>
 <p>
 In this example, you will create a bigMaf file from an existing bigMaf input file, <em>bigMaf.txt</em>, located on the UCSC Genome Browser http server.</p>
 <ol>
 <li>
 Save the bed3+1 example file, <a href="examples/bigMaf.txt"><em>bigMaf.txt</em></a>, to your computer (<em>Step 6</em>, above).</li>
 <li>
 Save the autoSql file <a href="examples/bigMaf.as"><em>bigMaf.as</em></a> to your computer (<em>Step 3</em>, above).</li>
 <li>
 Download the <a href="http://hgdownload.soe.ucsc.edu/admin/exe/"><code>bedToBigBed</code> utility</a> (<em>Step 4</em>, above).</li>
 <li>