8eadcb210a91b4c9aefdd603cd91dfea41340f47 markd Wed Sep 7 12:08:48 2022 -0700
update bigRmsk doc to point to the new RepeatMasker release
diff --git src/hg/htdocs/goldenPath/help/bigRmsk.html src/hg/htdocs/goldenPath/help/bigRmsk.html
index e89362c..72eac77 100755
--- src/hg/htdocs/goldenPath/help/bigRmsk.html
+++ src/hg/htdocs/goldenPath/help/bigRmsk.html
@@ -1,202 +1,192 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser bigRmsk RepeatMasker Format" -->
 <!--#set var="ROOT" value="../.." -->
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 <h1>bigRmsk Track Format</h1>
 <p>
 The bigRmsk format displays genome annotations generated by the
 <a href="http://www.repeatmasker.org/" target="_blank">RepeatMasker</a>
 program, which screens DNA sequences for interspersed repeats and low-complexity DNA sequences.
 It is the recommended method of adding RepeatMasker tracks to assembly hubs.
 For a description of the features of this track type, with examples, see the
 <a href="bigRmskTrackDescExample.html">standard bigRmsk track description</a>.
 <p>
 The bigRmsk format converts the annotation output of RepeatMasker into a compressed and indexed
 <a href="/goldenPath/help/bigBed.html">bigBed</a> file.
 Please see the bigBed page for details of the bigBed format, its use, and associated tools.
 </p>
 <h2 id="bigRmsk">bigRmsk track definitions</h2>
 <p>
 A bigRmsk track consists of two bigBed files defined by
 <a href="http://www.linuxjournal.com/article/5949" target="_blank">autoSql</a> schemas:
 </p>
 <ul>
 <li>The primary bigRmsk file, defined by
 <a href="examples/bigRmskBed.as"><em>bigRmskBed.as</em></a>,
 which contains the repeat annotations.
 <li>The secondary bigRmskAlign file, defined by
 <a href="examples/bigRmskAlignBed.as"><em>bigRmskAlignBed.as</em></a>,
 which contains the alignments of the consensus repeats to the genome.
 This file is optional; if omitted, the bigRmsk track will still function, but without the ability to view the alignments.
 </ul>
 <p>
 The input files for the bigRmsk files are created from the RepeatMasker <em>*.out</em> and
 <em>*.align</em> files using the <em>rmToTrackHub.pl</em> program that is included with RepeatMasker.
 The bigRmsk format is not designed to work with any other type of data.
 </p>
 <h2 id="steps">Creating a bigRmsk track</h2>
 <p>
 To create a bigRmsk track and its supporting files, follow the steps below.
 These steps assume that you have already run RepeatMasker and have a <em>*.out</em> file and,
 optionally, a <em>*.align</em> file.
 </p>
 <p>
 RepeatMasker output files are converted to the bigRmsk textual form using the
- <em>RepeatMasker/util/rmToTrackHub.pl</em> program that is part of the RepeatMasker distribution.
+ <em>RepeatMasker/util/rmToTrackHub.pl</em> program that is part of the
+ <a href="http://www.repeatmasker.org/RepeatMasker/">RepeatMasker 4.1.3 or newer distribution</a>.
 </p>
 <p>
- NOTE: The current version of RepeatMasker (4.1.2-p1) does not contain the
- <em>rmToTrackHub.pl</em> program. Until it is available in, obtain a copy
- from the RepeatMasker GitHub development branch:
-</p>
- <pre>
- <code>
- git clone -b development git@github.com:rmhubley/RepeatMasker.git
- </code>
- </pre>
-
-<p>
 <strong>Step 1.</strong>
 To quickly build an example track, download the example RepeatMasker output files
 for the human GRCh38 (hg38) assembly,
 <a href="examples/bigRmskExample.out">bigRmskExample.out</a> and
 <a href="examples/bigRmskExample.align">bigRmskExample.align</a>, used in this tutorial:
 <pre>
 <code>
 wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.out
 wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.align
 </code>
 </pre>
 <p>
 Otherwise, substitute your own <em>*.out</em> and <em>*.align</em> files in these instructions.
 Generating the bigRmskAlign file is optional. If you don't have the <em>*.align</em> files
 from RepeatMasker, the track will work with reduced functionality;
 just skip the steps that build the alignment files.
 <p>
 <strong>Step 2.</strong>
 Download the autoSql schemas
 <a href="examples/bigRmskBed.as">bigRmskBed.as</a> and
 <a href="examples/bigRmskAlignBed.as">bigRmskAlignBed.as</a>:
 <pre>
 <code>
 wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskBed.as
 wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlignBed.as
 </code>
 </pre>
 <p>
 You will also need a file of chromosome sizes for your genome. For the example, download the hg38 file:
 <pre>
 <code>
 wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
 </code>
 </pre>
 <p>
 <strong>Step 3.</strong>
 Convert the RepeatMasker files to text-format bigRmsk files with <em>rmToTrackHub.pl</em>,
 which sorts its output for direct input to <em>bedToBigBed</em>:
 <pre>
 <code>
 RepeatMasker/util/rmToTrackHub.pl -out bigRmskExample.out -align bigRmskExample.align
 </code>
 </pre>
 <p>
 <strong>Step 4.</strong>
 Build the bigRmsk and optional bigRmskAlign files:
 <pre>
 <code>
 bedToBigBed -tab -type=bed9+5 -as=bigRmskBed.as bigRmskExample.join.tsv hg38.chrom.sizes bigRmskExample.bb
 bedToBigBed -tab -type=bed3+14 -as=bigRmskAlignBed.as bigRmskExample.align.tsv hg38.chrom.sizes bigRmskExampleAlign.bb
 </code>
 </pre>
 <p>
 <strong>Step 5.</strong>
 Place the newly created bigRmsk file (<em>bigRmskExample.bb</em>) and the optional
 bigRmskAlign file (<em>bigRmskExampleAlign.bb</em>) in a web-accessible http, https, or ftp location.
 </p>
 <p>
 <strong>Step 6.</strong>
 As with other bigBed-based tracks, bigRmsk tracks can be displayed as
 <a href="hgTracksHelp.html#CustomTracks">custom tracks</a>, included in
 <a href="hubQuickStart.html">track hubs</a>, or
 <a href="hubQuickStartAssembly.html">assembly hubs</a>.
 </p>
 <p>
 The following options are used for bigRmsk custom tracks or trackDb entries:
 <ul>
 <li> <code>type bigRmsk</code>
 <li> <code>bigDataUrl <em>&lt;url&gt;</em></code> - URL or relative path of the bigRmsk file
 <li> <code>xrefDataUrl <em>&lt;url&gt;</em></code> - URL or relative path of the optional bigRmskAlign file
 </ul>
 A standard bigRmsk track description is available at
 <a href="../trackDescriptions/bigRmskTrackDesc.html">bigRmskTrackDesc.html</a>,
 which can be linked to directly using the URL:<br>
 <em>http://genome.ucsc.edu/goldenPath/trackDescriptions/bigRmskTrackDesc.html</em>.
 <p>
 See the <a href="#examples">Examples</a> section below for detailed examples of
 bigRmsk custom tracks and track hub definitions.
 </p>
 <h2 id="examples">Examples</h2>
 <h3 id="example1">Example of a bigRmsk custom track</h3>
 <p>
 Construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single
 <a href="hgTracksHelp.html#TRACK">track line</a>.
 Note that any of the track attributes listed
 <a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed.
 <p>
 To create a custom track using the example bigRmsk file:
 <ol>
 <li>
 Construct a track line that references the file:<br>
 <pre><code>track type=bigRmsk name="bigRmsk Example" description="RepeatMasker example" visibility=full bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb xrefDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb</code></pre>
 </li>
 <li>
 Paste the track line into the
 <a href="../../cgi-bin/hgCustom?db=hg38">custom track management page</a>
 for the human assembly hg38 (Dec. 2013).
 </li>
 <li>
 Click the "submit" button.
 </li>
 <li>
 Navigate to <code>chr1:8,890-35,190</code> to see the track.
 </li>
 </ol>
 <h3 id="example2">Example of a bigRmsk track hub </h3>
 <p>
 This example can also be loaded in a Track or Assembly Hub <em>trackDb.txt</em>
 with a stanza such as the following:</p>
 <pre>
 track bigRmskExample
 shortLabel Example bigRmsk
 longLabel This is an example bigRmsk Track Hub Stanza
 type bigRmsk
 visibility full
 html http://genome.ucsc.edu/goldenPath/trackDescriptions/bigRmskTrackDesc.html
 bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb
 xrefDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb
 </pre>
 <h2 id="share">Additional information</h2>
 <p>
 See the <a href="bigBed.html">bigBed documentation</a> for guidance on sharing,
 troubleshooting, and extracting data from bigRmsk files.
 </p>
 <h2 id="credits">Credits</h2>
 The bigRmsk system was developed by Robert Hubley of the Institute for Systems Biology.
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->