8c520f540bf2410ce30bbf71f16f292b90144a5a
Merge parents cd059ca 11bfdf3
dschmelt
Wed Sep 7 14:06:50 2022 -0700

Fixing merge, going with markD's changes refs #29356

diff --cc src/hg/htdocs/goldenPath/help/bigRmsk.html
index ab2bf64,72eac77..839b9ea
--- src/hg/htdocs/goldenPath/help/bigRmsk.html
+++ src/hg/htdocs/goldenPath/help/bigRmsk.html
@@@ -1,201 -1,192 +1,195 @@@
The bigRmsk format allows for the display of annotations of a genome generated by the RepeatMasker program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.
+ It is the recommended method of adding RepeatMasker tracks to assembly hubs.
+ For a description of the features of this track type, with examples, see the
+ standard bigRmsk track description.
+
The bigRmsk format enables taking the annotation output of RepeatMasker and converting it into a compressed and indexed bigBed file. Please see this page for details of the bigBed format, its use, and associated tools.
The bigRmsk tracks consist of two bigBed files defined by autoSql schemas:
- The input files for the bigRmsk files are create from the RepeatMasker *.out and *.align files
- using the rmToTrackHub.pl program that is include with RepeatMasker. The bigRmsk
+ The input files for the bigRmsk files are created from the RepeatMasker *.out and
+ *.align files
+ using the rmToTrackHub.pl program that is included with RepeatMasker. The bigRmsk
format is not designed to work with any other type of data.
- To create a bigRmsk track, and its supporting files, follow the below steps.
+ To create a bigRmsk track and its supporting files, follow the below steps.
This assumes that you have already run RepeatMasker and have a *.out file and, optionally, a *.align file.
- RepeatMasker output files are converted to the bigRmsk textual form using the
- RepeatMasker/util/rmToTrackHub.pl program that is part of the RepeatMasker distribution.
-
-- NOTE: The April 2021 version of RepeatMasker (4.1.2-p1) does not contain the
- rmToTrackHub.pl program. Until it is included, obtain a copy
- from the RepeatMasker GitHub development branch:
+ RepeatMasker output files are converted to the bigRmsk textual form using the
+ RepeatMasker/util/rmToTrackHub.pl program that is part of the
+ RepeatMasker 4.1.3 or newer distribution.
-
-
- git clone -b development git@github.com:rmhubley/RepeatMasker.git
-
-
-
Step 1. If you wish to experiment with quickly building an example track, download the example RepeatMasker output files for the human GRCh38 (hg38) assembly bigRmskExample.out and bigRmskExample.align used in this tutorial:
wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.out
wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.align
Otherwise, substitute your own *.out and *.align files in these instructions. Generating the alignment bigRmsk file is optional: if you don't have the *.align files from RepeatMasker, the track will still work, with reduced functionality. Just skip the steps involved in building the alignment files.
Step 2. Download the autoSql schemas bigRmskBed.as and bigRmskAlignBed.as:
wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskBed.as
wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlignBed.as
You will also need a file of chromosome sizes for your genome. For the example, download the hg38 file:
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
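The chromosome sizes file is plain text with one line per sequence: the sequence name, a tab, and the length in bases. As a minimal sketch, the shell function below flags lines that do not fit that layout before the file is passed to bedToBigBed. The name check_chrom_sizes is a hypothetical helper, not a UCSC tool.

```shell
# Hypothetical helper (not part of the UCSC tools): check that every line of
# a chrom.sizes file is "<name><TAB><positive integer length>".
check_chrom_sizes() {
    awk -F '\t' '
        BEGIN { bad = 0 }
        NF != 2 || $2 !~ /^[0-9]+$/ || $2 == 0 {
            printf "line %d is malformed: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage:
#   check_chrom_sizes hg38.chrom.sizes && echo "chrom sizes look OK"
```

The function returns a non-zero status when any line is malformed, so it can guard the later build steps in a script.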
Step 3. Use rmToTrackHub.pl to convert the RepeatMasker files to the text-format bigRmsk files. The program sorts its output so that it can be passed directly to bedToBigBed:
RepeatMasker/util/rmToTrackHub.pl -out bigRmskExample.out -align bigRmskExample.align
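The -type=bed9+5 and -type=bed3+14 arguments used in Step 4 imply 14 and 17 tab-separated columns, respectively. A rough pre-check along these lines can catch a truncated or mis-delimited file early. This is only a sketch: check_columns is a hypothetical helper name, and bedToBigBed performs its own validation against the autoSql schema.

```shell
# Hypothetical helper: succeed only if every line of a tab-separated file
# has the expected number of columns.
#   $1 = file to check, $2 = expected column count
check_columns() {
    awk -F '\t' -v want="$2" '
        BEGIN { bad = 0 }
        NF != want {
            printf "line %d has %d columns, expected %d\n", NR, NF, want
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Usage with the file names from this tutorial:
#   check_columns bigRmskExample.join.tsv 14     # bed9+5
#   check_columns bigRmskExample.align.tsv 17    # bed3+14
```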
Step 4. Build the bigRmsk and optional bigRmskAlign files:
bedToBigBed -tab -type=bed9+5 -as=bigRmskBed.as bigRmskExample.join.tsv hg38.chrom.sizes bigRmskExample.bb
bedToBigBed -tab -type=bed3+14 -as=bigRmskAlignBed.as bigRmskExample.align.tsv hg38.chrom.sizes bigRmskExampleAlign.bb
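Because the alignment file is optional, the two commands above can be wrapped so that the second runs only when an alignment TSV is present. The following is a sketch under that assumption. The function name build_bigrmsk, the prefix argument, and the DRY_RUN variable are illustrative and not part of the UCSC tools. The bedToBigBed arguments are copied from the commands above.

```shell
# Sketch: build <prefix>.bb and, only if <prefix>.align.tsv exists and is
# non-empty, <prefix>Align.bb.
#   $1 = file prefix (e.g. bigRmskExample), $2 = chrom.sizes file
# Set DRY_RUN to any non-empty value to print the commands instead of
# running them.
build_bigrmsk() {
    prefix=$1
    sizes=$2
    run=${DRY_RUN:+echo}

    $run bedToBigBed -tab -type=bed9+5 -as=bigRmskBed.as \
        "$prefix.join.tsv" "$sizes" "$prefix.bb" || return 1

    if [ -s "$prefix.align.tsv" ]; then
        $run bedToBigBed -tab -type=bed3+14 -as=bigRmskAlignBed.as \
            "$prefix.align.tsv" "$sizes" "${prefix}Align.bb" || return 1
    else
        echo "no $prefix.align.tsv found; skipping the alignment file" >&2
    fi
}

# Usage:
#   build_bigrmsk bigRmskExample hg38.chrom.sizes
```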
Step 5. Place the newly created bigRmsk file (bigRmskExample.bb) and the optional bigRmskAlign file (bigRmskExampleAlign.bb) in a web-accessible http, https, or ftp location.
Step 6. As with other bigBed-based tracks, bigRmsk tracks can be displayed as custom tracks or included in track hubs or assembly hubs.
The following options are used for bigRmsk custom tracks or trackDb entries:
type bigRmsk
bigDataUrl <url> - URL or relative path of bigRmsk file
xrefDataUrl <url> - URL or relative path of optional bigRmskAlign file
See the Examples section below for detailed examples of bigRmsk custom tracks and track hub definitions.
Construct a custom track using a single track line. Note that any of the track attributes listed here are applicable to tracks of type bigBed.
To create a custom track using the example bigRmsk file:
track type=bigRmsk name="bigRmsk Example" description="RepeatMasker example" visibility=full bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb xrefDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb
After loading this track line, navigate to chr1:8,890-35,190 to see the track.
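For your own data, a track line of the same shape can be assembled from the two URLs. The helper below is a sketch with a hypothetical name (bigrmsk_track_line). It leaves out xrefDataUrl when no bigRmskAlign URL is given, mirroring the optional alignment file.

```shell
# Hypothetical helper: print a bigRmsk custom track line.
#   $1 = track name, $2 = description, $3 = bigRmsk URL,
#   $4 = bigRmskAlign URL (optional)
bigrmsk_track_line() {
    line="track type=bigRmsk name=\"$1\" description=\"$2\" visibility=full bigDataUrl=$3"
    if [ -n "${4:-}" ]; then
        line="$line xrefDataUrl=$4"
    fi
    printf '%s\n' "$line"
}

# Usage:
#   bigrmsk_track_line "bigRmsk Example" "RepeatMasker example" \
#       http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb \
#       http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb
```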
This example can also be loaded in a Track or Assembly Hub trackDb.txt with a stanza such as the following:
track bigRmskExample
shortLabel Example bigRmsk
longLabel This is an example bigRmsk Track Hub Stanza
type bigRmsk
visibility full
- html /bigRmskTrackDesc.html
+ html http://genome.ucsc.edu/goldenPath/trackDescriptions/bigRmskTrackDesc.html
bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb
xrefDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb
- html /bigRmskTrackDesc.html
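When several bigRmsk tracks go into one hub, the stanza can be generated rather than typed by hand. The function below is a minimal sketch: bigrmsk_stanza is a hypothetical name, the html setting is left out for brevity, and xrefDataUrl is written only when a bigRmskAlign URL is supplied.

```shell
# Hypothetical helper: print a trackDb stanza for a bigRmsk track.
#   $1 = track name, $2 = shortLabel, $3 = longLabel,
#   $4 = bigRmsk URL, $5 = bigRmskAlign URL (optional)
bigrmsk_stanza() {
    printf 'track %s\n' "$1"
    printf 'shortLabel %s\n' "$2"
    printf 'longLabel %s\n' "$3"
    printf 'type bigRmsk\n'
    printf 'visibility full\n'
    printf 'bigDataUrl %s\n' "$4"
    if [ -n "${5:-}" ]; then
        printf 'xrefDataUrl %s\n' "$5"
    fi
}

# Usage:
#   bigrmsk_stanza bigRmskExample "Example bigRmsk" \
#       "This is an example bigRmsk Track Hub Stanza" \
#       http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb \
#       http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb >> trackDb.txt
```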
See the bigBed documentation for guidance on sharing, troubleshooting, and extracting data from bigRmsk files.