e417eb7ff33edef9c0b33bb67360a9fc39fc9105 markd Sun Aug 21 08:49:17 2022 -0700 update bigRmsk track build documentation and provide a standard track description page diff --git src/hg/htdocs/goldenPath/help/bigRmsk.html src/hg/htdocs/goldenPath/help/bigRmsk.html index c644765..ffe9ef3 100755 --- src/hg/htdocs/goldenPath/help/bigRmsk.html +++ src/hg/htdocs/goldenPath/help/bigRmsk.html @@ -1,306 +1,200 @@ <!DOCTYPE html> <!--#set var="TITLE" value="Genome Browser bigRmsk RepeatMasker Format" --> <!--#set var="ROOT" value="../.." --> <!-- Relative paths to support mirror sites with non-standard GB docs install --> <!--#include virtual="$ROOT/inc/gbPageStart.html" --> <h1>bigRmsk Track Format</h1> <p> The bigRmsk format allows for the display of annotations of a genome generated by the -<a href="http://www.repeatmasker.org" "target=_blank">RepeatMasker</a> +<a href="http://www.repeatmasker.org/" target="_blank">RepeatMasker</a> program that screens DNA sequences for interspersed repeats and low complexity DNA sequences. -The output of RepeatMasker is a detailed annotation of the repeats that are present in -the "query" sequence as well as a modified version of this query sequence -in which all the annotated repeats have been masked, where the default replaces -the discovered repeats by Ns. The bigRmsk format enables taking the annotation output -of RepeatMasker and converting it into a compressed and indexed version of a -<a href="/goldenPath/help/bigBed.html">bigBed</a> file, where the results can be -identified as <code>type bigRmsk</code> in a Track Hub and can be visualized as described -below.</p> -<p> -The bigRmsk files are created using the program <code>bedToBigBed</code>. It must be run with the -<code>-as</code> option to pull in a special <a href="http://www.linuxjournal.com/article/5949" -target="_blank">autoSql</a> (<em>.as</em>) file, <code>bigRmskBed.as</code> that defines the fields -of bigRmsk. 
Along with the bigRmsk file, an auxiliary data bigBed can be made, with its own .as -definitions file (<code>bigRmskAlignBed.as</code>) and referenced with a special <code>xrefDataUrl</code> -setting, whereas the bigRmsk file location is named with the standard <code>bigDataUrl</code> setting.</p> +</p> <p> -The bigRmsk files are in an indexed binary format. The main advantage of this format is that only -those portions of the file needed to display a particular region are transferred to the Genome -Browser server. Because of this, bigRmsk files have considerably faster display performance than -if they were stored in a text-based format. The bigRmsk file remains on your local -web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for -the currently displayed chromosomal position is locally cached as a "sparse file". If you -do not have access to a web-accessible server and need hosting space for your bigRmsk files, please -see the <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the Track Hub Help -documentation.</p> +The bigRmsk format stores the annotation output of RepeatMasker as a +compressed and indexed +<a href="/goldenPath/help/bigBed.html">bigBed</a> file. See that page +for details of the bigBed format, its use, and associated tools. +</p> -<h2 id="bigRmsk">bigRmsk file definitions</h2> +<h2 id="bigRmsk">bigRmsk track definitions</h2> <p> -The following autoSql definition is used to specify the main bigRmsk files. This -definition, contained in the file <a href="examples/bigRmsk.as"><em>bigRmsk.as</em></a>, is -pulled in when the <code>bedToBigBed</code> utility is run with the <code>-as=bigRmsk.as</code> -option.
</p> -<h6>bigRmsk.as</h6> -<pre><code>table bigRmskBed -"Repetitive Element Annotation" - ( - string chrom; "Reference sequence chromosome or scaffold" - uint chromStart; "Start position of visualization on chromosome" - uint chromEnd; "End position of visualization on chromosome" - string name; "Name repeat, including the type/subtype suffix" - uint score; "Divergence score" - char[1] strand; "+ or - for strand" - uint thickStart; "Start position of aligned sequence on chromosome" - uint thickEnd; "End position of aligned sequence on chromosome" - uint reserved; "Reserved" - uint blockCount; "Count of sequence blocks" - lstring blockSizes; "A comma-separated list of the block sizes(+/-)" - lstring blockStarts; "A comma-separated list of the block starts(+/-)" - uint id; "A unique identifier for the joined annotations in this record" - lstring description; "A comma separated list of technical annotation descriptions" - )</code></pre> -<p>An example: <code>bedToBigBed -tab -as=bigRmsk.as -type=bed9+5 bigRmsk.txt -hg38.chrom.sizes bigRmsk.bb</code>.</p> + A bigRmsk track consists of two bigBed files defined by + <a href="http://www.linuxjournal.com/article/5949" target="_blank">autoSql</a> schemas: +</p> +<ul> + <li>The primary bigRmsk file, defined by <a href="examples/bigRmskBed.as"><em>bigRmskBed.as</em></a>, + which holds the repeat annotations. + <li>The secondary bigRmskAlign file, defined by <a href="examples/bigRmskAlignBed.as"><em>bigRmskAlignBed.as</em></a>, + which contains the alignments of the consensus repeats to the genome. This file is optional; if it is omitted, + the bigRmsk track still works, but the alignments cannot be viewed. +</ul> -<h3 id="supporting">Supporting bigRmskAlign.bb auxiliary data</h3> <p> -Alongside the bigRmsk file, a supporting bigBed can provide alignment data.
The following autoSql -definition is used to create this supporting file, pointed to online with <code>xrefDataUrl</code>, -rather than the standard <code>bigDataUrl</code> used with bigRmsk. The file -<a href="examples/bigRmskAlignBed.as"><em>bigRmskAlignBed.as</em></a>, is pulled in when -the <code>bedToBigBed</code> utility is run with the <code>-as=bigRmskAlignBed.as</code> -option.</p> -<h6>bigRmskAlignedBed.as</h6> -<pre><code>table bigRmskAlignBed -"Repetitive Element Alignment Auxiliary Data" - ( - string chrom; "Reference sequence chromosome or scaffold" - uint chromStart; "Start position of alignment on chromosome" - uint chromEnd; "End position of alignment on chromosome" - uint chromRemain; "Remaining bp in the chromosome or scaffold" - float score; "alignment score (sw, bits or evalue)" - float percSubst; "Base substitution percentage" - float percDel; "Base deletion percentage" - float percIns; "Bases insertion percentage" - char[1] strand; "Strand - either + or -" - string repName; "Name of repeat" - string repType; "Type of repeat" - string repSubtype; "Subtype of repeat" - uint repStart; "Start in repeat sequence" - uint repEnd; "End in repeat sequence" - uint repRemain; "Remaining unaligned bp in the repeat sequence" - uint id; "The ID of the hit. Used to link related fragments" - lstring calignData; "The alignment data stored as a single string" - )</code></pre> -<p>An example: <code>bedToBigBed -tab -as=bigRmskAlignBed.as -type=bed3+14 -bigRmskAlign.tsv.sorted.txt hg38.chrom.sizes bigRmskAlign.bb -</code>.CHECK - ISSUE IS xrefDataUrl doesn't work on this data yet.</p> + The input files for the bigRmsk files are created from the RepeatMasker <em>*.out</em> and <em>*.align</em> files + using the <em>rmToTrackHub.pl</em> program that is included with RepeatMasker. The bigRmsk + format is not designed to work with any other type of data.
</p> -<p> -Note that the <code>bedToBigBed</code> utility uses a substantial amount of memory: approximately -25% more RAM than the uncompressed BED input file.</p> + <h2 id="steps">Creating a bigRmsk track</h2> <p> -To create a bigRmsk track, and its supporting file, follow the below steps. All input -files into <code>bedToBigBed</code> must be sorted on the coordinates of the first two columns, -<code>sort -k1,1 -k2,2n input.tsv.txt > input.tsv.sorted.txt</code>. To learn about a perl -program that can build the tab-separated values (tsv) input bedToBigBed text files from the -RepeatMasker output files, contact Robert Hubley: <a href="https://github.com/rmhubley" -target="_blank">https://github.com/rmhubley</a>.</p> + To create a bigRmsk track and its supporting files, follow the steps below. + These steps assume that you have already run RepeatMasker and have a <em>*.out</em> file and, + optionally, a <em>*.align</em> file. +</p> + +<p> + RepeatMasker output files are converted to the bigRmsk textual form using the + <em>RepeatMasker/util/rmToTrackHub.pl</em> program that is part of the RepeatMasker distribution. +</p> +<p> + NOTE: The current version of RepeatMasker (4.1.2-p1) does not contain the + <em>rmToTrackHub.pl</em> program. Until it is available in a release, obtain a copy + from the RepeatMasker GitHub development branch: +</p> + <pre> + <code> + git clone -b development git@github.com:rmhubley/RepeatMasker.git + </code> + </pre> + <p> <strong>Step 1.</strong> -If you already have an input file you would like to convert to a bigRmsk, skip to <em>Step 3</em>.
-Otherwise, download <a href="examples/bigRmsk.txt">this example bigRmsk.txt -file</a> for the human GRCh38 (hg38) assembly.</p> + If you wish to experiment with quickly building an example track, download the + example RepeatMasker output files for the human GRCh38 (hg38) assembly + <a href="examples/bigRmskExample.out">bigRmskExample.out</a> + and <a href="examples/bigRmskExample.align">bigRmskExample.align</a> + used in this tutorial: + <pre> + <code> + wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.out + wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.align + </code> + </pre> <p> -<strong>Step 2.</strong> -If you would like to include the optional auxiliary alignment data <code>bigRmskAlign.bb</code> file, -download the bigRmskAlign.txt file.</p> + Otherwise, substitute your own <em>*.out</em> and <em>*.align</em> files in these instructions. + Generating the bigRmskAlign file is optional: if you don't have the <em>*.align</em> + files from RepeatMasker, the track still works, with reduced functionality. Just skip the + steps that build the alignment files. + <p> -<strong>Step 3.</strong> -Download the autoSql file <em><a href="examples/bigRmsk.as">bigRmsk.as</a></em> needed by -<code>bedToBigBed</code>.
If you have opted to include the optional auxiliary alignment data file, -bigRmskAlign.bb, with your bigRmsk file, you must also download the autoSql file -<a href="examples/bigRmskAlignBed.as">bigRmskAlignBed.as</a>.</p> + <strong>Step 2.</strong> + Download the autoSql schemas <a href="examples/bigRmskBed.as">bigRmskBed.as</a> and + <a href="examples/bigRmskAlignBed.as">bigRmskAlignBed.as</a>: + <pre> + <code> + wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskBed.as + wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlignBed.as + </code> + </pre> <p> -Here are wget commands to obtain the above files and the hg38.chrom.sizes file mentioned below: -<pre><code>wget https://genome.ucsc.edu/goldenPath/help/examples/ -wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.txt -wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlign.txt -wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.as -wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlign.as + You will also need a file of chromosome sizes for your genome. For the example, download the hg38 + file: + <pre> + <code> wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes -</code></pre> + </code> + </pre> <p> -<strong>Step 4.</strong> -Download the <code>bedToBigBed</code> program from the UCSC -<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p> + <strong>Step 3.</strong> + Convert the RepeatMasker files to the tab-separated text input for <em>bedToBigBed</em> using + <em>rmToTrackHub.pl</em>, which sorts its output so that it can be passed directly to <em>bedToBigBed</em>: + <pre> + <code> + RepeatMasker/util/rmToTrackHub.pl -out bigRmskExample.out -align bigRmskExample.align + </code> + </pre> <p> -<strong>Step 5.</strong> -Download the <em>chrom.sizes</em> file for any assembly hosted at UCSC from our -<a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads</a> page (click on "Full
-data set" for any assembly). For example, the <em>hg38.chrom.sizes</em> file for the hg38 -database is located at -<a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes" -target="_blank">http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes</a>.</p> + <strong>Step 4.</strong> + Build the bigRmsk and optional bigRmskAlign files: <pre> -<code>bedToBigBed -tab -as=bigRmsk.as -type=bed9+5 bigRmsk.txt hg38.chrom.sizes bigRmsk.bb</code></pre> + <code> + bedToBigBed -tab -type=bed9+5 -as=bigRmskBed.as bigRmskExample.join.tsv hg38.chrom.sizes bigRmskExample.bb + bedToBigBed -tab -type=bed3+14 -as=bigRmskAlignBed.as bigRmskExample.align.tsv hg38.chrom.sizes bigRmskExampleAlign.bb + </code> + </pre> + <p> -<strong>Step 6.</strong> +<strong>Step 5.</strong> -Move the newly created bigRmsk file (<em>bigRmsk.bb</em>) to a web-accessible http, https or ftp -location. If you generated the <em>bigRmskAlign.bb</em> files move those to a web accessible -location, likely same location as the <em>bigRmsk.bb</em> file.</p> -<p> +Place the newly created bigRmsk file (<em>bigRmskExample.bb</em>) and the optional +bigRmskAlign file (<em>bigRmskExampleAlign.bb</em>) in a web-accessible http, https +or ftp location. +</p> -<strong>Step 7.</strong> +<strong>Step 6.</strong> -Construct a <a href="hgTracksHelp.html#CustomTracks?db=hg38">custom track</a> using a single -<a href="hgTracksHelp.html#TRACK">track line</a>. Note that any of the track attributes listed -<a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed. The most basic -version of the track line will look something like this:</p> -<pre>track type=bigRmsk name="My bigRmsk" description="A RepeatMasker Track" bigDataUrl=http://myorg.edu/mylab/bigRmsk.bb</pre> <p> -<strong>Step 8.</strong> -Paste the custom track line into the text box on the <a href="../../cgi-bin/hgCustom?db=hg38">custom -track management page</a>.
Navigate to chr1:1-21,571 to see the example data for this track.</p> + As with other bigBed-based tracks, bigRmsk tracks can be displayed as + <a href="hgTracksHelp.html#CustomTracks">custom tracks</a>, + included in <a href="hubQuickStart.html">track hubs</a>, + or <a href="hubQuickStartAssembly.html">assembly hubs</a>. +</p> + +<p> + The following options are used for bigRmsk custom tracks or trackDb entries: + <ul> + <li> <code>type bigRmsk</code> + <li> <code>bigDataUrl <em>&lt;url&gt;</em></code> - URL or relative path of the bigRmsk file + <li> <code>xrefDataUrl <em>&lt;url&gt;</em></code> - URL or relative path of the optional bigRmskAlign file + </ul> + + A standard bigRmsk track description is available at <a href="/bigRmskTrackDesc.html">/bigRmskTrackDesc.html</a>; + a track can reference it directly by giving <em>/bigRmskTrackDesc.html</em> as the file URL. + <p> -The <code>bedToBigBed</code> program can be run with several additional options. For a full -list of the available options, type <code>bedToBigBed</code> (with no arguments) on the command line -to display the usage message. </p> + See the <a href="#examples">Examples</a> section below for detailed examples of bigRmsk custom tracks + and track hub definitions. +</p> <h2 id="examples">Examples</h2> -<h3 id="example1">Example #1</h3> +<h3 id="example1">Example of a bigRmsk custom track</h3> <p> -In this example, you will create a bigRmsk custom track using an existing bigRmsk file, -<em>bigRmsk.bb</em>, located on the UCSC Genome Browser http server. This file contains data for -the hg38 assembly.</p> +Construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single +<a href="hgTracksHelp.html#TRACK">track line</a>. Note that any of the track attributes listed +<a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed.
<p> -To create a custom track using this bigRmsk file: +To create a custom track using the example bigRmsk file: <ol> <li> - Construct a track line that references the file:</p> - <pre><code>track type=bigRmsk name="bigRmsk Example One" description="A bigRmsk file" visibility=full bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb</code></pre></li> + Construct a track line that references the file:<br> + <pre><code>track type=bigRmsk name="bigRmsk Example" description="RepeatMasker example" visibility=full bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb xrefDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb</code></pre> + </li> <li> - Paste the track line into the <a href="../../cgi-bin/hgCustom?db=hg38">custom track management - page</a> for the human assembly hg38 (Dec. 2013).</li> + Paste the track line into the <a href="../../cgi-bin/hgCustom?db=hg38">custom track management page</a> + for the human assembly hg38 (Dec. 2013). + </li> <li> - Click the "submit" button.</li> + Click the "submit" button. + </li> <li> - Navigate to <code>chr1:1-21,571</code> to see the track. + Navigate to <code>chr1:8,890-35,190</code> to see the track. + </li> </ol> +<h3 id="example2">Example of a bigRmsk track hub </h3> <p> -Custom tracks can also be loaded via one URL line. 
-<a href="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1:1-21,571&hgct_customText=track%20type=bigRmsk%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb%20visibility=full" -target="_blank">This link</a> loads the same <em>bigRmsk.bb</em> track and sets additional display -parameters in the URL:</p> -<pre><code>http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1:1-21,571&hgct_customText=track%20type=bigRmsk%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb%20visibility=full</code></pre> -<p> -After this example bigRmsk is loaded in the Genome Browser, click into an item on the browser's -track display. Note that the details page display lacks information about the individual alignments, -as this example does not include the optional supporting alignment file.</p> -<p> -This example can also be loaded in a Track Hub with a stanza such as the following:</p> + This example can also be loaded in a Track or Assembly Hub <em>trackDb.txt</em> + with a stanza such as the following:</p> <pre> -track ExBigRmsk + track bigRmskExample shortLabel Example bigRmsk -longLabel This is an example Track Hub Stanza + longLabel This is an example bigRmsk Track Hub Stanza type bigRmsk visibility full -bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb + html /bigRmskTrackDesc.html + bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb + xrefDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb </pre> -<!--- -NOTE: FOR WHEN REDOING PAGE, only Track Hubs now allow clicking into hgc. -NOTE: The below is innaccurate and just a holder for when <b>xrefDataUrl works</b> to give an example building it. - -Adding potential input file (this is from RobertH T2T hub), both the align.bb and bigRmsk.bb for a region are stashed below (not for hg38 though).
- -$ bigBedToBed -chrom=chr1 -start=4513 -end=7608 https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/bbi/GCA_009914755.4_T2T-CHM13v2.0.t2tRepeatMasker/chm13v2.0_rmsk.align.bb stdout -chr14082453324838279596227.3212.421.00-LTR60BLTRERV126476503TA/GTTACT/CGGGG/AAGG/TGCT/GGA/GT/AG/ATCC+T+CA/GGTTCTT+A+GTT/CTA/TACTTGGA/GAGAAAGAT/ATTT/CC/GA/GCCAAGAGG/ACAG/ATA/TC/TAA/GA/CG/CATG/AG/AC/AAGAT/GAAC/TTT-C-ATTGAAA/GA/GG/AAAAC/TAC/GAGT/AGT/CAA/GAGAGC/TTTATT+TAAAGAGACA+GTA+CACTCT+GAAAA/GATA/GG/AGGA/CG/AGAGT/CGGGCTG+CTGAAAG+AGC/AGTGC/AA/GT/CT/CAA/G+C+AA/GCAGCCT/C+C+A/GAGAGTC/TCTGT/CT/GC/TA/TGGA/GA/GA/TTTTTATT/N+ATG+TG/CGGACTTC/TTTC+TTG+AC/AA/GTTCCT/CGCCTCTGTCTC/TAAG-T-CTCCA/GCCTG/TTTTTCTTTGTCTG/AG+T+TTTTC/TCT+TAA+GC/TT/CA/CCT/CGCCTT-AG-C/NTCCCCGA/CCT+AG+TG/TCCCC/GA/CCT/CT/CAGGCTTGTGGGACC+CT+T/CCCTC/TACTGTG/CG/AGTTGA/GG/TGT/CA/GCATGT/CG+CGGGCC+T/CGGTGA/TTC/GA/GATACGAATC/TCA/TC/AT/CCTG/AG/AC/TA/GC/GCA/NGC+GTTG+CTCC/ATTC/ACCGCCAT/CCCCAGGC/AAC/GGC/TT/CG+T+AC/TAGCGA/GTCAC/AG/ATT/CTGTACC/TTAC/TTGT/CGCCTGC+GTAT+CTCTTT/AT/GGAAT-G-TC/TCTT/CCTC/TTGCCCT -chr14533466024838266850520.477.870.00-LTR60BLTRERV11783144514AATCTGTACTTATG/TGG/CGCCA/TG+C+GTT/ATCTCTTAA/GGAATG/TTCC/TCT/CTTTG+CCCTCTT+G/TCCTT/CCTTAC/TCAA/GCATGTAGCTAGCA/TAT/CATTCTGACAT/GT/GTTT/AAT/CTGCAGAGG/TGAA/GT/CGATTG/A+CT+GGGCA/GTCTTC/AAGA/GGGA/CGTTC 
-chr146635139248382189130422.814.201.43+L1MC_orf2LINEL12804329225GGTGA/CG/TGGAAC-GATT-AAT/CTGGAA/CA+T+CCAT/CAA/TGA/CAAT/AG/AAT/AATGC/AATA/CTAGAT/CG/AA/CAA/GACT/CTTACAA/CCT/CC/TA/TCACAAC/AT/AAA/TTC/AACTCAAAAT+GGAT+CATC/A+GA+CT/CTAC/AAC/TT/GA/TAAAAT/CGCT/AAAACTATAC/AAAT/CTT/CCTAGAAGA+T+AACA-A-TAGA/GAGAAAAG/TCTAT/GG/ATGC/ACT/CTTGGGTTTGGT/CA/GATGAA/CTTTTA/TAC/GAA/TAT/CG/AAT/CACA/CAAAGGT/C+A+T/CGAT+CC+ATA/GC/AAC/AA/GAAAG/NAAA/TTGAC/TAT/AT/GG/CTGGT/AT/CTTC+A+TTAAT/AATTT/AAAAG/AT/CTTA/CTA/GCTCTG+CG+G/AAAGACAC-CT-TGTT/CAAGAGAAC/TA/GAAAAGACAAGCCACAT/GAT/CTGA/G+G+AGAAAATATTTGCAAAAT/GACAC/TATCTGAG/TAAAGA/GAT/CTT/GG/TTC/ATT/CCAAAATATAT/CAAAA/GAAA/CTA/CTTAAAACTA/CAACAATAAGT/A+AAA+T/CAAACAG/ACCCA/GAC/TT+N+AAAAATGC/GA/GCAC/AAC/A+G+AT/CCTGAACAGACACCTCACCAAAGAAGATC/ATACAGATGGCAAG/ATAAA/GCATAC/TA/GAAAAGATGCTCA/NACAT -chr14997526324838206588715.4612.782.74+L1MC3_3endLINEL129322395TTGTC/ATT/CCAAAATATAT/CAAAA/GAAA/CTA/CTTAAAACTA/CAACAATAAGT/A+AAA+T/CAAACAG/ACCCAAC/TTAAAAA+A+TGC/GA/GCAC/AAC/A+G+ATCTGAACAGACACCTCACCAAAGAAGATC/ATACAGATGGCAAG/ATAAA/GCATAC/TA/GAAAAGATGCTCAACAT+CATTTGTC+AC/TTAGA/GGAAC/TTG+CAAATT+AAAACC/AACAATGAGATAG/CCAC-AGCTGG-TC/AT/CAT/CAT/CCTC/ATTAGAAC/TT/GGCTAAAC/ATCC-CT-AAAAAA+C+TGACA+ATACC+AAT/NTGCTG+GCGAGGAT+GA/CGGAA/GA/CAACAA/GGAACTCTT/C+A+TTCATTGCC/TGGTGGA/GA -chr152745528248381800140314.681.970.78+MER34C_vLTRERV12633226AGA/NCCAA/GAATATGCCACCCCAAAATATA/GAT/CG/TGTAGGAA/GACCAGAATATGCCACCCCAAAATATGT/CCC/TCTTTGT/GCT/ATAAGA/GATTATTC/TC/TA/GAGCTGATTATTTTGAA/GAAAA/CTA/GA/CAT/GG/AC-TA-ACAA/GA/GG/AGAAGT/CTCTGAAAACAGAGTAGAAGTTACCCTTG/TTGTAAGGA/GAAATTTACATCTATAAAGGAAATCC/TCCATTTA/G+T+AAA/GGC/GTA/GC/TCT+CC+CTCTCTA/GC/TACCAA/GGAAGAGAAGGATA/GA+CT+CTAAATCACTAA/GAGAG/CTCTT 
-chr155285686248381642354424.566.810.65+L1MC3_3endLINEL181296815645TAATA/GGTGG-G-ATAT/CC/ATGACACA/TAC/TGCATTTA/GTCAAG/AAT/CA/CCAC/TAGAAT/CTTTAT/CG/AGC+A+CAAA-T-GG/AGTA/GAAT/CCA/TA/TAT/ATC/GTATT/GCAAATTA/TA/TAC/AAAAATT/CAC/TTC/TAGGAT/GGT+C+GGC/GGT/GATCCCAGGAC/TA/GGAATGCAT/GC/AA/NTGTGA+C+AAAAG/NAATT/CTA-T-G/ACTA/GC/TAA/T-A-T -chr156866131248381197244214.805.841.29+MSTA1LTRERVL-MaLR46507TA/GCTATGGTTTGGATGT-GGT-TTGTCCCCGCA/CAAAACTCATGTTGAAATTTGAC/TCCCCAATGTGGCAGTGTG/TGGG/A-C-GGTGGGGCCTAGTGGA/GT/AGGTGTTTGGGTCATGGGGA/GT/CGGATCCCTCATGAATAGATTAATGT/CCCTCC+CTCGNG+A/GTGGGG/NGTGAGTGAGTA/TCT+C+GCTCT+NN+CA/GT/CA/GGGAATGGATTAA/GTTCCT/CGCA/GG/AGAGT/CA/GGGTA/TA/GTTAAAAAGAGTCTGGC+GNC+TT/CCCTT/CG/CG/TCT+CTC+TCC/TCTT+GC+TTGCTTT/CCA/TCTT/CTT/CGCT/CATGTGATCTCTG-G-T/CG/ACACC/GCCT/C-T-GCTCCCCTTCC+NCTTC+GCTTTCCA/GCCATGAGG/TT/NGAAA/GA/CAGA/CCTGAA/GGCCC+T+CACCAGATGCAA/GCTGCCCA/GA/NT/ACT/CC/NG/TGA/CC/TA/TTTC+GNC+CAGCT/CACCAGT/AATT/CGTGAGCCAAATG/AAAT/CCTT/CTTTTA/CC/TTTATAAATTACCCAGCCTCAGGTATTCT/CGTTAC/TAGA/CAG/ACACAAG/AAT/CGGACTAAGACA 
-chr161317132248380196354424.566.810.65+L1MC3_3endLINEL196920414915CAAATGTAG/TGT/AAAA/CAAC+C+TCACTGAAGGT/GGG+TG+A/GGGGAAAAT/AGGTGT/CTGACCTAAGTC/AACTTTGA/GAAATGAA/GTA/GGAA/GTCTG+T+G/AAGG/ACTG/AAAGGCAC/AA+A+T/GGAACTA/GTACT/ATC/AAT/GA/CAT/CTGG/TAT/CTA/CC/TAT/GTTT/GATAAAGTTA/GTTTCCA/CACA/GGA/GA/GGC/TAA/CC/GT/GGTG/TAACAATTG/CTA/GAA/NACCA/GCA/TG/ATG/AT/CC/ATGTAT+A+CTGGAG/ATA/TA/GAACAATG/TAC/AT/GTAC/AATA/GA/GG/ATC/GGCA/GGATGGTGGGAA/GCCAGC/GTTTCTCACTGTTGA/GAGTGGGAGG/NTTACAA/GATT/AAGCAAGA/GC/GGAGA/GAGGCTAGAATGATT/CCC/ATGTGA/GTAG/ATA/GGATC/TAGAGG/TTGGAGACATCAA/GC/TG/ATA/GAACTT/CATGC/TTTAGT/CTTAATATAGATACAC/GAC/TA/GGTTC/AT/CAC/TATAGAAAA/TC/ATTTATAA/GT/ATAG/TGTGTG/ATG/ATAG/CG/AT/CA/GGGTTAG+T+AC/TACACACATATAC/TTTCCTA/TGCA/TT/CTGC/TT/CAA/GT/C+TGA+GAGGGA/CCA/TAGAT/AA/GCAAT/C+GACACCCCAGTAGCAACGA+GT/CG/ACAT/CT/CC/TAGCA/GG/CCCAC/GATG/CTA/TA/GGTTT+C+TC/AC/AC/TACCATTC+TCCAA+TG/AAAAGGAAT/CCA+G+GGCTCT/CTTGA/GAGAAATGT/GCTGATA/TCTAGA/GACTGGGA/GCAGT/GAAATAT+ACAAG+AG/TGAGCCA/TGGAT/GA/CATCTG/TGA/TAGTA/GT/CCAGAAAGT+AAGG+AAGTA/GCT+C+AAAAAAAT/CT/CA/CAA/CA+ATGATGGGGG+TATA/GTCAAAC/GA/GA/GAA/CAT/CAA/GA/GAGCCAAT/CA/TA/GAAAC/GAGCTA/CCCG/AATGGCCAAC/AA/GCA/TGGAAG/CG/AAA/TTTGT/AGCAACAT/AAAT+A+G/AC/ATA+AAG+TAGTG/ATC/TGA/GATA/TATAA+C+CT/CAAAGC/TT/ATAAAG/ATAAT/ATATCT/CAG/TGT/AGTCT/CG/ATAT/CTT/GG/ATATAC/AC/ATAG/AG/ATG+ATTGAATA+AATAAG/AC/TAAATGGA/GGT/GT/AGC/AAT/GAG+AC+AAATCTCCT/CT/GTGCAA/GAAGAATTCCAAATAAC/TTG/TATGTAGAC/TACTCA/CGCCA/CTCAAGA/GAGGTGGAGC+A+C/TAACTCCT/CCACTCCG/TTAAGTGTGGGCTC/GT/CGCATAGTGACTTG/CCTC/TCA/CAAAGA-ACAC-A/GTG/ACAGTATGGAC/AAA/GGGA/GGGAAAAA+AGAG+TAACTTC/TACAGTGGAGAAAT/CCTGACAAACAG/CTAG/CCTCT/AGCCAA/GA/GTGATCC/AAA/GGTG/CAACAC/TCAAA/CG/AC/GTGAC/TAG/AT/GTCAC/TC/GTTGAG/TAA/GC/TATG 
-chr171417533248379795121520.504.597.89+L1MC3_3endLINEL12150252935TGG/AGGGACATTCTACAAAAA/TT/ACCTGACCAA/GTC/ACTCCTCAG/AT/AG/ACTA/GTG/CAAGGTCATCAT/AG/AAG/A+C+AT/AGGAAAGC/TCTA/GAC/GAC/AACTGTCACAGCCAG/AGAA/GGAGCCTAT/AG+GAGACA+TGAT/CGT/ACTAC/AATGTC/AG/ATGC/TGGG/TATCCTGGATGGGATCCTGGG/AT/ACAGAG/AT/AAAGA/G+ACAT+TAG+GTAAA+AACTAAGGG/AAATCC/TA/GAATG/AAAA/GTATGA/GACTTTAGTTAATAAC/TAG/ATC/GTATCAG/ATATTGGTTCATTAAC/TTGTGG/ACAA-ATT-ATGTA-AGATATTAATAAG-CCAT-GTGAGACAC-ACTG/AATA/GG/TAAGATGTTAATAAG/TAGA/GGGAAACTA/GGGT+G+TG-C-GGC/GTAC/TATGGGAAA/CTCTCTG-CTTT-TT/AT/CTT/ATT/CTTG/CA/GCG/AATTTC/TTG/CTGTAAG/ATA/CA/TAAAAA/CA/TG/AA/TC/TG/CTAAAATAAAAC/A+G+TTTATTTA/TA/NAA - -That is matched with - -$ bigBedToBed -chrom=chr1 -start=4513 -end=7608 https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/bbi/GCA_009914755.4_T2T-CHM13v2.0.t2tRepeatMasker/chm13v2.0_rmsk.bb stdout -chr107536L1MC3#LINE/L1223+46637533094912,600,518,158,0,1001,108,392,3-1,4663,-1,5528,-1,6131,-1,7141,-151304 19.6 8.0 2.0 chr1 4664 5263 (248382065) + L1MC3 LINE/L1 4913 5546 (2239) 5 ,3544 24.6 6.8 0.7 chr1 5529 5686 (248381642) + L1MC3 LINE/L1 6065 6221 (1564) 5 ,3544 24.6 6.8 0.7 chr1 6132 7132 (248380196) + L1MC3 LINE/L1 6222 7294 (491) 5 ,1215 20.5 4.6 7.9 chr1 7142 7533 (248379795) + L1MC3 LINE/L1 7403 7782 (3) 5 -chr140824796LTR60B#LTR/ERV1273-40824533030,451,263-1,0,-1962 27.3 12.4 1.0 chr1 4083 4533 (248382795) C LTR60B LTR/ERV1 (0) 765 264 3 -chr140824837LTR60B#LTR/ERV1205-4533466003451,127,177-1,451,-505 20.5 7.9 0.0 chr1 4534 4660 (248382668) C LTR60B LTR/ERV1 (451) 314 178 4 -chr152675850MER34C_v#LTR/ERV1147+52745528036,254,322-1,7,-161403 14.7 2.0 0.8 chr1 5275 5528 (248381800) + MER34C_v LTR/ERV1 7 263 (322) 6 -chr156856131MSTA1#LTR/ERVL-MaLR148+56866131030,445,0-1,1,-12442 14.8 5.8 1.3 chr1 5687 6131 (248381197) + MSTA1 LTR/ERVL-MaLR 1 465 (0) 7 - -End of excerpt from 2 bigBed files in T2T that could be potential input in future examples (could be colors are wrong in this second file). 
- -<h3 id="example2">Example #2</h2> +<h2 id="share">Additional information</h2> <p> -In this example, you will create a bigRmsk file from an existing bigRmsk input file, -<em>bigRmsk.txt</em>, located on the UCSC Genome Browser http server.</p> -<ol> - <li> - Save the bed3+1 example file, <a href="examples/bigRmsk.txt"><em>bigRmsk.txt</em></a>, to your - computer (<em>Step 6</em>, above).</li> - <li> - Save the autoSql file <a href="examples/bigRmsk.as"><em>bigRmsk.as</em></a> to your computer - (<em>Step 3</em>, above).</li> - <li> - Download the - <a href="http://hgdownload.soe.ucsc.edu/admin/exe/"><code>bedToBigBed</code> utility</a> - (<em>Step 4</em>, above).</li> - <li> - Save the <a href="hg38.chrom.sizes"><em>hg38.chrom.sizes</em> text file</a> to your computer. - This file contains the chrom.sizes for the human (hg38) assembly (<em>Step 5</em>, above).</li> - <li> - Run the <code>bedToBigBed</code> utility to create a binary indexed MAF file (<em>Step 6</em>, - above): -<pre><code>bedToBigBed -type=bed3+1 -tab -as=bigRmsk.as bigRmsk.txt hg38.chrom.sizes bigRmsk.bb</code></pre></li> - <li> - Move the newly created bigRmsk file (<em>bigRmsk.bb</em>) to a web-accessible location (<em>Step - 7</em>, above).</li> - <li> - Construct a track line that points to the bigRmsk file (<em>Step 8</em>, above).</li> - <li> - Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser - (<em>step 9</em>, above).</li> -</ol> ---> -<h2 id="share">Sharing your data with others</h2> -<p> -If you would like to share your bigRmsk data track with a colleague, learn how to create a URL by -looking at Example 6 on <a href="customTrack.html#EXAMPLE6">this page</a>.</p> - -<h2 id="extract">Extracting data from the bigRmsk format</h2> -<p> -Because bigRmsk files are an extension of bigBed files, which are indexed binary files, it can -be difficult to extract data from them. 
UCSC has developed the following programs to assist -in working with bigBed formats, available from the -<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p> -<ul> - <li> - <code>bigBedToBed</code> &mdash; converts a bigBed file to ASCII BED format.</li> - <li> - <code>bigBedSummary</code> &mdash; extracts summary information from a bigBed file.</li> - <li> - <code>bigBedInfo</code> &mdash; prints out information about a bigBed file.</li> -</ul> -<p> -As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the -command line to view the usage statement.</p> + See the <a href="bigBed.html">bigBed documentation</a> for guidance on + sharing, troubleshooting, and extracting data from bigRmsk files. +</p> -<h2 id="trouble">Troubleshooting</h2> -<p> -If you encounter an error when you run the <code>bedToBigBed</code> program, check your input -file for data coordinates that extend past the the end of the chromosome. If these are present, run -the <code>bedClip</code> program -(<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">available here</a>) to remove the problematic -row(s) in your input file before running the <code>bedToBigBed</code> program.</p> +<h2 id="credits">Credits</h2> +The bigRmsk system was developed by Robert Hubley of the Institute for Systems Biology. <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->
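Note (not part of the commit): the trackDb options documented in this change (<code>type bigRmsk</code>, <code>bigDataUrl</code>, and the optional <code>xrefDataUrl</code>) could be assembled with a small shell helper. The sketch below is only an illustration under stated assumptions: the function name <code>bigrmsk_stanza</code> and the generated labels are hypothetical and are not provided by the UCSC tools or by RepeatMasker. It leaves out the <code>xrefDataUrl</code> line when no bigRmskAlign file is given, matching the optional status described above.

```shell
#!/bin/sh
# Hypothetical helper (an assumption for illustration, not a UCSC or
# RepeatMasker tool): print a trackDb stanza for a bigRmsk track.
# Usage: bigrmsk_stanza NAME BIG_DATA_URL [XREF_DATA_URL]
bigrmsk_stanza() {
    name=$1
    big_url=$2
    xref_url=${3:-}
    printf 'track %s\n' "$name"
    printf 'shortLabel %s\n' "$name"
    printf 'longLabel %s RepeatMasker annotations\n' "$name"
    printf 'type bigRmsk\n'
    printf 'visibility full\n'
    printf 'bigDataUrl %s\n' "$big_url"
    # xrefDataUrl is optional: emit it only when a bigRmskAlign URL is given.
    if [ -n "$xref_url" ]; then
        printf 'xrefDataUrl %s\n' "$xref_url"
    fi
}

# Example call using the example file URLs from the diff above.
bigrmsk_stanza bigRmskExample \
    http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExample.bb \
    http://genome.ucsc.edu/goldenPath/help/examples/bigRmskExampleAlign.bb
```

Redirecting the output to <em>trackDb.txt</em> would give a stanza like the one in the track hub example; the labels are placeholders to adjust by hand.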