909449391962ec9312052bf186158d9f5162fcc4
brianlee
Fri May 27 07:27:01 2022 -0700
Not for Code Review, see ticket #29356, check-in of temporary examples not fully working for bigRmsk -on T2T CHM13 and align.bb not working yet

diff --git src/hg/htdocs/goldenPath/help/bigRmsk.html src/hg/htdocs/goldenPath/help/bigRmsk.html
index 7e0361f..e33d738 100755
--- src/hg/htdocs/goldenPath/help/bigRmsk.html
+++ src/hg/htdocs/goldenPath/help/bigRmsk.html
@@ -1,282 +1,307 @@
<!DOCTYPE html>
<!--#set var="TITLE" value="Genome Browser bigRmsk RepeatMasker Format" -->
<!--#set var="ROOT" value="../.." -->

<!-- Relative paths to support mirror sites with non-standard GB docs install -->
<!--#include virtual="$ROOT/inc/gbPageStart.html" -->

<h1>bigRmsk Track Format</h1>
<h3>This page is under development and is not ready for public use.</h3>
<p> The bigRmsk format displays genome annotations generated by the <a href="http://www.repeatmasker.org" target="_blank">RepeatMasker</a> program, which screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of RepeatMasker is a detailed annotation of the repeats present in the "query" sequence, as well as a modified version of the query sequence in which all annotated repeats have been masked (by default, replaced with Ns). The bigRmsk format takes the annotation output of RepeatMasker and converts it into a compressed, indexed <a href="/goldenPath/help/bigBed.html">bigBed</a> file. When the file is identified as <code>type bigRmsk</code> in a Track Hub, the results can be visualized as described <a href="#linkToVisualizationSECTION_TOCOME">below</a>.</p>
<p> The bigRmsk files are created using the program <code>bedToBigBed</code>.
It must be run with the <code>-as</code> option to pull in a special <a href="http://www.linuxjournal.com/article/5949" target="_blank">autoSql</a> (<em>.as</em>) file, <code>bigRmsk.as</code>, that defines the fields of bigRmsk. Alongside the bigRmsk file, an auxiliary data bigBed can be made with its own .as definitions file (<code>bigRmskAlignBed.as</code>). The auxiliary file is referenced with a special <code>xrefDataUrl</code> setting, whereas the bigRmsk file location is named with the standard <code>bigDataUrl</code> setting.</p>
<p> The bigRmsk files are in an indexed binary format. The main advantage of this format is that only those portions of the file needed to display a particular region are transferred to the Genome Browser server. Because of this, bigRmsk files have considerably faster display performance than if they were stored in a text-based format. The bigRmsk file remains on your local web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for the currently displayed chromosomal position is locally cached as a "sparse file". If you do not have access to a web-accessible server and need hosting space for your bigRmsk files, please see the <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the Track Hub Help documentation.</p>

<h2 id="bigRmsk">bigRmsk file definitions</h2>
<p> The following autoSql definition is used to specify the main bigRmsk files. This definition, contained in the file <a href="examples/bigRmsk.as"><em>bigRmsk.as</em></a>, is pulled in when the <code>bedToBigBed</code> utility is run with the <code>-as=bigRmsk.as</code> option.
</p>
<h6>bigRmsk.as</h6>
<pre><code>table bigRmskBed
"Repetitive Element Annotation"
    (
    string  chrom;        "Reference sequence chromosome or scaffold"
    uint    chromStart;   "Start position of visualization on chromosome"
    uint    chromEnd;     "End position of visualization on chromosome"
    string  name;         "Name of repeat, including the type/subtype suffix"
    uint    score;        "Divergence score"
    char[1] strand;       "+ or - for strand"
    uint    thickStart;   "Start position of aligned sequence on chromosome"
    uint    thickEnd;     "End position of aligned sequence on chromosome"
    uint    reserved;     "Reserved"
    uint    blockCount;   "Count of sequence blocks"
    lstring blockSizes;   "A comma-separated list of the block sizes(+/-)"
    lstring blockStarts;  "A comma-separated list of the block starts(+/-)"
    uint    id;           "A unique identifier for the joined annotations in this record"
    lstring description;  "A comma separated list of technical annotation descriptions"
    )</code></pre>
<p>An example: <code>bedToBigBed -tab -as=bigRmsk.as -type=bed9+5 bigRmsk.txt hg38.chrom.sizes bigRmsk.bb</code>.</p>

<h3 id="supporting">Supporting bigRmskAlign.bb auxiliary data</h3>
<p> Alongside the bigRmsk file, a supporting bigBed can provide alignment data. The following autoSql definition is used to create this supporting file, which is pointed to online with <code>xrefDataUrl</code> rather than the standard <code>bigDataUrl</code> used with bigRmsk.
The file <a href="examples/bigRmskAlignBed.as"><em>bigRmskAlignBed.as</em></a> is pulled in when the <code>bedToBigBed</code> utility is run with the <code>-as=bigRmskAlignBed.as</code> option.</p>
<h6>bigRmskAlignBed.as</h6>
<pre><code>table bigRmskAlignBed
"Repetitive Element Alignment Auxiliary Data"
    (
    string  chrom;        "Reference sequence chromosome or scaffold"
    uint    chromStart;   "Start position of alignment on chromosome"
    uint    chromEnd;     "End position of alignment on chromosome"
    uint    chromRemain;  "Remaining bp in the chromosome or scaffold"
    float   score;        "alignment score (sw, bits or evalue)"
    float   percSubst;    "Base substitution percentage"
    float   percDel;      "Base deletion percentage"
    float   percIns;      "Base insertion percentage"
    char[1] strand;       "Strand - either + or -"
    string  repName;      "Name of repeat"
    string  repType;      "Type of repeat"
    string  repSubtype;   "Subtype of repeat"
    uint    repStart;     "Start in repeat sequence"
    uint    repEnd;       "End in repeat sequence"
    uint    repRemain;    "Remaining unaligned bp in the repeat sequence"
    uint    id;           "The ID of the hit. Used to link related fragments"
    lstring calignData;   "The alignment data stored as a single string"
    )</code></pre>
<p>An example: <code>bedToBigBed -tab -as=bigRmskAlignBed.as -type=bed3+14 bigRmskAlign.tsv.sorted.txt hg38.chrom.sizes bigRmskAlign.bb</code>.</p>
<p>CHECK: the <code>xrefDataUrl</code> setting does not work on this data yet.</p>
<p> Note that the <code>bedToBigBed</code> utility uses a substantial amount of memory: approximately 25% more RAM than the uncompressed BED input file.</p>

<h2 id="steps">Creating a bigRmsk track</h2>
<p> To create a bigRmsk track and its supporting file, follow the steps below. All input files to <code>bedToBigBed</code> must be sorted on the coordinates of the first two columns: <code>sort -k1,1 -k2,2n input.tsv.txt > input.tsv.sorted.txt</code>.
To learn about a Perl program that can build the tab-separated values (tsv) text input files for <code>bedToBigBed</code> from the RepeatMasker output files, contact Robert Hubley: <a href="https://github.com/rmhubley" target="_blank">https://github.com/rmhubley</a>.</p>
<p> <strong>Step 1.</strong> If you already have an input file you would like to convert to a bigRmsk, skip to <em>Step 3</em>. Otherwise, download <a href="examples/bigRmsk.txt">this example bigRmsk.txt file</a> for the human GRCh38 (hg38) assembly.</p>
<p> <strong>Step 2.</strong> If you would like to include the optional auxiliary alignment data <code>bigRmskAlign.bb</code> file, download this <a href="examples/bigRmskAlign.txt">bigRmskAlign.txt file</a>.</p>
<p> <strong>Step 3.</strong> Download the autoSql file <em><a href="examples/bigRmsk.as">bigRmsk.as</a></em> needed by <code>bedToBigBed</code>. If you have opted to include the optional auxiliary alignment data file, bigRmskAlign.bb, with your bigRmsk file, you must also download the autoSql file <a href="examples/bigRmskAlignBed.as">bigRmskAlignBed.as</a>.</p>
<p> Here are wget commands to obtain the above files and the hg38.chrom.sizes file mentioned below:</p>
<pre><code>wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.txt
wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlign.txt
wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.as
wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlignBed.as
wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
</code></pre>
<p> <strong>Step 4.</strong> Download the <code>bedToBigBed</code> program from the UCSC <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p>
<p> <strong>Step 5.</strong> Download the <em>chrom.sizes</em> file for any assembly hosted at UCSC from our <a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads</a> page (click on "Full data
set" for any assembly). For example, the <em>hg38.chrom.sizes</em> file for the hg38 database is located at <a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes" target="_blank">http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes</a>.</p>
<p> With these files in place, run <code>bedToBigBed</code> to create the bigRmsk file:</p>
<pre><code>bedToBigBed -tab -as=bigRmsk.as -type=bed9+5 bigRmsk.txt hg38.chrom.sizes bigRmsk.bb</code></pre>
<p> <strong>Step 6.</strong> Move the newly created bigRmsk file (<em>bigRmsk.bb</em>) to a web-accessible http, https or ftp location. If you generated a <em>bigRmskAlign.bb</em> file, move it to a web-accessible location as well, preferably the same location as the <em>bigRmsk.bb</em> file.</p>
<p> <strong>Step 7.</strong> Construct a <a href="hgTracksHelp.html#CustomTracks?db=hg38">custom track</a> using a single <a href="hgTracksHelp.html#TRACK">track line</a>. Note that any of the track attributes listed <a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed. The most basic version of the track line will look something like this:</p>
<pre>track type=bigRmsk name="My bigRmsk" description="A RepeatMasker Track" bigDataUrl=http://myorg.edu/mylab/bigRmsk.bb</pre>
<p> <strong>Step 8.</strong> Paste the custom track line into the text box on the <a href="../../cgi-bin/hgCustom?db=hg38">custom track management page</a>. Navigate to chr1:1-21,571 to see the example data for this track.</p>
<p> The <code>bedToBigBed</code> program can be run with several additional options. For a full list of the available options, type <code>bedToBigBed</code> (with no arguments) on the command line to display the usage message. </p>

<h2 id="examples">Examples</h2>
<h3 id="example1">Example #1</h3>
<p> In this example, you will create a bigRmsk custom track using an existing bigRmsk file, <em>bigRmsk.bb</em>, located on the UCSC Genome Browser http server.
This file contains data for the hg38 assembly.</p>
<p> To create a custom track using this bigRmsk file:</p>
<ol>
<li> Construct a track line that references the file:
<pre><code>track type=bigRmsk name="bigRmsk Example One" description="A bigRmsk file" visibility=full bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb</code></pre></li>
<li> Paste the track line into the <a href="../../cgi-bin/hgCustom?db=hg38">custom track management page</a> for the human assembly hg38 (Dec. 2013).</li>
<li> Click the "submit" button.</li>
<li> Navigate to <code>chr1:1-21,571</code> to see the track.</li>
</ol>
<p> Custom tracks can also be loaded via one URL line. <a href="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1:1-21,571&hgct_customText=track%20type=bigRmsk%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb%20visibility=full" target="_blank">This link</a> loads the same <em>bigRmsk.bb</em> track and sets additional display parameters in the URL:</p>
<pre><code>http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1:1-21,571&hgct_customText=track%20type=bigRmsk%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb%20visibility=full</code></pre>
<p> After this example bigRmsk is loaded in the Genome Browser, click on an item in the browser's track display. Note that the details page lacks information about the individual alignments, as this example does not include the optional supporting alignment file.</p>
<p> This example can also be loaded in a Track Hub with a stanza such as the following:</p>
<pre>
track ExBigRmsk
shortLabel Example bigRmsk
longLabel This is an example Track Hub Stanza
type bigRmsk
visibility full
bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb
</pre>
<p>NOTE: For when this page is redone: only Track Hubs now allow clicking into hgc.</p>
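<p> As a rough sketch only, a Track Hub stanza that also references the optional alignment file might add an <code>xrefDataUrl</code> line next to <code>bigDataUrl</code>, following the description of the two settings in the introduction. The track name, labels, and <em>myorg.edu</em> URLs here are hypothetical placeholders, and <code>xrefDataUrl</code> has not yet been confirmed to work with this data:</p>
<pre>
track ExBigRmskAlign
shortLabel Example bigRmsk align
longLabel Hypothetical bigRmsk stanza with auxiliary alignment data
type bigRmsk
visibility full
bigDataUrl http://myorg.edu/mylab/bigRmsk.bb
xrefDataUrl http://myorg.edu/mylab/bigRmskAlign.bb
</pre>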
<!--- NOTE: The below is innaccurate and just a holder for when <b>xrefDataUrl works</b> to give an example building it. + +Adding potential input file (this is from RobertH T2T hub), both the align.bb and bigRmsk.bb for a region are stashed below (not for hg38 though). + +$ bigBedToBed -chrom=chr1 -start=4513 -end=7608 https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/bbi/GCA_009914755.4_T2T-CHM13v2.0.t2tRepeatMasker/chm13v2.0_rmsk.align.bb stdout +chr14082453324838279596227.3212.421.00-LTR60BLTRERV126476503TA/GTTACT/CGGGG/AAGG/TGCT/GGA/GT/AG/ATCC+T+CA/GGTTCTT+A+GTT/CTA/TACTTGGA/GAGAAAGAT/ATTT/CC/GA/GCCAAGAGG/ACAG/ATA/TC/TAA/GA/CG/CATG/AG/AC/AAGAT/GAAC/TTT-C-ATTGAAA/GA/GG/AAAAC/TAC/GAGT/AGT/CAA/GAGAGC/TTTATT+TAAAGAGACA+GTA+CACTCT+GAAAA/GATA/GG/AGGA/CG/AGAGT/CGGGCTG+CTGAAAG+AGC/AGTGC/AA/GT/CT/CAA/G+C+AA/GCAGCCT/C+C+A/GAGAGTC/TCTGT/CT/GC/TA/TGGA/GA/GA/TTTTTATT/N+ATG+TG/CGGACTTC/TTTC+TTG+AC/AA/GTTCCT/CGCCTCTGTCTC/TAAG-T-CTCCA/GCCTG/TTTTTCTTTGTCTG/AG+T+TTTTC/TCT+TAA+GC/TT/CA/CCT/CGCCTT-AG-C/NTCCCCGA/CCT+AG+TG/TCCCC/GA/CCT/CT/CAGGCTTGTGGGACC+CT+T/CCCTC/TACTGTG/CG/AGTTGA/GG/TGT/CA/GCATGT/CG+CGGGCC+T/CGGTGA/TTC/GA/GATACGAATC/TCA/TC/AT/CCTG/AG/AC/TA/GC/GCA/NGC+GTTG+CTCC/ATTC/ACCGCCAT/CCCCAGGC/AAC/GGC/TT/CG+T+AC/TAGCGA/GTCAC/AG/ATT/CTGTACC/TTAC/TTGT/CGCCTGC+GTAT+CTCTTT/AT/GGAAT-G-TC/TCTT/CCTC/TTGCCCT +chr14533466024838266850520.477.870.00-LTR60BLTRERV11783144514AATCTGTACTTATG/TGG/CGCCA/TG+C+GTT/ATCTCTTAA/GGAATG/TTCC/TCT/CTTTG+CCCTCTT+G/TCCTT/CCTTAC/TCAA/GCATGTAGCTAGCA/TAT/CATTCTGACAT/GT/GTTT/AAT/CTGCAGAGG/TGAA/GT/CGATTG/A+CT+GGGCA/GTCTTC/AAGA/GGGA/CGTTC 
+chr146635139248382189130422.814.201.43+L1MC_orf2LINEL12804329225GGTGA/CG/TGGAAC-GATT-AAT/CTGGAA/CA+T+CCAT/CAA/TGA/CAAT/AG/AAT/AATGC/AATA/CTAGAT/CG/AA/CAA/GACT/CTTACAA/CCT/CC/TA/TCACAAC/AT/AAA/TTC/AACTCAAAAT+GGAT+CATC/A+GA+CT/CTAC/AAC/TT/GA/TAAAAT/CGCT/AAAACTATAC/AAAT/CTT/CCTAGAAGA+T+AACA-A-TAGA/GAGAAAAG/TCTAT/GG/ATGC/ACT/CTTGGGTTTGGT/CA/GATGAA/CTTTTA/TAC/GAA/TAT/CG/AAT/CACA/CAAAGGT/C+A+T/CGAT+CC+ATA/GC/AAC/AA/GAAAG/NAAA/TTGAC/TAT/AT/GG/CTGGT/AT/CTTC+A+TTAAT/AATTT/AAAAG/AT/CTTA/CTA/GCTCTG+CG+G/AAAGACAC-CT-TGTT/CAAGAGAAC/TA/GAAAAGACAAGCCACAT/GAT/CTGA/G+G+AGAAAATATTTGCAAAAT/GACAC/TATCTGAG/TAAAGA/GAT/CTT/GG/TTC/ATT/CCAAAATATAT/CAAAA/GAAA/CTA/CTTAAAACTA/CAACAATAAGT/A+AAA+T/CAAACAG/ACCCA/GAC/TT+N+AAAAATGC/GA/GCAC/AAC/A+G+AT/CCTGAACAGACACCTCACCAAAGAAGATC/ATACAGATGGCAAG/ATAAA/GCATAC/TA/GAAAAGATGCTCA/NACAT +chr14997526324838206588715.4612.782.74+L1MC3_3endLINEL129322395TTGTC/ATT/CCAAAATATAT/CAAAA/GAAA/CTA/CTTAAAACTA/CAACAATAAGT/A+AAA+T/CAAACAG/ACCCAAC/TTAAAAA+A+TGC/GA/GCAC/AAC/A+G+ATCTGAACAGACACCTCACCAAAGAAGATC/ATACAGATGGCAAG/ATAAA/GCATAC/TA/GAAAAGATGCTCAACAT+CATTTGTC+AC/TTAGA/GGAAC/TTG+CAAATT+AAAACC/AACAATGAGATAG/CCAC-AGCTGG-TC/AT/CAT/CAT/CCTC/ATTAGAAC/TT/GGCTAAAC/ATCC-CT-AAAAAA+C+TGACA+ATACC+AAT/NTGCTG+GCGAGGAT+GA/CGGAA/GA/CAACAA/GGAACTCTT/C+A+TTCATTGCC/TGGTGGA/GA +chr152745528248381800140314.681.970.78+MER34C_vLTRERV12633226AGA/NCCAA/GAATATGCCACCCCAAAATATA/GAT/CG/TGTAGGAA/GACCAGAATATGCCACCCCAAAATATGT/CCC/TCTTTGT/GCT/ATAAGA/GATTATTC/TC/TA/GAGCTGATTATTTTGAA/GAAAA/CTA/GA/CAT/GG/AC-TA-ACAA/GA/GG/AGAAGT/CTCTGAAAACAGAGTAGAAGTTACCCTTG/TTGTAAGGA/GAAATTTACATCTATAAAGGAAATCC/TCCATTTA/G+T+AAA/GGC/GTA/GC/TCT+CC+CTCTCTA/GC/TACCAA/GGAAGAGAAGGATA/GA+CT+CTAAATCACTAA/GAGAG/CTCTT 
+chr155285686248381642354424.566.810.65+L1MC3_3endLINEL181296815645TAATA/GGTGG-G-ATAT/CC/ATGACACA/TAC/TGCATTTA/GTCAAG/AAT/CA/CCAC/TAGAAT/CTTTAT/CG/AGC+A+CAAA-T-GG/AGTA/GAAT/CCA/TA/TAT/ATC/GTATT/GCAAATTA/TA/TAC/AAAAATT/CAC/TTC/TAGGAT/GGT+C+GGC/GGT/GATCCCAGGAC/TA/GGAATGCAT/GC/AA/NTGTGA+C+AAAAG/NAATT/CTA-T-G/ACTA/GC/TAA/T-A-T +chr156866131248381197244214.805.841.29+MSTA1LTRERVL-MaLR46507TA/GCTATGGTTTGGATGT-GGT-TTGTCCCCGCA/CAAAACTCATGTTGAAATTTGAC/TCCCCAATGTGGCAGTGTG/TGGG/A-C-GGTGGGGCCTAGTGGA/GT/AGGTGTTTGGGTCATGGGGA/GT/CGGATCCCTCATGAATAGATTAATGT/CCCTCC+CTCGNG+A/GTGGGG/NGTGAGTGAGTA/TCT+C+GCTCT+NN+CA/GT/CA/GGGAATGGATTAA/GTTCCT/CGCA/GG/AGAGT/CA/GGGTA/TA/GTTAAAAAGAGTCTGGC+GNC+TT/CCCTT/CG/CG/TCT+CTC+TCC/TCTT+GC+TTGCTTT/CCA/TCTT/CTT/CGCT/CATGTGATCTCTG-G-T/CG/ACACC/GCCT/C-T-GCTCCCCTTCC+NCTTC+GCTTTCCA/GCCATGAGG/TT/NGAAA/GA/CAGA/CCTGAA/GGCCC+T+CACCAGATGCAA/GCTGCCCA/GA/NT/ACT/CC/NG/TGA/CC/TA/TTTC+GNC+CAGCT/CACCAGT/AATT/CGTGAGCCAAATG/AAAT/CCTT/CTTTTA/CC/TTTATAAATTACCCAGCCTCAGGTATTCT/CGTTAC/TAGA/CAG/ACACAAG/AAT/CGGACTAAGACA 
+chr161317132248380196354424.566.810.65+L1MC3_3endLINEL196920414915CAAATGTAG/TGT/AAAA/CAAC+C+TCACTGAAGGT/GGG+TG+A/GGGGAAAAT/AGGTGT/CTGACCTAAGTC/AACTTTGA/GAAATGAA/GTA/GGAA/GTCTG+T+G/AAGG/ACTG/AAAGGCAC/AA+A+T/GGAACTA/GTACT/ATC/AAT/GA/CAT/CTGG/TAT/CTA/CC/TAT/GTTT/GATAAAGTTA/GTTTCCA/CACA/GGA/GA/GGC/TAA/CC/GT/GGTG/TAACAATTG/CTA/GAA/NACCA/GCA/TG/ATG/AT/CC/ATGTAT+A+CTGGAG/ATA/TA/GAACAATG/TAC/AT/GTAC/AATA/GA/GG/ATC/GGCA/GGATGGTGGGAA/GCCAGC/GTTTCTCACTGTTGA/GAGTGGGAGG/NTTACAA/GATT/AAGCAAGA/GC/GGAGA/GAGGCTAGAATGATT/CCC/ATGTGA/GTAG/ATA/GGATC/TAGAGG/TTGGAGACATCAA/GC/TG/ATA/GAACTT/CATGC/TTTAGT/CTTAATATAGATACAC/GAC/TA/GGTTC/AT/CAC/TATAGAAAA/TC/ATTTATAA/GT/ATAG/TGTGTG/ATG/ATAG/CG/AT/CA/GGGTTAG+T+AC/TACACACATATAC/TTTCCTA/TGCA/TT/CTGC/TT/CAA/GT/C+TGA+GAGGGA/CCA/TAGAT/AA/GCAAT/C+GACACCCCAGTAGCAACGA+GT/CG/ACAT/CT/CC/TAGCA/GG/CCCAC/GATG/CTA/TA/GGTTT+C+TC/AC/AC/TACCATTC+TCCAA+TG/AAAAGGAAT/CCA+G+GGCTCT/CTTGA/GAGAAATGT/GCTGATA/TCTAGA/GACTGGGA/GCAGT/GAAATAT+ACAAG+AG/TGAGCCA/TGGAT/GA/CATCTG/TGA/TAGTA/GT/CCAGAAAGT+AAGG+AAGTA/GCT+C+AAAAAAAT/CT/CA/CAA/CA+ATGATGGGGG+TATA/GTCAAAC/GA/GA/GAA/CAT/CAA/GA/GAGCCAAT/CA/TA/GAAAC/GAGCTA/CCCG/AATGGCCAAC/AA/GCA/TGGAAG/CG/AAA/TTTGT/AGCAACAT/AAAT+A+G/AC/ATA+AAG+TAGTG/ATC/TGA/GATA/TATAA+C+CT/CAAAGC/TT/ATAAAG/ATAAT/ATATCT/CAG/TGT/AGTCT/CG/ATAT/CTT/GG/ATATAC/AC/ATAG/AG/ATG+ATTGAATA+AATAAG/AC/TAAATGGA/GGT/GT/AGC/AAT/GAG+AC+AAATCTCCT/CT/GTGCAA/GAAGAATTCCAAATAAC/TTG/TATGTAGAC/TACTCA/CGCCA/CTCAAGA/GAGGTGGAGC+A+C/TAACTCCT/CCACTCCG/TTAAGTGTGGGCTC/GT/CGCATAGTGACTTG/CCTC/TCA/CAAAGA-ACAC-A/GTG/ACAGTATGGAC/AAA/GGGA/GGGAAAAA+AGAG+TAACTTC/TACAGTGGAGAAAT/CCTGACAAACAG/CTAG/CCTCT/AGCCAA/GA/GTGATCC/AAA/GGTG/CAACAC/TCAAA/CG/AC/GTGAC/TAG/AT/GTCAC/TC/GTTGAG/TAA/GC/TATG 
+chr171417533248379795121520.504.597.89+L1MC3_3endLINEL12150252935TGG/AGGGACATTCTACAAAAA/TT/ACCTGACCAA/GTC/ACTCCTCAG/AT/AG/ACTA/GTG/CAAGGTCATCAT/AG/AAG/A+C+AT/AGGAAAGC/TCTA/GAC/GAC/AACTGTCACAGCCAG/AGAA/GGAGCCTAT/AG+GAGACA+TGAT/CGT/ACTAC/AATGTC/AG/ATGC/TGGG/TATCCTGGATGGGATCCTGGG/AT/ACAGAG/AT/AAAGA/G+ACAT+TAG+GTAAA+AACTAAGGG/AAATCC/TA/GAATG/AAAA/GTATGA/GACTTTAGTTAATAAC/TAG/ATC/GTATCAG/ATATTGGTTCATTAAC/TTGTGG/ACAA-ATT-ATGTA-AGATATTAATAAG-CCAT-GTGAGACAC-ACTG/AATA/GG/TAAGATGTTAATAAG/TAGA/GGGAAACTA/GGGT+G+TG-C-GGC/GTAC/TATGGGAAA/CTCTCTG-CTTT-TT/AT/CTT/ATT/CTTG/CA/GCG/AATTTC/TTG/CTGTAAG/ATA/CA/TAAAAA/CA/TG/AA/TC/TG/CTAAAATAAAAC/A+G+TTTATTTA/TA/NAA + +That is matched with + +$ bigBedToBed -chrom=chr1 -start=4513 -end=7608 https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/bbi/GCA_009914755.4_T2T-CHM13v2.0.t2tRepeatMasker/chm13v2.0_rmsk.bb stdout +chr107536L1MC3#LINE/L1223+46637533094912,600,518,158,0,1001,108,392,3-1,4663,-1,5528,-1,6131,-1,7141,-151304 19.6 8.0 2.0 chr1 4664 5263 (248382065) + L1MC3 LINE/L1 4913 5546 (2239) 5 ,3544 24.6 6.8 0.7 chr1 5529 5686 (248381642) + L1MC3 LINE/L1 6065 6221 (1564) 5 ,3544 24.6 6.8 0.7 chr1 6132 7132 (248380196) + L1MC3 LINE/L1 6222 7294 (491) 5 ,1215 20.5 4.6 7.9 chr1 7142 7533 (248379795) + L1MC3 LINE/L1 7403 7782 (3) 5 +chr140824796LTR60B#LTR/ERV1273-40824533030,451,263-1,0,-1962 27.3 12.4 1.0 chr1 4083 4533 (248382795) C LTR60B LTR/ERV1 (0) 765 264 3 +chr140824837LTR60B#LTR/ERV1205-4533466003451,127,177-1,451,-505 20.5 7.9 0.0 chr1 4534 4660 (248382668) C LTR60B LTR/ERV1 (451) 314 178 4 +chr152675850MER34C_v#LTR/ERV1147+52745528036,254,322-1,7,-161403 14.7 2.0 0.8 chr1 5275 5528 (248381800) + MER34C_v LTR/ERV1 7 263 (322) 6 +chr156856131MSTA1#LTR/ERVL-MaLR148+56866131030,445,0-1,1,-12442 14.8 5.8 1.3 chr1 5687 6131 (248381197) + MSTA1 LTR/ERVL-MaLR 1 465 (0) 7 + +End of excerpt from 2 bigBed files in T2T that could be potential input in future examples (could be colors are wrong in this second file). 
+
<h3 id="example2">Example #2</h3>
<p> In this example, you will create a bigRmsk file from an existing bigRmsk input file, <em>bigRmsk.txt</em>, located on the UCSC Genome Browser http server.</p>
<ol>
<li> Save the bed9+5 example file, <a href="examples/bigRmsk.txt"><em>bigRmsk.txt</em></a>, to your computer (<em>Step 1</em>, above).</li>
<li> Save the autoSql file <a href="examples/bigRmsk.as"><em>bigRmsk.as</em></a> to your computer (<em>Step 3</em>, above).</li>
<li> Download the <a href="http://hgdownload.soe.ucsc.edu/admin/exe/"><code>bedToBigBed</code> utility</a> (<em>Step 4</em>, above).</li>
<li> Save the <a href="hg38.chrom.sizes"><em>hg38.chrom.sizes</em> text file</a> to your computer. This file contains the chrom.sizes for the human (hg38) assembly (<em>Step 5</em>, above).</li>
<li> Run the <code>bedToBigBed</code> utility to create a binary indexed bigRmsk file (shown after <em>Step 5</em>, above):
<pre><code>bedToBigBed -type=bed9+5 -tab -as=bigRmsk.as bigRmsk.txt hg38.chrom.sizes bigRmsk.bb</code></pre></li>
<li> Move the newly created bigRmsk file (<em>bigRmsk.bb</em>) to a web-accessible location (<em>Step 6</em>, above).</li>
<li> Construct a track line that points to the bigRmsk file (<em>Step 7</em>, above).</li>
<li> Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser (<em>Step 8</em>, above).</li>
</ol>
-->

<h2 id="share">Sharing your data with others</h2>
<p> If you would like to share your bigRmsk data track with a colleague, learn how to create a URL by looking at Example 6 on <a href="customTrack.html#EXAMPLE6">this page</a>.</p>

<h2 id="extract">Extracting data from the bigRmsk format</h2>
<p> Because bigRmsk files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them.
UCSC has developed the following programs to assist in working with bigBed formats, available from the <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p>
<ul>
<li> <code>bigBedToBed</code> &mdash; converts a bigBed file to ASCII BED format.</li>
<li> <code>bigBedSummary</code> &mdash; extracts summary information from a bigBed file.</li>
<li> <code>bigBedInfo</code> &mdash; prints out information about a bigBed file.</li>
</ul>
<p> As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the command line to view the usage statement.</p>

<h2 id="trouble">Troubleshooting</h2>
<p> If you encounter an error when you run the <code>bedToBigBed</code> program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run the <code>bedClip</code> program (<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">available here</a>) to remove the problematic row(s) in your input file before running the <code>bedToBigBed</code> program.</p>

<!--#include virtual="$ROOT/inc/gbPageEnd.html" -->