909449391962ec9312052bf186158d9f5162fcc4
brianlee
  Fri May 27 07:27:01 2022 -0700
Not for Code Review, see ticket #29356, check-in of temporary examples not fully working for bigRmsk -on T2T CHM13 and align.bb not working yet

diff --git src/hg/htdocs/goldenPath/help/bigRmsk.html src/hg/htdocs/goldenPath/help/bigRmsk.html
index 7e0361f..e33d738 100755
--- src/hg/htdocs/goldenPath/help/bigRmsk.html
+++ src/hg/htdocs/goldenPath/help/bigRmsk.html
@@ -1,282 +1,307 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser bigRmsk RepeatMasker Format" -->
 <!--#set var="ROOT" value="../.." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <h1>bigRmsk Track Format</h1>
 
 <h3>This page is under development and is not ready for public use.</h3>
 <p>
 The bigRmsk format allows for the display of annotations of a genome generated by the
 <a href="http://www.repeatmasker.org" target="_blank">RepeatMasker</a>
 program that screens DNA sequences for interspersed repeats and low complexity DNA sequences.
 The output of RepeatMasker is a detailed annotation of the repeats that are present in
 the &quot;query&quot; sequence as well as a modified version of this query sequence
 in which all the annotated repeats have been masked, where the default replaces
 the discovered repeats by Ns. The bigRmsk format enables taking the annotation output
 of RepeatMasker and converting it into a compressed and indexed version of a
 <a href="/goldenPath/help/bigBed.html">bigBed</a> file, where the results when
 identified as <code>type bigRmsk</code> in a Track Hub can be visualized as described
 <a href="#linkToVisualizationSECTION_TOCOME">below</a>.</p>
 <p>
 The bigRmsk files are created using the program <code>bedToBigBed</code>. It must be run with the 
 <code>-as</code> option to pull in a special <a href="http://www.linuxjournal.com/article/5949" 
 target="_blank">autoSql</a> (<em>.as</em>) file, <code>bigRmsk.as</code>, that defines the fields
 of bigRmsk. Alongside the bigRmsk file, an auxiliary data bigBed can be made with its own .as
 definition file (<code>bigRmskAlignBed.as</code>) and referenced with a special <code>xrefDataUrl</code>
 setting, whereas the bigRmsk file location is named with the standard <code>bigDataUrl</code> setting.</p>
 <p>
 The bigRmsk files are in an indexed binary format. The main advantage of this format is that only 
 those portions of the file needed to display a particular region are transferred to the Genome 
 Browser server. Because of this, bigRmsk files have considerably faster display performance than
 if they were stored in a text-based format. The bigRmsk file remains on your local 
 web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for 
 the currently displayed chromosomal position is locally cached as a &quot;sparse file&quot;. If you
 do not have access to a web-accessible server and need hosting space for your bigRmsk files, please
 see the <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the Track Hub Help
 documentation.</p>
 
 <h2 id="bigRmsk">bigRmsk file definitions</h2>
 <p>
 The following autoSql definition is used to specify the main bigRmsk files. This
 definition, contained in the file <a href="examples/bigRmsk.as"><em>bigRmsk.as</em></a>, is 
 pulled in when the <code>bedToBigBed</code> utility is run with the <code>-as=bigRmsk.as</code> 
 option. </p>
 <h6>bigRmsk.as</h6>
 <pre><code>table bigRmskBed
 "Repetitive Element Annotation" 
     (
     string  chrom;        "Reference sequence chromosome or scaffold" 
     uint    chromStart;    "Start position of visualization on chromosome" 
     uint    chromEnd;    "End position of visualization on chromosome" 
     string  name;        "Name of repeat, including the type/subtype suffix" 
     uint    score;        "Divergence score" 
     char[1] strand;        "+ or - for strand" 
     uint    thickStart;    "Start position of aligned sequence on chromosome" 
     uint    thickEnd;    "End position of aligned sequence on chromosome" 
     uint      reserved;    "Reserved" 
     uint    blockCount;    "Count of sequence blocks" 
     lstring blockSizes;     "A comma-separated list of the block sizes(+/-)" 
     lstring blockStarts;    "A comma-separated list of the block starts(+/-)" 
     uint    id;             "A unique identifier for the joined annotations in this record" 
     lstring description;    "A comma separated list of technical annotation descriptions"
     )</code></pre>
 <p>An example: <code>bedToBigBed -tab -as=bigRmsk.as -type=bed9+5 bigRmsk.txt
 hg38.chrom.sizes bigRmsk.bb</code>.</p>
 
 <h3 id="supporting">Supporting bigRmskAlign.bb auxiliary data</h3>
 <p>
 Alongside the bigRmsk file, a supporting bigBed can provide alignment data. The following autoSql
 definition is used to create this supporting file, pointed to online with <code>xrefDataUrl</code>,
 rather than the standard <code>bigDataUrl</code> used with bigRmsk. The file
 <a href="examples/bigRmskAlignBed.as"><em>bigRmskAlignBed.as</em></a>, is pulled in when
 the <code>bedToBigBed</code> utility is run with the <code>-as=bigRmskAlignBed.as</code>
 option.</p>
 <h6>bigRmskAlignBed.as</h6>
 <pre><code>table bigRmskAlignBed
 "Repetitive Element Alignment Auxiliary Data" 
     (
     string  chrom;        "Reference sequence chromosome or scaffold" 
     uint    chromStart;    "Start position of alignment on chromosome" 
     uint    chromEnd;    "End position of alignment on chromosome" 
     uint    chromRemain;    "Remaining bp in the chromosome or scaffold" 
     float   score;          "alignment score (sw, bits or evalue)" 
     float   percSubst;      "Base substitution percentage" 
     float   percDel;        "Base deletion percentage" 
     float   percIns;        "Bases insertion percentage" 
     char[1] strand;         "Strand - either + or -" 
     string  repName;        "Name of repeat" 
     string  repType;        "Type of repeat" 
     string  repSubtype;     "Subtype of repeat" 
     uint    repStart;       "Start in repeat sequence" 
     uint    repEnd;         "End in repeat sequence" 
     uint    repRemain;      "Remaining unaligned bp in the repeat sequence" 
     uint    id;             "The ID of the hit. Used to link related fragments" 
     lstring calignData;     "The alignment data stored as a single string" 
     )</code></pre>
 <p>An example: <code>bedToBigBed -tab -as=bigRmskAlignBed.as -type=bed3+14
 bigRmskAlign.tsv.sorted.txt hg38.chrom.sizes bigRmskAlign.bb
 </code>. CHECK: the issue is that <code>xrefDataUrl</code> does not work on this data yet.</p>
 <p>
 Note that the <code>bedToBigBed</code> utility uses a substantial amount of memory: approximately 
 25% more RAM than the uncompressed BED input file.</p>
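 <p>
 As a rough planning aid, the 25% rule of thumb above can be turned into a quick estimate. The
 following is a minimal sketch only: the function name <code>estimate_mem_bytes</code> and the 4 GB
 input size are hypothetical, and actual memory use will vary with the data.</p>

```shell
# Hypothetical helper: estimate bedToBigBed memory as 125% of the
# uncompressed input size, per the rule of thumb stated above.
estimate_mem_bytes() {
    # $1: size of the uncompressed input file in bytes.
    # Integer arithmetic, so the result rounds down.
    echo $(( $1 * 125 / 100 ))
}

# Example with a hypothetical 4 GB input file.
estimate_mem_bytes 4000000000

# With a real file, the size could be supplied like this:
#   estimate_mem_bytes "$(wc -c < bigRmsk.txt)"
```

 <p>
 Comparing this figure against the free memory on the build machine before starting a large
 conversion can save a failed run.</p>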
 
 <h2 id="steps">Creating a bigRmsk track</h2>
 <p>
 To create a bigRmsk track and its supporting file, follow the steps below. All input
 files to <code>bedToBigBed</code> must be sorted by the first two columns (chromosome, then start
 coordinate): <code>sort -k1,1 -k2,2n input.tsv.txt &gt; input.tsv.sorted.txt</code>. To learn about a Perl
 program that can build the tab-separated values (tsv) input text files for bedToBigBed from the
 RepeatMasker output files, contact Robert Hubley: <a href="https://github.com/rmhubley"
 target="_blank">https://github.com/rmhubley</a>.</p> 
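 <p>
 The sorting requirement can be tried out on a small file first. The sketch below uses a
 hypothetical three-row input written to a temporary directory; the rows and the
 <code>workdir</code> variable are illustrative and are not part of the example data.</p>

```shell
# Work in a temporary directory so no existing files are overwritten.
workdir=$(mktemp -d)

# Hypothetical tab-separated input, deliberately out of order.
printf 'chr2\t500\t600\nchr1\t900\t950\nchr1\t100\t200\n' > "$workdir/input.tsv.txt"

# Sort by chromosome name (column 1), then numerically by start (column 2).
sort -k1,1 -k2,2n "$workdir/input.tsv.txt" > "$workdir/input.tsv.sorted.txt"

# The chr1 rows now come first, ordered by start coordinate.
cat "$workdir/input.tsv.sorted.txt"
```

 <p>
 The <code>n</code> in <code>-k2,2n</code> matters: without it, start coordinates are compared as
 text, so 1000 would sort before 200.</p>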
 <p>
 <strong>Step 1.</strong> 
 If you already have an input file you would like to convert to a bigRmsk, skip to <em>Step 3</em>.
 Otherwise, download <a href="examples/bigRmsk.txt">this example bigRmsk.txt
 file</a> for the human GRCh38 (hg38) assembly.</p>
 <p>
 <strong>Step 2.</strong> 
 If you would like to include the optional auxiliary alignment data <code>bigRmskAlign.bb</code> file,
 download this <a href="examples/bigRmskAlign.txt">bigRmskAlign.txt file</a>.</p>
 <p>
 <strong>Step 3.</strong> 
 Download the autoSql file <em><a href="examples/bigRmsk.as">bigRmsk.as</a></em> needed by 
 <code>bedToBigBed</code>. If you have opted to include the optional auxiliary alignment data file,
 bigRmskAlign.bb, with your bigRmsk file, you must also download the autoSql file
 <a href="examples/bigRmskAlignBed.as">bigRmskAlignBed.as</a>.</p>
 <p>
 Here are wget commands to obtain the above files and the hg38.chrom.sizes file mentioned below:</p>
 <pre><code>wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.txt
 wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlign.txt
 wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.as
 wget https://genome.ucsc.edu/goldenPath/help/examples/bigRmskAlignBed.as
 wget http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes
 </code></pre>
 <p>
 <strong>Step 4.</strong> 
 Download the <code>bedToBigBed</code> program from the UCSC
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p>
 <p>
 <strong>Step 5.</strong> 
 Download the  <em>chrom.sizes</em> file for any assembly hosted at UCSC from our 
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads</a> page (click on &quot;Full 
 data set&quot; for any assembly). For example, the <em>hg38.chrom.sizes</em> file for the hg38 
 database is located at 
 <a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes" 
 target="_blank">http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes</a>.</p>
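 <p>
 With the input file, the autoSql file, and the <em>chrom.sizes</em> file in place, run
 <code>bedToBigBed</code> to create the bigRmsk file:</p>
 <pre>
 <code>bedToBigBed -tab -as=bigRmsk.as -type=bed9+5 bigRmsk.txt hg38.chrom.sizes bigRmsk.bb</code></pre>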
 <p>
 <strong>Step 6.</strong> 
 Move the newly created bigRmsk file (<em>bigRmsk.bb</em>) to a web-accessible http, https or ftp
 location. If you generated the <em>bigRmskAlign.bb</em> file, move it to a web-accessible
 location as well, most likely the same location as the <em>bigRmsk.bb</em> file.</p> 
 <p>
 <strong>Step 7.</strong> 
 Construct a <a href="hgTracksHelp.html#CustomTracks?db=hg38">custom track</a> using a single 
 <a href="hgTracksHelp.html#TRACK">track line</a>. Note that any of the track attributes listed 
 <a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed. The most basic
 version of the track line will look something like this:</p>
 <pre>track type=bigRmsk name="My bigRmsk" description="A RepeatMasker Track" bigDataUrl=http://myorg.edu/mylab/bigRmsk.bb</pre>
 <p>
 <strong>Step 8.</strong> 
 Paste the custom track line into the text box on the <a href="../../cgi-bin/hgCustom?db=hg38">custom
 track management page</a>. Navigate to chr1:1-21,571 to see the example data for this track.</p>
 <p>
 The <code>bedToBigBed</code> program can be run with several additional options. For a full
 list of the available options, type <code>bedToBigBed</code> (with no arguments) on the command line
 to display the usage message. </p>
 
 <h2 id="examples">Examples</h2>
 
 <h3 id="example1">Example #1</h3>
 <p>
 In this example, you will create a bigRmsk custom track using an existing bigRmsk file,
 <em>bigRmsk.bb</em>, located on the UCSC Genome Browser http server. This file contains data for 
 the hg38 assembly.</p>
 <p>
 To create a custom track using this bigRmsk file:</p>
 <ol>
   <li>
   Construct a track line that references the file:
   <pre><code>track type=bigRmsk name=&quot;bigRmsk Example One&quot; description=&quot;A bigRmsk file&quot; visibility=full bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb</code></pre></li>
   <li>
   Paste the track line into the <a href="../../cgi-bin/hgCustom?db=hg38">custom track management 
   page</a> for the human assembly hg38 (Dec. 2013).</li> 
   <li>
   Click the &quot;submit&quot; button.</li>
   <li>
   Navigate to <code>chr1:1-21,571</code> to see the track.</li>
 </ol>
 <p>
 Custom tracks can also be loaded via one URL line. 
 <a href="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1:1-21,571&hgct_customText=track%20type=bigRmsk%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb%20visibility=full"
 target="_blank">This link</a> loads the same <em>bigRmsk.bb</em> track and sets additional display 
 parameters in the URL:</p>
 <pre><code>http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1:1-21,571&hgct_customText=track%20type=bigRmsk%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb%20visibility=full</code></pre>
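 <p>
 A URL of this form can be assembled from a plain track line by replacing each space with
 <code>%20</code>. The following is a minimal sketch that assumes the track line contains no other
 characters needing escaping; the variable names are illustrative.</p>

```shell
# Track line as it would be pasted into the custom track page.
track_line='track type=bigRmsk name=Example bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb visibility=full'

# Replace each space with %20 so the line can travel in a URL parameter.
encoded=$(printf '%s' "$track_line" | sed 's/ /%20/g')

# Assemble the hgTracks URL with the assembly, position, and track line.
url="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1:1-21,571&hgct_customText=$encoded"
echo "$url"
```

 <p>
 Track lines with quoted, multi-word names or descriptions would need fuller URL encoding than
 this simple substitution provides.</p>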
 <p>
 After this example bigRmsk is loaded in the Genome Browser, click into an item on the browser's 
 track display. Note that the details page display lacks information about the individual alignments, 
 as this example does not include the optional supporting alignment file.</p>
 <p>
 This example can also be loaded in a Track Hub with a stanza such as the following:</p>
 <pre>
 track ExBigRmsk
 shortLabel Example bigRmsk
 longLabel This is an example Track Hub Stanza
 type bigRmsk
 visibility full
 bigDataUrl http://genome.ucsc.edu/goldenPath/help/examples/bigRmsk.bb
 </pre>
 <p>NOTE (for when this page is redone): only Track Hubs currently allow clicking into hgc.</p>
 
 <!---
 NOTE: The below is inaccurate and just a placeholder for when <b>xrefDataUrl works</b> to give an example building it.
+
+Adding potential input file (this is from RobertH T2T hub), both the align.bb and bigRmsk.bb for a region are stashed below (not for hg38 though).
+
+$  bigBedToBed -chrom=chr1 -start=4513 -end=7608 https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/bbi/GCA_009914755.4_T2T-CHM13v2.0.t2tRepeatMasker/chm13v2.0_rmsk.align.bb stdout
+chr14082453324838279596227.3212.421.00-LTR60BLTRERV126476503TA/GTTACT/CGGGG/AAGG/TGCT/GGA/GT/AG/ATCC+T+CA/GGTTCTT+A+GTT/CTA/TACTTGGA/GAGAAAGAT/ATTT/CC/GA/GCCAAGAGG/ACAG/ATA/TC/TAA/GA/CG/CATG/AG/AC/AAGAT/GAAC/TTT-C-ATTGAAA/GA/GG/AAAAC/TAC/GAGT/AGT/CAA/GAGAGC/TTTATT+TAAAGAGACA+GTA+CACTCT+GAAAA/GATA/GG/AGGA/CG/AGAGT/CGGGCTG+CTGAAAG+AGC/AGTGC/AA/GT/CT/CAA/G+C+AA/GCAGCCT/C+C+A/GAGAGTC/TCTGT/CT/GC/TA/TGGA/GA/GA/TTTTTATT/N+ATG+TG/CGGACTTC/TTTC+TTG+AC/AA/GTTCCT/CGCCTCTGTCTC/TAAG-T-CTCCA/GCCTG/TTTTTCTTTGTCTG/AG+T+TTTTC/TCT+TAA+GC/TT/CA/CCT/CGCCTT-AG-C/NTCCCCGA/CCT+AG+TG/TCCCC/GA/CCT/CT/CAGGCTTGTGGGACC+CT+T/CCCTC/TACTGTG/CG/AGTTGA/GG/TGT/CA/GCATGT/CG+CGGGCC+T/CGGTGA/TTC/GA/GATACGAATC/TCA/TC/AT/CCTG/AG/AC/TA/GC/GCA/NGC+GTTG+CTCC/ATTC/ACCGCCAT/CCCCAGGC/AAC/GGC/TT/CG+T+AC/TAGCGA/GTCAC/AG/ATT/CTGTACC/TTAC/TTGT/CGCCTGC+GTAT+CTCTTT/AT/GGAAT-G-TC/TCTT/CCTC/TTGCCCT
+chr14533466024838266850520.477.870.00-LTR60BLTRERV11783144514AATCTGTACTTATG/TGG/CGCCA/TG+C+GTT/ATCTCTTAA/GGAATG/TTCC/TCT/CTTTG+CCCTCTT+G/TCCTT/CCTTAC/TCAA/GCATGTAGCTAGCA/TAT/CATTCTGACAT/GT/GTTT/AAT/CTGCAGAGG/TGAA/GT/CGATTG/A+CT+GGGCA/GTCTTC/AAGA/GGGA/CGTTC
+chr146635139248382189130422.814.201.43+L1MC_orf2LINEL12804329225GGTGA/CG/TGGAAC-GATT-AAT/CTGGAA/CA+T+CCAT/CAA/TGA/CAAT/AG/AAT/AATGC/AATA/CTAGAT/CG/AA/CAA/GACT/CTTACAA/CCT/CC/TA/TCACAAC/AT/AAA/TTC/AACTCAAAAT+GGAT+CATC/A+GA+CT/CTAC/AAC/TT/GA/TAAAAT/CGCT/AAAACTATAC/AAAT/CTT/CCTAGAAGA+T+AACA-A-TAGA/GAGAAAAG/TCTAT/GG/ATGC/ACT/CTTGGGTTTGGT/CA/GATGAA/CTTTTA/TAC/GAA/TAT/CG/AAT/CACA/CAAAGGT/C+A+T/CGAT+CC+ATA/GC/AAC/AA/GAAAG/NAAA/TTGAC/TAT/AT/GG/CTGGT/AT/CTTC+A+TTAAT/AATTT/AAAAG/AT/CTTA/CTA/GCTCTG+CG+G/AAAGACAC-CT-TGTT/CAAGAGAAC/TA/GAAAAGACAAGCCACAT/GAT/CTGA/G+G+AGAAAATATTTGCAAAAT/GACAC/TATCTGAG/TAAAGA/GAT/CTT/GG/TTC/ATT/CCAAAATATAT/CAAAA/GAAA/CTA/CTTAAAACTA/CAACAATAAGT/A+AAA+T/CAAACAG/ACCCA/GAC/TT+N+AAAAATGC/GA/GCAC/AAC/A+G+AT/CCTGAACAGACACCTCACCAAAGAAGATC/ATACAGATGGCAAG/ATAAA/GCATAC/TA/GAAAAGATGCTCA/NACAT
+chr14997526324838206588715.4612.782.74+L1MC3_3endLINEL129322395TTGTC/ATT/CCAAAATATAT/CAAAA/GAAA/CTA/CTTAAAACTA/CAACAATAAGT/A+AAA+T/CAAACAG/ACCCAAC/TTAAAAA+A+TGC/GA/GCAC/AAC/A+G+ATCTGAACAGACACCTCACCAAAGAAGATC/ATACAGATGGCAAG/ATAAA/GCATAC/TA/GAAAAGATGCTCAACAT+CATTTGTC+AC/TTAGA/GGAAC/TTG+CAAATT+AAAACC/AACAATGAGATAG/CCAC-AGCTGG-TC/AT/CAT/CAT/CCTC/ATTAGAAC/TT/GGCTAAAC/ATCC-CT-AAAAAA+C+TGACA+ATACC+AAT/NTGCTG+GCGAGGAT+GA/CGGAA/GA/CAACAA/GGAACTCTT/C+A+TTCATTGCC/TGGTGGA/GA
+chr152745528248381800140314.681.970.78+MER34C_vLTRERV12633226AGA/NCCAA/GAATATGCCACCCCAAAATATA/GAT/CG/TGTAGGAA/GACCAGAATATGCCACCCCAAAATATGT/CCC/TCTTTGT/GCT/ATAAGA/GATTATTC/TC/TA/GAGCTGATTATTTTGAA/GAAAA/CTA/GA/CAT/GG/AC-TA-ACAA/GA/GG/AGAAGT/CTCTGAAAACAGAGTAGAAGTTACCCTTG/TTGTAAGGA/GAAATTTACATCTATAAAGGAAATCC/TCCATTTA/G+T+AAA/GGC/GTA/GC/TCT+CC+CTCTCTA/GC/TACCAA/GGAAGAGAAGGATA/GA+CT+CTAAATCACTAA/GAGAG/CTCTT
+chr155285686248381642354424.566.810.65+L1MC3_3endLINEL181296815645TAATA/GGTGG-G-ATAT/CC/ATGACACA/TAC/TGCATTTA/GTCAAG/AAT/CA/CCAC/TAGAAT/CTTTAT/CG/AGC+A+CAAA-T-GG/AGTA/GAAT/CCA/TA/TAT/ATC/GTATT/GCAAATTA/TA/TAC/AAAAATT/CAC/TTC/TAGGAT/GGT+C+GGC/GGT/GATCCCAGGAC/TA/GGAATGCAT/GC/AA/NTGTGA+C+AAAAG/NAATT/CTA-T-G/ACTA/GC/TAA/T-A-T
+chr156866131248381197244214.805.841.29+MSTA1LTRERVL-MaLR46507TA/GCTATGGTTTGGATGT-GGT-TTGTCCCCGCA/CAAAACTCATGTTGAAATTTGAC/TCCCCAATGTGGCAGTGTG/TGGG/A-C-GGTGGGGCCTAGTGGA/GT/AGGTGTTTGGGTCATGGGGA/GT/CGGATCCCTCATGAATAGATTAATGT/CCCTCC+CTCGNG+A/GTGGGG/NGTGAGTGAGTA/TCT+C+GCTCT+NN+CA/GT/CA/GGGAATGGATTAA/GTTCCT/CGCA/GG/AGAGT/CA/GGGTA/TA/GTTAAAAAGAGTCTGGC+GNC+TT/CCCTT/CG/CG/TCT+CTC+TCC/TCTT+GC+TTGCTTT/CCA/TCTT/CTT/CGCT/CATGTGATCTCTG-G-T/CG/ACACC/GCCT/C-T-GCTCCCCTTCC+NCTTC+GCTTTCCA/GCCATGAGG/TT/NGAAA/GA/CAGA/CCTGAA/GGCCC+T+CACCAGATGCAA/GCTGCCCA/GA/NT/ACT/CC/NG/TGA/CC/TA/TTTC+GNC+CAGCT/CACCAGT/AATT/CGTGAGCCAAATG/AAAT/CCTT/CTTTTA/CC/TTTATAAATTACCCAGCCTCAGGTATTCT/CGTTAC/TAGA/CAG/ACACAAG/AAT/CGGACTAAGACA
+chr161317132248380196354424.566.810.65+L1MC3_3endLINEL196920414915CAAATGTAG/TGT/AAAA/CAAC+C+TCACTGAAGGT/GGG+TG+A/GGGGAAAAT/AGGTGT/CTGACCTAAGTC/AACTTTGA/GAAATGAA/GTA/GGAA/GTCTG+T+G/AAGG/ACTG/AAAGGCAC/AA+A+T/GGAACTA/GTACT/ATC/AAT/GA/CAT/CTGG/TAT/CTA/CC/TAT/GTTT/GATAAAGTTA/GTTTCCA/CACA/GGA/GA/GGC/TAA/CC/GT/GGTG/TAACAATTG/CTA/GAA/NACCA/GCA/TG/ATG/AT/CC/ATGTAT+A+CTGGAG/ATA/TA/GAACAATG/TAC/AT/GTAC/AATA/GA/GG/ATC/GGCA/GGATGGTGGGAA/GCCAGC/GTTTCTCACTGTTGA/GAGTGGGAGG/NTTACAA/GATT/AAGCAAGA/GC/GGAGA/GAGGCTAGAATGATT/CCC/ATGTGA/GTAG/ATA/GGATC/TAGAGG/TTGGAGACATCAA/GC/TG/ATA/GAACTT/CATGC/TTTAGT/CTTAATATAGATACAC/GAC/TA/GGTTC/AT/CAC/TATAGAAAA/TC/ATTTATAA/GT/ATAG/TGTGTG/ATG/ATAG/CG/AT/CA/GGGTTAG+T+AC/TACACACATATAC/TTTCCTA/TGCA/TT/CTGC/TT/CAA/GT/C+TGA+GAGGGA/CCA/TAGAT/AA/GCAAT/C+GACACCCCAGTAGCAACGA+GT/CG/ACAT/CT/CC/TAGCA/GG/CCCAC/GATG/CTA/TA/GGTTT+C+TC/AC/AC/TACCATTC+TCCAA+TG/AAAAGGAAT/CCA+G+GGCTCT/CTTGA/GAGAAATGT/GCTGATA/TCTAGA/GACTGGGA/GCAGT/GAAATAT+ACAAG+AG/TGAGCCA/TGGAT/GA/CATCTG/TGA/TAGTA/GT/CCAGAAAGT+AAGG+AAGTA/GCT+C+AAAAAAAT/CT/CA/CAA/CA+ATGATGGGGG+TATA/GTCAAAC/GA/GA/GAA/CAT/CAA/GA/GAGCCAAT/CA/TA/GAAAC/GAGCTA/CCCG/AATGGCCAAC/AA/GCA/TGGAAG/CG/AAA/TTTGT/AGCAACAT/AAAT+A+G/AC/ATA+AAG+TAGTG/ATC/TGA/GATA/TATAA+C+CT/CAAAGC/TT/ATAAAG/ATAAT/ATATCT/CAG/TGT/AGTCT/CG/ATAT/CTT/GG/ATATAC/AC/ATAG/AG/ATG+ATTGAATA+AATAAG/AC/TAAATGGA/GGT/GT/AGC/AAT/GAG+AC+AAATCTCCT/CT/GTGCAA/GAAGAATTCCAAATAAC/TTG/TATGTAGAC/TACTCA/CGCCA/CTCAAGA/GAGGTGGAGC+A+C/TAACTCCT/CCACTCCG/TTAAGTGTGGGCTC/GT/CGCATAGTGACTTG/CCTC/TCA/CAAAGA-ACAC-A/GTG/ACAGTATGGAC/AAA/GGGA/GGGAAAAA+AGAG+TAACTTC/TACAGTGGAGAAAT/CCTGACAAACAG/CTAG/CCTCT/AGCCAA/GA/GTGATCC/AAA/GGTG/CAACAC/TCAAA/CG/AC/GTGAC/TAG/AT/GTCAC/TC/GTTGAG/TAA/GC/TATG
+chr171417533248379795121520.504.597.89+L1MC3_3endLINEL12150252935TGG/AGGGACATTCTACAAAAA/TT/ACCTGACCAA/GTC/ACTCCTCAG/AT/AG/ACTA/GTG/CAAGGTCATCAT/AG/AAG/A+C+AT/AGGAAAGC/TCTA/GAC/GAC/AACTGTCACAGCCAG/AGAA/GGAGCCTAT/AG+GAGACA+TGAT/CGT/ACTAC/AATGTC/AG/ATGC/TGGG/TATCCTGGATGGGATCCTGGG/AT/ACAGAG/AT/AAAGA/G+ACAT+TAG+GTAAA+AACTAAGGG/AAATCC/TA/GAATG/AAAA/GTATGA/GACTTTAGTTAATAAC/TAG/ATC/GTATCAG/ATATTGGTTCATTAAC/TTGTGG/ACAA-ATT-ATGTA-AGATATTAATAAG-CCAT-GTGAGACAC-ACTG/AATA/GG/TAAGATGTTAATAAG/TAGA/GGGAAACTA/GGGT+G+TG-C-GGC/GTAC/TATGGGAAA/CTCTCTG-CTTT-TT/AT/CTT/ATT/CTTG/CA/GCG/AATTTC/TTG/CTGTAAG/ATA/CA/TAAAAA/CA/TG/AA/TC/TG/CTAAAATAAAAC/A+G+TTTATTTA/TA/NAA
+
+That is matched with
+
+$ bigBedToBed -chrom=chr1 -start=4513 -end=7608 https://hgdownload.soe.ucsc.edu/hubs/GCA/009/914/755/GCA_009914755.4/bbi/GCA_009914755.4_T2T-CHM13v2.0.t2tRepeatMasker/chm13v2.0_rmsk.bb stdout
+chr107536L1MC3#LINE/L1223+46637533094912,600,518,158,0,1001,108,392,3-1,4663,-1,5528,-1,6131,-1,7141,-151304 19.6 8.0 2.0 chr1 4664 5263 (248382065) + L1MC3 LINE/L1 4913 5546 (2239) 5 ,3544 24.6 6.8 0.7 chr1 5529 5686 (248381642) + L1MC3 LINE/L1 6065 6221 (1564) 5 ,3544 24.6 6.8 0.7 chr1 6132 7132 (248380196) + L1MC3 LINE/L1 6222 7294 (491) 5 ,1215 20.5 4.6 7.9 chr1 7142 7533 (248379795) + L1MC3 LINE/L1 7403 7782 (3) 5 
+chr140824796LTR60B#LTR/ERV1273-40824533030,451,263-1,0,-1962 27.3 12.4 1.0 chr1 4083 4533 (248382795) C LTR60B LTR/ERV1 (0) 765 264 3 
+chr140824837LTR60B#LTR/ERV1205-4533466003451,127,177-1,451,-505 20.5 7.9 0.0 chr1 4534 4660 (248382668) C LTR60B LTR/ERV1 (451) 314 178 4 
+chr152675850MER34C_v#LTR/ERV1147+52745528036,254,322-1,7,-161403 14.7 2.0 0.8 chr1 5275 5528 (248381800) + MER34C_v LTR/ERV1 7 263 (322) 6 
+chr156856131MSTA1#LTR/ERVL-MaLR148+56866131030,445,0-1,1,-12442 14.8 5.8 1.3 chr1 5687 6131 (248381197) + MSTA1 LTR/ERVL-MaLR 1 465 (0) 7 
+
+End of excerpt from 2 bigBed files in T2T that could be potential input in future examples (could be colors are wrong in this second file).
+
 <h3 id="example2">Example #2</h3>
 <p>
 In this example, you will create a bigRmsk file from an existing bigRmsk input file, 
 <em>bigRmsk.txt</em>, located on the UCSC Genome Browser http server.</p>
 <ol>
   <li>
   Save the bed9+5 example file, <a href="examples/bigRmsk.txt"><em>bigRmsk.txt</em></a>, to your 
   computer (<em>Step 1</em>, above).</li>
   <li>
   Save the autoSql file <a href="examples/bigRmsk.as"><em>bigRmsk.as</em></a> to your computer 
   (<em>Step 3</em>, above).</li>
   <li>
   Download the 
   <a href="http://hgdownload.soe.ucsc.edu/admin/exe/"><code>bedToBigBed</code> utility</a> 
  (<em>Step 4</em>, above).</li>
   <li>
   Save the <a href="hg38.chrom.sizes"><em>hg38.chrom.sizes</em> text file</a> to your computer. 
   This file contains the chrom.sizes for the human (hg38) assembly (<em>Step 5</em>, above).</li>
   <li>
   Run the <code>bedToBigBed</code> utility to create a binary indexed bigRmsk file (the command
   following <em>Step 5</em>, above):
 <pre><code>bedToBigBed -type=bed9+5 -tab -as=bigRmsk.as bigRmsk.txt hg38.chrom.sizes bigRmsk.bb</code></pre></li>
   <li>
   Move the newly created bigRmsk file (<em>bigRmsk.bb</em>) to a web-accessible location (<em>Step 
   6</em>, above).</li>
   <li>
   Construct a track line that points to the bigRmsk file (<em>Step 7</em>, above).</li>
   <li>
   Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser 
   (<em>Step 8</em>, above).</li>
 </ol>
 -->
 <h2 id="share">Sharing your data with others</h2>
 <p>
 If you would like to share your bigRmsk data track with a colleague, learn how to create a URL by 
 looking at Example 6 on <a href="customTrack.html#EXAMPLE6">this page</a>.</p>
 
 <h2 id="extract">Extracting data from the bigRmsk format</h2>
 <p>
 Because bigRmsk files are an extension of bigBed files, which are indexed binary files, it can 
 be difficult to extract data from them. UCSC has developed the following programs to assist
 in working with bigBed formats, available from the 
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p>
 <ul>
   <li>
   <code>bigBedToBed</code> &mdash; converts a bigBed file to ASCII BED format.</li>
   <li>
   <code>bigBedSummary</code> &mdash; extracts summary information from a bigBed file.</li>
   <li>
   <code>bigBedInfo</code> &mdash; prints out information about a bigBed file.</li>
 </ul>
 <p>
 As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the 
 command line to view the usage statement.</p>
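 <p>
 When extracting a region, note that Genome Browser positions such as <code>chr1:1-21,571</code>
 are 1-based, while BED-style start coordinates are 0-based. The sketch below converts a browser
 position into <code>-chrom</code>, <code>-start</code> and <code>-end</code> options for
 <code>bigBedToBed</code>. The helper name <code>position_to_flags</code> is hypothetical and is
 not part of the UCSC utilities.</p>

```shell
# Hypothetical helper: turn a browser position (1-based, inclusive) into
# bigBedToBed region options (0-based start, exclusive end).
position_to_flags() {
    pos=$(printf '%s' "$1" | tr -d ',')   # drop thousands separators
    chrom=${pos%%:*}                      # text before the colon
    range=${pos#*:}                       # text after the colon
    start=${range%-*}
    end=${range#*-}
    printf -- '-chrom=%s -start=%d -end=%d\n' "$chrom" "$((start - 1))" "$end"
}

position_to_flags 'chr1:1-21,571'

# Possible use with a local file (requires bigBedToBed to be installed):
#   bigBedToBed $(position_to_flags 'chr1:1-21,571') bigRmsk.bb stdout
```

 <p>
 Only the start coordinate shifts by one; the end coordinate is the same in both conventions.</p>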
 
 <h2 id="trouble">Troubleshooting</h2>
 <p>
 If you encounter an error when you run the <code>bedToBigBed</code> program, check your input 
 file for data coordinates that extend past the end of the chromosome. If these are present, run 
 the <code>bedClip</code> program 
 (<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">available here</a>) to remove the problematic
 row(s) in your input file before running the <code>bedToBigBed</code> program.</p> 
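 <p>
 To see which rows are affected before running <code>bedClip</code>, the input can be compared
 against the <em>chrom.sizes</em> file. This is a minimal sketch using awk, not a replacement for
 <code>bedClip</code>; the function name <code>find_out_of_bounds</code> and the sample rows are
 hypothetical.</p>

```shell
# Hypothetical helper: print rows of a BED-like file (arg 2) whose end
# coordinate (column 3) exceeds the chromosome length in chrom.sizes (arg 1).
find_out_of_bounds() {
    awk -F'\t' '
        NR == FNR { size[$1] = $2 + 0; next }   # first file: chromosome sizes
        ($1 in size) && ($3 + 0 > size[$1])     # second file: rows past the end
    ' "$1" "$2"
}

# Small demonstration with made-up data in a temporary directory.
tmp=$(mktemp -d)
printf 'chr1\t1000\n' > "$tmp/chrom.sizes"
printf 'chr1\t0\t500\nchr1\t900\t1200\n' > "$tmp/in.bed"

# Only the second row is reported, since 1200 is past the 1000 bp end.
find_out_of_bounds "$tmp/chrom.sizes" "$tmp/in.bed"
```

 <p>
 Rows on chromosomes missing from the <em>chrom.sizes</em> file are not reported by this check and
 would need to be looked for separately.</p>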
 
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->