ac67c40af43801ce6a8bb6e7fe1beeb2b49fb355
gperez2
  Mon Jun 3 04:50:49 2024 -0700
Adding a additional fields note for the bigPsl Track Format page, refs #17950

diff --git src/hg/htdocs/goldenPath/help/bigPsl.html src/hg/htdocs/goldenPath/help/bigPsl.html
index 3761f45..2fbcabf 100755
--- src/hg/htdocs/goldenPath/help/bigPsl.html
+++ src/hg/htdocs/goldenPath/help/bigPsl.html
@@ -1,245 +1,250 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser bigPsl Track Format" -->
 <!--#set var="ROOT" value="../.." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <h1>bigPsl Track Format</h1> 
 <p>
 <p>
 The bigPsl format stores alignments between two sequences just as 
 <a href="../../FAQ/FAQformat.html#format2">PSL</a> files do; however, bigPsl files are compressed 
 and indexed as <a href="/goldenPath/help/bigBed.html">bigBeds</a>. PSL files are converted to bigPsl
 files using the program <code>bedToBigBed</code>, run with the <code>-as</code> option to pull in a
 special <a href="http://www.linuxjournal.com/article/5949" target="_blank">autoSql</a>
 (<em>.as</em>) file that defines the fields of the bigPsl.</p>
 <p>
 The bigPsl files are in an indexed binary format. The main advantage of this format is that only 
 those portions of the file needed to display a particular region are transferred to the Genome 
 Browser server. Because of this, bigPsl files have considerably faster display performance than 
 regular PSL files when working with large data sets. The bigPsl file remains on your local 
 web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for 
 the currently displayed chromosomal position is locally cached as a &quot;sparse file&quot;. If you
 do not have access to a web-accessible server and need hosting space for your bigPsl files, please see the 
 <a href="hgTrackHubHelp.html#Hosting">Hosting</a> section of the Track Hub Help documentation.</p>
 
 <a name=bigPsl></a>
 <h2>bigPSL format definition</h2>
 <p>
 The following autoSql definition is used to specify bigPsl alignment files. This
 definition, contained in the file <a href="examples/bigPsl.as"><em>bigPsl.as</em></a>, is pulled in 
 when the <code>bedToBigBed</code> utility is run with the <code>-as=bigPsl.as</code> option.</p> 
 <!--Click this <a href="examples/bigPsl.txt"><code>bigPsl.txt</code></a> file for
 an example of bigPsl input. -->
 <pre><code>table bigPsl
 "bigPsl pairwise alignment"  
     ( 
     string chrom;       	"Reference sequence chromosome or scaffold"
     uint   chromStart;  	"Start position in chromosome"
     uint   chromEnd;    	"End position in chromosome"
     string name;        	"Name or ID of item, ideally both human readable and unique"
     uint score;         	"Score (0-1000)"
     char[1] strand;     	"+ or - indicates whether the query aligns to the + or - strand on the reference"
     uint thickStart;    	"Start of where display should be thick (start codon)"
     uint thickEnd;      	"End of where display should be thick (stop codon)"
     uint reserved;       	"RGB value (use R,G,B string in input file)"
     int blockCount;     	"Number of blocks"
     int[blockCount] blockSizes; "Comma separated list of block sizes"
     int[blockCount] chromStarts;"Start positions relative to chromStart"
 
     uint    oChromStart;	"Start position in other chromosome"
     uint    oChromEnd;  	"End position in other chromosome"
     char[1] oStrand;    	"+ or -, - means that psl was reversed into BED-compatible coordinates" 
     uint    oChromSize; 	"Size of other chromosome."
     int[blockCount] oChromStarts;"Start positions relative to oChromStart or from oChromStart+oChromSize depending on strand"
     
     lstring  oSequence;  	"Sequence on other chrom (or empty)"
     string   oCDS;       	"CDS in NCBI format"
   
     uint    chromSize;		"Size of target chromosome"
   
     uint match;        		"Number of bases matched."
     uint misMatch; 		"Number of bases that don't match "
     uint repMatch; 		"Number of bases that match but are part of repeats "
     uint nCount;   		"Number of 'N' bases "
     uint seqType;   		"0=empty, 1=nucleotide, 2=amino_acid"
     ) </code></pre>
 <p>
 The value of the <code>oStrand</code> field indicates whether or not the stored psl data should be
 reverse-complemented before it is outputted or displayed. This is necessary because the bigPsl file 
 stores reference coordinates on the positive strand, as required by the BED format. The 
 <code>strand</code> field indicates whether the positions in <code>oChromStarts</code> are listed 
 from the chromosome beginning (+) or end (-).</p>
 
-<p><b>Addtional extra fields:</b> Since bigPsl is just a bigBed file, in addition to these fields, bigPsl
-files can have any number of additional, extra bigBed fields, as
-long as they are defined after the seqType field. 
-See <a href="bigBed.html#Ex3">Example 3</a> of the bigBed documentation page for 
-sample .as and .bed files. These fields can also be used for custom <a
-href="https://genome-blog.gi.ucsc.edu/blog/2022/06/28/track-hub-settings/">mouseOvers</a>,
-feature filters and coloring options. Contact us if you run into difficulty
-using them.</p>
+<p><b>Additional fields:</b>
+Since a bigPsl file is a bigBed file, additional fields can be added as bigBed fields. The
+additional bigBed fields are defined after the seqType field of the bigPsl.as file. 
+See <a href="bigBed.html#Ex3">Example 3</a> of the bigBed Track Format page for an example
+on how to create a bigBed file with extra (custom) fields. The additional fields can be used for
+custom
+<a href="https://genome-blog.gi.ucsc.edu/blog/2022/06/28/track-hub-settings/">mouseOvers</a>,
+feature filters, and coloring options. Contact us at
+<A HREF="mailto:&#103;&#101;&#110;&#111;&#109;&#101;&#45;ww&#119;&#64;&#115;&#111;&#101;.
+uc&#115;&#99;.
+&#101;&#100;&#117;">
+&#103;&#101;&#110;&#111;&#109;&#101;&#45;ww&#119;&#64;&#115;&#111;&#101;.uc&#115;&#99;.&#101;&#100;&#117;
+</A> if you run into difficulty using them.</p>
 
 <p>
 The <code>bedToBigBed</code> utility uses a substantial amount of memory: approximately 
 25% more RAM than the uncompressed BED input file. The -maxAllow option to bedToBigBed can be
 used to allow the tool to use more than 16GB of RAM.</p>
 
 <h2>Creating a bigPsl track</h2>
 <p>
 To create a bigPsl track, follow these steps:</p>
 <p> 
 <strong>Step 1.</strong>
 If you already have a PSL file created using BLAT or another tool, skip to <em>Step 2</em>. 
 Otherwise, download the example PSL file <a href="examples/bigPsl.psl"><em>bigPsl.psl</em></a> for 
 the human GRCh38/hg38 assembly. You will also want to download the files 
 <a href="examples/bigPsl.fa"><em>bigPsl.fa</em></a> and 
 <a href="examples/bigPsl.cds"><em>bigPsl.cds</em></a> if you 
 would like to use the alternate options described in <em>Step 4</em>, below.</p>
 <p>
 <strong>Step 2.</strong> 
 Download the <code>bedToBigBed</code> and <code>pslToBigPsl</code> programs from the
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p>
 <p>
 <strong>Step 3.</strong> 
 Use the <code>fetchChromSizes</code> script from the 
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">same directory</a> to create a
 <em>chrom.sizes</em> file for the UCSC database with which you are working (e.g., hg38). 
 Alternatively, you can download the <em>chrom.sizes</em> file for any assembly hosted at UCSC from 
 our <a href="http://hgdownload.soe.ucsc.edu/downloads.html">downloads</a> page (click on &quot;Full 
 data set&quot; for any assembly). For example, the <em>hg38.chrom.sizes</em> file for the hg38 
 database is located at 
 <a href="http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes" 
 target="_blank">http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes</a>.</p>
 <p>
 <strong>Step 4.</strong>  
 Use the <code>pslToBigPsl</code> utility to create a bigPsl file in bed12+13 format that contains 
 the 25 fields described in the bigPsl format definition <a href="#bigPsl">above</a>. The file must 
 include the 13 extra fields: <code>oChromStart</code>, 
 <code>oChromEnd</code>, <code>oStrand</code>, <code>oChromSize</code>, <code>oChromStarts</code>, 
 <code>oSequence</code>, <code>oCDS</code>, <code>chromSize</code>, <code>match</code>, 
 <code>misMatch</code>, <code>repMatch</code>, <code>nCount</code>, and <code>seqType</code>.</p> 
 <pre><code>pslToBigPsl bigPsl.psl stdout | sort -k1,1 -k2,2n &gt; bigPsl.txt</code></pre>
 <p>
 If you are using your own PSL file, you may have corresponding FASTA and CDS files that accompany
 it. You can provide these files as input to <code>pslToBigPsl</code> to generate a more informative 
 bigPsl file:</p> 
 <pre><code>pslToBigPsl bigPsl.psl -cds=bigPsl.cds -fa=bigPsl.fa stdout | sort -k1,1 -k2,2n &gt; bigPsl.txt</code></pre>
 <p>
 <strong>Step 5.</strong>  
 Create the binary indexed bigPsl file from your sorted bigPsl input file using the 
 <code>bedToBigBed</code> utility:</p> 
 <pre><code>bedToBigBed -as=bigPsl.as -type=bed12+13 -tab bigPsl.txt chrom.sizes bigPsl.bb</code></pre>
 <p>
 <strong>Step 6.</strong> 
 Move the newly created bigPsl file (<em>bigPsl.bb</em>) to a web-accessible http, https, or ftp 
 location.</p>
 <p>
 <strong>Step 7.</strong> 
 Construct a <a href="hgTracksHelp.html#CustomTracks">custom track</a> using a single 
 <a href="hgTracksHelp.html#TRACK">track line</a>. Note that any of the track attributes listed 
 <a href="customTrack.html#TRACK">here</a> are applicable to tracks of type bigBed. The most basic 
 version of the track line will look something like this:</p>
 <pre><code>track type=bigPsl name="My Big Psl" description="Some mRNAs Discovered from Data from My Lab" bigDataUrl=http://myorg.edu/mylab/myBigPsl.bb</code></pre>
 <p>
 <strong>Step 8.</strong> 
 Paste the custom track line into the text box on the <a href="../../cgi-bin/hgCustom">custom 
 track management page</a>.</p>
 <p>
 The <code>bedToBigBed</code> program can be run with several additional options. For a full list of 
 the available options, type <code>bedToBigBed</code> (with no arguments) on the command line to 
 display the usage message. </p>
 
 <h2>Examples</h2>
 
 <h3>Example #1</h3>
 <p>
 In this example, you will create a bigPsl custom track using an existing bigPsl file,
 <em>bigPsl.bb</em>, located on the UCSC Genome Browser http server. This file contains data for 
 the hg38 assembly.</p>
 <p>
 To create a custom track using this bigPsl file:
 <ol>
   <li>
   Construct a track line that references the file:</p>
   <pre><code>track type=bigPsl name=&quot;bigPsl Example One&quot; description=&quot;A bigPsl file&quot; bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigPsl.bb</code></pre></li>
   <li>
   Paste the track line into the <a href="../../cgi-bin/hgCustom?db=hg38">custom track management 
   page</a> for the human assembly hg38. </li>
   <li>
   Click the &quot;submit&quot; button.</li> 
 </ol>
 <p>
 Custom tracks can also be loaded via one URL line. The link below loads the same bigPsl track, and 
 sets additional display parameters in the URL:</p>
 <pre><a href="http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1:10000-200000&hgct_customText=track%20type=bigPsl%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigPsl.bb%20visibility=pack"
 target="_blank"><code>http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr1:10000-200000&hgct_customText=track%20type=bigPsl%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigPsl.bb%20visibility=pack</code></a></pre>
 <p>
 After this example bigPsl is loaded in the Genome Browser, click on a track item in the
 Genome Browser's track display. Note that the details page displays information about the alignment,
 similar that which is available for PSL tracks, as well as links that display the browser position 
 of the alignment and other detailed information about the alignment.</p>
 
 <h2>Example #2</h2>
 <p>
 In this example, you will create your own bigPsl file from an existing bigPsl input file.</p>
 <ol>
   <li>
   Save the example bed12+13 file <a href="examples/bigPsl.txt"><em>bigPsl.txt</em></a> to your 
   computer (<em>Step 4</em> in <em>Creating a bigPsl track</em>, above</em>).</li>
   <li>
   Download the 
   <a href="http://hgdownload.soe.ucsc.edu/admin/exe/"><code>bedToBigBed</code> utility</a> 
   (<em>Step 2</em>, above).</li>
   <li>
   Save the <a href="hg38.chrom.sizes"><em>hg38.chrom.sizes</em> text file</a> to your computer. 
   This file contains the chrom.sizes for the human (hg38) assembly (<em>Step 3</em>, above).</li>
   <li>
   Save the autoSql file <a href="examples/bigPsl.as"><em>bigPsl.as</em></a> to your computer.</li>
   <li>
   Run the <code>bedToBigBed</code> utility to create the bigPsl output file (<em>Step 5</em>, 
   above):
 <pre><code>bedToBigBed -as=bigPsl.as -type=bed12+13 -tab  bigPsl.txt hg38.chrom.sizes bigPsl.bb</code></pre></li>
   <li>
   Place the newly created bigPsl file (<em>bigPsl.bb</em>) on a web-accessible server 
   (<em>Step 6</em>, above).</li>
   <li>
   Construct a track line that points to the bigPsl file (<em>Step 7</em>, above).</li>
   <li>
   Create the custom track on the human assembly hg38 (Dec. 2013), and view it in the Genome Browser 
   (<em>Step 8, above</em>).</li>
 </ol>
 
 <h2>Sharing your data with others</h2>
 <p>
 If you would like to share your bigPsl data track with a colleague, learn how to create a URL by 
 looking at Example #6 on <a href="customTrack.html#EXAMPLE6">this page</a>.</p>
 
 <h2>Extracting data from the bigPsl format</h2>
 <p>
 Because bigPsl files are an extension of bigBed files, which are indexed binary files, it can 
 be difficult to extract data from them. UCSC has developed the following programs to assist
 in working with bigBed formats, available from the 
 <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">binary utilities directory</a>.</p>
 <ul>
   <li>
   <code>bigBedToBed</code> &mdash; converts a bigBed file to ASCII BED format.</li>
   <li>
   <code>bigBedSummary</code> &mdash; extracts summary information from a bigBed file.</li>
   <li>
   <code>bigBedInfo</code> &mdash; prints out information about a bigBed file.</li>
 </ul>
 <p>
 As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the 
 command line to view the usage statement.</p>
 
 <h2>Troubleshooting</h2>
 <p>
 If you encounter an error when you run the <code>bedToBigBed</code> program, check your input 
 file for data coordinates that extend past the the end of the chromosome. If these are present, run 
 the <code>bedClip</code> program 
 (<a href="http://hgdownload.soe.ucsc.edu/admin/exe/">available here</a>) to remove the problematic
 row(s) in your input file before running the <code>bedToBigBed</code> program.</p> 
 
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->