45bd9a6393500fbd7565c85875ccf2e3d82f07ac
gperez2
Tue Mar 12 09:49:51 2024 -0700
Edits to Mark Diekhans recent changes for the CRAM refUrl setting, refs #33061

diff --git src/hg/htdocs/goldenPath/help/cram.html src/hg/htdocs/goldenPath/help/cram.html
index 23cd8d3..8dae214 100755
--- src/hg/htdocs/goldenPath/help/cram.html
+++ src/hg/htdocs/goldenPath/help/cram.html
@@ -14,33 +14,33 @@
 <p>
 Since CRAM files are more dense than <a href="/goldenPath/help/bam.html">BAM</a> files, many groups
 are switching to the CRAM format to save disk space. For CRAM tracks to load there is an expectation
 that the checksum of the reference sequence used to create the CRAM will be in the CRAM header. A
 file with a matching checksum is also expected to be accessible from the EBI RefGet CRAM reference
 registry (see <a href="#refs">References</a> for CRAM resources). Otherwise, users must specify a
 <code>refUrl</code> setting that will point to a server that is offering up the reference sequences
 (see <a href="#example4">Example Four</a>).</p>
 <p>
 Since the loading of CRAM data requires the specific reference sequence used to create the CRAM file,
 it is very important that the exact same reference sequence is used for compression and
 decompression. When a CRAM file is first loaded on a given chromosome, a check for the preexistence
 in a special browser "cramCache" directory of the specified reference checksum will take place. If
 the reference sequence information specific for that CRAM for the currently viewed chromosome region
 does not exist, a message will display about the file not being found along with
-a note about downloading the reference from the EBI CRAM reference registry if it is available (or
-from another Refget server using <code>refUrl</code>). A refresh of the page once the download is complete will
-display the CRAM data as if it were a BAM file.</p>
+a note about downloading the reference from the EBI CRAM reference registry if it is available or
+from another Refget server using the <code>refUrl</code> setting. A refresh of the page once the download is
+complete will display the CRAM data as if it were a BAM file.</p>
 <p>
 The track lines to describe CRAM tracks are identical to track lines for BAM tracks. This includes
 the <code>type</code> parameter, which is still <code>bam</code> even for CRAM tracks. The only
 difference is that instead of providing the URL to a BAM file, the URL instead points to a CRAM
 file.</p>
 <p>
 Please also note that just as a BAM file requires an associated BAM.bai index file, a CRAM file will
 require an associated CRAM.crai index file in the same location to load.</p>
 <h2>Example #1</h2>
 <p>
 Here is an example CRAM track that displays around the gene SOD1 on hg19 that can be cut and pasted
 as text into the <a href="/cgi-bin/hgCustom?db=hg19" target="_blank">Custom Tracks</a> page:</p>
 <pre><code>track type=bam db=hg19 name=exampleCRAM bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/cramExample.cram
 </code></pre>
 <p>
@@ -84,37 +84,36 @@
 <a href="hgTrackHubHelp.html" target="_blank">User Guide</a> and associated Quick Start Guides to
 building hubs. Note that <code>type bam</code> is used to display CRAM files in hubs, just as
 <code>type bam</code> is used in custom CRAM tracks.</p>
 <pre><code>track cram61
 type bam
 shortLabel HG00361
 longLabel This CRAM file is from the 1000 Genomes Project HG00361
 visibility pack
 bigDataUrl ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00361/exome_alignment/HG00361.mapped.ILLUMINA.bwa.FIN.exome.20120522.bam.cram</code></pre>
 <a name="example4"></a>
 <h2>Example #4</h2>
 <p>
 For genomes that are not registered in the EBI CRAM Reference Registry,
 the <code>refUrl</code> setting is used to point the browser to
-the appropriate place to find the reference sequence. The <code>refUrl</code>
-takes an argument of the URL of the reference server with a <tt>%s</tt> being
-replaced the RefGet MD5 checksum that identifies the reference sequence.
+the appropriate place to find the reference sequence. The <code>refUrl</code> setting is used with
+the URL of the reference server, such as <code>refUrl http://university.edu/URL/cramRef/%s</code>
+where the <tt>%s</tt> gets replaced by the RefGet MD5 checksum which identifies the reference
+sequence.</p>
 <p>
-</p>
-The below example is a hub track stanza using the <code>refUrl</code>
-option:</p>
+The example below shows a hub track stanza using the <code>refUrl</code> setting:</p>
 <pre><code>track cramExample
 type bam
 visibility full
 shortLabel cramExRefUrl
 longLabel This CRAM file points to a reference sequence specified by refUrl
 refUrl http://university.edu/URL/cramRef/%s
 bigDataUrl ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.cram</code></pre>
 <p>
 The use of <code>refUrl</code> can also be employed on a custom track line:</p>
 <pre><code>track type=bam db=hg19 name=cramExRefUrl refUrl=http://university.edu/URL/cramRef/%s bigDataUrl=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/alignment/HG00096.mapped.ILLUMINA.bwa.GBR.low_coverage.20120522.bam.cram
 </code></pre>
 <a name="refs"></a>
 <h2>References</h2>