96c750edb89fd5bde5eca4dbf167578f25bb00b4 brianlee Mon Jun 7 16:16:21 2021 -0700 Updating CyVerse section and cleaning up some htmlValidate check paragraph items. refs #26511 diff --git src/hg/htdocs/goldenPath/help/hgTrackHubHelp.html src/hg/htdocs/goldenPath/help/hgTrackHubHelp.html index f4caf49..3bab9da 100755 --- src/hg/htdocs/goldenPath/help/hgTrackHubHelp.html +++ src/hg/htdocs/goldenPath/help/hgTrackHubHelp.html @@ -42,31 +42,31 @@ Groupings</a></li> <li> <a href="hubQuickStartAssembly.html" target="_blank">Quick Start Guide to Assembly Hubs with Blat</a></li> <li><a href="hubQuickStartSearch.html" target="_blank">Quick Start Guide to Searchable Track Hubs</a></li> </ul> <div> <form name="googleForm1" method="GET" action="http://www.google.com/search" onSubmit="document.googleForm1.q.value=document.googleForm1.qq.value+' site:genome.ucsc.edu/goldenPath/help';"> <p> Search the Genome Browser help pages: <input type="hidden" name="q" value=""> <input type="hidden" name="num" value="10"> <input type="hidden" name="filter" value="0"> <input type=text name=qq size=30 maxlength=255 value=""> - <input type="submit" value="Submit"> + <input type="submit" value="Submit"></p> </form> </div> <p> <a href="../../contacts.html">Questions and feedback are welcome</a>.</p> <!-- ========== What Are Track Hubs? ============================== --> <a name="Intro"></a> <h2>What Are Track Hubs?</h2> <p> Track hubs are web-accessible directories of genomic data that can be viewed on the UCSC Genome Browser (please note that hosting hub files on HTTP tends to work even better than FTP and local hubs can be displayed on <a href="hubQuickStartAssembly.html#blatGbib" target="_blank">GBiB</a>). Track hubs can be displayed on genomes that UCSC directly supports, or on your own sequence. Hubs are a useful tool for visualizing a large number of genome-wide data sets. 
For example, a project that has produced several wiggle plots of data can use the hub utility to organize the tracks into @@ -436,36 +436,36 @@ <strong><em>Example 2:</em></strong> Sample hub.txt file defining attributes for the track hub shown in <em>Example 1</em>.</p> <pre><code><strong>hub</strong> UCSCHub <strong>shortLabel</strong> UCSC Hub <strong>longLabel</strong> UCSC Genome Informatics Hub for human DNase and RNAseq data <strong>genomesFile</strong> genomes.txt <strong>email</strong> genome@soe.ucsc.edu <strong>descriptionUrl</strong> ucscHub.html </code></pre> <hr> <p> <strong>Step 5. Create the genomes.txt file</strong><br> Create a genomes.txt file within the track hub directory that contains a two-line stanza that must be separated by a line for each genome assembly that is supported by the hub data. Each stanza shows the location of the trackDb file that defines display properties for each track in that assembly, as well as an optional metadata storage file</p> -<pre><code><strong>genome</strong> <em>assembly_database_1</em> +<pre><strong>genome</strong> <em>assembly_database_1</em> <strong>trackDb</strong> <em>assembly_1_path/trackDb.txt</em> -<strong>metaTab</strong> <em>assembly_1_path/tabSeparatedFile.txt</em> </code></pre> +<strong>metaTab</strong> <em>assembly_1_path/tabSeparatedFile.txt</em></pre> <pre><strong>genome</strong> <em>assembly_database_2</em> -<strong>trackDb</strong> <em>assembly_2_path/trackDb.txt</em> </code> -<strong>metaDb</strong> <em>assembly_2_path/tagStormFile.txt</em> </code></pre> +<strong>trackDb</strong> <em>assembly_2_path/trackDb.txt</em> +<strong>metaDb</strong> <em>assembly_2_path/tagStormFile.txt</em></pre> <p> <em>genome</em> - a valid UCSC database name. Each stanza must begin with this tag and each stanza must be separated by an empty line.</p> <p> <em>trackDb</em> - the relative path of the trackDb file for the assembly designated by the <em>genome</em> tag. 
By convention, the trackDb file is located in a subdirectory of the hub directory. However, the trackDb tag may also specify a complete URL.</p> <p><em>metaDb</em> - the path to an optional tagStorm file that has the metadata for each track. Each track with metadata should have a "meta" tag specified in the trackDb stanza for that track and a "meta" tag in the tagStorm file.</p> <p><em>metaTab</em> - the path to an optional tab separated file that has the metadata for each track. Each track with metadata should have a "meta" tag specified in the trackDb stanza for that track and a "meta" tag in the tab separated file. The first line of the TSV file should start with a '#' and have the field names for each column, one of them being "meta".</p> @@ -674,56 +674,55 @@ genome.txt, and trackDb.txt settings and displays warnings and errors in bright red font, such as "<font color="red">Missing required setting...</font>" and "<font color="red">Cannot open...</font>". The "Display load times" and "Enable hub refresh" optional settings show the load timing at the bottom of the Genome Browser page and allow instant hub refresh instead of 5 minute refresh. These options can be checked and activated by clicking "View Hub on Genome Browser". 
The following picture shows <a href="examples/hubExamples/hubGroupings/hub.txt">the example track grouping hub</a> with the warning that the hub has no hub description page, no configuration errors, and "Display load times" checked:</p> <p class='text-center'> <img class='text-center' src="../../images/hubDevelopment.png" alt="The Hub Development tool checks config setting" width="749" height="249"> <p class='gbsCaption text-center'>The Hub Development tool checks for proper configuration files and track hub settings, and allows access to debugging settings.</p> -</p> <h3>Check hub settings using hubCheck utility</h3> <p> It is a good practice to run the command-line utility <em>hubCheck</em> on your track hub when you first bring it online and whenever you make significant changes. This utility by default checks that the files in the hub are correctly formatted, but it can also be configured to check a few other things including that various trackDb settings are correctly spelled and that they are supported by the UCSC Genome Browser. You can read more about using hubCheck to check the -compatibility of your hub with other genome browsers <a href="#Compatibility"</a>below</a>. +compatibility of your hub with other genome browsers <a href="#Compatibility"</a>below</a>.</p> <p> Here is the usage statement for the hubCheck utility: <pre><code>hubCheck - Check a track data hub for integrity. 
usage: hubCheck http://yourHost/yourDir/hub.txt options: -checkSettings - check trackDb settings to spec -version=[v?|url] - version to validate settings against (defaults to version in hub.txt, or current standard) -extra=[file|url] - accept settings in this file (or url) -level=base|required - reject settings below this support level -settings - just list settings with support level Will create this directory if not existing -noTracks - don't check remote files for tracks, just trackDb (faster) - -udcDir=/dir/to/cache - place to put cache for remote bigBeds and bigWigs </code></pre> + -udcDir=/dir/to/cache - place to put cache for remote bigBeds and bigWigs </code></pre></p> <p> Note that you will have to use the udcDir if /tmp/udcCache is not writable on your machine.</p> <p> The hubCheck program is available from the UCSC downloads server at <a href="http://hgdownload.soe.ucsc.edu/admin/exe/">http://hgdownload.soe.ucsc.edu/admin/exe/</a>.</p> <a name="troubleConnecting"></a> <h3>Troubleshooting Track Hub connections</h3> <p> If the browser is unable to load a track hub, it will display an error message. Some common causes for an import to fail include typos in the URL, a hub server that is offline, or errors in the track hub configuration files. Occasionally, remote track hubs may be missing, off-line, or otherwise unavailable. If a user is already browsing data from the remote hub when it disconnects, a yellow error message will be displayed instead of the expected data.</p> @@ -845,31 +844,31 @@ <p> <strong><em>Example 3:</em></strong> Checking your settings against those provided by UCSC and another source, such as Ensembl.</p> <p> If you want to check the settings in your hub against those supported by other genome browsers, you will first need to create a single-column file that lists each non-UCSC setting and then use the "-extra=" option to specify this file when running hubCheck. 
For example, if you knew that a setting called "ensemblAssemblyName" was supported for use in track hubs by Ensembl, you could create a single line file that included the setting "ensemblAssemblyName". Then, when you want to check a hub that includes these extra trackDb settings, you would then specify this extra settings file on the command line:</p> <pre><code>$ hubCheck -checkSettings -extra=http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubCheckUnsupportedSettings/myExtraSettings.txt http://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubCheckUnsupportedSettings/hub.txt </code></pre> <p> (Note: The settings listed here in the "extra" file are -just examples and do not represent real trackDb variables for hubs at Ensembl.) +just examples and do not represent real trackDb variables for hubs at Ensembl.)</p> <!-- ========== Where to host your data ============================== --> <a name="Hosting"></a> <h2>Where to host your data?</h2> As stated in <a href="#Intro">What Are Track Hubs?</a>, track hubs files must be located in web-accessible locations that support byte-range requests. Four options for hosting include: <ul> <li>Your institution's Information Technology services <li>Commercial webspace providers <li>Commercial cloud providers <li>Free webspace providers </ul> <p> <b>Your Institution:</b> Many universities provide a location for researchers to place shareable data on the web and contacting your institution's system @@ -909,107 +908,104 @@ OneDrive, Tencent Weiyun, Yandex.Disk, etc.) do not work reliably as their business model requires rare and rate-limited data access, which is too slow or too limited for genome annotation display. However, commercial cloud <b>storage</b> offers that charge per GB transferred (Amazon S3, Microsoft Azure Storage, Google Cloud Storage, Backblaze, Alibaba Object Store, etc.) typically do work. 
As of 2020, they cost around 2-3 US cents/GB/month to store the hub data and 12-18 US cents per GB transferred, when the hub is used. For optimal performance, select a San Francisco / San Jose data center for the main UCSC site genome.ucsc.edu, a Frankfurt/Germany data center for genome-euro.ucsc.edu and a Tokyo data center for genome-asia.ucsc.edu. You may also want to review this discussion about issues with <a href="http://genomewiki.ucsc.edu/index.php/Cloud-storage_providers_and_byte-range_requests_of_UCSC_big*_files" target="_blank">distributed storage servers</a>. <b>These services are external to UCSC and may change.</b></p> -</p><b>Free webspace:</b> If you do not want to pay for web space, +<p><b>Free webspace:</b> If you do not want to pay for web space, and your institution does not provide a data location supporting byte-range requests, we know of at least the following sites where you can host research data and configuration files for free: <ul> <li><a href="https://de.cyverse.org/de/" target="_blank">CyVerse Discovery Environment</a> - lots of space, but can be relatively slow to display</li> <!--<li><a href="https://usegalaxy.org/" target="_blank">Galaxy</a></li>--> <li><a href="https://github.com/" target="_blank">Github</a> - files limited to 100MB, but very fast</li> <li><a href="https://figshare.com/" target="_blank">Figshare</a> - not limited and fast, but every file needs to be uploaded individually and cannot be changed. Optimal for very stable links, e.g. in publications.</li> </ul> +<p> Each of the providers above has a slightly different approach to hosting data for compatibility with the UCSC Genome Browser, and may have different advantages and disadvantages, such as size limitations, usage statistics, and version control integration. Additionally, as previously mentioned, any provider that supports byte-range access will work for hub hosting, and you are not limited to the above sites. 
Below is a summarized guide for each of the providers mentioned above.</p> <h3>Hosting Hubs on CyVerse</h3> <p> <a href="http://www.cyverse.org/" target="_blank">CyVerse</a>, previously known as the iPlant Collaborative, is an NSF-funded site created for assisting data scientists with their data storage and compute needs. Data hosting by CyVerse is free for academic groups and they support byte-range access, so they can be used for track hubs. However, Cyverse is sometimes slow, and may result in error messages if your hub includes many tracks that are meant to be shown at the same time by your users.</p> - -<p>In order to host your data on CyVerse, you first must create an account and then use their -<a href="https://de.cyverse.org/de" target="_blank">Discovery Environment</a> to upload data. After creating an -account, use the "Upload" and "Simple Upload" buttons to upload files -individually as shown below: +<p> +In order to host your data on CyVerse, you first must create an account and then use their +<a href="https://de.cyverse.org/de" target="_blank">Discovery Environment</a> to upload data. +After creating an account and signing in, access the data screen by clicking the second icon +on the left. Use the "Upload" button on the far right to import data from a URL or +locally from your machine. 
<div class="text-center"> - <img height="400px" src="../../images/cyverseUploadButton.png"> -</div> + <img height="150px" src="../../images/cyverseUploadButton.png"> +</div></p> <p> You can also use the command line utility -<a href="https://pods.iplantcollaborative.org/wiki/display/DS/Setting+Up+iCommands" target="_blank">iCommands</a> -to facilitate bulk transfer of data (best used for large files in the 2-100 GB range), or use -<a href="https://pods.iplantcollaborative.org/wiki/display/DS/Using+Cyberduck+for+Uploading+and+Downloading+to+the+Data+Store" target="_blank"> -Cyberduck</a> to bulk transfer up to 80 GB of data in one go.</p> - -<p> -After uploading some data, check the "Info-Type" of your BAM, bigWig, bigBed, etc. files. -If an Info-Type has not been selected automatically or if it is incorrect, make sure it -is correct. If uploading an assembly hub, assign the Info-Type "bed" to the 2bit file, as -well as any text files, like your trackDb.txt, groups.txt, or description.html. -</p> - -<p> -After giving an appropriate type (like "bam") to your binary files, you must update any -text files to point to CyVerse locations. 
For example, your <em>hub.txt</em> will contain a line -like: -<pre><code>genomesFile genomes.txt</code></pre> -<p>Which must be edited to point to a CyVerse URL such as:</p> -<pre><code>genomesFile https://data.cyverse.org/dav-anon/iplant/home/...</code></pre> -<p>Luckily, CyVerse allows you to edit these text files after uploading them, so you can create a -"Send To: Genome Browser" link:</p> +<a href="https://cyverse.atlassian.net/wiki/spaces/DS/pages/241869823/Setting+Up+iCommands" +target="_blank">iCommands</a> to facilitate bulk transfer of data (best used for +large files in the 2-100 GB range), or use +<a href="https://cyverse.atlassian.net/wiki/spaces/DS/pages/241869843/Using+Cyberduck+for+Uploading+and+Downloading+to+the+Data+Store" +target="_blank">Cyberduck</a> to bulk transfer up to 80 GB of data in one go.</p> +<p> +Once your file is available, use the three dots on the far right to click the "Public +Link(s)" option.</p> <div class="text-center"> - <img height="400px" src="../../images/cyverseSendToGenomeBrowser.png"> + <img height="250px" src="../../images/cyverseCreatePublicLink.png"> </div> -<p>And then edit the fields of your <em>hub.txt</em>, <em>genomes.txt</em>, and <em>trackDb.txt</em> -files like so:</p> +<p> +Select this option for all the files you will be using with the Genome Browser, whether they +are text-based files (trackDb.txt, groups.txt, description.html, etc.) or binary-indexed files +(BAM, bigWig, bigBed, etc.) requiring byte-range access. Note, if you have a dataFile.bam, +you must also have a dataFile.bam.bai file of the matching name and both must have public +links created.</p> +<p> +After creating public links to your binary files, you must ensure your text files (e.g., trackDb.txt) +point to the CyVerse locations for the files. For instance, the bigDataUrl setting will need to point +to the location of the BAM, bigWig, or bigBed (e.g., <code>bigDataUrl https://data.cyverse.org/... 
+/dataFile.bam</code>).</p> <div class="text-center"> - <img height="500px" width="1000px" src="../../images/cyverseEditedPaths.png"> + <img height="200px" src="../../images/cyverseCreatePublicLink2.png"> </div> -<p>To get the correct links to bigData files, again be sure to use the "Send To: Genome -Browser" links in the menu. -</p> - <p> -Please see the <a href="https://wiki.cyverse.org/wiki/display/DEmanual/Viewing+Genome+Files+in+a+Genome+Browser" target="_blank"> -Viewing Genome Files in a Genome Browser</a> wiki page on the CyVerse wiki for more information -(please note the difference of the Data Commons for final curated publication material, -and the Discovery Environment for developing data). +The hub.txt file (if not using the <a href="hgTracksHelp.html#UseOneFile" +target="_blank">useOneFile on</a> setting) will need to point to the related +genomes.txt location, which in turn points to the trackDb.txt location +using these full https://data.cyverse.org/... links as well.</p> +<p> +Please see the <a href="https://cyverse.atlassian.net/wiki/spaces/DEmanual/pages/242027070/Using+the+Discovery+Environment" +target="_blank">Using the Discovery Environment</a> wiki page on the CyVerse wiki for more information. Please direct any questions about CyVerse or the Discovery Environment to their -<a href="http://www.cyverse.org/learning-center/ask-cyverse" target="_blank">Ask CyVerse</a> page or contact -Cyverse support staff directly via the blue Intercom button on the bottom right of the -Discovery Environment page. 
+<a href="https://cyverse.org/contact" target="_blank">Contact Us</a> page or "Chat with +Cyverse support" staff directly via the blue question box icon on the top right of the +Discovery Environment page.</p> <!-- Galaxy stub <h3>Hosting Hubs on Galaxy</h3> <p> Galaxy is </p> --> <h3>Hosting Hubs on Github</h3> <p> <a href="https://github.com" target="_blank">Github</a> supports byte-range access to files when they are accessed via the <em><b>raw.githubusercontent.com</b></em> style URLs. To obtain a raw URL to a file already uploaded on Github, click on a file in your repository and click the <em>Raw</em> button:</p> <div class="text-center"> <img height="275px" width="55%" src="../../images/githubRawLink.png" alt="Location of the Raw button for generating a plaintext URL to a file hosted on Github."> </div> <p>