add669d23a4b18c14371a461ac9ab766e18db54f gperez2 Sun Sep 14 16:10:11 2025 -0700 Adding chromAlias and chromAuthority settings to the Assembly Hub User Guide, refs #33014 diff --git src/hg/htdocs/goldenPath/help/assemblyHubHelp.html src/hg/htdocs/goldenPath/help/assemblyHubHelp.html index ce79fca0e8c..1dbe7943eaf 100755 --- src/hg/htdocs/goldenPath/help/assemblyHubHelp.html +++ src/hg/htdocs/goldenPath/help/assemblyHubHelp.html @@ -21,30 +21,31 @@ to the <a href="https://www.ncbi.nlm.nih.gov/datasets/genome/">NCBI Assembly</a> system, it may already be available in the <a href="https://genome.ucsc.edu">UCSC Genome Browser</a>. Please check the <a href="https://hgdownload.soe.ucsc.edu/hubs/">GenArk Assembly Hub</a> collection to see if your genome of interest is already available. If it is not listed there, you can use the <a href="/assemblyRequest">UCSC Assembly Request</a> page to request that the genome assembly be added.</p> <h2>Contents</h2> <h6><a href="#webServer">Web Server</a></h6> <h6><a href="#hubTxt">Assembly Hub Components</a></h6> <ul style="margin-left: 20px;"> <li><a href="#hubTxt">hub.txt</a></li> <li><a href="#genomesTxt">genomes.txt</a></li> <li><a href="#twoBitFile">2bit File</a></li> + <li><a href="#chromAlias">chromAlias</a></li> <li><a href="#groupsTxt">groups.txt</a></li> <li><a href="#singleFileHub">Single-File Track Hub</a></li> </ul> <h6><a href="#linkingHub">Linking to Your Assembly Hub</a></h6> <h6><a href="#buildingTracks">Building Tracks</a></h6> <ul style="margin-left: 20px;"> <li><a href="#cytobandTrack">Cytoband Track</a></li> </ul> <h6><a href="#assemblyHubResources">Assembly Hub Resources</a></h6> <ul style="margin-left: 20px;"> <li><a href="#gOnRamp">G-OnRamp</a></li> <li><a href="#makeHub">MakeHub</a></li> <li><a href="#exampleNcbiAssemblyHubs">Example NCBI Assembly Hubs</a></li> </ul> </li> @@ -186,42 +187,143 @@ <pre> twoBitInfo ricCom1.2bit stdout | sort -k2rn > ricCom1.chrom.sizes </pre> <p> The <em>.2bit</em> file can also
be hosted at a URL:</p> <pre> twoBitInfo -udcDir=. https://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubPlants/cshl2013/ricCom1/ricCom1.2bit stdout | sort -k2nr > ricCom1.chrom.sizes </pre> <p> To extract sequences from a <em>.2bit</em> file: </p> <pre> twoBitToFa -seq=chrCp -udcDir=. https://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubPlants/cshl2013/ricCom1/ricCom1.2bit stdout > ricCom1.chrCp.fa </pre> + +<a id="chromAlias"></a> +<h3>chromAlias</h3> + +<p> +The <code>chromAlias</code> setting points to a <code>chromAlias.txt</code> file that maps chromosome +names from alternate naming schemes to the sequence names used in the assembly. The Genome Browser +uses this mapping to automatically convert chromosome names in both submitted custom track data and +assembly hub data.</p> + +<p><b>chromAlias.txt Format</b></p> +<p> +The first line of the <code>chromAlias.txt</code> file is a header line that begins with a pound +symbol (<code>#</code>) followed by a single space. Each subsequent word on this line, separated by +tab characters, names the source authority for the sequence names in the corresponding column. The +first column contains the sequence names used in the Genome Browser assembly, and the remaining +columns provide alternate naming schemes.</p> + +<p> +All lines following the header line consist of columns of sequence names separated by a tab +character.
If no equivalent name exists in a particular naming scheme, that column is left empty, +resulting in two adjacent tab characters.</p><br> +<p>Example:</p> +<pre> +# ucsc assembly genbank ncbi refseq ensembl +chr1 1 CM000663.2 1 NC_000001.11 1 +chr10 10 CM000672.2 10 NC_000010.11 10 +chrM MT J01415.2 MT NC_012920.1 MT +chrX X CM000685.2 X NC_000023.11 X +</pre> + +<p>In this example, the columns represent:</p> +<ul> + <li><b>ucsc</b> - UCSC-style <code>chrN</code> names</li> + <li><b>assembly</b> - names from the NCBI <code>assembly_report.txt</code> file</li> + <li><b>genbank</b> - INSDC names</li> + <li><b>ncbi</b> - names from the <code>chr2acc</code> file in the <code>assembly_structure/</code> hierarchy</li> + <li><b>refseq</b> - names from RefSeq annotations</li> + <li><b>ensembl</b> - names from the Ensembl assembly</li> +</ul> +<p><b>Assembly Hub Usage</b></p> +<p>To use the <code>chromAlias.txt</code> file in an assembly hub, add the following line to the +genome stanza of the <code>hub.txt</code> file:</p> +<pre>chromAlias thisGenome.chromAlias.txt</pre> +<p>The path is relative to the location of the <code>hub.txt</code> file.</p> +<p>Example genome stanza:</p> +<pre> +genome GCF_000001405.39 +taxId 9606 +groups groups.txt +description human +twoBitPath GCF_000001405.39.2bit +twoBitBptUrl GCF_000001405.39.2bit.bpt +chromSizes GCF_000001405.39.chrom.sizes.txt +chromAlias GCF_000001405.39.chromAlias.txt +organism human +defaultPos chr1:82985474-82995474 +scientificName Homo sapiens +htmlPath html/GCF_000001405.39_GRCh38.p13.description.html +</pre> + +<p><b>Best Performance</b></p> +<p> +For improved performance, the <code>chromAlias.txt</code> file can be converted to bigBed format.
+This lets the Genome Browser look up sequence names without reading the entire text file, which is +particularly important for assemblies with large numbers of sequences.</p> +<p> +The Perl script +<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/utils/automation/aliasTextToBed.pl" + target="_blank">aliasTextToBed.pl</a> converts the <code>chromAlias.txt</code> file into the +corresponding BED, autoSql (<code>.as</code>), and bigBed files:</p> +<pre> +aliasTextToBed.pl -chromSizes=asmId.chrom.sizes -aliasText=asmId.chromAlias.txt \ + -aliasBed=asmId.chromAlias.bed -aliasAs=asmId.chromAlias.as -aliasBigBed=asmId.chromAlias.bb +</pre> +<p>Inputs:</p> +<ul> + <li><code>chrom.sizes</code> file</li> + <li><code>chromAlias.txt</code> file</li> +</ul> +<p>Outputs:</p> +<ul> + <li><code>chromAlias.bed</code></li> + <li><code>chromAlias.as</code></li> + <li><code>chromAlias.bb</code></li> +</ul> +<p> +In the genome stanza of the hub definition, replace the <code>chromAlias</code> setting with the +<code>chromAliasBb</code> setting and point it to the <code>.bb</code> file:</p> +<pre>chromAliasBb GCF_000001405.39.chromAlias.bb</pre> +<p><b>Default Naming Scheme</b></p> +<p>A default naming scheme may be set in the <code>hub.txt</code> file using the +<code>chromAuthority</code> setting:</p> +<pre>chromAuthority ucsc</pre> +<p>In this example, the value <code>ucsc</code> matches a column name in the header line of the +<code>chromAlias.txt</code> file.
This setting ensures that names in the specified column are +displayed by default in the Genome Browser.</p> + + <a id="groupsTxt"></a> <h3>groups.txt</h3> <p>The <b>groups.txt</b> file defines the grouping of track controls under the Genome Browser graphic display.</p> <p>Example:</p> <pre> name map label Mapping priority 2 defaultIsClosed 0 </pre> + <ul> <li>The <b>name</b> setting is used in the trackDb.txt file to associate specific tracks with a group.</li> <li>The <b>label</b> setting specifies the title of the group in the genome browser. By default, groups are sorted alphabetically based on the label.</li> <li>The <b>priority</b> setting dictates the display order of the track groups, with lower numbers shown first.</li> <li>The <b>defaultIsClosed</b> setting controls whether the group is initially expanded or collapsed (0 for expanded, 1 for collapsed).</li> </ul> <p>Refer to the <a href="/goldenPath/help/hgTrackHubHelp.html#Group" target="_blank">Adding Groups to a Track hub</a> section of the Track Hubs help page for more details.</p> <a id="singleFileHub"></a>
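The <code>chromAlias.txt</code> layout and the <code>chromAuthority</code> setting described in this change can be illustrated with a short Python sketch. It is not the Genome Browser's own parser. It assumes a well-formed, tab-separated file whose header line starts with "# ", and the helpers <code>parse_chrom_alias</code>, <code>build_lookup</code>, and <code>display_name</code> are hypothetical names used only for illustration. The sample reuses the example rows from this section plus one clearly hypothetical row (<code>chrUn_example</code>) whose empty columns produce adjacent tab characters. Falling back to the assembly name when the chosen authority column is empty is also an assumption of this sketch.

```python
"""Sketch: read a chromAlias.txt file and resolve sequence names.

Hypothetical helpers (not part of any UCSC tool): parse_chrom_alias,
build_lookup, display_name. Assumes a well-formed, tab-separated file
whose header line starts with "# ".
"""
from typing import Dict, List, Tuple

AliasTable = Dict[str, Dict[str, str]]


def parse_chrom_alias(text: str) -> Tuple[List[str], AliasTable]:
    """Return the authority names from the header and, for each assembly
    sequence name (first column), a dict of {authority: alias}."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("# "):
        raise ValueError("first line must be a header starting with '# '")
    authorities = lines[0][2:].split("\t")
    table: AliasTable = {}
    for line in lines[1:]:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        # Pad short rows; an empty field means "no name in this scheme".
        fields += [""] * (len(authorities) - len(fields))
        table[fields[0]] = {
            auth: name for auth, name in zip(authorities, fields) if name
        }
    return authorities, table


def build_lookup(table: AliasTable) -> Dict[str, str]:
    """Map every alias (from any column) back to the assembly name."""
    lookup: Dict[str, str] = {}
    for primary, names in table.items():
        for alias in names.values():
            lookup.setdefault(alias, primary)
    return lookup


def display_name(table: AliasTable, primary: str, authority: str) -> str:
    """Name to show for a chromAuthority-style column choice. Falling back
    to the assembly name when that column is empty is an assumption."""
    return table.get(primary, {}).get(authority, primary)


# Rows from the example above, plus one hypothetical row (chrUn_example)
# whose empty columns produce adjacent tab characters.
HEADER = "# " + "\t".join(["ucsc", "assembly", "genbank", "ncbi", "refseq", "ensembl"])
ROWS = [
    ["chr1", "1", "CM000663.2", "1", "NC_000001.11", "1"],
    ["chr10", "10", "CM000672.2", "10", "NC_000010.11", "10"],
    ["chrM", "MT", "J01415.2", "MT", "NC_012920.1", "MT"],
    ["chrX", "X", "CM000685.2", "X", "NC_000023.11", "X"],
    ["chrUn_example", "example_1", "", "", "", ""],
]
SAMPLE = "\n".join([HEADER] + ["\t".join(row) for row in ROWS]) + "\n"

if __name__ == "__main__":
    authorities, table = parse_chrom_alias(SAMPLE)
    lookup = build_lookup(table)
    print(lookup["NC_000001.11"])  # chr1
    print(display_name(table, "chrM", "genbank"))  # J01415.2
```

Because aliases such as <code>1</code> or <code>MT</code> appear in several columns of the same row, <code>build_lookup</code> keeps the first assembly name it sees for each alias; the sketch does not detect an alias that collides across two different sequences.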