add669d23a4b18c14371a461ac9ab766e18db54f
gperez2
  Sun Sep 14 16:10:11 2025 -0700
Adding chromAlias and chromAuthority settings to the Assembly Hub User Guide, refs #33014

diff --git src/hg/htdocs/goldenPath/help/assemblyHubHelp.html src/hg/htdocs/goldenPath/help/assemblyHubHelp.html
index ce79fca0e8c..1dbe7943eaf 100755
--- src/hg/htdocs/goldenPath/help/assemblyHubHelp.html
+++ src/hg/htdocs/goldenPath/help/assemblyHubHelp.html
@@ -21,30 +21,31 @@
 to the <a href="https://www.ncbi.nlm.nih.gov/datasets/genome/">NCBI Assembly</a> system, it may
 already be available in the <a href="https://genome.ucsc.edu">UCSC Genome Browser</a>.
 Please check the <a href="https://hgdownload.soe.ucsc.edu/hubs/">GenArk Assembly Hub</a> collection
 to see if your genome of interest is already available. If it is not listed there, you can use the
 <a href="/assemblyRequest">UCSC Assembly Request</a> page to request that the genome assembly be
 added.</p>
 
 
 <h2>Contents</h2>
 <h6><a href="#webServer">Web Server</a></h6>
 <h6><a href="#hubTxt">Assembly Hub Components</a></h6>
     <ul style="margin-left: 20px;">
         <li><a href="#hubTxt">hub.txt</a></li>
 	<li><a href="#genomesTxt">genomes.txt</a></li>
 	<li><a href="#twoBitFile">2bit File</a></li>
+	<li><a href="#chromAlias">chromAlias</a></li>
         <li><a href="#groupsTxt">groups.txt</a></li>
 	<li><a href="#singleFileHub">Single-File Track Hub</a></li> 
     </ul>
 <h6><a href="#linkingHub">Linking to Your Assembly Hub</a></h6>
 <h6><a href="#buildingTracks">Building Tracks</a></h6>
 <ul style="margin-left: 20px;">
     <li><a href="#cytobandTrack">Cytoband Track</a></li>
 </ul>
 <h6><a href="#assemblyHubResources">Assembly Hub Resources</a></h6>
 <ul style="margin-left: 20px;">
     <li><a href="#gOnRamp">G-OnRamp</a></li>
     <li><a href="#makeHub">MakeHub</a></li>
     <li><a href="#exampleNcbiAssemblyHubs">Example NCBI Assembly Hubs</a></li>   
         </ul>
     </li>
@@ -186,42 +187,143 @@
 <pre>
 twoBitInfo ricCom1.2bit stdout | sort -k2rn &gt; ricCom1.chrom.sizes
 </pre>
 <p>
 The <em>.2bit</em> file can also be hosted at a URL:</p>
 <pre>
 twoBitInfo -udcDir=https://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubPlants/cshl2013/ricCom1/ricCom1.2bit stdout | sort -k2nr &gt; ricCom1.chrom.sizes
 </pre>
 <p>
 To extract sequences from a <em>.2bit</em> file:
 </p>
 <pre>
 twoBitToFa -seq=chrCp -udcDir=https://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubPlants/cshl2013/ricCom1/ricCom1.2bit stdout &gt; ricCom1.chrCp.fa
 </pre>
 
+
+<a id="chromAlias"></a>
+<h3>chromAlias</h3>
+
+<p>
+The <code>chromAlias</code> setting specifies a <code>chromAlias.txt</code> file that enables the
+Genome Browser to automatically convert chromosome names from alternate naming schemes to the
+names used in the assembly. The conversion applies to both custom track data and assembly hub
+data.</p>
+
+<p><b>chromAlias.txt Format</b></p>
+<p>
+The first line of the <code>chromAlias.txt</code> file begins with a pound symbol (<code>#</code>)
+followed by a blank space. Each subsequent word on this line, separated by tab characters,
+specifies the source authority for the sequence names in that column. The first column contains the
+sequence names used in the Genome Browser assembly, while the subsequent columns provide alternate
+naming schemes.</p>
+
+<p>
+All lines following the header line consist of columns of sequence names separated by a tab
+character. If no equivalent name exists in a particular naming scheme, the column remains empty,
+resulting in two adjacent tab characters.</p><br>
+<p>Example:</p>
+<pre>
+# ucsc  assembly        genbank ncbi    refseq  ensembl
+chr1    1       CM000663.2      1       NC_000001.11    1
+chr10   10      CM000672.2      10      NC_000010.11    10
+chrM    MT      J01415.2        MT      NC_012920.1     MT
+chrX    X       CM000685.2      X       NC_000023.11    X
+</pre>
+
+<p>In this example, the columns represent:</p>
+<ul>
+	<li><b>ucsc</b> - UCSC-style <code>chrN</code> names</li>
+	<li><b>assembly</b> - names from the NCBI <code>assembly_report.txt</code> file</li>
+	<li><b>genbank</b> - INSDC names</li>
+	<li><b>ncbi</b> - names from the <code>chr2acc</code> file in the <code>assembly_structure/</code> hierarchy</li>
+	<li><b>refseq</b> - names from RefSeq annotations</li>
+	<li><b>ensembl</b> - names from the Ensembl assembly</li>
+</ul>
+<p><b>Assembly Hub Usage</b></p>
+<p>To use the <code>chromAlias.txt</code> file in an assembly hub, add the following line to the
+genome stanza of the <code>hub.txt</code> file:</p>
+<pre>chromAlias thisGenome.chromAlias.txt</pre>
+<p>The path is relative to the location of the <code>hub.txt</code> file.</p>
+<p>Example genome stanza:</p>
+<pre>
+genome GCF_000001405.39
+taxId 9606
+groups groups.txt
+description human
+twoBitPath GCF_000001405.39.2bit
+twoBitBptUrl GCF_000001405.39.2bit.bpt
+chromSizes GCF_000001405.39.chrom.sizes.txt
+chromAlias GCF_000001405.39.chromAlias.txt
+organism human
+defaultPos chr1:82985474-82995474
+scientificName Homo sapiens
+htmlPath html/GCF_000001405.39_GRCh38.p13.description.html
+</pre>
+
+<p><b>Best Performance</b></p>
+<p>
+For improved performance, the <code>chromAlias.txt</code> file can be converted to bigBed format.
+This enables efficient searching for sequence names without requiring the entire text file to be
+read, which is particularly important for assemblies with large numbers of sequences.</p>
+<p>
+The Perl script
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/utils/automation/aliasTextToBed.pl"
+	target="_blank">aliasTextToBed.pl</a> converts the <code>chromAlias.txt</code> file into the
+corresponding bed and bigBed files:</p>
+<pre>
+aliasTextToBed.pl -chromSizes=asmId.chrom.sizes -aliasText=asmId.chromAlias.txt \
+   -aliasBed=asmId.chromAlias.bed -aliasAs=asmId.chromAlias.as -aliasBigBed=asmId.chromAlias.bb
+</pre>
+<p>Inputs:</p>
+<ul>
+	<li><code>chrom.sizes</code> file</li>
+	<li><code>chromAlias.txt</code> file</li>
+</ul>
+<p>Outputs:</p>
+<ul>
+	<li><code>chromAlias.bed</code></li>
+	<li><code>chromAlias.as</code></li>
+	<li><code>chromAlias.bb</code></li>
+</ul>
+<p>
+In the genome stanza of the hub definition, replace the <code>chromAlias</code> setting and its
+<code>chromAlias.txt</code> file with the <code>chromAliasBb</code> setting, which specifies the
+<code>.bb</code> file:</p>
+<pre>chromAliasBb GCF_000001405.39.chromAlias.bb</pre>
+<p><b>Default Naming Scheme</b></p>
+<p>A default naming scheme may be set in the <code>hub.txt</code> file using the
+<code>chromAuthority</code> setting:</p>
+<pre>chromAuthority ucsc</pre>
+<p>In this example, the value <code>ucsc</code> matches a column name in the header line of the
+<code>chromAlias.txt</code> file. The Genome Browser then displays the names from that column by
+default.</p>
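+<p>As a minimal sketch, assuming <code>chromAuthority</code> is placed in the same genome stanza
+as the alias setting, the bigBed alias file and the default naming scheme might be combined as
+follows. The file names are reused from the example stanza earlier in this section and are
+illustrative only:</p>
+<pre>
+genome GCF_000001405.39
+taxId 9606
+groups groups.txt
+description human
+twoBitPath GCF_000001405.39.2bit
+twoBitBptUrl GCF_000001405.39.2bit.bpt
+chromSizes GCF_000001405.39.chrom.sizes.txt
+chromAliasBb GCF_000001405.39.chromAlias.bb
+chromAuthority ucsc
+organism human
+defaultPos chr1:82985474-82995474
+scientificName Homo sapiens
+htmlPath html/GCF_000001405.39_GRCh38.p13.description.html
+</pre>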
+
+
 <a id="groupsTxt"></a>
 <h3>groups.txt</h3>
 <p>The <b>groups.txt</b> file defines the grouping of track controls under the Genome Browser graphic
 display.</p>
 <p>Example:</p>
 <pre>
 name map
 label Mapping
 priority 2
 defaultIsClosed 0
 </pre>
 
+
 <ul>
    <li>The <b>name</b> setting is used in the trackDb.txt file to associate specific tracks with a
        group.</li>
    <li>The <b>label</b> setting specifies the title of the group in the genome browser. By default,
        groups are sorted alphabetically based on the label.</li>
    <li>The <b>priority</b> setting dictates the display order of the track groups, with lower
        numbers shown first.</li>
    <li>The <b>defaultIsClosed</b> setting controls whether the group is initially expanded or
        collapsed (0 for expanded, 1 for collapsed).</li>
 </ul>
 <p>Refer to the <a href="/goldenPath/help/hgTrackHubHelp.html#Group"
 target="_blank">Adding Groups to a Track hub</a> section of the Track Hubs help page for more
 details.</p>
 
 <a id="singleFileHub"></a>