add669d23a4b18c14371a461ac9ab766e18db54f gperez2 Sun Sep 14 16:10:11 2025 -0700 Adding chromAlias and chromAuthority settings to the Assembly Hub User Guide, refs #33014 diff --git src/hg/htdocs/goldenPath/help/assemblyHubHelp.html src/hg/htdocs/goldenPath/help/assemblyHubHelp.html index ce79fca0e8c..1dbe7943eaf 100755 --- src/hg/htdocs/goldenPath/help/assemblyHubHelp.html +++ src/hg/htdocs/goldenPath/help/assemblyHubHelp.html @@ -21,30 +21,31 @@ to the NCBI Assembly system, it may already be available in the UCSC Genome Browser. Please check the GenArk Assembly Hub collection to see if your genome of interest is already available. If it is not listed there, you can use the UCSC Assembly Request page to request that the genome assembly be added.

Contents

Web Server
Assembly Hub Components
Linking to Your Assembly Hub
Building Tracks
Assembly Hub Resources
@@ -186,42 +187,143 @@
 twoBitInfo ricCom1.2bit stdout | sort -k2rn > ricCom1.chrom.sizes
 

The .2bit file can also be hosted at a URL:

 twoBitInfo -udcDir=. https://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubPlants/cshl2013/ricCom1/ricCom1.2bit stdout | sort -k2nr > ricCom1.chrom.sizes
 

To extract sequences from a .2bit file:

 twoBitToFa -seq=chrCp -udcDir=. https://genome.ucsc.edu/goldenPath/help/examples/hubExamples/hubPlants/cshl2013/ricCom1/ricCom1.2bit stdout > ricCom1.chrCp.fa
 

chromAlias

The chromAlias setting enables the Genome Browser to automatically convert sequence names that use an alternate naming scheme (for example, 1 or NC_000001.11 instead of chr1) to the names used in the assembly. The setting points to a chromAlias.txt file, and the conversion applies to both custom track data and assembly hub data.

chromAlias.txt Format

The first line of the chromAlias.txt file is a header that begins with a pound sign (#) followed by a space. Each subsequent word on this line, separated by tab characters, names the source authority for the sequence names in that column. The first column contains the sequence names used in the Genome Browser assembly, while the subsequent columns provide alternate naming schemes.

All lines following the header line consist of columns of sequence names separated by tab characters. If no equivalent name exists in a particular naming scheme, that column is left empty, resulting in two adjacent tab characters.

Example:

# ucsc  assembly        genbank ncbi    refseq  ensembl
chr1    1       CM000663.2      1       NC_000001.11    1
chr10   10      CM000672.2      10      NC_000010.11    10
chrM    MT      J01415.2        MT      NC_012920.1     MT
chrX    X       CM000685.2      X       NC_000023.11    X

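In this example, the columns represent:

ucsc: the sequence names used in the Genome Browser assembly, such as chr1 and chrM
assembly, genbank, ncbi, refseq, ensembl: the equivalent names in each of those naming schemes. For example, chrM is MT in the assembly, ncbi, and ensembl columns, J01415.2 in the genbank column, and NC_012920.1 in the refseq column.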
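The format rules above can be checked before a hub is published. The following is a minimal sketch, assuming a plain tab-separated file. check_chrom_alias is a hypothetical helper name and not a UCSC utility. The Genome Browser may apply checks of its own that are not reproduced here. The sketch reports a header that does not begin with "# " and any data line whose column count differs from the header's.

```shell
# check_chrom_alias: hypothetical helper (not a UCSC tool) that sanity-checks
# a chromAlias.txt file against the format rules described above.
# Usage: check_chrom_alias thisGenome.chromAlias.txt
# Exit status is 0 when no problems are found, 1 otherwise.
check_chrom_alias() {
  awk -F'\t' '
    NR == 1 {
      # Header: must begin with "# "; its column count sets the expectation.
      if (substr($0, 1, 2) != "# ") {
        print "line 1: header must begin with \"# \""
        bad = 1
      }
      ncol = NF
      next
    }
    # A missing name is an empty field (two adjacent tabs), so every data
    # line should still have exactly as many columns as the header.
    NF != ncol {
      printf "line %d: %d columns, expected %d\n", NR, NF, ncol
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

A file in the layout of the example above produces no output. A data line with an extra or missing tab is reported by line number.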

Assembly Hub Usage

To use the chromAlias.txt file in an assembly hub, add the following line to the genome stanza of the hub.txt file:

chromAlias thisGenome.chromAlias.txt

The path is relative to the location of the hub.txt file.

Example genome stanza:

genome GCF_000001405.39
taxId 9606
groups groups.txt
description human
twoBitPath GCF_000001405.39.2bit
twoBitBptUrl GCF_000001405.39.2bit.bpt
chromSizes GCF_000001405.39.chrom.sizes.txt
chromAlias GCF_000001405.39.chromAlias.txt
organism human
defaultPos chr1:82985474-82995474
scientificName Homo sapiens
htmlPath html/GCF_000001405.39_GRCh38.p13.description.html

Best Performance

For better performance, the chromAlias.txt file can be converted to bigBed format. The bigBed lets the Genome Browser look up sequence names without reading the entire text file, which matters most for assemblies with large numbers of sequences.
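To show what searching the plain text file involves, the sketch below looks up a name by scanning every line of chromAlias.txt until a match is found. alias_to_native is a hypothetical helper. This is not how the Genome Browser implements the conversion. It only illustrates the full-file read that the bigBed index avoids.

```shell
# alias_to_native: hypothetical helper (not a UCSC tool). Prints the assembly
# name (first column) of the row in which NAME appears in any column.
# Usage: alias_to_native thisGenome.chromAlias.txt NAME
alias_to_native() {
  awk -F'\t' -v q="$2" '
    NR == 1 { next }   # skip the "# ..." header line
    {
      # Linear scan: each column of each line is compared as a string
      # until the first match, so cost grows with the size of the file.
      for (i = 1; i <= NF; i++)
        if ($i != "" && ($i "") == (q "")) { print $1; exit }
    }
  ' "$1"
}
```

With the example file shown earlier, looking up NC_000001.11 prints chr1, and looking up MT prints chrM.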

The Perl script aliasTextToBed.pl converts the chromAlias.txt file into the corresponding bed and bigBed files:

aliasTextToBed.pl -chromSizes=asmId.chrom.sizes -aliasText=asmId.chromAlias.txt \
   -aliasBed=asmId.chromAlias.bed -aliasAs=asmId.chromAlias.as -aliasBigBed=asmId.chromAlias.bb

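Inputs:

asmId.chrom.sizes: the chrom.sizes file for the assembly, where asmId stands for the assembly identifier, such as GCF_000001405.39
asmId.chromAlias.txt: the chromAlias.txt file to convert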

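Outputs:

asmId.chromAlias.bed: the alias table in bed format
asmId.chromAlias.as: the autoSql (.as) file describing the bed columns
asmId.chromAlias.bb: the bigBed file, indexed for fast name lookups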
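Because aliasTextToBed.pl takes the chrom.sizes file as an input, it can be worth confirming beforehand that the first column of chromAlias.txt uses exactly the sequence names listed in chrom.sizes. The following is a minimal sketch, assuming both files are tab-separated. names_missing_from_sizes is a hypothetical helper. How aliasTextToBed.pl itself reacts to a mismatch is not covered here.

```shell
# names_missing_from_sizes: hypothetical helper (not a UCSC tool).
# Prints first-column names from chromAlias.txt that are absent from chrom.sizes.
# Usage: names_missing_from_sizes asmId.chrom.sizes asmId.chromAlias.txt
names_missing_from_sizes() {
  awk -F'\t' '
    FILENAME == ARGV[1] { inSizes[$1] = 1; next }  # chrom.sizes: remember each name
    FNR == 1 { next }                              # chromAlias.txt: skip "# ..." header
    !($1 in inSizes) { print $1 }                  # name not listed in chrom.sizes
  ' "$1" "$2"
}
```

Empty output means every assembly name in the alias file is also present in chrom.sizes.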

Replace the chromAlias setting with the chromAliasBb setting, and specify the .bb file in the genome stanza of the hub definition:

chromAliasBb GCF_000001405.39.chromAlias.bb

This replaces the chromAlias.txt specification.


Default Naming Scheme

A default naming scheme may be set in the hub.txt file using the chromAuthority setting:

chromAuthority ucsc

In this example, the value ucsc matches a column header in the chromAlias.txt file, so the names in that column are displayed by default in the Genome Browser.
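Because the chromAuthority value has to match one of the header names, it can help to list those names and preview a column before choosing one. The following sketch assumes the tab-separated layout described in the chromAlias.txt Format section. list_authorities and names_for_authority are hypothetical helpers. The second only approximates which name would be shown. It is not the Genome Browser's code.

```shell
# list_authorities: hypothetical helper. Prints each column name from the
# "# ..." header line of a chromAlias.txt file, one per line.
list_authorities() {
  head -n 1 "$1" | sed 's/^# //' | tr '\t' '\n'
}

# names_for_authority: hypothetical helper. For each data line, prints the
# assembly name and the name in the AUTHORITY column (empty if none exists).
# Usage: names_for_authority thisGenome.chromAlias.txt refseq
names_for_authority() {
  awk -F'\t' -v want="$2" '
    NR == 1 {
      sub(/^# /, "")   # drop the "# " prefix; $0 is re-split into fields
      for (i = 1; i <= NF; i++) if ($i == want) col = i
      if (!col) { print "no column named " want > "/dev/stderr"; exit 1 }
      next
    }
    { print $1 "\t" $col }
  ' "$1"
}
```

For the example file, list_authorities prints ucsc, assembly, genbank, ncbi, refseq, and ensembl, one per line. Any of these is a candidate chromAuthority value.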

groups.txt

The groups.txt file defines the grouping of track controls under the Genome Browser graphic display.

Example:

 name map
 label Mapping
 priority 2
 defaultIsClosed 0
 

Refer to the Adding Groups to a Track hub section of the Track Hubs help page for more details.