3647084be920c90718eafc5b419ca2992ae65817 lrnassar Thu Mar 5 10:37:37 2026 -0800 Adding additional terms to the glossary page, refs #35259 diff --git docs/genomeBrowserGlossary.md docs/genomeBrowserGlossary.md index 399acc65698..6c0e0aa5d3f 100644 --- docs/genomeBrowserGlossary.md +++ docs/genomeBrowserGlossary.md @@ -2,59 +2,77 @@ title: "UCSC Genome Browser Glossary" --- A comprehensive reference guide to terminology used on the UCSC Genome Browser. This page covers the following topics: - [Genome Assemblies and Nomenclature](#genome-assemblies-and-nomenclature) - [Popular Genome Assemblies](#popular-genome-assemblies) - [Core Tools](#core-tools) - [Browser Interface and Interaction](#browser-interface-and-interaction) - [Main Display Elements](#main-display-elements) - [Navigation Controls](#navigation-controls) - [Mouse Interactions](#mouse-interactions) - [Position and Search](#position-and-search) - - [Configuration and Settings](#configuration-and-search) + - [Configuration and Settings](#configuration-and-settings) - [Views, Output, and Export](#views-output-and-export) - [Tracks and Display](#tracks-and-display) - [User Data Features](#user-data-features) - [Data Formats](#data-formats) - [Genome Browser Data and Annotations](#genome-browser-data-and-annotations) - [Gene Annotations](#gene-annotations) + - [Mapping, Sequencing, and Repeats](#mapping-sequencing-and-repeats) - [Conservation and Comparative Genomics](#conservation-and-comparative-genomics) - [Variants and Clinical Data](#variants-and-clinical-data) - [Regulatory and Functional Data](#regulatory-and-functional-data) +- [Table Browser](#table-browser) - [Technical Terms](#technical-terms) ## Genome Assemblies and Nomenclature **Assembly**: A genome assembly is the complete genome sequence produced after chromosomes have been fragmented, sequenced, and computationally reassembled. Assemblies are updated when new sequence data fills gaps or improved algorithms produce better results. 
Find supported assemblies from the [gateway page](/cgi-bin/hgGateway) or request new ones from our [assembly search page](/assemblySearch.html). **[GenArk](https://hgdownload.gi.ucsc.edu/hubs/)**: UCSC's Genome Archive containing thousands of additional genome assemblies beyond the main featured assemblies. **Chromosome Coordinates**: Genomic positions specified as chromosome name and base position (e.g., `chr7:155,799,529-155,812,871`). UCSC uses zero-based, half-open coordinates in its databases. +**Scaffold / Contig**: Intermediate sequence units used in genome assembly. +A contig is a contiguous stretch of assembled sequence with no gaps, while a +scaffold is an ordered set of contigs joined by estimated gap lengths. In +assemblies that are not fully resolved into chromosomes, sequences may be +named as scaffolds (e.g., `scaffold10671`) rather than chromosomes. In +chromosome-based assemblies, unplaced scaffolds appear as sequences like +`chrUn_gl000220` and unlocalized scaffolds (known chromosome, unknown +position) appear as `chr1_gl000191_random`. + +**Haplotype / Alternate Sequence**: Alternative versions of specific genomic +regions representing common structural variation between individuals. In the +Genome Browser, these appear as sequences with `_hap` or `_alt` suffixes +(e.g., `chr6_cox_hap2`, `chr1_KI270762v1_alt`). Alternate sequences can be +viewed in chromosomal context using +[Multi-Region mode](/goldenPath/help/multiRegionHelp.html). + ### Popular Genome Assemblies **[hg19 (GRCh37)](/cgi-bin/hgTracks?db=hg19)**: The February 2009 human reference genome assembly from the Genome Reference Consortium. Still widely used for legacy datasets and clinical annotations. **[hg38 (GRCh38)](/cgi-bin/hgTracks?db=hg38)**: The December 2013 human reference genome assembly, the current standard for most new human genomics work. Contains improved sequence accuracy and gap filling compared to hg19. 
**[hs1 (T2T-CHM13)](/cgi-bin/hgTracks?db=hub_3671779_hs1)**: The telomere-to-telomere human genome assembly released in 2022, representing the first complete, gapless sequence of a human genome including centromeres and other previously unresolved regions. @@ -173,31 +191,31 @@ **Drag-and-Scroll (Pan)**: Click and drag anywhere on the browser graphic (except the ruler) to scroll the view horizontally left or right. **Drag-and-Select (Drag-and-Zoom)**: Click and drag on the ruler/base position track to select a region, then choose to zoom into that region. Hold Shift while dragging elsewhere on the image to activate this feature outside the ruler. **Highlight**: A colored vertical band that can be added to mark regions of interest. Created via the drag-and-select popup menu or right-click menu. Multiple highlights can be added with different colors. ### Position and Search -**Position/Search box**: Text entry box at the top of the main genome genome browser image. Accepts +**Position/Search Box**: Text entry box at the top of the main Genome Browser image. Accepts positions or one of a variety of search terms, including gene names, rsIDs, short sequences and [various other terms](/goldenPath/help/query.html). **Autocomplete**: For assemblies with gene annotations, the position search box offers autocomplete suggestions as you type gene symbols. **[Track Search](/cgi-bin/hgTracks?hgt_tSearch=track+search)**: A feature for finding tracks by searching their names, descriptions, and metadata. Accessed via the Genome Browser menu or a button below the graphic. ### Configuration and Settings **[Configure Button](/cgi-bin/hgTracks?hgTracksConfigPage=configure)**: Opens the Track Configuration page where you can adjust global display settings including image width, text size, font, and gridlines. @@ -236,30 +254,62 @@ viewing region or for a specific track item. Accessible via the View menu or right-click context menu. 
**PDF/PS Output**: Options under the View menu to generate publication-quality vector graphics of the browser display. ## Tracks and Display **Track**: A horizontal row in the Genome Browser display showing a specific type of annotation data (e.g., genes, SNPs, conservation scores). Each track can be configured for different display modes. **Track Group**: A set of related tracks grouped together under the main track image, e.g. "Mapping and Sequencing" or "Comparative Genomics". +**Transcript / Isoform**: A transcript is a single RNA molecule produced from +a gene. Many genes produce multiple transcripts (isoforms) through +alternative splicing, alternative promoters, or alternative polyadenylation. +In the browser, each isoform is drawn as a separate line within a gene +track, which is why a single gene may appear as multiple stacked items. + +**Strand (+ / -)**: The orientation of a genomic feature relative to the +reference sequence. The positive (+) strand reads 5' to 3' left to right; +the negative (-) strand reads 3' to 5'. In gene tracks, chevrons (arrows) +within intron lines indicate the direction of transcription. + +**Details Page**: The information page that opens when you click on an item +in the browser graphic. Displays feature-specific data such as genomic +coordinates, strand, score, and links to external databases. The content +varies by track type. + +**[Multi-Region Mode](/goldenPath/help/multiRegionHelp.html)**: A display +mode that shows non-contiguous genomic regions side by side. Options include +exon-only view (hiding introns), gene-only view (hiding intergenic regions), +and custom regions defined by a BED file. Also supports viewing alternate +haplotype sequences in chromosomal context. Accessible from the View menu. + +**[Track Collection Builder](/cgi-bin/hgCollection)**: A tool for combining +multiple wiggle-type tracks (bigWig, bedGraph) from native browser data, +custom tracks, or track hubs into a single configurable composite. 
Supports +overlay methods including transparent, stacked, add, and subtract. + +**Filtering**: Track-level configuration that limits the displayed items to +those matching specified criteria such as score thresholds, name patterns, or +field values. Filter settings are available on many track settings pages and +persist across sessions. + ### Display Modes | Mode | Description | |------|-------------| | `hide` | Track is not displayed | | `dense` | All features collapsed into a single line | | `squish` | Features shown at reduced height | | `pack` | Features shown at full height, labeled when space permits | | `full` | Features shown at full height with all labels | **Composite Track**: A container that groups related tracks together (e.g., RNA-seq replicates), allowing them to be managed collectively. Indicated in the track groups by a folder icon. @@ -283,32 +333,32 @@ annotation files that can be connected to the browser via a `hub.txt` configuration file. Provides more stable and configurable data display than custom tracks. Will show up as its own group under the main genome browser image. See our [hub basics page](/docs/hubs/hubBasics.html) for help creating your own or our [track hub documentation](/goldenPath/help/hgTrackHubHelp.html) for a full description of the format. **[Assembly Hub](/goldenPath/help/assemblyHubHelp.html)**: A track hub that includes a custom genome assembly (in twoBit format) along with annotation tracks. **[Public Hub](/cgi-bin/hgHubConnect#publicHubs)**: A track or assembly hub provided by an external group. Will show up as its own group under the main genome browser image. Questions about track data should be directed to the hub -maintainers, whoe email address can be found on the description page for -any track in the hub. Public hubs are required to me a set of +maintainers, whose email address can be found on the description page for +any track in the hub. 
Public hubs are required to meet a set of [guidelines](/goldenPath/help/publicHubGuidelines.html) and are reviewed by Genome Browser staff before being added to the list. **[Hub Space/Hub Upload](/cgi-bin/hgHubConnect#hubUpload)**: The UCSC Genome Browser provides up to 10 GB of space for those with Genome Browser accounts to store custom track and hub data. **[Hub Development](/cgi-bin/hgHubConnect#hubDeveloper)**: Configuration settings useful when developing a new hub. Provides an interface for checking a hub for configuration issues. **[Sessions](/cgi-bin/hgSession)**: A saved snapshot of browser configuration including track visibility settings, position, custom tracks, and hubs. Can be shared via URL. See [sessions documentation](/goldenPath/help/hgSessionHelp.html). @@ -339,33 +389,51 @@ value. Similar to WIG but preserves original data on export. **[BAM (Binary Alignment/Map)](/goldenPath/help/bam.html)**: A compressed binary format for storing sequence alignment data. Requires a separate `.bai` index file. **[CRAM](/goldenPath/help/cram.html)**: A more compressed alternative to BAM that references an external genome sequence file. **[VCF (Variant Call Format)](/goldenPath/help/vcf.html)**: A standard format for storing genetic variant data including SNPs, insertions, and deletions. **[PSL](/FAQ/FAQformat.html#format2)**: A format for storing sequence alignments, commonly used for BLAT output and mRNA/EST alignments. +**[GenePred](/FAQ/FAQformat.html#format9)**: A table format used to represent +gene prediction and transcript structure data. Fields include transcript +name, chromosome, strand, transcription start/end, coding region start/end, +exon count, and exon coordinates. An extended version (genePredExt) adds +gene name and coding region status fields. + **[MAF (Multiple Alignment Format)](/FAQ/FAQformat.html#format5)**: A format for storing multiple sequence alignments across species. 
+**[interact / bigInteract](/goldenPath/help/interact.html)**: A format for +displaying pairwise interactions between genomic regions, drawn as arcs or +half-rectangles connecting two loci. Suitable for chromatin interaction data +such as ChIA-PET and promoter-enhancer links. The bigInteract binary version +is used for track hubs. + +**[HAL (Hierarchical Alignment Format)](/FAQ/FAQformat.html#format12)**: A +graph-based binary format for storing multiple genome alignments organized +according to a phylogenetic tree. Unlike MAF, HAL allows reference-free +querying with respect to any genome in the alignment. Native output format of +the Progressive Cactus alignment pipeline. + **[twoBit](/FAQ/FAQformat.html#format7)**: An efficient binary format for storing genomic sequence data. See our [format page](/FAQ/FAQformat.html) for a full listing of track and data types. ## Genome Browser Data and Annotations ### Gene Annotations **[GENCODE](https://www.gencodegenes.org/)**: The reference gene annotation for human and mouse genomes, combining manual curation with computational predictions. Includes protein-coding genes, non-coding RNAs, and pseudogenes. **[RefSeq](https://www.ncbi.nlm.nih.gov/refseq/)**: NCBI's curated collection of reference sequences for genes, transcripts, and proteins. @@ -405,30 +473,55 @@ **Multiz**: An algorithm for creating multiple genome alignments from pairwise alignments. Subsequent multiple alignments are displayed in the Genome Browser in MAF format. Typically found alongside phastCons and phyloP scores. **[Chain](/goldenPath/help/chain.html)**: A series of gapless aligned blocks between two genomes, representing alignable regions. **[Net](/goldenPath/help/net.html)**: A hierarchical arrangement of chains representing syntenic (same genomic context) alignments between genomes, with the highest-scoring chains filling each region. More details about net construction can be found in [this FAQ](/FAQ/FAQtracks#tracks24). 
**Conservation Track**: A composite track displaying multiple species alignments and conservation scores (phastCons and phyloP) computed from those alignments. +### Mapping, Sequencing, and Repeats + +**[RepeatMasker](http://www.repeatmasker.org/)**: A program that screens DNA +sequences for interspersed repeats and low-complexity regions. The +RepeatMasker track is one of the most prominent default tracks, displaying +repeat classes including SINEs, LINEs, LTR elements, DNA transposons, simple +repeats, and satellites. Items are color-coded by repeat class and shaded by +divergence from the repeat consensus. + +**GC Percent**: A track showing the percentage of guanine (G) and cytosine +(C) bases across the genome, calculated in fixed-size windows. Regions with +higher GC content are drawn more darkly. High GC content is generally +associated with gene-rich areas of the genome. + +**Mappability**: Tracks indicating how uniquely short sequences (k-mers) at +each position can be mapped back to the genome. Regions with low mappability +contain repetitive sequences where sequencing reads cannot be confidently +placed, which is important for interpreting read coverage and variant calls. + +**Synteny**: Conservation of gene order and genomic organization between +species. Syntenic regions share a common ancestral arrangement. The concept +is central to the Chain and Net comparative genomics tracks, where Net +tracks specifically display the highest-scoring syntenic alignments between +two genomes. + ### Variants and Clinical Data **SNP (Single Nucleotide Polymorphism)**: A single base position where different alleles exist in a population. **[dbSNP](https://www.ncbi.nlm.nih.gov/snp/)**: NCBI's database of genetic variation, displayed as SNP tracks in the browser. **rsID**: A reference SNP identifier from dbSNP (e.g., `rs12345`). 
**[ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/)**: NCBI's database of clinically significant genetic variants and their relationship to disease. **[gnomAD (Genome Aggregation Database)](https://gnomad.broadinstitute.org/)**: A resource of exome and genome sequencing data from large populations, @@ -449,38 +542,52 @@ **DNase Hypersensitivity**: Regions of open chromatin accessible to DNase I enzyme, indicating potential regulatory activity. **ChIP-seq**: Chromatin immunoprecipitation followed by sequencing, used to identify protein-DNA interactions. **CpG Islands**: Genomic regions with high frequency of CpG dinucleotides, often found near gene promoters. **[GTEx (Genotype-Tissue Expression)](/gtex.html)**: A project providing gene expression data across multiple human tissues. **[FANTOM5](https://fantom.gsc.riken.jp/5)**: A project mapping transcription start sites and promoter activity across cell types and tissues. +## Table Browser + +**[Intersection](/goldenPath/help/hgTablesHelp.html#Intersection)**: A Table +Browser feature that combines data from two tracks by finding overlapping +genomic regions. For example, intersecting a gene track with a conservation +track returns only the genes that overlap conserved elements. Supports both +simple (two-table) and multiple intersection modes. + +**Data Format Description (Schema)**: A page describing the structure of a +track's underlying data — its columns, data types, and example values. Found +via the "Data schema/format description and download" link on track +description pages, or via the "describe table schema" button in the Table +Browser. Also provides a download link for the dataset. + ## Technical Terms **Byte-Range Requests**: HTTP feature required for hosting bigBed, bigWig, and BAM files, allowing the browser to fetch only the portion of a file needed for the current view. **MariaDb/MySQL**: The relational database system underlying the Genome Browser's data storage. 
**[REST API](/goldenPath/help/api.html)**: A programming interface for retrieving Genome Browser data in JSON format. **[trackDb](/goldenPath/help/trackDb/trackDbHub.html)**: A configuration file (`trackDb.txt`) that defines track properties in a track hub, including display settings, colors, and metadata. **AutoSql**: A schema definition format used to describe custom fields in Genome Browser tables and bigBed files. **hubCheck**: A command-line utility for validating track hub configuration files. Available from our -[download server](https://hgdownload.gi.ucsc.edu/downloads.html#utilities_downloads.) +[download server](https://hgdownload.gi.ucsc.edu/downloads.html#utilities_downloads).
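
The **Chromosome Coordinates** entry in this patch notes that UCSC stores zero-based, half-open coordinates in its databases, while its example position (`chr7:155,799,529-155,812,871`) is written the way the position box displays it. The sketch below illustrates converting between the two, assuming that displayed positions are 1-based and fully closed. The function names `position_to_zero_based` and `zero_based_to_position` are hypothetical helpers written for this note, not part of any UCSC tool or of the glossary change itself.

```python
import re
from typing import Tuple

# Hypothetical helpers for illustration only; not part of any UCSC utility.
# Assumption: positions shown in the browser are 1-based, fully closed,
# while database/BED-style coordinates are zero-based, half-open.

_POSITION_RE = re.compile(r"\s*([^:\s]+):([\d,]+)-([\d,]+)\s*")


def position_to_zero_based(position: str) -> Tuple[str, int, int]:
    """Parse 'chrom:start-end' (1-based, closed) into (chrom, start, end)
    in zero-based, half-open form."""
    match = _POSITION_RE.fullmatch(position)
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    chrom, start_text, end_text = match.groups()
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid range in position: {position!r}")
    # Only the start shifts; the closed 1-based end equals the open 0-based end.
    return chrom, start - 1, end


def zero_based_to_position(chrom: str, start: int, end: int) -> str:
    """Format zero-based, half-open coordinates as a 1-based, closed
    position string with thousands separators."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    return f"{chrom}:{start + 1:,}-{end:,}"


if __name__ == "__main__":
    chrom, start, end = position_to_zero_based("chr7:155,799,529-155,812,871")
    print(chrom, start, end, "length:", end - start)
    print(zero_based_to_position(chrom, start, end))
```

One convenience of the half-open form is that the region length is simply `end - start`, with no off-by-one adjustment.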