db04a602ccfef6857085f2a227eefc44e6051044
mspeir
  Wed Mar 4 18:40:53 2026 -0800
adding new genome browser glossary page, refs #35259

diff --git docs/genomeBrowserGlossary.md docs/genomeBrowserGlossary.md
new file mode 100644
index 00000000000..399acc65698
--- /dev/null
+++ docs/genomeBrowserGlossary.md
@@ -0,0 +1,486 @@
+---
+title: "UCSC Genome Browser Glossary"
+---
+
+A comprehensive reference guide to terminology used on the UCSC Genome Browser.
+
+This page covers the following topics:
+
+- [Genome Assemblies and Nomenclature](#genome-assemblies-and-nomenclature)
+    - [Popular Genome Assemblies](#popular-genome-assemblies)
+- [Core Tools](#core-tools)
+- [Browser Interface and Interaction](#browser-interface-and-interaction)
+    - [Main Display Elements](#main-display-elements)
+    - [Navigation Controls](#navigation-controls)
+    - [Mouse Interactions](#mouse-interactions)
+    - [Position and Search](#position-and-search)
+    - [Configuration and Settings](#configuration-and-settings)
+    - [Views, Output, and Export](#views-output-and-export)
+- [Tracks and Display](#tracks-and-display)
+- [User Data Features](#user-data-features)
+- [Data Formats](#data-formats)
+- [Genome Browser Data and Annotations](#genome-browser-data-and-annotations)
+    - [Gene Annotations](#gene-annotations)
+    - [Conservation and Comparative Genomics](#conservation-and-comparative-genomics)
+    - [Variants and Clinical Data](#variants-and-clinical-data)
+    - [Regulatory and Functional Data](#regulatory-and-functional-data)
+- [Technical Terms](#technical-terms)
+
+## Genome Assemblies and Nomenclature
+
+**Assembly**: A genome assembly is the complete genome sequence produced after
+chromosomes have been fragmented, sequenced, and computationally reassembled.
+Assemblies are updated when new sequence data fills gaps or improved algorithms
+produce better results. Find supported assemblies from the 
+[gateway page](/cgi-bin/hgGateway) or request new ones from
+our [assembly search page](/assemblySearch.html).
+
+**[GenArk](https://hgdownload.gi.ucsc.edu/hubs/)**: UCSC's Genome Archive
+containing thousands of additional genome assemblies beyond the main featured
+assemblies.
+
+**Chromosome Coordinates**: Genomic positions specified as chromosome name and
+base position (e.g., `chr7:155,799,529-155,812,871`). UCSC databases store
+zero-based, half-open coordinates; the browser displays one-based positions.
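+
+The relationship between the two coordinate conventions can be sketched in a
+few lines of Python. This is a minimal illustration, assuming that position
+strings typed into the browser are one-based and fully closed while database
+tables are zero-based and half-open; the function names are hypothetical and
+are not part of any UCSC tool.
+
+```python
+def position_to_zero_based(position):
+    """Convert a browser position string to (chrom, start, end), zero-based half-open."""
+    chrom, span = position.split(":")
+    start_text, end_text = span.replace(",", "").split("-")
+    # Only the start shifts: the one-based inclusive end equals the exclusive end.
+    return chrom, int(start_text) - 1, int(end_text)
+
+
+def zero_based_to_position(chrom, start, end):
+    """Convert zero-based half-open coordinates back to a browser position string."""
+    return f"{chrom}:{start + 1:,}-{end:,}"
+
+
+chrom, start, end = position_to_zero_based("chr7:155,799,529-155,812,871")
+region_length = end - start  # half-open intervals give the length by subtraction
+```
+
+With half-open coordinates the region length is simply `end - start`, which is
+one reason the convention is convenient for stored data.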
+
+
+### Popular Genome Assemblies
+
+**[hg19 (GRCh37)](/cgi-bin/hgTracks?db=hg19)**: The February 2009 human
+reference genome assembly from the Genome Reference Consortium. Still widely
+used for legacy datasets and clinical annotations.
+
+**[hg38 (GRCh38)](/cgi-bin/hgTracks?db=hg38)**: The December 2013 human
+reference genome assembly, the current standard for most new human genomics
+work. Contains improved sequence accuracy and gap filling compared to hg19.
+
+**[hs1 (T2T-CHM13)](/cgi-bin/hgTracks?db=hub_3671779_hs1)**: The
+telomere-to-telomere human genome assembly released in 2022, representing the
+first complete, gapless sequence of a human genome including centromeres and
+other previously unresolved regions.
+
+**[mm10 (GRCm38)](/cgi-bin/hgTracks?db=mm10)**: The December 2011 mouse
+reference genome assembly from the Genome Reference Consortium.
+
+**[mm39 (GRCm39)](/cgi-bin/hgTracks?db=mm39)**: The June 2020 mouse reference
+genome assembly, the current standard for mouse genomics.
+
+
+## Core Tools
+
+**[Genome Browser](/cgi-bin/hgTracks)**: The main visualization tool that
+displays any portion of a genome at any scale with aligned annotation tracks
+showing genes, regulatory elements, conservation, variants, and other genomic
+features.
+
+**[BLAT (BLAST-Like Alignment Tool)](/cgi-bin/hgBlat)**: A rapid sequence
+alignment tool developed by Jim Kent for finding sequence matches in genomes.
+Faster than BLAST for closely related sequences and useful for locating
+mRNA/EST alignments.
+
+**[Table Browser](/cgi-bin/hgTables)**: A web interface for querying,
+filtering, and downloading data from the underlying MySQL databases. Allows
+intersection of data tables and export in multiple formats.
+
+**[LiftOver](/cgi-bin/hgLiftOver)**: A tool for converting genomic coordinates
+between different genome assemblies (e.g., hg19 to hg38). Requires chain files
+that map regions between assemblies.
+
+**[In-Silico PCR](/cgi-bin/hgPcr)**: A tool for virtually testing PCR primer
+pairs against a genome to verify specificity and predict amplicon locations.
+
+**[Variant Annotation Integrator](/cgi-bin/hgVai)**: A tool for annotating
+genomic variants using multiple data sources to predict functional effects.
+
+**[Data Integrator](/cgi-bin/hgIntegrator)**: A tool for intersecting and
+combining data from multiple annotation tracks simultaneously.
+
+
+## Browser Interface and Interaction
+
+### Main Display Elements
+
+**Browser Graphic / Tracks Image**: The main visualization area displaying the
+genome and all visible annotation tracks. The image is interactive and supports
+mouse-based navigation.
+
+**Base Position Track / Ruler**: The coordinate ruler at the top of the browser
+graphic showing the genomic position scale. Clicking and dragging on the ruler
+activates the drag-and-select zoom feature.
+
+**Chromosome Ideogram**: A graphical representation of the entire chromosome
+shown above the browser graphic (for assemblies with cytological banding data).
+A red box indicates the currently viewed region's location on the chromosome.
+
+**Scale Bar**: A reference bar in the center of the browser graphic showing the
+current viewing scale in bases, kilobases, or megabases.
+
+**Track Label (Long Label)**: The descriptive text displayed centered above
+each track in the browser graphic (e.g., "GENCODE V41 Comprehensive
+Transcript Annotation").
+
+**Short Label**: The abbreviated track name shown at the left edge of each
+track in the browser graphic and in the track controls section below it.
+
+**Track Control / Visibility Menu**: The drop-down menus below the browser
+graphic that control each track's display mode (`hide`, `dense`, `squish`,
+`pack`, `full`).
+
+**Minibutton**: The small gray button to the left of each displayed track.
+Clicking it opens the track's configuration/settings page.
+
+**Track Groups**: Categories that organize related tracks together below the
+browser graphic (e.g., "Genes and Gene Predictions," "Mapping and Sequencing,"
+"Regulation").
+
+### Navigation Controls
+
+**Position/Search Box**: The text field at the top of the page where you enter
+coordinates, gene names, accession numbers, rsIDs, HGVS terms, or DNA sequences
+to navigate to specific locations.
+
+**Zoom Buttons**: Controls above and below the browser graphic for zooming in
+(`1.5x`, `3x`, `10x`, `base`) or out (`1.5x`, `3x`, `10x`, `100x`) on the
+current view.
+
+**Move/Pan Buttons**: Arrow buttons for shifting the view left or right along
+the chromosome while maintaining the current zoom level.
+
+**Reverse Button**: Flips the browser display to show the negative strand (3'
+to 5') instead of the default forward strand (5' to 3').
+
+**Next/Prev Item Navigation**: Gray double-headed arrows that appear at the ends
+of track items (when enabled in configuration) allowing you to jump to the next or
+previous feature in that track.
+
+**Keyboard Shortcuts**: Many Genome Browser interactions can be activated using
+keyboard shortcuts (e.g., "vd" to view the DNA sequence of the current window).
+See all keyboard shortcuts by typing "?".
+
+### Mouse Interactions
+
+**Right-Click Context Menu**: A context-sensitive menu that appears when you
+right-click on any item in the browser graphic. Options include zooming to the
+full item, highlighting, getting DNA sequence, viewing the details page, and
+accessing track configuration.
+
+**Click on Item**: Clicking on a feature (gene, SNP, etc.) in the browser
+graphic opens its details page with comprehensive information and external
+links.
+
+**Drag-and-Reorder**: Click and drag tracks vertically to rearrange their
+display order in the browser graphic.
+
+**Drag-and-Scroll (Pan)**: Click and drag anywhere on the browser graphic
+(except the ruler) to scroll the view horizontally left or right.
+
+**Drag-and-Select (Drag-and-Zoom)**: Click and drag on the ruler/base position
+track to select a region, then choose to zoom into that region. Hold Shift
+while dragging elsewhere on the image to activate this feature outside the
+ruler.
+
+**Highlight**: A colored vertical band that can be added to mark regions of
+interest. Created via the drag-and-select popup menu or right-click menu.
+Multiple highlights can be added with different colors.
+
+### Position and Search
+
+**Position/Search box**: Text entry box at the top of the main Genome Browser image. Accepts
+positions or one of a variety of search terms, including gene names, rsIDs, short sequences, and
+[various other terms](/goldenPath/help/query.html).
+
+**Autocomplete**: For assemblies with gene annotations, the position search box
+offers autocomplete suggestions as you type gene symbols.
+
+**[Track Search](/cgi-bin/hgTracks?hgt_tSearch=track+search)**: A feature for
+finding tracks by searching their names, descriptions, and metadata. Accessed
+via the Genome Browser menu or a button below the graphic.
+
+### Configuration and Settings
+
+**[Configure Button](/cgi-bin/hgTracks?hgTracksConfigPage=configure)**: Opens
+the Track Configuration page where you can adjust global display settings
+including image width, text size, font, and gridlines.
+
+**Track Settings Page**: The detailed configuration page for an individual
+track, accessed by clicking the track's minibutton or name. Allows filtering,
+coloring, and display customization.
+
+**Default Tracks Button**: Resets all track visibility settings to their
+default states for the current assembly.
+
+**Hide All Button**: Sets all tracks to hidden, clearing the display.
+
+**Image Width**: A configurable setting (in pixels) controlling the horizontal
+size of the browser graphic. Larger widths show more genomic territory without
+scrolling.
+
+**Gridlines**: Optional light blue vertical lines in the browser graphic that
+help align features across tracks. Can be toggled on/off in configuration.
+
+**Reset All User Settings**: An option under the "Genome Browser" top navigation
+menu that clears all customizations, including track visibility, custom tracks,
+and hubs, returning the browser to its original default state.
+
+### Views, Output, and Export
+
+**Recommended Track Sets**: An option under the "Genome Browser" top navigation
+menu that enables a set of recommended tracks for tasks such as clinical
+variant evaluation.
+
+**View Menu**: A top navigation menu providing options like viewing DNA
+sequence, converting coordinates to other assemblies, and accessing
+PDF/PostScript output.
+
+**Get DNA**: A feature to retrieve the genomic DNA sequence for the current
+viewing region or for a specific track item. Accessible via the View menu or
+right-click context menu.
+
+**PDF/PS Output**: Options under the View menu to generate publication-quality
+vector graphics of the browser display.
+
+## Tracks and Display
+
+**Track**: A horizontal row in the Genome Browser display showing a specific
+type of annotation data (e.g., genes, SNPs, conservation scores). Each track
+can be configured for different display modes.
+
+**Track Group**: A set of related tracks grouped together under the main track
+image (e.g., "Mapping and Sequencing" or "Comparative Genomics").
+
+### Display Modes
+
+| Mode | Description |
+|------|-------------|
+| `hide` | Track is not displayed |
+| `dense` | All features collapsed into a single line |
+| `squish` | Features shown at reduced height |
+| `pack` | Features shown at full height, labeled when space permits |
+| `full` | Features shown at full height with all labels |
+
+
+**Composite Track**: A container that groups related tracks together (e.g.,
+RNA-seq replicates), allowing them to be managed collectively. Indicated in the
+track groups by a folder icon.
+
+**MultiWig**: A special composite display mode that overlays multiple
+wiggle-format data tracks in a single graphical area. See, for example,
+the "Layered H3K4Me1" track under the "ENCODE Regulation" supertrack.
+
+**Supertrack**: A higher-level grouping of composite tracks or individual
+tracks into a collapsible folder structure. Indicated in the
+track groups by a folder icon.
+
+## User Data Features
+
+**[Custom Tracks](/cgi-bin/hgCustom)**: User-uploaded annotation data displayed
+temporarily in the browser (expires after 48 hours of inactivity unless saved
+in a session). See [custom track
+documentation](/goldenPath/help/customTrack.html) to learn how to load your
+custom tracks and accepted formats.
+
+**[Track Hub](/cgi-bin/hgHubConnect#unlistedHubs)**: A collection of remotely hosted
+annotation files that can be connected to the browser via a `hub.txt`
+configuration file. Provides more stable and configurable data display than
+custom tracks. A connected hub appears as its own group under the main Genome
+Browser image. See our [hub basics page](/docs/hubs/hubBasics.html) for
+help creating your own or our [track hub
+documentation](/goldenPath/help/hgTrackHubHelp.html) for a full description
+of the format.
+
+**[Assembly Hub](/goldenPath/help/assemblyHubHelp.html)**: A track hub that
+includes a custom genome assembly (in twoBit format) along with annotation
+tracks.
+
+**[Public Hub](/cgi-bin/hgHubConnect#publicHubs)**: A track or assembly hub
+provided by an external group. It appears as its own group under the main
+Genome Browser image. Questions about track data should be directed to the hub
+maintainers, whose email address can be found on the description page for
+any track in the hub. Public hubs are required to meet a set of
+[guidelines](/goldenPath/help/publicHubGuidelines.html) and are reviewed by
+Genome Browser staff before being added to the list.
+
+**[Hub Space/Hub Upload](/cgi-bin/hgHubConnect#hubUpload)**: The UCSC Genome
+Browser provides up to 10 GB of space for those with Genome Browser accounts
+to store custom track and hub data.
+
+**[Hub Development](/cgi-bin/hgHubConnect#hubDeveloper)**: Configuration settings
+useful when developing a new hub. Provides an interface for checking a hub for
+configuration issues.
+
+**[Sessions](/cgi-bin/hgSession)**: A saved snapshot of browser configuration
+including track visibility settings, position, custom tracks, and hubs. Can be shared
+via URL. See [sessions documentation](/goldenPath/help/hgSessionHelp.html).
+
+**[Public Sessions](/cgi-bin/hgPublicSessions)**: User-created sessions made
+publicly available for others to view. 
+
+## Data Formats
+
+**[BED (Browser Extensible Data)](/FAQ/FAQformat.html#format1)**: A
+tab-delimited format for defining genomic regions. Minimum 3 columns
+(chromosome, start, end); can extend to 12+ columns including name, score,
+strand, and exon structure.
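+
+As an illustration of the column layout described above, the following Python
+sketch parses one tab-delimited BED line. It is a simplified example, not a
+full BED validator: the helper name `parse_bed_line` and the sample line are
+made up for illustration, and only the first six standard columns are named.
+
+```python
+# Names of the first six standard BED columns; further columns are ignored here.
+BED_FIELDS = ("chrom", "chromStart", "chromEnd", "name", "score", "strand")
+
+
+def parse_bed_line(line):
+    """Parse one tab-delimited BED line into a dict of the columns present."""
+    columns = line.rstrip("\n").split("\t")
+    if len(columns) < 3:
+        raise ValueError("a BED line needs at least chrom, chromStart, and chromEnd")
+    # zip() stops at the shorter input, so BED3 through BED6 lines all work.
+    record = dict(zip(BED_FIELDS, columns))
+    record["chromStart"] = int(record["chromStart"])
+    record["chromEnd"] = int(record["chromEnd"])
+    if "score" in record:
+        record["score"] = int(record["score"])
+    return record
+
+
+# Hypothetical example line with six columns.
+example = parse_bed_line("chr7\t155799528\t155812871\texampleItem\t0\t-")
+```
+
+Because BED start positions are zero-based and end positions are exclusive,
+`example["chromEnd"] - example["chromStart"]` gives the item length directly.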
+
+**[bigBed](/goldenPath/help/bigBed.html)**: A compressed, indexed binary
+version of BED format enabling efficient random access for large datasets.
+Custom AutoSql (`.as`) files allow it to be extended to any number of columns
+containing item details, sequence, tables, and more.
+
+**[WIG (Wiggle)](/goldenPath/help/wiggle.html)**: A format for
+continuous-valued data displayed as graphs (e.g., conservation scores, read
+coverage).
+
+**[bigWig](/goldenPath/help/bigWig.html)**: A compressed, indexed binary
+version of WIG format for large continuous data tracks.
+
+**[bedGraph](/goldenPath/help/bedgraph.html)**: A format for displaying
+continuous data where each line specifies a chromosome region and associated
+value. Similar to WIG but preserves original data on export.
+
+**[BAM (Binary Alignment/Map)](/goldenPath/help/bam.html)**: A compressed
+binary format for storing sequence alignment data. Requires a separate `.bai`
+index file.
+
+**[CRAM](/goldenPath/help/cram.html)**: A more compressed alternative to BAM
+that references an external genome sequence file.
+
+**[VCF (Variant Call Format)](/goldenPath/help/vcf.html)**: A standard format
+for storing genetic variant data including SNPs, insertions, and deletions.
+
+**[PSL](/FAQ/FAQformat.html#format2)**: A format for storing sequence
+alignments, commonly used for BLAT output and mRNA/EST alignments.
+
+**[MAF (Multiple Alignment Format)](/FAQ/FAQformat.html#format5)**: A format
+for storing multiple sequence alignments across species.
+
+**[twoBit](/FAQ/FAQformat.html#format7)**: An efficient binary format for
+storing genomic sequence data.
+
+See our [format page](/FAQ/FAQformat.html) for a full listing of track and data types. 
+
+## Genome Browser Data and Annotations
+
+### Gene Annotations
+
+**[GENCODE](https://www.gencodegenes.org/)**: The reference gene annotation for
+human and mouse genomes, combining manual curation with computational
+predictions. Includes protein-coding genes, non-coding RNAs, and pseudogenes.
+
+**[RefSeq](https://www.ncbi.nlm.nih.gov/refseq/)**: NCBI's curated collection
+of reference sequences for genes, transcripts, and proteins.
+
+**[Ensembl Genes](https://www.ensembl.org/)**: Gene predictions from the
+Ensembl project, available for many species.
+
+**UCSC Genes**: UCSC's gene track, built by integrating data from RefSeq,
+GenBank, and other sources, with extensive metadata and external database
+links. Now retired and replaced by GENCODE genes.
+
+**Exon**: A coding or untranslated region of a gene that is retained in the
+mature mRNA after splicing. Coding exons are displayed as thick boxes in gene tracks.
+
+**Intron**: A region within a gene that is removed during RNA splicing.
+Displayed as thin lines connecting exons. Chevrons indicate direction of
+transcription.
+
+**UTR (Untranslated Region)**: Portions of mRNA at the 5' and 3' ends that do
+not code for protein. Displayed as half-height boxes in gene tracks.
+
+**CDS (Coding Sequence)**: The portion of a gene or mRNA that codes for
+protein, from start codon to stop codon.
+
+
+### Conservation and Comparative Genomics
+
+**[phastCons](/goldenPath/help/phastCons.html)**: A conservation scoring method
+that calculates the probability that each base is in an evolutionarily
+conserved element, using a phylogenetic hidden Markov model. Scores range from
+0 to 1. Typically found alongside phyloP scores and a Multiz multiple alignment.
+
+**phyloP**: A conservation scoring method that measures evolutionary rates at
+individual bases compared to a neutral model. Positive scores indicate
+conservation; negative scores indicate faster-than-expected evolution.
+Typically found alongside phastCons scores and a Multiz multiple alignment.
+
+**Multiz**: An algorithm for creating multiple genome alignments from pairwise
+alignments. The resulting multiple alignments are displayed in the Genome
+Browser in MAF format. Typically found alongside phastCons and phyloP scores.
+
+**[Chain](/goldenPath/help/chain.html)**: A series of gapless aligned blocks between two genomes, representing
+alignable regions.
+
+**[Net](/goldenPath/help/net.html)**: A hierarchical arrangement of chains representing syntenic (same
+genomic context) alignments between genomes, with the highest-scoring chains
+filling each region. More details about net construction can be found in
+[this FAQ](/FAQ/FAQtracks.html#tracks24).
+
+**Conservation Track**: A composite track displaying multiple species alignments and conservation scores (phastCons and phyloP) computed from those alignments.
+
+
+### Variants and Clinical Data
+
+**SNP (Single Nucleotide Polymorphism)**: A single base position where
+different alleles exist in a population.
+
+**[dbSNP](https://www.ncbi.nlm.nih.gov/snp/)**: NCBI's database of genetic
+variation, displayed as SNP tracks in the browser.
+
+**rsID**: A reference SNP identifier from dbSNP (e.g., `rs12345`).
+
+**[ClinVar](https://www.ncbi.nlm.nih.gov/clinvar/)**: NCBI's database of
+clinically significant genetic variants and their relationship to disease.
+
+**[gnomAD (Genome Aggregation Database)](https://gnomad.broadinstitute.org/)**:
+A resource of exome and genome sequencing data from large populations,
+providing allele frequencies.
+
+**HGVS Nomenclature**: A standardized system for describing sequence variants
+(e.g., `NM_004006.2:c.4375C>T`). Accepted in the position/search box.
+
+### Regulatory and Functional Data
+
+**[ENCODE (Encyclopedia of DNA Elements)](/ENCODE/)**: A consortium project
+identifying all functional elements in the human genome, including regulatory
+regions.
+
+**cCRE (Candidate Cis-Regulatory Element)**: Regions identified by ENCODE as
+potential regulatory elements based on epigenomic data.
+
+**DNase Hypersensitivity**: Regions of open chromatin accessible to DNase I
+enzyme, indicating potential regulatory activity.
+
+**ChIP-seq**: Chromatin immunoprecipitation followed by sequencing, used to
+identify protein-DNA interactions.
+
+**CpG Islands**: Genomic regions with high frequency of CpG dinucleotides, often
+found near gene promoters.
+
+**[GTEx (Genotype-Tissue Expression)](/gtex.html)**: A project providing gene
+expression data across multiple human tissues.
+
+**[FANTOM5](https://fantom.gsc.riken.jp/5)**: A project mapping transcription
+start sites and promoter activity across cell types and tissues.
+
+## Technical Terms
+
+**Byte-Range Requests**: HTTP feature required for hosting bigBed, bigWig, and
+BAM files, allowing the browser to fetch only the portion of a file needed for
+the current view.
+
+**MariaDB/MySQL**: The relational database system underlying the Genome
+Browser's data storage.
+
+**[REST API](/goldenPath/help/api.html)**: A programming interface for
+retrieving Genome Browser data in JSON format.
+
+**[trackDb](/goldenPath/help/trackDb/trackDbHub.html)**: A configuration file
+(`trackDb.txt`) that defines track properties in a track hub, including display
+settings, colors, and metadata.
+
+**AutoSql**: A schema definition format used to describe custom fields in
+Genome Browser tables and bigBed files.
+
+**hubCheck**: A command-line utility for validating track hub configuration files.
+Available from our
+[download server](https://hgdownload.gi.ucsc.edu/downloads.html#utilities_downloads).
+