c1623f23f647a7a879d73454e4d9898bc03dc185 mspeir Mon Mar 9 11:18:28 2026 -0700 Tweaks to descriptions for some glossary items, refs #35259 diff --git docs/genomeBrowserGlossary.md docs/genomeBrowserGlossary.md index 6c0e0aa5d3f..cccabcb5a44 100644 --- docs/genomeBrowserGlossary.md +++ docs/genomeBrowserGlossary.md @@ -31,48 +31,55 @@ ## Genome Assemblies and Nomenclature **Assembly**: A genome assembly is the complete genome sequence produced after chromosomes have been fragmented, sequenced, and computationally reassembled. Assemblies are updated when new sequence data fills gaps or improved algorithms produce better results. Find supported assemblies from the [gateway page](/cgi-bin/hgGateway) or request new ones from our [assembly search page](/assemblySearch.html). **[GenArk](https://hgdownload.gi.ucsc.edu/hubs/)**: UCSC's Genome Archive containing thousands of additional genome assemblies beyond the main featured assemblies. **Chromosome Coordinates**: Genomic positions specified as chromosome name and base position (e.g., `chr7:155,799,529-155,812,871`). UCSC uses zero-based, -half-open coordinates in its databases. +half-open coordinates in its databases. See [this blog +post](https://genome-blog.gi.ucsc.edu/blog/2016/12/12/the-ucsc-genome-browser-coordinate-counting-systems) +for details. **Scaffold / Contig**: Intermediate sequence units used in genome assembly. A contig is a contiguous stretch of assembled sequence with no gaps, while a scaffold is an ordered set of contigs joined by estimated gap lengths. In assemblies that are not fully resolved into chromosomes, sequences may be named as scaffolds (e.g., `scaffold10671`) rather than chromosomes. In chromosome-based assemblies, unplaced scaffolds appear as sequences like `chrUn_gl000220` and unlocalized scaffolds (known chromosome, unknown position) appear as `chr1_gl000191_random`. 
**Haplotype / Alternate Sequence**: Alternative versions of specific genomic regions representing common structural variation between individuals. In the Genome Browser, these appear as sequences with `_hap` or `_alt` suffixes -(e.g., `chr6_cox_hap2`, `chr1_KI270762v1_alt`). Alternate sequences can be -viewed in chromosomal context using +(e.g., `chr6_cox_hap2`, `chr1_KI270762v1_alt`). See our [FAQ](/FAQ/FAQdownloads.html#downloadAlt) +for more details. Alternate sequences can be viewed in chromosomal context using [Multi-Region mode](/goldenPath/help/multiRegionHelp.html). +**Fix Sequences (Fix Patches)**: Patch sequences correct errors or improve +the reference assembly without changing the coordinate system. +In the UCSC Genome Browser, these sequences are identified +by appending `_fix` to their names (e.g., `chr2_KN538362v1_fix`). +See our [FAQ](/FAQ/FAQdownloads.html#downloadFix) for more details. ### Popular Genome Assemblies **[hg19 (GRCh37)](/cgi-bin/hgTracks?db=hg19)**: The February 2009 human reference genome assembly from the Genome Reference Consortium. Still widely used for legacy datasets and clinical annotations. **[hg38 (GRCh38)](/cgi-bin/hgTracks?db=hg38)**: The December 2013 human reference genome assembly, the current standard for most new human genomics work. Contains improved sequence accuracy and gap filling compared to hg19. **[hs1 (T2T-CHM13)](/cgi-bin/hgTracks?db=hub_3671779_hs1)**: The telomere-to-telomere human genome assembly released in 2022, representing the first complete, gapless sequence of a human genome including centromeres and other previously unresolved regions. @@ -86,94 +93,97 @@ ## Core Tools **[Genome Browser](/cgi-bin/hgTracks)**: The main visualization tool that displays any portion of a genome at any scale with aligned annotation tracks showing genes, regulatory elements, conservation, variants, and other genomic features. 
**[BLAT (BLAST-Like Alignment Tool)](/cgi-bin/hgBlat)**: A rapid sequence alignment tool developed by Jim Kent for finding sequence matches in genomes. Faster than BLAST for closely related sequences and useful for locating mRNA/EST alignments. **[Table Browser](/cgi-bin/hgTables)**: A web interface for querying, filtering, and downloading data from the underlying MySQL databases. Allows -intersection of data tables and export in multiple formats. +intersection of data tables and export in multiple formats. See +[below](#table-browser) for more related terms, or our +[documentation](/goldenPath/help/hgTablesHelp.html). **[LiftOver](/cgi-bin/hgLiftOver)**: A tool for converting genomic coordinates between different genome assemblies (e.g., hg19 to hg38). Requires chain files that map regions between assemblies. **[In-Silico PCR](/cgi-bin/hgPcr)**: A tool for virtually testing PCR primer pairs against a genome to verify specificity and predict amplicon locations. **[Variant Annotation Integrator](/cgi-bin/hgVai)**: A tool for annotating genomic variants using multiple data sources to predict functional effects. **[Data Integrator](/cgi-bin/hgIntegrator)**: A tool for intersecting and combining data from multiple annotation tracks simultaneously. ## Browser Interface and Interaction ### Main Display Elements **Browser Graphic / Tracks Image**: The main visualization area displaying the genome and all visible annotation tracks. The image is interactive and supports mouse-based navigation. **Base Position Track / Ruler**: The coordinate ruler at the top of the browser graphic showing the genomic position scale. Clicking and dragging on the ruler activates the drag-and-select zoom feature. **Chromosome Ideogram**: A graphical representation of the entire chromosome shown above the browser graphic (for assemblies with cytological banding data). A red box indicates the currently viewed region's location on the chromosome. 
+You can zoom to a region by dragging to select it in the ideogram. **Scale Bar**: A reference bar in the center of the browser graphic showing the current viewing scale in bases, kilobases, or megabases. -**Track Label (Long Label)**: The descriptive text displayed at the left edge -of each track in the browser graphic (e.g., "GENCODE V41 Comprehensive +**Track Label (Long Label)**: The descriptive text displayed above +each track in the browser graphic (e.g., "GENCODE V41 Comprehensive Transcript Annotation"). **Short Label**: The abbreviated track name shown in the track controls section below the browser graphic. **Track Control / Visibility Menu**: The drop-down menus below the browser graphic that control each track's display mode (`hide`, `dense`, `squish`, `pack`, `full`). **Minibutton**: The small gray button to the left of each displayed track. Clicking it opens the track's configuration/settings page. **Track Groups**: Categories that organize related tracks together below the browser graphic (e.g., "Genes and Gene Predictions," "Mapping and Sequencing," "Regulation"). ### Navigation Controls **Position/Search Box**: The text field at the top of the page where you enter coordinates, gene names, accession numbers, rsIDs, HGVS terms, or DNA sequences to navigate to specific locations. **Zoom Buttons**: Controls above and below the browser graphic for zooming in (`1.5x`, `3x`, `10x`, `base`) or out (`1.5x`, `3x`, `10x`, `100x`) on the current view. -**Move/Pan Buttons**: Arrow buttons for shifting the view left or right along +**Move Buttons**: Arrow buttons for shifting the view left or right along the chromosome while maintaining the current zoom level. **Reverse Button**: Flips the browser display to show the negative strand (3' to 5') instead of the default forward strand (5' to 3'). 
**Next / Prev Item Navigation**: Gray double-headed arrows that appear at the ends of track items (when enabled in configuration) allowing you to jump to the next or previous feature in that track. **Keyboard Shortcuts**: Many Genome Browser interactions can be activated using keyboard shortcuts (e.g. "vd" to view DNA sequence of current window). See all keyboard shortcuts by typing "?". ### Mouse Interactions @@ -226,89 +236,87 @@ **Default Tracks Button**: Resets all track visibility settings to their default states for the current assembly. **Hide All Button**: Sets all tracks to hidden, clearing the display. **Image Width**: A configurable setting (in pixels) controlling the horizontal size of the browser graphic. Larger widths show more genomic territory without scrolling. **Gridlines**: Optional light blue vertical lines in the browser graphic that help align features across tracks. Can be toggled on/off in configuration. **Reset All User Settings**: Under top navigation menu "Genome Browser", clears all customizations including track visibility, custom tracks, and hubs, returning the browser -to its original default state. +to its original default state. Useful when browser configuration seems to be stuck +in a broken state. ### Views, Output, and Export **Recommended Track Sets**: Under top navigation menu "Genome Browser". Allows users to enable a set of recommended tracks for tasks such as clinical variant evaluation. **View Menu**: A top navigation menu providing options like viewing DNA sequence, converting coordinates to other assemblies, and accessing PDF/PostScript output. **Get DNA**: A feature to retrieve the genomic DNA sequence for the current viewing region or for a specific track item. Accessible via the View menu or right-click context menu. **PDF/PS Output**: Options under the View menu to generate publication-quality vector graphics of the browser display. 
## Tracks and Display **Track**: A horizontal row in the Genome Browser display showing a specific type of annotation data (e.g., genes, SNPs, conservation scores). Each track can be configured for different display modes. **Track Group**: A set of related tracks grouped together under the main track image, e.g. "Mapping and Sequencing" or "Comparative Genomics". -**Transcript / Isoform**: A transcript is a single RNA molecule produced from -a gene. Many genes produce multiple transcripts (isoforms) through -alternative splicing, alternative promoters, or alternative polyadenylation. -In the browser, each isoform is drawn as a separate line within a gene -track, which is why a single gene may appear as multiple stacked items. - **Strand (+ / -)**: The orientation of a genomic feature relative to the reference sequence. The positive (+) strand reads 5' to 3' left to right; the negative (-) strand reads 3' to 5'. In gene tracks, chevrons (arrows) within intron lines indicate the direction of transcription. **Details Page**: The information page that opens when you click on an item in the browser graphic. Displays feature-specific data such as genomic coordinates, strand, score, and links to external databases. The content varies by track type. **[Multi-Region Mode](/goldenPath/help/multiRegionHelp.html)**: A display mode that shows non-contiguous genomic regions side by side. Options include exon-only view (hiding introns), gene-only view (hiding intergenic regions), and custom regions defined by a BED file. Also supports viewing alternate -haplotype sequences in chromosomal context. Accessible from the View menu. +haplotype sequences in chromosomal context. Accessible from the View menu +or the button next to the position box. **[Track Collection Builder](/cgi-bin/hgCollection)**: A tool for combining multiple wiggle-type tracks (bigWig, bedGraph) from native browser data, custom tracks, or track hubs into a single configurable composite. 
Supports -overlay methods including transparent, stacked, add, and subtract. +overlay methods including transparent, stacked, add, and subtract. Accessible +from the My Data menu. **Filtering**: Track-level configuration that limits the displayed items to those matching specified criteria such as score thresholds, name patterns, or field values. Filter settings are available on many track settings pages and -persist across sessions. +persist across sessions. Click the minibutton or "Configure" from the +right-click menu to see available filter options for a specific track. ### Display Modes | Mode | Description | |------|-------------| | `hide` | Track is not displayed | | `dense` | All features collapsed into a single line | | `squish` | Features shown at reduced height | | `pack` | Features shown at full height, labeled when space permits | | `full` | Features shown at full height with all labels | **Composite Track**: A container that groups related tracks together (e.g., RNA-seq replicates), allowing them to be managed collectively. Indicated in the track groups by a folder icon. @@ -432,30 +440,36 @@ **[GENCODE](https://www.gencodegenes.org/)**: The reference gene annotation for human and mouse genomes, combining manual curation with computational predictions. Includes protein-coding genes, non-coding RNAs, and pseudogenes. **[RefSeq](https://www.ncbi.nlm.nih.gov/refseq/)**: NCBI's curated collection of reference sequences for genes, transcripts, and proteins. **[Ensembl Genes](https://www.ensembl.org/)**: Gene predictions from the Ensembl project, available for many species. **UCSC Genes**: UCSC's gene track built by integrating data from RefSeq and GenBank among other sources with extensive metadata and external database links. Now retired and replaced by GENCODE genes. +**Transcript / Isoform**: A transcript is a single RNA molecule produced from +a gene. 
Many genes produce multiple transcripts (isoforms) through +alternative splicing, alternative promoters, or alternative polyadenylation. +In the browser, each isoform is drawn as a separate line within a gene +track, which is why a single gene may appear as multiple stacked items. + **Exon**: A coding or untranslated region of a gene that is retained in the mature mRNA after splicing. Displayed as thick boxes in gene tracks. **Intron**: A region within a gene that is removed during RNA splicing. Displayed as thin lines connecting exons. Chevrons indicate direction of transcription. **UTR (Untranslated Region)**: Portions of mRNA at the 5' and 3' ends that do not code for protein. Displayed as half-height boxes in gene tracks. **CDS (Coding Sequence)**: The portion of a gene or mRNA that codes for protein, from start codon to stop codon. ### Conservation and Comparative Genomics @@ -578,16 +592,20 @@ Browser's data storage. **[REST API](/goldenPath/help/api.html)**: A programming interface for retrieving Genome Browser data in JSON format. **[trackDb](/goldenPath/help/trackDb/trackDbHub.html)**: A configuration file (`trackDb.txt`) that defines track properties in a track hub, including display settings, colors, and metadata. **AutoSql**: A schema definition format used to describe custom fields in Genome Browser tables and bigBed files. **hubCheck**: A command-line utility for validating track hub configuration files. Available from our [download server](https://hgdownload.gi.ucsc.edu/downloads.html#utilities_downloads). +See the [hubCheck +documentation](/goldenPath/help/hgTrackHubHelp.html#Compatibility) and related +[blog post](https://genome-blog.gi.ucsc.edu/blog/how-portable-is-your-track-hub-use-hubcheck-to-find-out/). +
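
The glossary entries for **Chromosome Coordinates** and **REST API** touched by this diff can be illustrated with a small sketch. It assumes that a position string such as `chr7:155,799,529-155,812,871` is one-based and fully closed, as entered in the position box, and converts it to the zero-based, half-open form the glossary says UCSC uses in its databases. The helper names `parse_position` and `sequence_url` are hypothetical. The `/getData/sequence` endpoint and its `genome`, `chrom`, `start`, and `end` parameters are assumptions that should be checked against the REST API documentation. The code only builds a URL and makes no network request.

```python
import re
from urllib.parse import urlencode

# Assumed base URL for the REST API; verify against the API documentation.
API_BASE = "https://api.genome.ucsc.edu"


def parse_position(position):
    """Convert a one-based, fully-closed position string into
    (chrom, start, end) using zero-based, half-open coordinates."""
    match = re.fullmatch(r"(\w+):([\d,]+)-([\d,]+)", position.strip())
    if match is None:
        raise ValueError(f"unrecognized position: {position!r}")
    chrom = match.group(1)
    # Strip the thousands separators before converting to integers.
    start_1based = int(match.group(2).replace(",", ""))
    end_1based = int(match.group(3).replace(",", ""))
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError(f"invalid range in position: {position!r}")
    # Zero-based, half-open: subtract 1 from the start, keep the end as-is.
    return chrom, start_1based - 1, end_1based


def sequence_url(genome, position):
    """Build a (hypothetical) sequence request URL for a position string."""
    chrom, start, end = parse_position(position)
    query = urlencode(
        {"genome": genome, "chrom": chrom, "start": start, "end": end}
    )
    return f"{API_BASE}/getData/sequence?{query}"


if __name__ == "__main__":
    chrom, start, end = parse_position("chr7:155,799,529-155,812,871")
    # In half-open coordinates the region length is simply end - start.
    print(chrom, start, end, end - start)
    print(sequence_url("hg38", "chr7:155,799,529-155,812,871"))
```

With half-open coordinates the length of a region is `end - start`, with no off-by-one adjustment, which is one common reason for storing intervals this way.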