55e909c0e98fb50a5cd761f1ce2cb52f9089f5f4 max Tue Jun 2 03:05:59 2026 -0700

[Claude] ncOrfs: add 5ULTRA uORFs subtrack (MANE Select, 22,567 features)

Adds fiveUltraUorfs, a new subtrack under the ncOrfs supertrack showing 22,567 ATG-initiated uORFs
in MANE Select transcripts from the 5ULTRA pipeline (Chaldebas et al., Am J Hum Genet 2026,
PMID 41881026). Features are colored by uORF type (Okabe-Ito palette), have exon/intron structure
projected from MANE via addIntrons.py, and carry gene, rank, and Kozak strength as extra bigBed
fields. ncOrfs.html summary table updated to include the new track.

Co-Authored-By: Claude Sonnet 4.6
refs #37580

diff --git src/hg/makeDb/trackDb/human/hg38/fiveUltraUorfs.html src/hg/makeDb/trackDb/human/hg38/fiveUltraUorfs.html
new file mode 100644
index 00000000000..9527065f7fe
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hg38/fiveUltraUorfs.html
@@ -0,0 +1,150 @@
+

Description

+ +

+This track shows 22,567 upstream open reading frames (uORFs) in the 5' untranslated regions (5' UTRs)
+of human protein-coding genes, compiled as part of the
+5ULTRA pipeline for annotating
+5' UTR variants. The uORFs are defined on
+MANE Select transcripts,
+which provide a single well-supported, clinically relevant transcript per gene matched between
+Ensembl/GENCODE and RefSeq. Only ATG-initiated uORFs are included.
+

+ +

+uORFs are short open reading frames found upstream of the main protein-coding sequence. When a
+ribosome scans from the 5' cap, it may translate a uORF before reaching the main start codon,
+which often reduces production of the downstream protein. Genetic variants that create, disrupt,
+or alter uORFs can therefore change protein output and contribute to disease, particularly when they
+affect genes where tight translational control is critical.
+

+ +

+Each uORF is classified into one of three types based on the position of its stop codon relative
+to the main CDS start: Non-Overlapping, Overlapping, or N-terminal extension.
+

+ + +
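The three types can be told apart from transcript coordinates alone. The sketch below is illustrative and is not the 5ULTRA implementation. It assumes 0-based, half-open coordinates on the spliced transcript, and it assumes the commonly used definitions: a Non-Overlapping uORF ends at or before the main CDS start, an N-terminal extension runs into the CDS in the same reading frame, and an Overlapping uORF runs into the CDS out of frame. The function and parameter names are hypothetical.

```python
def classify_uorf(uorf_start: int, uorf_end: int, cds_start: int) -> str:
    """Classify a uORF relative to the main CDS start.

    Assumptions of this sketch: coordinates are 0-based, half-open positions
    on the spliced transcript; uorf_end is the position just past the stop
    codon; cds_start is the first base of the main start codon.
    """
    if uorf_start >= cds_start:
        raise ValueError("a uORF must start upstream of the main CDS start")
    if (uorf_end - uorf_start) % 3 != 0:
        raise ValueError("uORF length must be a multiple of 3")

    # Stop codon ends at or before the main start codon: no overlap.
    if uorf_end <= cds_start:
        return "Non-Overlapping"
    # The uORF runs into the CDS; same frame means it extends the protein.
    if (cds_start - uorf_start) % 3 == 0:
        return "N-terminal extension"
    return "Overlapping"
```

For example, with a main CDS starting at position 100, `classify_uorf(10, 40, 100)` returns `"Non-Overlapping"` and `classify_uorf(11, 131, 100)` returns `"Overlapping"`.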

Display Conventions and Configuration

+ +

+Items are colored by Kozak consensus strength, using the same color scheme as all other subtracks
+in this collection:
+

+

+ Strong – A/G at position −3 and G at position +4
+ Moderate – only one of those two positions matches
+ Weak – neither position matches
+ no context – Kozak context not available +
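The Kozak categories above follow a simple two-position rule, which can be written out as a minimal sketch. The function name, the 0-based indexing, and the treatment of a start codon too close to the sequence edge as "no context" are assumptions made for illustration. This is not the code used to build the track.

```python
def kozak_strength(seq: str, atg_index: int) -> str:
    """Return the Kozak strength category for the ATG at seq[atg_index].

    Position -3 is three bases upstream of the A in ATG; position +4 is the
    base immediately after the G in ATG (the A is +1, there is no position 0).
    """
    seq = seq.upper()
    if seq[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at atg_index")

    # Assumption: flanking bases missing from the sequence mean "no context".
    if atg_index < 3 or atg_index + 3 >= len(seq):
        return "no context"

    minus3_ok = seq[atg_index - 3] in ("A", "G")
    plus4_ok = seq[atg_index + 3] == "G"

    # 0 matches -> Weak, 1 match -> Moderate, 2 matches -> Strong.
    return ("Weak", "Moderate", "Strong")[minus3_ok + plus4_ok]
```

With the sequence `GCCACCATGG`, the ATG sits at index 6, position −3 is `A` and position +4 is `G`, so `kozak_strength("GCCACCATGG", 6)` returns `"Strong"`.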

+ +

+Because all uORFs in this set are ATG-initiated, the non-ATG category does not apply here.
+uORF type (Non-Overlapping, Overlapping, N-terminal extension) is shown in the mouseover and
+can be used as a filter.
+

+ +

+The exon/intron structure is projected from the overlapping MANE Select transcript so that uORFs
+spanning multiple exons are drawn correctly. If no suitable MANE transcript could provide intron
+boundaries at the exact uORF endpoints, the GENCODE comprehensive annotation is used as a fallback.
+The source transcript ID is recorded in the intronsSource field
+(none if no host transcript was found in either pool).
+
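The projection with fallback can be pictured with a small sketch. This is not the logic of addIntrons.py, which is not reproduced here. It is a simplified illustration that assumes one candidate host per pool, 0-based half-open genomic intervals, and a pool label in place of the transcript ID that the track stores in intronsSource. All names are hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # 0-based, half-open genomic interval


def project_blocks(uorf: Interval, exons: List[Interval]) -> Optional[List[Interval]]:
    """Clip a uORF span to a host transcript's exons.

    Returns None when either uORF endpoint falls outside the host's exons,
    i.e. the host cannot provide boundaries at the exact uORF endpoints.
    """
    start, end = uorf
    blocks = []
    for ex_start, ex_end in sorted(exons):
        lo, hi = max(start, ex_start), min(end, ex_end)
        if lo < hi:
            blocks.append((lo, hi))
    if not blocks or blocks[0][0] != start or blocks[-1][1] != end:
        return None
    return blocks


def add_introns(uorf: Interval, mane_exons: List[Interval],
                gencode_exons: List[Interval]) -> Tuple[List[Interval], str]:
    """Try the MANE host first, then the GENCODE fallback."""
    for source, exons in (("MANE", mane_exons), ("GENCODE", gencode_exons)):
        blocks = project_blocks(uorf, exons)
        if blocks is not None:
            return blocks, source
    # No usable host in either pool: keep the feature as a single block.
    return [uorf], "none"
```

A uORF spanning `(150, 420)` on a host with exons `(100, 200)` and `(400, 500)` yields the two blocks `(150, 200)` and `(400, 420)`, with the intron between them implied.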

+ +

+Mouseover shows the gene symbol, uORF type, rank within the gene, Kozak strength, and the donor
+transcript used for intron structure. The track can be filtered by uORF type and by Kozak strength
+(Strong, Moderate, Weak).
+

+ +

Data Access

+ +

+The data can be explored interactively in table format with the
+Table Browser or the
+Data Integrator and exported from there to
+spreadsheet or tab-separated tables. From scripts, the data can be accessed through our
+API with the parameter track=fiveUltraUorfs.
+
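As a minimal sketch of scripted access, the snippet below builds a request against the UCSC REST API `getData/track` endpoint using only the Python standard library. The helper names are hypothetical, and the layout of the returned JSON is not described on this page, so the result should be inspected before relying on any particular key.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(chrom: str, start: int, end: int,
              genome: str = "hg38", track: str = "fiveUltraUorfs") -> str:
    """Build the request URL for one genomic region of the track."""
    query = urlencode({
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    })
    return f"{API_BASE}?{query}"


def fetch_track(chrom: str, start: int, end: int) -> dict:
    """Download and decode the JSON response (requires network access)."""
    with urlopen(track_url(chrom, start, end)) as response:
        return json.load(response)
```

Calling `fetch_track("chr21", 0, 100000000)` would request all items on chr21 within that range. Only `track_url` runs without a network connection.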

+ +

+For automated download and analysis, the genome annotation is stored in a bigBed file that can be
+downloaded from
+our download server.
+The file for this track is called fiveUltraUorfs.bb.
+Individual regions or the whole genome annotation can be obtained using our tool
+bigBedToBed, which can be compiled from the source code or downloaded as a precompiled
+binary for your system. Instructions for downloading source code and binaries can be found
+here.
+The tool can also be used to obtain features within a given range, e.g.
+

+bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncOrfs/fiveUltraUorfs/fiveUltraUorfs.bb -chrom=chr21 -start=0 -end=100000000 stdout + +
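The same command can be driven from a script. The sketch below wraps the bigBedToBed call shown above with Python's `subprocess` module and splits each output line into fields. It assumes that `bigBedToBed` is on the PATH and that the file uses a BED12 layout followed by extra columns. The order of the extra fields (gene, rank, Kozak strength, intronsSource) is not stated here, so they are kept as an unparsed list. Helper names are hypothetical.

```python
import subprocess

BB_URL = ("http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncOrfs/"
          "fiveUltraUorfs/fiveUltraUorfs.bb")


def bigbed_command(chrom: str, start: int, end: int, url: str = BB_URL) -> list:
    """Build the bigBedToBed argument list for one region."""
    return ["bigBedToBed", url,
            f"-chrom={chrom}", f"-start={start}", f"-end={end}", "stdout"]


def parse_bed_line(line: str) -> dict:
    """Split one tab-separated BED line; columns past 12 are kept as-is."""
    fields = line.rstrip("\n").split("\t")
    return {
        "chrom": fields[0],
        "start": int(fields[1]),
        "end": int(fields[2]),
        "name": fields[3],
        "extra": fields[12:],  # assumption: BED12 followed by extra fields
    }


def fetch_region(chrom: str, start: int, end: int) -> list:
    """Run bigBedToBed and parse its output (needs the binary and network)."""
    result = subprocess.run(bigbed_command(chrom, start, end),
                            capture_output=True, text=True, check=True)
    return [parse_bed_line(line) for line in result.stdout.splitlines() if line]
```

`fetch_region("chr21", 0, 100000000)` corresponds to the command line above.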

+The original uORF reference set is distributed with the
+5ULTRA software package
+and can be obtained by installing the package and running 5ULTRA-download-data.
+

+ +

Methods

+ +

+Chaldebas et al. compiled a reference set of uORFs from two sources: the Ribo-uORF database
+(501,554 translated uORFs supported by 1,495 ribosome profiling datasets) and the uORFdb database
+of computationally predicted uORFs. Only ATG-initiated uORFs were retained.
+These were mapped to the 5' UTRs of 18,775 MANE Select protein-coding transcripts from
+GENCODE v45 basic annotation, yielding 22,567 uORFs. Of these, 8,067 (35.7%) have direct
+ribosome profiling support. Each uORF was classified as Non-Overlapping, Overlapping, or
+N-terminal extension based on the relationship between its stop codon and the main CDS start.
+See Chaldebas et al. 2026 for full details.
+

+ +

+The uORF reference BED file was obtained from the
+5ULTRA GitHub repository
+(installed via 5ULTRA-download-data; file uORFs.MANE.hg38.bed).
+Colors were remapped to the Okabe-Ito colorblind-safe palette. To recover exon/intron structure
+for uORFs that span introns, each feature was projected onto an overlapping MANE Select transcript
+using the addIntrons.py script in the kent source tree; a GENCODE v49 comprehensive
+bigBed was used as fallback when no MANE candidate could provide intact boundaries at the uORF
+endpoints. Of 22,567 uORFs, 3,861 received multi-block exon structure from MANE, 9 from GENCODE
+fallback, and 45 overlapped a GENCODE transcript but had no introns within the uORF range;
+18,652 remain as single-exon features whose genomic span contains no intron from any known host.
+Build steps are recorded in the
+makedoc;
+the processing scripts are at
+src/hg/makeDb/scripts/ncOrfs/.
+

+ +

Credits

+ +

+Thanks to Matthieu Chaldebas and the 5ULTRA team for making the uORF reference data
+publicly available as part of the 5ULTRA package.
+

+ +

References

+ +

+Chaldebas M, Ponsin K, Bohlen J, Conil C, Mourelatos H, Stenson PD, Cooper DN, Abel L, Casanova JL,
+Cobat A et al.
+
+Genome-wide detection of human 5' UTR variants that impact protein translation.
+Am J Hum Genet. 2026 Apr 2;113(4):809-827.
+PMID: 41881026; PMC: PMC13087467
+