55e909c0e98fb50a5cd761f1ce2cb52f9089f5f4 max Tue Jun 2 03:05:59 2026 -0700

[Claude] ncOrfs: add 5ULTRA uORFs subtrack (MANE Select, 22,567 features)

Adds fiveUltraUorfs, a new subtrack under the ncOrfs supertrack showing 22,567
ATG-initiated uORFs in MANE Select transcripts from the 5ULTRA pipeline
(Chaldebas et al., Am J Hum Genet 2026, PMID 41881026). Features are colored by
uORF type (Okabe-Ito palette), have exon/intron structure projected from MANE
via addIntrons.py, and carry gene, rank, and Kozak strength as extra bigBed
fields. ncOrfs.html summary table updated to include the new track.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

refs #37580

diff --git src/hg/makeDb/trackDb/human/hg38/fiveUltraUorfs.html src/hg/makeDb/trackDb/human/hg38/fiveUltraUorfs.html
new file mode 100644
index 00000000000..9527065f7fe
--- /dev/null
+++ src/hg/makeDb/trackDb/human/hg38/fiveUltraUorfs.html
@@ -0,0 +1,150 @@
+<h2>Description</h2>
+
+<p>
+This track shows 22,567 upstream open reading frames (uORFs) in the 5' untranslated regions (5' UTRs)
+of human protein-coding genes, compiled as part of the
+<a href="https://github.com/mchaldebas/5ULTRA" target="_blank">5ULTRA</a> pipeline for annotating
+5' UTR variants. The uORFs are defined on
+<a href="../cgi-bin/hgTrackUi?db=hg38&g=mane">MANE Select</a> transcripts,
+which provide a single well-supported, clinically relevant transcript per gene matched between
+Ensembl/GENCODE and RefSeq. Only ATG-initiated uORFs are included.
+</p>
+
+<p>
+uORFs are short open reading frames found upstream of the main protein-coding sequence. When a
+ribosome scans from the 5' cap, it may translate a uORF before reaching the main start codon,
+which often reduces production of the downstream protein. Genetic variants that create, disrupt,
+or alter uORFs can therefore change protein output and contribute to disease, particularly when they
+affect genes where tight translational control is critical.
+</p>
+
+<p>
+Each uORF is classified into one of three types based on the position of its stop codon relative
+to the main CDS start:
+</p>
+<ul>
+  <li>A <b>non-overlapping</b> uORF has its stop codon upstream of the main CDS start. After
+      the ribosome finishes translating the uORF, it may re-initiate at the CDS or disengage,
+      reducing overall protein output.</li>
+  <li>An <b>overlapping</b> uORF has its stop codon within the CDS but in a different reading
+      frame. The ribosome traverses the CDS start codon without recognizing it, which prevents
+      initiation of the main protein and typically causes a stronger inhibitory effect.</li>
+  <li>An <b>N-terminal extension</b> is an upstream ATG that is in-frame with the main CDS and
+      has no intervening stop codon. The resulting protein has an extended N-terminal sequence.
+      Variants that convert an overlapping uORF into an N-terminal extension change the protein
+      product rather than merely suppressing translation.</li>
+</ul>
+
+<h2>Display Conventions and Configuration</h2>
+
+<p>
+Items are colored by Kozak consensus strength, using the same color scheme as all other subtracks
+in this collection:
+</p>
+<p>
+<span style="display:inline-block; background-color:#F5A623; width:18px; height:12px; vertical-align:middle;"></span> <b>Strong</b> – A/G at position −3 and G at position +4<br>
+<span style="display:inline-block; background-color:#5B9BD5; width:18px; height:12px; vertical-align:middle;"></span> <b>Moderate</b> – only one of those two positions matches<br>
+<span style="display:inline-block; background-color:#A9A9A9; width:18px; height:12px; vertical-align:middle;"></span> <b>Weak</b> – neither position matches<br>
+<span style="display:inline-block; background-color:#D3D3D3; width:18px; height:12px; vertical-align:middle;"></span> <b>no context</b> – Kozak context not available
+</p>
+
+<p>
+Because all uORFs in this set are ATG-initiated, the non-ATG category of that color scheme does not apply here.
+uORF type (Non-Overlapping, Overlapping, N-terminal extension) is shown in the mouseover and
+can be used as a filter.
+</p>
+
+<p>
+The exon/intron structure is projected from the overlapping MANE Select transcript so that uORFs
+spanning multiple exons are drawn correctly. If no suitable MANE transcript can provide intron
+boundaries at the exact uORF endpoints, the GENCODE comprehensive annotation is used as a fallback.
+The source transcript ID is recorded in the <b>intronsSource</b> field
+(<tt>none</tt> if no host transcript was found in either annotation set).
+</p>
+
+<p>
+Mouseover shows the gene symbol, uORF type, rank within the gene, Kozak strength, and the source
+transcript used for intron structure. The track can be filtered by uORF type and by Kozak strength
+(Strong, Moderate, Weak).
+</p>
+
+<h2>Data Access</h2>
+
+<p>
+The data can be explored interactively in table format with the
+<a href="../cgi-bin/hgTables">Table Browser</a> or the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a> and exported from there to
+spreadsheet or tab-separated tables. From scripts, the data can be accessed through our
+<a href="https://api.genome.ucsc.edu">API</a>, using track=<i>fiveUltraUorfs</i>.
+</p>
+
+<p>
+For automated download and analysis, the genome annotation is stored in a bigBed file that can be
+downloaded from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncOrfs/fiveUltraUorfs/"
+target="_blank">our download server</a>.
+The file for this track is called <tt>fiveUltraUorfs.bb</tt>.
+Individual regions or the whole genome annotation can be obtained using our tool
+<tt>bigBedToBed</tt>, which can be compiled from the source code or downloaded as a precompiled
+binary for your system. Instructions for downloading source code and binaries can be found
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>.
+The tool can also be used to obtain features within a given range, e.g.
+</p>
+<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/ncOrfs/fiveUltraUorfs/fiveUltraUorfs.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>
+
+<p>
+The original uORF reference set is distributed with the
+<a href="https://github.com/mchaldebas/5ULTRA" target="_blank">5ULTRA software package</a>
+and can be obtained by installing the package and running <tt>5ULTRA-download-data</tt>.
+</p>
+
+<h2>Methods</h2>
+
+<p>
+Chaldebas et al. compiled a reference set of uORFs from two sources: the Ribo-uORF database
+(501,554 translated uORFs supported by 1,495 ribosome profiling datasets) and the uORFdb database
+of computationally predicted uORFs. Only ATG-initiated uORFs were retained.
+These were mapped to the 5' UTRs of 18,775 MANE Select protein-coding transcripts from
+GENCODE v45 basic annotation, yielding 22,567 uORFs. Of these, 8,067 (35.7%) have direct
+ribosome profiling support. Each uORF was classified as Non-Overlapping, Overlapping, or
+N-terminal extension based on the relationship between its stop codon and the main CDS start.
+See Chaldebas et al. 2026 for full details.
+</p>
+
+<p>
+The uORF reference BED file was obtained from the
+<a href="https://github.com/mchaldebas/5ULTRA" target="_blank">5ULTRA GitHub repository</a>
+(installed via <tt>5ULTRA-download-data</tt>; file <tt>uORFs.MANE.hg38.bed</tt>).
+Colors were remapped to the Okabe-Ito colorblind-safe palette. To recover exon/intron structure
+for uORFs that span introns, each feature was projected onto an overlapping MANE Select transcript
+using the <tt>addIntrons.py</tt> script in the kent source tree; a GENCODE v49 comprehensive
+bigBed was used as fallback when no MANE candidate could provide intact boundaries at the uORF
+endpoints. Of 22,567 uORFs, 3,861 received multi-block exon structure from MANE, 9 from GENCODE
+fallback, and 45 overlapped a GENCODE transcript but had no introns within the uORF range;
+18,652 remain as single-exon features whose genomic span contains no intron from any known host.
+Build steps are recorded in the
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/ncOrfs.txt"
+target="_blank">makedoc</a>;
+the processing scripts are at
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/ncOrfs"
+target="_blank">src/hg/makeDb/scripts/ncOrfs/</a>.
+</p>
+
+<h2>Credits</h2>
+
+<p>
+Thanks to Matthieu Chaldebas and the 5ULTRA team for making the uORF reference data
+publicly available as part of the 5ULTRA package.
+</p>
+
+<h2>References</h2>
+
+<p>
+Chaldebas M, Ponsin K, Bohlen J, Conil C, Mourelatos H, Stenson PD, Cooper DN, Abel L, Casanova JL,
+Cobat A <em>et al</em>.
+<a href="https://linkinghub.elsevier.com/retrieve/pii/S0002-9297(26)00106-0" target="_blank">
+Genome-wide detection of human 5' UTR variants that impact protein translation</a>.
+<em>Am J Hum Genet</em>. 2026 Apr 2;113(4):809-827.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/41881026" target="_blank">41881026</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13087467/" target="_blank">PMC13087467</a>
+</p>