e9f1b6e4a64d031a574ec74e19b37b2309fd019c jnavarr5 Tue Jun 23 14:00:08 2026 -0700
Announcing the non-canonical ORFs track for hg38, refs #35101

diff --git src/hg/htdocs/goldenPath/newsarch.html src/hg/htdocs/goldenPath/newsarch.html
index 0a44fd8f6d3..0d6deedc184 100755
--- src/hg/htdocs/goldenPath/newsarch.html
+++ src/hg/htdocs/goldenPath/newsarch.html
@@ -52,30 +52,127 @@
You can sign up to get these announcements via our Genome-announce email list. We send around one short announcement email every two weeks.
Smaller software changes are not announced here. A summary of the three-weekly release changes can be found here. For the full list of our daily code changes head to our GitHub page. Lastly, see our credits page for acknowledgments of the data we host.
+We are pleased to announce a new Non-canonical ORFs track collection on the human genome assembly (GRCh38/hg38), bringing together several public databases of open reading frames (ORFs) that fall outside of the annotated protein-coding genes. While the human genome has roughly 20,000 annotated protein-coding genes, ribosome profiling (Ribo-seq) and proteomics have revealed widespread translation of ORFs in regions long considered non-coding, including 5' and 3' UTRs, long non-coding RNAs, pseudogenes, and alternative reading frames of known genes.
+These non-canonical ORFs include upstream ORFs (uORFs) in 5' UTRs, which can regulate translation of the downstream coding sequence; small ORFs (sORFs), generally under 100 codons, many of which produce functional micropeptides; downstream ORFs (dORFs) in 3' UTRs; out-of-frame ORFs that overlap known coding sequence in an alternative frame; and ORFs in transcripts annotated as non-coding RNAs or pseudogenes. The collection gathers the following datasets as individual subtracks:
+Every ORF in every subtrack is annotated with the strength of its Kozak sequence, the sequence context around the start codon that governs how efficiently translation initiates. Features are colored by a categorical Kozak label:
+Each subtrack offers filters for start codon, Kozak strength, and a numeric Kozak translational-efficiency score, along with dataset-specific filters such as ORF type and evidence category.
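To illustrate what a categorical Kozak label can look like, here is a minimal Python sketch of one commonly used convention: a purine (A or G) at position -3 and a G at position +4 relative to the start codon. The function name `kozak_strength`, the 7-nucleotide window, and the three labels are assumptions made for this example; the announcement does not specify the rule the track uses, and the actual implementation may differ.

```python
def kozak_strength(context: str) -> str:
    """Classify the Kozak context of a start codon (illustrative only).

    `context` is a hypothetical 7-nt window on the transcript strand:
    3 nt upstream, the 3-nt start codon, and 1 nt downstream.
    Index 0 is position -3 and index 6 is position +4.
    """
    context = context.upper()
    if len(context) != 7:
        raise ValueError("expected 7 nucleotides: NNN + start codon + N")

    # Assumed convention: the two positions that matter most are
    # a purine at -3 and a G at +4.
    purine_at_minus3 = context[0] in "AG"
    g_at_plus4 = context[6] == "G"

    # Count how many of the two positions match (0, 1, or 2).
    matches = int(purine_at_minus3) + int(g_at_plus4)
    return ("weak", "moderate", "strong")[matches]
```

Under these assumptions, `kozak_strength("ACCATGG")` has both matching positions, while a window with neither a purine at -3 nor a G at +4 falls into the weakest category. The sketch does not inspect the start codon itself, so it applies equally to non-ATG starts.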
+See the Non-canonical ORFs collection page and the individual subtrack description pages for per-dataset methods, item counts, download URLs, and references.
+We would like to thank the data providers who made these resources publicly available: Xiaolei Zhang, Nicola Whiffin, and the UTRannotator team at Imperial College London; Jonathan Mudge, Jorge Ruiz-Orera, John Prensner, Sebastiaan van Heesch, and the GENCODE / TransCODE consortium; Matthieu Chaldebas and the 5ULTRA team; Tamara Ouspenskaia, Travis Law, Karl Clauser, and colleagues at the Broad Institute of MIT and Harvard for nuORFdb; the MetamORF team at the TAGC laboratory, Aix-Marseille University; and Xavier Roucou and the OpenProt team at the Université de Sherbrooke. We also thank Eric Malekos (UCSC) for suggesting nuORFdb, and the VuTR authors (Whiffin lab) for the Kozak-strength implementation. Finally, we would like to thank Max Haeussler and Jairo Navarro for the creation and release of these UCSC Genome Browser tracks.
+We are pleased to announce a new MPRAs container track on the human genome assembly (GRCh38/hg38), gathering results from massively parallel reporter assays (MPRAs). MPRAs are high-throughput methods that measure the regulatory activity of thousands of candidate DNA sequences in parallel by linking each fragment to a barcoded reporter gene and quantifying the resulting reporter RNA.
The container brings together two complementary data