66604eefdbb0971ed43ee81bd6f18cc3a409e304 markd Sat Jan 10 13:18:44 2026 -0800 color-code clsLongReadRna models (#36908)
diff --git src/hg/makeDb/trackDb/mouse/mm10/clsLongReadRna.html src/hg/makeDb/trackDb/mouse/mm10/clsLongReadRna.html
index c1ef1be9e84..8f75d75869d 100644
--- src/hg/makeDb/trackDb/mouse/mm10/clsLongReadRna.html
+++ src/hg/makeDb/trackDb/mouse/mm10/clsLongReadRna.html
@@ -1,107 +1,122 @@
 <h2>Description</h2>
 <p>
   These tracks represent the results of targeted long-read RNA sequencing aimed
   at identifying lowly expressed lncRNAs in adult and embryonic tissues. The
   track consists of capture target regions, mappings of pre- and post-capture
   reads, and transcript models built from the data.
 </p>
 <p>
   Portions of this dataset were used to develop the lncRNA annotations
-  introduced in GENCODE v47. The data are a superset of the data incorporated
+  introduced in GENCODE VM36. The data are a superset of the data incorporated
   into GENCODE. The transcript models for a given RNA do not necessarily match
   those in GENCODE and are provided as a guide to exploring the sequencing data.
+  Note that GENCODE VM36 is released on GRCm39/mm39, while these tracks are on
+  GRCm38/mm10.
 </p>
 <p>
   Detailed descriptions of the data are available at the
   <a href="https://github.com/guigolab/CLS3_GENCODE" target="_blank">GENCODE CLS Project</a> site.</p>
 <h2>Display Conventions and Configuration</h2>
 <p>
   This is a multi-view composite track containing multiple data types (views).
   Each view includes subtracks that are displayed individually in the browser.
   Instructions for configuring multi-view tracks are <a href="/goldenPath/help/multiView.html" target="_blank">here</a>.<br><br>
   <b>Views:</b><br>
 <ul>
   <li><b>Targets:</b> Capture target regions</li>
   <li><b>Models:</b> Transcript models generated from reads and then merged</li>
   <li><b>Sample models:</b> Transcript models by the sample in which they were observed</li>
   <li><b>Per-experiment reads:</b> Read mappings per experiment</li>
   <li><b>Per-experiment models:</b> Transcript models generated from the experiments</li>
 </ul></p>
+<p><b>Model Color Coding</b><br>
+Models are color-coded by whether they were incorporated into GENCODE VM36
+and, if so, by their assigned GENCODE VM36 biotype. Note that VM36 is annotated
+on GRCm39/mm39, not on the mm10 assembly.
+</p>
+<ul>
+  <li style="color: rgb(12,12,120);"><b>coding</b></li>
+  <li style="color: rgb(0,100,0);"><b>non-coding</b></li>
+  <li style="color: rgb(255,51,255);"><b>pseudogene</b></li>
+  <li style="color: rgb(254,0,0);"><b>to be experimentally confirmed (TEC)</b></li>
+  <li style="color: rgb(255,160,122);"><b>Not incorporated into GENCODE VM36</b></li>
+</ul>
+
 <h2>Methods</h2>
 <p>
   This project, led by the <a href="https://www.gencodegenes.org/" target="_blank">GENCODE consortium</a>,
   employed the Capture Long-read Sequencing (CLS) protocol to enrich transcripts from targeted
   genomic regions. It used a large capture array with orthologous probes in the human and mouse
   genomes, targeting non-GENCODE lncRNA annotations and regions suspected of unannotated
   transcription. CapTrap-seq, a cDNA library preparation protocol, was used to enrich for
   full-length RNA molecules (5′ to 3′).
 </p>
 <p>
   Matched adult and embryonic tissues from human and mouse were selected to maximize
   transcriptome complexity. Libraries were sequenced pre- and post-capture using PacBio and
   Oxford Nanopore Technologies (ONT) long-read platforms, as well as short-read technologies.
 </p>
 <p>
   Transcript isoform models were built from reads using the LyRic analysis software.
   These models were then merged based on their intron chains, with transcription start and end
   sites anchored using CAGE and poly(A) data.
 </p>
 <p>
   Data and metadata are available from ArrayExpress entry
   <a href="https://www.ebi.ac.uk/biostudies/ArrayExpress/studies/E-MTAB-14562" target="_blank">E-MTAB-14562</a>.
 </p>
 <h2>Credits</h2>
 <p>
   This dataset was developed by the
   <a href="https://www.crg.eu/roderic_guigo" target="_blank">Guigó Lab, Centre for Genomic Regulation (CRG)</a>
   and the <a href="https://www.gencodegenes.org/" target="_blank">GENCODE consortium</a>.<br>
   The track set was constructed by Sílvia Carbonell-Sala, Andrea Tanzer, and Mark Diekhans.</p>
 <h2>References</h2>
 <p>
   Kaur G, Perteghella T, Carbonell-Sala S, Gonzalez-Martinez J, Hunt T, Mądry T, Jungreis I,
   Arnan C, Lagarde J, Borsari B <em>et al</em>.
   <a href="https://doi.org/10.1101/2024.10.29.620654" target="_blank">
   GENCODE: massively expanding the lncRNA catalog through capture long-read RNA sequencing</a>.
   <em>bioRxiv</em>. 2024 Oct 31.
   PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39554180" target="_blank">39554180</a>;
   PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11565817/" target="_blank">PMC11565817</a>
 </p>
 <p>
   Mudge JM, Carbonell-Sala S, Diekhans M, Martinez JG, Hunt T, Jungreis I, Loveland JE, Arnan C,
   Barnes I, Bennett R <em>et al</em>.
   <a href="https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkae1078" target="_blank">
   GENCODE 2025: reference gene annotation for human and mouse</a>.
   <em>Nucleic Acids Res</em>. 2025 Jan 6;53(D1):D966-D975.
   PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39565199" target="_blank">39565199</a>;
   PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11701607/" target="_blank">PMC11701607</a>
 </p>
 <p>
   Pardo-Palacios FJ, Wang D, Reese F, Diekhans M, Carbonell-Sala S, Williams B, Loveland JE,
   De María M, Adams MS, Balderrama-Gutierrez G <em>et al</em>.
   <a href="https://doi.org/10.1038/s41592-024-02298-3" target="_blank">
   Systematic assessment of long-read RNA-seq methods for transcript identification and quantification</a>.
   <em>Nat Methods</em>. 2024 Jul;21(7):1349-1363.
   PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/38849569" target="_blank">38849569</a>;
   PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11543605/" target="_blank">PMC11543605</a>
 </p>
 <p>
   Carbonell-Sala S, Perteghella T, Lagarde J, Nishiyori H, Palumbo E, Arnan C, Takahashi H,
   Carninci P, Uszczynska-Ratajczak B, Guigó R.
   <a href="https://doi.org/10.1038/s41467-024-49523-3" target="_blank">
   CapTrap-seq: a platform-agnostic and quantitative approach for high-fidelity full-length RNA sequencing</a>.
   <em>Nat Commun</em>. 2024 Jun 27;15(1):5278.
   PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/38937428" target="_blank">38937428</a>;
   PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11211341/" target="_blank">PMC11211341</a>
 </p>
 <p>
   <em>LyRic</em>: Long RNA-seq analysis workflow
   <a href="https://github.com/guigolab/LyRic">https://github.com/guigolab/LyRic</a>
 </p>