888e7470c14eeecdca310ed36bb45c3c00ae8052 lrnassar Tue Apr 21 15:14:04 2026 -0700 QA fixes for MPRA superTrack. refs #37359

Fix the broken mpraVarDb bigDataUrl: it pointed at /gbdb/hg38/mpra/mpravardb.bb, but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, so hgTrackDb -strict silently dropped the subtrack.

Rebuild mpravardb.bb after two fixes in mpravardbToBed.py. First, sanitize UTF-8 characters that the browser does not transcode (curly quotes, primes, NBSP mojibake) in user-visible string fields, eliminating ~246k non-ASCII occurrences across 42% of rows. Second, change safe_float / pval_to_score to write NaN and return score 0 for NA or out-of-range p-values, instead of writing 0.0 and returning score 1000; the old behavior inflated untested variants to the top of score-sorted views.

trackDb stanza cleanup: shorten the mpraVarDb longLabel, drop the superfluous "type bed 4" from the superTrack, make "bigBed 9+13" explicit, remove the redundant mouseOverField, align "parent mpra on", add filterValues for cell_line/assay/cellLine and filterByRange sliders for percentile_rank / fdr / log2FC, and add labelFields and maxWindowToDraw.

Description pages: add a cross-species disclosure (mouse reporter cells were used to assay human sequences), update the mpraVarDb header to the post-liftOver count of 239,028 with a Studies-table footnote, fix the download-server paths in mpraVarDb.html, and soften the imprecise "51 MPRA experiments" claim in mpra.html and mprabase.html.

relatedTracks.ra: add reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs entries.

Expand the mpra.txt makedoc with upstream provenance and a QA-rebuild log.

diff --git src/hg/makeDb/trackDb/human/hg38/mpra.html src/hg/makeDb/trackDb/human/hg38/mpra.html
index 2d9d9aefb98..fef166bcacf 100644
--- src/hg/makeDb/trackDb/human/hg38/mpra.html
+++ src/hg/makeDb/trackDb/human/hg38/mpra.html
@@ -1,62 +1,71 @@
 <h2>Description</h2>
 <p>
 Massively Parallel Reporter Assays (MPRAs) are high-throughput experimental methods that measure transcriptional output of thousands of short DNA sequences using sequencing.
 If in addition, a mutated sequence is tested, the impact of a genetic variant can be quantified.
 </p>
 <p>
 This track collection brings together results from two MPRA databases, one for the complete sequence fragments, one for the impact of variants in selected fragments:
 </p>
+<p>
+<b>Note on cell lines:</b> The cell line shown for each element or variant is the
+reporter cell line in which the human sequence was assayed. Several studies used
+mouse cell lines (e.g. Neuro-2a, N2A, NIH/3T3, MIN6, mESC) as reporter systems
+for human regulatory sequences; all items retain human (hg38) coordinates.
+</p>
+
 <ul>
 <li><b><a href="hgTrackUi?g=mprabase">MPRA Base</a></b> —
-41,275 experimentally tested cis-regulatory elements from 51 MPRA, STARR-seq,
-and related reporter assay experiments, curated in the MPRA Base database
+41,275 experimentally tested cis-regulatory elements curated from the MPRA Base
+database, which integrates MPRA, STARR-seq, and related reporter assay
+experiments across many cell types and conditions
 (<a href="https://pubmed.ncbi.nlm.nih.gov/38045264/" target="_blank">Zhao et al., 2023</a>).
 </li>
 <li><b><a href="hgTrackUi?g=mpraVarDb">MPRAVarDB</a></b> —
-242,818 variants from 18 MPRA studies, tested for effects on transcriptional
-regulatory activity across over 30 cell lines and 30 human diseases and traits
+239,028 variants mapped to hg38 (of 242,818 total) from 18 MPRA studies, tested
+for effects on transcriptional regulatory activity across over 30 cell lines and
+30 human diseases and traits
 (<a href="https://pubmed.ncbi.nlm.nih.gov/38617248/" target="_blank">Wang et al., 2024</a>).
 </li>
 </ul>
 <h2>Data Access</h2>
 <p>
 See the individual subtrack documentation pages linked above for detailed information on how to download and intersect the annotations.
 </p>
 <h2>Credits</h2>
 <p>
 Thanks to Tao Wang and colleagues at the University of Florida for <a href="https://mpravardb.rc.ufl.edu/" target="_blank">MPRAVarDB</a>, and to Varda Singhal and the <a href="https://pharm.ucsf.edu/ahituv" target="_blank">Ahituv Lab</a> at the University of California San Francisco for <a href="http://mprabase.ucsf.edu/app/mprabase" target="_blank">MPRA Base</a>.
 </p>
 <h2>References</h2>
 <p>
 Wang T, Matreyek KA, Yang X.
 <a href="https://pubmed.ncbi.nlm.nih.gov/38617248/" target="_blank">
 MPRAVarDB: an online database and web server for exploring regulatory effects of genetic variants using MPRA data</a>.
 <em>Bioinformatics</em>. 2024 Apr 15;40(4):btae201.
 PMID: <a href="https://pubmed.ncbi.nlm.nih.gov/38617248/" target="_blank">38617248</a>; PMC: <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11014600/" target="_blank">PMC11014600</a>
 </p>
 <p>
 Zhao J, Baltoumas FA, Konnaris MA, Mouratidis I, Liu Z, Sims J, Agarwal V, Pavlopoulos GA, Georgakopoulos-Soares I, Ahituv N.
 <a href="https://doi.org/10.1101/2023.11.19.567742" target="_blank">
 MPRAbase: A Massively Parallel Reporter Assay Database</a>.
 <em>bioRxiv</em>. 2023 Nov 22;.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/38045264" target="_blank">38045264</a>; PMC: <a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10690217/" target="_blank">PMC10690217</a>
 </p>
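The safe_float / pval_to_score fix described in the commit message can be sketched in Python. The real functions in mpravardbToBed.py are not part of this diff, so this is a hypothetical reconstruction: the set of NA spellings, the -log10 scaling, the cap of 10, and the handling of an exact 0 are all assumptions. What it illustrates is the order of the checks. A missing value parses to NaN rather than 0.0, and NaN or out-of-range p-values map to score 0 before any -log10 transform runs. Under the old behavior an NA parsed as 0.0 looks like the most significant p-value possible, which would explain how untested variants reached score 1000.

```python
import math

# Hypothetical sketch: the real safe_float / pval_to_score in mpravardbToBed.py are not
# part of this diff. NA_TOKENS, the -log10 scaling, and the cap are assumptions.

NA_TOKENS = {"", "NA", "N/A", "."}  # assumed spellings of a missing value


def safe_float(value):
    """Parse a field as float, returning NaN (not 0.0) for missing or unparseable input."""
    if value is None:
        return math.nan
    text = str(value).strip()
    if text in NA_TOKENS:
        return math.nan
    try:
        return float(text)
    except ValueError:
        return math.nan


def pval_to_score(pval, cap=10.0):
    """Map a p-value to a 0-1000 BED score; NaN or out-of-range values score 0.

    Assumed scaling: score = 1000 * min(-log10(p), cap) / cap.
    """
    if math.isnan(pval) or pval < 0.0 or pval > 1.0:
        return 0
    if pval == 0.0:
        return 1000  # assumed: an exact 0 (e.g. underflow) counts as maximally significant
    return round(1000 * min(-math.log10(pval), cap) / cap)


def format_float(value):
    """Render a parsed value for the BED line, writing missing values as 'NaN'."""
    return "NaN" if math.isnan(value) else repr(value)


# Usage: an NA p-value now yields NaN in the output and score 0, not 0.0 and score 1000.
p = safe_float("NA")
print(format_float(p), pval_to_score(p))  # NaN 0
```

Keeping NaN in the written field, rather than a sentinel number, means downstream filters such as a filterByRange slider cannot mistake an untested variant for a real measurement.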
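The UTF-8 cleanup in the same commit can likewise be sketched as a replace-then-translate pass. This is an illustration under stated assumptions, not the script's actual code: the table covers only the character classes the commit message names (curly quotes, primes, and the NBSP mojibake produced when the UTF-8 bytes C2 A0 are decoded as Latin-1 or cp1252), and the chosen ASCII replacements are hypothetical. Other non-ASCII characters pass through unchanged. count_non_ascii is a hypothetical helper for before/after tallies like the ~246k figure in the commit message.

```python
# Hypothetical sketch of the string sanitizing step in mpravardbToBed.py; the real
# replacement table is not in this diff, so the mappings below are assumptions.

# Multi-character fixes run first: UTF-8 NBSP (bytes C2 A0) mis-decoded as Latin-1/cp1252
# shows up as "Â" followed by a no-break space.
MOJIBAKE = {
    "\u00c2\u00a0": " ",
}

# Single typographic characters mapped to plain ASCII.
ASCII_MAP = str.maketrans({
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2032": "'",  # prime
    "\u2033": '"',  # double prime
    "\u00a0": " ",  # no-break space
})


def sanitize_field(text):
    """Replace known mojibake and typographic characters with ASCII; leave the rest as-is."""
    for bad, good in MOJIBAKE.items():
        text = text.replace(bad, good)
    return text.translate(ASCII_MAP)


def count_non_ascii(text):
    """Count characters outside 7-bit ASCII, e.g. for before/after QA tallies."""
    return sum(1 for ch in text if ord(ch) > 127)


# Usage
print(sanitize_field("HepG2\u00c2\u00a0cells, 5\u2032 UTR, \u201cenhancer\u201d"))
# HepG2 cells, 5' UTR, "enhancer"
```

The mojibake pass has to run before the translate step; otherwise the NBSP would be converted on its own and the stray "Â" would survive.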