888e7470c14eeecdca310ed36bb45c3c00ae8052
lrnassar
  Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359

Fix broken mpraVarDb bigDataUrl — pointed at /gbdb/hg38/mpra/mpravardb.bb
but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, causing
hgTrackDb -strict to silently drop the subtrack.

Rebuild mpravardb.bb after two fixes in mpravardbToBed.py: sanitize UTF-8
in user-visible string fields (curly quotes, primes, NBSP mojibake) that
the browser does not transcode, eliminating ~246k non-ASCII occurrences
across 42% of rows; and change safe_float / pval_to_score to write NaN
and return score 0 for NA / out-of-range p-values, instead of 0.0 and
score 1000 (the old behavior inflated untested variants to the top of
score-sorted views).
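A minimal Python sketch of the intended behavior follows. The real
mpravardbToBed.py code is not part of this diff, so several parts are
assumptions: the sanitize helper name, its replacement table, the valid
p-value range (0 < p <= 1), and the -log10(p) * 100 score mapping.
Only the function names safe_float and pval_to_score come from the commit.

```python
import math

# Hypothetical ASCII replacement table; the real script's mapping may differ.
_ASCII_MAP = {
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2032": "'", "\u2033": '"',  # prime, double prime
    "\u00a0": " ",                 # non-breaking space
}


def sanitize(text: str) -> str:
    """Replace non-ASCII punctuation the browser does not transcode (hypothetical helper)."""
    # NBSP mojibake: UTF-8 bytes C2 A0 mis-decoded as Latin-1 give "\u00c2\u00a0".
    # Collapse that pair first, so the lone NBSP rule below does not leave a stray "\u00c2".
    text = text.replace("\u00c2\u00a0", " ")
    for src, dst in _ASCII_MAP.items():
        text = text.replace(src, dst)
    return text


def safe_float(value) -> float:
    """Parse a numeric field. NA, empty, or unparseable values become NaN, not 0.0."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def pval_to_score(pval) -> int:
    """Map a p-value to a BED score in 0..1000. NA or out-of-range values score 0."""
    p = safe_float(pval)
    # Assumed valid range is 0 < p <= 1. Anything else is treated as untested.
    if math.isnan(p) or not (0.0 < p <= 1.0):
        return 0
    # Assumed mapping: -log10(p) scaled by 100, capped at the BED maximum of 1000.
    return min(1000, int(round(-math.log10(p) * 100)))
```

Under these assumptions, pval_to_score("NA") returns 0 rather than 1000.
An untested variant therefore sorts to the bottom of score-sorted views.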

trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.

Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.

relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.

Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.

diff --git src/hg/makeDb/trackDb/human/hg38/mpra.html src/hg/makeDb/trackDb/human/hg38/mpra.html
index 2d9d9aefb98..fef166bcacf 100644
--- src/hg/makeDb/trackDb/human/hg38/mpra.html
+++ src/hg/makeDb/trackDb/human/hg38/mpra.html
@@ -1,62 +1,71 @@
 <h2>Description</h2>
 <p>
 Massively Parallel Reporter Assays
 (MPRAs) are high-throughput experimental methods that measure 
 transcriptional output of thousands of short DNA sequences using sequencing. 
 If, in addition, a mutated sequence is tested, the impact of a genetic variant can be quantified.
 </p>
 <p>
 This track collection brings together results from two MPRA databases, one for the complete sequence fragments,
 one for the impact of variants in selected fragments:
 </p>
 
+<p>
+<b>Note on cell lines:</b> The cell line shown for each element or variant is the
+reporter cell line in which the human sequence was assayed. Several studies used
+mouse cell lines (e.g. Neuro-2a, N2A, NIH/3T3, MIN6, mESC) as reporter systems
+for human regulatory sequences; all items retain human (hg38) coordinates.
+</p>
+
 <ul>
 <li><b><a href="hgTrackUi?g=mprabase">MPRA Base</a></b> &mdash;
-41,275 experimentally tested cis-regulatory elements from 51 MPRA, STARR-seq,
-and related reporter assay experiments, curated in the MPRA Base database
+41,275 experimentally tested cis-regulatory elements curated from the MPRA Base
+database, which integrates MPRA, STARR-seq, and related reporter assay
+experiments across many cell types and conditions
 (<a href="https://pubmed.ncbi.nlm.nih.gov/38045264/" target="_blank">Zhao et al., 2023</a>).
 </li>
 <li><b><a href="hgTrackUi?g=mpraVarDb">MPRAVarDB</a></b> &mdash;
-242,818 variants from 18 MPRA studies, tested for effects on transcriptional
-regulatory activity across over 30 cell lines and 30 human diseases and traits
+239,028 variants mapped to hg38 (of 242,818 total) from 18 MPRA studies, tested
+for effects on transcriptional regulatory activity across over 30 cell lines and
+30 human diseases and traits
 (<a href="https://pubmed.ncbi.nlm.nih.gov/38617248/" target="_blank">Wang et al., 2024</a>).
 </li>
 </ul>
 
 <h2>Data Access</h2>
 <p>
 See the individual subtrack documentation pages linked above for detailed information
 on how to download and intersect the annotations.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to Tao Wang and colleagues at the University of Florida for
 <a href="https://mpravardb.rc.ufl.edu/" target="_blank">MPRAVarDB</a>,
 and to Varda Singhal and the
 <a href="https://pharm.ucsf.edu/ahituv" target="_blank">Ahituv Lab</a>
 at the University of California San Francisco for
 <a href="http://mprabase.ucsf.edu/app/mprabase" target="_blank">MPRA Base</a>.
 </p>
 
 <h2>References</h2>
 <p>
 Wang T, Matreyek KA, Yang X.
 <a href="https://pubmed.ncbi.nlm.nih.gov/38617248/" target="_blank">
 MPRAVarDB: an online database and web server for exploring regulatory effects of genetic variants using MPRA data</a>.
 <em>Bioinformatics</em>. 2024 Apr 15;40(4):btae201.
 PMID: <a href="https://pubmed.ncbi.nlm.nih.gov/38617248/" target="_blank">38617248</a>;
 PMC: <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11014600/" target="_blank">PMC11014600</a>
 </p>
 
 
 <p>
 Zhao J, Baltoumas FA, Konnaris MA, Mouratidis I, Liu Z, Sims J, Agarwal V, Pavlopoulos GA,
 Georgakopoulos-Soares I, Ahituv N.
 <a href="https://doi.org/10.1101/2023.11.19.567742" target="_blank">
 MPRAbase: A Massively Parallel Reporter Assay Database</a>.
 <em>bioRxiv</em>. 2023 Nov 22.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/38045264" target="_blank">38045264</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10690217/" target="_blank">PMC10690217</a>
 </p>