888e7470c14eeecdca310ed36bb45c3c00ae8052
lrnassar
  Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359

Fix broken mpraVarDb bigDataUrl: it pointed at /gbdb/hg38/mpra/mpravardb.bb,
but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, which caused
hgTrackDb -strict to silently drop the subtrack.

Rebuild mpravardb.bb after two fixes in mpravardbToBed.py: sanitize UTF-8
in user-visible string fields (curly quotes, primes, NBSP mojibake) that
the browser does not transcode, eliminating ~246k non-ASCII occurrences
across 42% of rows; and change safe_float / pval_to_score to write NaN
and return score 0 for NA / out-of-range p-values, instead of writing 0.0
and returning score 1000 (which had pushed untested variants to the top
of score-sorted views).

trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.

Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.

relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.

Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.

diff --git src/hg/makeDb/trackDb/human/hg38/mprabase.html src/hg/makeDb/trackDb/human/hg38/mprabase.html
index 837554a8c0c..c316baee431 100644
--- src/hg/makeDb/trackDb/human/hg38/mprabase.html
+++ src/hg/makeDb/trackDb/human/hg38/mprabase.html
@@ -1,34 +1,41 @@
 <h2>Description</h2>
 <p>
 Massively Parallel Reporter Assays (MPRAs) and related methods such as STARR-seq
 enable quantitative testing of thousands of candidate regulatory DNA sequences in
 parallel by linking each sequence to a reporter gene and measuring transcriptional
 output using sequencing.
 </p>
 
 <p>
 The <b>MPRA Base</b> track shows 41,275 experimentally tested cis-regulatory elements
-from the <a href="http://mprabase.ucsf.edu/app/mprabase" target="_blank">MPRA Base</a>
+curated from the <a href="http://mprabase.ucsf.edu/app/mprabase" target="_blank">MPRA Base</a>
 database
-(<a href="https://pubmed.ncbi.nlm.nih.gov/38045264/" target="_blank">Zhao et al., 2023</a>).
+(<a href="https://pubmed.ncbi.nlm.nih.gov/38045264/" target="_blank">Zhao et al., 2023</a>),
+drawn from MPRA, STARR-seq, and related reporter assay experiments.
 The database integrates data from multiple studies, assay platforms (lentiMPRA,
 plasmidMPRA, STARR-seq, CRE-seq, and others), and cell types while preserving
 experiment-level resolution. Only elements derived from genomic fragments that can
 be mapped to the reference genome are included; synthetic or designed oligonucleotide
 libraries without genomic coordinates are excluded.
 </p>
+<p>
+<b>Note on cell lines:</b> The cell line shown for each element is the reporter
+cell line in which the genomic fragment was assayed. One study (Mattioli et al.,
+2020) used mouse embryonic stem cells (mESC) as one of its reporter systems; the
+fragments retain their human (hg38) coordinates.
+</p>
 
 <h2>Display Conventions</h2>
 <p>
 Each item represents a genomic fragment tested within a specific experiment, defined
 as a unique combination of cell line, assay type, and publication (PMID). The same
 genomic region may appear multiple times if tested in different experiments.
 </p>
 
 <p>
 Items are colored by percentile rank of the mean raw activity score within each experiment:
 </p>
 <ul>
 <li><span style="color:blue;"><b>Blue</b></span> &mdash; percentile &lt; 50</li>
 <li><span style="color:orange;"><b>Orange</b></span> &mdash; percentile 50&ndash;74</li>
 <li><span style="color:red;"><b>Red</b></span> &mdash; percentile &ge; 75</li>