888e7470c14eeecdca310ed36bb45c3c00ae8052
lrnassar
  Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359

Fix broken mpraVarDb bigDataUrl — pointed at /gbdb/hg38/mpra/mpravardb.bb
but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, causing
hgTrackDb -strict to silently drop the subtrack.
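
A mismatch like this can be caught before loading with a small path check.
The sketch below is illustrative only and is not part of this commit: the
helper name missing_big_data_urls is hypothetical, it assumes trackDb
stanzas with one "key value" setting per line, and it only checks local
(absolute) bigDataUrl paths.

```python
import os


def missing_big_data_urls(trackdb_text, exists=os.path.exists):
    """Return (track, path) pairs whose local bigDataUrl file is absent.

    Hypothetical helper: assumes one "key value" setting per line and
    skips bigDataUrl values that are not absolute local paths.
    """
    missing = []
    track = None
    for raw in trackdb_text.splitlines():
        words = raw.strip().split(None, 1)
        if len(words) != 2:
            continue  # blank line or a setting without a value
        key, value = words[0], words[1].strip()
        if key == "track":
            track = value
        elif key == "bigDataUrl" and value.startswith("/"):
            if not exists(value):
                missing.append((track, value))
    return missing
```

The exists argument defaults to os.path.exists and can be swapped for a
stub, so the check can be exercised without access to /gbdb.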

Rebuild mpravardb.bb after two fixes in mpravardbToBed.py. First, sanitize
UTF-8 in user-visible string fields (curly quotes, primes, NBSP mojibake)
that the browser does not transcode, eliminating ~246k non-ASCII
occurrences across 42% of rows. Second, change safe_float / pval_to_score
to write NaN and return score 0 for NA / out-of-range p-values instead of
0.0 and score 1000; the old behavior inflated untested variants to the top
of score-sorted views.
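
For illustration, the two behaviors described above can be sketched roughly
as follows. This is not the code in mpravardbToBed.py. The character map,
the sanitize_text name, the scale parameter, the -log10 score mapping, and
the treatment of p = 0 as out of range are all assumptions made for the
example; only the safe_float and pval_to_score names and the NaN / score 0
outcome come from this commit.

```python
import math

# Assumed mapping for the characters named above; the real script may
# cover more. Anything not listed here passes through unchanged.
_ASCII_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2032": "'", "\u2033": '"',   # prime, double prime
    "\u00a0": " ",                  # non-breaking space
})


def sanitize_text(value):
    """Replace common non-ASCII punctuation with ASCII equivalents."""
    # NBSP mojibake: UTF-8 bytes C2 A0 mis-decoded as two Latin-1 chars.
    value = value.replace("\u00c2\u00a0", " ")
    return value.translate(_ASCII_MAP)


def safe_float(text):
    """Parse a numeric field; return NaN (not 0.0) when it is missing."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan


def pval_to_score(pval, scale=100.0):
    """Map a p-value to a 0-1000 score; NaN or out-of-range values get 0."""
    if math.isnan(pval) or not (0.0 < pval <= 1.0):
        return 0
    # Assumed mapping: -log10(p) scaled and capped at 1000.
    return min(1000, int(round(-math.log10(pval) * scale)))
```

With this shape, an "NA" field becomes NaN and scores 0, so it can no
longer be mistaken for a p-value of 0.0 and sorted to the top.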

trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.

Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.

relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.

Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.

diff --git src/hg/makeDb/trackDb/relatedTracks.ra src/hg/makeDb/trackDb/relatedTracks.ra
index 754ef48affe..0c90ca47313 100644
--- src/hg/makeDb/trackDb/relatedTracks.ra
+++ src/hg/makeDb/trackDb/relatedTracks.ra
@@ -112,15 +112,21 @@
 
 hg19 primateAi revel REVEL, an ensemble missense pathogenicity score built from multiple predictors
 hg19 revel primateAi PrimateAI-3D, a missense pathogenicity predictor using primate variation and 3D protein structure
 
 # PromoterAI cross-links:
 hg38 promoterAi primateAi PrimateAI-3D, a companion deep-learning model from Illumina for coding (missense) variants
 hg38 primateAi promoterAi PromoterAI, a companion deep-learning model from Illumina for non-coding promoter variants
 hg38 promoterAi alphaMissense AlphaMissense, a deep-learning predictor of missense (coding) variant pathogenicity
 hg38 alphaMissense promoterAi PromoterAI, a deep-learning predictor of expression-altering variants in promoter regions
 
 # NMD Escape cross-links:
 hg38 nmd mane MANE Select transcripts from NCBI/EBI, a curated subset of RefSeq/Ensembl transcripts used as clinical reference
 hg38 mane nmd NMD Escape: predicted regions where premature termination codons escape nonsense-mediated decay
 hg38 nmd ncbiRefSeq NCBI RefSeq transcripts, the source annotation set for the NMD Escape RefSeq subtrack
 hg38 ncbiRefSeq nmd NMD Escape: predicted regions where premature termination codons escape nonsense-mediated decay
+
+# MPRA cross-links:
+hg38 mpra wgEncodeReg4 ENCODE regulatory region annotations, many of which are tested by MPRA assays
+hg38 wgEncodeReg4 mpra Experimental MPRA measurements of regulatory activity for candidate elements
+hg38 mpra cCREs Candidate cis-regulatory elements; many overlap MPRA-tested fragments
+hg38 cCREs mpra Experimentally validated regulatory activity from MPRA assays for overlapping elements