888e7470c14eeecdca310ed36bb45c3c00ae8052
lrnassar
  Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359

Fix broken mpraVarDb bigDataUrl — pointed at /gbdb/hg38/mpra/mpravardb.bb
but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, causing
hgTrackDb -strict to silently drop the subtrack.

Rebuild mpravardb.bb after two fixes in mpravardbToBed.py. First, sanitize
UTF-8 in user-visible string fields (curly quotes, primes, NBSP mojibake)
that the browser does not transcode, eliminating ~246k non-ASCII
occurrences across 42% of rows. Second, change safe_float / pval_to_score
to write NaN and return score 0 for NA / out-of-range p-values instead of
0.0 and score 1000, which previously pushed untested variants to the top
of score-sorted views.
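
For reviewers, a minimal sketch of the intended NA handling. This is not the
code in mpravardbToBed.py: the function bodies, the cap parameter, and the
-log10(p) * 100 scaling are assumptions for illustration. Only the function
names and the NaN / score 0 behaviour come from this commit.

```python
import math


def safe_float(text):
    """Parse a numeric field; return NaN for NA, empty, or unparseable input."""
    try:
        return float(text)
    except (TypeError, ValueError):
        # Assumed behaviour: missing values become NaN rather than 0.0,
        # so they cannot be mistaken for a p-value of zero.
        return math.nan


def pval_to_score(pval, cap=1000):
    """Map a p-value in (0, 1] to a 0..cap score; NaN or out-of-range gives 0."""
    if math.isnan(pval) or not (0.0 < pval <= 1.0):
        return 0
    # Hypothetical scaling: 100 points per order of magnitude, capped.
    return min(cap, int(round(-math.log10(pval) * 100)))


if __name__ == "__main__":
    for raw in ["NA", "0.001", "1e-20", "1.5"]:
        print(raw, pval_to_score(safe_float(raw)))
```

Under these assumptions an NA p-value can no longer reach the cap, because it
never enters the log transform as 0.0.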

trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.

Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.

relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.

Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.

diff --git src/hg/makeDb/trackDb/human/hg38/mpraVarDb.html src/hg/makeDb/trackDb/human/hg38/mpraVarDb.html
index 99e1158d586..7048cc83bf5 100644
--- src/hg/makeDb/trackDb/human/hg38/mpraVarDb.html
+++ src/hg/makeDb/trackDb/human/hg38/mpraVarDb.html
@@ -1,220 +1,230 @@
 <h2>Description</h2>
 <p>
-The <b>MPRAVarDB</b> track shows 242,818 variants from 18 MPRA studies compiled
-in the MPRAVarDB database
+The <b>MPRAVarDB</b> track shows 239,028 variants successfully mapped to hg38
+(from 242,818 total) across 18 MPRA studies compiled in the MPRAVarDB database
 (<a href="https://pubmed.ncbi.nlm.nih.gov/38617248/" target="_blank">Wang et al., 2024</a>).
 Each variant was experimentally tested in an MPRA experiment to evaluate whether it
 affects transcriptional regulatory activity. The database covers over 30 cell lines
 and 30 human diseases and traits, including neurodegenerative diseases, immune
 disorders, melanoma, multiple myeloma, and autoimmune diseases.
 </p>
+<p>
+<b>Note on cell lines:</b> The cell line shown for each variant is the reporter
+cell line in which the human regulatory element was assayed. Several studies
+used mouse cell lines (e.g. Neuro-2a, also listed as N2A; NIH/3T3; MIN6) as
+reporter systems for human sequences; these variants retain human (hg38) coordinates.
+</p>
 
 <h2>Display Conventions</h2>
 <p>
 Items are colored by statistical significance:
 <ul>
 <li><b><span style="color: #C80000;">Dark red</span></b>: FDR &lt; 0.05 (significant after multiple testing correction) &mdash; 22,465 variants (9.3%)</li>
 <li><b><span style="color: #FFA500;">Orange</span></b>: nominal p-value &lt; 0.05 but FDR &ge; 0.05 &mdash; 17,780 variants (7.3%)</li>
 <li><b><span style="color: #BEBEBE;">Grey</span></b>: not significant (p-value &ge; 0.05) &mdash; 202,573 variants (83.4%)</li>
 </ul>
 </p>
 <p>
 Each item shows the variant name (rsID when available, otherwise chr:pos:ref&gt;alt),
 the reference and alternate alleles, the associated disease or trait, cell line,
 log2 fold change, p-value, and FDR.
 </p>
 
 <h2>Studies</h2>
 <p>
 The following table lists the 18 MPRA studies included in MPRAVarDB, with the number of
 tested variants, diseases/traits, cell lines, and a brief description of the variant selection.
 </p>
 
 <table class="stdTbl">
 <tr>
   <th>Study</th>
   <th>Variants</th>
   <th>Disease/Trait</th>
   <th>Cell Line(s)</th>
   <th>Description</th>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/34534445/" target="_blank">Griesemer et al., 2021</a></td>
   <td>72,588</td>
   <td>NHGRI-EBI GWAS catalog</td>
   <td>GM12878, HEK293FT, HMEC, HepG2, K562, SKNSH</td>
   <td>3'UTR SNPs and indels in LD with GWAS catalog variants, variants under positive selection, and rare outlier expression variants from GTEx</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/31395865/" target="_blank">Kircher et al., 2019</a></td>
   <td>44,647</td>
   <td>Various (18 diseases including diabetes, cancer, blood disorders, limb malformations)</td>
   <td>HEK293T, HEL92.1.7, HaCaT, HeLa, HepG2, K562, LNCaP, MIN6, NIH/3T3, Neuro-2a, SK-MEL-28, SF7996</td>
   <td>Saturation mutagenesis of 20 disease-associated regulatory elements at single base-pair resolution</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/35298243/" target="_blank">Abell et al., 2022</a></td>
   <td>29,582</td>
   <td>eQTL (no specific disease)</td>
   <td>GM12878</td>
   <td>30,893 variants in LD with independent, common, top-ranked eQTL across 744 eGenes in the CEU cohort</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/27259153/" target="_blank">Tewhey et al., 2016</a></td>
   <td>27,138</td>
   <td>eQTL (no specific disease)</td>
   <td>GM12878</td>
   <td>32,373 variants associated with eQTLs in lymphoblastoid cell lines</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/37516102/" target="_blank">Schuster et al., 2023</a></td>
   <td>26,546</td>
   <td>Prostate cancer</td>
   <td>PC3</td>
   <td>14,497 single-nucleotide mutations enriched in oncogenic pathways and 3'UTR regulatory elements</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/35513721/" target="_blank">Mouri et al., 2022</a></td>
   <td>14,551</td>
   <td>Autoimmune diseases (Crohn's, IBD, psoriasis, MS, RA, T1D, ulcerative colitis)</td>
   <td>Jurkat</td>
   <td>GWAS variants from autoimmune disease loci tested for regulatory element activity in T cells</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/37868037/" target="_blank">McAfee et al., 2023</a></td>
   <td>10,310</td>
   <td>Schizophrenia</td>
   <td>HEK293s, HNPS</td>
   <td>5,173 fine-mapped schizophrenia GWAS variants</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/35981026/" target="_blank">Cooper et al., 2022</a></td>
   <td>5,340</td>
   <td>Alzheimer's disease, Progressive supranuclear palsy</td>
   <td>HEK293T</td>
   <td>5,706 noncoding SNVs from 25 AD and 9 PSP genome-wide significant loci</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/36423637/" target="_blank">Long et al., 2022</a></td>
   <td>3,980</td>
   <td>Melanoma</td>
   <td>C283T, UACC903</td>
   <td>1,992 risk-associated variants in tight LD (r2&gt;0.8) from 54 melanoma risk loci</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/31503409/" target="_blank">Myint et al., 2020</a></td>
   <td>2,158</td>
   <td>Schizophrenia, Alzheimer's disease</td>
   <td>K562, SH-SY5Y</td>
   <td>1,049 SZ and 30 AD variants in 64 SZ loci and 9 AD loci</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/32483191/" target="_blank">Choi et al., 2020</a></td>
   <td>1,664</td>
   <td>Melanoma</td>
   <td>HEK293FT, UACC903</td>
   <td>GWAS melanoma risk variants</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/35013207/" target="_blank">Ajore et al., 2022</a></td>
   <td>1,582</td>
   <td>Multiple myeloma</td>
   <td>L363, MOLP8</td>
   <td>1,039 variants in high LD (r2&gt;0.8) at 23 MM risk loci</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/31164647/" target="_blank">Klein et al., 2019</a></td>
   <td>1,119</td>
   <td>Osteoarthritis</td>
   <td>Saos-2</td>
   <td>1,605 SNPs in high LD (r2&gt;0.8) at 35 lead SNPs associated with OA via GWAS</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/33712590/" target="_blank">Lu et al., 2021</a></td>
   <td>1,038</td>
   <td>Systemic lupus erythematosus</td>
   <td>GM12878, Jurkat</td>
   <td>18,312 variants in tight LD (r2&gt;0.8) with 578 GWAS index variants at 531 loci</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/34294677/" target="_blank">Mulvey &amp; Dougherty, 2021</a></td>
   <td>275</td>
   <td>Major depressive disorder</td>
   <td>N2A</td>
   <td>Over 1,000 SNPs from 39 neuropsychiatric GWAS loci, selected by overlap with eQTL and histone marks</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/32913073/" target="_blank">Ferraro et al., 2020</a></td>
   <td>150</td>
   <td>Rare variant expression (no specific disease)</td>
   <td>GM12878</td>
   <td>Rare variants contributing to extreme expression, allelic expression, and splicing across 49 GTEx tissues</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/31477794/" target="_blank">Rao et al., 2021</a></td>
   <td>88</td>
   <td>Alcohol use disorder</td>
   <td>BLA, CE, NAC, SFC</td>
   <td>SNPs in 3'UTR of 88 genes from allele-specific expression analysis (30 AUD subjects vs 30 controls)</td>
 </tr>
 <tr>
   <td><a href="https://pubmed.ncbi.nlm.nih.gov/27259154/" target="_blank">Ulirsch et al., 2016</a></td>
   <td>62</td>
   <td>Red blood cell traits</td>
   <td>K562, K562+GATA1</td>
   <td>2,756 variants in strong LD with 75 sentinel variants associated with RBC traits</td>
 </tr>
 </table>
+<p>
+Variant counts above are pre-liftOver totals as compiled in MPRAVarDB.
+Of these 242,818 source variants, 239,028 were mapped to hg38; see Methods.
+</p>
 
 <h2>Methods</h2>
 <p>
 Data was downloaded from the
 <a href="https://mpravardb.rc.ufl.edu/" target="_blank">MPRAVarDB web server</a>.
 Variants originally mapped to hg19 (213,689 of 242,818) were lifted to hg38
 using <code>liftOver</code>. 114 variants could not be mapped and were excluded.
 The remaining variants were merged with the 29,129 natively hg38-mapped variants
 to produce a total of 239,028 hg38 records.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively in table format with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>
 and exported from there to spreadsheet or tab-sep tables.
 From scripts, the data can be accessed through our
 <a href="https://api.genome.ucsc.edu" target="_blank">API</a>, track=<i>mpraVarDb</i>.
 </p>
 <p>
 For automated download and analysis, the genome annotation is stored in a bigBed
 file that can be downloaded from
-<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/mpra" target="_blank">our download server</a>.
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/mpra/mpravardb" target="_blank">our download server</a>.
 The file for this track is called <tt>mpravardb.bb</tt>. Individual
 regions or the whole genome annotation can be obtained using our tool
 <tt>bigBedToBed</tt>, which can be compiled from the source code or downloaded as a
 precompiled binary for your system. Instructions for downloading source code and
 binaries can be found
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads" target="_blank">here</a>.
 The tool can also be used to obtain features within a given range, e.g.
-<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/mpra/mpravardb.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>
+<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/mpra/mpravardb/mpravardb.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>
 </p>
 <p>
 The original annotation source data can be downloaded from the
 <a href="https://mpravardb.rc.ufl.edu/" target="_blank">MPRAVarDB web server</a>.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to Tao Wang and colleagues at the University of Florida for creating and
 maintaining the MPRAVarDB database.
 </p>
 
 <h2>References</h2>
 <p>
 Wang T, Matreyek KA, Yang X.
 <a href="https://pubmed.ncbi.nlm.nih.gov/38617248/" target="_blank">
 MPRAVarDB: an online database and web server for exploring regulatory effects of genetic variants using MPRA data</a>.
 <em>Bioinformatics</em>. 2024 Apr 15;40(4):btae201.
 PMID: <a href="https://pubmed.ncbi.nlm.nih.gov/38617248/" target="_blank">38617248</a>;
 PMC: <a href="https://pmc.ncbi.nlm.nih.gov/articles/PMC11014600/" target="_blank">PMC11014600</a>
 </p>