888e7470c14eeecdca310ed36bb45c3c00ae8052
lrnassar
  Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359

Fix broken mpraVarDb bigDataUrl: it pointed at
/gbdb/hg38/mpra/mpravardb.bb, but the file is at
/gbdb/hg38/mpra/mpravardb/mpravardb.bb, causing hgTrackDb -strict to
silently drop the subtrack.

Rebuild mpravardb.bb after two fixes in mpravardbToBed.py:
(1) sanitize UTF-8 in user-visible string fields (curly quotes, primes,
NBSP mojibake) that the browser does not transcode, eliminating ~246k
non-ASCII occurrences across 42% of rows;
(2) change safe_float / pval_to_score to write NaN and return score 0
for NA / out-of-range p-values instead of 0.0 and score 1000. The old
behavior pushed untested variants to the top of score-sorted views.
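
Illustrative sketch of the two mpravardbToBed.py fixes described above.
This is not the committed code: the replacement table, the sanitize()
helper name, the -log10(p) * 100 scaling, and the handling of p == 0
are all assumptions made for the example. Only the names safe_float and
pval_to_score, and the NaN / score 0 behavior for NA and out-of-range
p-values, come from this commit message.

```python
import math

# Hypothetical replacement table; the real script's mapping is not shown here.
_ASCII_MAP = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2032": "'", "\u2033": '"',   # prime, double prime
    "\u00a0": " ",                  # no-break space
}


def sanitize(text):
    """Map common typographic characters to ASCII and drop anything left over."""
    # NBSP mojibake: UTF-8 bytes C2 A0 misread as Latin-1 become "\u00c2" + NBSP.
    text = text.replace("\u00c2\u00a0", " ")
    for src, dst in _ASCII_MAP.items():
        text = text.replace(src, dst)
    return text.encode("ascii", "ignore").decode("ascii")


def safe_float(value):
    """Parse a float; return NaN for NA, empty, or unparseable input."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def pval_to_score(pval):
    """Scale -log10(p) into 0..1000 (assumed formula); untested rows score 0."""
    if math.isnan(pval) or not 0.0 <= pval <= 1.0:
        return 0
    if pval == 0.0:
        # Assumption: an exact zero is a real, maximally significant p-value.
        return 1000
    return min(1000, int(round(-math.log10(pval) * 100)))
```

Under these assumptions, pval_to_score(safe_float("NA")) yields 0, so an
untested variant sorts to the bottom of a score-sorted view rather than
the top.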

trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.

Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.

relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.

Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.

diff --git src/hg/makeDb/trackDb/relatedTracks.ra src/hg/makeDb/trackDb/relatedTracks.ra
index 754ef48affe..0c90ca47313 100644
--- src/hg/makeDb/trackDb/relatedTracks.ra
+++ src/hg/makeDb/trackDb/relatedTracks.ra
@@ -1,126 +1,132 @@
 # A space delimited file of track relatedness. All entries must be reciprocal. Format:
 # ucscDb track trackLinkingTo reason
 
 # hg38:
 hg38 knownGene knownGeneArchive View previous versions of GENCODE Genes
 hg38 knownGeneArchive knownGene View the latest GENCODE Genes version
 
 hg38 miRnaAtlas nonCodingRNAs View associated precursor miRnas
 hg38 nonCodingRNAs miRnaAtlas View expression of cleaved miRnas
 
 hg38 caddSuper gnomad View associated variants
 hg38 gnomad caddSuper View CADD scores for this variant and region
 
 hg38 constraintSuper gnomadPLI Predicted constraint metrics from gnomAD
 hg38 gnomadPLI constraintSuper Container track of various constraint scores
 
 hg38 gnomadStr strVar A collection of population-level STR variation tracks across the genome
 hg38 strVar gnomadStr Population-level STR variation across disease-associated loci from gnomAD v3.1.3
 
 hg38 revel liftHg38 Revel is based on hg19 and lifted to hg38. liftOver "chain" alignment from hg19 to hg38
 hg38 liftHg38 revel Revel scores were lifted using UCSC liftOver chains from hg38
 
 hg38 revel caddSuper CADD, a similar deleteriousness score, and not used as an input by REVEL
 hg38 caddSuper revel REVEL, a similar deleteriousness score
 
 hg38 liftHg19 grcIncidentDb GRC Incident database, to explore reasons why the assembly was changed
 hg38 grcIncidentDb liftHg19 LiftOver for hg38, explores how incident regions aligned between human assemblies
 
 hg38 ReMap liftHg19 NCBI ReMap, even though it has the same name, is a liftOver-like hg19/hg38 alignment, and unrelated to the ReMap database
 hg38 liftHg19 ReMap ReMap, even though it has the same name, is a database of transcription factor binding sites, unrelated to NCBI ReMap
 
 hg38 ReMap jaspar JASPAR is a database of predicted TF binding sites, based on short DNA matches. Unlike ReMap, the data is purely computational.
 
 hg38 jaspar ReMap ReMap is a database of TF binding sites inferred from ChIP-Seq Data. Unlike JASPAR predictions, these sites are supported by functional assay
 
 hg38 problematic mappability The mappability track contains regions where short sequencing reads are hard to align
 hg38 mappability problematic The problematic regions track contains various gene clusters and the ENCODE blacklist
 hg38 problematic grcIncidentDb The GRC (Genome Reference Consortium) incidents track contains regions that were flagged by the group that puts together the genome 
 hg38 grcIncidentDb problematic The problematic regions track lists unusual regions and the ones that often lead to artefacts when aligning reads to the reference genome
 
 hg38 phasedVars varFreqs The variant frequencies track contains projects where variant frequencies, aka allele frequencies, are publicly available.
 hg38 varFreqs phasedVars The phased variants track contains projects that provide haplotype-phased genotypes/variants.
 
 hg38 wgEncodeReg4 wgEncodeReg Previous ENCODE3 Regulation track
 hg38 wgEncodeReg wgEncodeReg4 New ENCODE4 Regulation track
 hg38 wgEncodeReg4 cCREs Related ENCODE4 cCRE annotations
 hg38 cCREs wgEncodeReg4 Related ENCODE4 regulation data
 
 hg38 avada varaico The AVADA track is no longer updated. See VARAICO for the latest variants mined from papers.
 hg38 varaico avada Previous literature mining track for variants extracted from publications. No longer updated.
 
 # hg19:
 hg19 caddSuper gnomad View associated variants
 hg19 gnomad caddSuper View CADD scores for this variant and region
 
 hg19 decipherHaploIns gnomadPLI Compare haploinsufficiency metrics as defined by gnomAD
 hg19 gnomadPLI decipherHaploIns Compare constraint metrics as defined by DECIPHER
 
 hg19 revel caddSuper CADD, a similar deleteriousness score, and not used as an input by REVEL
 hg19 caddSuper revel REVEL, a similar deleteriousness score
 
 hg19 liftHg38 grcIncidentDb GRC Incident database, to explore reasons why the assembly was changed
 hg19 grcIncidentDb liftHg38 LiftOver alignments between hg38 and hg38 to explore how the GRC incident assembly changes affect whole-genome alignments between hg19 and hg38 used for lifting data from hg19
 
 hg19 fixSeqLiftOverPsl liftHg38 Investigate how patches affect the whole-genome alignment used for liftOver
 hg19 liftHg38 fixSeqLiftOverPsl Investigate how assembly patches affect the liftOver alignment
 
 hg19 liftHg38 hg38ContigDiff Hg38 Diff shows contigs that were changed from hg19 to hg38
 hg19 hg38ContigDiff liftHg38 Investigate how contig changes affect the liftOver alignments
 
 hg19 jaspar ReMap ReMap is a database of TF binding sites inferred from ChIP-Seq Data. Unlike JASPAR predictions, these sites are supported by functional assay
 hg19 ReMap jaspar JASPAR is a database of predicted TF binding sites, based on short DNA matches. Unlike ReMap, the data is purely computational.
 
 hg19 ReMap liftHg38 NCBI ReMap, even though it has the same name, is a liftOver-like hg19/hg38 alignment, and unrelated to the ReMap database
 hg19 liftHg38 ReMap ReMap, even though it has the same name, is a database of transcription factor binding sites, unrelated to NCBI ReMap
 
 hg19 refSeqComposite pseudoYale60 NCBI RefSeq Curated and RefSeq Other contains pseudogenes, but the Yale annotation should be more comprehensive for this transcript type
 hg19 pseudoYale60 refSeqComposite NCBI RefSeq Curated and RefSeq Other also contain some transcribed and untranscribed pseudogenes, respectively.
 
 hg19 constraintSuper gnomadPLI Predicted constraint metrics from gnomAD
 hg19 gnomadPLI constraintSuper Container track of various constraint scores
 
 hg19 avada varaico The AVADA track is no longer updated. See VARAICO for the latest variants mined from papers.
 hg19 varaico avada Previous literature mining track for variants extracted from publications. No longer updated.
 
 # mm39:
 
 mm39 knownGene knownGeneArchive View previous versions of GENCODE Genes
 mm39 knownGeneArchive knownGene View the latest GENCODE Genes version
 
 # mm10 ENCODE4 Regulation:
 mm10 encode4Reg encode3Reg Previous ENCODE3 Regulation track
 mm10 encode3Reg encode4Reg New ENCODE4 Regulation track
 mm10 encode4Reg cCREs Related ENCODE4 cCRE annotations
 mm10 cCREs encode4Reg Related ENCODE4 regulation data
 
 # hg38 long-read SV supertrack cross-links to other SV resources:
 hg38 lrSv gnomadStructuralVariants Short-read structural variants from gnomAD v4.1
 hg38 gnomadStructuralVariants lrSv Long-read structural variants across multiple cohorts
 hg38 lrSv dbVarSv NCBI dbVar structural variants (short-read and long-read, germline and clinical)
 hg38 dbVarSv lrSv Long-read structural variants across multiple cohorts
 hg38 lrSv dgvPlus Database of Genomic Variants (DGV) structural variation catalog
 hg38 dgvPlus lrSv Long-read structural variants across multiple cohorts
 hg38 lrSv giabSv Genome in a Bottle high-confidence SV benchmark callsets
 hg38 giabSv lrSv Long-read structural variants across multiple cohorts
 
 # PrimateAI-3D cross-links:
 hg38 primateAi alphaMissense AlphaMissense, a similar deep-learning missense pathogenicity predictor
 hg38 alphaMissense primateAi PrimateAI-3D, a similar deep-learning missense pathogenicity predictor using primate variation
 hg38 primateAi revel REVEL, an ensemble missense pathogenicity score built from multiple predictors
 hg38 revel primateAi PrimateAI-3D, a missense pathogenicity predictor using primate variation and 3D protein structure
 
 hg19 primateAi revel REVEL, an ensemble missense pathogenicity score built from multiple predictors
 hg19 revel primateAi PrimateAI-3D, a missense pathogenicity predictor using primate variation and 3D protein structure
 
 # PromoterAI cross-links:
 hg38 promoterAi primateAi PrimateAI-3D, a companion deep-learning model from Illumina for coding (missense) variants
 hg38 primateAi promoterAi PromoterAI, a companion deep-learning model from Illumina for non-coding promoter variants
 hg38 promoterAi alphaMissense AlphaMissense, a deep-learning predictor of missense (coding) variant pathogenicity
 hg38 alphaMissense promoterAi PromoterAI, a deep-learning predictor of expression-altering variants in promoter regions
 
 # NMD Escape cross-links:
 hg38 nmd mane MANE Select transcripts from NCBI/EBI, a curated subset of RefSeq/Ensembl transcripts used as clinical reference
 hg38 mane nmd NMD Escape: predicted regions where premature termination codons escape nonsense-mediated decay
 hg38 nmd ncbiRefSeq NCBI RefSeq transcripts, the source annotation set for the NMD Escape RefSeq subtrack
 hg38 ncbiRefSeq nmd NMD Escape: predicted regions where premature termination codons escape nonsense-mediated decay
+
+# MPRA cross-links:
+hg38 mpra wgEncodeReg4 ENCODE regulatory region annotations, many of which are tested by MPRA assays
+hg38 wgEncodeReg4 mpra Experimental MPRA measurements of regulatory activity for candidate elements
+hg38 mpra cCREs Candidate cis-regulatory elements; many overlap MPRA-tested fragments
+hg38 cCREs mpra Experimentally validated regulatory activity from MPRA assays for overlapping elements