File Changes for lrnassar
v497_base to v498_preview (2026-04-20 to 2026-04-27) v498
- src/hg/htdocs/goldenPath/newsarch.html
- lines changed 35
3972ba54c468ace338d4a5578de1d20bf6c1f9ec Mon Apr 20 15:39:26 2026 -0700
Adding Rule 4 (long-exon rule, Lindeboom 2016) to NMD Escape tracks and releasing on Apr. 22, 2026. refs #33737
Script: added a fourth rule to genePredNmdEsc. Coding exons longer than
400 bp (excluding the last coding exon, which is already covered by the
50 bp rule) are flagged as NMD-escape regions. Rebuilt the Gencode and
NCBI RefSeq bigBed files.
trackDb:
- nmd.ra: appended "/400nt" to the nmdEsc longLabels, set nmdEscGencode
default visibility to dense so the track is visible in cart-reset
views, changed all four NMDetective subtracks from "visibility full"
to "visibility hide", updated pennantIcon to the Apr. 22, 2026
release date and anchor.
- nmd.html: mention long internal exons in the overview description,
update the rule count from three to four.
- nmdEscTranscripts.html: add the long-exon rule to the rule list and
color legend (gold, #FFD700), expand the Background section with
mechanisms for the intronless, start-proximal, and long-exon rules,
correct the 50 bp rule description to include the entire last coding
exon, fix Lindeboom 2016 author initials (RG -> RGH).
News:
- newsarch.html: add the 2026-04-22 NMD Escape news entry covering all
four rules, with acknowledgements to Guido Neidhardt and Andreas
Lahner for suggesting the track and the Decipher Genome Browser team
for inspiring the visualization.
- indexNews.html: add the front-page news link.
makedoc:
- nmd.txt: dated note for the Rule 4 rebuild.
- lines changed 2
d23d0116ff17b126a498c8d02bdef578d0ab1b53 Wed Apr 22 12:51:20 2026 -0700
Update NMD Escape newsarch entry to match shipped Rule 2 definition. refs #33737
Rule 2 is no longer the 'intronless transcript rule' after the round 4
gate refinement (single coding exon AND no 3'UTR intron). Updated the
newsarch entry to match.
- lines changed 16
395a8efc6994c18a3b0bdfcee82217ff9d78b739 Wed Apr 22 12:54:59 2026 -0700
Expand NMD Escape newsarch rules into sub-bullets. refs #33737
Break the four-rule summary into individual sub-bullets under the
ruleset line so each rule is visible at a glance.
- src/hg/htdocs/indexNews.html
- lines changed 12
3972ba54c468ace338d4a5578de1d20bf6c1f9ec Mon Apr 20 15:39:26 2026 -0700
Adding Rule 4 (long-exon rule, Lindeboom 2016) to NMD Escape tracks and releasing on Apr. 22, 2026. refs #33737
Script: added a fourth rule to genePredNmdEsc. Coding exons longer than
400 bp (excluding the last coding exon, which is already covered by the
50 bp rule) are flagged as NMD-escape regions. Rebuilt the Gencode and
NCBI RefSeq bigBed files.
trackDb:
- nmd.ra: appended "/400nt" to the nmdEsc longLabels, set nmdEscGencode
default visibility to dense so the track is visible in cart-reset
views, changed all four NMDetective subtracks from "visibility full"
to "visibility hide", updated pennantIcon to the Apr. 22, 2026
release date and anchor.
- nmd.html: mention long internal exons in the overview description,
update the rule count from three to four.
- nmdEscTranscripts.html: add the long-exon rule to the rule list and
color legend (gold, #FFD700), expand the Background section with
mechanisms for the intronless, start-proximal, and long-exon rules,
correct the 50 bp rule description to include the entire last coding
exon, fix Lindeboom 2016 author initials (RG -> RGH).
News:
- newsarch.html: add the 2026-04-22 NMD Escape news entry covering all
four rules, with acknowledgements to Guido Neidhardt and Andreas
Lahner for suggesting the track and the Decipher Genome Browser team
for inspiring the visualization.
- indexNews.html: add the front-page news link.
makedoc:
- nmd.txt: dated note for the Rule 4 rebuild.
- src/hg/makeDb/doc/hg38/mpra.txt
- lines changed 93
888e7470c14eeecdca310ed36bb45c3c00ae8052 Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359
Fix broken mpraVarDb bigDataUrl — pointed at /gbdb/hg38/mpra/mpravardb.bb
but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, causing
hgTrackDb -strict to silently drop the subtrack.
Rebuild mpravardb.bb after two fixes in mpravardbToBed.py: sanitize UTF-8
in user-visible string fields (curly quotes, primes, NBSP mojibake) that
the browser does not transcode, eliminating ~246k non-ASCII occurrences
across 42% of rows; and change safe_float / pval_to_score to write NaN
and return score 0 for NA / out-of-range p-values instead of 0.0 and
score 1000 (previously inflated untested variants to the top of
score-sorted views).
trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.
Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.
relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.
Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.
- src/hg/makeDb/doc/hg38/nmd.txt
- lines changed 3
3972ba54c468ace338d4a5578de1d20bf6c1f9ec Mon Apr 20 15:39:26 2026 -0700
Adding Rule 4 (long-exon rule, Lindeboom 2016) to NMD Escape tracks and releasing on Apr. 22, 2026. refs #33737
Script: added a fourth rule to genePredNmdEsc. Coding exons longer than
400 bp (excluding the last coding exon, which is already covered by the
50 bp rule) are flagged as NMD-escape regions. Rebuilt the Gencode and
NCBI RefSeq bigBed files.
trackDb:
- nmd.ra: appended "/400nt" to the nmdEsc longLabels, set nmdEscGencode
default visibility to dense so the track is visible in cart-reset
views, changed all four NMDetective subtracks from "visibility full"
to "visibility hide", updated pennantIcon to the Apr. 22, 2026
release date and anchor.
- nmd.html: mention long internal exons in the overview description,
update the rule count from three to four.
- nmdEscTranscripts.html: add the long-exon rule to the rule list and
color legend (gold, #FFD700), expand the Background section with
mechanisms for the intronless, start-proximal, and long-exon rules,
correct the 50 bp rule description to include the entire last coding
exon, fix Lindeboom 2016 author initials (RG -> RGH).
News:
- newsarch.html: add the 2026-04-22 NMD Escape news entry covering all
four rules, with acknowledgements to Guido Neidhardt and Andreas
Lahner for suggesting the track and the Decipher Genome Browser team
for inspiring the visualization.
- indexNews.html: add the front-page news link.
makedoc:
- nmd.txt: dated note for the Rule 4 rebuild.
- lines changed 22
4bd316f5f1ca47328bd3f9a181214b788055f0bc Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737
Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".
Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts with a
single CDS exon (UTR introns) as "intronless" and silently suppressed their
Rule 1/3/4 assignments via the if/else short-circuit. 3,253 RefSeq curated
and ~2,000 Gencode transcripts reassigned from Rule 2 to Rules 1/3. Rebuilt
both tracks.
Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.
Added reciprocal relatedTracks.ra entries for nmd <-> mane and nmd <-> ncbiRefSeq.
QA cleanups: non-ASCII prime char replaced with &prime;, mailing list links
given target="_blank" across all three HTML pages, dead commented nmdGencode
block removed from nmd.ra, AutoSQL field comments updated to cover Rule 4
color and the gene-symbol-to-transcript-ID fallback.
Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.
- lines changed 14
34d2eee845f5f45e571d1e153c632683b8a93f75 Tue Apr 21 16:17:53 2026 -0700
Refine NMD Escape Rule 2 gate to "single coding exon and no 3'UTR intron". refs #33737
Previously Rule 2 required exonCount==1 (truly intronless). This
overcorrected for single-CDS-exon transcripts whose only introns are in
the 5'UTR: biologically these have no EJC downstream of the stop codon
(5'UTR EJCs are cleared by the scanning 40S or sit upstream of the
terminating ribosome) and are NMD-immune, but the code pushed them to
Rules 1/3 under a less accurate "last coding exon" label.
New gate: len(cdsExons) == 1 AND no exon-exon junction strictly
downstream of the stop codon (strand-aware). Transcripts with a single
coding exon but a 3'UTR intron correctly stay in Rules 1/3 because that
intron deposits an EJC that can trigger NMD.
3,113 RefSeq Curated and 10,790 Gencode V49 transcripts move into Rule
2. 140 RefSeq and 1,135 Gencode single-CDS-exon transcripts with 3'UTR
introns correctly remain in Rules 1/3. Description page and makedoc
updated.
- src/hg/makeDb/doc/hg38/promoterAi.txt
- lines changed 19
f9a89b0e1ce3c937b4fbb879736c1619c35c271f Tue Apr 21 12:11:02 2026 -0700
QA fixes for PromoterAI track. refs #37278
Description page: replaced the wrong reference (Gao et al. 2023, the PrimateAI-3D
paper) with the actual PromoterAI citation (Jaganathan et al. Science 2025, PMID
40440429), corrected the score-direction wording (negative = under-expression,
positive = over-expression, not "tolerated vs disruptive"), fixed the Data Access
source link (Illumina BaseSpace, not the GitHub repo), and corrected the mouseover
blurb to match mouseOverFunction noAverage behavior.
Converter and AS: the overlap bigBed now carries the real per-transcript strand
from the source TSV (was hardcoded '+'), with a new strands column in the AS, and
the name field concatenates unique gene symbols so bidirectional-promoter items
read as "HES4,ISG15" etc. BED score is now |PromoterAI|*1000 so scoreFilter is
meaningful. Rewrote the converter to stream (sorted input), which drops peak
memory from ~40 GB to a few MB.
trackDb: added filterLabel/filterLimits on scoreDiff (the filter was unusable
without labels), scoreFilter + scoreLabel, alwaysZero and autoScale off on the
bigWig subtracks, color 200,0,0 / altColor 0,0,200 so signed bigWig bars draw
red (over-expression) above zero and blue (under-expression) below, matching
the overlap track itemRgb. Added maxWindowToDraw and maxItems on the overlap
subtrack.
Makedoc updated to describe the streaming pipeline, the new strands column,
and the rebuild workflow.
- lines changed 11
6c567fd9a03e87610681a43d2183ebb43547d1ad Fri Apr 24 17:58:57 2026 -0700
PromoterAI: review followups. refs #37278
Move /gbdb/hg38/promoterAi/ to /gbdb/hg38/_promoterAi/ to match the
underscore-prefix exclusion rule for hgdownload sync (same pattern as
PrimateAI-3D under refs #37274). bigDataUrls and the makedoc updated.
Bump bigWig maxHeightPixels from 128:20:8 to 128:40:8 -- the peer-track
default of 20 is too cramped for a signed -1..+1 score.
Description page: drop the wrong primateai3d.basespace.illumina.com link
in Data Access; PromoterAI is not on BaseSpace, it's distributed via the
license agreement on the GitHub page (a download link is emailed after
submission). Reword Data Access and Methods accordingly.
Description page: add Illumina's recommended interpretation thresholds
(|score| >= 0.1, >= 0.2, >= 0.5) from the PromoterAI GitHub README, with
a note that higher cutoffs select smaller, higher-confidence sets.
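The score direction and the interpretation thresholds described in the two PromoterAI commits can be sketched as follows. This is an illustration only, not code from the repository: the function name `interpret` and the returned labels are hypothetical, and only the three cutoffs and the sign convention come from the commit messages.

```python
THRESHOLDS = (0.5, 0.2, 0.1)  # cutoffs on |score| quoted in the commit message

def interpret(score):
    """Return (direction, highest cutoff met or None) for a PromoterAI score.

    Hypothetical helper: negative scores mean under-expression and positive
    scores mean over-expression, as stated in the description-page fix.
    """
    if score > 0:
        direction = "over-expression"
    elif score < 0:
        direction = "under-expression"
    else:
        direction = "none"
    # THRESHOLDS is ordered from strictest to loosest, so the first hit is
    # the highest cutoff that |score| reaches.
    cutoff = next((t for t in THRESHOLDS if abs(score) >= t), None)
    return direction, cutoff
```

Higher cutoffs select smaller, higher-confidence sets, so a caller would typically filter on the returned cutoff rather than on the raw score.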
- src/hg/makeDb/scripts/mpravardb/mpravardbToBed.py
- lines changed 53
888e7470c14eeecdca310ed36bb45c3c00ae8052 Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359
Fix broken mpraVarDb bigDataUrl — pointed at /gbdb/hg38/mpra/mpravardb.bb
but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, causing
hgTrackDb -strict to silently drop the subtrack.
Rebuild mpravardb.bb after two fixes in mpravardbToBed.py: sanitize UTF-8
in user-visible string fields (curly quotes, primes, NBSP mojibake) that
the browser does not transcode, eliminating ~246k non-ASCII occurrences
across 42% of rows; and change safe_float / pval_to_score to write NaN
and return score 0 for NA / out-of-range p-values instead of 0.0 and
score 1000 (previously inflated untested variants to the top of
score-sorted views).
trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.
Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.
relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.
Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.
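The two mpravardbToBed.py fixes described above might look roughly like the sketch below. The function names `safe_float` and `pval_to_score` come from the commit message, but their bodies here are assumptions: in particular the -log10 scaling, the cap of 10, and the character table in the hypothetical `sanitize_ascii` are illustrative, and mojibake repair is not attempted.

```python
import math

# Hypothetical replacement table: curly quotes, primes and NBSP to ASCII.
_ASCII_MAP = str.maketrans({
    "\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"',
    "\u2032": "'", "\u2033": '"', "\u00a0": " ",
})

def sanitize_ascii(text):
    """Replace common non-ASCII punctuation in a user-visible string field."""
    return text.translate(_ASCII_MAP)

def safe_float(text):
    """Parse a numeric field; return NaN (not 0.0) for NA or unparsable input."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return math.nan

def pval_to_score(pval, max_neg_log10=10.0):
    """Map a p-value to a BED score in 0..1000.

    NaN or out-of-range p-values score 0, so untested variants sort last
    instead of first. The scaling itself is an assumption of this sketch.
    """
    if math.isnan(pval) or not (0.0 < pval <= 1.0):
        return 0
    scaled = min(-math.log10(pval), max_neg_log10) / max_neg_log10
    return int(round(scaled * 1000))
```

With this shape, a row whose p-value is "NA" yields NaN from `safe_float` and score 0 from `pval_to_score`, which matches the behavior the commit describes.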
- src/hg/makeDb/scripts/nmd/genePredNmdEsc
- lines changed 16
3972ba54c468ace338d4a5578de1d20bf6c1f9ec Mon Apr 20 15:39:26 2026 -0700
Adding Rule 4 (long-exon rule, Lindeboom 2016) to NMD Escape tracks and releasing on Apr. 22, 2026. refs #33737
Script: added a fourth rule to genePredNmdEsc. Coding exons longer than
400 bp (excluding the last coding exon, which is already covered by the
50 bp rule) are flagged as NMD-escape regions. Rebuilt the Gencode and
NCBI RefSeq bigBed files.
trackDb:
- nmd.ra: appended "/400nt" to the nmdEsc longLabels, set nmdEscGencode
default visibility to dense so the track is visible in cart-reset
views, changed all four NMDetective subtracks from "visibility full"
to "visibility hide", updated pennantIcon to the Apr. 22, 2026
release date and anchor.
- nmd.html: mention long internal exons in the overview description,
update the rule count from three to four.
- nmdEscTranscripts.html: add the long-exon rule to the rule list and
color legend (gold, #FFD700), expand the Background section with
mechanisms for the intronless, start-proximal, and long-exon rules,
correct the 50 bp rule description to include the entire last coding
exon, fix Lindeboom 2016 author initials (RG -> RGH).
News:
- newsarch.html: add the 2026-04-22 NMD Escape news entry covering all
four rules, with acknowledgements to Guido Neidhardt and Andreas
Lahner for suggesting the track and the Decipher Genome Browser team
for inspiring the visualization.
- indexNews.html: add the front-page news link.
makedoc:
- nmd.txt: dated note for the Rule 4 rebuild.
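A minimal sketch of the long-exon rule as the commit message states it, not the actual genePredNmdEsc code. The function name `long_exon_regions` and the input layout (coding exons as half-open intervals, clipped to the CDS and ordered 5' to 3') are assumptions made for illustration.

```python
LONG_EXON_BP = 400  # threshold stated in the commit message

def long_exon_regions(cds_exons):
    """Return coding exons longer than 400 bp, skipping the last coding exon.

    cds_exons: list of (start, end) half-open intervals, clipped to the CDS
    and ordered 5'->3' along the transcript (assumption for this sketch).
    """
    internal = cds_exons[:-1]  # last coding exon is covered by the 50 bp rule
    return [(start, end) for start, end in internal
            if end - start > LONG_EXON_BP]
```

An exon of exactly 400 bp is not flagged here because the message says "longer than 400 bp".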
- lines changed 2
4bd316f5f1ca47328bd3f9a181214b788055f0bc Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737
Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".
Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts with a
single CDS exon (UTR introns) as "intronless" and silently suppressed their
Rule 1/3/4 assignments via the if/else short-circuit. 3,253 RefSeq curated
and ~2,000 Gencode transcripts reassigned from Rule 2 to Rules 1/3. Rebuilt
both tracks.
Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.
Added reciprocal relatedTracks.ra entries for nmd <-> mane and nmd <-> ncbiRefSeq.
QA cleanups: non-ASCII prime char replaced with &prime;, mailing list links
given target="_blank" across all three HTML pages, dead commented nmdGencode
block removed from nmd.ra, AutoSQL field comments updated to cover Rule 4
color and the gene-symbol-to-transcript-ID fallback.
Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.
- lines changed 13
34d2eee845f5f45e571d1e153c632683b8a93f75 Tue Apr 21 16:17:53 2026 -0700
Refine NMD Escape Rule 2 gate to "single coding exon and no 3'UTR intron". refs #33737
Previously Rule 2 required exonCount==1 (truly intronless). This
overcorrected for single-CDS-exon transcripts whose only introns are in
the 5'UTR: biologically these have no EJC downstream of the stop codon
(5'UTR EJCs are cleared by the scanning 40S or sit upstream of the
terminating ribosome) and are NMD-immune, but the code pushed them to
Rules 1/3 under a less accurate "last coding exon" label.
New gate: len(cdsExons) == 1 AND no exon-exon junction strictly
downstream of the stop codon (strand-aware). Transcripts with a single
coding exon but a 3'UTR intron correctly stay in Rules 1/3 because that
intron deposits an EJC that can trigger NMD.
3,113 RefSeq Curated and 10,790 Gencode V49 transcripts move into Rule
2. 140 RefSeq and 1,135 Gencode single-CDS-exon transcripts with 3'UTR
introns correctly remain in Rules 1/3. Description page and makedoc
updated.
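The refined Rule 2 gate can be illustrated with the sketch below. It is written from the commit message alone, so the function name `is_rule2` and the argument layout (genomic half-open intervals sorted by start, plus the genomic CDS bounds) are assumptions rather than the script's real interface.

```python
def is_rule2(exons, cds_exons, cds_start, cds_end, strand):
    """True if a transcript has one coding exon and no 3'UTR intron.

    exons:     all exons as (start, end) half-open genomic intervals,
               sorted by ascending start (assumption for this sketch).
    cds_exons: the subset of exons overlapping the CDS.
    cds_start, cds_end: genomic CDS bounds, half-open.
    """
    if len(cds_exons) != 1:
        return False
    if strand == "+":
        # The stop codon ends at cds_end. Every exon except the last ends
        # at an exon-exon junction; one at or past cds_end is downstream.
        return not any(end >= cds_end for _, end in exons[:-1])
    # Minus strand: the stop codon sits at cds_start and "downstream" means
    # lower genomic coordinates. Every exon except the first starts at a
    # junction; one at or before cds_start is downstream of the stop.
    return not any(start <= cds_start for start, _ in exons[1:])
```

A transcript whose only intron lies in the 5'UTR passes the gate, while a single-coding-exon transcript with a 3'UTR intron fails it and stays with Rules 1/3.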
- lines changed 4
fe73446acf43f70e385dadbbb281634adf3cac9e Tue Apr 21 16:44:16 2026 -0700
NMD Escape QA tweaks: hide Gencode subtrack by default, bold rule numbers in mouseovers. refs #33737
- nmdEscGencode default visibility changed from on/dense to off/hide so
only the RefSeq Curated subtrack is on by default. Per Lou's request.
- RULE_DESCRIPTIONS mouseover strings wrap the rule number in <b>...</b>
so the rule shows bold in the tooltip. Both bigBeds rebuilt.
- src/hg/makeDb/scripts/nmd/nmdEscCollapsed.as
- lines changed 2
4bd316f5f1ca47328bd3f9a181214b788055f0bc Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737
Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".
Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts with a
single CDS exon (UTR introns) as "intronless" and silently suppressed their
Rule 1/3/4 assignments via the if/else short-circuit. 3,253 RefSeq curated
and ~2,000 Gencode transcripts reassigned from Rule 2 to Rules 1/3. Rebuilt
both tracks.
Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.
Added reciprocal relatedTracks.ra entries for nmd <-> mane and nmd <-> ncbiRefSeq.
QA cleanups: non-ASCII prime char replaced with &prime;, mailing list links
given target="_blank" across all three HTML pages, dead commented nmdGencode
block removed from nmd.ra, AutoSQL field comments updated to cover Rule 4
color and the gene-symbol-to-transcript-ID fallback.
Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.
- src/hg/makeDb/scripts/primateai/primateAi.as
- lines changed 2
de2ccf6d827865f11d3c8edd9ceeb1b6394a7380 Tue Apr 21 18:22:59 2026 -0700
PrimateAI-3D: label items by nucleotide change, add aaChange field and HTML mouseover.
Variant analysts typically work at the nucleotide level, and the current
item label (amino acid change) collapses distinguishable variants: ~17%
of items share their (chrom, pos, AA-change) tuple with another item
because of codon degeneracy (e.g. the three changes C>A, C>G, C>T at the same
position can all appear as "M>I"). Labeling by nucleotide change makes
every item uniquely distinguishable (0.0% collisions on hg38, 0.1% on
hg19 from overlapping transcripts).
- primateAi.as: field 4 (name) is now "Nucleotide change (e.g. T>C)";
new field aaChange (placed before ref/alt) holds the amino acid
change.
- primateAiToBigBed.py: write name = "{ref}>{alt}", new aaChange column,
and an HTML mouseover with terse labels (Var/AA/Score/Perc/Pred) and
a colored prediction string.
- primateAi.ra: add labelFields name,aaChange and defaultLabelFields
name so users can toggle the on-feature label between nt change
(default) and AA change.
- primateAi.html: expand Display Conventions with the label-convention
rationale and a legend for each mouseover field.
refs #37274
- src/hg/makeDb/scripts/primateai/primateAiToBigBed.py
- lines changed 13
50466766840ded6cb8bd5cb868bdf2ff3f613bc0 Tue Apr 21 11:17:15 2026 -0700
QA fixes for PrimateAI-3D track.
Config (primateAi.ra):
- Fix broken Ensembl transcript linkout: urls $S expanded to chromosome
name; switch to the Ensembl transcript page with $$
- Add numeric filters on percentile and raw score (label notes the
paper's 0.821 clinical threshold)
- Add maxWindowToDraw 2000000
Data (primateAiToBigBed.py):
- Change hardcoded strand '+' to '.': the source file has no strand
column
- Accept input/output paths as CLI args (previously hardcoded the hg38
input path)
- Handle variable field count: ~2.4M rows in the hg19 source are
missing the refseq column
Description (primateAi.html):
- Fix two broken hgTrackUi&... internal links to the Zoonomia 447-way
track
- Regenerate the first reference via getTrackReferences (wrong article
number and wrong PMC ID in the previous text)
- Fix the GitHub URL for the conversion script in Methods
- Move the Zoonomia 447-way mention out of Description; rephrase the
license note to describe precisely what is disabled
relatedTracks.ra:
- Add reciprocal cross-links for primateAi <-> alphaMissense (hg38),
primateAi <-> revel (hg38 + hg19), and primateAi <-> promoterAi
(hg38). Also includes promoterAi <-> alphaMissense cross-links.
refs #37274 #37279
- lines changed 10
de2ccf6d827865f11d3c8edd9ceeb1b6394a7380 Tue Apr 21 18:22:59 2026 -0700
PrimateAI-3D: label items by nucleotide change, add aaChange field and HTML mouseover.
Variant analysts typically work at the nucleotide level, and the current
item label (amino acid change) collapses distinguishable variants: ~17%
of items share their (chrom, pos, AA-change) tuple with another item
because of codon degeneracy (e.g. the three changes C>A, C>G, C>T at the same
position can all appear as "M>I"). Labeling by nucleotide change makes
every item uniquely distinguishable (0.0% collisions on hg38, 0.1% on
hg19 from overlapping transcripts).
- primateAi.as: field 4 (name) is now "Nucleotide change (e.g. T>C)";
new field aaChange (placed before ref/alt) holds the amino acid
change.
- primateAiToBigBed.py: write name = "{ref}>{alt}", new aaChange column,
and an HTML mouseover with terse labels (Var/AA/Score/Perc/Pred) and
a colored prediction string.
- primateAi.ra: add labelFields name,aaChange and defaultLabelFields
name so users can toggle the on-feature label between nt change
(default) and AA change.
- primateAi.html: expand Display Conventions with the label-convention
rationale and a legend for each mouseover field.
refs #37274
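The label collision argument above can be checked with a short sketch like the following. The item dictionaries, their keys, and the helper `collision_fraction` are hypothetical; only the `"{ref}>{alt}"` label format and the M>I example come from the commit message.

```python
from collections import Counter

def collision_fraction(items, key):
    """Fraction of items whose key is shared with at least one other item."""
    if not items:
        return 0.0
    counts = Counter(key(item) for item in items)
    shared = sum(1 for item in items if counts[key(item)] > 1)
    return shared / len(items)

# Toy data: three substitutions at one position that all read as "M>I",
# plus one unrelated variant.
items = [
    {"chrom": "chr1", "pos": 100, "ref": "C", "alt": "A", "aaChange": "M>I"},
    {"chrom": "chr1", "pos": 100, "ref": "C", "alt": "G", "aaChange": "M>I"},
    {"chrom": "chr1", "pos": 100, "ref": "C", "alt": "T", "aaChange": "M>I"},
    {"chrom": "chr1", "pos": 250, "ref": "G", "alt": "A", "aaChange": "R>H"},
]

by_aa = collision_fraction(
    items, lambda it: (it["chrom"], it["pos"], it["aaChange"]))
by_nt = collision_fraction(
    items, lambda it: (it["chrom"], it["pos"], f"{it['ref']}>{it['alt']}"))
```

On this toy input the amino-acid label collides for three of four items while the nucleotide label never collides, which is the effect the commit reports at genome scale.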
- src/hg/makeDb/scripts/promoterAiOverlaps.as
- lines changed 6
f9a89b0e1ce3c937b4fbb879736c1619c35c271f Tue Apr 21 12:11:02 2026 -0700
QA fixes for PromoterAI track. refs #37278
Description page: replaced the wrong reference (Gao et al. 2023, the PrimateAI-3D
paper) with the actual PromoterAI citation (Jaganathan et al. Science 2025, PMID
40440429), corrected the score-direction wording (negative = under-expression,
positive = over-expression, not "tolerated vs disruptive"), fixed the Data Access
source link (Illumina BaseSpace, not the GitHub repo), and corrected the mouseover
blurb to match mouseOverFunction noAverage behavior.
Converter and AS: the overlap bigBed now carries the real per-transcript strand
from the source TSV (was hardcoded '+'), with a new strands column in the AS, and
the name field concatenates unique gene symbols so bidirectional-promoter items
read as "HES4,ISG15" etc. BED score is now |PromoterAI|*1000 so scoreFilter is
meaningful. Rewrote the converter to stream (sorted input), which drops peak
memory from ~40 GB to a few MB.
trackDb: added filterLabel/filterLimits on scoreDiff (the filter was unusable
without labels), scoreFilter + scoreLabel, alwaysZero and autoScale off on the
bigWig subtracks, color 200,0,0 / altColor 0,0,200 so signed bigWig bars draw
red (over-expression) above zero and blue (under-expression) below, matching
the overlap track itemRgb. Added maxWindowToDraw and maxItems on the overlap
subtrack.
Makedoc updated to describe the streaming pipeline, the new strands column,
and the rebuild workflow.
- src/hg/makeDb/scripts/promoterAiToBigWig.py
- lines changed 111
f9a89b0e1ce3c937b4fbb879736c1619c35c271f Tue Apr 21 12:11:02 2026 -0700
QA fixes for PromoterAI track. refs #37278
Description page: replaced the wrong reference (Gao et al. 2023, the PrimateAI-3D
paper) with the actual PromoterAI citation (Jaganathan et al. Science 2025, PMID
40440429), corrected the score-direction wording (negative = under-expression,
positive = over-expression, not "tolerated vs disruptive"), fixed the Data Access
source link (Illumina BaseSpace, not the GitHub repo), and corrected the mouseover
blurb to match mouseOverFunction noAverage behavior.
Converter and AS: the overlap bigBed now carries the real per-transcript strand
from the source TSV (was hardcoded '+'), with a new strands column in the AS, and
the name field concatenates unique gene symbols so bidirectional-promoter items
read as "HES4,ISG15" etc. BED score is now |PromoterAI|*1000 so scoreFilter is
meaningful. Rewrote the converter to stream (sorted input), which drops peak
memory from ~40 GB to a few MB.
trackDb: added filterLabel/filterLimits on scoreDiff (the filter was unusable
without labels), scoreFilter + scoreLabel, alwaysZero and autoScale off on the
bigWig subtracks, color 200,0,0 / altColor 0,0,200 so signed bigWig bars draw
red (over-expression) above zero and blue (under-expression) below, matching
the overlap track itemRgb. Added maxWindowToDraw and maxItems on the overlap
subtrack.
Makedoc updated to describe the streaming pipeline, the new strands column,
and the rebuild workflow.
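The converter changes described above (score scaling, unique gene symbols in the name, and streaming over sorted input) can be sketched as below. This is not the code in promoterAiToBigWig.py: the row layout `(chrom, pos, gene, strand)`, the helper names, and the comma-joined strands format are assumptions for illustration.

```python
from itertools import groupby

def bed_score(promoter_ai_score):
    """Scale |score| (assumed to lie in 0..1) to the 0..1000 BED score range."""
    return min(1000, round(abs(promoter_ai_score) * 1000))

def join_unique(values):
    """Comma-join values, dropping repeats while keeping first-seen order."""
    return ",".join(dict.fromkeys(values))

def stream_items(rows):
    """Yield one (chrom, pos, name, strands) item per position.

    rows: iterable of (chrom, pos, gene, strand) tuples, already sorted by
    (chrom, pos), so only one position's rows are held in memory at a time.
    """
    for (chrom, pos), group in groupby(rows, key=lambda r: (r[0], r[1])):
        group = list(group)
        name = join_unique(gene for _, _, gene, _ in group)
        strands = join_unique(strand for _, _, _, strand in group)
        yield chrom, pos, name, strands
```

Because `groupby` only merges consecutive rows, this approach depends on the input being sorted, which is the condition the commit message attaches to the streaming rewrite.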
- src/hg/makeDb/trackDb/human/hg38/mpra.html
- lines changed 13
888e7470c14eeecdca310ed36bb45c3c00ae8052 Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359
Fix broken mpraVarDb bigDataUrl — pointed at /gbdb/hg38/mpra/mpravardb.bb
but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, causing
hgTrackDb -strict to silently drop the subtrack.
Rebuild mpravardb.bb after two fixes in mpravardbToBed.py: sanitize UTF-8
in user-visible string fields (curly quotes, primes, NBSP mojibake) that
the browser does not transcode, eliminating ~246k non-ASCII occurrences
across 42% of rows; and change safe_float / pval_to_score to write NaN
and return score 0 for NA / out-of-range p-values instead of 0.0 and
score 1000 (previously inflated untested variants to the top of
score-sorted views).
trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.
Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.
relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.
Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.
- src/hg/makeDb/trackDb/human/hg38/mpra.ra
- lines changed 27
888e7470c14eeecdca310ed36bb45c3c00ae8052 Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359
Fix broken mpraVarDb bigDataUrl — pointed at /gbdb/hg38/mpra/mpravardb.bb
but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, causing
hgTrackDb -strict to silently drop the subtrack.
Rebuild mpravardb.bb after two fixes in mpravardbToBed.py: sanitize UTF-8
in user-visible string fields (curly quotes, primes, NBSP mojibake) that
the browser does not transcode, eliminating ~246k non-ASCII occurrences
across 42% of rows; and change safe_float / pval_to_score to write NaN
and return score 0 for NA / out-of-range p-values instead of 0.0 and
score 1000 (previously inflated untested variants to the top of
score-sorted views).
trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.
Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.
relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.
Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.
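A broken bigDataUrl like the one fixed in this commit could be caught before release with a small check along these lines. This is a hypothetical helper, not part of the kent tree: it only understands simple "key value" lines, ignores include directives and remote URLs, and takes an injectable `exists` function so it can be tried without touching /gbdb.

```python
import os

def missing_big_data_urls(ra_text, exists=os.path.exists):
    """Return (track, path) pairs whose local bigDataUrl file is missing.

    ra_text: contents of a trackDb .ra file (simplified parsing, see above).
    exists:  callable used to test a path; defaults to os.path.exists.
    """
    missing = []
    track = None
    for line in ra_text.splitlines():
        words = line.strip().split(None, 1)
        if len(words) != 2:
            continue  # blank line or a key with no value
        key, value = words
        if key == "track":
            track = value
        elif key == "bigDataUrl" and value.startswith("/") and not exists(value):
            missing.append((track, value))
    return missing
```

Run against the pre-fix stanza, such a check would report the mpraVarDb path /gbdb/hg38/mpra/mpravardb.bb as missing, instead of leaving the subtrack to be dropped silently.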
- src/hg/makeDb/trackDb/human/hg38/mpraVarDb.html
- lines changed 14
888e7470c14eeecdca310ed36bb45c3c00ae8052 Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359
Fix broken mpraVarDb bigDataUrl — pointed at /gbdb/hg38/mpra/mpravardb.bb
but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, causing
hgTrackDb -strict to silently drop the subtrack.
Rebuild mpravardb.bb after two fixes in mpravardbToBed.py: sanitize UTF-8
in user-visible string fields (curly quotes, primes, NBSP mojibake) that
the browser does not transcode, eliminating ~246k non-ASCII occurrences
across 42% of rows; and change safe_float / pval_to_score to write NaN
and return score 0 for NA / out-of-range p-values instead of 0.0 and
score 1000 (previously inflated untested variants to the top of
score-sorted views).
trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.
Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.
relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.
Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.
- src/hg/makeDb/trackDb/human/hg38/mprabase.html
- lines changed 9, context: html, text, full: html, text
888e7470c14eeecdca310ed36bb45c3c00ae8052 Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359
Fix broken mpraVarDb bigDataUrl — pointed at /gbdb/hg38/mpra/mpravardb.bb
but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, causing
hgTrackDb -strict to silently drop the subtrack.
Rebuild mpravardb.bb after two fixes in mpravardbToBed.py: sanitize UTF-8
in user-visible string fields (curly quotes, primes, NBSP mojibake) that
the browser does not transcode, eliminating ~246k non-ASCII occurrences
across 42% of rows; and change safe_float / pval_to_score to write NaN
and return score 0 for NA / out-of-range p-values instead of 0.0 and
score 1000 (previously inflated untested variants to the top of
score-sorted views).
trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.
Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.
relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.
Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.
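The UTF-8 sanitizing step mentioned in this commit might look something like the sketch below. The replacement table and the choice to map to plain ASCII are assumptions; the commit only states that curly quotes, primes, and NBSP mojibake in user-visible string fields were cleaned up.

```python
# Hypothetical mapping from non-ASCII characters to ASCII stand-ins.
REPLACEMENTS = {
    "\u2018": "'",   # left single curly quote
    "\u2019": "'",   # right single curly quote
    "\u201c": '"',   # left double curly quote
    "\u201d": '"',   # right double curly quote
    "\u2032": "'",   # prime
    "\u2033": '"',   # double prime
    "\u00a0": " ",   # no-break space
}
_TABLE = str.maketrans(REPLACEMENTS)

def sanitize(text):
    """Return text with the characters above replaced by ASCII equivalents."""
    # An NBSP that was decoded as Latin-1 shows up as a capital A with
    # circumflex followed by NBSP; collapse that pair to one space first.
    text = text.replace("\u00c2\u00a0", " ")
    return text.translate(_TABLE)

def count_non_ascii(text):
    """Count remaining non-ASCII characters, e.g. for a QA report."""
    return sum(1 for ch in text if ord(ch) > 127)
```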
- src/hg/makeDb/trackDb/human/hg38/nmd.html
- lines changed 4, context: html, text, full: html, text
3972ba54c468ace338d4a5578de1d20bf6c1f9ec Mon Apr 20 15:39:26 2026 -0700
Adding Rule 4 (long-exon rule, Lindeboom 2016) to NMD Escape tracks and releasing on Apr. 22, 2026. refs #33737
Script: added a fourth rule to genePredNmdEsc. Coding exons longer than
400 bp (excluding the last coding exon, which is already covered by the
50 bp rule) are flagged as NMD-escape regions. Rebuilt the Gencode and
NCBI RefSeq bigBed files.
trackDb:
- nmd.ra: appended "/400nt" to the nmdEsc longLabels, set nmdEscGencode
default visibility to dense so the track is visible in cart-reset
views, changed all four NMDetective subtracks from "visibility full"
to "visibility hide", updated pennantIcon to the Apr. 22, 2026
release date and anchor.
- nmd.html: mention long internal exons in the overview description,
update the rule count from three to four.
- nmdEscTranscripts.html: add the long-exon rule to the rule list and
color legend (gold, #FFD700), expand the Background section with
mechanisms for the intronless, start-proximal, and long-exon rules,
correct the 50 bp rule description to include the entire last coding
exon, fix Lindeboom 2016 author initials (RG -> RGH).
News:
- newsarch.html: add the 2026-04-22 NMD Escape news entry covering all
four rules, with acknowledgements to Guido Neidhardt and Andreas
Lahner for suggesting the track and the Decipher Genome Browser team
for inspiring the visualization.
- indexNews.html: add the front-page news link.
makedoc:
- nmd.txt: dated note for the Rule 4 rebuild.
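A minimal sketch of the long-exon rule as described in this commit, assuming coding exons are given as half-open (start, end) intervals already clipped to the CDS and sorted by genomic start. The function name and the decision to flag the whole clipped exon are assumptions, not taken from genePredNmdEsc.

```python
LONG_EXON_MIN = 400  # bp threshold stated in the commit message

def long_exon_regions(cds_exons, strand):
    """Return coding exons flagged by the long-exon rule.

    cds_exons: list of (start, end) half-open intervals, clipped to the
    CDS and sorted by genomic start. The last coding exon in transcript
    order is skipped because the 50 bp rule already covers it.
    """
    if not cds_exons:
        return []
    # Transcript order is genomic order on '+', reversed on '-'.
    ordered = cds_exons if strand == "+" else cds_exons[::-1]
    internal = ordered[:-1]
    return [(s, e) for s, e in internal if e - s > LONG_EXON_MIN]
```

Note that on the minus strand the last coding exon is the one with the lowest genomic coordinates, so the same exon list can give different results per strand.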
- lines changed 5, context: html, text, full: html, text
4bd316f5f1ca47328bd3f9a181214b788055f0bc Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737
Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".
Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts with a
single CDS exon (UTR introns) as "intronless" and silently suppressed their
Rule 1/3/4 assignments via the if/else short-circuit. 3,253 RefSeq curated
and ~2,000 Gencode transcripts reassigned from Rule 2 to Rules 1/3. Rebuilt
both tracks.
Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.
Added reciprocal relatedTracks.ra entries for nmd <-> mane and nmd <-> ncbiRefSeq.
QA cleanups: non-ASCII prime char (′) replaced with its HTML entity, mailing list links
given target="_blank" across all three HTML pages, dead commented nmdGencode
block removed from nmd.ra, AutoSQL field comments updated to cover Rule 4
color and the gene-symbol-to-transcript-ID fallback.
Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.
- src/hg/makeDb/trackDb/human/hg38/nmd.ra
- lines changed 8, context: html, text, full: html, text
3972ba54c468ace338d4a5578de1d20bf6c1f9ec Mon Apr 20 15:39:26 2026 -0700
Adding Rule 4 (long-exon rule, Lindeboom 2016) to NMD Escape tracks and releasing on Apr. 22, 2026. refs #33737
Script: added a fourth rule to genePredNmdEsc. Coding exons longer than
400 bp (excluding the last coding exon, which is already covered by the
50 bp rule) are flagged as NMD-escape regions. Rebuilt the Gencode and
NCBI RefSeq bigBed files.
trackDb:
- nmd.ra: appended "/400nt" to the nmdEsc longLabels, set nmdEscGencode
default visibility to dense so the track is visible in cart-reset
views, changed all four NMDetective subtracks from "visibility full"
to "visibility hide", updated pennantIcon to the Apr. 22, 2026
release date and anchor.
- nmd.html: mention long internal exons in the overview description,
update the rule count from three to four.
- nmdEscTranscripts.html: add the long-exon rule to the rule list and
color legend (gold, #FFD700), expand the Background section with
mechanisms for the intronless, start-proximal, and long-exon rules,
correct the 50 bp rule description to include the entire last coding
exon, fix Lindeboom 2016 author initials (RG -> RGH).
News:
- newsarch.html: add the 2026-04-22 NMD Escape news entry covering all
four rules, with acknowledgements to Guido Neidhardt and Andreas
Lahner for suggesting the track and the Decipher Genome Browser team
for inspiring the visualization.
- indexNews.html: add the front-page news link.
makedoc:
- nmd.txt: dated note for the Rule 4 rebuild.
- lines changed 13, context: html, text, full: html, text
4bd316f5f1ca47328bd3f9a181214b788055f0bc Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737
Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".
Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts with a
single CDS exon (UTR introns) as "intronless" and silently suppressed their
Rule 1/3/4 assignments via the if/else short-circuit. 3,253 RefSeq curated
and ~2,000 Gencode transcripts reassigned from Rule 2 to Rules 1/3. Rebuilt
both tracks.
Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.
Added reciprocal relatedTracks.ra entries for nmd <-> mane and nmd <-> ncbiRefSeq.
QA cleanups: non-ASCII prime char (′) replaced with its HTML entity, mailing list links
given target="_blank" across all three HTML pages, dead commented nmdGencode
block removed from nmd.ra, AutoSQL field comments updated to cover Rule 4
color and the gene-symbol-to-transcript-ID fallback.
Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.
- lines changed 2, context: html, text, full: html, text
fe73446acf43f70e385dadbbb281634adf3cac9e Tue Apr 21 16:44:16 2026 -0700
NMD Escape QA tweaks: hide Gencode subtrack by default, bold rule numbers in mouseovers. refs #33737
- nmdEscGencode default visibility changed from on/dense to off/hide so
only the RefSeq Curated subtrack is on by default. Per Lou's request.
- RULE_DESCRIPTIONS mouseover strings wrap the rule number in <b>...</b>
so the rule shows bold in the tooltip. Both bigBeds rebuilt.
- lines changed 2, context: html, text, full: html, text
a86b49667ad82b0f6c3745379f186f4d5753e368 Wed Apr 22 13:52:14 2026 -0700
Simplify NMD Escape subtrack longLabels. refs #33737
The '50bp/100bp/intronless/400nt' rule-list became inaccurate after the
Rule 2 refinement (Rule 2 now covers single coding exon + no 3'UTR
intron, not just intronless). Drop the enumerated rules from the
longLabel and defer to the track description page for rule detail.
- src/hg/makeDb/trackDb/human/hg38/nmdDetective.html
- lines changed 2, context: html, text, full: html, text
4bd316f5f1ca47328bd3f9a181214b788055f0bc Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737
Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".
Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts with a
single CDS exon (UTR introns) as "intronless" and silently suppressed their
Rule 1/3/4 assignments via the if/else short-circuit. 3,253 RefSeq curated
and ~2,000 Gencode transcripts reassigned from Rule 2 to Rules 1/3. Rebuilt
both tracks.
Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.
Added reciprocal relatedTracks.ra entries for nmd <-> mane and nmd <-> ncbiRefSeq.
QA cleanups: non-ASCII prime char (′) replaced with its HTML entity, mailing list links
given target="_blank" across all three HTML pages, dead commented nmdGencode
block removed from nmd.ra, AutoSQL field comments updated to cover Rule 4
color and the gene-symbol-to-transcript-ID fallback.
Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.
- src/hg/makeDb/trackDb/human/hg38/nmdEscTranscripts.html
- lines changed 36, context: html, text, full: html, text
3972ba54c468ace338d4a5578de1d20bf6c1f9ec Mon Apr 20 15:39:26 2026 -0700
Adding Rule 4 (long-exon rule, Lindeboom 2016) to NMD Escape tracks and releasing on Apr. 22, 2026. refs #33737
Script: added a fourth rule to genePredNmdEsc. Coding exons longer than
400 bp (excluding the last coding exon, which is already covered by the
50 bp rule) are flagged as NMD-escape regions. Rebuilt the Gencode and
NCBI RefSeq bigBed files.
trackDb:
- nmd.ra: appended "/400nt" to the nmdEsc longLabels, set nmdEscGencode
default visibility to dense so the track is visible in cart-reset
views, changed all four NMDetective subtracks from "visibility full"
to "visibility hide", updated pennantIcon to the Apr. 22, 2026
release date and anchor.
- nmd.html: mention long internal exons in the overview description,
update the rule count from three to four.
- nmdEscTranscripts.html: add the long-exon rule to the rule list and
color legend (gold, #FFD700), expand the Background section with
mechanisms for the intronless, start-proximal, and long-exon rules,
correct the 50 bp rule description to include the entire last coding
exon, fix Lindeboom 2016 author initials (RG -> RGH).
News:
- newsarch.html: add the 2026-04-22 NMD Escape news entry covering all
four rules, with acknowledgements to Guido Neidhardt and Andreas
Lahner for suggesting the track and the Decipher Genome Browser team
for inspiring the visualization.
- indexNews.html: add the front-page news link.
makedoc:
- nmd.txt: dated note for the Rule 4 rebuild.
- lines changed 8, context: html, text, full: html, text
4bd316f5f1ca47328bd3f9a181214b788055f0bc Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737
Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".
Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts with a
single CDS exon (UTR introns) as "intronless" and silently suppressed their
Rule 1/3/4 assignments via the if/else short-circuit. 3,253 RefSeq curated
and ~2,000 Gencode transcripts reassigned from Rule 2 to Rules 1/3. Rebuilt
both tracks.
Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.
Added reciprocal relatedTracks.ra entries for nmd <-> mane and nmd <-> ncbiRefSeq.
QA cleanups: non-ASCII prime char (′) replaced with its HTML entity, mailing list links
given target="_blank" across all three HTML pages, dead commented nmdGencode
block removed from nmd.ra, AutoSQL field comments updated to cover Rule 4
color and the gene-symbol-to-transcript-ID fallback.
Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.
- lines changed 16, context: html, text, full: html, text
34d2eee845f5f45e571d1e153c632683b8a93f75 Tue Apr 21 16:17:53 2026 -0700
Refine NMD Escape Rule 2 gate to "single coding exon and no 3'UTR intron". refs #33737
Previously Rule 2 required exonCount==1 (truly intronless). This
overcorrected for single-CDS-exon transcripts whose only introns are in
the 5'UTR: biologically these have no EJC downstream of the stop codon
(5'UTR EJCs are cleared by the scanning 40S or sit upstream of the
terminating ribosome) and are NMD-immune, but the code pushed them to
Rules 1/3 under a less accurate "last coding exon" label.
New gate: len(cdsExons) == 1 AND no exon-exon junction strictly
downstream of the stop codon (strand-aware). Transcripts with a single
coding exon but a 3'UTR intron correctly stay in Rules 1/3 because that
intron deposits an EJC that can trigger NMD.
3,113 RefSeq Curated and 10,790 Gencode V49 transcripts move into Rule
2. 140 RefSeq and 1,135 Gencode single-CDS-exon transcripts with 3'UTR
introns correctly remain in Rules 1/3. Description page and makedoc
updated.
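The refined Rule 2 gate can be sketched as follows. This is one interpretation, not the genePredNmdEsc code: exons are assumed to be half-open (start, end) intervals sorted by genomic start, and "a junction strictly downstream of the stop codon" is implemented as "some exon lies entirely past the CDS end in transcript direction". All function names are hypothetical.

```python
def cds_exons(exons, cds_start, cds_end):
    """Clip exons to the CDS and drop those with no coding bases."""
    clipped = []
    for start, end in exons:
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:
            clipped.append((s, e))
    return clipped

def has_3utr_intron(exons, cds_start, cds_end, strand):
    """True if an exon-exon junction lies downstream of the stop codon.

    On '+' the stop codon ends at cds_end, so any exon starting at or
    after cds_end implies a 3'UTR intron. On '-' the stop codon begins
    at cds_start and downstream means lower coordinates.
    """
    if strand == "+":
        return any(start >= cds_end for start, _ in exons)
    return any(end <= cds_start for _, end in exons)

def is_rule2(exons, cds_start, cds_end, strand):
    """Single coding exon AND no 3'UTR intron."""
    coding = cds_exons(exons, cds_start, cds_end)
    return len(coding) == 1 and not has_3utr_intron(
        exons, cds_start, cds_end, strand
    )
```

Under this sketch a transcript whose only intron is in the 5'UTR passes the gate, while one with a single coding exon and a 3'UTR intron does not, matching the two cases the commit message distinguishes.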
- src/hg/makeDb/trackDb/human/hg38/trackDb.ra
- lines changed 1, context: html, text, full: html, text
33e9019ef1b239ca1ab8114818f09ad65f58f2d0 Wed Apr 22 13:10:23 2026 -0700
Release NMD Escape supertrack to beta + public. refs #33737
Drop the 'alpha' gate on include nmd.ra in hg38 trackDb.ra so the
supertrack flows through the trackDb push pipeline to hgwbeta and the
RR. /gbdb/hg38/nmd/*.bb files are already on the RR.
- src/hg/makeDb/trackDb/human/primateAi.html
- lines changed 16, context: html, text, full: html, text
50466766840ded6cb8bd5cb868bdf2ff3f613bc0 Tue Apr 21 11:17:15 2026 -0700
QA fixes for PrimateAI-3D track.
Config (primateAi.ra):
- Fix broken Ensembl transcript linkout: urls $S expanded to chromosome
name; switch to the Ensembl transcript page with $$
- Add numeric filters on percentile and raw score (label notes the
paper's 0.821 clinical threshold)
- Add maxWindowToDraw 2000000
Data (primateAiToBigBed.py):
- Change hardcoded strand '+' to '.': the source file has no strand
column
- Accept input/output paths as CLI args (previously hardcoded the hg38
input path)
- Handle variable field count: ~2.4M rows in the hg19 source are
missing the refseq column
Description (primateAi.html):
- Fix two broken hgTrackUi&... internal links to the Zoonomia 447-way
track
- Regenerate the first reference via getTrackReferences (wrong article
number and wrong PMC ID in the previous text)
- Fix the GitHub URL for the conversion script in Methods
- Move the Zoonomia 447-way mention out of Description; rephrase the
license note to describe precisely what is disabled
relatedTracks.ra:
- Add reciprocal cross-links for primateAi <-> alphaMissense (hg38),
primateAi <-> revel (hg38 + hg19), and primateAi <-> promoterAi
(hg38). Also includes promoterAi <-> alphaMissense cross-links.
refs #37274 #37279
- lines changed 28, context: html, text, full: html, text
de2ccf6d827865f11d3c8edd9ceeb1b6394a7380 Tue Apr 21 18:22:59 2026 -0700
PrimateAI-3D: label items by nucleotide change, add aaChange field and HTML mouseover.
Variant analysts typically work at the nucleotide level, and the current
item label (amino acid change) collapses distinguishable variants: ~17%
of items share their (chrom, pos, AA-change) tuple with another item
because of codon degeneracy (e.g. the three substitutions C>A, C>G, and C>T at the same
position can all appear as "M>I"). Labeling by nucleotide change makes
every item uniquely distinguishable (0.0% collisions on hg38, 0.1% on
hg19 from overlapping transcripts).
- primateAi.as: field 4 (name) is now "Nucleotide change (e.g. T>C)";
new field aaChange (placed before ref/alt) holds the amino acid
change.
- primateAiToBigBed.py: write name = "{ref}>{alt}", new aaChange column,
and an HTML mouseover with terse labels (Var/AA/Score/Perc/Pred) and
a colored prediction string.
- primateAi.ra: add labelFields name,aaChange and defaultLabelFields
name so users can toggle the on-feature label between nt change
(default) and AA change.
- primateAi.html: expand Display Conventions with the label-convention
rationale and a legend for each mouseover field.
refs #37274
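The collision figures quoted in this commit could be measured with something like the sketch below. The item layout (dicts with chrom, pos, ref, alt, aa keys) and the helper names are hypothetical; the name format "{ref}>{alt}" is the one given in the commit message.

```python
from collections import Counter

def collision_rate(items, key):
    """Fraction of items whose key is shared with at least one other item."""
    if not items:
        return 0.0
    counts = Counter(key(item) for item in items)
    shared = sum(n for n in counts.values() if n > 1)
    return shared / len(items)

def aa_key(item):
    """Old labeling: position plus amino acid change."""
    return (item["chrom"], item["pos"], item["aa"])

def nt_key(item):
    """New labeling: position plus nucleotide change, as '{ref}>{alt}'."""
    return (item["chrom"], item["pos"], f"{item['ref']}>{item['alt']}")

# Toy data: three substitutions at one position that all read as M>I.
items = [
    {"chrom": "chr1", "pos": 100, "ref": "C", "alt": "A", "aa": "M>I"},
    {"chrom": "chr1", "pos": 100, "ref": "C", "alt": "G", "aa": "M>I"},
    {"chrom": "chr1", "pos": 100, "ref": "C", "alt": "T", "aa": "M>I"},
    {"chrom": "chr1", "pos": 200, "ref": "T", "alt": "C", "aa": "L>P"},
]
```

On this toy set, labeling by amino acid change leaves three of the four items sharing a label, while labeling by nucleotide change leaves none.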
- lines changed 11, context: html, text, full: html, text
30374e3fc3390902c35bb463510567f1b6f7a96e Wed Apr 22 13:44:44 2026 -0700
PrimateAI-3D: clarify origin of the 0.821 threshold per Max. refs #37274
Description previously juxtaposed the paper's 0.821 clinical threshold
with the 75/25 benign/pathogenic split in a way that implied the two
were related. Per Max on the ticket: the 0.821 threshold comes from
Gao et al. 2023 Fig. 5A (calibrated against de novo missense excess
in a clinical cohort, n=7,238 pathogenic calls), and the "prediction"
column values are Illumina's own calls — not a simple application of
the 0.821 threshold (some variants below it are labeled pathogenic and
vice versa).
- lines changed 7, context: html, text, full: html, text
6e61d3349b36cbcc01500c1483cc7bfbc141d9ea Wed Apr 22 13:47:33 2026 -0700
PrimateAI-3D: tighten 0.821 threshold wording per the paper. refs #37274
Confirmed against Gao 2023 (PMC10713091): the calibration cohort is the
Deciphering Developmental Disorders (DDD) neurodevelopmental cohort, not
ClinVar. The cutoff was chosen so that the count of pathogenic calls
(n=7,238) matched the excess of de novo missense mutations above the
trinucleotide background expectation in that cohort.
- src/hg/makeDb/trackDb/human/primateAi.ra
- lines changed 10, context: html, text, full: html, text
50466766840ded6cb8bd5cb868bdf2ff3f613bc0 Tue Apr 21 11:17:15 2026 -0700
QA fixes for PrimateAI-3D track.
Config (primateAi.ra):
- Fix broken Ensembl transcript linkout: urls $S expanded to chromosome
name; switch to the Ensembl transcript page with $$
- Add numeric filters on percentile and raw score (label notes the
paper's 0.821 clinical threshold)
- Add maxWindowToDraw 2000000
Data (primateAiToBigBed.py):
- Change hardcoded strand '+' to '.': the source file has no strand
column
- Accept input/output paths as CLI args (previously hardcoded the hg38
input path)
- Handle variable field count: ~2.4M rows in the hg19 source are
missing the refseq column
Description (primateAi.html):
- Fix two broken hgTrackUi&... internal links to the Zoonomia 447-way
track
- Regenerate the first reference via getTrackReferences (wrong article
number and wrong PMC ID in the previous text)
- Fix the GitHub URL for the conversion script in Methods
- Move the Zoonomia 447-way mention out of Description; rephrase the
license note to describe precisely what is disabled
relatedTracks.ra:
- Add reciprocal cross-links for primateAi <-> alphaMissense (hg38),
primateAi <-> revel (hg38 + hg19), and primateAi <-> promoterAi
(hg38). Also includes promoterAi <-> alphaMissense cross-links.
refs #37274 #37279
- lines changed 2, context: html, text, full: html, text
de2ccf6d827865f11d3c8edd9ceeb1b6394a7380 Tue Apr 21 18:22:59 2026 -0700
PrimateAI-3D: label items by nucleotide change, add aaChange field and HTML mouseover.
Variant analysts typically work at the nucleotide level, and the current
item label (amino acid change) collapses distinguishable variants: ~17%
of items share their (chrom, pos, AA-change) tuple with another item
because of codon degeneracy (e.g. the three substitutions C>A, C>G, and C>T at the same
position can all appear as "M>I"). Labeling by nucleotide change makes
every item uniquely distinguishable (0.0% collisions on hg38, 0.1% on
hg19 from overlapping transcripts).
- primateAi.as: field 4 (name) is now "Nucleotide change (e.g. T>C)";
new field aaChange (placed before ref/alt) holds the amino acid
change.
- primateAiToBigBed.py: write name = "{ref}>{alt}", new aaChange column,
and an HTML mouseover with terse labels (Var/AA/Score/Perc/Pred) and
a colored prediction string.
- primateAi.ra: add labelFields name,aaChange and defaultLabelFields
name so users can toggle the on-feature label between nt change
(default) and AA change.
- primateAi.html: expand Display Conventions with the label-convention
rationale and a legend for each mouseover field.
refs #37274
- lines changed 1, context: html, text, full: html, text
d07e0de4fba2fc825dd1fdaa37a7cf1f66e4721d Fri Apr 24 17:36:42 2026 -0700
PrimateAI-3D: move /gbdb dir to _primateAi/ to match the underscore-prefix exclusion rule for hgdownload sync. refs #37274
- src/hg/makeDb/trackDb/human/promoterAi.html
- lines changed 51, context: html, text, full: html, text
f9a89b0e1ce3c937b4fbb879736c1619c35c271f Tue Apr 21 12:11:02 2026 -0700
QA fixes for PromoterAI track. refs #37278
Description page: replaced the wrong reference (Gao et al. 2023, the PrimateAI-3D
paper) with the actual PromoterAI citation (Jaganathan et al. Science 2025, PMID
40440429), corrected the score-direction wording (negative = under-expression,
positive = over-expression, not "tolerated vs disruptive"), fixed the Data Access
source link (Illumina BaseSpace, not the GitHub repo), and corrected the mouseover
blurb to match mouseOverFunction noAverage behavior.
Converter and AS: the overlap bigBed now carries the real per-transcript strand
from the source TSV (was hardcoded '+'), with a new strands column in the AS, and
the name field concatenates unique gene symbols so bidirectional-promoter items
read as "HES4,ISG15" etc. BED score is now |PromoterAI|*1000 so scoreFilter is
meaningful. Rewrote the converter to stream (sorted input), which drops peak
memory from ~40 GB to a few MB.
trackDb: added filterLabel/filterLimits on scoreDiff (the filter was unusable
without labels), scoreFilter + scoreLabel, alwaysZero and autoScale off on the
bigWig subtracks, color 200,0,0 / altColor 0,0,200 so signed bigWig bars draw
red (over-expression) above zero and blue (under-expression) below, matching
the overlap track itemRgb. Added maxWindowToDraw and maxItems on the overlap
subtrack.
Makedoc updated to describe the streaming pipeline, the new strands column,
and the rebuild workflow.
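A rough sketch of the streaming conversion described in this commit, assuming the input rows are already sorted by variant so that only one group is held in memory at a time. The column layout, the function names, and the choice to score a merged item by its largest-magnitude prediction are assumptions; the |PromoterAI|*1000 score and the comma-joined unique gene symbols come from the commit message.

```python
from itertools import groupby

def bed_score(promoter_ai):
    """BED score as |PromoterAI| * 1000, clamped to the 0..1000 BED range."""
    return min(1000, int(round(abs(promoter_ai) * 1000)))

def stream_overlaps(rows):
    """Merge per-transcript rows into one item per variant.

    rows: iterable of (chrom, pos, ref, alt, gene, strand, score) tuples,
    already sorted by the first four fields (hypothetical layout).
    """
    for (chrom, pos, ref, alt), group in groupby(rows, key=lambda r: r[:4]):
        group = list(group)  # only this variant's rows are in memory
        strand_by_gene = {}
        for row in group:
            # Keep the first strand seen per gene, preserving input order.
            strand_by_gene.setdefault(row[4], row[5])
        # Assumption: score the item by its largest-magnitude prediction.
        top = max(group, key=lambda r: abs(r[6]))
        yield (
            chrom,
            pos,
            ",".join(strand_by_gene),           # e.g. "HES4,ISG15"
            ",".join(strand_by_gene.values()),  # matching strands column
            bed_score(top[6]),
        )
```

Because groupby only merges adjacent rows, unsorted input would silently produce duplicate items, which is why the sorted-input requirement matters.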
- lines changed 16, context: html, text, full: html, text
6c567fd9a03e87610681a43d2183ebb43547d1ad Fri Apr 24 17:58:57 2026 -0700
PromoterAI: review followups. refs #37278
Move /gbdb/hg38/promoterAi/ to /gbdb/hg38/_promoterAi/ to match the
underscore-prefix exclusion rule for hgdownload sync (same pattern as
PrimateAI-3D under refs #37274). bigDataUrls and the makedoc updated.
Bump bigWig maxHeightPixels from 128:20:8 to 128:40:8 -- the peer-track
default of 20 is too cramped for a signed -1..+1 score.
Description page: drop the wrong primateai3d.basespace.illumina.com link
in Data Access; PromoterAI is not on BaseSpace, it's distributed via the
license agreement on the GitHub page (a download link is emailed after
submission). Reword Data Access and Methods accordingly.
Description page: add Illumina's recommended interpretation thresholds
(|score| >= 0.1, >= 0.2, >= 0.5) from the PromoterAI GitHub README, with
a note that higher cutoffs select smaller, higher-confidence sets.
- src/hg/makeDb/trackDb/human/promoterAi.ra
- lines changed 24, context: html, text, full: html, text
f9a89b0e1ce3c937b4fbb879736c1619c35c271f Tue Apr 21 12:11:02 2026 -0700
QA fixes for PromoterAI track. refs #37278
Description page: replaced the wrong reference (Gao et al. 2023, the PrimateAI-3D
paper) with the actual PromoterAI citation (Jaganathan et al. Science 2025, PMID
40440429), corrected the score-direction wording (negative = under-expression,
positive = over-expression, not "tolerated vs disruptive"), fixed the Data Access
source link (Illumina BaseSpace, not the GitHub repo), and corrected the mouseover
blurb to match mouseOverFunction noAverage behavior.
Converter and AS: the overlap bigBed now carries the real per-transcript strand
from the source TSV (was hardcoded '+'), with a new strands column in the AS, and
the name field concatenates unique gene symbols so bidirectional-promoter items
read as "HES4,ISG15" etc. BED score is now |PromoterAI|*1000 so scoreFilter is
meaningful. Rewrote the converter to stream (sorted input), which drops peak
memory from ~40 GB to a few MB.
trackDb: added filterLabel/filterLimits on scoreDiff (the filter was unusable
without labels), scoreFilter + scoreLabel, alwaysZero and autoScale off on the
bigWig subtracks, color 200,0,0 / altColor 0,0,200 so signed bigWig bars draw
red (over-expression) above zero and blue (under-expression) below, matching
the overlap track itemRgb. Added maxWindowToDraw and maxItems on the overlap
subtrack.
Makedoc updated to describe the streaming pipeline, the new strands column,
and the rebuild workflow.
- lines changed 9, context: html, text, full: html, text
6c567fd9a03e87610681a43d2183ebb43547d1ad Fri Apr 24 17:58:57 2026 -0700
PromoterAI: review followups. refs #37278
Move /gbdb/hg38/promoterAi/ to /gbdb/hg38/_promoterAi/ to match the
underscore-prefix exclusion rule for hgdownload sync (same pattern as
PrimateAI-3D under refs #37274). bigDataUrls and the makedoc updated.
Bump bigWig maxHeightPixels from 128:20:8 to 128:40:8 -- the peer-track
default of 20 is too cramped for a signed -1..+1 score.
Description page: drop the wrong primateai3d.basespace.illumina.com link
in Data Access; PromoterAI is not on BaseSpace, it's distributed via the
license agreement on the GitHub page (a download link is emailed after
submission). Reword Data Access and Methods accordingly.
Description page: add Illumina's recommended interpretation thresholds
(|score| >= 0.1, >= 0.2, >= 0.5) from the PromoterAI GitHub README, with
a note that higher cutoffs select smaller, higher-confidence sets.
- src/hg/makeDb/trackDb/relatedTracks.ra
- lines changed 15, context: html, text, full: html, text
50466766840ded6cb8bd5cb868bdf2ff3f613bc0 Tue Apr 21 11:17:15 2026 -0700
QA fixes for PrimateAI-3D track.
Config (primateAi.ra):
- Fix broken Ensembl transcript linkout: urls $S expanded to chromosome
name; switch to the Ensembl transcript page with $$
- Add numeric filters on percentile and raw score (label notes the
paper's 0.821 clinical threshold)
- Add maxWindowToDraw 2000000
Data (primateAiToBigBed.py):
- Change hardcoded strand '+' to '.': the source file has no strand
column
- Accept input/output paths as CLI args (previously hardcoded the hg38
input path)
- Handle variable field count: ~2.4M rows in the hg19 source are
missing the refseq column
Description (primateAi.html):
- Fix two broken hgTrackUi&... internal links to the Zoonomia 447-way
track
- Regenerate the first reference via getTrackReferences (wrong article
number and wrong PMC ID in the previous text)
- Fix the GitHub URL for the conversion script in Methods
- Move the Zoonomia 447-way mention out of Description; rephrase the
license note to describe precisely what is disabled
relatedTracks.ra:
- Add reciprocal cross-links for primateAi <-> alphaMissense (hg38),
primateAi <-> revel (hg38 + hg19), and primateAi <-> promoterAi
(hg38). Also includes promoterAi <-> alphaMissense cross-links.
refs #37274 #37279
- lines changed 6, context: html, text, full: html, text
4bd316f5f1ca47328bd3f9a181214b788055f0bc Tue Apr 21 13:29:26 2026 -0700
NMD Escape QA round 3: switch RefSeq to curated, fix Rule 2 misclassification. refs #33737
Switched the NMD Escape RefSeq subtrack input from hg38.ncbiRefSeq.txt.gz (all)
to hg38.ncbiRefSeqCurated.txt.gz (NM_/NR_ only, no XM_/XR_ predicted models)
per Max's feedback. longLabel updated to "NCBI RefSeq Curated transcripts".
Fixed Rule 2 in genePredNmdEsc to test rec["exonCount"]==1 instead of
len(cdsExons)==1. The old test misclassified multi-exon transcripts with a
single CDS exon (UTR introns) as "intronless" and silently suppressed their
Rule 1/3/4 assignments via the if/else short-circuit. 3,253 RefSeq curated
and ~2,000 Gencode transcripts reassigned from Rule 2 to Rules 1/3. Rebuilt
both tracks.
Added Rule 1 caveat to nmdEscTranscripts.html for transcripts with a
penultimate coding exon shorter than 50 bp.
Added reciprocal relatedTracks.ra entries for nmd <-> mane and nmd <-> ncbiRefSeq.
QA cleanups: non-ASCII prime char (′) replaced with its HTML entity, mailing list links
given target="_blank" across all three HTML pages, dead commented nmdGencode
block removed from nmd.ra, AutoSQL field comments updated to cover Rule 4
color and the gene-symbol-to-transcript-ID fallback.
Makedoc updated with the full Gencode + RefSeq pipeline and /gbdb symlinks.
- lines changed 6, context: html, text, full: html, text
888e7470c14eeecdca310ed36bb45c3c00ae8052 Tue Apr 21 15:14:04 2026 -0700
QA fixes for MPRA superTrack. refs #37359
Fix broken mpraVarDb bigDataUrl — pointed at /gbdb/hg38/mpra/mpravardb.bb
but the file is at /gbdb/hg38/mpra/mpravardb/mpravardb.bb, causing
hgTrackDb -strict to silently drop the subtrack.
Rebuild mpravardb.bb after two fixes in mpravardbToBed.py: sanitize UTF-8
in user-visible string fields (curly quotes, primes, NBSP mojibake) that
the browser does not transcode, eliminating ~246k non-ASCII occurrences
across 42% of rows; and change safe_float / pval_to_score to write NaN
and return score 0 for NA / out-of-range p-values instead of 0.0 and
score 1000 (previously inflated untested variants to the top of
score-sorted views).
trackDb stanza cleanup: shorten mpraVarDb longLabel, drop superfluous
type bed 4 from superTrack, make bigBed 9+13 explicit, remove redundant
mouseOverField, align parent mpra on, add filterValues for
cell_line/assay/cellLine and filterByRange sliders for percentile_rank /
fdr / log2FC, add labelFields and maxWindowToDraw.
Description pages: add cross-species disclosure (mouse reporter cells
used to assay human sequences), update mpraVarDb header to post-liftOver
count 239,028 with Studies-table footnote, fix mpraVarDb.html
download-server paths, soften imprecise "51 MPRA experiments" claim in
mpra.html and mprabase.html.
relatedTracks.ra: reciprocal mpra <-> wgEncodeReg4 and mpra <-> cCREs.
Expand mpra.txt makedoc with upstream provenance and QA-rebuild log.
- src/utils/redmineCli
- lines changed 8, context: html, text, full: html, text
993da626132958795cab63a9b26d64ce2052f40d Tue Apr 21 16:51:13 2026 -0700
Make redmineCli prepend_attribution idempotent. refs #37339
Skip adding the '**From Claude:**' header if the body already begins
with a From Claude attribution line (any bold/italic asterisk variant,
case-insensitive). Fixes the periodic doubled header when Claude models
mimic prior journal entries that already carried the prefix.
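The idempotency check described in this commit could be sketched like this. The regular expression, the constant names, and the blank line after the header are assumptions rather than the actual redmineCli code; the header text and the "asterisk variant, case-insensitive" matching come from the commit message.

```python
import re

HEADER = "**From Claude:**"

# One to three leading asterisks cover italic, bold, and bold-italic
# variants; the word boundary avoids matching longer words.
_ATTRIBUTION_RE = re.compile(r"^\s*\*{1,3}\s*from claude\b", re.IGNORECASE)

def prepend_attribution(body):
    """Add the attribution header unless the body already starts with one."""
    if _ATTRIBUTION_RE.match(body):
        return body
    return f"{HEADER}\n\n{body}"
```

Calling the function on its own output returns the text unchanged, which is the property the fix is after.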