3972ba54c468ace338d4a5578de1d20bf6c1f9ec lrnassar Mon Apr 20 15:39:26 2026 -0700 Adding Rule 4 (long-exon rule, Lindeboom 2016) to NMD Escape tracks and releasing on Apr. 22, 2026. refs #33737 Script: added a fourth rule to genePredNmdEsc. Coding exons longer than 400 bp (excluding the last coding exon, which is already covered by the 50 bp rule) are flagged as NMD-escape regions. Rebuilt the Gencode and NCBI RefSeq bigBed files. trackDb: - nmd.ra: appended "/400nt" to the nmdEsc longLabels, set nmdEscGencode default visibility to dense so the track is visible in cart-reset views, changed all four NMDetective subtracks from "visibility full" to "visibility hide", updated pennantIcon to the Apr. 22, 2026 release date and anchor. - nmd.html: mention long internal exons in the overview description, update the rule count from three to four. - nmdEscTranscripts.html: add the long-exon rule to the rule list and color legend (gold, #FFD700), expand the Background section with mechanisms for the intronless, start-proximal, and long-exon rules, correct the 50 bp rule description to include the entire last coding exon, fix Lindeboom 2016 author initials (RG -> RGH). News: - newsarch.html: add the 2026-04-22 NMD Escape news entry covering all four rules, with acknowledgements to Guido Neidhardt and Andreas Lahner for suggesting the track and the Decipher Genome Browser team for inspiring the visualization. - indexNews.html: add the front-page news link. makedoc: - nmd.txt: dated note for the Rule 4 rebuild. diff --git src/hg/makeDb/doc/hg38/nmd.txt src/hg/makeDb/doc/hg38/nmd.txt index ca02de5aac9..5a013417d73 100644 --- src/hg/makeDb/doc/hg38/nmd.txt +++ src/hg/makeDb/doc/hg38/nmd.txt @@ -1,61 +1,64 @@ ####################################################################### # NMD escape regions from Gencode (2025-03-24 max/Claude) # Two outputs: decorator bigBed (per-transcript) and collapsed bigBed (merged by coordinates) # Collapsed version uses gene symbols from input, colors by rule, transcript lists # Script accepts -f bigGenePred (gencode .bb) or -f genePredExt (ncbiRefSeq .txt.gz) +# +# 2026-04-20 lrnassar: Added Rule 4 (long-exon rule, Lindeboom 2016) - coding +# exons >400 nt excluding the last coding exon. Rebuilt Gencode + RefSeq. cd /hive/data/genomes/hg38/bed/nmd/gencode/ # run the script on gencode bigGenePred - produces decorator + collapsed BED files ~/kent/src/hg/makeDb/scripts/nmd/genePredNmdEsc -f bigGenePred \ /hive/data/genomes/hg38/bed/gencodeV49/build/hg38.gencodeV49.bb \ knownGeneNmdDeco.bed nmdEscRegions.bed # build decorator bigBed bedSort knownGeneNmdDeco.bed knownGeneNmdDeco.bed bedToBigBed knownGeneNmdDeco.bed ../../../chrom.sizes knownGeneNmdDeco.bb \ -tab -type=bed12+5 -as=${HOME}/kent/src/hg/makeDb/scripts/nmd/nmdEscDecoration.as # build collapsed bigBed bedSort nmdEscRegions.bed nmdEscRegions.bed bedToBigBed nmdEscRegions.bed ../../../chrom.sizes nmdEscRegions.bb \ -tab -type=bed9+2 -as=${HOME}/kent/src/hg/makeDb/scripts/nmd/nmdEscCollapsed.as ####################################################################### # NMD escape regions from NCBI RefSeq (2025-03-24 max) cd /hive/data/genomes/hg38/bed/nmd/ncbiRefSeq/ # run the script on ncbiRefSeq genePredExt # Using all of RefSeq, not just refseq curated - good idea? # This is the file for RefSeq curated: /hive/data/genomes/hg38/bed/ncbiRefSeq.p14.2025-08-13/archive/hg38.ncbiRefSeqCurated.txt.gz ~/kent/src/hg/makeDb/scripts/nmd/genePredNmdEsc -f genePredExt \ /hive/data/genomes/hg38/bed/ncbiRefSeq.p14.2025-08-13/archive/hg38.ncbiRefSeq.txt.gz \ nmdNcbiRefSeqDeco.bed nmdEscNcbiRefSeq.bed # not building decorator file - needed? Useful? # build collapsed bigBed bedSort nmdEscNcbiRefSeq.bed nmdEscNcbiRefSeq.bed bedToBigBed nmdEscNcbiRefSeq.bed ../../../chrom.sizes nmdEscNcbiRefSeq.bb \ -tab -type=bed9+2 -as=${HOME}/kent/src/hg/makeDb/scripts/nmd/nmdEscCollapsed.as # symlink to gbdb ln -sf /hive/data/genomes/hg38/bed/nmd/ncbiRefSeq/nmdEscNcbiRefSeq.bb /gbdb/hg38/nmd/nmdEscNcbiRefSeq.bb ####################################################################### # Lindeboom et al. NMDetective scores (2025-03-23 max/Claude) # NMD efficiency predictions from Lindeboom et al. 2016, Nat Genet. # Four bedGraph custom track files downloaded to: # /hive/data/genomes/hg38/bed/nmd/lindeboom/ # Data downloaded from https://figshare.com/articles/dataset/NMDetective/7803398 # Custom track data in the session links from that page # - NMDetectiveA.ct - Random forest prediction of NMD efficiency # - NMDetectiveB.ct - Decision tree prediction of NMD efficiency # - nmdDectA-ptc.ct - Random forest, first out-of-frame PTC # - nmdDectB-ptc.ct - Decision tree, first out-of-frame PTC # Convert bedGraph custom tracks to bigWig and symlink from /gbdb: cd /hive/data/genomes/hg38/bed/nmd/lindeboom/ bash ~/kent/src/hg/makeDb/scripts/nmd/lindeboomToBigWig.sh