a8284e2d5f9f74b52d107a8b99de3e8a03a263a2 lrnassar Tue May 26 11:09:11 2026 -0700
Adding PMS2CL Paralog Variants track to InSiGHT hub. Projects PMS2 curated ClinVar variants onto PMS2CL coordinates to help analysts recognize PMS2CL calls that may correspond to known PMS2 variants. refs #36582
diff --git src/hg/makeDb/doc/InSiGHT.txt src/hg/makeDb/doc/InSiGHT.txt
index 59a95fca4d9..b3dc98eaaa1 100644
--- src/hg/makeDb/doc/InSiGHT.txt
+++ src/hg/makeDb/doc/InSiGHT.txt
@@ -1,176 +1,196 @@
 #RM#36582
 # InSiGHT VCEP Track Hub
 # International Society for Gastrointestinal Hereditary Tumours (InSiGHT)
 # Variant Curation Expert Panel (VCEP)
 # Lynch syndrome mismatch repair genes: MLH1, MSH2, MSH6, PMS2
 # CSpec v2.0.0
 # Assemblies: hg38 and hg19
 
 # Working directory for all track data
 mkdir -p /hive/users/lrnassar/insightHub
 
 # Build scripts are located here: ~/kent/src/hg/makeDb/scripts/insight/
 # Quick link for github:
 # https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/insight
 
 # Hub structure:
 # /hive/users/lrnassar/insightHub/
 #   hub.txt, genomes.txt
 #   hg38/trackDb.txt, hg19/trackDb.txt
 #   insight.html (shared description page)
 #   clinDomains/ - Clinical Domains track data
 #   pvs1/ - PVS1 Regions track data
 #   afFrequencies/ - Allele Frequencies track data
 #   hciPriors/ - HCI Priors track data
 #   functionalAssays/ - Functional Assays track data
 #   lovdVars/ - InSiGHT Curated Variants track data
 #   pms2Caution/ - PMS2 Pseudogene Caution track data
 
 # Canonical transcripts used across all tracks:
 #   MLH1: NM_000249.4 (chr3, + strand)
 #   MSH2: NM_000251.3 (chr2, + strand)
 #   MSH6: NM_000179.3 (chr2, + strand)
 #   PMS2: NM_000535.7 (chr7, - strand)
 
 ##############################################################################
 # Track 1: Clinical Domains (PM1)
 ##############################################################################
 
 # Clinically relevant protein domains for the 4 MMR genes.
 # Domain definitions are hardcoded in the script from the InSiGHT VCEP specs.
 # Generates bigBed 9+4 files for hg38 and hg19.
 
 cd /hive/users/lrnassar/insightHub/clinDomains
 python3 ~/kent/src/hg/makeDb/scripts/insight/insightClinDomains.py
 
 # Output: InSiGHTclinDomainsHg38.bb, InSiGHTclinDomainsHg19.bb
 
 ##############################################################################
 # Track 2: PVS1 Regions
 ##############################################################################
 
 # PVS1 decision tree regions based on NMD predictions and critical functional
 # regions. Gene-specific codon boundaries from the InSiGHT VCEP specs:
 #   MLH1: NMD <=684, CritRegion 685-753, FuncUnknown 754-756, n.a. >756
 #   MSH2: NMD <=861, CritRegion 862-891, FuncUnknown 892-934, n.a. >934
 #   MSH6: NMD <=1317, CritRegion 1318-1341, FuncUnknown 1342-1360, n.a. >1360
 #   PMS2: NMD <=798, FuncUnknown 799-862, n.a. >862
 # Generates bigBed 9+3 files for hg38 and hg19.
 
 cd /hive/users/lrnassar/insightHub/pvs1
 python3 ~/kent/src/hg/makeDb/scripts/insight/insightPVS1.py
 
 # Output: InSiGHTPVS1Hg38.bb, InSiGHTPVS1Hg19.bb
 
 ##############################################################################
 # Track 3: Allele Frequencies (BA1/BS1/PM2)
 ##############################################################################
 
 # ACMG allele frequency classifications from gnomAD v4.1 exomes.
 # Gene-specific thresholds from the InSiGHT VCEP specs.
 # Requires access to gnomAD v4.1 bigBed files in /gbdb/hg38/gnomAD/v4.1/exomes/
 # Generates bigBed 9+3 files for hg38 and hg19 (hg19 via liftOver).
 
 cd /hive/users/lrnassar/insightHub/afFrequencies
 python3 ~/kent/src/hg/makeDb/scripts/insight/insightAFfrequencies.py
 
 # Output: InSiGHTAFHg38.bb, InSiGHTAFHg19.bb
 
 ##############################################################################
 # Track 4: HCI Priors (PP3/BP4)
 ##############################################################################
 
 # HCI prior probability predictions for missense variants.
 # Source data: LOVD database exports (tab-delimited files downloaded manually
 # from the LOVD shared database for each gene's priors table).
 # Requires LOVD priors files in the hciPriors/ directory:
 #   LOVD_MLH1_priors_*.txt
 #   LOVD_MSH2_priors_*.txt
 #   LOVD_MSH6_priors_*.txt
 #   LOVD_PMS2_priors_*.txt
 # Thresholds: PP3_moderate >0.81, PP3_supporting 0.68-0.81, BP4_supporting <0.11
 # Generates bigBed 9+5 files for hg38 and hg19.
 
 cd /hive/users/lrnassar/insightHub/hciPriors
 python3 ~/kent/src/hg/makeDb/scripts/insight/insightHCIPriors.py
 
 # Output: InSiGHTHCIPriorsHg38.bb, InSiGHTHCIPriorsHg19.bb
 
 ##############################################################################
 # Track 5: Functional Assays (PS3/BS3)
 ##############################################################################
 
 # Functional assay evidence from 4 publications:
 #   Drost et al. 2018 (PMID:30504929) - 74 MLH1/MSH2 variants, CIMRA assay
 #   Drost et al. 2020 (PMID:31965077) - 87 MSH6 variants, CIMRA assay
 #   Jia et al. 2021 (PMID:33357406) - 16,749 MSH2 variants, deep mutational scan
 #   Rath et al. 2022 (PMID:36054288) - 26 MLH1 variants, cell-based assay
 #
 # Requires supplementary data files in the functionalAssays/ directory:
 #   drost2020_supplement.docx (Drost 2020 S1/S3/S5 tables)
 #   mmc2.xlsx (Jia 2021 TableS4/S5)
 #   (Drost 2018 and Rath 2022 data are hardcoded from their supplements)
 #
 # Also requires openpyxl: pip install openpyxl
 # Generates bigBed 9+7 files for hg38 and hg19.
 
 cd /hive/users/lrnassar/insightHub/functionalAssays
 python3 ~/kent/src/hg/makeDb/scripts/insight/insightFunctionalAssays.py
 
 # Output: insightFunctionalAssaysHg38.bb, insightFunctionalAssaysHg19.bb
 
 ##############################################################################
 # Track 6: InSiGHT Curated Variants (from ClinVar)
 ##############################################################################
 
 # InSiGHT VCEP expert panel classifications fetched from ClinVar API.
 # Queries ClinVar for variants submitted by InSiGHT on MLH1, MSH2, MSH6, PMS2.
 # No local data files needed -- fetches directly from NCBI E-utilities.
 # This is the track that should be periodically rebuilt (ClinVar updates monthly).
 # Generates bigBed 9+7 files for hg38 and hg19.
 
 cd /hive/users/lrnassar/insightHub/lovdVars
 python3 ~/kent/src/hg/makeDb/scripts/insight/buildInsightClinVar.py
 
-# Output: insightClinVarHg38.bb, insightClinVarHg19.bb
+# Output:
+#   insightClinVarHg38.bb, insightClinVarHg19.bb (curated variants track)
+#   pms2clParalogVarsHg38.bb, pms2clParalogVarsHg19.bb (PMS2CL paralog
+#   projection track -- built in the same script run, see Track 8 below)
 
 ##############################################################################
 # Track 7: PMS2 Pseudogene Caution Regions
 ##############################################################################
 
 # PMS2 exons with high sequence homology to the PMS2CL pseudogene, plus the
 # PMS2CL pseudogene region itself. Flags regions where short-read NGS variant
 # calls may be pseudogene-derived. Per-exon caution levels:
 #   Exons 1-8, 10: Safe (no homology)
 #   Exon 9: Moderate (~98% homology)
 #   Exons 11-15: High (>=99% homology)
 #   PMS2CL: Pseudogene region (chr7:6,735,304-6,751,601 hg38)
 # Generates bigBed 9+5 files for hg38 and hg19.
 
 cd /hive/users/lrnassar/insightHub/pms2Caution
 python3 ~/kent/src/hg/makeDb/scripts/insight/insightPMS2Caution.py
 
 # Output: InSiGHTPMS2CautionHg38.bb, InSiGHTPMS2CautionHg19.bb
 
+##############################################################################
+# Track 8: PMS2CL Paralog Variants
+##############################################################################
+
+# PMS2 curated variants from Track 6 projected onto PMS2CL coordinates, to help
+# analysts recognize when a PMS2CL call may correspond to a known PMS2 variant.
+# Uses the cDNA alignment between NM_000535.7 and NR_002217.1 with offset:
+#   PMS2 c. = PMS2CL n. + 1060 (most of the homologous region)
+#   PMS2 c. = PMS2CL n. + 903 (exon 9)
+#   PMS2 c. = PMS2CL n. + 1059 (exon 11 before the 1-bp indel)
+# Built as part of the buildInsightClinVar.py run (Track 6).
+# Generates bigBed 9+5 files for hg38 and hg19. ~36 projected variants.
+
+# (No separate command -- comes out of the Track 6 build above.)
+
+# Output: pms2clParalogVarsHg38.bb, pms2clParalogVarsHg19.bb (in lovdVars/)
+
 ##############################################################################
 # Hub deployment
 ##############################################################################
 
 # The hub is served from:
 # https://hgwdev-lrnassar.gi.ucsc.edu/~lrnassar/track_hubs/insightHub/hub.txt
 #
 # The public_html symlink points to the working directory:
 # /cluster/home/lrnassar/public_html/track_hubs/insightHub -> /hive/users/lrnassar/insightHub
 #
 # To rebuild all tracks from scratch:
 cd /hive/users/lrnassar/insightHub
 cd clinDomains && python3 ~/kent/src/hg/makeDb/scripts/insight/insightClinDomains.py && cd ..
 cd pvs1 && python3 ~/kent/src/hg/makeDb/scripts/insight/insightPVS1.py && cd ..
 cd afFrequencies && python3 ~/kent/src/hg/makeDb/scripts/insight/insightAFfrequencies.py && cd ..
 cd hciPriors && python3 ~/kent/src/hg/makeDb/scripts/insight/insightHCIPriors.py && cd ..
 cd functionalAssays && python3 ~/kent/src/hg/makeDb/scripts/insight/insightFunctionalAssays.py && cd ..
 cd lovdVars && python3 ~/kent/src/hg/makeDb/scripts/insight/buildInsightClinVar.py && cd ..
 cd pms2Caution && python3 ~/kent/src/hg/makeDb/scripts/insight/insightPMS2Caution.py && cd ..
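The Track 2 notes give gene-specific PVS1 codon boundaries. The sketch below shows one way that lookup could be expressed. It is illustrative only: the `PVS1_BOUNDS` table and the `pvs1_region` name are hypothetical and are not taken from insightPVS1.py. It assumes 1-based codon numbers on the canonical transcripts listed at the top of the doc.

```python
# Hypothetical sketch of the Track 2 PVS1 codon lookup (not insightPVS1.py).
# Each gene maps to ordered (last_codon_inclusive, label) pairs copied from
# the doc; any codon past the final bound is "n.a.".
PVS1_BOUNDS = {
    "MLH1": [(684, "NMD"), (753, "CritRegion"), (756, "FuncUnknown")],
    "MSH2": [(861, "NMD"), (891, "CritRegion"), (934, "FuncUnknown")],
    "MSH6": [(1317, "NMD"), (1341, "CritRegion"), (1360, "FuncUnknown")],
    # PMS2 has no CritRegion in the specs, only NMD and FuncUnknown.
    "PMS2": [(798, "NMD"), (862, "FuncUnknown")],
}


def pvs1_region(gene: str, codon: int) -> str:
    """Return the PVS1 region label for a 1-based codon in `gene`."""
    if codon < 1:
        raise ValueError(f"codon numbers are 1-based, got {codon}")
    for last_codon, label in PVS1_BOUNDS[gene]:
        if codon <= last_codon:
            return label
    return "n.a."


if __name__ == "__main__":
    print(pvs1_region("MLH1", 700))  # CritRegion
    print(pvs1_region("PMS2", 900))  # n.a.
```

Storing only upper bounds keeps each gene's ranges contiguous by construction, so a gap or overlap between regions cannot be introduced by a typo in a start coordinate.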
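The Track 4 thresholds (PP3_moderate >0.81, PP3_supporting 0.68-0.81, BP4_supporting <0.11) map an HCI prior probability to an evidence code. A minimal sketch follows. It assumes the 0.68-0.81 range is inclusive at both ends, which the doc does not state explicitly. `hci_evidence_code` is a hypothetical name, not a function from insightHCIPriors.py.

```python
from typing import Optional

# Thresholds from the Track 4 notes.
PP3_MODERATE_ABOVE = 0.81    # strictly greater than -> PP3_moderate
PP3_SUPPORTING_FROM = 0.68   # ASSUMPTION: 0.68-0.81 is inclusive at both ends
BP4_SUPPORTING_BELOW = 0.11  # strictly less than -> BP4_supporting


def hci_evidence_code(prior: float) -> Optional[str]:
    """Return the PP3/BP4 code for an HCI prior, or None if no code applies."""
    if not 0.0 <= prior <= 1.0:
        raise ValueError(f"prior must be a probability in [0, 1], got {prior}")
    if prior > PP3_MODERATE_ABOVE:
        return "PP3_moderate"
    if prior >= PP3_SUPPORTING_FROM:  # 0.68 <= prior <= 0.81
        return "PP3_supporting"
    if prior < BP4_SUPPORTING_BELOW:
        return "BP4_supporting"
    return None  # 0.11 <= prior < 0.68: neither PP3 nor BP4
```

Checking from the strongest code downward means a prior of exactly 0.81 falls through to PP3_supporting rather than PP3_moderate, matching the strict ">0.81" in the notes.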
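Track 6 is described as querying ClinVar through NCBI E-utilities for InSiGHT submissions on the four MMR genes. The sketch below only builds an ESearch request URL and reads the ID list from its JSON response; it does not reproduce buildInsightClinVar.py. The `db`, `term`, `retmax`, and `retmode` parameters are standard ESearch parameters. However, the `[submitter]` field tag in the query term is an ASSUMPTION: check it against the ClinVar search-field list before relying on it. The function names are hypothetical.

```python
import json
from typing import List
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
GENES = ("MLH1", "MSH2", "MSH6", "PMS2")


def clinvar_esearch_url(gene: str, submitter: str = "InSiGHT",
                        retmax: int = 10000) -> str:
    """Build an ESearch URL for ClinVar records on `gene` from `submitter`."""
    # ASSUMPTION: "[submitter]" is the ClinVar field tag for the submitter.
    term = f"{gene}[gene] AND {submitter}[submitter]"
    params = {"db": "clinvar", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


def parse_esearch_ids(payload: str) -> List[str]:
    """Extract the record IDs from an ESearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


if __name__ == "__main__":
    for gene in GENES:
        print(clinvar_esearch_url(gene))
```

To fetch the data, each URL could be passed to `urllib.request.urlopen`. NCBI limits clients without an API key to 3 requests per second, so a loop over the four genes should pause between calls. The returned IDs would then go to a separate ESummary or EFetch step.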
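The Track 7 per-exon caution levels and the PMS2CL span amount to two simple lookups. A minimal sketch follows, assuming the hg38 span chr7:6,735,304-6,751,601 is written in the browser's 1-based, fully closed convention. `pms2_exon_caution` and `in_pms2cl_hg38` are hypothetical helpers, not functions from insightPMS2Caution.py.

```python
# Hypothetical sketch of the Track 7 caution lookups (not insightPMS2Caution.py).

def pms2_exon_caution(exon: int) -> str:
    """Caution level for a PMS2 (NM_000535.7) exon, per the Track 7 notes."""
    if 11 <= exon <= 15:
        return "High"      # >=99% homology to PMS2CL
    if exon == 9:
        return "Moderate"  # ~98% homology
    if 1 <= exon <= 8 or exon == 10:
        return "Safe"      # no homology
    raise ValueError(f"PMS2 exon numbers run 1-15, got {exon}")


# PMS2CL pseudogene region on hg38 from Track 7.
# ASSUMPTION: 1-based, closed interval, as displayed in the browser.
PMS2CL_HG38 = ("chr7", 6_735_304, 6_751_601)


def in_pms2cl_hg38(chrom: str, pos: int) -> bool:
    """True if a 1-based hg38 position lies inside the PMS2CL region."""
    region_chrom, start, end = PMS2CL_HG38
    return chrom == region_chrom and start <= pos <= end
```

Raising on exon numbers outside 1-15 surfaces a wrong transcript or numbering scheme immediately, instead of silently reporting it as "Safe".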
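Track 8 projects PMS2 variants onto PMS2CL with constant offsets between PMS2 c. and PMS2CL n. numbering (PMS2 c. = PMS2CL n. + offset). The 1059 and 1060 offsets differ by the one base contributed by the 1-bp indel in exon 11. A minimal sketch of the forward and reverse arithmetic follows. The doc does not give the exact boundaries of each aligned segment, so the caller must name the segment. The segment labels and function names are hypothetical, and only plain exonic integer positions are handled (no intronic offsets such as c.904-1).

```python
# Offsets from the Track 8 notes: PMS2 c. = PMS2CL n. + offset.
# Segment labels are hypothetical; the caller decides which segment applies.
OFFSETS = {
    "main": 1060,                 # most of the homologous region
    "exon9": 903,                 # exon 9
    "exon11_before_indel": 1059,  # exon 11 upstream of the 1-bp indel
}


def pms2cl_n_to_pms2_c(n_pos: int, segment: str) -> int:
    """Map a PMS2CL (NR_002217.1) n. position to a PMS2 (NM_000535.7) c. position."""
    return n_pos + OFFSETS[segment]


def pms2_c_to_pms2cl_n(c_pos: int, segment: str) -> int:
    """Map a PMS2 c. position onto PMS2CL n. numbering (the Track 8 direction)."""
    n_pos = c_pos - OFFSETS[segment]
    if n_pos < 1:
        raise ValueError(f"c.{c_pos} has no PMS2CL n. position in segment {segment!r}")
    return n_pos
```

For example, `pms2_c_to_pms2cl_n(2000, "main")` returns 940, and `pms2cl_n_to_pms2_c(940, "main")` maps it back to 2000. Keeping the segment as an explicit argument makes the step-change at the indel visible at each call site rather than hiding it inside a lookup.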
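The "rebuild all tracks from scratch" commands run each build script from its own track directory, in a fixed order. A minimal Python driver for the same sequence follows, as a sketch only. The `BUILD_STEPS` table and `rebuild_all` function are hypothetical, and the paths are copied from the doc. With `dry_run=True` (the default) the driver only returns the planned steps. With `dry_run=False`, `check=True` makes it stop at the first failing script by raising `subprocess.CalledProcessError`.

```python
import os
import subprocess
from typing import List, Tuple

HUB_DIR = "/hive/users/lrnassar/insightHub"
SCRIPT_DIR = os.path.expanduser("~/kent/src/hg/makeDb/scripts/insight")

# (track subdirectory, build script) in the order the doc runs them.
BUILD_STEPS = [
    ("clinDomains", "insightClinDomains.py"),
    ("pvs1", "insightPVS1.py"),
    ("afFrequencies", "insightAFfrequencies.py"),
    ("hciPriors", "insightHCIPriors.py"),
    ("functionalAssays", "insightFunctionalAssays.py"),
    ("lovdVars", "buildInsightClinVar.py"),  # also writes the Track 8 bigBeds
    ("pms2Caution", "insightPMS2Caution.py"),
]


def rebuild_all(dry_run: bool = True) -> List[Tuple[str, List[str]]]:
    """Run (or, with dry_run, just plan) every build script in its track dir."""
    plan = []
    for subdir, script in BUILD_STEPS:
        workdir = os.path.join(HUB_DIR, subdir)
        argv = ["python3", os.path.join(SCRIPT_DIR, script)]
        plan.append((workdir, argv))
        if not dry_run:
            # cwd= replaces the `cd X && ... && cd ..` pattern; check=True
            # raises CalledProcessError if the script exits non-zero.
            subprocess.run(argv, cwd=workdir, check=True)
    return plan
```

Passing `cwd=` to `subprocess.run` avoids the shell's `cd ..` bookkeeping, which is skipped whenever a script fails and leaves later `cd` commands resolving from the wrong directory.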