fc0444e2770896dfa3e5d4c60b3ef4d506036183
gperez2
  Thu Jan 29 11:41:50 2026 -0800
Updating the makedoc about the change in the script, which automatically detects and uses the most recent ncbiRefSeq patch version available, refs #36779

diff --git src/hg/makeDb/doc/hg19.txt src/hg/makeDb/doc/hg19.txt
index 15a4f9ac5e2..0e5675d05e6 100644
--- src/hg/makeDb/doc/hg19.txt
+++ src/hg/makeDb/doc/hg19.txt
@@ -32239,32 +32239,34 @@
 # Made a script using claude.ai that automates HGMD data processing for hg38 and hg19.
 # Location: ~/kent/src/hg/makeDb/scripts/hgmd/process_hgmd.py
 # What the script does:
 # 1. Creates BED files from HGMD TSV data with variant classifications
 # 2. Converts BED to BigBed format
 # 3. Creates symlinks in /gbdb/{db}/bbi/
 # 4. Registers BigBed files with hgBbiDbLink
 # 5. Extracts transcript IDs from hg38 HGMD file (column 7)
 # 6. Filters ncbiRefSeq gene predictions to HGMD transcripts only
 # 7. Loads filtered gene predictions into ncbiRefSeqHgmd table
 #
 # Key features:
 # - Always uses hg38 file for transcript extraction (hg19 file lacks column 7)
-# - Auto-detects ncbiRefSeq version: p13 for hg19, p14 for hg38
+# - Auto-detects and selects latest ncbiRefSeq patch version (p15, p14, p13, etc.)
 # - Falls back to previous years if specified year's ncbiRefSeq not found
+# - Uses regex pattern matching to extract version and date from directory names
+# Example: "ncbiRefSeq.p14.2024-09-18" -> extracts "p14" and "2024-09-18"
 # wc -l: 331959 /hive/data/genomes/hg19/bed/hgmd/hgmd.bed
 # wc -l: 15209 /hive/data/genomes/hg19/bed/hgmd/ncbiRefSeq.p13.2024-09-18/hgmd.curated.gp
 # Usage: python3 ~/kent/src/hg/makeDb/scripts/hgmd/process_hgmd.py --year 2025 --db hg19
 # Sample output:
 # hg19 BigBed completed successfully!
 # Output files: /hive/data/genomes/hg19/bed/hgmd/hgmd.bed, /hive/data/genomes/hg19/bed/hgmd/hgmd.bb
 # Symlink created: /gbdb/hg19/bbi/hgmd.bb
 # hgBbiDbLink run: hgBbiDbLink hg19 hgmd /gbdb/hg19/bbi/hgmd.bb
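
The makedoc describes the new version detection only in words, so here is a minimal sketch of how the regex extraction and "latest patch" selection could work. It is not the code in process_hgmd.py. The regex, the function names parse_ncbi_dir and pick_latest, and the tie-break on date are assumptions inferred from the single example directory name "ncbiRefSeq.p14.2024-09-18".

```python
import re
from datetime import date

# Assumed directory-name layout, inferred from the makedoc example
# "ncbiRefSeq.p14.2024-09-18"; the real pattern in process_hgmd.py may differ.
NCBI_DIR_RE = re.compile(r"^ncbiRefSeq\.(p(\d+))\.(\d{4}-\d{2}-\d{2})$")


def parse_ncbi_dir(name):
    """Return (patch, patch_number, release_date), or None if name does not match."""
    m = NCBI_DIR_RE.match(name)
    if m is None:
        return None
    patch, number, day = m.groups()
    try:
        release = date.fromisoformat(day)
    except ValueError:
        # Looks like a date but is not a valid calendar day.
        return None
    return patch, int(number), release


def pick_latest(names):
    """Pick the name with the highest patch number; newest date breaks ties."""
    candidates = []
    for name in names:
        parsed = parse_ncbi_dir(name)
        if parsed is not None:
            _, number, release = parsed
            candidates.append((number, release, name))
    if not candidates:
        return None
    return max(candidates)[2]


if __name__ == "__main__":
    dirs = ["ncbiRefSeq.p13.2024-09-18", "ncbiRefSeq.p14.2024-09-18", "hgmd.bed"]
    print(parse_ncbi_dir("ncbiRefSeq.p14.2024-09-18"))
    print(pick_latest(dirs))
```

The patch number is compared as an integer rather than as a string, because a string comparison would rank "p9" above "p13". Names that do not match the pattern are skipped, so unrelated files in the same directory do not affect the result.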