f93b8662afd701763c24634879d05dc08b3178de max Fri Jun 5 02:24:16 2026 -0700

Add exon search: jump to GENE exon N from position box

I'm committing this while thinking that the way we implement searches leads to
code duplication that doesn't look great to me. While this feature looks good,
the duplication across C/JS should probably be reduced with a different
approach to the "quick jump" part of the page. We currently have three ways to
quick jump, I think:
- chr:start-end
- rsxxxxx
- gene symbol + autosuggest pick
- HGVS?

They are recognized by both the JavaScript and the C code with regexes. I think
all of these should probably be implemented only in the C code. The JS would
only send the current string to the C code and get back whether it can be
autocompleted, to which position, and what to show in the autosuggest area.
For example, if you type "SOD1<space>e" the C code could send back "Continue
typing to jump to exon", and once you're at "SOD1<space>exon 5" the C code
sends back "Hit enter to jump to chrX:123123-123213". This would work with any
type of identifier, the code would stay in the C code with no more duplication,
and it would be much clearer to the user what is recognized in the search box.

Users can now type "TP53 exon 5" or "TP53:e.5[+/-offset]" in the genome browser
position/search box to navigate directly to that exon. The ":e.N" notation
follows the VICC Gene Fusion Specification. An optional intronic offset
(e.g. ":e.5+2") lands that many bases past the exon boundary, which is useful
for splice site inspection.

C (hgFind.c): findGeneExon() resolves the query against the SQL genePred tables
listed in the hg.conf "geneTracks" key (default: mane, ncbiRefSeqSelect,
knownGene, ncbiRefSeq, ncbiRefSeqHistorical). bigGenePred tracks (e.g. mane)
are supported via bigBedOpenExtraIndex. It uses the existing exonToPos()
function for strand-aware exon lookup. fixSinglePos() is called so
hgp->singlePos is populated for callers.
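
As a rough illustration of the two query syntaxes described above ("GENE exon N"
and "GENE:e.N[+/-offset]"), here is a minimal JavaScript sketch. The two regexes
and the function name parseExonQuery are hypothetical and written only for this
example; they are not the geneExonExp/geneExonCoordExp patterns in utils.js,
which are not shown in this message.

```javascript
// Hypothetical patterns for illustration only; NOT the actual regexes from utils.js.
// Verbose form: "TP53 exon 5" or "NM_000546 exon 5"
const verboseExonRe = /^\s*([A-Za-z0-9_.\-]+)\s+exon\s+(\d+)\s*$/i;
// Compact form: "TP53:e.5", "BRCA2:e.10+2", "BRCA2:e.10-3"
const compactExonRe = /^\s*([A-Za-z0-9_.\-]+):e\.(\d+)([+-]\d+)?\s*$/;

// Returns {gene, exon, offset} for a complete exon query, or null otherwise
// (for example when the query is still partial, such as "TP53 ex").
function parseExonQuery(query) {
    let m = verboseExonRe.exec(query);
    if (m) {
        // The verbose form carries no intronic offset.
        return {gene: m[1], exon: parseInt(m[2], 10), offset: 0};
    }
    m = compactExonRe.exec(query);
    if (m) {
        // m[3] is the optional signed offset, e.g. "+2" or "-3".
        const offset = m[3] ? parseInt(m[3], 10) : 0;
        return {gene: m[1], exon: parseInt(m[2], 10), offset: offset};
    }
    return null;
}
```

Under these assumed patterns, parseExonQuery("BRCA2:e.10+2") gives gene BRCA2,
exon 10 and offset 2, while a partial query such as "TP53 ex" gives null, which
is the case where a hint item rather than a jump suggestion would be shown.
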
hgApi.c: new cmd=geneExonToPos returns {"pos":"chrom:start-end"} JSON so JS can
navigate in place without a full page redirect to hgSearch. Direct URL links
(hgTracks?position=GENE+exon+N) also work because findGeneExon() is hooked into
hgPositionsFind().

JS: autocomplete.js injects a local "Jump to exon N" suggestion as soon as the
exon pattern is detected, or a hint item when the query is still partial
("GENE ex"). Selecting either navigates via hgApi. hgTracks.js routes the two
new autocomplete item types to the hgApi call. utils.js adds the two regexes
(geneExonExp, geneExonCoordExp).

query.html: documents both syntaxes; the :e.N notation links to the VICC Gene
Fusion Specification at fusions.cancervariants.org.

diff --git src/hg/htdocs/goldenPath/help/query.html src/hg/htdocs/goldenPath/help/query.html
index f2c9861a993..9e2474660c0 100755
--- src/hg/htdocs/goldenPath/help/query.html
+++ src/hg/htdocs/goldenPath/help/query.html
@@ -26,63 +26,74 @@
   <li>
   Chromosomal coordinate ranges</li>
   <li>
   Gene names</li>
   <li>
   Accession numbers</li>
   <li>
   An mRNA, EST or STS marker</li>
   <li>
   Keywords from the GenBank description of an mRNA</li>
   <li>
   <a href="http://varnomen.hgvs.org/" target="_blank">HGVS</a> terms</li>
   <li>
   gnomAD variant IDs</li>
   <li>
+  Exon positions: <code>SYMBOL exon N</code> or <code>SYMBOL:e.N[+/-offset]</code>
+  (e.g. <code>TP53 exon 5</code>, <code>BRCA2:e.10</code>, <code>NM_000546:e.5+2</code>)</li>
+  <li>
   HGVS and accession searches on outdated RefSeq accession versions is available on hg38</li>
 </ul>
 <p>
 To specify a genome position:</p>
 <ol>
   <li>
   Select the desired clade, genome and assembly</li>
   <li>
   Enter the desired query in the "Position/Search Term" box (see sample queries below)</li>
   <li>
   Click the "Go" button</li>
 </ol>
 <p>
 A query may have multiple results. If this is the case, a results page will appear listing each result along with the track it is associated with.
 Once selected, the result will be displayed in the Browser with a highlighted label, making it easier to identify.
 If you have further questions, you can search the
 <a href="/FAQ/index.html" target="_blank">Genome Browser FAQ</a> page and find links to further resources.
 Also, developers of track hubs can create
 <a href="hubQuickStartSearch.html" target="_blank">searchable track hubs</a> using the
 <a href="trix.html" target="_blank"><code>searchTrix</code></a> setting.</p>
 <p>
-To quickly jump to a codon or exon of a gene transcript:</p>
-<ol>
-  <li>
-  Use one of the searches below to jump to a gene, to show all transcripts of a gene or range of interest
-  <li>
-  Right-click any transcript, select "Choose exon" or "Zoom to codon" and enter the exon
-  or codon position of interest
-</ol>
-</p>
+To jump directly to a specific exon, type the gene symbol or transcript ID followed by the
+exon number in one of two formats:</p>
+<ul>
+  <li><code>SYMBOL exon N</code>, e.g. <code>TP53 exon 5</code> or <code>NM_000546 exon 5</code></li>
+  <li><code>SYMBOL:e.N</code>: compact notation from the
+  <a href="https://fusions.cancervariants.org/en/latest/" target="_blank">VICC Gene Fusion Specification</a>,
+  e.g. <code>TP53:e.5</code> or <code>NM_000546:e.5</code>. Optionally add an intronic offset:
+  <code>SYMBOL:e.N+offset</code> navigates <em>offset</em> bases past the 3′ end of the exon
+  (into the downstream intron), and <code>SYMBOL:e.N-offset</code> navigates <em>offset</em> bases
+  before the 5′ start (into the upstream intron). Useful for splice site inspection:
+  <code>BRCA2:e.10+2</code> lands 2 bp into the intron after exon 10.</li>
+</ul>
+<p>
+Exon numbering is 1-based and follows transcript order (exon 1 is the 5′ exon).
+Genes are looked up in order: MANE, GENCODE/UCSC (knownGene),
+all RefSeq, then historical RefSeq.
+To jump to a codon instead, right-click
+any transcript in the browser and select “Zoom to codon”.</p>
 <h2>Sample queries</h2>
 <p>
 Below is a list of examples that might be used to query the Genome Browser.
 Note that not every query listed here will produce a result in every assembly.
 The list serves only to illustrate the different types of queries that can be performed.
 <table border="1">
   <tr><th width="200">Query</th><th width="250">Genome Browser Response</th></tr>
   <tr>
     <td>chr7</td>
     <td>Displays all of chromosome 7</td></tr>
   <tr>
     <td>chr3:1-1000000</td>
     <td>Displays the first million bases of chromosome 3, counting from the p-arm telomere</td></tr>
   <tr>
@@ -137,30 +148,40 @@
     <td>PRNP</td>
     <td>Displays the region containing HUGO Gene Nomenclature Committee identifier PRNP</td></tr>
   <tr>
   <tr>
     <td>Q99697</td>
     <td>Displays the region containing the alignment of the UniProt/SwissProt protein sequence with accession Q99697 (PITX2)</td></tr>
   <tr>
     <td nowrap>RH18061;RH80175<br>15q11;15q13<br>NM_012090.5;NM_012421.4</td>
     <td nowrap>Displays the region between genome landmarks, such as the STS markers RH18061 and RH80175, or chromosome<br>
     bands 15q11 to 15q13, or SNPs NM_000310.4 and NM_012090.5. This syntax may also be used for other range queries,<br>such as between uniquely determined ESTs, mRNAs, refSeqs, SNPS, etc.</td></tr>
   <tr>
     <td>NR_026861.1:1-1000</td>
     <td>Works with any other type of accession from this page: Displays the first 1000bp of NR_026861.1</td></tr>
+  <tr>
+    <td nowrap>TP53 exon 5<br>NM_000546 exon 5</td>
+    <td>Jumps to exon 5 of TP53 using the verbose notation.
+    A transcript ID (NM_, NR_, XM_, XR_, ENST) may be used in place of the gene symbol.</td></tr>
+  <tr>
+    <td nowrap>TP53:e.5<br>NM_000546.6:e.5<br>BRCA2:e.10+2<br>BRCA2:e.10-3</td>
+    <td>Compact exon notation from the
+    <a href="https://fusions.cancervariants.org/en/latest/" target="_blank">VICC Gene Fusion Specification</a>.
+    Jumps to exon 5 of TP53, or to 2 bp past the end / 3 bp before the start of BRCA2 exon 10
+    (useful for splice site inspection). The <code>+N</code>/<code>-N</code> offset is optional.</td></tr>
   <tr id="HGVS">
     <td nowrap>NM_000310.4(PPT1):c.271_287del17insTT<br>
     NM_007262.5(PARK7):c.-24+75_-24+92dup<br>
     NM_006172.4(NPPA):c.456_*1delAA<br>
     MYH11:c.503-14_503-12del<br>
     NM_198576.4(AGRN):c.1057C>T<br>
     NM_198056.3:c.1654G>T<br>
     NP_002993.1:p.Asp92Glu<br>
     NP_002993.1:p.D92E<br>
     BRCA1 Ala744Cys<br>
     BRCA1 A744C<br>
     LRG_100t1:c.4G>A<br>
     LRG_100t1:n.1<br>
     LRG_456p1:p.Ser190Leu<br>LRG_321:g.16409_16461del<br>ENST00000002596.6:c.-108-6848A>G<br>
     ENSP00000005178.5:p.Val20Gly<br>
     chrX:g.31500000_31600000del<br>
     NR_111987:n.-1 <br>
     NM_015102.5:n.3038-2<br>
     NM_001372044:c.1528_1530del</td>
     <td>Displays the region that matches the <a href="http://varnomen.hgvs.org/" target="_blank">HGVS</a> expression, usually in the format
     <tt>&lt;transcript or protein&gt;:&lt;position&gt; &lt;amino acid or nucleotide change&gt;</tt><br>If a gene symbol is used, HGVS search will try all RefSeq transcripts to find
     the nucleotide or amino acid at the position indicated in the expression. If there are multiple matches, a disambiguation page will be shown.
     If the RefSeq sequence differs from the genome sequence, then currently the search will use the genome, not the transcript,
     for codon counting and amino acid / nucleotide comparison. Please contact us if this is inconvenient.</td></tr>
   <tr>
     <td>NM_198056.2:c.1A>C</td>
     <td>An example of an HGVS search on a previous NM version that is now outdated. Support for previous NM accessions is only available on hg38.</td></tr>