d65389eaca84cfb6c3beb9a731ddc89c48b6fc58 jnavarr5 Mon Mar 31 15:03:35 2025 -0700
Moving the <p> tag before the 'Data format' to match the styling in the rest of this FAQ section. no Remdine

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 04bb7f4bdc9..64ca3c1abb0 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -408,31 +408,33 @@
 </p>
 <p>
 An anecdotal and rare example is SHANK2 and SHANK3 in hg19. It is impossible for either NCBI or BLAT to get the correct alignment and gene model because the genome sequence is missing for part of the gene. NCBI and BLAT find slightly different exon boundaries at the edge of the problematic region. NCBI's aligner tries very hard to find exons that align to any transcript sequence, so it calls a few small dubious "exons" in the affected genomic region. GENCODE V19 also used an aligner that tried very hard to find exons, but it found small dubious "exons" in different places than NCBI. The <a target=_blank href="../cgi-bin/hgTrackUi?db=hg38&g=refSeqComposite">RefSeq Alignments</a> subtrack makes the problematic region very clear with double lines indicating unalignable transcript sequence.
 </p>
-<b>Data format:</b> <p>A small difference is the data format, which matters if you integrate our files into pipelines:
+<p>
+<b>Data format:</b>
+A small difference is the data format, which matters if you integrate our files into pipelines:
 The refGene table qName field stores the RefSeq accession but without the version number. The ncbiRefSeq tables show the full accession, with the version number.
 To add the version number to the refGene table, use a MySQL command like this:
 <pre>
 SELECT matches,misMatches,repMatches,nCount,qNumInsert,qBaseInsert,tNumInsert,tBaseInsert,strand,concat(qName, '.', gbSeq.version),qSize,qStart,qEnd,tName,tSize,tStart,tEnd,blockCount,blockSizes,qStarts,tStarts from refSeqAli, hgFixed.gbSeq WHERE refSeqAli.qname=gbSeq.acc</pre>
 <p>To remove the transcripts on haplotypes, add this condition at the end:</p>
 <pre>and tName NOT LIKE '%_hap%' AND tName not like '%_alt%' AND tNAME NOT LIKE '%_fix%'</pre>
 <p>A word of caution on the NCBI RefSeq track on hg19: NCBI is not fully supporting hg19 anymore. As a result, some genes are not located on the main chromosomes in anymore. An example is NM_001129826/CSAG3. For hg19, you may prefer UCSC RefSeq for now.</p>
 <a name="mito"></a>
 <h2>What is the best gene track for mitochondrial gene annotations</h2>
 <p>
 The mitochondrial sequence included in assembly sequence files is a special case and most of what has been explained on this page does not apply
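The "Data format" paragraph in this hunk notes that refGene stores RefSeq accessions without a version while the ncbiRefSeq tables keep it, and the accompanying SQL appends versions and drops haplotype, alt and fix sequences. For a pipeline that has already exported these tables, the same two steps can be sketched in Python. This is a minimal sketch under stated assumptions: accessions and sequence names are plain strings, and the accession-to-version lookup (for example, one exported from hgFixed.gbSeq) is available as a dict. All function names are hypothetical and not part of any UCSC tool. The regular expression treats the underscore literally, which is stricter than SQL `LIKE`, where `_` matches any single character.

```python
import re

# Sequence names the SQL filter excludes, e.g. "chr6_apd_hap1" or names
# ending in "_alt" / "_fix". The underscore is literal here, unlike in
# SQL LIKE, where "_" is a single-character wildcard.
NON_PRIMARY = re.compile(r"_(hap|alt|fix)")


def strip_version(accession: str) -> str:
    """Drop a trailing numeric '.N' version from an accession, if present."""
    base, dot, version = accession.rpartition(".")
    return base if dot and version.isdigit() else accession


def add_version(accession: str, versions: dict[str, int]) -> str:
    """Append a version, mirroring concat(qName, '.', gbSeq.version).

    `versions` is an assumed accession -> version mapping; a KeyError means
    the accession is missing from it.
    """
    return f"{accession}.{versions[accession]}"


def on_primary_sequence(chrom: str) -> bool:
    """True unless the name looks like a haplotype, alt or fix sequence."""
    return NON_PRIMARY.search(chrom) is None


def match_accessions(
    ref_gene_names: list[str], ncbi_names: list[str]
) -> dict[str, list[str]]:
    """Map each unversioned refGene accession to the versioned
    ncbiRefSeq accessions that share its base accession."""
    by_base: dict[str, list[str]] = {}
    for name in ncbi_names:
        by_base.setdefault(strip_version(name), []).append(name)
    return {name: by_base.get(name, []) for name in ref_gene_names}


if __name__ == "__main__":
    # Illustrative inputs only; the version number ".2" is made up.
    print(match_accessions(["NM_001129826"], ["NM_001129826.2"]))
    # prints {'NM_001129826': ['NM_001129826.2']}
    print(on_primary_sequence("chr6_apd_hap1"))
    # prints False
```

Joining on the unversioned accession keeps the refGene side unchanged. Returning a list per accession means that multiple ncbiRefSeq rows for one accession, such as the same transcript placed at more than one locus, are kept rather than silently dropped.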