99aa2466dea4cf74a4a226debfcbbcf388637c3a
gperez2
Mon Feb 24 09:56:52 2025 -0800
Updating the duplicate IDs in ensemb/gencod FAQ entry, refs #35222

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index fab6a0fefe0..62eee32e1f7 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -213,35 +213,37 @@
This is related to the question What is the difference between "NCBI RefSeq" and "UCSC RefSeq"? below. Briefly, the UCSC refGene track aligns the RefSeq transcripts to the genome with BLAT, with no special filtering but a 95% identity, the NCBI RefSeq track is NCBI's mapping and the NCBI alignments were filtered using manual annotations to make sure that a transcript is mapped only once, even if it is perfectly aligning twice. NCBI uses manual curation to decide on the best placement, for example, if a gene is annotated on chr4, any alignments, even 100% identical, from other chromosomes are removed. As a result, the UCSC RefSeq track contains duplicates if the transcripts align very well to both loci and alerts the user to this fact, where as the NCBI alignments were filtered manually to make sure that every transcript maps only once.
There are seven genes in the PAR regions
-of the human genome. These genes have identical sequences on chrX and chrY. Because of
-the identical sequences, they used to be given identical accessions by the Ensembl team.
-Since Ensembl release 110 (identical to Gencode release 44), these genes get different
-accessions. If you see duplicates in Ensembl/Gencode files, these probably predate the changes at the EBI.
+The human genome has seven genes located in the pseudoautosomal regions (PARs),
+which have identical sequences on both chrX and chrY. The Ensembl team assigned these genes
+identical accessions due to their identical sequences. Since Ensembl release 110 (identical to
+Gencode release 44), these genes now receive distinct accessions. If you encounter duplicates in
+Ensembl/Gencode files, they likely originate from versions predating this update at the EBI.
+
Officially, the Ensembl and GENCODE gene models are the same. On the latest human and mouse genome assemblies (hg38 and mm10), the identifiers, transcript sequences, and exon coordinates are almost identical between equivalent Ensembl and GENCODE versions (excluding alternative sequences or fix sequences).