src/hg/htdocs/FAQ/FAQgenes.html cac0d55cab41854d0d152a5dfbd1ced98228618d

cac0d55cab41854d0d152a5dfbd1ced98228618d
gperez2
  Tue Feb 25 09:57:55 2025 -0800
Updating Max entry of the 'Why does the UCSC RefSeq track (refGene) include duplicates' FAQ, refs #35222

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 4a7c7f65c1c..25b2597126d 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -213,34 +213,34 @@
 <h6>Why does the UCSC RefSeq track ("refGene") include duplicates, and some transcripts map to two loci?</h6>
 
 <p>This is related to the question <a href="#ncbiRefSeq">What is the difference between "NCBI RefSeq" and "UCSC RefSeq"?</a>
 below. Briefly, the UCSC refGene track aligns the RefSeq transcripts to the genome with BLAT, with no special filtering beyond a
 95% identity cutoff. The NCBI RefSeq track is NCBI's own mapping, and the NCBI alignments were filtered using manual annotations
 to make sure that a transcript is mapped only once, even if it aligns
 perfectly twice (there is one exception, genes in the PAR regions; see the
 paragraph below). NCBI uses manual curation to decide on the best placement,
 for example, if a gene is annotated on chr4, any alignments, even 100%
 identical, from other chromosomes are removed. As a result, the UCSC RefSeq
 track contains duplicates if the transcripts align very well to both loci and
 alerts the user to this fact, whereas the NCBI alignments were filtered
 manually to make sure that every transcript maps only once.
 </p>
 <p>
-NCBI's transcript mapping which we provide in our NCBI RefSeq track, does
-contain a few duplicates, but these have a biological explanation: They are
+NCBI's transcript mapping, which we provide in our NCBI RefSeq track, does
+contain a few duplicates, but these have a biological explanation: they are
 transcripts in the <a target=_blank href='https://en.wikipedia.org/wiki/Pseudoautosomal_region'>pseudoautosomal regions</a>
-(PARs), so they have identical sequences and by NCBI rules this means identical
+(PARs). Because they have identical sequences, NCBI rules assign them identical
 accessions. See the section below for how Ensembl/Gencode handle these cases.
 </p>
 
 <a name="duplicatesEns"></a>
 <h6>Why do the Gencode/Ensembl tracks ("knownGene", "ensGene" or "wgEncodeGencodeVXX") include a few duplicates, and some transcripts map to two loci?</h6>
 
 <p>The human genome has seven genes located in the <a target=_blank
 href='https://en.wikipedia.org/wiki/Pseudoautosomal_region'>pseudoautosomal regions</a> (PARs),
 which have identical sequences on both chrX and chrY. The Ensembl team originally assigned these genes
 identical accessions due to their identical sequences. Since Ensembl release 110 (identical to
 Gencode release 44), these genes now receive distinct accessions. If you encounter duplicates in
 Ensembl/Gencode files, they likely originate from file versions predating this update at the EBI.
 </p>