b88a0d542ed4fb37af9aef0ba798ac92f0eb74ce
max
  Tue Feb 25 03:40:19 2025 -0800
docing duplicate refseqs in par regions, refs #35222

diff --git src/hg/htdocs/FAQ/FAQgenes.html src/hg/htdocs/FAQ/FAQgenes.html
index 62eee32e1f7..4a7c7f65c1c 100755
--- src/hg/htdocs/FAQ/FAQgenes.html
+++ src/hg/htdocs/FAQ/FAQgenes.html
@@ -203,46 +203,56 @@
 
 <p>The Genome Browser Group only displays transcripts provided by others. 
 But both RefSeq and Gencode have dedicated staff that look manually at each and every transcript and they 
 know everything there is to know about gene models.
 They are happy to answer your questions and they can change the transcript annotation. Submit your questions
 via the <a href="https://www.ncbi.nlm.nih.gov/projects/RefSeq/update.cgi" target=_blank>RefSeq contact form</a>
 or the <a href="https://www.gencodegenes.org/pages/contact.html" target=_blank>Gencode contact form.</a>
 </p>
 
 <a name="duplicates"></a>
 <h6>Why does the UCSC RefSeq track ("refGene") include duplicates, and why do some transcripts map to two loci?</h6>
 
 <p>This is related to the question <a href="#ncbiRefSeq">What is the difference between "NCBI RefSeq" and "UCSC RefSeq"?</a>
 below. Briefly, the UCSC refGene track aligns the RefSeq transcripts to the genome with BLAT, with no filtering other than a
 95% identity threshold. The NCBI RefSeq track is NCBI's own mapping, and the NCBI alignments were filtered using manual annotations
-to make sure that a transcript is mapped only once, even if it is perfectly aligning twice. NCBI uses manual curation
-to decide on the best placement, for example, if a gene is annotated on chr4, any alignments, even 100% identical,
-from other chromosomes are removed. As a result, the UCSC RefSeq track contains duplicates if the transcripts align
-very well to both loci and alerts the user to this fact, where as the NCBI alignments were filtered manually
-to make sure that every transcript maps only once.
+to make sure that a transcript is mapped only once, even if it aligns
+perfectly to two loci (with one exception, genes in the PAR regions; see the
+paragraph below). NCBI uses manual curation to decide on the best placement.
+For example, if a gene is annotated on chr4, any alignments, even 100%
+identical, from other chromosomes are removed. As a result, the UCSC RefSeq
+track contains duplicates if the transcripts align very well to both loci,
+alerting the user to this fact, whereas the NCBI alignments were filtered
+manually to make sure that every transcript maps only once.
+</p>
+<p>
+NCBI's transcript mapping, which we provide in our NCBI RefSeq track, does
+contain a few duplicates, but these have a biological explanation: they are
+transcripts in the <a target=_blank href='https://en.wikipedia.org/wiki/Pseudoautosomal_region'>pseudoautosomal regions</a>
+(PARs), which have identical sequences on chrX and chrY, and by NCBI rules
+identical sequences receive identical accessions. See the section below for how Ensembl/Gencode handle these cases.
 </p>
 
 <a name="duplicatesEns"></a>
 <h6>Why do the Gencode/Ensembl tracks ("knownGene", "ensGene" or "wgEncodeGencodeVXX") include a few duplicates, and why do some transcripts map to two loci?</h6>
 
 <p>The human genome has seven genes located in the <a target=_blank
 href='https://en.wikipedia.org/wiki/Pseudoautosomal_region'>pseudoautosomal regions</a> (PARs),
 which have identical sequences on both chrX and chrY. The Ensembl team assigned these genes
 identical accessions due to their identical sequences. Since Ensembl release 110 (identical to
 Gencode release 44), these genes now receive distinct accessions. If you encounter duplicates in
-Ensembl/Gencode files, they likely originate from versions predating this update at the EBI.
+Ensembl/Gencode files, they likely originate from file versions predating this update at the EBI.
 </p>
 
 
 <a name="ens"></a>
 <h2>The differences</h2>
 
 Some of our gene tracks look similar and contain very similar information which can be confusing.
 
 <h6>What are Ensembl and GENCODE and is there a difference?</h6>
 
 <p> 
 Officially, the Ensembl and GENCODE gene models are the same. On the latest human and mouse genome 
 assemblies (hg38 and mm10), the identifiers, transcript sequences, and exon coordinates are almost
 identical between equivalent Ensembl and GENCODE versions (excluding <a target=_blank 
 href="FAQdownloads.html#downloadAlt">alternative sequences</a> or <a target=_blank