9eda90a2965a1f62f801718e26fda708bcf7a950
lrnassar
  Tue May 24 17:37:59 2022 -0700
Creating new GenCC annotation track for hg19 and hg38, refs #28166

diff --git src/hg/makeDb/trackDb/human/genCC.html src/hg/makeDb/trackDb/human/genCC.html
new file mode 100644
index 0000000..92002d2
--- /dev/null
+++ src/hg/makeDb/trackDb/human/genCC.html
@@ -0,0 +1,130 @@
+<h2>Description</h2>
+
+<p>
+This track shows annotations from <a target="_blank"
+href="https://thegencc.org/">The Gene Curation Coalition (GenCC)</a>.
+The GenCC provides information pertaining to the validity of gene-disease relationships, 
+with a current focus on Mendelian diseases. Curated gene-disease relationships are submitted 
+by GenCC member organizations that currently provide online resources (e.g. ClinGen, DECIPHER,
+and Orphanet), as well as diagnostic laboratories that have committed to sharing their internal
+curated gene-level knowledge (e.g. Ambry, Illumina, and Invitae).</p>
+<p>
+The GenCC aims to clarify overlap between gene curation efforts and develop
+consistent terminology for validity, allelic requirement and mechanism
+of disease. Each item on this track corresponds with a gene, and contains
+a large amount of information such as associated disease, evidence classification,
+specific submission notes and identifiers from different databases. In cases where
+multiple annotations exist within the same gene, multiple items are displayed.</p> 
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+Each item displayed represents a submission to the GenCC database. The displayed 
+name is a combination of the gene symbol and the disease's original submission ID. 
+This submission ID is either the OMIM#, MONDO# or Orphanet#. Clicking
+on any item will display the complete metadata for that item, including
+linkouts to the GenCC, NCBI, Ensembl, HGNC, GeneCards, PomBase (MONDO),
+and Human Phenotype Ontology (HP). Mousing over any item will display the
+associated disease for that submission.</p>
+
+<p>
+Items are colored based on the GenCC classification, or validation, of the
+evidence in the color scheme seen in the table below. 
+For more information on this process see the <a target="_blank"
+href="https://thegencc.org/faq.html#validity-termsdelphi-survey">GenCC
+validity terms FAQ</a>. A filter for the track is also available
+to display a subset of the items based on their classification.</p>
+
+
+<p>
+<table cellpadding='2'>
+  <thead><tr>
+    <th style="border-bottom: 2px solid;">Color</th>
+    <th style="border-bottom: 2px solid;">Evidence classification</th>
+  </tr></thead>
+  <tr><td style="background-color: #27C149"></td><td>Definitive</td></tr>
+  <tr><td style="background-color: #38A169"></td><td>Strong</td></tr>
+  <tr><td style="background-color: #68D391"></td><td>Moderate</td></tr>
+  <tr><td style="background-color: #63B3ED"></td><td>Supportive</td></tr>
+  <tr><td style="background-color: #FC8181"></td><td>Limited</td></tr>
+  <tr><td style="background-color: #E53E3E"></td><td>Disputed Evidence</td></tr>
+  <tr><td style="background-color: #9B2C2C"></td><td>Refuted Evidence</td></tr>
+  <tr><td style="background-color: #718096"></td><td>No Known Disease Relationship</td></tr>
+</table>
+</p>
+
+<p>
+<b>Limitations:</b> Most entries include NM_ accessions as well as ENST and ENSG identifiers.
+The original file contains no coordinates, and two genes, SLCO1B7 and ATXN8, could not
+be mapped to the hg38 genome. As a result, two items found in the GenCC database
+are missing from this track. For hg19, one additional gene, KCNJ18, could not
+be mapped. In addition, the GenCC data in the Genome
+Browser does not include OMIM data due to licensing restrictions. For more
+information, see the Methods section below.</p>
+
+<h2>Data access</h2>
+<p>
+The source data can be explored in the <a target="_blank" href="https://search.thegencc.org/">
+GenCC database</a>. The source files can also be found on the <a target="_blank"
+href="https://search.thegencc.org/download">GenCC downloads page</a>.</p>
+
+<p>
+The GenCC data on the UCSC Genome Browser can be explored interactively with the
+<a href="../cgi-bin/hgTables">Table Browser</a> or the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a>.
+For automated download and analysis, the genome annotation is stored at UCSC in bigBed
+files that can be downloaded from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/genCC.bb" target="_blank">our download server</a>.
+The data may also be accessed programmatically using our
+<a href="../goldenPath/help/api.html" target="_blank">REST API</a>.</p>
+
+<p>
+The file for this track may also be explored locally using our tool <tt>bigBedToBed</tt>,
+which can be compiled from the source code or downloaded as a precompiled
+binary for your system. Instructions for downloading source code and binaries can be found
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>.
+The tool can also be used to obtain features confined to a given range, e.g.,
+<br><br>
+<tt>bigBedToBed -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/genCC.bb stdout</tt></p>
+
+<h2>Methods</h2>
+
+<p>
+The data were downloaded from the <a target="_blank" 
+href="https://search.thegencc.org/download">GenCC downloads page</a> in TSV format. Manual
+curation was performed on the file to remove newline and tab characters present in
+the submission notes; in total, fewer than 20 manual edits were made.</p>
+<p>
+The track was first built on hg38 by associating the gene symbols with the NCBI MANE 1.0 
+release transcripts. These coordinates were added to the items as well as the NM_ accession,
+ENST ID and ENSG ID. For items where there was no gene symbol match in MANE (~130), the gene
+symbols were queried against the GENCODEv40 comprehensive set. Where multiple
+transcript matches were found, the earliest transcription start site and the latest end site
+among the transcripts were used to encompass the entire gene. Two genes, SLCO1B7 and ATXN8,
+could not be mapped to hg38, resulting in two missing submissions in the Genome
+Browser when compared to the raw file. Lastly, the items were colored according to their
+evidence classification as seen on the GenCC database.</p>
+<p>
+For hg19, the hg38 NM_ accessions were used to convert the item coordinates according to the
+latest hg19 RefSeq release. For items that failed to convert, the gene symbols were queried
+using the GENCODEv40 hg19 lift comprehensive set. One additional gene symbol, KCNJ18, failed
+to map in hg19, leading to three fewer items on this track when compared to the raw file.</p>
+<p>
+For both assemblies, GenCC OMIM data is excluded due to data restrictions.
+For complete documentation of the processing of these tracks, read the
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/genCC.txt">
+GenCC MakeDoc</a>.</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to the entire <a target="_blank" href="https://thegencc.org/about.html">GenCC
+committee</a> for creating these annotations and making them available.</p>
+
+<h2>References</h2>
+<p>
+DiStefano MT, Goehringer S, Babb L, Alkuraya FS, Amberger J, Amin M, Austin-Tse C, Balzotti M, Berg
+JS, Birney E <em>et al</em>.
+<a href="https://www.gimjournal.org/article/S1098-3600(22)00746-8/fulltext" target="_blank">
+The Gene Curation Coalition: A global effort to harmonize gene-disease evidence resources</a>.
+<em>Genet Med</em>. 2022 May 4.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35507016" target="_blank">35507016</a>
+</p>