efbc896ec61ccbd94910a56f5baae6d35f93cd19 lrnassar Wed Dec 20 10:40:28 2023 -0800 Staging QA Ready the dosage sensitivity track from Collins et al 2022 for hg38 and hg19. Refs #31991 diff --git src/hg/makeDb/trackDb/human/dosageSensitivityCollins2022.html src/hg/makeDb/trackDb/human/dosageSensitivityCollins2022.html new file mode 100644 index 0000000..bab56d4 --- /dev/null +++ src/hg/makeDb/trackDb/human/dosageSensitivityCollins2022.html @@ -0,0 +1,139 @@ +<h2>Description</h2> + +<p> +This container track represents dosage sensitivity map data from <a target="_blank" +href="https://europepmc.org/article/MED/35917817">Collins et al., 2022</a>. There are +two tracks, one corresponding to the probability of haploinsufficiency (pHaplo) and +one to the probability of triplosensitivity (pTriplo).</p> +<p> +Rare copy-number variants (rCNVs) include deletions and duplications that occur +infrequently in the global human population and can confer substantial risk for +disease. Collins et al. aimed to quantify the properties of haploinsufficiency (i.e., +deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout +the human genome. They analyzed rCNVs from nearly one million individuals to construct a +genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 +dosage-sensitive segments associated with at least one disorder. These segments were typically +gene-dense and often harbored dominant dosage-sensitive driver genes. An ensemble +machine learning model was built to predict dosage sensitivity probabilities (pHaplo and +pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 +triplosensitive genes, including 648 that were uniquely triplosensitive. +</p> + +<h2>Display Conventions and Configuration</h2> + +<p> +Each of the tracks is displayed with a distinct item (bed track) covering the entire gene locus wherever +a score was available. 
Clicking on an item provides a link to <a target="_blank" +href="https://www.deciphergenomics.org">DECIPHER</a>, which contains the sensitivity scores as well as +additional information. Mousing over an item displays the gene symbol, the ENSG ID for that gene, +and the respective sensitivity score for the track rounded to two decimal places. Filters are +also available to set the score thresholds displayed for each of the tracks.</p> + +<h3>Coloring and Interpretation</h3> + + +<p> +Each of the tracks is colored based on standardized cutoffs for pHaplo and pTriplo as described by the +authors:</p> +<p> +<b>pHaplo</b> scores ≥0.86 indicate that the average effect sizes of deletions are as strong as +the loss-of-function of genes known to be constrained against protein-truncating variants (average OR≥2.7) +(<a target="_blank" href="https://europepmc.org/articles/PMC7334197/">Karczewski et al., 2020</a>). +pHaplo scores ≥0.55 indicate an odds ratio ≥2.</p> +<p> +<b>pTriplo</b> scores ≥0.94 indicate that the average effect sizes of duplications are as strong as +the loss-of-function of genes known to be constrained against protein-truncating variants (average OR≥2.7) +(<a target="_blank" href="https://europepmc.org/articles/PMC7334197/">Karczewski et al., 2020</a>). 
+pTriplo scores ≥0.68 indicate an odds ratio ≥2.</p> +<p> +Applying these cutoffs defined 2,987 haploinsufficient (pHaplo≥0.86) and 1,559 +triplosensitive (pTriplo≥0.94) genes with rCNV effect sizes comparable to loss-of-function +of gold-standard PTV-constrained genes.</p> + +<p>See below for a summary of the color scheme:</p> + +<ul> +<li><b style="color: rgb(181, 2, 14);">Dark red items</b> - pHaplo ≥ 0.86</li> +<li><b style="color: rgb(250, 42, 27);">Bright red items</b> - pHaplo &lt; 0.86</li> +<li><b style="color: rgb(0, 9, 138);">Dark blue items</b> - pTriplo ≥ 0.94</li> +<li><b style="color: rgb(87, 92, 252);">Bright blue items</b> - pTriplo &lt; 0.94</li> +</ul> + +<h2>Methods</h2> + +<p> +The data were downloaded from <a target="_blank" +href="https://zenodo.org/records/6347673">Zenodo</a> as a 3-column file containing +gene symbols, pHaplo scores, and pTriplo scores. Since the data were created using +GENCODEv19 models, the hg19 data were mapped using those coordinates by picking the earliest +transcription start site of all of the respective gene transcripts and the furthest +transcription end site. This leads to some gene boundaries that are not representative of a real +transcript, but since the data are gene locus annotations, this maximum coverage was used. +Finally, both scores were rounded to two decimal places for easier interpretation.</p> +<p> +For hg38, we attempted to use updated gene positions from a few different datasets, since +gene symbols have been updated many times since GENCODEv19. A summary of the workflow +can be seen below, with each subsequent step being used only for genes where mapping failed:</p> +<ul> +<li>1. Gene symbols were mapped using MANE1.0. Fewer than 2,000 items failed mapping here.</li> +<li>2. Mapping with GENCODEv45 was attempted.</li> +<li>3. Mapping with GENCODEv20 was attempted. At this point, 448 items were not mapped.</li> +<li>4. Finally, any missing items were lifted over from the hg19 track. 
Of the 448 items, 19 failed +mapping because their regions were split between hg19 and hg38.</li></ul> + +<p> +In summary, the hg19 track was mapped using the original GENCODEv19 mappings, and a series +of steps was taken to map the hg38 gene symbols with updated coordinates. In total, 19 of 18,641 items +could not be mapped and are missing from the hg38 tracks.</p> +<p> +The complete <a target="_blank" +href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/dosageSensitivityCollins.txt"> +makeDoc</a> can be found online. This includes all of the track creation steps.</p> + +<h2>Data Access</h2> +<p> +The raw data can be explored interactively with the <a href="../hgTables">Table Browser</a>, or +the <a href="../hgIntegrator">Data Integrator</a>. For automated access, this track, like all +others, is available via our <a href="../goldenPath/help/api.html">API</a>. However, for bulk +processing, it is recommended to download the dataset. +</p> + +<p> +For automated download and analysis, the genome annotation is stored at UCSC in bigBed +files that can be downloaded from +<a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/dosageSensitivityCollins2022/" +target="_blank">our download server</a>. +Individual regions or the whole genome annotation can be obtained using our tool +<tt>bigBedToBed</tt>, which can be compiled from the source code or downloaded as a precompiled +binary for your system. Instructions for downloading source code and binaries can be found +<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. +The tool can also be used to obtain features confined to a given range, e.g., +<br><br> +<tt>bigBedToBed -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/dosageSensitivityCollins2022/pHaploDosageSensitivity.bb stdout</tt> +<br> +</p> + +<p> +Please refer to our +<a href="../FAQ/FAQdownloads.html#download36" target="_blank">Data Access FAQ</a> +for more information. 
+</p> + +<h2>Credits</h2> + +<p> +Thanks to DECIPHER for their support and assistance with the data. We would also like to +thank Anna Benet-Pagès for suggesting and assisting in track development and interpretation. +</p> + +<h2>References</h2> + +<p> +Collins RL, Glessner JT, Porcu E, Lepamets M, Brandon R, Lauricella C, Han L, Morley T, Niestroj LM, +Ulirsch J <em>et al</em>. +<a href="https://linkinghub.elsevier.com/retrieve/pii/S0092-8674(22)00788-7" target="_blank"> +A cross-disorder dosage sensitivity map of the human genome</a>. +<em>Cell</em>. 2022 Aug 4;185(16):3041-3055.e25. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35917817" target="_blank">35917817</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9742861/" target="_blank">PMC9742861</a> +</p>
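The hg19 locus definition described in Methods (earliest transcription start site and furthest transcription end site across all transcripts of a gene) can be illustrated with a minimal Python sketch. This is not the code used to build the track; the actual steps are in the makeDoc linked above. The function name `gene_loci`, the tuple layout, and the example coordinates are hypothetical, and the sketch assumes genomic coordinates in which a transcript's start is always smaller than its end regardless of strand.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (gene symbol, chromosome, txStart, txEnd).
Transcript = Tuple[str, str, int, int]


def gene_loci(transcripts: Iterable[Transcript]) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Collapse transcripts into one locus per (symbol, chromosome).

    The locus spans from the smallest transcript start to the largest
    transcript end, so it may not match any single real transcript.
    """
    loci: Dict[Tuple[str, str], Tuple[int, int]] = {}
    for symbol, chrom, tx_start, tx_end in transcripts:
        key = (symbol, chrom)
        if key in loci:
            start, end = loci[key]
            # Widen the existing locus to cover this transcript as well.
            loci[key] = (min(start, tx_start), max(end, tx_end))
        else:
            loci[key] = (tx_start, tx_end)
    return loci


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    example = [
        ("GENE_A", "chr1", 100, 500),
        ("GENE_A", "chr1", 50, 300),
        ("GENE_B", "chr2", 10, 20),
    ]
    for (symbol, chrom), (start, end) in sorted(gene_loci(example).items()):
        print(symbol, chrom, start, end)
```

Keying on both symbol and chromosome is a design choice of this sketch, made so that a symbol annotated on more than one chromosome is not merged into a single oversized locus.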