efbc896ec61ccbd94910a56f5baae6d35f93cd19 lrnassar Wed Dec 20 10:40:28 2023 -0800 Staging QA Ready the dosage sensitivity track from Collins et al 2022 for hg38 and hg19. Refs #31991
diff --git src/hg/makeDb/trackDb/human/dosageSensitivityCollins2022.html src/hg/makeDb/trackDb/human/dosageSensitivityCollins2022.html
new file mode 100644
index 0000000..bab56d4
--- /dev/null
+++ src/hg/makeDb/trackDb/human/dosageSensitivityCollins2022.html
@@ -0,0 +1,139 @@
+
+This container track represents dosage sensitivity map data from Collins et al 2022. There are +two tracks, one corresponding to the probability of haploinsufficiency (pHaplo) and +one to the probability of triplosensitivity (pTriplo).
++Rare copy-number variants (rCNVs) include deletions and duplications that occur +infrequently in the global human population and can confer substantial risk for +disease. Collins et al. aimed to quantify haploinsufficiency (i.e., +deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout +the human genome. They analyzed rCNVs from nearly one million individuals to construct a +genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 +dosage-sensitive segments associated with at least one disorder. These segments were typically +gene-dense and often harbored dominant dosage-sensitive driver genes. An ensemble +machine learning model was built to predict dosage sensitivity probabilities (pHaplo and +pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 +triplosensitive genes, including 648 that were uniquely triplosensitive. +
+ ++Each of the tracks is displayed with a distinct item (bed track) covering the entire gene locus wherever +a score was available. Clicking on an item provides a link to DECIPHER, which contains the sensitivity scores as well as +additional information. Mousing over an item displays the gene symbol, the ENSG ID for that gene, +and the respective sensitivity score for the track rounded to two decimal places. Filters are +also available to set score thresholds for display in each of the tracks.
+ ++
+Each of the tracks is colored based on standardized cutoffs for pHaplo and pTriplo as described by the +authors:
++pHaplo scores ≥0.86 indicate that the average effect sizes of deletions are as strong as +the loss-of-function of genes known to be constrained against protein truncating variants (average OR≥2.7) +(Karczewski et al., 2020). +pHaplo scores ≥0.55 indicate an odds ratio ≥2.
++pTriplo scores ≥0.94 indicate that the average effect sizes of duplications are as strong as +the loss-of-function of genes known to be constrained against protein truncating variants (average OR≥2.7) +(Karczewski et al., 2020). +pTriplo scores ≥0.68 indicate an odds ratio ≥2.
++Applying these cutoffs defined 2,987 haploinsufficient (pHaplo≥0.86) and 1,559 +triplosensitive (pTriplo≥0.94) genes with rCNV effect sizes comparable to loss-of-function +of gold-standard PTV-constrained genes.
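The cutoffs above can be sketched as a small classifier. This is an illustrative sketch only, not the code used to color the tracks: the category labels are hypothetical, and the 0.68 cutoff is read here as a pTriplo threshold, which is an assumption.

```python
# Cutoffs quoted in the text above as (upper, lower) pairs.
# Treating 0.68 as the lower pTriplo cutoff is an assumption.
PHAPLO_CUTOFFS = (0.86, 0.55)
PTRIPLO_CUTOFFS = (0.94, 0.68)


def classify(score: float, cutoffs: tuple) -> str:
    """Return a coarse category for a pHaplo or pTriplo score.

    The labels are hypothetical names chosen for this sketch.
    """
    upper, lower = cutoffs
    if score >= upper:
        return "high"          # comparable to PTV-constrained genes (average OR >= 2.7)
    if score >= lower:
        return "moderate"      # odds ratio >= 2
    return "below_cutoff"


if __name__ == "__main__":
    # The same score can fall into different categories for the two tracks.
    print(classify(0.90, PHAPLO_CUTOFFS))
    print(classify(0.90, PTRIPLO_CUTOFFS))
```

Keeping the two cutoff pairs separate makes it explicit that a single score value is interpreted differently depending on the track it comes from.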
++
See below for a summary of the color scheme:
+ ++The data were downloaded from Zenodo as a 3-column file containing +gene symbols, pHaplo scores, and pTriplo scores. Since the data were created using +GENCODEv19 models, the hg19 data were mapped using those coordinates by picking the earliest +transcription start site of all of the respective gene transcripts and the furthest +transcription end site. This leads to some gene boundaries that are not representative of a real +transcript, but since the data are gene locus annotations, this maximum coverage was used. +Finally, both scores were rounded to two decimal places for easier interpretation.
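The hg19 locus construction described above can be sketched as follows. This is a minimal illustration, not the makeDoc code: the `Transcript` record, the gene names, and the coordinates are hypothetical, and "earliest start" and "furthest end" are assumed to mean the smallest start coordinate and the largest end coordinate among a gene's transcripts.

```python
from collections import namedtuple

# Hypothetical transcript record; the field names are assumptions for this sketch.
Transcript = namedtuple("Transcript", "gene chrom tx_start tx_end")


def collapse_to_gene_loci(transcripts):
    """Merge all transcripts of a gene into one locus spanning min start to max end."""
    loci = {}
    for t in transcripts:
        key = (t.gene, t.chrom)
        if key not in loci:
            loci[key] = (t.tx_start, t.tx_end)
        else:
            start, end = loci[key]
            loci[key] = (min(start, t.tx_start), max(end, t.tx_end))
    return loci


if __name__ == "__main__":
    # Made-up example data, not real GENCODEv19 coordinates.
    example = [
        Transcript("GENE1", "chr1", 1000, 5000),
        Transcript("GENE1", "chr1", 800, 4000),
        Transcript("GENE2", "chr2", 200, 900),
    ]
    for (gene, chrom), (start, end) in collapse_to_gene_loci(example).items():
        print(gene, chrom, start, end)

    # Scores are then rounded to two decimal places, as described in the text.
    print(round(0.8649, 2))
```

As the text notes, the merged interval for `GENE1` (800 to 5000) does not correspond to either input transcript; it is the maximum extent of the gene locus.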
++For hg38, we attempted to obtain updated gene positions from a few different datasets, since +gene symbols have been updated many times since GENCODEv19. A summary of the workflow +can be seen below, with each subsequent step being used only for genes where mapping failed:
++In summary, the hg19 track was mapped using the original GENCODEv19 mappings, and a series +of steps was taken to map the hg38 gene symbols to updated coordinates. 19 of 18,641 items +could not be mapped and are missing from the hg38 tracks.
++The complete +makeDoc can be found online. This includes all of the track creation steps.
+ ++The raw data can be explored interactively with the Table Browser, or +the Data Integrator. For automated access, this track, like all +others, is available via our API. However, for bulk +processing, it is recommended to download the dataset. +
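For API access, a request against the UCSC REST API `getData/track` endpoint might look like the sketch below. The track name `pHaploDosageSensitivity` is an assumption inferred from the bigBed file name and should be checked against the track list for the assembly; the region is an arbitrary example.

```python
import json
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome, track, chrom, start, end):
    """Build a getData/track URL for one region (parameters joined with ';')."""
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params.items())


def fetch_track(genome, track, chrom, start, end):
    """Fetch the region from the API and return the decoded JSON response."""
    with urlopen(build_track_url(genome, track, chrom, start, end)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Track name is an assumption; only the URL is printed, no request is made here.
    print(build_track_url("hg38", "pHaploDosageSensitivity", "chr1", 100000, 100500))
```

Calling `fetch_track` with the same arguments would perform the actual request. For bulk processing, downloading the dataset remains the recommended route.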
+ +
+For automated download and analysis, the genome annotation is stored at UCSC in bigBed
+files that can be downloaded from
+our download server.
+Individual regions or the whole genome annotation can be obtained using our tool
+bigBedToBed which can be compiled from the source code or downloaded as a precompiled
+binary for your system. Instructions for downloading source code and binaries can be found
+here.
+The tools can also be used to obtain features confined to a given range, e.g.,
+
+bigBedToBed -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/dosageSensitivityCollins2022/pHaploDosageSensitivity.bb stdout
+
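The `$db` placeholder in the command above stands for the assembly name (hg19 or hg38). A minimal Python wrapper around the same command could look like this sketch. It assumes `bigBedToBed` is installed and on the `PATH`; the function names are hypothetical.

```python
import shutil
import subprocess

# URL pattern taken from the command above, with $db replaced by a format field.
URL_TEMPLATE = (
    "http://hgdownload.soe.ucsc.edu/gbdb/{db}/bbi/"
    "dosageSensitivityCollins2022/pHaploDosageSensitivity.bb"
)


def build_command(db, chrom, start, end):
    """Return the bigBedToBed argument list for one region of one assembly."""
    return [
        "bigBedToBed",
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        URL_TEMPLATE.format(db=db),
        "stdout",
    ]


def fetch_region(db, chrom, start, end):
    """Run bigBedToBed and return its output as a list of tab-split rows."""
    if shutil.which("bigBedToBed") is None:
        raise RuntimeError("bigBedToBed was not found on PATH")
    result = subprocess.run(
        build_command(db, chrom, start, end),
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.split("\t") for line in result.stdout.splitlines()]


if __name__ == "__main__":
    # Print the command only; fetch_region would execute it.
    print(" ".join(build_command("hg38", "chr1", 100000, 100500)))
```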
+
+Please refer to our +Data Access FAQ +for more information. +
+ ++Thanks to DECIPHER for their support and assistance with the data. We would also like to +thank Anna Benet-Pagès for suggesting and assisting in track development and interpretation. +
+ ++Collins RL, Glessner JT, Porcu E, Lepamets M, Brandon R, Lauricella C, Han L, Morley T, Niestroj LM, +Ulirsch J et al. + +A cross-disorder dosage sensitivity map of the human genome. +Cell. 2022 Aug 4;185(16):3041-3055.e25. +PMID: 35917817; PMC: PMC9742861 +