efbc896ec61ccbd94910a56f5baae6d35f93cd19 lrnassar Wed Dec 20 10:40:28 2023 -0800 Staging QA Ready the dosage sensitivity track from Collins et al 2022 for hg38 and hg19. Refs #31991 diff --git src/hg/makeDb/trackDb/human/dosageSensitivityCollins2022.html src/hg/makeDb/trackDb/human/dosageSensitivityCollins2022.html new file mode 100644 index 0000000..bab56d4 --- /dev/null +++ src/hg/makeDb/trackDb/human/dosageSensitivityCollins2022.html @@ -0,0 +1,139 @@ +

Description

+ +

+This container track represents dosage sensitivity map data from Collins et al 2022. There are +two tracks, one corresponding to the probability of haploinsufficiency (pHaplo) and +one to the probability of triplosensitivity (pTriplo).

+Rare copy-number variants (rCNVs) include deletions and duplications that occur +infrequently in the global human population and can confer substantial risk for +disease. Collins et al. aimed to quantify the properties of haploinsufficiency (i.e., +deletion intolerance) and triplosensitivity (i.e., duplication intolerance) throughout +the human genome. They analyzed rCNVs from nearly one million individuals to construct a +genome-wide catalog of dosage sensitivity across 54 disorders, which defined 163 dosage-sensitive +segments associated with at least one disorder. These segments were typically +gene-dense and often harbored dominant dosage-sensitive driver genes. An ensemble +machine learning model was built to predict dosage sensitivity probabilities (pHaplo & +pTriplo) for all autosomal genes, which identified 2,987 haploinsufficient and 1,559 +triplosensitive genes, including 648 that were uniquely triplosensitive. +

+ +

Display Conventions and Configuration

+ +

+Each of the tracks is displayed with a distinct item (bed track) covering the entire gene locus wherever +a score was available. Clicking on an item provides a link to DECIPHER, which contains the sensitivity scores as well as +additional information. Mousing over an item displays the gene symbol, the ENSG ID for that gene, +and the respective sensitivity score for the track, rounded to two decimal places. Filters are +also available to set the score thresholds used to display items in each of the tracks.

+ +

Coloring and Interpretation

+ +

+Each of the tracks is colored based on standardized cutoffs for pHaplo and pTriplo as described by the +authors:

+pHaplo scores ≥0.86 indicate that the average effect sizes of deletions are as strong as +the loss-of-function of genes known to be constrained against protein truncating variants (average OR≥2.7) +(Karczewski et al., 2020). +pHaplo scores ≥0.55 indicate an odds ratio ≥2.

+pTriplo scores ≥0.94 indicate that the average effect sizes of duplications are as strong as +the loss-of-function of genes known to be constrained against protein truncating variants (average OR≥2.7) +(Karczewski et al., 2020). +pTriplo scores ≥0.68 indicate an odds ratio ≥2.

+Applying these cutoffs defined 2,987 haploinsufficient (pHaplo≥0.86) and 1,559 +triplosensitive (pTriplo≥0.94) genes with rCNV effect sizes comparable to loss-of-function +of gold-standard PTV-constrained genes.

See below for a summary of the color scheme:

+ +

Dark red items - pHaplo ≥ 0.86
Bright red items - pHaplo < 0.86
Dark blue items - pTriplo ≥ 0.94
Bright blue items - pTriplo < 0.94
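The color scheme above can be expressed as a small lookup. This is only an illustrative sketch, not the code used to build the track: the function name `classify_item` and the color labels are hypothetical, and it assumes the cutoffs are applied with a simple greater-than-or-equal comparison.

```python
# Hypothetical helper: maps a score to the color category listed above.
# Cutoffs (0.86 for pHaplo, 0.94 for pTriplo) are taken from the track description.
CUTOFFS = {
    "pHaplo": (0.86, "red"),
    "pTriplo": (0.94, "blue"),
}


def classify_item(score: float, track: str) -> str:
    """Return 'dark red', 'bright red', 'dark blue', or 'bright blue'."""
    if track not in CUTOFFS:
        raise ValueError(f"unknown track: {track!r}")
    threshold, hue = CUTOFFS[track]
    # Scores at or above the cutoff get the dark shade.
    shade = "dark" if score >= threshold else "bright"
    return f"{shade} {hue}"
```

For example, `classify_item(0.90, "pHaplo")` falls in the dark red category, while the same score on the pTriplo track falls in bright blue because it is below 0.94.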

+ +

Methods

+ +

+The data were downloaded from Zenodo as a 3-column file containing +gene symbols, pHaplo, and pTriplo scores. Since the data were created using +GENCODEv19 models, the hg19 data were mapped using those coordinates by picking the earliest +transcription start site of all of the respective gene transcripts and the furthest +transcription end site. This leads to some gene boundaries that are not representative of a real +transcript, but since the data are gene-locus annotations, this maximum coverage was used. +Finally, both scores were rounded to two decimal places for easier interpretation.
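The locus-spanning step described above can be sketched as follows. This is a minimal illustration rather than the actual track-building code (the makeDoc holds the real steps): the function name `gene_loci` and the input tuple layout are hypothetical, "earliest start" and "furthest end" are interpreted as the minimum and maximum genomic coordinates, and all transcripts of a gene symbol are assumed to lie on one chromosome.

```python
def gene_loci(transcripts):
    """Collapse transcripts into one maximal locus per gene symbol.

    `transcripts` is an iterable of (gene_symbol, chrom, tx_start, tx_end)
    tuples (a hypothetical layout chosen for this sketch).
    Returns a dict: gene_symbol -> (chrom, min tx_start, max tx_end).
    """
    loci = {}
    for symbol, chrom, tx_start, tx_end in transcripts:
        if symbol not in loci:
            loci[symbol] = (chrom, tx_start, tx_end)
            continue
        locus_chrom, locus_start, locus_end = loci[symbol]
        # Assumption: every transcript of a symbol is on the same chromosome.
        loci[symbol] = (
            locus_chrom,
            min(locus_start, tx_start),
            max(locus_end, tx_end),
        )
    return loci


# Scores are reported rounded to two decimal places.
def round_score(score: float) -> float:
    return round(score, 2)
```

Because the start and end may come from different transcripts, the resulting interval can be longer than any single transcript of the gene.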

+For hg38, we attempted to obtain updated gene positions from a few different datasets, since +gene symbols have been updated many times since GENCODEv19. A summary of the workflow +can be seen below, with each subsequent step being used only for genes where mapping failed:

1. Gene symbols were mapped using MANE1.0. < 2000 items failed mapping here.
2. Mapping with GENCODEv45 was attempted.
3. Mapping with GENCODEv20 was attempted. At this point, 448 items were not mapped.
4. Finally, any missing items were lifted using the hg19 track. 19/448 items failed +mapping due to their regions having been split from hg19 to hg38.
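The fallback order in the steps above amounts to trying each source in turn and stopping at the first hit. The sketch below illustrates that pattern only: the function name `map_symbols` and the dictionary inputs are hypothetical, and the final lift-over step is represented as a precomputed table of lifted coordinates rather than an actual liftOver run.

```python
def map_symbols(symbols, sources):
    """Map gene symbols to coordinates using an ordered list of sources.

    `sources` is a list of (source_name, table) pairs, where each table is a
    dict of gene_symbol -> coordinates. Later sources are consulted only for
    symbols that every earlier source failed to map.
    Returns (mapped, unmapped): a dict of symbol -> (source_name, coordinates)
    and a list of symbols that no source could place.
    """
    mapped = {}
    unmapped = []
    for symbol in symbols:
        for source_name, table in sources:
            if symbol in table:
                mapped[symbol] = (source_name, table[symbol])
                break
        else:
            # No source contained this symbol.
            unmapped.append(symbol)
    return mapped, unmapped
```

Under these assumptions, the workflow would pass the sources in the order MANE1.0, GENCODEv45, GENCODEv20, and then the lifted hg19 coordinates; whatever ends up in `unmapped` corresponds to the items missing from the hg38 tracks.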

+ +

+In summary, the hg19 track was mapped using the original GENCODEv19 mappings, and a series +of steps were taken to map the hg38 gene symbols with updated coordinates. 19/18641 items +could not be mapped and are missing from the hg38 tracks.

+The complete +makeDoc can be found online. This includes all of the track creation steps.

+ +

Data Access

+The raw data can be explored interactively with the Table Browser, or +the Data Integrator. For automated access, this track, like all +others, is available via our API. However, for bulk +processing, it is recommended to download the dataset. +

+ +

+For automated download and analysis, the genome annotation is stored at UCSC in bigBed +files that can be downloaded from +our download server. +Individual regions or the whole genome annotation can be obtained using our tool +bigBedToBed which can be compiled from the source code or downloaded as a precompiled +binary for your system. Instructions for downloading source code and binaries can be found +here. +The tools can also be used to obtain features confined to a given range, e.g., +

+bigBedToBed -chrom=chr1 -start=100000 -end=100500 http://hgdownload.soe.ucsc.edu/gbdb/$db/bbi/dosageSensitivityCollins2022/pHaploDosageSensitivity.bb stdout +
+

+ +

+Please refer to our +Data Access FAQ +for more information. +

+ +

Credits

+ +

+Thanks to DECIPHER for their support and assistance with the data. We would also like to +thank Anna Benet-Pagès for suggesting and assisting in track development and interpretation. +

+ +

References

+ +

+Collins RL, Glessner JT, Porcu E, Lepamets M, Brandon R, Lauricella C, Han L, Morley T, Niestroj LM, +Ulirsch J et al. + +A cross-disorder dosage sensitivity map of the human genome. +Cell. 2022 Aug 4;185(16):3041-3055.e25. +PMID: 35917817; PMC: PMC9742861 +