aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700 varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642 Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and Methods sections for all 20+ subtrack HTML files with consistent formatting, sequencing methods from source papers, and links to makeDoc and Github scripts. Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and update makeDoc paths accordingly. Co-Authored-By: Claude Opus 4.6 diff --git src/hg/makeDb/trackDb/human/schema.html src/hg/makeDb/trackDb/human/schema.html new file mode 100644 index 00000000000..623791d2633 --- /dev/null +++ src/hg/makeDb/trackDb/human/schema.html @@ -0,0 +1,73 @@ +

Description

+

+The SCHEMA (Schizophrenia Exome
+Meta-Analysis) consortium is an international collaboration that aggregated and harmonized
+whole-exome sequencing data to study the role of rare coding variants in schizophrenia.
+The dataset includes 24,248 cases and 97,322 controls from diverse global cohorts.
+SCHEMA identified genes with an exome-wide significant burden of rare coding variants in schizophrenia,
+providing insights into the biological underpinnings of the disorder.

+ +

Data Access

+

+Because the data can be downloaded from the SCHEMA website and do not appear to be released under a license,
+we assume that we are allowed to redistribute them in VCF format.
+The data can be explored interactively on our website with the
+Table Browser or the
+Data Integrator.
+For programmatic access, our REST API can be used; the
+track name is schema.
+For bulk download, the VCF file can be obtained from
+our download server.
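As an illustration of programmatic access, the Python sketch below builds a request to the UCSC REST API's getData/track endpoint for the schema track. It assumes the track is queried on hg38 and uses a hypothetical example region; it joins parameters with "&", one of the separators the API accepts. Coordinates follow the API convention of a 0-based start and an exclusive end. The structure of the returned JSON is not shown here because it depends on the track type.

```python
"""Minimal sketch: build (and optionally fetch) a UCSC REST API request
for the SCHEMA track. Assumes the track is served as "schema" on hg38."""
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(chrom: str, start: int, end: int,
                    genome: str = "hg38", track: str = "schema") -> str:
    """Return a getData/track URL for a region (start 0-based, end exclusive)."""
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    return API_BASE + "?" + urllib.parse.urlencode(params)


def fetch_track(chrom: str, start: int, end: int) -> dict:
    """Download and parse the JSON response for a region (needs network access)."""
    with urllib.request.urlopen(build_track_url(chrom, start, end)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical example region; replace with the region of interest.
    print(build_track_url("chr1", 1000000, 1010000))
```

Calling fetch_track with the same arguments would download the records overlapping that region; for whole-genome analyses the VCF from the download server is usually the more practical option.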

+

+Summary statistics and variant-level results are also available from the
+SCHEMA Browser.

+ +

Methods

+

+The SCHEMA (Schizophrenia Exome Meta-Analysis) consortium aggregated whole-exome sequencing
+data from 24,248 schizophrenia cases and 97,322 controls (including non-psychiatric,
+non-neurological samples from the gnomAD consortium) across multiple international cohorts.
+Exome sequencing was performed using various capture platforms and Illumina sequencing
+instruments across cohorts sequenced over approximately a decade. Sequence data were
+uniformly reprocessed through the BWA-Picard-GATK best practices pipeline as part of the
+gnomAD v2 infrastructure, including alignment to GRCh37/hg19, duplicate marking, base
+quality score recalibration, and per-sample variant calling with GATK HaplotypeCaller,
+followed by joint genotyping across all samples. A novel exon-by-exon coverage estimation
+pipeline was developed to account for differences in capture technology across sequencing
+batches, and both site-level and genotype-level quality filters were applied. Protein-truncating
+variants (PTVs) were annotated using LOFTEE (Loss-Of-Function Transcript Effect Estimator),
+and missense variant deleteriousness was scored using MPC (Missense badness, PolyPhen-2,
+and Constraint). Gene-level association testing had two components: (1) a case-control
+burden test aggregating ultra-rare damaging variants (Class I: PTVs and missense variants
+with MPC > 3; Class II: missense variants with MPC 2–3) across 18,321 protein-coding genes;
+and (2) a test for de novo variant enrichment in 3,402 schizophrenia proband-parent trios,
+using a Poisson rate test against gnomAD-derived baseline mutation rates. The two components
+were combined using a weighted Z-score meta-analysis. This identified 10 genes at exome-wide
+significance (P < 2.14 × 10⁻⁶), with odds ratios for PTVs ranging from 3 to 50, and 32 genes at
+FDR < 5%. Full data are available at
+schema.broadinstitute.org
+(Singh, Neale, Daly & the SCHEMA Consortium,
+Nature 2022).
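To make the gene-level statistics above more concrete, here is a simplified Python sketch. It is not SCHEMA's code. The consequence labels "ptv" and "missense", the inclusive Class II boundaries (2 ≤ MPC ≤ 3), and the meta-analysis weights are all hypothetical assumptions, and the ultra-rare allele-frequency filter is not modeled. The sketch shows three steps: assigning a variant class, computing a one-sided Poisson P-value for an excess of de novo variants, and combining per-component Z-scores with a weighted (Stouffer) Z-score.

```python
"""Simplified sketch of the gene-level statistics (not SCHEMA's implementation).

Hypothetical assumptions: consequences are the plain strings "ptv" or
"missense", Class II uses inclusive MPC bounds (2 <= MPC <= 3), and the
meta-analysis weights are supplied by the caller.
"""
import math
from statistics import NormalDist
from typing import Optional, Sequence

EXOME_WIDE_P = 2.14e-6  # exome-wide significance threshold quoted in the text


def variant_class(consequence: str, mpc: Optional[float]) -> Optional[str]:
    """Assign a damaging-variant class: 'I', 'II', or None."""
    if consequence == "ptv":
        return "I"
    if consequence == "missense" and mpc is not None:
        if mpc > 3:
            return "I"
        if 2 <= mpc <= 3:
            return "II"
    return None


def poisson_upper_p(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected), a one-sided enrichment test."""
    term = math.exp(-expected)  # P(X = 0)
    cdf = 0.0
    for k in range(observed):   # accumulate P(X = 0) .. P(X = observed - 1)
        cdf += term
        term *= expected / (k + 1)
    return max(0.0, 1.0 - cdf)


def p_to_z(p: float) -> float:
    """One-sided P-value to Z-score; p must satisfy 0 < p < 1."""
    return NormalDist().inv_cdf(1.0 - p)


def z_to_p(z: float) -> float:
    """One-sided upper-tail P-value for a Z-score."""
    return 1.0 - NormalDist().cdf(z)


def weighted_z(zs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted Z-score (Stouffer) combination: sum(w*z) / sqrt(sum(w^2))."""
    numerator = sum(w * z for w, z in zip(weights, zs))
    return numerator / math.sqrt(sum(w * w for w in weights))


if __name__ == "__main__":
    # Hypothetical gene: case-control P and de novo counts are made up.
    z_cc = p_to_z(1e-4)
    z_dn = p_to_z(poisson_upper_p(observed=4, expected=0.5))
    p_meta = z_to_p(weighted_z([z_cc, z_dn], [1.0, 1.0]))
    print(p_meta, p_meta < EXOME_WIDE_P)
```

The simple 1 - cdf form loses precision for extremely small P-values; a production implementation would use survival functions from a statistics library instead.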

+

+We downloaded the TSV data from the SCHEMA website
+and converted it to VCF format using a custom Python script. The VCF was then lifted over to hg38 using our hg19ToHg38 chain
+file.
+The makeDoc file of the track documents how all source files of the varFreqs track were converted.
+For some tracks, Python scripts were necessary; these are also available from GitHub.
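The conversion step can be pictured with the minimal Python sketch below. It is not the script referenced in the makeDoc file. The input layout is a hypothetical assumption: a variant_id column formatted CHROM-POS-REF-ALT in hg19 coordinates, plus integer count columns named ac_case and ac_ctrl. The real SCHEMA download may use different columns. The later liftover to hg38 with the hg19ToHg38 chain file is a separate step and is not shown.

```python
"""Minimal TSV-to-VCF sketch (not the actual makeDoc conversion script).

Hypothetical input: tab-separated rows with a `variant_id` column formatted
CHROM-POS-REF-ALT plus integer columns `ac_case` and `ac_ctrl`.
"""
import csv
import io
from typing import List, TextIO

INFO_FIELDS = ["ac_case", "ac_ctrl"]  # hypothetical column names

HEADER = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=ac_case,Number=1,Type=Integer,Description="Allele count in cases">',
    '##INFO=<ID=ac_ctrl,Number=1,Type=Integer,Description="Allele count in controls">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
]


def tsv_to_vcf_lines(tsv: TextIO) -> List[str]:
    """Convert TSV rows to VCF lines: header first, then records sorted by position."""
    records = []
    for row in csv.DictReader(tsv, delimiter="\t"):
        chrom, pos, ref, alt = row["variant_id"].split("-")
        if not chrom.startswith("chr"):
            chrom = "chr" + chrom  # UCSC-style chromosome names
        info = ";".join(f"{key}={int(row[key])}" for key in INFO_FIELDS)
        records.append((chrom, int(pos), ref, alt, info))
    # Group by chromosome (lexical order) and sort numerically by position.
    records.sort(key=lambda rec: (rec[0], rec[1]))
    body = [f"{c}\t{p}\t.\t{r}\t{a}\t.\t.\t{i}" for c, p, r, a, i in records]
    return HEADER + body


if __name__ == "__main__":
    sample = "variant_id\tac_case\tac_ctrl\n2-200-A-G\t1\t0\n1-100-C-T\t3\t2\n"
    print("\n".join(tsv_to_vcf_lines(io.StringIO(sample))))
```

Sorting keeps each chromosome's records contiguous and in position order, which is what indexing tools such as tabix expect of a VCF file.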

+ +

References

+

+Singh T, Poterba T, Curtis D, Akil H, Al Eissa M, Barchas JD, Bass N, Bigdeli TB, Breen G,
+Bromet EJ et al.
+
+Exome sequencing identifies rare coding variants in 10 genes which confer substantial risk for
+schizophrenia.
+Nature. 2022 Apr;604(7906):509-516.
+PMID: 35396579; PMC: PMC9392855