aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700

varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and
Methods sections for all 20+ subtrack HTML files with consistent formatting,
sequencing methods from source papers, and links to makeDoc and Github scripts.
Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and
update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/schema.html src/hg/makeDb/trackDb/human/schema.html
new file mode 100644
index 00000000000..623791d2633
--- /dev/null
+++ src/hg/makeDb/trackDb/human/schema.html
@@ -0,0 +1,73 @@
+<h2>Description</h2>
+<p>
+The <a href="https://schema.broadinstitute.org/" target="_blank">SCHEMA</a> (Schizophrenia Exome
+Meta-Analysis) consortium is an international collaboration that aggregated and harmonized
+whole-exome sequencing data to study the role of rare coding variants in schizophrenia.
+The dataset includes 24,248 cases and 97,322 controls from diverse global cohorts.
+SCHEMA identified genes with exome-wide significant rare variant burden in schizophrenia,
+providing insights into the biological underpinnings of the disorder.
+</p>
+
+<h2>Data Access</h2>
+<p>
+Since the data can be downloaded from the SCHEMA website and do not seem to be under a license,
+we assume that we are allowed to redistribute them in VCF format.
+The data can be explored on our website interactively with the
+<a href="../cgi-bin/hgTables">Table Browser</a> or the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a>.
+For programmatic access, our <a href="https://api.genome.ucsc.edu">REST API</a> can be used; the
+track name is <em>schema</em>.
+For bulk download, the VCF file can be obtained from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/$db/varFreqs/" target="_blank">our download server</a>.
+</p>
+<p>
+Summary statistics and variant-level results are also available from the
+<a href="https://schema.broadinstitute.org/" target="_blank">SCHEMA Browser</a>.
+</p>
+
+<h2>Methods</h2>
+<p>
+The SCHEMA (Schizophrenia Exome Meta-Analysis) consortium aggregated whole-exome sequencing
+data from 24,248 schizophrenia cases and 97,322 controls (including non-psychiatric,
+non-neurological samples from the gnomAD consortium) across multiple international cohorts.
+Exome sequencing was performed using various capture platforms and Illumina sequencing
+instruments across cohorts sequenced over approximately a decade. Sequence data were
+uniformly reprocessed through the BWA-Picard-GATK best practices pipeline as part of the
+gnomAD v2 infrastructure, including alignment to GRCh37/hg19, duplicate marking, base
+quality score recalibration, and per-sample variant calling with GATK HaplotypeCaller,
+followed by joint genotyping across all samples. A novel exon-by-exon coverage estimation
+pipeline was developed to account for differences in capture technology across sequencing
+batches, and both site-level and genotype-level quality filters were applied. Protein-truncating
+variants (PTVs) were annotated using LOFTEE (Loss-Of-Function Transcript Effect Estimator),
+and missense variant deleteriousness was scored using MPC (Missense badness, PolyPhen-2,
+and Constraint). Gene-level association testing combined: (1) a case-control rare variant
+burden test aggregating ultra-rare coding variants (Class I: PTVs and missense MPC &gt; 3; Class II: missense
+MPC 2–3) across 18,321 protein-coding genes; and (2) de novo variant enrichment
+from 3,402 schizophrenia proband-parent trios assessed via a Poisson rate test against
+gnomAD-derived baseline mutation rates. The two components were combined using a weighted
+Z-score meta-analysis. This identified 10 genes at exome-wide significance (P &lt; 2.14
+× 10<sup>-6</sup>) with odds ratios for PTVs ranging from 3 to 50, and 32 genes at
+FDR &lt; 5%. Full data are available at
+<a href="https://schema.broadinstitute.org" target="_blank">schema.broadinstitute.org</a>
+(Singh, Neale, Daly &amp; the SCHEMA Consortium,
+<a href="https://doi.org/10.1038/s41586-022-04556-w" target="_blank"><em>Nature</em> 2022</a>).
+</p>
+<p>
+We downloaded the TSV data from the <a href="https://schema.broadinstitute.org/" target="_blank">SCHEMA</a> website
+and converted it to VCF format using a custom Python script. The VCF was lifted to hg38 using our hg19ToHg38 chain
+file.
+The <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target=_blank>makeDoc file</a> of the track documents how all source files of the varFreqs track were converted.
+For some tracks, Python scripts were necessary; these are also available from <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/varFreqs" target=_blank>GitHub</a>.
+</p>
+
+<h2>References</h2>
+<p>
+Singh T, Poterba T, Curtis D, Akil H, Al Eissa M, Barchas JD, Bass N, Bigdeli TB, Breen G,
+Bromet EJ <em>et al</em>.
+<a href="https://doi.org/10.1038/s41586-022-04556-w" target="_blank">
+Exome sequencing identifies rare coding variants in 10 genes which confer substantial risk for
+schizophrenia</a>.
+<em>Nature</em>. 2022 Apr;604(7906):509-516.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35396579" target="_blank">35396579</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9392855/" target="_blank">PMC9392855</a>
+</p>
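
The Data Access section of schema.html points readers to the REST API and gives the track name schema. A minimal sketch of how a region query URL might be built follows. It assumes the API's getData/track endpoint, the hg38 assembly (the page says the VCF was lifted to hg38), and a 0-based start with an exclusive end. The function names build_track_url and fetch_track are illustrative only, and whether the endpoint returns records for this track type has not been verified here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# REST API host linked from the Data Access section.
API_BASE = "https://api.genome.ucsc.edu"


def build_track_url(chrom, start, end, genome="hg38", track="schema"):
    """Build a getData/track URL for one region.

    Assumption: start is 0-based and end is exclusive.
    The defaults (hg38, schema) come from the track page; adjust as needed.
    """
    if start < 0 or end <= start:
        raise ValueError("expected 0 <= start < end")
    params = {
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    return f"{API_BASE}/getData/track?{urlencode(params)}"


def fetch_track(chrom, start, end, **kwargs):
    """Fetch the region and decode the JSON response (needs network access)."""
    url = build_track_url(chrom, start, end, **kwargs)
    with urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_track() to actually query the server.
    print(build_track_url("chr1", 1_000_000, 1_100_000))
```

Keeping URL construction separate from the network call lets the query be inspected or pasted into a browser before any request is sent.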