3180d71425ab40bc022712bb95868bfe80747375 max Fri May 29 08:52:38 2026 -0700

[Claude] varFreqs: split SPARK+SCHEMA by phenotype, add disease + array combined tracks, drop array cohorts from varFreqsAll

#Preview2 week - bugs introduced now will need a build patch to fix

Split SFARI SPARK WES and WGS by autism status using fill-tags -S with the SPARK individuals_registration TSV (AC_AUT / AN_AUT / AF_AUT plus AC_NON_AUT / AN_NON_AUT / AF_NON_AUT). Added matching SCHEMA case/control sums (AC_CASE etc.).

Two new combined bigBed tracks: varFreqsDisease (SPARK, SFARI WGS, TOPMed, SCHEMA, GREGoR, GA4K) and varFreqsArray (TPMI, MexBB, UKBB). TPMI and MexBB are removed from varFreqsAll so the main combined track is purely WGS/WES.

Build scripts parameterized so the same code drives all three combined builds: mergeAndAnnotate.sh gains --databases / --tag, vcfToBigBed.py gains --databases-file / --populations-file and a per-track autoSql table name.

mergeAndAnnotate.sh now pins /cluster/software/src/bcftools-1.22 in PATH (--unify-chr-names is a 1.22 feature; conda's 1.14 silently fails).

refs #36642

diff --git src/hg/makeDb/trackDb/human/sfariSparkExomes.html src/hg/makeDb/trackDb/human/sfariSparkExomes.html
index f3428e3fa2a..5a710f64574 100644
--- src/hg/makeDb/trackDb/human/sfariSparkExomes.html
+++ src/hg/makeDb/trackDb/human/sfariSparkExomes.html
@@ -1,54 +1,76 @@

Description

The Simons Foundation Autism Research Initiative (SFARI) recruited a large cohort of families with autistic children who provided DNA samples and phenotypes. Parents and children from 54,558 families were sequenced, for a total of 142,357 individuals with whole-exome sequencing (WES) and 12,519 with whole-genome sequencing (WGS). The data contain 32,559 trios, 8,895 quads (each including one sibling without autism), and 824 twins.

The same frequencies shown here are also available publicly on the SFARI Genome Browser. See (SPARK et al, Neuron 2018) for details.

+

Phenotype-stratified counts

+

+In addition to the overall allele count (AC), allele number (AN), and allele
+frequency (AF), each variant record carries counts split by autism status
+(the asd column of the SPARK individual registration file):
+

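+AC_AUT / AN_AUT / AF_AUT: allele count, allele number, and allele frequency among individuals with autism
+AC_NON_AUT / AN_NON_AUT / AF_NON_AUT: the same three values among individuals without autism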

+A small minority of samples have a blank asd value and so contribute
+only to the overall AC/AN/AF, not to either group total.
+
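The relationship between the overall and per-group counts described above can be illustrated with a minimal Python sketch. It is not the code used to build the track; the function name, the genotype representation (tuples of allele indices, with None for a missing call), and the toy samples are assumptions made for illustration only.

```python
from collections import defaultdict


def stratified_counts(genotypes, groups):
    """Compute overall and per-group AC/AN/AF for one biallelic variant.

    genotypes: dict of sample -> tuple of alleles (0 = ref, 1 = alt, None = missing)
    groups:    dict of sample -> "AUT", "NON_AUT", or None when asd is blank
    (Both structures are hypothetical, chosen only for this sketch.)
    """
    ac = defaultdict(int)
    an = defaultdict(int)
    for sample, alleles in genotypes.items():
        called = [a for a in alleles if a is not None]
        alt = sum(1 for a in called if a == 1)
        # Every sample counts toward the overall totals (empty suffix).
        suffixes = [""]
        group = groups.get(sample)
        if group:
            # Samples with a known status also count toward their group.
            suffixes.append("_" + group)
        for suffix in suffixes:
            an[suffix] += len(called)
            ac[suffix] += alt

    result = {}
    for suffix in ("", "_AUT", "_NON_AUT"):
        result["AC" + suffix] = ac[suffix]
        result["AN" + suffix] = an[suffix]
        # AF is undefined when no alleles were called in the group.
        result["AF" + suffix] = ac[suffix] / an[suffix] if an[suffix] else None
    return result


# Toy data: s4 has a blank asd value, so it only affects the overall totals.
toy_genotypes = {"s1": (0, 1), "s2": (1, 1), "s3": (0, 0), "s4": (0, 1)}
toy_groups = {"s1": "AUT", "s2": "AUT", "s3": "NON_AUT", "s4": None}
counts = stratified_counts(toy_genotypes, toy_groups)
print(counts)
```

In this toy example the two group allele numbers (4 and 2) sum to less than the overall AN of 8, because the sample with the blank asd value is counted only in the overall totals.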

+

Data Access

Due to license restrictions, the data for this track cannot be downloaded from the UCSC Genome Browser. The Table Browser, Data Integrator, and download server are not available for this track.

Allele frequencies can also be displayed on the SFARI Genome Browser. Full CRAMs and VCFs with genotypes are available from SFARI Base. They require a data access request, which is usually reviewed quickly. More information is available in the SPARK Welcome Packet.

Methods

The genome browser track project was approved by the Simons Foundation under request number 14584.1. WES and WGS data were downloaded from SFARI Base. pVCFs were downloaded, anonymized with a script using bcftools and its "fill-tags" plugin and
-normalized.

+normalized. There was no minimum allele frequency cutoff.
+The ASD-status sample-group file derived from the SPARK individuals_registration
+TSV was passed to fill-tags via its -S option, which adds the per-group
+AC_AUT/AN_AUT/AF_AUT and AC_NON_AUT/AN_NON_AUT/AF_NON_AUT
+tags alongside the overall AC/AN/AF.
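As a rough illustration of how such a sample-group file might be derived, the following Python sketch turns a registration TSV into the two-column "sample, group" lines that the fill-tags -S option reads. This is not the actual build script. The ID column name subject_sp_id, the accepted asd values (True/False or 1/0), and the function name are assumptions; only the asd column and the AUT / NON_AUT group names come from the description above.

```python
import csv
import io

# Assumed encoding of the asd column; the real file may use other values.
ASD_TO_GROUP = {
    "true": "AUT",
    "1": "AUT",
    "false": "NON_AUT",
    "0": "NON_AUT",
}


def build_group_lines(handle, id_col="subject_sp_id", asd_col="asd"):
    """Return 'sample<TAB>group' lines for samples with a known asd status.

    handle: open text file or file-like object with a tab-separated header row.
    id_col: hypothetical name of the sample ID column.
    Samples with a blank or unrecognized asd value are skipped, so they
    contribute only to the overall AC/AN/AF.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    lines = []
    for row in reader:
        value = (row.get(asd_col) or "").strip().lower()
        group = ASD_TO_GROUP.get(value)
        if group is None:
            continue
        lines.append(f"{row[id_col]}\t{group}")
    return lines


# Toy input: SP3 has a blank asd value and is left out of the group file.
toy_tsv = "subject_sp_id\tasd\nSP1\tTrue\nSP2\tFalse\nSP3\t\n"
group_lines = build_group_lines(io.StringIO(toy_tsv))
print("\n".join(group_lines))
```

Assuming the lines are written to a file such as groups.txt (a placeholder name), an invocation along the lines of `bcftools +fill-tags in.vcf.gz -Oz -o out.vcf.gz -- -S groups.txt -t AN,AC,AF` would then add the per-group tags; the input and output file names here are likewise placeholders.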

The methods are documented as follows by SFARI: