68c5b3b5dfc4053ff78a6b1d236bd1ac90251cfa lrnassar Mon Jun 1 14:40:45 2026 -0700 varFreqs: description pages for the three combined tracks and "SNV" rename sweep. Add varFreqsDisease.html and varFreqsArray.html so the two new combined tracks have full Description/Display/Methods/Data Access/References. Add a Caveats section on varFreqsArray about chip-data quality vs sequencing. Update varFreqsAll.html and the supertrack varFreqs.html to reflect the three-combined-track family (cross-links between siblings, new "Combined Tracks" section, new table rows, and updated source/variant counts). Add a GoNL row to the supertrack table. Sweep 37 subtrack longLabels and four cross-referencing description pages (colorsDbSnv.html, mei.html, meiSwegen.html, phasedVars.html) from "Variant Frequencies:" to "SNV Frequencies:" to match the supertrack shortLabel. refs #36642 diff --git src/hg/makeDb/trackDb/human/varFreqsDisease.html src/hg/makeDb/trackDb/human/varFreqsDisease.html new file mode 100644 index 00000000000..613013eac96 --- /dev/null +++ src/hg/makeDb/trackDb/human/varFreqsDisease.html @@ -0,0 +1,198 @@ +<h2>Description</h2> +<p> +This track merges variants from six disease-focused or clinically-recruited cohorts into a +single bigBed file with predicted protein consequences and cross-database filtering. It +contains 932 million variants from SFARI SPARK (WES + WGS, autism families), TOPMed +(NHLBI heart, lung and blood disease cohorts), SCHEMA (schizophrenia case/control), +GREGoR (rare-disease families), and GA4K (PacBio long-read pediatric rare disease). Where +the source dataset provides per-phenotype counts, those are exposed as separate AC/AF +columns and as filter widgets. +</p> + +<p> +For a summary of all available variant frequency databases, including the population-scale +control track and the genotyping-array track, see the +<a href="hgTrackUi?g=varFreqs">SNV Frequencies</a> supertrack page. 
+</p> + +<p> +Each variant is annotated with its predicted consequence on protein-coding genes +(using <a href="https://samtools.github.io/bcftools/howtos/csq-calling.html" +target="_blank">bcftools csq</a> with +<a href="https://www.ensembl.org/info/data/ftp/index.html" target="_blank">Ensembl</a> +gene models), and colored by severity. Allele counts and frequencies are shown for each +source database and, where available, broken down by phenotype. +</p> + +<h2>Display Conventions</h2> + +<h3>Color by Consequence</h3> +<p>Variants are colored by their most severe predicted consequence:</p> +<table class="stdTbl"> +<tr><th>Color</th><th>Consequence class</th><th>Examples</th></tr> +<tr> + <td style="background-color: rgb(255,0,0); color: white; text-align: center;"><b>Red</b></td> + <td>Protein-truncating / Loss-of-function</td> + <td>stop_gained, frameshift, splice_donor, splice_acceptor, stop_lost, start_lost</td> +</tr> +<tr> + <td style="background-color: rgb(31,119,180); color: white; text-align: center;"><b>Blue</b></td> + <td>Missense / In-frame</td> + <td>missense, inframe_insertion, inframe_deletion, protein_altering</td> +</tr> +<tr> + <td style="background-color: rgb(0,128,0); color: white; text-align: center;"><b>Green</b></td> + <td>Synonymous</td> + <td>synonymous, stop_retained</td> +</tr> +<tr> + <td style="background-color: rgb(128,128,128); color: white; text-align: center;"><b>Grey</b></td> + <td>Non-coding / Intergenic</td> + <td>intron, non_coding, intergenic, UTR</td> +</tr> +</table> + +<h3>Amino Acid Change Notation</h3> +<p> +The "AA change" field uses bcftools csq notation: <b>23I>23V</b> means position +23 changed from Isoleucine (I) to Valine (V) (missense). <b>23I</b> alone (no arrow) +means position 23 is Isoleucine and unchanged (synonymous). A "*" indicates a +stop codon (e.g. 45R>45* is a stop_gained). +</p> + +<h2>Filters</h2> +<p> +This track supports filtering via the track settings page. 
Click the track title or use the +"Configure" button to access filters. +</p> + +<h3>Variant Type and Consequence</h3> +<ul> + <li><b>Variant Type</b>: SNV, Insertion, Deletion, or MNV.</li> + <li><b>Consequence</b>: Missense, Synonymous, Stop Gained, Frameshift, Splice Donor, + Splice Acceptor, Intron, 3' UTR, 5' UTR, Non-coding, Intergenic, or Other. The filter + uses OR logic across the comma-separated consequence tokens on each variant. See the + <a href="hgTrackUi?g=varFreqsAll">All Databases Combined</a> description page for a + complete description of the "Other" bucket.</li> +</ul> + +<h3>Frequency and Count Filters</h3> +<ul> + <li><b>Max Allele Frequency</b>: Filter by the maximum allele frequency observed across + the six disease cohorts. Useful for finding rare variants enriched in cases.</li> + <li><b>Total Allele Count</b>: Filter by the sum of allele counts across all six + databases.</li> + <li><b>Per-database AF and AC</b>: Filter by allele frequency or count in any specific + source. For example, restrict to variants with SCHEMA case AF > 0.001.</li> +</ul> + +<h3>Phenotype-stratified Filters</h3> +<p> +Four of the six sources publish counts split by phenotype, which lets you compare allele +frequencies between affected and unaffected groups within the same cohort: +</p> +<ul> + <li><b>SPARK WES</b> and <b>SFARI WGS</b>: ASD proband counts versus non-ASD family + members (mostly parents and unaffected siblings). The split is from the SPARK + individuals_registration <code>asd</code> column.</li> + <li><b>SCHEMA</b>: Schizophrenia case counts versus controls, summed across the 39 + analysis cohorts in the original meta-analysis.</li> + <li><b>GREGoR</b>: Affected, Unaffected, and Unknown disease-status counts.</li> +</ul> + +<h3>Source Database</h3> +<p> +The <b>Source Database</b> filter restricts the display to variants present in specific +databases. 
It uses OR logic: selecting multiple databases shows variants found in any of +the selected sources. +</p> + +<h3>Length Filters</h3> +<ul> + <li><b>Reference/Alternate Length</b>: Filter by the length of the reference or alternate allele.</li> + <li><b>Length Change</b>: Filter by the size difference between alternate and reference + (positive = insertion, negative = deletion, zero = SNV or MNV).</li> +</ul> + +<h2>Methods</h2> +<p> +The same merge-and-annotate pipeline used for the +<a href="hgTrackUi?g=varFreqsAll">All Databases Combined</a> track was run on the +disease-cohort subset of source VCFs. Each VCF was stripped of its INFO fields, +normalized with <code>bcftools norm</code> (splitting multi-allelic sites), and merged with +<code>bcftools merge</code>. The merged VCF was then annotated with predicted protein +consequences using <code>bcftools csq</code> with the +<a href="https://www.ensembl.org/info/data/ftp/index.html" target="_blank">Ensembl</a> +GRCh38 release 115 gene annotation (GFF3). +</p> + +<p> +The SPARK WES and WGS sites VCFs were rebuilt for this track so each variant carries +phenotype-stratified counts in addition to overall AC/AN/AF. The split uses the +<code>asd</code> column of the SPARK <code>individuals_registration</code> TSV via +<code>bcftools +fill-tags -S</code>, producing AC_AUT / AN_AUT / AF_AUT and +AC_NON_AUT / AN_NON_AUT / AF_NON_AUT. SCHEMA was processed the same way, summing +AC_CASE/AN_CASE/AF_CASE and AC_CTRL/AN_CTRL/AF_CTRL across its 39 analysis cohorts. +GREGoR ships AC/AN/AF triples for affected, unaffected and unknown disease status +directly in its release. +</p> + +<p> +The track's +<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" +target="_blank">makeDoc file</a> documents how each source VCF was converted. Scripts are +available from +<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs" +target="_blank">GitHub</a>. 
+</p> + +<h2>Data Access</h2> +<p> +The data can be explored interactively with the +<a href="../cgi-bin/hgTables">Table Browser</a> or the +<a href="../cgi-bin/hgIntegrator">Data Integrator</a>. For programmatic access, our +<a href="https://api.genome.ucsc.edu" target="_blank">REST API</a> can be used; the track +name is <em>varFreqsDisease</em>. +</p> +<p> +Because the merged callset includes data from multiple sources whose redistribution +licenses differ, the combined bigBed is <b>not available for download</b> from our download +server. The combined track can be reconstructed from the individual source VCFs using the +<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs" +target="_blank">conversion scripts on GitHub</a> together with the +<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" +target="_blank">build documentation</a>. +</p> + +<h2>Credits</h2> +<p> +This track is only possible thanks to the data from millions of volunteers around the +world, who donated blood, signed consent forms and provided health information about +themselves and sometimes their families. Click on any of the individual tracks in the +<a href="hgTrackUi?g=varFreqs">SNV Frequencies</a> supertrack to see the specific credits +for each project. Thanks to Alex Ioannidis, UCSC, for the motivation for this track and +to Andreas Lahner, MGZ, for feedback. +</p> + +<h2>References</h2> +<p> +For primary citations of each source dataset, see the References section on the +<a href="hgTrackUi?g=varFreqs">SNV Frequencies</a> supertrack page. The merged-track +build itself uses the following tools: +</p> +<p> +Danecek P, McCarthy SA. +<a href="https://doi.org/10.1093/bioinformatics/btx100" target="_blank"> +BCFtools/csq: haplotype-aware variant consequences</a>. +<em>Bioinformatics</em>. 2017 Jul 1;33(13):2037-2039. 
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/28205675" target="_blank">28205675</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870570/" target="_blank">PMC5870570</a> +</p> +<p> +McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, Flicek P, Cunningham F. +<a href="https://doi.org/10.1186/s13059-016-0974-4" target="_blank"> +The Ensembl Variant Effect Predictor</a>. +<em>Genome Biol</em>. 2016 Jun 6;17(1):122. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/27268795" target="_blank">27268795</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4893825/" target="_blank">PMC4893825</a> +</p>