af9a5b388259e680dd34bc47b2cad4ff6e3d162f lrnassar Sat Jun 13 03:00:51 2026 -0700
varFreqs: pre-release polish from comprehensive sanity check.
* Sync the new combined-track shortLabels into the four description pages: "Affected/Case Individuals" -> "Disease cohorts" and "Population + Unaffected" -> "Population reference" (matches the trackdb shortLabels users now see).
* Add a paragraph in the supertrack Methods section describing the pooled affectedAF / backgroundAF formulation (sum AC / sum AN) and the default_an configuration that handles AF-only cohorts.
* Update the in-track Methods paragraphs on varFreqsAffected.html and varFreqsBackground.html: replace "summed/maximized" with "pooled".
* Fix supertrack table downloadability column to match the underscore-prefix convention: allofus "Yes" -> "No" (description page already says license restricted); gregor "No" -> "Yes" (description page says VCF is on our download server, and the gbdb path is not underscore-prefixed).
* Add a 2026-06-12 makedoc section documenting the pooled-AF rebuild, the default_an mechanism, the new affectedAN/backgroundAN columns, the before/after spot-check at APOE rs429358, and the build commands.
refs #36642

diff --git src/hg/makeDb/trackDb/human/varFreqsBackground.html src/hg/makeDb/trackDb/human/varFreqsBackground.html
index e63ca161efb..4a7f3b9f2b2 100644
--- src/hg/makeDb/trackDb/human/varFreqsBackground.html
+++ src/hg/makeDb/trackDb/human/varFreqsBackground.html
@@ -1,35 +1,35 @@
 <h2>Description</h2>
 <p>
 This track shows small variants (SNVs and short indels) seen in <b>population reference
 cohorts and in unaffected or control individuals</b> of disease-study cohorts, annotated with
 their predicted protein consequence and colored by severity.
 It is the background half of a matched pair: the companion
-<a href="hgTrackUi?g=varFreqsAffected">Affected/Case Individuals</a> track shows the same
+<a href="hgTrackUi?g=varFreqsAffected">Disease cohorts</a> track shows the same
 kind of variants seen in affected or case individuals. Displaying the two together lets you
 see how common a variant is in the general/unaffected population compared with affected
 individuals. For the full list of contributing projects, see the
 <a href="hgTrackUi?g=varFreqs">SNV Frequencies</a> collection page.
 </p>
 <p>
 The background combines two kinds of data: the population/biobank reference cohorts (such as
 gnomAD HGDP+1kG, TOPMed, ALFA, HRC and the many national WGS projects), and the
 unaffected/control or unknown-phenotype arms of the disease-study cohorts (non-ASD family
 members in SFARI SPARK WES/WGS, SCHEMA controls, and GREGoR unaffected/unknown participants).
 Genotyping-array cohorts are not included. A variant that also appears in affected individuals
 is shown in both this track and the
-<a href="hgTrackUi?g=varFreqsAffected">Affected/Case Individuals</a> track.
+<a href="hgTrackUi?g=varFreqsAffected">Disease cohorts</a> track.
 </p>
 <h2>Display Conventions</h2>
 <h3>Color by Consequence</h3>
 <p>Variants are colored by their most severe predicted consequence:</p>
 <table class="stdTbl">
 <tr><th>Color</th><th>Consequence class</th><th>Examples</th></tr>
 <tr><th style="background-color:#FF0000;width:2em"> </th>
 <td>Protein-truncating / loss-of-function</td>
 <td>stop_gained, frameshift, splice_donor, splice_acceptor, stop_lost, start_lost</td></tr>
 <tr><th style="background-color:#1F77B4;width:2em"> </th>
 <td>Missense / in-frame</td>
 <td>missense, inframe_insertion, inframe_deletion, protein_altering</td></tr>
 <tr><th style="background-color:#008000;width:2em"> </th>
 <td>Synonymous</td>
@@ -77,34 +77,34 @@
 <li><b>Reference/Alternate Length</b> and <b>Length Change</b>: filter by allele length.</li>
 </ul>
 <h2>Methods</h2>
 <p>
 Variant-frequency VCFs from the contributing cohorts were stripped of unneeded INFO fields,
 normalized with <code>bcftools norm</code> (splitting multi-allelic sites), and merged with
 <code>bcftools merge</code>. The merged callset was annotated with predicted protein
 consequences using
 <a href="https://samtools.github.io/bcftools/howtos/csq-calling.html" target="_blank">bcftools csq</a>
 against the
 <a href="https://www.ensembl.org/info/data/ftp/index.html" target="_blank">Ensembl</a>
 GRCh38 release 115 gene models.
 </p>
 <p>
 A custom Python script (<code>vcfToBigBed.py</code>) then read the per-cohort allele
-frequencies and, for each variant, summed/maximized the counts across the population cohorts
-and unaffected/control subgroups to produce this track, and across the affected arms to
-produce the companion
-<a href="hgTrackUi?g=varFreqsAffected">Affected/Case Individuals</a> track. A variant seen
+counts and frequencies and, for each variant, pooled the allele counts and allele numbers
+across the population cohorts and unaffected/control subgroups to produce this track, and
+across the affected arms to produce the companion
+<a href="hgTrackUi?g=varFreqsAffected">Disease cohorts</a> track. A variant seen
 in both groups appears in both tracks. The build is documented in the
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt"
 target="_blank">makeDoc</a>, and the scripts are on
 <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs"
 target="_blank">GitHub</a>.
 </p>
 <h2>Data Access</h2>
 <p>
 Because the merged callset combines cohorts whose redistribution licenses differ, this track is
 <b>not available for download</b> and is not in the Table Browser. It can be reconstructed from
 the individual source VCFs using the
 <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs"
 target="_blank">conversion scripts</a> and the
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt"