af9a5b388259e680dd34bc47b2cad4ff6e3d162f
lrnassar
  Sat Jun 13 03:00:51 2026 -0700
varFreqs: pre-release polish from comprehensive sanity check.

* Sync the new combined-track shortLabels into the four description pages:
"Affected/Case Individuals" -> "Disease cohorts" and "Population + Unaffected"
-> "Population reference" (matches the trackdb shortLabels users now see).
* Add a paragraph in the supertrack Methods section describing the pooled
affectedAF / backgroundAF formulation (sum AC / sum AN) and the default_an
configuration that handles AF-only cohorts.
* Update the in-track Methods paragraphs on varFreqsAffected.html and
varFreqsBackground.html: replace "summed/maximized" with "pooled".
* Fix the supertrack table's downloadability column to match the
underscore-prefix convention: allofus "Yes" -> "No" (the description page
already says it is license-restricted); gregor "No" -> "Yes" (the description
page says the VCF is on our download server, and the gbdb path is not
underscore-prefixed).
* Add a 2026-06-12 makedoc section documenting the pooled-AF rebuild, the
default_an mechanism, the new affectedAN/backgroundAN columns, the
before/after spot-check at APOE rs429358, and the build commands.
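
For reviewers, a minimal sketch of the pooled formulation described above
(sum AC / sum AN). This is not the code in vcfToBigBed.py: the function name,
the dict keys, and the way default_an is turned into an AC/AN pair are
assumptions made for illustration only.

```python
def pooled_af(cohorts):
    """Pool allele counts over cohorts: returns (sum_ac, sum_an, sum_ac / sum_an).

    Each cohort is a dict with optional keys "ac", "an", "af", "default_an".
    All key names here are hypothetical, not the real script's fields.
    """
    sum_ac = 0
    sum_an = 0
    for cohort in cohorts:
        ac = cohort.get("ac")
        an = cohort.get("an")
        af = cohort.get("af")
        default_an = cohort.get("default_an")

        if ac is not None and an is not None:
            # Cohort publishes both counts: use them as-is.
            pass
        elif af is not None and default_an is not None:
            # AF-only cohort: assume default_an stands in for AN and AC is
            # back-computed from AF (assumption, not confirmed by the source).
            an = default_an
            ac = round(af * an)
        elif ac is not None and default_an is not None:
            # AC-only cohort with a configured default_an (assumed behavior).
            an = default_an
        else:
            # AC-only cohort with no default_an: adds nothing to the
            # numerator or the denominator.
            continue

        sum_ac += ac
        sum_an += an

    pooled = sum_ac / sum_an if sum_an else None
    return sum_ac, sum_an, pooled


example = [
    {"ac": 10, "an": 1000},          # publishes AC and AN
    {"af": 0.5, "default_an": 100},  # AF-only, small cohort, high local AF
    {"ac": 3},                       # AC-only, no default_an: skipped
]
total_ac, total_an, af = pooled_af(example)
```

In this made-up example a max-across-cohorts statistic would report 0.5,
while the pooled rate is 60 / 1100, so the small cohort does not dominate.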

refs #36642

diff --git src/hg/makeDb/trackDb/human/varFreqsAffected.html src/hg/makeDb/trackDb/human/varFreqsAffected.html
index 36af10b8e43..cd7631d1313 100644
--- src/hg/makeDb/trackDb/human/varFreqsAffected.html
+++ src/hg/makeDb/trackDb/human/varFreqsAffected.html
@@ -1,22 +1,22 @@
 <h2>Description</h2>
 <p>
 This track shows small variants (SNVs and short indels) that were observed in
 <b>affected or case individuals</b> of disease-study cohorts, annotated with their
 predicted protein consequence and colored by severity. It is one half of a matched pair:
 the companion
-<a href="hgTrackUi?g=varFreqsBackground">Population + Unaffected</a> track shows the same
+<a href="hgTrackUi?g=varFreqsBackground">Population reference</a> track shows the same
 kind of variants seen in population reference cohorts and in unaffected relatives or
 controls. Displaying the two together lets you compare, for example, how often a
 loss-of-function variant in a gene of interest is seen in affected individuals versus the
 general/unaffected background. For the full list of contributing projects, see the
 <a href="hgTrackUi?g=varFreqs">SNV Frequencies</a> collection page.
 </p>
 <p>
 The affected counts are drawn from the affected or case arm of five disease-study cohorts:
 SFARI SPARK WES and SFARI SPARK WGS (autism spectrum disorder probands), SCHEMA
 (schizophrenia cases), GREGoR (affected rare-disease participants), and GA4K (a pediatric
 rare-disease cohort). For SPARK, SFARI WGS, SCHEMA, and GREGoR the source data carries an
 explicit affected/unaffected (or case/control) label and only the affected arm feeds this
 track. GA4K reports a single cohort-wide frequency with no per-individual label; because it
 is a rare-disease cohort it is counted as affected here, with the caveat that it enrolls
 parent-child trios, so a minority of its carriers are unaffected parents. Genotyping-array
@@ -55,66 +55,66 @@
 that publish only AC and have no <code>default_an</code> set (currently GREGoR's per-arm
 AC_AFFECTED/UNAFFECTED/UNKNOWN) are listed in <b>affectedCohorts</b> but do not contribute
 to the pool numerator or denominator; their carriers are visible in the per-database AC
 column instead. The pooled rate is preferred over a max-across-cohorts statistic so a
 small cohort with a high local AF cannot dominate the displayed frequency.
 </p>
 
 <h3>Finding case-enriched loss-of-function variants</h3>
 <p>
 To look for protein-truncating variants that are common in affected individuals but rare
 in the background, set the Consequence filter to Stop Gained, Frameshift, Splice Donor and
 Splice Acceptor (these appear red), then add an upper limit on the
 <b>Background AF</b> filter. Each variant here carries both its affected frequency and its
 background frequency, so this isolates variants seen in cases with little or no presence in
 the population/unaffected set. Comparing visually against the
-<a href="hgTrackUi?g=varFreqsBackground">Population + Unaffected</a> track shows the same
+<a href="hgTrackUi?g=varFreqsBackground">Population reference</a> track shows the same
 contrast across a whole gene.
 </p>
 
 <h2>Filters</h2>
 <ul>
   <li><b>Variant Type</b> and <b>Consequence</b>: restrict to SNV/insertion/deletion/MNV
       and to predicted consequence classes (the Consequence filter uses OR logic over the
       comma-separated tokens on each variant).</li>
   <li><b>Affected/case AF</b>, <b>AC</b>, <b>AN</b>: the pooled allele frequency
       (sum AC / sum AN), summed allele count, and summed allele number across the
       contributing affected arms. See &quot;Pooled allele frequency&quot; above.</li>
   <li><b>Background AF</b>, <b>AC</b>, <b>AN</b>: the same triple computed across the
       population + unaffected background, for filtering case-enriched variants.</li>
   <li><b>Affected/case cohort</b>: restrict to variants seen in specific disease cohorts
       (for example, only the two autism cohorts).</li>
   <li><b>Reference/Alternate Length</b> and <b>Length Change</b>: filter by allele length.</li>
 </ul>
 
 <h2>Methods</h2>
 <p>
 Variant-frequency VCFs from the contributing cohorts were stripped of unneeded INFO fields,
 normalized with <code>bcftools norm</code> (splitting multi-allelic sites), and merged with
 <code>bcftools merge</code>. The merged callset was annotated with predicted protein
 consequences using <a href="https://samtools.github.io/bcftools/howtos/csq-calling.html"
 target="_blank">bcftools csq</a> against the
 <a href="https://www.ensembl.org/info/data/ftp/index.html" target="_blank">Ensembl</a>
 GRCh38 release 115 gene models.
 </p>
 <p>
 A custom Python script (<code>vcfToBigBed.py</code>) then read the per-cohort allele
-frequencies and, for each variant, summed/maximized the counts across the affected arms
-(case/proband subgroups, plus GA4K whole-cohort) to produce this track, and across the
-population cohorts and unaffected/control subgroups to produce the companion
-<a href="hgTrackUi?g=varFreqsBackground">Population + Unaffected</a> track. A variant seen in
-both groups appears in both tracks. The build is documented in the
+counts and frequencies and, for each variant, pooled the allele counts and allele numbers
+across the affected arms (case/proband subgroups, plus GA4K whole-cohort) to produce this
+track, and across the population cohorts and unaffected/control subgroups to produce the
+companion <a href="hgTrackUi?g=varFreqsBackground">Population reference</a> track. A variant
+seen in both groups appears in both tracks. The build is documented in the
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt"
 target="_blank">makeDoc</a>, and the scripts are on
 <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs"
 target="_blank">GitHub</a>.
 </p>
 
 <h2>Data Access</h2>
 <p>
 Because the merged callset combines cohorts whose redistribution licenses differ, this
 track is <b>not available for download</b> and is not in the Table Browser. It can be
 reconstructed from the individual source VCFs using the
 <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs"
 target="_blank">conversion scripts</a> and the
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt"
 target="_blank">build documentation</a>. The per-project subtracks on the