af9a5b388259e680dd34bc47b2cad4ff6e3d162f lrnassar Sat Jun 13 03:00:51 2026 -0700
varFreqs: pre-release polish from comprehensive sanity check.

* Sync the new combined-track shortLabels into the four description pages:
  "Affected/Case Individuals" -> "Disease cohorts" and
  "Population + Unaffected" -> "Population reference" (matches the trackdb
  shortLabels users now see).
* Add a paragraph in the supertrack Methods section describing the pooled
  affectedAF / backgroundAF formulation (sum AC / sum AN) and the default_an
  configuration that handles AF-only cohorts.
* Update the in-track Methods paragraphs on varFreqsAffected.html and
  varFreqsBackground.html: replace "summed/maximized" with "pooled".
* Fix supertrack table downloadability column to match the underscore-prefix
  convention: allofus "Yes" -> "No" (description page already says license
  restricted); gregor "No" -> "Yes" (description page says VCF is on our
  download server, and the gbdb path is not underscore-prefixed).
* Add a 2026-06-12 makedoc section documenting the pooled-AF rebuild, the
  default_an mechanism, the new affectedAN/backgroundAN columns, the
  before/after spot-check at APOE rs429358, and the build commands.

refs #36642

diff --git src/hg/makeDb/trackDb/human/varFreqsArray.html src/hg/makeDb/trackDb/human/varFreqsArray.html
index da0c5f25410..b92b7fb5a94 100644
--- src/hg/makeDb/trackDb/human/varFreqsArray.html
+++ src/hg/makeDb/trackDb/human/varFreqsArray.html
@@ -1,28 +1,28 @@
 <h2>Description</h2>
 <p>
 This track merges variants from three genotyping-array cohorts into a single
 bigBed file with predicted protein consequences and cross-database filtering.
 It contains 14.7 million variants from the Taiwan Precision Medicine Initiative
 (TPMI Axiom TPM1 chip, ~1 million Han Chinese), the Mexico Biobank (MexBB, 6,011
 individuals), and UK Biobank (361k unrelated white British, imputed from the
 Neale Lab Round 2 release).
 </p>
 <p>
 The array track is kept separate from the sequencing-based combined tracks
-(<a href="hgTrackUi?g=varFreqsAffected">Affected/Case Individuals</a> and
-<a href="hgTrackUi?g=varFreqsBackground">Population + Unaffected</a>) so that
+(<a href="hgTrackUi?g=varFreqsAffected">Disease cohorts</a> and
+<a href="hgTrackUi?g=varFreqsBackground">Population reference</a>) so that
 sequencing-based and array-based frequencies can be inspected independently.
 For a summary of all available variant frequency databases, see the
 <a href="hgTrackUi?g=varFreqs">SNV Frequencies</a> supertrack page.
 </p>
 <h2>Display Conventions</h2>
 <h3>Color by Consequence</h3>
 <p>Variants are colored by their most severe predicted consequence:</p>
 <table class="stdTbl">
 <tr><th>Color</th><th>Consequence class</th><th>Examples</th></tr>
 <tr><th style="background-color:#FF0000;width:2em"> </th>
     <td>Protein-truncating / loss-of-function</td>
     <td>stop_gained, frameshift, splice_donor, splice_acceptor, stop_lost, start_lost</td></tr>
 <tr><th style="background-color:#1F77B4;width:2em"> </th>
@@ -47,31 +47,31 @@
 <h2>Caveats</h2>
 <p>
 Allele frequencies from genotyping arrays are not directly comparable to
 those from whole-genome or whole-exome sequencing. Two limitations to keep in mind:
 </p>
 <ul>
 <li><b>Probe coverage is sparse and curated.</b> Array variants are only those
     the manufacturer designed probes for. Absence from this track does
     <em>not</em> mean a variant is absent in that population, only that it was
     not on the chip.</li>
 <li><b>Per-variant call confidence varies and is sometimes unreported.</b>
     TPMI publishes a per-probe <code>NGS_concordance</code> value
     (chip-vs-sequencing concordance from its own validation) in the source VCF;
     high-AF claims with low concordance are common. MexBB ships only AN/AF/AC
     with no FILTER column and no per-site QC at all.
     For both arrays, high-AF rare-disease candidates should be cross-checked
     against the sequencing-based
-    <a href="hgTrackUi?g=varFreqsBackground">Population + Unaffected</a> track before
+    <a href="hgTrackUi?g=varFreqsBackground">Population reference</a> track before
     drawing conclusions.</li>
 </ul>
 <h2>Filters</h2>
 <p>
 This track supports filtering via the track settings page. Click the track
 title or use the "Configure" button to access filters.
 </p>
 <h3>Variant Type and Consequence</h3>
 <ul>
 <li><b>Variant Type</b>: SNV, Insertion, Deletion, or MNV.</li>
 <li><b>Consequence</b>: Missense, Synonymous, Stop Gained, Frameshift,
     Splice Donor, Splice Acceptor, Intron, 3' UTR, 5' UTR, Non-coding,
     Intergenic, or Other. The filter uses OR logic across the comma-separated
     consequence tokens on each variant. See the
@@ -93,32 +93,32 @@
 <p>
 The <b>Source Database</b> filter restricts the display to variants present in
 specific databases. It uses OR logic.
 </p>
 <h3>Length Filters</h3>
 <ul>
 <li><b>Reference/Alternate Length</b>: Filter by the length of the reference
     or alternate allele.</li>
 <li><b>Length Change</b>: Filter by the size difference between alternate and
     reference (positive = insertion, negative = deletion, zero = SNV or MNV).</li>
 </ul>
 <h2>Methods</h2>
 <p>
 The same merge-and-annotate pipeline used for the sequencing-based combined tracks
-(<a href="hgTrackUi?g=varFreqsAffected">Affected/Case Individuals</a> and
-<a href="hgTrackUi?g=varFreqsBackground">Population + Unaffected</a>) was run on the
+(<a href="hgTrackUi?g=varFreqsAffected">Disease cohorts</a> and
+<a href="hgTrackUi?g=varFreqsBackground">Population reference</a>) was run on the
 array-cohort subset of source VCFs. Each VCF was stripped of its INFO fields,
 normalized with <code>bcftools norm</code> (splitting multi-allelic sites),
 and merged with <code>bcftools merge</code>.
 The merged VCF was then annotated with predicted protein consequences using
 <code>bcftools csq</code> with the
 <a href="https://www.ensembl.org/info/data/ftp/index.html" target="_blank">Ensembl</a>
 GRCh38 release 115 gene annotation (GFF3).
 </p>
 <p>
 The track's
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">makeDoc file</a>
 documents how each source VCF was converted. Scripts are available from
 <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs" target="_blank">Github</a>.
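
Note on the pooled formulation named in the commit message (sum AC / sum AN, with default_an for AF-only cohorts): the sketch below is one possible reading of it, not the code in the varFreqs scripts. The names CohortRecord and pooled_af are hypothetical, and so is the assumption that an AF-only cohort gets AN = default_an and AC = AF * default_an.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class CohortRecord:
    """One cohort's counts for a single variant (hypothetical structure)."""
    name: str
    ac: Optional[float] = None   # alternate allele count, if reported
    an: Optional[int] = None     # total allele number, if reported
    af: Optional[float] = None   # allele frequency, if that is all we have


def pooled_af(records: Iterable[CohortRecord],
              default_an: Dict[str, int]) -> Tuple[Optional[float], int]:
    """Return (pooled AF, pooled AN) as sum(AC) / sum(AN) across cohorts.

    ASSUMPTION: for a cohort that reports only AF, AN is taken from
    default_an[name] and AC is back-computed as AF * AN. Cohorts with
    neither usable counts nor a configured default are skipped.
    """
    total_ac = 0.0
    total_an = 0
    for rec in records:
        if rec.ac is not None and rec.an:
            ac, an = rec.ac, rec.an
        elif rec.af is not None:
            an = rec.an or default_an.get(rec.name, 0)
            if not an:
                continue  # no way to weight this cohort
            ac = rec.af * an
        else:
            continue
        total_ac += ac
        total_an += an
    af = total_ac / total_an if total_an else None
    return af, total_an


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    example = [
        CohortRecord("cohortA", ac=10, an=1000),
        CohortRecord("cohortB", af=0.02),  # AF-only cohort
    ]
    print(pooled_af(example, default_an={"cohortB": 500}))
```

Pooling weights each cohort by its allele number, so a small cohort with a high AF moves the combined value less than it would under a per-cohort maximum.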