src/hg/makeDb/trackDb/human/varFreqs.html bafa5b1fa3546c23f9091e08304355490ded9ead

bafa5b1fa3546c23f9091e08304355490ded9ead
lrnassar
  Tue May 26 11:34:39 2026 -0700
Address CR feedback on v498 description pages. refs #37533

varFreqs.html: fix contact-us sentence punctuation/grammar, replace
"ships" with "provides"/"includes" in two spots, and rewrite the
"trackUI labels" sentence to refer to the track configuration page and
the combined-track bigBed (per Gerardo's question about what trackUI
labels meant).
clinPred.html: replace em-dashes around "those that do not change the
encoded amino acid" with commas.
predictionScoresSuper.html: reorder references alphabetically by first
author.

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index d6f2839a8dd..fa9d6dbb231 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -9,31 +9,31 @@
 <a href="hgTrackUi?g=varFreqsAll">combined track</a> merges all databases into one summary track,
 with filters, summed population frequencies and recalculated protein-effect annotations.
 There is also one subtrack per project with the original VCF data and all the annotations that the project provides.
 The different projects use different pipelines and sequencing technologies. Click any of the projects
 above or below for a summary of their sample selection, sequencing assay and software pipeline.
 Many projects do not allow us to distribute the data, but we document how to request it
 and provide all converters.</p>
 
 <p>
 Data from projects that provide haplotype-phased genotypes can also be found
 elsewhere: 1000 Genomes is also a separate track, and the phased genotypes HGDP, SGDP,
 HGDP+1000 Genomes and Mexico Biobank can also be found in the &quot;Phased Variants&quot; track.
 Their VCF versions below show only the isolate frequency per variant.
 </p>
 
-<p>Please contact us (<A HREF="mailto:&#103;en&#111;&#109;&#101;&#64;&#115;&#111;&#101;.&#117;&#99;s&#99;.&#101;&#100;u">&#103;en&#111;&#109;&#101;&#64;&#115;&#111;&#101;.&#117;&#99;s&#99;.&#101;&#100;u</A><!-- above address is genome at soe.ucsc.edu -->), if you know a project that we should add. So far,
+<p>Please contact us (<A HREF="mailto:&#103;en&#111;&#109;&#101;&#64;&#115;&#111;&#101;.&#117;&#99;s&#99;.&#101;&#100;u">&#103;en&#111;&#109;&#101;&#64;&#115;&#111;&#101;.&#117;&#99;s&#99;.&#101;&#100;u</A><!-- above address is genome at soe.ucsc.edu -->) if you know of a project that we should add. So far, we have requested access to
 Regeneron&apos;s Million Exomes and Mexico City Studies (request rejected) and Taiwan Biobank (pending).
 </p>
 
 <h2>Combined Track (All Databases)</h2>
 <p>
 The &quot;All Databases Combined&quot; track merges variants from all individual databases into a single
 bigBed file with consequence annotations, totaling 1.17 billion variants from ~1.7 million individuals.
 The track supports filtering by variant type
 (SNV, insertion, deletion, MNV), predicted consequence (missense, synonymous, stop gained,
 frameshift, splice, intron, intergenic), source database, allele frequency (overall maximum
 and per-database), and allele count (total or per-database). The track is useful in dense mode
 to get a quick overview of variant density across all projects, or with filters to find
 variants present in specific databases or within certain frequency ranges. With the &quot;clone track&quot;
 feature you can clone this track and keep multiple versions, each with different filters activated.
 The &quot;Density mode&quot; checkbox on the track configuration page shows a plot of the
@@ -328,31 +328,31 @@
 <tr>
   <td><a href="hgTrackUi?g=tishkoff180">Indigenous Africans 180</a></td>
   <td>Africa (Ethiopia, Tanzania, Cameroon, Botswana)</td>
   <td>180</td>
   <td>WGS (&gt;30x)</td>
   <td>12 indigenous populations across all four African language phyla (Khoesan, Niger-Congo, Nilo-Saharan, Afroasiatic)</td>
   <td>&mdash;</td>
   <td>No</td>
 </tr>
 </table>
 
 <h2>Notes on Specific Sub-tracks</h2>
 
 <h3>AllOfUs &mdash; local-ancestry-stratified frequencies</h3>
 <p>
-The AllOfUs subtrack ships <b>local-ancestry-stratified</b> allele frequencies, not the
+The AllOfUs subtrack provides <b>local-ancestry-stratified</b> allele frequencies, not the
 global ancestry categories used in the All of Us Research Program 2024 Nature paper
 (see References). Each variant's per-ancestry AF/AC counts only the haplotypes whose
 inferred local ancestry at that exact genomic position belongs to the named group
 (strict-both-haps mode). The six ancestry classes
 (African, Indigenous American, East Asian, European, Oceanian, South Asian) match HGDP-derived
 local-ancestry reference panels and so include Oceanian, which is not one of the
 paper's six global Rye categories (those are AFR, AMR, EAS, EUR, Middle Eastern, SAS).
 For an admixed individual, the local-ancestry AF at a position can therefore differ
 substantially from the AF among self-reported members of the same ancestry group.
 The Ioannidis lab (Phoenix, UCSC) developed the pipeline that produced this VCF
 and applied it to the AllOfUs v7 release; only variants with cohort allele count &ge; 20
 were retained.
 </p>
 
 <h3>gnomAD HGDP+1kG &mdash; cohort vs full-release frequencies</h3>
@@ -360,47 +360,48 @@
 This subtrack derives from the gnomAD v3.1.2 release, which embeds the
 4,094-genome jointly-called HGDP+1kG cohort (Koenig et al. 2024) inside the larger
 gnomAD aggregation. To save space, we kept only INFO fields useful for clinical and
 population-genetic interpretation. Two allele-frequency
 sets are exposed:
 </p>
 <ul>
   <li>The <b>cohort-level</b> AC/AF/AN fields (no prefix) are computed across the
       ~3,400 unrelated HGDP+1kG individuals (allele number &asymp; 6,800).</li>
   <li>The <b>per-population</b> filter fields (gnomAD v3.1.2 African AF, gnomAD v3.1.2
       Latino AF, etc.) are values from the <b>full gnomAD v3.1.2 release</b>
       (~76,000 genomes), not just the 4,094-genome HGDP+1kG cohort.
       The corresponding allele numbers are typically tens of thousands per population.</li>
 </ul>
 <p>
-The trackUI labels and bigBed field descriptions reflect this distinction. Per-population
+The filter labels on the track configuration page, and the field descriptions in the
+combined-track bigBed, reflect this distinction. Per-population
 HGDP+1kG-cohort frequencies are not exposed because the cohort is too small for
 stable per-population estimates in many populations.
 </p>
 
 <h2>Display Conventions</h2>
 
 <p>Most tracks only show the variant and allele frequencies on mouseover or clicks.
 When zoomed in, tracks display alleles with base-specific coloring. Homozygote
 data are shown as one letter; heterozygotes are shown with both
 letters. All VCF files are normalized, with one allele per annotation (no multi-allele
 lines).
 </p>
 
 <h2>Methods</h2>
 <p>
-Each subtrack ships the upstream project's VCF largely as-released; per-subtrack pipelines
+Each subtrack includes the upstream project's VCF largely as-released; per-subtrack pipelines
 (coordinate liftover, format conversion, header normalization) are documented on each
 subtrack's own description page and recorded in the
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">build documentation</a>.
 The conversion scripts (<em>e.g.</em> <code>finngen_to_vcf.py</code>, <code>kovaToVcf.py</code>,
 <code>schema_addAcAnAf.py</code>, <code>svatalogFreqToVcf.py</code>) live alongside the makedoc
 in the <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs" target="_blank">scripts directory</a>.
 </p>
 <p>
 The combined &quot;All Databases&quot; subtrack is built by a separate pipeline:
 each per-subtrack VCF is normalized (<code>bcftools norm</code>), all sites are merged into a single
 multi-sample callset, consequence annotations are recomputed against Ensembl with <code>bcftools csq</code>,
 and the result is converted to bigBed via <code>vcfToBigBed.py</code> + <code>bedToBigBed</code>.
 The mapping from upstream INFO fields to bigBed columns is driven by two configuration files in the
 scripts directory: <code>databases.tsv</code> (one row per source dataset) and
 <code>populations.tsv</code> (per-population AC/AF columns within each source).