src/hg/makeDb/trackDb/human/varFreqsAll.html ec5c73f4dc3ef4beae16fa1c12b7e5bf872bb73d

ec5c73f4dc3ef4beae16fa1c12b7e5bf872bb73d
lrnassar
  Tue May 5 15:04:39 2026 -0700
varFreqs: fix gaspIndel bigDataUrl after Max's GenomeAsia hg38 lift; add Tishkoff180 to combined-track filter UI; sync databases.tsv with deployed bigBed; minor description-page corrections. refs #36642

GenomeAsia hg38 lift (May 5 2026, by Max):
- gaspIndel.bigDataUrl still pointed at the old GRCh37 filename "All.indels.annot.cont_withmaf.vcf.gz", which was renamed to "ga100k.indels.vcf.gz" during the lift. This left the gaspIndel track broken on the sandbox until the trackDb stanza was updated to match.
- gasp/gaspIndel dataVersion strings updated from "Pilot 2019 (GRCh37 - to be lifted)" to "Pilot 2019 (lifted to hg38, May 2026)".
- databases.tsv: also updated GenomeAsiaIndel path to ga100k.indels.vcf.gz so the next varFreqsAll rebuild reads from the lifted file.

Tishkoff180 in varFreqsAll.bb but unfilterable (fresh-eyes audit finding):
- Added Tishkoff180 to filterValues.sources and added filterByRange.Tishkoff180AF / Tishkoff180AC entries.
- Added Tishkoff180 (and SVatalog) rows to databases.tsv to match the deployed bigBed (which already has those columns).

Description-page corrections:
- varFreqsAll.html: "20 population databases" -> "25 source databases" (matches actual count); HGDP+1kG bullet "European" -> "Non-Finnish European" to disambiguate from Finnish (gnomAD's nfe).
- varFreqs.html: GenomeAsia row in the Available Datasets table updated from 3 to 7 sub-populations (NEA/SEA/SAS plus the previously hidden OCE/AMR/AFR/WER) so the table matches what the data exposes once Max's rebuild populates the new filter columns.
- KOVA longLabel: "1.9k WGS+3.5k WES" -> "1.9k WGS+3.4k WES" (3.4k is correct per Lee 2017 and kova.html).

diff --git src/hg/makeDb/trackDb/human/varFreqsAll.html src/hg/makeDb/trackDb/human/varFreqsAll.html
index 75b48c59f13..8ebfc217aaa 100644
--- src/hg/makeDb/trackDb/human/varFreqsAll.html
+++ src/hg/makeDb/trackDb/human/varFreqsAll.html
@@ -1,20 +1,20 @@
 <h2>Description</h2>
 <p>
 This track merges variants from all individual variant frequency databases into a single
 bigBed file with predicted protein consequences and cross-database filtering. It contains
-over 1.1 billion variants from 20 population databases worldwide. For a summary of
+over 1.1 billion variants from 25 source databases worldwide. For a summary of
 all available databases, see the
 <a href="hgTrackUi?g=varFreqs">Variant Frequencies</a> supertrack page.
 </p>
 
 <p>
 Each variant is annotated with its predicted consequence on protein-coding genes
 (using <a href="https://samtools.github.io/bcftools/howtos/csq-calling.html"
 target="_blank">bcftools csq</a> with
 <a href="https://www.ensembl.org/info/data/ftp/index.html" target="_blank">Ensembl</a>
 gene models), and colored by severity.
 Allele counts and frequencies are shown for each source database and, where available,
 broken down by ancestry or population group.
 </p>
 
 <h2>Display Conventions</h2>
@@ -87,44 +87,44 @@
 The <b>Source Database</b> filter lets you restrict to variants present in specific databases.
 For example, select only &quot;GREGoR&quot; to see variants found in the rare disease cohort.
 This filter uses OR logic: selecting multiple databases shows variants found in
 <em>any</em> of the selected databases.
 </p>
 
 <h3>Population-Specific Filters</h3>
 <p>
 Several databases provide ancestry-specific allele frequencies:
 </p>
 <ul>
   <li><b>AllOfUs</b>: African, Indigenous American, East Asian, European, Oceanian, South Asian
       (from local ancestry inference)</li>
   <li><b>GenomeAsia</b>: Northeast Asian, Southeast Asian, South Asian</li>
   <li><b>gnomAD HGDP+1kG</b>: African, Amish, Latino, Ashkenazi Jewish, East Asian, Finnish,
-      Middle Eastern, European, Other, South Asian</li>
+      Middle Eastern, Non-Finnish European, Other, South Asian</li>
   <li><b>GREGoR</b>: Affected, Unaffected, Unknown (disease status, not ancestry)</li>
 </ul>
 
 <h3>Length Filters</h3>
 <ul>
   <li><b>Reference/Alternate Length</b>: Filter by the length of the reference or alternate allele.</li>
   <li><b>Length Change</b>: Filter by the size difference between alternate and reference
       (positive = insertion, negative = deletion, zero = SNV or MNV).</li>
 </ul>
 
 <h2>Methods</h2>
 <p>
-Variant frequency VCF files from 20 databases were stripped of their INFO fields
+Variant frequency VCF files from 25 databases were stripped of their INFO fields
 (to reduce size), normalized with <code>bcftools norm</code> (splitting multi-allelic sites),
 and merged with <code>bcftools merge</code>. The merged VCF was then annotated with predicted
 protein consequences using <code>bcftools csq</code> with the
 <a href="https://www.ensembl.org/info/data/ftp/index.html" target="_blank">Ensembl</a>
 GRCh38 release 115 gene annotation (GFF3).
 </p>
 
 <p>
 The annotated VCF was converted to bigBed format using a custom Python script
 (<code>vcfToBigBed.py</code>) that reads frequency data from each source VCF in parallel,
 matches variants by position/ref/alt, and writes a BED file with consequence coloring,
 per-database allele counts and frequencies, and population breakdowns.
 The database configuration (which VCFs to include, field mappings, and population definitions)
 is stored in two TSV files
 (<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/varFreqs/databases.tsv"