38bafc856320cf5360e0482faeee72b78f2ea963
lrnassar
  Tue May 5 14:13:30 2026 -0700
QA pass on varFreqs per-subtrack description pages: encode 3 plain emails, add target=_blank to 15 boilerplate REST API links, and add missing References sections (and Data Access on varFreqsAll). refs #36642

Mechanical fixes across 18 per-subtrack description pages:
- Encoded 3 plain author/contact emails: pfeliciano@simonsfoundation.org (sfariSparkExomes), m.hobbs@garvan.org.au (mgrb), contact_npco@a-star.edu.sg (npm).
- Added target="_blank" to 15 occurrences of the boilerplate <a href="https://api.genome.ucsc.edu">REST API</a> link across allofus, topmed, sfariSparkExomes, tommo60kjpn, alfaVcf, gasp, abraom, indigenomes, hrc, saudi, schema, sgdpFreq, gregor, hgdp1kFreq, colorsDbSnv.

Added missing References sections:
- allofus.html: All of Us Research Program 2024 Nature.
- topmed.html: Taliun 2021 Nature.
- alfaVcf.html: NCBI ALFA documentation citation (no peer-reviewed paper yet).
- gregor.html: GREGoR R04 Methods document + consortium website (no flagship publication yet).
- varFreqsAll.html: pointer to the supertrack's References section, plus tool citations (bcftools csq, Ensembl VEP).

Added missing Data Access section on varFreqsAll.html explaining that the merged callset is not downloadable due to mixed source-data licensing, but can be reconstructed from the per-subtrack VCFs using the conversion scripts on GitHub.

All 25 unique varFreqs description pages now have Description, Methods, Data Access, and References sections. The set contains no non-ASCII characters and no inline event handlers.

diff --git src/hg/makeDb/trackDb/human/alfaVcf.html src/hg/makeDb/trackDb/human/alfaVcf.html
index ee9d5f9ce22..e24d60d4069 100644
--- src/hg/makeDb/trackDb/human/alfaVcf.html
+++ src/hg/makeDb/trackDb/human/alfaVcf.html
@@ -1,44 +1,53 @@
 <h2>Description</h2>
 <p>
 The <a href="https://www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/" target="_blank">NCBI ALlele Frequency
 Aggregator (ALFA)</a> pipeline computes allele frequencies from approved, unrestricted dbGaP studies
 and makes them publicly available through dbSNP. Its goal is to release frequency data from over
 one million dbGaP subjects to aid discoveries involving common and rare variants with biological
 or disease relevance. The R4 release includes 408,709 subjects and allele frequencies for
 15.5 million rs sites, including nearly one million ClinVar variants.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>.
-For programmatic access, our <a href="https://api.genome.ucsc.edu">REST API</a> can be used; the
+For programmatic access, our <a href="https://api.genome.ucsc.edu" target="_blank">REST API</a> can be used; the
 track name is <em>alfaVcf</em>.
 For bulk download, the VCF file can be obtained from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/" target="_blank">our download server</a>.
 </p>
 <p>
 We converted the NCBI track hub to VCF format; the data is freely available.
 Genotype and associated individual-level data are accessible through the dbGaP
 <a href="https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login"
 target="_blank">authorized access request</a> system.
 </p>
 
 <h2>Methods</h2>
 <p>
 The ALFA pipeline processes genotype data from approved, unrestricted dbGaP studies, including
 chip array, exome, and genomic sequencing data. Selected study data undergoes quality assurance
 and transformation to standard VCF format. Variants are converted to SPDI notation and normalized
 using VOCA, then aggregated, remapped, and clustered to existing dbSNP rs identifiers or assigned
 new ones. Sample ancestries are validated using GRAF-pop and assigned to 12 major populations.
 QC exclusions include variants and subjects with call rate &lt;95%, datasets failing Ancestry
 Informative Markers consistency checks, and array datasets with conflicting or flipped allele
 orientation.
 </p>
 <p>
 The ALFA R4 bigBed files (904M variants) were converted to VCF using a custom script, retaining
 the 163M variants with non-zero allele frequency (146M SNPs, 17M indels).
 We provide documentation that indicates how all source files of the varFreqs track were converted in the <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">makeDoc file</a> of the track.
 For some tracks, python scripts were necessary and are also available from <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/varFreqs" target="_blank">GitHub</a>.
 </p>
+
+<h2>References</h2>
+<p>
+NCBI ALFA does not yet have a peer-reviewed primary publication. Cite the project as:
+Phan L, Jin Y, Zhang H, Qiang W, Shekhtman E, Shao D <em>et al</em>.
+<a href="https://www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/" target="_blank">
+ALFA: Allele Frequency Aggregator</a>.
+National Center for Biotechnology Information, U.S. National Library of Medicine, 10 March 2020.
+</p>