src/hg/makeDb/trackDb/human/varFreqs.html ad032e072d0427c066c425a88672288ed1b6c133

ad032e072d0427c066c425a88672288ed1b6c133
max
  Wed Jan 21 08:13:31 2026 -0800
fixes after code review and feedback from mark, refs #36978, and refs #36917

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index e2b62660951..39d2482b47a 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -36,31 +36,31 @@
 
     <li>
         <b><a href="https://topmed.nhlbi.nih.gov/" target="_blank">NHLBI TOPMED Freeze 10</a></b>:
         NHLBI TOPMed (Trans-Omics for Precision
         Medicine) program, launched by the U.S. National Heart, Lung, and Blood
         Institute, integrates whole-genome sequencing with molecular, clinical,
         and environmental data from large, well-phenotyped cohorts. Its goal is to
         uncover the biological mechanisms underlying heart, lung, blood, and sleep
         disorders to advance precision medicine and improve population health. Freeze
         10 contains 868,581,653 variants from 150,899 whole genomes. VCFs were
         downloaded from <a href="https://bravo.sph.umich.edu/terms.html"
         target="_blank">BRAVO</a>.
     </li>
 
     <li>
-        <b><a href=""https://sparkforautism.org/ target="_blank">SFARI SPARK</a></b>:
+        <b><a href="https://sparkforautism.org/" target="_blank">SFARI SPARK</a></b>:
         The Simons Foundation Autism Research Initiative (SFARI) recruited
         a large cohort of families with autistic children who provided DNA
         samples and phenotypes.  54,558 families, parents and their children
         were sequenced, a total of 142,357 individuals with whole-exome (WES)
         and 12,519 with whole-genome sequencing (WGS).  The data contains
         32,559 trios and 8,895 quads (one sibling without autism), and 824
         twins. The same frequencies shown here
         are also available publicly on the <a href="https://genomes.sfari.org/" target=_blank>SFARI Genome Browser</a>. 
        See (SPARK et al, Neuron 2018) for details or the methods below on this page.
     </li>
 
     <li>
         <b><a href="https://www.genomeasia100k.org/"
         target="_blank">GenomeAsia Pilot (GAsP)</a></b>:
         Whole-genome sequencing data of 1,739 individuals from 219 population groups across Asia.
@@ -168,31 +168,31 @@
 When zoomed in, tracks display alleles with base-specific coloring. Homozygote
 data are shown as one letter, while heterozygotes will be displayed with both
 letters.
 </p>
 
 <p>
 For <b>NCBI ALFA:</b> This track has no single VCF with INFO fields, but uses multiple subtracks
 instead, one per ancestry.
 </p>
 
 
 <h2>Data Access</h2>
 <p>Most of the data in these tracks are not available for download from UCSC.
 Data can be browsed on our website.
 But the data can be downloaded for free from the original projects. Accessing the 
-data usually requires a click-through license or access request on the respectice websites, links are either
+data usually requires a click-through license or access request on the respective websites; links are either
 provided above in the project description or with more details here:
 </p>
 
 <p>
 <b>MXB:</b> Allele frequencies by geographical state and ancestry are available via
 the <a target="_blank" href="https://morenolab.shinyapps.io/mexvar/">MexVar platform</a>.
 Raw genotype data are available under controlled access at the
 EGA (Study: EGAS00001005797; Dataset: EGAD00010002361). For the VCFs, email
 andres.moreno@cinvestav.mx.
 </p>
 <p>
 <b>MCPS:</b> VCFs with summarized allele frequencies are available from
 the <a target="_blank" href="https://rgc-mcps.regeneron.com/">MCPS website</a>.
 </p>
 <p>
@@ -251,31 +251,32 @@
 merged with bcftools and lifted to hg38 with CrossMap. 
 </p>
 <p>
 <b>KOVA:</b>Raw reads were aligned to the GRCh38+decoy reference using BWA-MEM v0.7.17 with default parameters, followed by duplicate marking and coordinate sorting with MarkDuplicatesSpark, and base quality score recalibration using BQSRPipelineSpark in GATK v4.1.3.0; mapping quality control metrics were generated with Qualimap v2.2.1. Single-nucleotide variants and small insertions/deletions were called per sample using GATK HaplotypeCaller in GVCF mode (-ERC GVCF), and joint genotyping was performed by creating a GenomicsDB with GenomicsDBImport and following GATK Best Practices, including variant quality score recalibration (VQSR) retaining 99.7% of true SNVs and 99.0% of true indels based on training sets (workflow detailed in Supplementary Fig. 1). Downstream analyses followed a modified version of the gnomAD quality-control framework and were primarily conducted using Hail, an open-source Python library for large-scale genome analysis; after merging WES and WGS data in Hail, multiallelic variants and variants with genotype quality <20, read depth <10, allelic balance <0.2, or overlapping low-complexity regions were excluded (Supplementary Fig. 2).
 <br>
 At UCSC, V7 of the TSV.gz was obtained from the KOVA staff by email and converted to VCF. It is not
 available for download from our site but can be requested from the KOVA website.
 </p>
 
 <p>
 <b>ABraOM:</b> For Academic use only. Licensing for commercial use might be available under request and agreement.
 By using this resource you agree to cite the flagship paper (Naslavsky et al. Nat Comm 2022).
 Whole-genome sequencing was performed at Human Longevity Inc. using TruSeq Nano DNA HT libraries sequenced on Illumina HiSeqX instruments with 150 bp paired-end reads targeting 30× coverage, and reads were mapped to GRCh38 using ISIS software. Sample sex was validated by comparing CPMs of X chromosome and male-specific Y (MSY) reads relative to autosomes, yielding the expected female (~55,000 X CPM, <200 MSY CPM) and male (~27,500 X CPM, >550 MSY CPM) patterns. Germline SNVs and indels were called following GATK Best Practices (GATK v3.7) via per-sample GVCFs (HaplotypeCaller), joint genotyping (CombineGVCFs, GenotypeGVCFs), and Variant Quality Score Recalibration (VQSR-AS); multiallelic variants were split with an in-house script, left-aligned with BCFtools, and annotated using Annovar and custom scripts against dbSNP, 1000 Genomes, and gnomAD, with putative loss-of-function variants identified using LOFTEE v0.3-beta irrespective of confidence labels. Variant and genotype quality was further assessed using the in-house CEGH-Filter two-step algorithm based on depth and allele balance, and analyses retained only GATK VQSR-AS PASS variants and higher-confidence CEGH-Filter calls. Relatedness was assessed using KING and PC-Relate (GENESIS), retaining a single proband per related pair and excluding one contaminated sample (>3% by verifyBAMID), resulting in a final dataset of 1,171 unrelated individuals. Final samples achieved mean coverages ranging from 31.3× to 64.8×, with an average of 38.65× and a median of 36.6×.
 </p>
 
-<p><b>SFARI SPARK:</b> The project as approved by the Simons Foundation as 14584.1. WES and WGS Data were downloaded from 
+<p><b>SFARI SPARK:</b> The genome browser track project was approved by the Simons 
+Foundation as 14584.1. WES and WGS data were downloaded from
 <a href="https://base.sfari.org/" target="_blank">SFARI Base</a>.
 pVCFs were downloaded, anonymized with a script using bcftools and the fill-tags plugin and normalized,
 without a minimum allele frequency cutoff.<br>
 The methods are documented as follows by SFARI:<br>
 <b>WES</b>:
 This release consists of sequence and variant call data for 12,519
 unique individuals, of which 12,517 (99.98%) have available genome-wide
 SNP genotype data. Sequencing and genotyping of all samples in this
 release was performed at New York Genome Center (NYGC). DNA from saliva
 samples were extracted and prepared with PCR-free methods and sequenced
 with paired-end sequencing of 150 bases on the Illumina NovaSeq 6000
 system.  Alignment of reads to the human reference genome version
 GRCh38, duplicate read marking, and Base Quality Score Recalibration
 (BQSR) were performed by New York Genome Center (NYCG). Whole-genome
 sequencing data were processed using a standardized, functionally