aa61ebc800429515f9ced7e28f669c6042219f43
max
  Wed Mar 18 09:09:13 2026 -0700
varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and
Methods sections for all 20+ subtrack HTML files with consistent formatting,
sequencing methods from source papers, and links to makeDoc and Github scripts.
Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and
update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/swefreq.html src/hg/makeDb/trackDb/human/swefreq.html
new file mode 100644
index 00000000000..637bf7c416d
--- /dev/null
+++ src/hg/makeDb/trackDb/human/swefreq.html
@@ -0,0 +1,68 @@
+<h2>Description</h2>
+<p>
+<a href="https://swefreq.nbis.se/dataset/SweGen" target="_blank">SweGen</a> provides
+whole-genome sequencing variant frequencies for 1,000 Swedish individuals.
+The 1,000 individuals represent a cross-section of the Swedish population and no disease
+information was used for the selection. The frequency data may therefore include genetic variants
+that are associated with, or causative of, disease. SweGen also provides structural variant (SV)
+calls, transposable elements (TEs), MELT results for TEs, HLAs, and a FASTA file of new sequence
+not in hg38. There is also a version for the T2T CHM13 assembly. The full dataset can be browsed at
+the
+<a href="https://swefreq.nbis.se/dataset/SweGen/browser" target="_blank">SweGen Browser</a>.
+</p>
+
+<h2>Data Access</h2>
+<p>
+Due to license restrictions, the data for this track cannot be downloaded from the UCSC
+Genome Browser. The Table Browser, Data Integrator, and download server are not available
+for this track.
+</p>
+<p>
+VCF files can be requested at
+<a href="https://swefreq.nbis.se/dataset/SweGen" target="_blank">SweGen</a> via a form. The request
+needs manual approval, which is usually quick. If there is no reply, email SweGen directly.
+</p>
+
+<h2>Methods</h2>
+<p>
+DNA was fragmented to a target size of 350bp on a Covaris E220. Paired-end sequencing with 150bp read length was performed
+on Illumina HiSeq X (HiSeq Control Software 3.3.39/RTA 2.7.1) with v2.5 sequencing chemistry.
+Raw whole-genome reads were aligned to the GRCh37 reference using BWA-MEM v0.7.12, then sorted and
+indexed with samtools v0.1.19 and assessed with qualimap v2.2.20; per-sample alignments from
+multiple lanes and flow cells were merged using Picard MergeSamFiles v1.120. Processing followed
+GATK best practices with GATK v3.3, including indel realignment (RealignerTargetCreator,
+IndelRealigner), duplicate marking (Picard MarkDuplicates v1.120), and base quality score
+recalibration (BaseRecalibrator), producing one finalized BAM per sample. Per-sample gVCFs were
+generated with GATK HaplotypeCaller v3.3 using reference files from the GATK v2.8 resource bundle,
+with all steps coordinated via Piper v1.4.0. Joint genotyping of 1,000 samples was performed by
+merging gVCFs in five batches of 200 using GATK CombineGVCFs, followed by cohort genotyping with
+GATK GenotypeGVCFs and variant quality score recalibration for SNVs and indels using
+VariantRecalibrator and ApplyRecalibration.
+</p>
+<p>
+At UCSC, the hg38 VCF was downloaded from
+<a href="https://swefreq.nbis.se/dataset/SweGen/download" target="_blank">SweFreq</a> and loaded as-is.
+The file that we use is swegen_frequencies_fixploidy_GRCh38_20190204.vcf.gz.
+The <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">makeDoc file</a> of the track documents how all source files of the varFreqs track were converted.
+For some tracks, Python scripts were necessary; these are also available from <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/varFreqs" target="_blank">GitHub</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+The SweGen allele frequency data was generated by Science for Life Laboratory.
+Any redistributed data derived from the SweGen data set must follow the SweGen terms and conditions.
+The data may not be used to attempt to identify any individual in this or other studies.
+Thanks to the SweGen participants and SciLifeLab for making the data available.
+</p>
+
+<h2>References</h2>
+<p>
+Ameur A, Dahlberg J, Olason P, Vezzi F, Karlsson R, Martin M, Viklund J, K&auml;h&auml;ri AK,
+Lundin P, Che H <em>et al</em>.
+<a href="https://doi.org/10.1038/ejhg.2017.130" target="_blank">
+SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish
+population</a>.
+<em>Eur J Hum Genet</em>. 2017 Nov;25(11):1253-1260.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/28832569" target="_blank">28832569</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5765326/" target="_blank">PMC5765326</a>
+</p>