aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700
varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and Methods
sections for all 20+ subtrack HTML files with consistent formatting, sequencing
methods from source papers, and links to makeDoc and Github scripts. Move all
varFreqs conversion scripts into scripts/varFreqs/ subdirectory and update makeDoc
paths accordingly.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/human/swefreq.html src/hg/makeDb/trackDb/human/swefreq.html
new file mode 100644
index 00000000000..637bf7c416d
--- /dev/null
+++ src/hg/makeDb/trackDb/human/swefreq.html
@@ -0,0 +1,68 @@
+<h2>Description</h2>
+<p>
+<a href="https://swefreq.nbis.se/dataset/SweGen" target="_blank">SweGen</a> provides
+whole-genome sequencing variant frequencies for 1,000 Swedish individuals.
+The 1,000 individuals represent a cross-section of the Swedish population and no disease
+information was used for the selection. The frequency data may therefore include genetic variants
+that are associated with, or causative of, disease. SweGen also provides SV calls, TEs, MELT
+results for TEs, HLAs and a FASTA file with new sequence not in hg38. There is
+also a version for the T2T CHM13 assembly. The full dataset can be browsed at
+the
+<a href="https://swefreq.nbis.se/dataset/SweGen/browser" target="_blank">SweGen Browser</a>.
+</p>
+
+<h2>Data Access</h2>
+<p>
+Due to license restrictions, the data for this track cannot be downloaded from the UCSC
+Genome Browser. The Table Browser, Data Integrator, and download server are not available
+for this track.
+</p>
+<p>
+VCF files can be requested at
+<a href="https://swefreq.nbis.se/dataset/SweGen" target="_blank">SweGen</a> via a form. The request
+needs manual approval, which is usually quick. If there is no reply, email SweGen directly.
+</p>
+
+<h2>Methods</h2>
+<p>
+DNA was fragmented to 350 bp on a Covaris E220. Paired-end sequencing with 150 bp read length was performed
+on Illumina HiSeq X (HiSeq Control Software 3.3.39/RTA 2.7.1) with v2.5 sequencing chemistry.
+Raw whole-genome reads were aligned to the GRCh37 reference using BWA-MEM v0.7.12, then sorted and
+indexed with samtools v0.1.19 and assessed with qualimap v2.2.20; per-sample alignments from
+multiple lanes and flow cells were merged using Picard MergeSamFiles v1.120. Processing followed
+GATK best practices with GATK v3.3, including indel realignment (RealignerTargetCreator,
+IndelRealigner), duplicate marking (Picard MarkDuplicates v1.120), and base quality score
+recalibration (BaseRecalibrator), producing one finalized BAM per sample. Per-sample gVCFs were
+generated with GATK HaplotypeCaller v3.3 using reference files from the GATK v2.8 resource bundle,
+with all steps coordinated via Piper v1.4.0. Joint genotyping of 1,000 samples was performed by
+merging gVCFs in five batches of 200 using GATK CombineGVCFs, followed by cohort genotyping with
+GATK GenotypeGVCFs and variant quality score recalibration for SNVs and indels using
+VariantRecalibrator and ApplyRecalibration.
+</p>
+<p>
+At UCSC, the hg38 VCF was downloaded from
+<a href="https://swefreq.nbis.se/dataset/SweGen/download" target="_blank">SweFreq</a> and loaded as-is.
+The file that we use is swegen_frequencies_fixploidy_GRCh38_20190204.vcf.gz.
+The <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/varFreqs.txt" target="_blank">makeDoc file</a> of the track documents how all source files of the varFreqs track were converted.
+For some tracks, Python scripts were necessary; these are also available from <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/varFreqs" target="_blank">GitHub</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+The SweGen allele frequency data was generated by Science for Life Laboratory.
+Any redistributed data derived from the SweGen data set must follow the SweGen terms and conditions.
+The data may not be used to attempt to identify any individual in this or other studies.
+Thanks to the SweGen participants and SciLifeLab for making the data available.
+</p>
+
+<h2>References</h2>
+<p>
+Ameur A, Dahlberg J, Olason P, Vezzi F, Karlsson R, Martin M, Viklund J, Kähäri AK,
+Lundin P, Che H <em>et al</em>.
+<a href="https://doi.org/10.1038/ejhg.2017.130" target="_blank">
+SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish
+population</a>.
+<em>Eur J Hum Genet</em>. 2017 Nov;25(11):1253-1260.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/28832569" target="_blank">28832569</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5765326/" target="_blank">PMC5765326</a>
+</p>