aa61ebc800429515f9ced7e28f669c6042219f43 max Wed Mar 18 09:09:13 2026 -0700

varFreqs supertrack: add GREGoR track, update all HTML docs, move scripts to varFreqs/, refs #36642

Add GREGoR R04 WGS track to varFreqs superTrack. Update Data Access and Methods sections for all 20+ subtrack HTML files with consistent formatting, sequencing methods from source papers, and links to makeDoc and Github scripts. Move all varFreqs conversion scripts into scripts/varFreqs/ subdirectory and update makeDoc paths accordingly.

Co-Authored-By: Claude Opus 4.6

diff --git src/hg/makeDb/trackDb/human/swefreq.html src/hg/makeDb/trackDb/human/swefreq.html
new file mode 100644
index 00000000000..637bf7c416d
--- /dev/null
+++ src/hg/makeDb/trackDb/human/swefreq.html
@@ -0,0 +1,68 @@
+

Description

+

+SweGen provides
+whole-genome sequencing variant frequencies for 1,000 Swedish individuals.
+The 1,000 individuals represent a cross-section of the Swedish population, and no disease
+information was used in their selection. The frequency data may therefore include genetic variants
+that are associated with, or causative of, disease. SweGen also provides SV calls, TEs, MELT
+results for TEs, HLAs and a FASTA file with novel sequence not present in hg38. There is
+also a version for the T2T CHM13 assembly. The full dataset can be browsed at
+the
+SweGen Browser.
+

+ +

Data Access

+

+Due to license restrictions, the data for this track cannot be downloaded from the UCSC
+Genome Browser. The Table Browser, Data Integrator, and download server are not available
+for this track.
+

+

+VCF files can be requested from
+SweGen via a form. Each request
+requires manual approval, which is usually quick. If there is no reply, email SweGen directly.
+

+ +

Methods

+

+DNA was sheared to a fragment size of 350 bp on a Covaris E220. Paired-end sequencing with 150 bp read length was performed
+on Illumina HiSeq X (HiSeq Control Software 3.3.39/RTA 2.7.1) with v2.5 sequencing chemistry.
+Raw whole-genome reads were aligned to the GRCh37 reference using BWA-MEM v0.7.12, then sorted and
+indexed with samtools v0.1.19 and assessed with qualimap v2.2.20. Per-sample alignments from
+multiple lanes and flow cells were merged using Picard MergeSamFiles v1.120. Processing followed
+GATK best practices with GATK v3.3, including indel realignment (RealignerTargetCreator,
+IndelRealigner), duplicate marking (Picard MarkDuplicates v1.120), and base quality score
+recalibration (BaseRecalibrator), producing one finalized BAM per sample. Per-sample gVCFs were
+generated with GATK HaplotypeCaller v3.3 using reference files from the GATK v2.8 resource bundle,
+with all steps coordinated via Piper v1.4.0. Joint genotyping of the 1,000 samples was performed by
+merging gVCFs in five batches of 200 using GATK CombineGVCFs, followed by cohort genotyping with
+GATK GenotypeGVCFs and variant quality score recalibration for SNVs and indels using
+VariantRecalibrator and ApplyRecalibration.
+
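The batching step described above (1,000 per-sample gVCFs merged in five batches of 200 before cohort genotyping) can be illustrated with a short Python sketch. The sample names and the `batch_samples` helper are hypothetical and not part of the SweGen pipeline; the sketch only shows how the samples partition, not how GATK is invoked.

```python
from typing import List


def batch_samples(samples: List[str], batch_size: int) -> List[List[str]]:
    """Split a list of sample IDs into consecutive batches of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


# Hypothetical sample IDs; the real SweGen identifiers are not given here.
samples = [f"sample_{n:04d}" for n in range(1, 1001)]
batches = batch_samples(samples, 200)

for idx, batch in enumerate(batches, start=1):
    # Each batch stands for one CombineGVCFs merge in the description above.
    print(f"batch {idx}: {len(batch)} gVCFs, {batch[0]} .. {batch[-1]}")
```

Each of the five resulting lists would feed one merge, and the five merged gVCFs would then go into a single cohort genotyping run.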

+

+At UCSC, the hg38 VCF was downloaded from
+SweFreq and loaded as-is.
+The file used is swegen_frequencies_fixploidy_GRCh38_20190204.vcf.gz.
+The makeDoc file of the track documents how all source files of the varFreqs track were converted.
+For some tracks, Python scripts were necessary; these are also available from GitHub.
+
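Because the VCF is loaded as-is, the frequencies come directly from the file's INFO column. As a rough sketch of reading them, the Python below parses VCF data lines and extracts the `AF` value per alternate allele. `AF` is the allele-frequency key reserved by the VCF specification; whether the SweGen file uses exactly this key is an assumption, and the example record and the helper names (`parse_info`, `allele_freqs`, `read_vcf`) are invented for illustration.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def parse_info(info: str) -> Dict[str, str]:
    """Turn a VCF INFO string such as 'AC=3;AF=0.0015;DB' into a dict."""
    fields: Dict[str, str] = {}
    if info == ".":
        return fields
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value  # flag keys without '=' map to an empty string
    return fields


def allele_freqs(lines: Iterable[str]) -> Iterator[Tuple[str, int, str, str, float]]:
    """Yield (chrom, pos, ref, alt, AF) for every ALT allele with an AF value."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta lines, the header line, and blank lines
        cols = line.rstrip("\n").split("\t")
        chrom, pos, ref = cols[0], int(cols[1]), cols[3]
        alts = cols[4].split(",")
        af_field = parse_info(cols[7]).get("AF")  # assumed key, see lead-in
        if not af_field:
            continue
        for alt, af in zip(alts, af_field.split(",")):
            if af != ".":
                yield chrom, pos, ref, alt, float(af)


def read_vcf(path: str) -> Iterator[Tuple[str, int, str, str, float]]:
    """Stream allele frequencies from a gzip-compressed VCF file."""
    with gzip.open(path, "rt") as handle:
        yield from allele_freqs(handle)


# Invented example record: two ALT alleles, 2,000 chromosomes genotyped.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t12345\t.\tA\tG,T\t.\tPASS\tAC=3,1;AF=0.0015,0.0005;AN=2000\n",
]
records = list(allele_freqs(example))
for rec in records:
    print(rec)
```

Once the file has been obtained through the request form, `read_vcf("swegen_frequencies_fixploidy_GRCh38_20190204.vcf.gz")` would stream the same tuples from the real data, provided the `AF` assumption holds.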

+ +

Credits

+

+The SweGen allele frequency data was generated by Science for Life Laboratory.
+Any redistributed data derived from the SweGen data set must follow the SweGen terms and conditions.
+The data may not be used to attempt to identify any individual in this or other studies.
+Thanks to the SweGen participants and SciLifeLab for making the data available.
+

+ +

References

+

+Ameur A, Dahlberg J, Olason P, Vezzi F, Karlsson R, Martin M, Viklund J, Kähäri AK,
+Lundin P, Che H et al.
+
+SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish
+population.
+Eur J Hum Genet. 2017 Nov;25(11):1253-1260.
+PMID: 28832569; PMC: PMC5765326
+