src/hg/makeDb/trackDb/human/varFreqs.html 150d6eac0d7fa08368b11b25ad8f6b4e84143243

150d6eac0d7fa08368b11b25ad8f6b4e84143243
max
  Tue Dec 16 15:13:55 2025 -0800
more docs for var freqs track

diff --git src/hg/makeDb/trackDb/human/varFreqs.html src/hg/makeDb/trackDb/human/varFreqs.html
index f47b5d89e6b..177933905e0 100644
--- src/hg/makeDb/trackDb/human/varFreqs.html
+++ src/hg/makeDb/trackDb/human/varFreqs.html
@@ -63,30 +63,43 @@
         subjects to aid discoveries involving common and rare variants with biological
         or disease relevance. The R4 release includes 408,709 subjects and allele
         frequencies for 15.5 million rs sites, including nearly one million ClinVar
         variants. Genotype and associated individual-level data are accessible through dbGaP
         <a href="https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login"
         target="_blank">authorized access</a>.
     </li>
 
     <li>
         <b><a href="https://www.finngen.fi/en" target="_blank">FinnGen</a></b>:
         Imputed variants from 500,348 Biobank samples obtained using genotyping arrays
         in Finland, 10% of the population. The imputation used phased variants obtained from 8,554
         high-quality whole genome sequences, also from Finland. For details, see (Kurki et al, Nature 2023).
         Phenotype links can be shown at <a href="https://r12.finngen.fi/">FinnGen PheWeb</a>.
     </li>
+
+    <li>
+        <b><a href="https://swefreq.nbis.se/dataset/SweGen" target="_blank">SweGen</a></b>:
+        Whole-genome sequencing variant frequencies for 1,000 Swedish individuals generated within the SweGen project.
+        The 1,000 individuals included in the SweGen project represent a
+        cross-section of the Swedish population, and no disease information
+        was used for their selection. The frequency data may therefore
+        include genetic variants that are associated with, or causative of,
+        disease. SweGen also provides SV calls, TEs, MELT results for TEs, HLAs and new sequence.
+        For details, see (Ameur et al, Eur J Hum Genet 2017).
+        The dataset can be browsed at the <a href="https://swefreq.nbis.se/dataset/SweGen/browser">SweGen Browser</a>.
+    </li>
+
     <li>
         <b><a href="https://jmorp.megabank.tohoku.ac.jp/downloads" target="_blank">JPN To61k Japan Tohoku University Tohoku Medical Megabank Organization 61k Allele frequency panel (JPN 61k)</a></b>:
         An allele frequency panel based on short-read WGS analysis of 61,000 Japanese individuals.
         The project includes other datatypes, such as STRs, long-read SVs and short-read CNVs.
         Data can be downloaded from the <a href="https://jmorp.megabank.tohoku.ac.jp"
         target="_blank">jMorp Website</a>, specifically the
         <a href="https://jmorp.megabank.tohoku.ac.jp/downloads" target="_blank">Downloads</a>
         section. For details, see (Tadaka et al, NAR 2023).
     </li>
 
     <li>
         <b><a href="https://abraom.ib.usp.br/"
         target="_blank">Brazil Arquivo Brasileiro Online de Muta&ccedil;&otilde;es (ABraOM)</a></b>:
         Genomic variants obtained with whole-genome sequencing from SABE, a
         census-based sample of elderly individuals from S&atilde;o Paulo, Brazil's
@@ -181,30 +194,32 @@
 </p>
 <p>
 <b>GenomeAsia Pilot:</b> VCFs are available from UCSC and also from
 the <a target="_blank"
 href="https://browser.genomeasia100k.org/#tid=download">GenomeAsia 100K website</a>.
 No license or login is required.
 </p>
 
 <p><b>KOVA:</b> 
         TSV data can be requested on the <a href="https://www.kobic.re.kr/kova/downloads"
         target="_blank">KOVA Downloads</a> website. 
 </p>
 
 <p><b>Finngen:</b> TSV data can be requested via the form at https://finngen.gitbook.io/documentation/data-download which triggers an email with the download link.</p>
 
+<p><b>SweGen:</b> We are allowed to redistribute the VCF on the condition that the file terms_of_use.txt is distributed with it. You can find it <a target=_blank href="https://hgdownload.soe.ucsc.edu/gbdb/hg38/varFreqs/swegen">on our download server</a>, alongside the VCF file. </p>
+
 <p><b>NPM:</b> 
         VCF access can be requested on the 
         <a href="https://chorus.grids-platform.io/" target="_blank">Chorus Browser</a> website, which requires an 
         <a href = "https://npm.a-star.edu.sg/" target=_blank>account and data access request</a>. 
 </p>
 
 <h2>Methods</h2>
 <p>
 <b>MXB:</b> Genotyping was performed with the Illumina Multi-Ethnic Global Array
 (MEGA, ~1.8M SNPs), optimized for admixed populations and enriched for
 ancestry-informative and medically relevant variants. Only autosomal, biallelic
 SNPs passing quality control are included. Samples were selected from 898
 recruitment sites, with prioritization of indigenous language speakers. Data
 processing included GenomeStudio &rarr; PLINK conversion, strand alignment, removal
 of duplicates, update of map positions using dbSNP Build 151 and low-quality
@@ -216,30 +231,35 @@
 >https://sharehost.hms.harvard.edu/genetics/reich_lab/sgdp/vcf_variants/</a>,
 merged with bcftools and lifted to hg38 with CrossMap. 
 </p>
 <p>
 <b>KOVA:</b> V7 of the TSV.gz was obtained from the KOVA staff and converted to VCF. It is not
 available for download from our site but can be requested from the KOVA website.
 </p>
 
 <p><b>Finngen:</b> R12 annotated variants were downloaded from the Google Cloud
 bucket link received through an email after filling out the form linked from
 https://finngen.gitbook.io/documentation/data-download and converted to VCF
 with a <a
 href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/finngen_to_vcf.py"
 target=_blank>custom Python script</a>. </p>
 
+<p><b>SweGen:</b> DNA was sheared to a fragment size of 350 bp on a Covaris E220. Paired-end sequencing with 150 bp read length was performed on Illumina HiSeq X (HiSeq Control Software 3.3.39/RTA 2.7.1) with v2.5 sequencing chemistry. Raw whole-genome reads were aligned to the GRCh37 reference using BWA-MEM v0.7.12, then sorted and indexed with samtools v0.1.19 and assessed with qualimap v2.2.20; per-sample alignments from multiple lanes and flow cells were merged using Picard MergeSamFiles v1.120. Processing followed GATK best practices with GATK v3.3, including indel realignment (RealignerTargetCreator, IndelRealigner), duplicate marking (Picard MarkDuplicates v1.120), and base quality score recalibration (BaseRecalibrator), producing one finalized BAM per sample. Per-sample gVCFs were generated with GATK HaplotypeCaller v3.3 using reference files from the GATK v2.8 resource bundle, with all steps coordinated via Piper v1.4.0. Joint genotyping of 1,000 samples was performed by merging gVCFs in five batches of 200 using GATK CombineGVCFs, followed by cohort genotyping with GATK GenotypeGVCFs and variant quality score recalibration for SNVs and indels using VariantRecalibrator and ApplyRecalibration.
+<BR>At UCSC, the hg38 VCF was downloaded 
+from <a target=_blank href="https://swefreq.nbis.se/dataset/SweGen/download">SweFreq</a>.
+</p>
+
 <p><b>NPM Singapore:</b> Whole Genome Sequencing (WGS) data processing followed
 GATK4 best practices. GATK4 germline variant analysis workflow written in WDL
 was adapted to use Nextflow and deployed at the National Supercomputing Centre,
 Singapore (NSCC). In short, WGS reads were aligned against GRCh38 using the
 BWA-MEM algorithm and used as input to GATK HaplotypeCaller to produce single
 sample gVCFs. The gVCF files were joint-called then loaded in Hail, an
 open-source Python-based data analysis library suited to working with
 population-scale genomic data collections. Low-quality WGS libraries and
 low-quality variants were removed.  QC-ed variants were functionally annotated
 using Ensembl Variant Effect Predictor (VEP) (version 95). Functional
 annotations for variants impacting protein-coding genes were also complemented with
 information on the potential alteration to their cognate protein's 3D structure
 and drug binding ability.
 </p>
 
@@ -278,30 +298,39 @@
 Collaborators. This research has been conducted using the UK Biobank Resource
 under application number 26041.
 </p>
 <p>
 <b>SGDP:</b> This project was funded by the Simons Foundation. Thanks to David Reich and Swapan 
 Mallick for help with importing the data.
 </p>
 <p>
 <b>KOVA:</b> Thanks to Insu Jang and the KOVA director for providing variant frequencies in TSV
 format.
 </p>
 <p>
 <b>Finngen:</b> We want to acknowledge the participants and investigators of the FinnGen study.
 </p>
 
+<p>
+<b>SweGen:</b> The SweGen allele frequency data was generated by Science for
+Life Laboratory. The data may be redistributed in original or modified form,
+but must always be distributed together with the file "terms_of_use.txt" that
+is stored together with the data on our download server, and any redistributed
+data derived from the SweGen data set must follow those terms and conditions.
+The data may not be used to attempt to identify any individual in this or other studies.
+</p>
+
 <p>
 <b>NPM Singapore:</b> Thanks to the NPM Data Access Committee and Eleanor for granting our data request. 
 By browsing the data, you agree to use the data only for academic, non-commercial
 research to improve human health (biology/disease).  We request all data users
 agree to protect the
 confidentiality of the data subjects in any research papers or publications
 that they may prepare, by taking all reasonable care to limit the possibility
 of identification. In particular, the data users shall not use, or attempt
 to use, the data to deliberately compromise or otherwise infringe the
 confidentiality of information on data subjects and their right to privacy.
 If you use any of the data obtained from the CHORUS variant browser, we request
 that you cite the NPM flagship paper (Wong et al, 2023). All users of the
 data must take note that the data provider and relevant SG10K_Health cohort
 owners bear no responsibility for the further analysis or interpretation of the
 data.  </p>
@@ -457,15 +486,28 @@
 <em>Nat Genet</em>. 2023 Feb;55(2):178-186.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/36658435" target="_blank">36658435</a>
 </p>
 
 
 
 <p>
 Malomane DK, Williams MP, Huber CD, Mangul S, Abedalthagafi M, Chiang CWK.
 <a href="https://doi.org/10.1101/2025.01.10.632500" target="_blank">
 Patterns of population structure and genetic variation within the Saudi Arabian population</a>.
 <em>bioRxiv</em>. 2025 Jan 13;.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39868174" target="_blank">39868174</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11761371/" target="_blank">PMC11761371</a>
 </p>
 
+
+
+<p>
+Ameur A, Dahlberg J, Olason P, Vezzi F, Karlsson R, Martin M, Viklund J, Kähäri AK, Lundin P, Che H
+<em>et al</em>.
+<a href="https://doi.org/10.1038/ejhg.2017.130" target="_blank">
+SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish
+population</a>.
+<em>Eur J Hum Genet</em>. 2017 Nov;25(11):1253-1260.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/28832569" target="_blank">28832569</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5765326/" target="_blank">PMC5765326</a>
+</p>
+
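
Note on the TSV-to-VCF conversions mentioned in the Methods text above (KOVA, Finngen): the following is a minimal sketch of what such a conversion can look like, not the actual finngen_to_vcf.py script linked in the documentation. The function name tsv_to_vcf and the input column names (chrom, pos, ref, alt, af) are hypothetical; the real source files use their own column layouts and carry more annotations.

```python
import csv
import io


def tsv_to_vcf(tsv_text):
    """Turn a tab-separated allele frequency table into minimal VCF text.

    Assumes hypothetical input columns: chrom, pos, ref, alt, af.
    Rows are written in input order; no sorting or validation is done.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    lines = [
        "##fileformat=VCFv4.2",
        '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for row in reader:
        # ID, QUAL and FILTER are unknown here, so they are set to "."
        lines.append("\t".join([
            row["chrom"], row["pos"], ".", row["ref"], row["alt"],
            ".", ".", "AF=" + row["af"],
        ]))
    return "\n".join(lines) + "\n"
```

A file produced this way would still need to be sorted by position, compressed with bgzip and indexed with tabix before it can be used as a VCF track.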