dc1e0e76dbe49861bd0ebe8db64e27f587737794 max Mon Mar 30 15:40:03 2026 -0700 adding two more phased variants tracks, refs #37306
diff --git src/hg/makeDb/trackDb/human/han945SvVcf.html src/hg/makeDb/trackDb/human/han945SvVcf.html
new file mode 100644
index 00000000000..a24eed8e0a4
--- /dev/null
+++ src/hg/makeDb/trackDb/human/han945SvVcf.html
@@ -0,0 +1,64 @@
+<h2>Description</h2>
+<p>
+This track shows per-sample genotypes for 111,288 structural variants (SVs)
+from 945 Han Chinese individuals, displayed as a VCF track. It is a companion
+to the <a href="hgTrackUi?g=han945Sv">Han 945 SVs</a> bigBed track, which
+shows the same variants with summary statistics and filters.
+</p>
+<p>
+The VCF format allows the genome browser to display a genotype matrix showing
+which of the 945 individuals carry each structural variant.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+Each variant is shown with per-sample genotypes: 0/1 indicates the sample
+carries the SV, 0/0 indicates it does not. The genotype coloring follows
+standard VCF display conventions.
+</p>
+<p>
+Samples are labeled Sample_001 through Sample_945, as the original data
+release does not include individual sample identifiers.
+</p>
+
+<h2>Methods</h2>
+<p>
+The original VCF from Gong et al. is a site-only file (no sample columns)
+produced by merging per-sample SV calls with SURVIVOR v1.0.6. SURVIVOR
+records which samples support each SV in the INFO/SUPP_VEC field —
+a binary string of length 945, where each position represents one sample
+and '1' indicates that sample's caller reported the SV.
+</p>
+<p>
+To reconstruct per-sample genotypes, the SUPP_VEC was expanded into 945
+sample columns with a GT (genotype) FORMAT field. Samples with a '1' in
+SUPP_VEC were assigned genotype 0/1 (heterozygous carrier); samples with
+'0' were assigned 0/0 (homozygous reference). This is a simplification:
+the original per-sample callers may have reported homozygous alternate (1/1)
+genotypes for some individuals, but this information is not preserved in the
+SURVIVOR merge. The conversion was performed with the script
+<code>lrSvHan945SuppVecToVcf.py</code>.
+</p>
+
+<h2>Data Access</h2>
+<p>
+The source VCF was downloaded from the OMIX repository (accession OED00945268)
+at the National Genomics Data Center (NGDC).
+</p>
+
+<h2>Credits</h2>
+<p>
+Thanks to Gong et al. for making their structural variant calls publicly available.
+</p>
+
+<h2>References</h2>
+
+<p>
+Gong J, Sun H, Wang K, Zhao Y, Huang Y, Chen Q, Qiao H, Gao Y, Zhao J, Ling Y <em>et al</em>.
+<a href="https://doi.org/10.1038/s41467-025-56661-9" target="_blank">
+Long-read sequencing of 945 Han individuals identifies structural variants associated with
+phenotypic diversity and disease susceptibility</a>.
+<em>Nat Commun</em>. 2025 Feb 10;16(1):1494.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/39929826" target="_blank">39929826</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11811171/" target="_blank">PMC11811171</a>
+</p>
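
The SUPP_VEC expansion described in the Methods section of this page can be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the contents of lrSvHan945SuppVecToVcf.py: the function names (sample_names, supp_vec_to_genotypes, expand_record) are hypothetical, and the sketch assumes each input record is a tab-separated, eight-column site-only VCF line whose INFO column contains a SUPP_VEC key.

```python
def sample_names(n_samples):
    """Placeholder labels Sample_001 .. Sample_<n>, as described on the track page."""
    return [f"Sample_{i:03d}" for i in range(1, n_samples + 1)]


def supp_vec_to_genotypes(supp_vec):
    """Map each SUPP_VEC character to a genotype: '1' -> 0/1, '0' -> 0/0."""
    genotypes = []
    for flag in supp_vec:
        if flag == "1":
            genotypes.append("0/1")  # sample supports the SV (treated as heterozygous)
        elif flag == "0":
            genotypes.append("0/0")  # sample does not support the SV
        else:
            raise ValueError(f"unexpected SUPP_VEC character: {flag!r}")
    return genotypes


def expand_record(line, n_samples):
    """Append a GT FORMAT column and one genotype per sample to a site-only record.

    Hypothetical helper: assumes the eighth column is INFO and holds SUPP_VEC=<bits>.
    """
    fields = line.rstrip("\n").split("\t")
    info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
    supp_vec = info["SUPP_VEC"]
    if len(supp_vec) != n_samples:
        raise ValueError(f"SUPP_VEC length {len(supp_vec)} != {n_samples} samples")
    return "\t".join(fields[:8] + ["GT"] + supp_vec_to_genotypes(supp_vec))
```

A full conversion would also need to write a #CHROM header line listing the sample names and a FORMAT header declaring GT. Because SUPP_VEC records only presence or absence, a sketch like this cannot recover 1/1 genotypes, which matches the simplification noted in the Methods text.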