4f8f8773bec66a9e993e9897e0b032c6e97dead8 max Fri May 15 10:12:29 2026 -0700 mei: add HMEID, SweGen, and euL1db subtracks Three new MEI catalogues under the existing mei superTrack: meiHmeid (hg38) 36,699 MELT MEIs from HMEID v1.1 (NyuWa+1KGP, 5,675 individuals, Niu et al. 2022, PMID 35212372). Site-level VCF; per-cohort and per-1KGP super- population AC/AN/AF; SVTYPE Alu/L1/SVA/HERVK. meiSwegen (hg38 lifted) 18,090 MELT MEIs from the SweGen 1,000-sample Swedish cohort (Ameur 2017, PMID 28832569; Gardner 2017, PMID 28855259). Built on hg19, liftOver to hg38 (10 unmapped). tableBrowser off per SweGen distribution terms. meiEul1db (hg19+hg38) 8,988 curated L1-HS insertion polymorphisms (MRIPs) from euL1db v1.00 (Mir 2015, PMID 25352549), aggregating 142,495 sample-level SRIPs across 32 published studies. Coloured by lineage (germline/somatic/mixed). Built on hg19, liftOver to hg38 (3 unmapped). Helman2014 used numeric chrom names (23=X, 24=Y) which are renamed during the build. meiEul1dbRef (hg19+hg38) 1,540 reference-genome L1-HS copies catalogued by euL1db (companion to meiEul1db). Single shared mei.ra (in human/) uses $D substitution so each stanza serves both assemblies where applicable. refs #37524 diff --git src/hg/makeDb/trackDb/human/meiHmeid.html src/hg/makeDb/trackDb/human/meiHmeid.html new file mode 100644 index 00000000000..183d6b9cf7b --- /dev/null +++ src/hg/makeDb/trackDb/human/meiHmeid.html @@ -0,0 +1,151 @@ +<h2>Description</h2> +<p> +This track shows <b>mobile element insertions (MEIs)</b> from the +<a href="http://bigdata.ibp.ac.cn/HMEID/" target="_blank">HMEID</a> +database (v1.1), a catalogue of 36,699 non-reference MEIs called from +short-read whole-genome sequencing of 5,675 individuals. The cohort +combines 2,998 Chinese samples from the NyuWa dataset (~26.2× +coverage) with 2,677 samples from the 1000 Genomes Project (~7.4× +coverage), and the calls are reported against GRCh38. 
Each site is an +insertion of an Alu, L1 (LINE-1), SVA or HERV-K mobile element relative +to the reference. +</p> + +<table class="stdTbl"> +<tr><th>Class</th><th>MEIs</th></tr> +<tr><td>Alu</td><td>26,553</td></tr> +<tr><td>L1</td><td>7,353</td></tr> +<tr><td>SVA</td><td>2,667</td></tr> +<tr><td>HERVK</td><td>126</td></tr> +<tr><th>Total</th><th>36,699</th></tr> +</table> + +<p> +For each MEI, the track reports the mobile element class, the insertion +length, the target-site duplication sequence, the MELT ASSESS quality +score, and allele counts / numbers / frequencies for the full cohort, +for NyuWa, for 1KGP, and for each of the five 1KGP super-populations +(AFR, AMR, EAS, EUR, SAS). +</p> + +<h2>Display Conventions and Configuration</h2> +<p> +An insertion has zero length on the reference: it attaches between +two adjacent reference bases without replacing any of them. Following +the VCF convention used by HMEID and by the other MEI tracks in this +collection, each MEI is drawn as a <b>1-bp block sitting on the anchor +base</b> — the reference base immediately to the left of the +insertion attachment point. The inserted mobile element itself is not +present in the reference and is therefore not drawn. The item label is +<tt>class-altAlleleCount</tt>. 
+</p> + +<p> +Items are colored by element class: +</p> +<ul> + <li><span style="display:inline-block;background-color:#0072B2;width:18px;height:12px;vertical-align:middle;"></span> <b>Alu</b> — SINE (Short INterspersed Element)</li> + <li><span style="display:inline-block;background-color:#D55E00;width:18px;height:12px;vertical-align:middle;"></span> <b>L1</b> — LINE-1 (Long INterspersed Element-1)</li> + <li><span style="display:inline-block;background-color:#009E73;width:18px;height:12px;vertical-align:middle;"></span> <b>SVA</b> (SINE-VNTR-Alu) — composite retrotransposon</li> + <li><span style="display:inline-block;background-color:#CC79A7;width:18px;height:12px;vertical-align:middle;"></span> <b>HERVK</b> (Human Endogenous Retrovirus K) — endogenous retrovirus</li> +</ul> + +<p> +The score column encodes the cohort-wide alt-allele frequency on a +0-1000 scale. Filters allow restricting the displayed items by element +class, insertion length, allele frequency in the full cohort, allele +frequency within the NyuWa and 1KGP cohorts separately, and by the +MELT ASSESS quality score. The ASSESS score ranges from 0 to 5; HMEID +sites are pre-filtered to ASSESS ≥ 3, meaning at least one-side +TSD evidence, with 5 representing the highest quality (TSD decided +from split reads). +</p> + +<h2>Methods</h2> +<p> +HMEID was built by Niu et al. (2022) from Illumina short-read whole-genome +sequencing of two cohorts: 2,999 individuals from the NyuWa dataset +(diabetes and control samples collected across China, median depth +~26.2× on GRCh38) and 2,691 samples from the 1000 Genomes Project +(~7.4× coverage, GRCh38-aligned CRAMs from EBI). Non-reference +MEIs were detected with MELT v2.1.5 in SPLIT mode with default +parameters; BAM coverage was estimated with goleft v0.1.8 covstats. 
+After the MELT MakeVCF step, sites were filtered to those that (i) lie +outside low-complexity regions, (ii) are genotyped in &gt;25% of +individuals, (iii) have more than 2 split reads, (iv) carry a MELT +ASSESS score &gt;3 (nominally ASSESS ≥ 4, although HMEID retains +ASSESS 3 sites that otherwise pass, so the effective floor is ASSESS ≥ 3) and (v) are marked +PASS in the FILTER column. Alu and L1 subfamilies were assigned by +MELT's CALU and LINEU modules. Of the 2,999 NyuWa and 2,691 1KGP samples, +2,998 and 2,677 respectively passed processing, yielding the final callset of +36,699 MEIs in 5,675 genomes. Allele frequencies were computed per +cohort and per 1KGP super-population with BCFtools v1.3.1. See +Niu et al. 2022 for full methodological details. +</p> + +<p> +The site-level VCF was downloaded from the +<a href="http://bigdata.ibp.ac.cn/HMEID/download/" target="_blank">HMEID +download page</a> (file +<tt>MEI.GRCh38.HMEIDv1.1.vcf.gz</tt>) and converted to bigBed following +the steps in the +<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/mei.txt" +target="_blank">makeDoc file</a>. Conversion uses scripts in +<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/mei" +target="_blank">src/hg/makeDb/scripts/mei</a>: VCF-style positions +(1-based POS, anchor base) are converted to 0-based, half-open BED coordinates +(<tt>chromStart = POS - 1</tt>, <tt>chromEnd = chromStart + 1</tt>), +MELT SVTYPE codes (<tt>ALU</tt>, <tt>LINE1</tt>, <tt>SVA</tt>, +<tt>HERVK</tt>) are mapped to the element class names used here, and +INFO fields are copied through to per-cohort and per-super-population +allele count / number / frequency columns. Each of the 36,699 input records +produced exactly one BED row; no records were dropped. 
+</p> + +<h2>Data Access</h2> +<p> +The data can be explored interactively in table format with the +<a href="../cgi-bin/hgTables">Table Browser</a> or the +<a href="../cgi-bin/hgIntegrator">Data Integrator</a> and exported from +there to spreadsheet or tab-separated tables. From scripts, the data can +be accessed through our <a href="https://api.genome.ucsc.edu">API</a>, +track=<i>meiHmeid</i>. +</p> +<p> +For automated download and analysis, the genome annotation is stored in +a bigBed file that can be downloaded from +<a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/mei/" target="_blank"> +our download server</a>. The file for this track is called +<tt>hmeid.bb</tt> in <tt>/gbdb/hg38/mei/</tt>. Individual regions or the +whole genome annotation can be obtained using our tool +<tt>bigBedToBed</tt>, which can be compiled from the source code or +downloaded as a precompiled binary for your system. Instructions for +downloading source code and binaries can be found +<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>. +The tool can also be used to obtain features within a given range, e.g. +<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/mei/hmeid.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>. +</p> +<p> +The original annotation source data can be downloaded from the +<a href="http://bigdata.ibp.ac.cn/HMEID/download/" target="_blank">HMEID +download page</a>. +</p> + +<h2>Credits</h2> +<p> +Thanks to Yiwei Niu, Shunmin He and colleagues at the Institute of +Biophysics, Chinese Academy of Sciences for building HMEID and +releasing the callset, and to the NyuWa project and the 1000 Genomes +Project for producing the underlying whole-genome sequencing data. +</p> + +<h2>References</h2> +<p> +Niu Y, Teng X, Zhou H, Shi Y, Li Y, Tang Y, Zhang P, Luo H, Kang Q, Xu T <em>et al</em>. 
+<a href="https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gkac128" target="_blank"> +Characterizing mobile element insertions in 5675 genomes</a>. +<em>Nucleic Acids Res</em>. 2022 Mar 21;50(5):2493-2508. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/35212372" target="_blank">35212372</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8934628/" target="_blank">PMC8934628</a> +</p> +
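<p>
As an illustration of the coordinate conversion described under Methods, the
following minimal Python sketch turns one site-level VCF record into the basic
BED fields. It is not the actual script in <tt>src/hg/makeDb/scripts/mei</tt>:
the function name, its arguments, and the rounding used for the 0-1000 score
are assumptions made for this example.
</p>

```python
# Illustrative sketch only; not the build script used for this track.
# Mapping from MELT SVTYPE codes to the class names shown in the track.
SVTYPE_TO_CLASS = {"ALU": "Alu", "LINE1": "L1", "SVA": "SVA", "HERVK": "HERVK"}


def vcf_site_to_bed(chrom, pos, svtype, alt_count, alt_freq):
    """Convert one site-level VCF record to basic BED fields.

    pos is the 1-based VCF POS (the anchor base). alt_count and alt_freq
    stand for the cohort-wide alt-allele count and frequency; these
    argument names are hypothetical, not HMEID INFO keys.
    """
    chrom_start = pos - 1          # 0-based start of the anchor base
    chrom_end = chrom_start + 1    # half-open end, giving a 1-bp block
    mei_class = SVTYPE_TO_CLASS[svtype]
    name = f"{mei_class}-{alt_count}"   # item label: class-altAlleleCount
    # Assumed rule: scale the frequency to 0-1000, round, and clamp.
    score = max(0, min(1000, round(alt_freq * 1000)))
    return (chrom, chrom_start, chrom_end, name, score)


if __name__ == "__main__":
    # Made-up record, for demonstration only.
    print(vcf_site_to_bed("chr21", 10_000_001, "LINE1", 57, 0.25))
```

<p>
Because the block always spans exactly the anchor base, a region query with
<tt>bigBedToBed</tt> returns an MEI whenever the range covers that base.
</p>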