4f8f8773bec66a9e993e9897e0b032c6e97dead8 max Fri May 15 10:12:29 2026 -0700

mei: add HMEID, SweGen, and euL1db subtracks

Three new MEI catalogues under the existing mei superTrack:

meiHmeid (hg38)
36,699 MELT MEIs from HMEID v1.1 (NyuWa+1KGP, 5,675 individuals, Niu et al. 2022, PMID 35212372). Site-level VCF; per-cohort and per-1KGP super-population AC/AN/AF; SVTYPE Alu/L1/SVA/HERVK.

meiSwegen (hg38 lifted)
18,090 MELT MEIs from the SweGen 1,000-sample Swedish cohort (Ameur 2017, PMID 28832569; Gardner 2017, PMID 28855259). Built on hg19, liftOver to hg38 (10 unmapped). tableBrowser off per SweGen distribution terms.

meiEul1db (hg19+hg38)
8,988 curated L1-HS insertion polymorphisms (MRIPs) from euL1db v1.00 (Mir 2015, PMID 25352549), aggregating 142,495 sample-level SRIPs across 32 published studies. Coloured by lineage (germline/somatic/mixed). Built on hg19, liftOver to hg38 (3 unmapped). Helman2014 used numeric chrom names (23=X, 24=Y) which are renamed during the build.

meiEul1dbRef (hg19+hg38)
1,540 reference-genome L1-HS copies catalogued by euL1db (companion to meiEul1db).

Single shared mei.ra (in human/) uses $D substitution so each stanza serves both assemblies where applicable.

refs #37524

diff --git src/hg/makeDb/trackDb/human/meiHmeid.html src/hg/makeDb/trackDb/human/meiHmeid.html
new file mode 100644
index 00000000000..183d6b9cf7b
--- /dev/null
+++ src/hg/makeDb/trackDb/human/meiHmeid.html
@@ -0,0 +1,151 @@
+

Description

+This track shows mobile element insertions (MEIs) from the +HMEID +database (v1.1), a catalogue of 36,699 non-reference MEIs called from +short-read whole-genome sequencing of 5,675 individuals. The cohort +combines 2,998 Chinese samples from the NyuWa dataset (~26.2× +coverage) with 2,677 samples from the 1000 Genomes Project (~7.4× +coverage), and the calls are reported against GRCh38. Each site is an +insertion of an Alu, L1 (LINE-1), SVA or HERV-K mobile element relative +to the reference. +

+ + + + + + + + +

Class	MEIs
Alu	26,553
L1	7,353
SVA	2,667
HERVK	126
Total	36,699

+ +

+For each MEI, the track reports the mobile element class, the insertion +length, the target-site duplication sequence, the MELT ASSESS quality +score, and allele counts / numbers / frequencies for the full cohort, +for NyuWa, for 1KGP, and for each of the five 1KGP super-populations +(AFR, AMR, EAS, EUR, SAS). +

+ +

Display Conventions and Configuration

+An insertion has zero length on the reference: it attaches between +two adjacent reference bases without replacing any of them. Following +the VCF convention used by HMEID and by the other MEI tracks in this +collection, each MEI is drawn as a 1-bp block sitting on the anchor +base — the reference base immediately to the left of the +insertion attachment point. The inserted mobile element itself is not +present in the reference and is therefore not drawn. The item label is +class-altAlleleCount. +

+ +

+Items are colored by element class: +

Alu — SINE (Short INterspersed Element)
L1 — LINE-1 (Long INterspersed Element-1)
SVA (SINE-VNTR-Alu) — composite retrotransposon
HERVK (Human Endogenous Retrovirus K) — endogenous retrovirus

+ +

+The score column encodes the cohort-wide alt-allele frequency on a +0-1000 scale. Filters allow restricting the displayed items by element +class, insertion length, allele frequency in the full cohort, allele +frequency within the NyuWa and 1KGP cohorts separately, and +MELT ASSESS quality score. The ASSESS score ranges from 0 to 5; HMEID +sites are pre-filtered to ASSESS ≥ 3, which requires at least one-sided +TSD evidence, with 5 representing the highest quality (TSD decided +from split reads).
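As a rough illustration of the score encoding described above, the sketch below maps an alt-allele count and allele number onto the 0-1000 scale. The function name, the rounding rule, and the handling of sites with no called alleles are assumptions made for this example; the actual build scripts may differ.

```python
def af_to_score(alt_count: int, allele_number: int) -> int:
    """Map an alt-allele frequency (AC / AN) onto the 0-1000 BED score scale."""
    if allele_number <= 0:
        # Assumption: a site with no called alleles gets score 0.
        return 0
    freq = alt_count / allele_number
    # Assumption: round to the nearest integer, then clamp to the BED score range.
    score = round(freq * 1000)
    return max(0, min(1000, score))


# One alt allele out of four called alleles is AF 0.25.
print(af_to_score(1, 4))  # prints 250
```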

+ +

Methods

+HMEID was built by Niu et al. (2022) from Illumina short-read whole-genome +sequencing of two cohorts: 2,999 individuals from the NyuWa dataset +(diabetes and control samples collected across China, median depth +~26.2× on GRCh38) and 2,691 samples from the 1000 Genomes Project +(~7.4× coverage, GRCh38-aligned CRAMs from EBI). Non-reference +MEIs were detected with MELT v2.1.5 in SPLIT mode with default +parameters; BAM coverage was estimated with goleft v0.1.8 covstats. +After the MELT MakeVCF step, sites were filtered to those that (i) lie +outside low-complexity regions, (ii) are genotyped in >25% of +individuals, (iii) have more than 2 split reads, (iv) carry a MELT +ASSESS score >3 as stated (strictly, ASSESS ≥ 4, although the released +HMEID callset retains ASSESS 3 sites that otherwise pass, so the +effective threshold is ASSESS ≥ 3) and (v) are marked +PASS in the FILTER column. Alu and L1 subfamilies were assigned by +MELT's CALU and LINEU modules. Of the input samples, 2,998 of 2,999 NyuWa samples and 2,677 +of 2,691 1KGP samples passed processing, yielding the final callset of +36,699 MEIs in 5,675 genomes. Allele frequencies were computed per +cohort and per 1KGP super-population with BCFtools v1.3.1. See +Niu et al. 2022 for full methodological details.

+ +

+The site-level VCF was downloaded from the +HMEID +download page (file +MEI.GRCh38.HMEIDv1.1.vcf.gz) and converted to bigBed following +the steps in the +makeDoc file. Conversion uses scripts in +src/hg/makeDb/scripts/mei: VCF-style positions +(1-based POS, anchor base) are converted to half-open BED coordinates +(chromStart = POS - 1, chromEnd = chromStart + 1), +MELT SVTYPE codes (ALU, LINE1, SVA, +HERVK) are mapped to the element class names used here, and +INFO fields are copied through to per-cohort and per-super-population +allele count / number / frequency columns. All 36,699 input records +produced one BED row each; no records were dropped. +
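The coordinate and class conversion described above can be sketched as follows. This is a minimal illustration, not the code in src/hg/makeDb/scripts/mei: the function name and the tuple layout are hypothetical, and the class names are taken from the table on this page.

```python
# SVTYPE codes named in the text, mapped to the class names used in the table above.
SVTYPE_TO_CLASS = {"ALU": "Alu", "LINE1": "L1", "SVA": "SVA", "HERVK": "HERVK"}


def vcf_site_to_bed(chrom: str, pos: int, svtype: str) -> tuple:
    """Convert a 1-based VCF anchor position to a half-open 1-bp BED interval."""
    if pos < 1:
        raise ValueError("VCF POS is 1-based and must be >= 1")
    chrom_start = pos - 1          # BED starts are 0-based
    chrom_end = chrom_start + 1    # half-open end, so the block covers one base
    return (chrom, chrom_start, chrom_end, SVTYPE_TO_CLASS[svtype])


# A LINE1 call anchored at VCF position 1000 covers BED interval [999, 1000).
print(vcf_site_to_bed("chr1", 1000, "LINE1"))
```

An unknown SVTYPE raises a KeyError here, which makes unexpected input fail loudly rather than silently dropping records.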

+ +

Data Access

+The data can be explored interactively in table format with the +Table Browser or the +Data Integrator and exported from +there as spreadsheets or tab-separated tables. From scripts, the data can +be accessed through our API with the parameter +track=meiHmeid.
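For scripted access, a request URL for the UCSC REST API might be assembled as in the sketch below. The helper names are hypothetical, the example region is arbitrary, and it is an assumption that the track is served by the public API endpoint under the name given above.

```python
import json
import urllib.request

# Public UCSC REST API endpoint for track data.
API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(genome: str, track: str, chrom: str, start: int, end: int) -> str:
    """Build a getData/track URL; parameters are joined with semicolons."""
    params = [("genome", genome), ("track", track),
              ("chrom", chrom), ("start", start), ("end", end)]
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)


def fetch_track(url: str) -> dict:
    """Fetch and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


url = build_track_url("hg38", "meiHmeid", "chr21", 0, 100000)
print(url)
```

Calling `fetch_track(url)` would then return the parsed JSON, provided the track is available on the server being queried.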

+For automated download and analysis, the genome annotation is stored in +a bigBed file that can be downloaded from + +our download server. The file for this track is called +hmeid.bb in /gbdb/hg38/mei/. Individual regions or the +whole genome annotation can be obtained using our tool +bigBedToBed, which can be compiled from the source code or +downloaded as a precompiled binary for your system. Instructions for +downloading source code and binaries can be found +here. +The tool can also be used to obtain features within a given range, e.g. +bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/mei/hmeid.bb -chrom=chr21 -start=0 -end=100000000 stdout. +

+The original annotation source data can be downloaded from the +HMEID +download page. +

+ +

Credits

+Thanks to Yiwei Niu, Shunmin He and colleagues at the Institute of +Biophysics, Chinese Academy of Sciences for building HMEID and +releasing the callset, and to the NyuWa project and the 1000 Genomes +Project for producing the underlying whole-genome sequencing data. +

+ +

References

+Niu Y, Teng X, Zhou H, Shi Y, Li Y, Tang Y, Zhang P, Luo H, Kang Q, Xu T et al. + +Characterizing mobile element insertions in 5675 genomes. +Nucleic Acids Res. 2022 Mar 21;50(5):2493-2508. +PMID: 35212372; PMC: PMC8934628 +