4f8f8773bec66a9e993e9897e0b032c6e97dead8 max Fri May 15 10:12:29 2026 -0700 mei: add HMEID, SweGen, and euL1db subtracks Three new MEI catalogues under the existing mei superTrack: meiHmeid (hg38) 36,699 MELT MEIs from HMEID v1.1 (NyuWa+1KGP, 5,675 individuals, Niu et al. 2022, PMID 35212372). Site-level VCF; per-cohort and per-1KGP super- population AC/AN/AF; SVTYPE Alu/L1/SVA/HERVK. meiSwegen (hg38 lifted) 18,090 MELT MEIs from the SweGen 1,000-sample Swedish cohort (Ameur 2017, PMID 28832569; Gardner 2017, PMID 28855259). Built on hg19, liftOver to hg38 (10 unmapped). tableBrowser off per SweGen distribution terms. meiEul1db (hg19+hg38) 8,988 curated L1-HS insertion polymorphisms (MRIPs) from euL1db v1.00 (Mir 2015, PMID 25352549), aggregating 142,495 sample-level SRIPs across 32 published studies. Coloured by lineage (germline/somatic/mixed). Built on hg19, liftOver to hg38 (3 unmapped). Helman2014 used numeric chrom names (23=X, 24=Y) which are renamed during the build. meiEul1dbRef (hg19+hg38) 1,540 reference-genome L1-HS copies catalogued by euL1db (companion to meiEul1db). Single shared mei.ra (in human/) uses $D substitution so each stanza serves both assemblies where applicable. refs #37524 diff --git src/hg/makeDb/trackDb/human/meiSwegen.html src/hg/makeDb/trackDb/human/meiSwegen.html new file mode 100644 index 00000000000..e628b81ed32 --- /dev/null +++ src/hg/makeDb/trackDb/human/meiSwegen.html @@ -0,0 +1,169 @@ +

Description

+This track shows mobile element insertions (MEIs) identified by MELT in the SweGen cohort of 1,000 Swedish whole-genome samples (Ameur et al. 2017). Each site is an insertion of an Alu, L1 (LINE-1), SVA or HERV-K mobile element relative to the reference genome. Short-variant frequencies for the same cohort are shown in the SweGen variant frequencies subtrack of the Variant Frequencies collection.


Class	MEIs
Alu	14,467
L1	2,429
SVA	1,131
HERVK	73
Total (GRCh37)	18,100
Total (after liftOver to hg38)	18,090


+For each MEI, the track reports the following fields:
+
+- the mobile element class;
+- the insertion length;
+- the MELT subfamily call (e.g. AluYa5, L1Ta);
+- the target-site duplication sequence;
+- the MELT ASSESS quality score;
+- the nearby gene context, when the insertion lies in or close to a gene;
+- the allele count (MELT_AN; despite its name, this field holds the number of allele observations, not the total allele number);
+- the alt-allele frequency;
+- the MELT FILTER status.
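The fields listed above fit a bed9+9 record: the nine standard BED columns plus nine extra columns. The sketch below parses one such line into a typed Python record. The names and column order of the extra fields are assumptions made for illustration. The authoritative layout is whatever meiSwegen.as defines.

```python
from dataclasses import dataclass


@dataclass
class MeiRecord:
    """One MEI item as a bed9+9 row (extra-column names/order are assumed)."""
    # Standard BED9 columns (0-based, half-open coordinates).
    chrom: str
    chrom_start: int
    chrom_end: int
    name: str            # class-altAlleleCount label
    score: int           # alt-allele frequency scaled to 0-1000
    strand: str
    thick_start: int
    thick_end: int
    item_rgb: str
    # Hypothetical extra columns, one per field described in the text.
    mei_class: str       # Alu, L1, SVA or HERVK
    ins_length: int      # insertion length in bp
    subfamily: str       # MELT subfamily call, e.g. AluYa5
    tsd: str             # target-site duplication sequence
    assess: int          # MELT ASSESS quality score (0-5)
    gene_context: str    # empty when the MEI is not in or near a gene
    melt_an: int         # MELT_AN allele count
    melt_af: float       # alt-allele frequency
    filter_status: str   # PASS or MELT filter code(s)


def parse_bed9_plus9(line: str) -> MeiRecord:
    """Parse one tab-separated bed9+9 line into a MeiRecord."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 18:
        raise ValueError(f"expected 18 columns, got {len(f)}")
    return MeiRecord(
        f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
        int(f[6]), int(f[7]), f[8],
        f[9], int(f[10]), f[11], f[12], int(f[13]), f[14],
        int(f[15]), float(f[16]), f[17],
    )
```

An empty gene-context column is kept as an empty string, because splitting on tabs preserves empty fields.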


Display Conventions and Configuration

+An insertion has zero length on the reference: it falls between two adjacent reference bases without replacing either of them. Following the convention used by MELT and by the other MEI tracks in this collection, each MEI is drawn as a 1-bp block on the anchor base, that is, the reference base immediately to the left of the insertion point. The inserted mobile element itself is absent from the reference and is therefore not drawn. Items are labeled class-altAlleleCount.
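The sketch below makes the anchor convention concrete. It converts an insertion position into the 1-bp BED interval that is drawn, and builds the item label. It assumes the position is a 1-based VCF POS pointing at the reference base just left of the insertion point, which is the usual VCF padding-base convention. The function names are hypothetical.

```python
def anchor_interval(vcf_pos: int) -> tuple[int, int]:
    """Return the 0-based, half-open BED interval covering the anchor base.

    Assumes vcf_pos is the 1-based position of the reference base immediately
    to the left of the insertion point, so the block spans exactly one base.
    """
    if vcf_pos < 1:
        raise ValueError("VCF positions are 1-based")
    return vcf_pos - 1, vcf_pos


def item_label(mei_class: str, alt_allele_count: int) -> str:
    """Build a class-altAlleleCount label such as 'Alu-12'."""
    return f"{mei_class}-{alt_allele_count}"


# An Alu inserted between 1-based bases 1000 and 1001, seen on 12 alleles:
start, end = anchor_interval(1000)
print(start, end, item_label("Alu", 12))  # 999 1000 Alu-12
```

The half-open interval [POS-1, POS) is exactly one base wide in BED coordinates. For that reason, the drawn block covers the anchor base rather than trying to represent the zero-length gap.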


+Items are colored by element class: +

Alu — SINE (Short INterspersed Element)
L1 — LINE-1 (Long INterspersed Element-1)
SVA (SINE-VNTR-Alu) — composite retrotransposon
HERVK (Human Endogenous Retrovirus K) — endogenous retrovirus


+The score column encodes the alt-allele frequency on a 0-1000 scale. Filters allow restricting items by element class, insertion length, allele frequency, MELT ASSESS quality score (0-5) and MELT FILTER status. The track keeps both PASS and non-PASS sites; non-PASS sites carry one of the MELT site-level filter codes:

s25 — more than 25% of samples have no data at the site
rSD — ratio of left-side to right-side discordant pairs is more than two standard deviations from the mean
hDP — more discordant pairs at the site are also split-read than expected
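The score encoding and a subset of the filters can be sketched as small predicates. The sketch makes three assumptions:

- The AF-to-score mapping rounds to the nearest integer. The track could equally truncate.
- FILTER values are split on semicolons, as in the VCF specification.
- The parameters of the hypothetical keep_site function stand in for the track's filter controls. The insertion-length filter is omitted.

```python
# MELT site-level filter codes described above.
MELT_FILTER_CODES = {
    "s25": "more than 25% of samples have no data at the site",
    "rSD": "left/right discordant-pair ratio is >2 SD from the mean",
    "hDP": "more discordant pairs are also split reads than expected",
}
ALL_CLASSES = frozenset({"Alu", "L1", "SVA", "HERVK"})


def af_to_score(af: float) -> int:
    """Scale an alt-allele frequency in [0, 1] to a 0-1000 score (rounded)."""
    if not 0.0 <= af <= 1.0:
        raise ValueError(f"allele frequency out of range: {af}")
    return round(af * 1000)


def filter_codes(filter_field: str) -> list[str]:
    """Return the failed filter codes; PASS yields an empty list.

    Any other value, including '.', is treated as non-PASS here.
    """
    if filter_field == "PASS":
        return []
    return filter_field.split(";")


def keep_site(mei_class: str, af: float, assess: int, filter_field: str,
              classes=ALL_CLASSES, min_af: float = 0.0,
              min_assess: int = 0, pass_only: bool = False) -> bool:
    """Mimic the track filters: class, minimum AF, minimum ASSESS, PASS-only."""
    if mei_class not in classes:
        return False
    if af < min_af or assess < min_assess:
        return False
    if pass_only and filter_codes(filter_field):
        return False
    return True
```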


Methods

+The SweGen project sequenced 1,000 Swedish individuals on the Illumina HiSeq X with 150 bp paired-end reads (Covaris E220 fragmentation, ~350 bp insert size) and aligned the reads to the GRCh37 reference with BWA-MEM v0.7.12. Mobile element insertions were called on all 1,000 samples with MELT v2.0.2 (Gardner et al. 2017) in MELT-Split mode, using the default ALU, HERVK, LINE1 and SVA mobile-element zip packages. Per-site allele counts and frequencies (MELT_AN and MELT_AF in the INFO column) were computed across the whole cohort; the VCF contains no per-sample genotype columns. Diana Ekman, Jesper Eisfeldt and Daniel Nilsson ran the analysis in early 2018 on the UPPMAX Bianca cluster, using the Perl SMELT pipeline (github.com/J35P312/SMELT).


+The site-level VCF MELT_SWEGEN.20180314.ALU_HERVK_LINE1_SVA.vcf was obtained from the SweGen download portal (swefreq.nbis.se/dataset/SweGen/download; access requires a brief approval). The VCF uses GRCh37 contig names without a "chr" prefix. The conversion performs these steps:
+
+- adds the "chr" prefix;
+- drops the VCF header;
+- maps the SVTYPE codes (ALU, LINE1, SVA, HERVK) to the element class names used here;
+- copies INFO fields through to the BED;
+- writes a bed9+9 file with 1-bp anchor intervals.
+
+The hg19 BED was then lifted to hg38 with UCSC liftOver (-tab -bedPlus=9), which mapped 18,090 of the 18,100 records; the 10 records that fell into regions deleted in hg38 were dropped. The lifted BED was sorted and converted to bigBed with the meiSwegen.as schema. The conversion and lift steps are documented in the makeDoc file, and the scripts live in src/hg/makeDb/scripts/mei.
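The conversion steps above can be sketched in Python as follows. This is not the script in src/hg/makeDb/scripts/mei; it is a reduced illustration with these assumptions:

- Only SVTYPE, MELT_AN and MELT_AF are named on this page. SVLEN, TSD and ASSESS are assumed MELT INFO keys.
- The output carries fewer extra columns than the real bed9+9 file.
- The itemRgb value is a hypothetical placeholder.

```python
# SVTYPE codes in the MELT VCF mapped to the class names used by the track.
SVTYPE_TO_CLASS = {"ALU": "Alu", "LINE1": "L1", "SVA": "SVA", "HERVK": "HERVK"}
# INFO keys copied to extra BED columns (an assumed, reduced subset).
EXTRA_INFO_KEYS = ("SVLEN", "TSD", "ASSESS", "MELT_AN", "MELT_AF")
PLACEHOLDER_RGB = "0,0,0"  # hypothetical; real per-class colors are not shown here


def parse_info(info: str) -> dict[str, str]:
    """Parse a VCF INFO column; flag keys map to an empty string."""
    out: dict[str, str] = {}
    for item in info.split(";"):
        if item and item != ".":
            key, _, value = item.partition("=")
            out[key] = value
    return out


def vcf_to_bed_rows(lines) -> list[list[str]]:
    """Convert site-level MELT VCF lines into sorted BED9-plus-extra rows.

    The first nine columns are standard BED, so liftOver -bedPlus=9 can
    carry the extra columns along unchanged. Non-primary contigs such as
    MT are not specially handled here.
    """
    rows = []
    for line in lines:
        if line.startswith("#"):          # drop meta-information and header lines
            continue
        f = line.rstrip("\n").split("\t")
        chrom, pos, filt = f[0], int(f[1]), f[6]
        info = parse_info(f[7])
        if not chrom.startswith("chr"):   # GRCh37 contigs lack the "chr" prefix
            chrom = "chr" + chrom
        start, end = pos - 1, pos         # 1-bp anchor base, 0-based half-open
        mei_class = SVTYPE_TO_CLASS[info["SVTYPE"]]
        score = round(float(info.get("MELT_AF", "0")) * 1000)
        name = f"{mei_class}-{info.get('MELT_AN', '0')}"
        rows.append([chrom, str(start), str(end), name, str(score), ".",
                     str(start), str(end), PLACEHOLDER_RGB, mei_class]
                    + [info.get(k, "") for k in EXTRA_INFO_KEYS] + [filt])
    # Sort by chrom, then numeric start (like sort -k1,1 -k2,2n).
    rows.sort(key=lambda r: (r[0], int(r[1])))
    return rows
```

The rows are sorted here for convenience. The lifted hg38 BED still has to be re-sorted before it is converted to bigBed, because liftOver can change the order of records.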


Why the original GRCh37 MELT VCF rather than the GRCh38 SVDB files

+The SweGen download portal also distributes an hg38 variant set (SweGen38_{ALU,L1,SVA,HERV}.vcf) for the same 1,001 samples, produced with SVDB after re-running on GRCh38. We chose to lift the original GRCh37 MELT VCF instead because the hg38 SVDB files contain 138,853 records (about 7.7× the MELT site count), roughly 60% of which are singletons (OCC=1) with no quality filter applied. They also drop most of the per-site annotation:
+
+- no MELT subfamily call (e.g. AluYa5, L1Ta);
+- no insertion length (SVLEN=0 everywhere);
+- no target-site duplication;
+- no MELT ASSESS quality score;
+- no gene context;
+- no FILTER stratification (every site is marked PASS).
+
+The GRCh37 MELT VCF, lifted to hg38, gives a much more informative and quality-filtered set, at the cost of the 10 records that fell into regions deleted in hg38.
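The comparison above rests on two counts for the SVDB files: the total number of records and the number of singletons. The sketch below shows how those could be tallied from the four SVDB VCFs. It assumes that OCC is stored as a plain KEY=VALUE entry in the INFO column. The function name is hypothetical, and the sketch was not used to produce the figures quoted above.

```python
def summarize_svdb(lines) -> tuple[int, int]:
    """Count records and OCC=1 singletons in SVDB VCF lines (header skipped)."""
    total = singletons = 0
    for line in lines:
        if line.startswith("#"):
            continue
        total += 1
        info = line.rstrip("\n").split("\t")[7]
        for item in info.split(";"):
            key, _, value = item.partition("=")
            if key == "OCC" and value == "1":
                singletons += 1
                break
    return total, singletons


# Counts quoted in the text above.
MELT_SITES_GRCH37 = 18_100
SVDB_RECORDS_HG38 = 138_853
print(f"{SVDB_RECORDS_HG38 / MELT_SITES_GRCH37:.1f}x")  # 7.7x
```

Running summarize_svdb over all four files in turn and summing the results gives the record and singleton totals. Dividing the singleton count by the record count gives the singleton fraction.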


Data Access

+Due to SweGen license restrictions, the underlying VCF and the bigBed derived from it cannot be redistributed from the UCSC Genome Browser. The Table Browser and download server are disabled for this track. To obtain the source data, follow the request procedure at the SweGen download portal.


Credits

+Thanks to Adam Ameur, Diana Ekman, Jesper Eisfeldt, Daniel Nilsson and the SweGen consortium for generating and releasing the MELT MEI callset, and to SciLifeLab for producing the underlying SweGen WGS data.


References


+Ameur A, Dahlberg J, Olason P, Vezzi F, Karlsson R, Martin M, Viklund J, Kähäri AK, Lundin P, Che H et al. SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population. Eur J Hum Genet. 2017 Nov;25(11):1253-1260. PMID: 28832569; PMC: PMC5765326


+Gardner EJ, Lam VK, Harris DN, Chuang NT, Scott EC, Pittard WS, Mills RE, 1000 Genomes Project Consortium, Devine SE. The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology. Genome Res. 2017 Nov;27(11):1916-1929. PMID: 28855259; PMC: PMC5668948