4f8f8773bec66a9e993e9897e0b032c6e97dead8 max Fri May 15 10:12:29 2026 -0700 mei: add HMEID, SweGen, and euL1db subtracks Three new MEI catalogues under the existing mei superTrack: meiHmeid (hg38) 36,699 MELT MEIs from HMEID v1.1 (NyuWa+1KGP, 5,675 individuals, Niu et al. 2022, PMID 35212372). Site-level VCF; per-cohort and per-1KGP super- population AC/AN/AF; SVTYPE Alu/L1/SVA/HERVK. meiSwegen (hg38 lifted) 18,090 MELT MEIs from the SweGen 1,000-sample Swedish cohort (Ameur 2017, PMID 28832569; Gardner 2017, PMID 28855259). Built on hg19, liftOver to hg38 (10 unmapped). tableBrowser off per SweGen distribution terms. meiEul1db (hg19+hg38) 8,988 curated L1-HS insertion polymorphisms (MRIPs) from euL1db v1.00 (Mir 2015, PMID 25352549), aggregating 142,495 sample-level SRIPs across 32 published studies. Coloured by lineage (germline/somatic/mixed). Built on hg19, liftOver to hg38 (3 unmapped). Helman2014 used numeric chrom names (23=X, 24=Y) which are renamed during the build. meiEul1dbRef (hg19+hg38) 1,540 reference-genome L1-HS copies catalogued by euL1db (companion to meiEul1db). Single shared mei.ra (in human/) uses $D substitution so each stanza serves both assemblies where applicable. refs #37524 diff --git src/hg/makeDb/trackDb/human/meiSwegen.html src/hg/makeDb/trackDb/human/meiSwegen.html new file mode 100644 index 00000000000..e628b81ed32 --- /dev/null +++ src/hg/makeDb/trackDb/human/meiSwegen.html @@ -0,0 +1,169 @@ +<h2>Description</h2> +<p> +This track shows <b>mobile element insertions (MEIs)</b> identified by +<a href="https://melt.igs.umaryland.edu/" target="_blank">MELT</a> +on the <a href="https://swefreq.nbis.se/dataset/SweGen" target="_blank">SweGen</a> +cohort of 1,000 Swedish whole-genome samples (Ameur et al. 2017). Each +site is an insertion of an Alu, L1 (LINE-1), SVA or HERV-K mobile +element relative to the reference. 
The SweGen short-variant frequency +data for the same cohort is shown in the +<a href="hgTrackUi?g=swegen">SweGen variant frequencies</a> subtrack of +the Variant Frequencies collection. +</p> + +<table class="stdTbl"> +<tr><th>Class</th><th>MEIs</th></tr> +<tr><td>Alu</td><td>14,467</td></tr> +<tr><td>L1</td><td>2,429</td></tr> +<tr><td>SVA</td><td>1,131</td></tr> +<tr><td>HERVK</td><td>73</td></tr> +<tr><th>Total (GRCh37)</th><th>18,100</th></tr> +<tr><th>Total (after liftOver to hg38)</th><th>18,090</th></tr> +</table> + +<p> +For each MEI, the track reports the mobile element class, the +insertion length, the MELT subfamily call (e.g. AluYa5, L1Ta), the +target-site duplication sequence, the MELT ASSESS quality score, +nearby gene context if the insertion lies in or close to a gene, the +allele count (MELT_AN; despite the name this is the number of allele +observations, not the allele number), the alt-allele frequency, and +the MELT FILTER status. +</p> + +<h2>Display Conventions and Configuration</h2> +<p> +An insertion has zero length on the reference: it attaches between +two adjacent reference bases without replacing any of them. Following +the convention used by MELT and by the other MEI tracks in this +collection, each MEI is drawn as a <b>1-bp block sitting on the anchor +base</b> — the reference base immediately to the left of the +insertion attachment point. The inserted mobile element itself is not +present in the reference and is therefore not drawn. The item label is +<tt>class-altAlleleCount</tt>. 
+</p> + +<p> +Items are colored by element class: +</p> +<ul> + <li><span style="display:inline-block;background-color:#0072B2;width:18px;height:12px;vertical-align:middle;"></span> <b>Alu</b> — SINE (Short INterspersed Element)</li> + <li><span style="display:inline-block;background-color:#D55E00;width:18px;height:12px;vertical-align:middle;"></span> <b>L1</b> — LINE-1 (Long INterspersed Element-1)</li> + <li><span style="display:inline-block;background-color:#009E73;width:18px;height:12px;vertical-align:middle;"></span> <b>SVA</b> (SINE-VNTR-Alu) — composite retrotransposon</li> + <li><span style="display:inline-block;background-color:#CC79A7;width:18px;height:12px;vertical-align:middle;"></span> <b>HERVK</b> (Human Endogenous Retrovirus K) — endogenous retrovirus</li> +</ul> + +<p> +The score column encodes the alt-allele frequency on a 0-1000 scale. +Filters allow restricting items by element class, insertion length, +allele frequency, MELT ASSESS quality score (0-5) and the MELT FILTER +status. The track keeps both PASS and non-PASS sites; non-PASS sites +carry one of the MELT site-level filter codes: +</p> +<ul> + <li><tt>s25</tt> — more than 25% of samples have no data at the site</li> + <li><tt>rSD</tt> — ratio of left-side to right-side discordant pairs is more than two standard deviations from the mean</li> + <li><tt>hDP</tt> — more discordant pairs at the site are also split-read than expected</li> +</ul> + +<h2>Methods</h2> +<p> +The SweGen project sequenced 1,000 Swedish individuals on Illumina +HiSeq X with 150 bp paired-end reads (Covaris E220 fragmentation, ~350 +bp insert), and aligned the reads to the GRCh37 reference with +BWA-MEM v0.7.12. Mobile element insertions were called by +<a href="https://melt.igs.umaryland.edu/" target="_blank">MELT</a> +v2.0.2 (Gardner et al. 2017) in MELT-Split mode using the default +ALU, HERVK, LINE1 and SVA mobile-element zip packages, on all 1,000 +samples. 
Per-site allele counts and frequencies (MELT_AN and MELT_AF +in INFO) were computed across the cohort; the VCF does not contain +per-sample genotype columns. The analysis used the Perl SMELT +pipeline (<a href="https://github.com/J35P312/SMELT" +target="_blank">github.com/J35P312/SMELT</a>) on the UPPMAX Bianca +cluster in early 2018, by Diana Ekman, Jesper Eisfeldt and Daniel +Nilsson. +</p> + +<p> +The site-level VCF +<tt>MELT_SWEGEN.20180314.ALU_HERVK_LINE1_SVA.vcf</tt> was obtained +from the SweGen download portal +(<a href="https://swefreq.nbis.se/dataset/SweGen/download" +target="_blank">swefreq.nbis.se/dataset/SweGen/download</a>, access +requires a brief approval). The VCF uses GRCh37 contigs without a +"chr" prefix; the conversion adds the prefix, drops the VCF +header, maps SVTYPE codes (<tt>ALU</tt>, <tt>LINE1</tt>, <tt>SVA</tt>, +<tt>HERVK</tt>) to the element class names used here, copies INFO +fields through to the BED, and writes a bed9+9 file with 1-bp anchor +intervals. The hg19 BED was then lifted to hg38 with UCSC +<tt>liftOver</tt> (<tt>-tab -bedPlus=9</tt>), which mapped 18,090 of +18,100 records; 10 records fell into hg38-deleted regions and were +dropped. The lifted BED was sorted and converted to bigBed using the +<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/scripts/mei/meiSwegen.as" +target="_blank">meiSwegen.as</a> schema. Conversion and lift steps are +documented in the +<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/mei.txt" +target="_blank">makeDoc file</a>; the scripts live in +<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/mei" +target="_blank">src/hg/makeDb/scripts/mei</a>. 
+</p> + +<h3>Why the original GRCh37 MELT VCF rather than the GRCh38 SVDB files</h3> +<p> +The SweGen download portal also distributes an hg38 variant set +(<tt>SweGen38_{ALU,L1,SVA,HERV}.vcf</tt>) for the same 1,001 samples, +produced with SVDB after re-running on GRCh38. We chose to lift the +original GRCh37 MELT VCF instead because the hg38 SVDB files +contain 138,853 records (about 7.7× the MELT site count), and +roughly 60% of those records are singletons (<tt>OCC=1</tt>) without +any quality filter. They also drop most of the per-site annotation: +no MELT subfamily call (e.g. AluYa5, L1Ta), no insertion length +(<tt>SVLEN=0</tt> everywhere), no target-site duplication, no MELT +ASSESS quality score, no gene context and no FILTER stratification +(every site is marked <tt>PASS</tt>). The GRCh37 MELT VCF, lifted to +hg38, gives a much more informative and quality-filtered set, at the +cost of 10 records that fell into hg38-deleted regions. +</p> + +<h2>Data Access</h2> +<p> +Due to SweGen license restrictions, the underlying VCF and the bigBed +derived from it cannot be redistributed from the UCSC Genome Browser. +The Table Browser and download server are disabled for this track. To +obtain the source data, follow the request procedure at the +<a href="https://swefreq.nbis.se/dataset/SweGen" target="_blank">SweGen +download portal</a>. +</p> + +<h2>Credits</h2> +<p> +Thanks to Adam Ameur, Diana Ekman, Jesper Eisfeldt, Daniel Nilsson +and the SweGen consortium for generating and releasing the MELT MEI +callset, and to SciLifeLab for producing the underlying SweGen WGS +data. +</p> + +<h2>References</h2> + + +<p> +Ameur A, Dahlberg J, Olason P, Vezzi F, Karlsson R, Martin M, Viklund J, Kähäri AK, Lundin P, Che H +<em>et al</em>. +<a href="https://doi.org/10.1038/ejhg.2017.130" target="_blank"> +SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish +population</a>. +<em>Eur J Hum Genet</em>. 2017 Nov;25(11):1253-1260. 
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/28832569" target="_blank">28832569</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5765326/" target="_blank">PMC5765326</a> +</p> + +<p> +Gardner EJ, Lam VK, Harris DN, Chuang NT, Scott EC, Pittard WS, Mills RE, 1000 Genomes Project +Consortium, Devine SE. +<a href="http://genome.cshlp.org/lookup/pmidlookup?view=long&amp;pmid=28855259" target="_blank"> +The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology</a>. +<em>Genome Res</em>. 2017 Nov;27(11):1916-1929. +PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/28855259" target="_blank">28855259</a>; PMC: <a +href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5668948/" target="_blank">PMC5668948</a> +</p> +
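Reviewer note: the VCF-to-BED conversion described in the Methods section of meiSwegen.html could look roughly like the sketch below. This is not the script under src/hg/makeDb/scripts/mei; the function names, the use of "." for strand, and the assumption that the VCF POS column already holds the 1-based anchor base are all hypothetical. It only illustrates the "chr" prefix, the SVTYPE mapping, the 1-bp anchor interval, the 0-1000 score and the class-altAlleleCount label, and it emits the nine standard BED columns without the nine extra fields.

```python
# Hypothetical sketch of the conversion step; not the actual UCSC build script.

# SVTYPE codes from the MELT VCF mapped to the class names used on the page.
SVTYPE_TO_CLASS = {"ALU": "Alu", "LINE1": "L1", "SVA": "SVA", "HERVK": "HERVK"}

# itemRgb values, converted from the hex colors listed on the page.
CLASS_TO_RGB = {
    "Alu": "0,114,178",      # #0072B2
    "L1": "213,94,0",        # #D55E00
    "SVA": "0,158,115",      # #009E73
    "HERVK": "204,121,167",  # #CC79A7
}


def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to ''."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def vcf_line_to_bed9(line):
    """Convert one tab-separated VCF data line to the first nine BED columns."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos = fields[0], int(fields[1])
    info = parse_info(fields[7])
    cls = SVTYPE_TO_CLASS[info["SVTYPE"]]
    # Assumption: POS is the 1-based anchor base, so the 0-based half-open
    # BED interval covering that single base is [POS-1, POS).
    start, end = pos - 1, pos
    # Alt-allele frequency scaled to the 0-1000 BED score range.
    score = min(1000, round(float(info["MELT_AF"]) * 1000))
    # Item label is class-altAlleleCount; the page says MELT_AN holds the count.
    name = f"{cls}-{info['MELT_AN']}"
    return ["chr" + chrom, str(start), str(end), name, str(score),
            ".", str(start), str(end), CLASS_TO_RGB[cls]]
```

For example, a record on contig 1 at POS 10000 with SVTYPE=ALU, MELT_AN=500 and MELT_AF=0.25 would become a BED item on chr1 spanning 9999-10000, labelled Alu-500, with score 250.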