0c1e751423b38dd741875d4cdcc6ffb5d4c4a135 max Tue May 12 07:51:34 2026 -0700 mei: add DeepMEI 1000G subtrack on hg38 91,617 MEIs (68,282 Alu, 16,891 L1, 6,444 SVA) called by DeepMEI on the 3,202 high-coverage 1000 Genomes samples. Same 1-bp anchor convention and Okabe-Ito colors as meiHgsvc3. DeepMEI's symbolic ALT carries no inserted sequence or insertion length, so the bigBed schema is a subset of meiHgsvc3 (no svLen, callerCount, validation flags, insertSeq). Also fixes the INS-svLen:carrierCount label format note in meiHgsvc3.html. refs #37524 diff --git src/hg/makeDb/trackDb/human/meiDeepmei1kg.html src/hg/makeDb/trackDb/human/meiDeepmei1kg.html new file mode 100644 index 00000000000..49e826c3f85 --- /dev/null +++ src/hg/makeDb/trackDb/human/meiDeepmei1kg.html @@ -0,0 +1,141 @@ +
+This track shows mobile element insertions (MEIs) called by +DeepMEI +on the 3,202 high-coverage 1000 Genomes Project samples (NYGC +re-sequencing) aligned to GRCh38. At each site, at least one of the +3,202 samples carries a non-reference insertion of an Alu, L1 (LINE-1) +or SVA mobile element. DeepMEI is a convolutional neural-network +caller that scans short-read alignments for the read-pair, split-read +and clipping signatures of a new insertion and classifies each +candidate site as Alu, L1 or SVA. +
+ +

| Class | MEIs |
|---|---|
| Alu | 68,282 |
| L1 | 16,891 |
| SVA | 6,444 |
| Total | 91,617 |
+For each MEI, the track lists the element class, the alt-allele count, +allele number and allele frequency across the 3,202 samples, the number +of carrier samples, and the list of carrier sample IDs. +
+ ++An insertion has zero length on the reference: it attaches between +two adjacent reference bases without replacing either of them. Following +the VCF convention used by DeepMEI and by the long-read SV +and MEI tracks, each MEI is drawn as a 1-bp block on the +anchor base, that is, the reference base immediately to the left of +the insertion attachment point. The inserted mobile element itself is +not present in the reference and is therefore not drawn; the source +VCF uses a symbolic ALT (e.g. <INS:ME:ALU>) and does not +report the inserted sequence or its exact length, so neither is shown +on this track. The item label is INS-class-carrierCount. +
++Items are colored by element class: +
++The score column encodes the alt-allele frequency on a 0-1000 scale. +Filters restrict the display by element class, allele frequency +and carrier count. +
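The score encoding described above can be sketched as follows. This is an illustration, not the conversion script itself: the function name `af_to_score` is hypothetical, and rounding to the nearest integer is an assumption (the actual script may truncate instead).

```python
def af_to_score(alt_count: int, allele_number: int) -> int:
    """Scale the alt-allele frequency AC/AN to an integer BED score in 0..1000.

    Rounding to the nearest integer is an assumption; the track's own
    conversion may truncate instead.
    """
    if allele_number <= 0:
        # No called alleles at this site: no frequency to encode.
        return 0
    af = alt_count / allele_number
    # Clamp so that the result always fits the BED score range.
    return max(0, min(1000, round(af * 1000)))


if __name__ == "__main__":
    # 64 alt alleles out of 6,404 haplotypes is an allele frequency of about 1%.
    print(af_to_score(64, 6404))
```

Under this scheme a singleton (1 alt allele in 6,404) would map to a score of 0, so score-based filtering cannot separate very rare insertions; the allele frequency and carrier count filters are the better choice for those.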
+ ++DeepMEI is a deep convolutional neural network that detects +non-reference mobile element insertions from short-read whole-genome +sequencing. For every candidate site supported by an anomalous +read-pair, split-read or soft-clip signature, the surrounding alignment +pile-up is encoded as an image and passed through a CNN that classifies +the site as Alu, L1, SVA or background. The model was trained on +labelled MEIs from the 1000 Genomes phase 3 callset and orthogonal +long-read truth sets. For this track, DeepMEI was run on the +high-coverage (~30×) Illumina re-sequencing of all 3,202 +1000 Genomes Project samples produced by the New York Genome Center +(NYGC), giving 6,404 haplotypes (2 × 3,202) per autosomal site. See Xu et al. 2023 (bioRxiv) +for full methodological details. +
+ ++The original VCF was downloaded from the DeepMEI GitHub repository +(file merge_1000g.latested.vcf.gz in +DeepMEI/1000g_high_callset/) and converted to +bigBed following the steps described in the +makeDoc file. The conversion uses scripts in +src/hg/makeDb/scripts/mei: VCF positions +(1-based POS, anchor base) are converted to 0-based, half-open BED coordinates +(chromStart = POS - 1, chromEnd = chromStart + 1), +per-sample genotypes are tallied across the 3,202 samples, and items +are colored by mobile element class. +
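A minimal sketch of the coordinate conversion and genotype tally described above. It is not the code in src/hg/makeDb/scripts/mei: the function names, the returned dictionary keys and the sample IDs in the usage example are hypothetical, and excluding missing alleles (`.`) from the allele number is an assumption based on common VCF practice.

```python
def vcf_pos_to_bed(pos: int) -> tuple[int, int]:
    """Convert a 1-based VCF POS (the anchor base) to a 0-based, half-open BED interval."""
    chrom_start = pos - 1
    chrom_end = chrom_start + 1  # 1-bp block covering the anchor base
    return chrom_start, chrom_end


def tally_genotypes(genotypes: dict[str, str]) -> dict:
    """Tally alt-allele count, allele number, frequency and carriers from GT strings.

    `genotypes` maps sample ID to a VCF GT string such as "0/1" or "1|1".
    Missing alleles (".") are not counted towards the allele number (assumption).
    """
    alt_count = 0
    allele_number = 0
    carriers = []
    for sample, gt in genotypes.items():
        # Treat phased ("|") and unphased ("/") genotypes the same way.
        alleles = gt.replace("|", "/").split("/")
        called = [a for a in alleles if a != "."]
        n_alt = sum(1 for a in called if a != "0")
        allele_number += len(called)
        alt_count += n_alt
        if n_alt > 0:
            carriers.append(sample)
    allele_freq = alt_count / allele_number if allele_number else 0.0
    return {
        "alt_count": alt_count,
        "allele_number": allele_number,
        "allele_freq": allele_freq,
        "carrier_count": len(carriers),
        "carriers": carriers,
    }


if __name__ == "__main__":
    # Made-up genotypes for four placeholder samples.
    example = {"S1": "0/1", "S2": "1|1", "S3": "0/0", "S4": "./."}
    print(vcf_pos_to_bed(1000))
    print(tally_genotypes(example))
```

With all 3,202 samples called as diploid, the allele number comes out as 6,404, matching the haplotype count given in the Methods text.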
+ ++The data can be explored interactively in table format with the +Table Browser or the +Data Integrator and exported from +there to spreadsheet or tab-separated tables. From scripts, the data can +be accessed through our API with the parameter +track=meiDeepmei1kg. +
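For scripted access, a request to the UCSC REST API might look like the sketch below. The `getData/track` endpoint and its `genome`, `track`, `chrom`, `start` and `end` parameters come from the public API documentation; the helper names `build_track_url` and `fetch_track` are hypothetical, and the layout of the returned JSON is not assumed here.

```python
import json
import urllib.request

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def build_track_url(chrom: str, start: int, end: int,
                    genome: str = "hg38", track: str = "meiDeepmei1kg") -> str:
    """Build a getData/track URL for one region of the track."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    # The API documentation separates parameters with semicolons.
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)


def fetch_track(chrom: str, start: int, end: int) -> dict:
    """Fetch one region and return the decoded JSON response (needs network access)."""
    with urllib.request.urlopen(build_track_url(chrom, start, end)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_track() to actually query the server.
    print(build_track_url("chr21", 0, 100000000))
```

Building the URL separately from the network call makes it easy to check the request in a browser before running it from a script.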
++For automated download and analysis, the genome annotation is stored in +a bigBed file that can be downloaded from + +our download server. The file for this track is called +deepmei1kg.bb in /gbdb/hg38/mei/. Individual regions +or the whole genome annotation can be obtained using our tool +bigBedToBed, which can be compiled from the source code or +downloaded as a precompiled binary for your system. Instructions for +downloading source code and binaries can be found +here. +The tool can also be used to obtain features within a given range, e.g. +bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/mei/deepmei1kg.bb -chrom=chr21 -start=0 -end=100000000 stdout. +
++The original annotation source data can be downloaded from the +DeepMEI GitHub repository. +
+ ++Thanks to Xiaofei Xu, Fengxiao Bu and colleagues for developing +DeepMEI and releasing the 1000 Genomes MEI callset, and to the New +York Genome Center for producing the underlying high-coverage +1000 Genomes re-sequencing data. +
+ ++Xu X, Huang Y, Wang X, Cheng J, Yuan H, Bu F. + +Identification of mobile element insertion from whole genome sequencing +data using deep neural network model. +bioRxiv. 2023 Mar 8. doi:10.1101/2023.03.07.531451. +
++Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, Corvelo A, Clarke WE, Musunuri R, +Nagulapalli K et al. + +High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 +trios. +Cell. 2022 Sep 1;185(18):3426-3440.e19. +PMID: 36055201; PMC: PMC9439720 +