0c1e751423b38dd741875d4cdcc6ffb5d4c4a135
max
  Tue May 12 07:51:34 2026 -0700
mei: add DeepMEI 1000G subtrack on hg38

91,617 MEIs (68,282 Alu, 16,891 L1, 6,444 SVA) called by DeepMEI
on the 3,202 high-coverage 1000 Genomes samples. Same 1-bp anchor
convention and Okabe-Ito colors as meiHgsvc3. DeepMEI's symbolic
ALT carries no inserted sequence or insertion length, so the
bigBed schema is a subset of meiHgsvc3 (no svLen, callerCount,
validation flags, insertSeq). Also fixes the INS-svLen:carrierCount
label format note in meiHgsvc3.html. refs #37524

diff --git src/hg/makeDb/trackDb/human/meiHgsvc3.html src/hg/makeDb/trackDb/human/meiHgsvc3.html
index 4be1fd837ac..36341cbb9db 100644
--- src/hg/makeDb/trackDb/human/meiHgsvc3.html
+++ src/hg/makeDb/trackDb/human/meiHgsvc3.html
@@ -1,158 +1,158 @@
 <h2>Description</h2>
 <p>
 This track shows <b>mobile element insertions (MEIs)</b> identified
 in the Human Genome Structural Variation Consortium phase 3 (HGSVC3)
 callset. These insertions were detected in 65 long-read assembled
 samples relative to the reference assembly: at each site, at least one
 of the 65 samples carries an inserted mobile element that is absent
 from the reference. Two parallel callsets were released against the
 two human reference assemblies:
 </p>
 
 <table class="stdTbl">
 <tr>
   <th>Class</th>
   <th>GRCh38 / hg38</th>
   <th>T2T-CHM13 / hs1</th>
 </tr>
 <tr><td>Alu</td><td>10,270</td><td>10,458</td></tr>
 <tr><td>L1</td><td>1,604</td><td>1,664</td></tr>
 <tr><td>SVA</td><td>764</td><td>791</td></tr>
 <tr><td>HERVK</td><td>3</td><td>5</td></tr>
 <tr><td>snRNA</td><td>1</td><td>1</td></tr>
 <tr><th>Total</th><th>12,642</th><th>12,919</th></tr>
 </table>
 
 <p>
 For each MEI, the track lists the element class and family, the length
 of the inserted sequence, the discovery sample, the number of
 haplotypes/samples carrying the insertion, the alt-allele frequency,
 the number of MEI callers (out of two) that supported the call, separate
 L1ME-AID and PALMER validation flags, overlap with reference
 segmental duplications and tandem repeats, and the full DNA sequence
 of the inserted mobile element (the ALT allele minus the anchor base).
 </p>
 
 <h2>Display Conventions and Configuration</h2>
 <p>
 An insertion has zero length on the reference: it attaches between two
 adjacent reference bases without replacing any of them. Following the
 VCF convention used by the underlying PAV calls and by the other
 long-read SV tracks, each MEI is drawn as a <b>1-bp block sitting on the
 anchor base</b> &mdash; the reference base immediately to the left of
 the insertion attachment point. The inserted mobile element itself is
 not present in the reference and is therefore not drawn; its length is
 reported on the detail page (svLen) and in the item label
-(<tt>INS-svLen-carrierCount</tt>). bigBed does support truly zero-width
+(<tt>INS-svLen:carrierCount</tt>). bigBed does support truly zero-width
 features between nucleotides, but for consistency with the
 <a href="hgTrackUi?g=lrSv">long-read SV tracks</a>, this track uses the
 1-bp anchor representation instead.
 </p>
 <p>
 Items are colored by element class:
 </p>
 <ul>
   <li><span style="display:inline-block;background-color:#0072B2;width:18px;height:12px;vertical-align:middle;"></span> <b>Alu</b> &mdash; SINE (Short INterspersed Element)</li>
   <li><span style="display:inline-block;background-color:#D55E00;width:18px;height:12px;vertical-align:middle;"></span> <b>L1</b> &mdash; LINE-1 (Long INterspersed Element-1)</li>
   <li><span style="display:inline-block;background-color:#009E73;width:18px;height:12px;vertical-align:middle;"></span> <b>SVA</b> (SINE-VNTR-Alu) &mdash; composite retrotransposon</li>
   <li><span style="display:inline-block;background-color:#CC79A7;width:18px;height:12px;vertical-align:middle;"></span> <b>HERVK</b> (Human Endogenous Retrovirus K) &mdash; endogenous retrovirus</li>
   <li><span style="display:inline-block;background-color:#000000;width:18px;height:12px;vertical-align:middle;"></span> <b>snRNA</b> &mdash; small nuclear RNA</li>
 </ul>
 <p>
 The score column encodes the alt-allele frequency on a 0-1000 scale.
 Filters allow restricting to specific element classes, length ranges,
 allele frequency, carrier counts, supporting callers, validation status,
 and reference repeat overlap.
 </p>
 
 <h2>Methods</h2>
 <p>
 The HGSVC3 study sequenced and de novo assembled 65 individuals
 (30 males, 35 females) representing five continental groups and 28
 populations: 30 of African, 9 of Admixed American, 8 of European, 10 of
 East Asian and 8 of South Asian descent, with three parent-child trios
 included. Each sample was sequenced to ~47-fold coverage of PacBio HiFi
 and ~56-fold coverage of Oxford Nanopore long reads (~36-fold
 ultra-long), and complemented with Strand-seq, Bionano optical mapping,
 Hi-C, Iso-Seq and RNA-seq. Haplotype-resolved diploid assemblies were
 produced and structural variants called with PAV. Mobile element
 insertions were identified from the union of two independent MEI
 callsets, L1ME-AID and PALMER; all single-caller calls were manually
 curated. Orthogonal validation against an independent MELT-LRA callset
 showed an average concordance of 90.8% on GRCh38. Roughly 93% of MEIs
 are supported by both callers; the remaining single-caller calls split
 about 6:1 in favour of PALMER (PALMER-only ~6%, L1ME-AID-only ~1%).
 The Caller Count, PALMER Validated and L1ME-AID Validated filters can
 be used to restrict the display to the high-confidence dual-validated
 subset. Calls are restricted to non-low-confidence regions
 (i.e. excluding Yq12 and centromeres).
 For each site, per-sample genotypes from all 65 assembled samples are
 summarized into an alt-allele count, allele number, allele frequency
 and a list of carrier samples. See Logsdon et al. 2025 (Nature) for
 full methodological details.
 </p>
 
 <p>
 The original CSVs were downloaded from the
 <a href="https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/Mobile_Elements/1.0/"
 target="_blank">HGSVC3 Mobile Elements release directory</a>
 (files <tt>MEI_Callset_GRCh38.ALL.20241211.csv.gz</tt> and
 <tt>MEI_Callset_T2T-CHM13.ALL.20241211.csv.gz</tt>) and converted to
 bigBed following the steps described in the
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/mei.txt"
 target="_blank">makeDoc file</a>. Conversion uses scripts in
 <a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/mei"
 target="_blank">src/hg/makeDb/scripts/mei</a>: VCF-style positions
 (1-based POS, anchor base) are converted to half-open BED coordinates
 (<tt>chromStart = POS - 1</tt>, <tt>chromEnd = chromStart + 1</tt>),
 genotypes are tallied across the 65 samples, and items are colored by
 mobile element class.
 </p>
 
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively in table format with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a> and exported from
 there to spreadsheet or tab-separated tables. From scripts, the data can
 be accessed through our <a href="https://api.genome.ucsc.edu">API</a>,
 track=<i>meiHgsvc3</i>.
 </p>
 <p>
 For automated download and analysis, the genome annotation is stored in
 a bigBed file that can be downloaded from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/mei/" target="_blank">
 our download server</a>.  The file for this track is called
 <tt>hgsvc3.bb</tt> in <tt>/gbdb/hg38/mei/</tt> (GRCh38) or
 <tt>/gbdb/hs1/mei/</tt> (T2T-CHM13). Individual regions or the whole
 genome annotation can be obtained using our tool <tt>bigBedToBed</tt>,
 which can be compiled from the source code or downloaded as a
 precompiled binary for your system. Instructions for downloading source
 code and binaries can be found
 <a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads">here</a>.
 The tool can also be used to obtain features within a given range, e.g.
 <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/mei/hgsvc3.bb -chrom=chr21 -start=0 -end=100000000 stdout</tt>.
 </p>
 <p>
 The original annotation source data can be downloaded from the
 <a href="https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/HGSVC3/release/Mobile_Elements/1.0/"
 target="_blank">HGSVC3 1000 Genomes FTP site</a>.
 </p>
 
 <h2>Credits</h2>
 <p>
 Thanks to the Human Genome Structural Variation Consortium phase 3
 (HGSVC3) for releasing the underlying assemblies and MEI callsets used
 to produce this track.
 </p>
 
 <h2>References</h2>
 <p>
 Logsdon GA, Ebert P, Audano PA, Loftus M, Porubsky D, Ebler J, Yilmaz F, Hallast P, Prodanov T, Yoo
 D <em>et al</em>.
 <a href="https://doi.org/10.1038/s41586-025-09140-6" target="_blank">
 Complex genetic variation in nearly complete human genomes</a>.
 <em>Nature</em>. 2025 Aug;644(8076):430-441.
 PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/40702183" target="_blank">40702183</a>; PMC: <a
 href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12350169/" target="_blank">PMC12350169</a>
 </p>
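
For reference, the Methods text in meiHgsvc3.html describes converting a VCF-style insertion (1-based POS on the anchor base) to a half-open BED interval with chromStart = POS - 1 and chromEnd = chromStart + 1, labeling items as INS-svLen:carrierCount, and storing the alt-allele frequency as a 0-1000 score. A minimal Python sketch of that mapping follows. It is not the code in src/hg/makeDb/scripts/mei: the function name, the argument names, and the rounding used for the score are assumptions made for illustration.

```python
def vcf_ins_to_bed(chrom, pos, sv_len, carrier_count, alt_count, allele_number):
    """Map one insertion call to (chrom, chromStart, chromEnd, name, score).

    Hypothetical helper; argument names are illustrative only.
    pos is the 1-based VCF POS of the anchor base.
    """
    chrom_start = pos - 1           # 1-based anchor -> 0-based start
    chrom_end = chrom_start + 1     # 1-bp block sitting on the anchor base
    # Assumption: the score is the allele frequency rounded onto 0-1000.
    if allele_number:
        score = round(1000 * alt_count / allele_number)
    else:
        score = 0
    name = f"INS-{sv_len}:{carrier_count}"
    return (chrom, chrom_start, chrom_end, name, score)


# Example: anchor at chr21:1000, 311-bp insertion, 13 alt alleles out of 130.
row = vcf_ins_to_bed("chr21", 1000, 311, 13, 13, 130)
print(row)  # ('chr21', 999, 1000, 'INS-311:13', 100)
```

Whether carrierCount in the label counts samples or haplotypes is not stated in the page text, so the sketch takes it as a separate argument rather than deriving it from the allele count.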