65091fe6f6487c23d650a144e947fc1c582d3f40 max Tue Apr 21 02:16:16 2026 -0700

abelSv: move under lrSv supertrack as short-read comparison subtrack

Move the Abel et al. 2020 CCDG 17,795-genome SV callset from a top-level hg38 track to a subtrack of the lrSv supertrack (parallel to onekg3202Sr) and relabel shortLabel/longLabel to flag Illumina short-read provenance. The same bigBed is now visible on hg38 in the long-read SV browsing context.

Also:
- Clarify abelSv.html variant counts: 738,624 upstream unique SVs across both callsets, 737,998 after B37->hg38 liftOver (626 unmapped). B38=458,106, B37lift=279,892.
- lrSv.html: fix triple-slash https:/// in the Ebert et al. Science reference URL.
- bigBed.html: add closing on the extra-fields pipe-separator bullet and tighten a comma in the same sentence.

refs #36258, refs #37376

diff --git src/hg/makeDb/trackDb/human/hg38/abelSv.html src/hg/makeDb/trackDb/human/abelSv.html
similarity index 91%
rename from src/hg/makeDb/trackDb/human/hg38/abelSv.html
rename to src/hg/makeDb/trackDb/human/abelSv.html
index 82cd0ab81b5..7d5913fffb5 100644
--- src/hg/makeDb/trackDb/human/hg38/abelSv.html
+++ src/hg/makeDb/trackDb/human/abelSv.html
@@ -1,141 +1,145 @@

Description

Structural variants (SVs) are large changes in DNA — deletions, duplications, inversions, insertions of mobile elements, and translocations — that are at least 50 base pairs in size. They are a major source of genetic variation between individuals and can affect gene dosage, disrupt coding sequence, or rearrange regulatory elements. Because SVs are harder to detect than small variants, population-scale SV maps lag behind single-nucleotide variant resources.

-This track displays site-frequency data for 738,624 SVs identified in 17,795
-deeply sequenced human genomes (mean coverage > 20×) by
+This track displays site-frequency data for 737,998 SVs identified in 17,795
+deeply sequenced human genomes (mean coverage > 20×) by Illumina
+short-read sequencing by
Abel et al., Nature 2020. The samples were sequenced by the four sequencing centers of the NHGRI Centers for Common Disease Genomics (CCDG) program, supplemented with ancestrally diverse samples from the PAGE consortium and the Simons Genome Diversity Project. The composition includes roughly 24% African, 16% Latino, 11% Finnish, 39% non-Finnish European, and 9% other ancestries.

-Two non-overlapping public callsets are displayed as a single track:

+Two non-overlapping public callsets are combined into this track.
+The upstream release contains 738,624 unique primary SV records across
+the two callsets; 626 B37 records did not lift over to GRCh38, leaving
+the 737,998 shown here:

-B38 (native GRCh38): 14,623 samples, called directly on the
-GRCh38 assembly.
-B37lift (GRCh37, lifted): 8,417 samples originally called on
-GRCh37, with coordinates lifted to GRCh38 using the standard UCSC
-hg19→hg38 liftOver chain. 626 variants could not be lifted.
+B38 (native GRCh38): 458,106 SVs from 14,623 samples, called
+directly on the GRCh38 assembly.
+B37lift (GRCh37, lifted): 279,892 SVs from 8,417 samples
+originally called on GRCh37, with coordinates lifted to GRCh38
+using the standard UCSC hg19→hg38 liftOver chain.
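
The revised counts can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in this description; the variable names are illustrative and are not taken from the track build scripts.

```python
# Counts quoted in the track description; variable names are illustrative only.
upstream_unique_svs = 738_624  # unique primary SV records across both callsets
unmapped_b37 = 626             # B37 records that did not lift over to GRCh38
b38_svs = 458_106              # native GRCh38 callset
b37lift_svs = 279_892          # GRCh37 callset after liftOver

# Records that remain after dropping the unmapped B37 variants.
displayed = upstream_unique_svs - unmapped_b37

# The per-callset counts should add up to the displayed total.
assert displayed == b38_svs + b37lift_svs
print(displayed)  # 737998
```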

Important: the B38 and B37 callsets share 5,245 samples. When inspecting a variant present in both callsets, users should not simply sum the allele counts; the AC/AN reported for each callset reflects that callset's sample set. The callset filter can be used to restrict display to one source.
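
A small sketch may help show why the counts should not be pooled. The sample numbers come from this description; the allele counts are made-up illustration values, and the allele numbers assume a fully genotyped autosomal site (two alleles per sample), which is a simplification.

```python
# Sample counts quoted in the track description.
b38_samples = 14_623
b37_samples = 8_417
shared_samples = 5_245

# Inclusion-exclusion recovers the number of distinct genomes.
unique_samples = b38_samples + b37_samples - shared_samples
print(unique_samples)  # 17795

# Hypothetical AC/AN for one variant present in both callsets (not real data).
# AN = 2 * samples assumes an autosomal site with every sample genotyped.
callsets = {
    "B38": {"AC": 30, "AN": 2 * b38_samples},
    "B37lift": {"AC": 18, "AN": 2 * b37_samples},
}

# Report one allele frequency per callset instead of a pooled value.
af_by_callset = {name: rec["AC"] / rec["AN"] for name, rec in callsets.items()}

# Summing AN across callsets counts the shared samples' chromosomes twice.
naive_an = sum(rec["AN"] for rec in callsets.values())
double_counted = naive_an - 2 * unique_samples
print(double_counted)  # 10490
```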

Display conventions

Items are colored by SV type:

DEL deletion
DUP duplication
INV inversion
MEI mobile-element insertion or MEI-derived deletion
BND breakend / translocation

Deletions, duplications, inversions, and mobile-element variants are drawn as intervals spanning from the variant start to its end. Breakend (BND) records are drawn as single-base items at the variant breakpoint; the mate chromosome and position are shown on the details page for each BND. Each BND pair from LUMPY is shown only once (the secondary mate record is suppressed).

Filters

The following filters are available from the track configuration page:

SV type — any combination of DEL, DUP, INV, MEI, BND.
Callset — B38 native, B37lift, or both.
Filter — PASS (high confidence) and/or LOW (low confidence, as flagged by the authors based on Mendelian-error rate).
Allele frequency (AF), Allele count (AC), SV length, and Mean sample quality (MSQ).

Per-population allele counts and numbers are shown on the details page for 8 ancestry groups: AFR (African), AMR (Latino/Admixed-American), NFE (non-Finnish European), FE (Finnish European), EAS (East-Asian), SAS (South-Asian), PI (Pacific Islander), and Other.

Methods

The authors used their open-source svtools pipeline to jointly call SVs across all samples. Per-sample calls were produced with LUMPY (v0.2.13), CNVnator (v0.3.3), and svtyper (v0.1.4); calls were merged across samples and refined with svtools. Low- and high-confidence variants were distinguished using a Mendelian-error cutoff on mean sample quality, calibrated against a set of 409 CEPH trios. Per-sample validation was performed against a PacBio long-read truth set derived from three HGSVC samples.

For this UCSC track, VCF INFO fields were parsed and converted to BED9+ format. Variants originally called on GRCh37 (B37 callset) were lifted to GRCh38 using the UCSC hg19ToHg38.over.chain.gz chain. See the track build documentation for full details.

Data Access

The data can be explored interactively in table format with the Table Browser or the Data Integrator and exported from there to spreadsheet or tab-sep tables. From scripts, the data can be accessed through our API, track=abelSv.
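
As a minimal sketch of scripted access, the snippet below builds a request URL for the UCSC REST API `getData/track` endpoint. The helper names are hypothetical, the track name `abelSv` is taken from the text above (it is an assumption that the subtrack is still addressed by that name), and no assumption is made about the layout of the returned JSON.

```python
import json
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def abelsv_api_url(chrom, start, end, genome="hg38", track="abelSv"):
    """Build a getData/track URL for one region (hypothetical helper)."""
    params = [
        ("genome", genome),
        ("track", track),
        ("chrom", chrom),
        ("start", start),
        ("end", end),
    ]
    # The UCSC API examples separate parameters with semicolons.
    return API_BASE + "?" + ";".join(f"{key}={value}" for key, value in params)


def fetch_abelsv(chrom, start, end):
    """Fetch and decode the JSON reply; requires network access."""
    with urlopen(abelsv_api_url(chrom, start, end)) as response:
        return json.load(response)


print(abelsv_api_url("chr21", 0, 100000))
```

Calling `fetch_abelsv("chr21", 0, 100000)` would issue the request; inspect the decoded reply before relying on any particular key.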

For automated download and analysis, the annotation is stored in a bigBed file that can be downloaded from our download server. The file for this track is called abelSv.bb. Individual regions or the whole genome annotation can be obtained using our tool bigBedToBed, which can be compiled from the source code or downloaded as a precompiled binary for your system. Instructions for downloading source code and binaries can be found here. The tool can also be used to obtain features within a given range, e.g. bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/abelSv/abelSv.bb -chrom=chr21 -start=0 -end=100000000 stdout
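
For pipelines that wrap the command line tool, the example invocation above can be assembled from Python. This is only a sketch: the helper name is hypothetical, and it assumes `bigBedToBed` is installed and on the `PATH` when the command is actually run.

```python
import shlex

ABELSV_BB = "http://hgdownload.soe.ucsc.edu/gbdb/hg38/abelSv/abelSv.bb"


def bigbed_to_bed_cmd(chrom, start, end, url=ABELSV_BB, out="stdout"):
    """Return the argument list for a region query (hypothetical helper)."""
    return [
        "bigBedToBed",
        url,
        f"-chrom={chrom}",
        f"-start={start}",
        f"-end={end}",
        out,
    ]


cmd = bigbed_to_bed_cmd("chr21", 0, 100000000)
# Print the shell form; pass `cmd` to subprocess.run(...) to execute it.
print(shlex.join(cmd))
```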

The original site-frequency VCF and BEDPE files are distributed by the authors from their supplementary-data GitHub repository.

Credits

Thanks to Haley J. Abel, David E. Larson, Ira M. Hall and colleagues at the McDonnell Genome Institute (Washington University in St. Louis), the Broad Institute, Baylor College of Medicine, the New York Genome Center, and the University of Washington for producing this resource and making the site-frequency callsets publicly available.

References

Abel HJ, Larson DE, Regier AA, Chiang C, Das I, Kanchi KL, Layer RM, Neale BM, Salerno WJ, Reeves C et al. Mapping and characterization of structural variation in 17,795 human genomes. Nature. 2020 Jul;583(7814):83-89. PMID: 32460305; PMC: PMC7547914