bac95a147f49cd331052e597006e04b3deee40fc max Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both
the lrSv and srSv supertracks using the "CODE|CODE (Long name)"
filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)",
etc. Labels keep the short code up front so users can match what
hgTracks shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under
scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers,
consistent insLen / svLen / AC column naming, tightened field-description
text) that had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/abelSv.html src/hg/makeDb/trackDb/human/abelSv.html
index 7d5913fffb5..858d071de16 100644
--- src/hg/makeDb/trackDb/human/abelSv.html
+++ src/hg/makeDb/trackDb/human/abelSv.html
@@ -69,45 +69,61 @@
 <li><b>Callset</b> — B38 native, B37lift, or both.</li>
 <li><b>Filter</b> — PASS (high confidence) and/or LOW (low confidence, as flagged by the authors based on Mendelian-error rate).</li>
 <li><b>Allele frequency</b> (AF), <b>Allele count</b> (AC), <b>SV length</b>, and <b>Mean sample quality</b> (MSQ).</li>
 </ul>

 <p>Per-population allele counts and numbers are shown on the details page for 8 ancestry groups: AFR (African), AMR (Latino/Admixed-American), NFE (non-Finnish European), FE (Finnish European), EAS (East-Asian), SAS (South-Asian), PI (Pacific Islander), and Other.</p>

 <h2>Methods</h2>
 <p>
-The authors used their open-source <a href="https://github.com/hall-lab/svtools" target="_blank">
-svtools</a> pipeline to jointly call SVs across all samples. Per-sample
-calls were produced with LUMPY (v0.2.13), CNVnator (v0.3.3), and svtyper
-(v0.1.4); calls were merged across samples and refined with svtools. Low-
-and high-confidence variants were distinguished using a Mendelian-error
-cutoff on mean sample quality, calibrated against a set of 409 CEPH trios.
-Per-sample validation was performed against a PacBio long-read truth set
-derived from three HGSVC samples.</p>
+Abel et al. 2020 jointly called SVs from Illumina short-read sequencing
+(mean coverage >20x) of 17,795 genomes from the NHGRI Centers for
+Common Disease Genomics program with per-sample calls from LUMPY v0.2.13,
+CNVnator v0.3.3 and svtyper v0.1.4, integrated across the cohort by the
+<a href="https://github.com/hall-lab/svtools" target="_blank">svtools</a>
+pipeline. Low- and high-confidence variants were separated by a
+Mendelian-error cutoff on mean sample quality, calibrated against 409
+CEPH trios, and per-sample calls were validated against a PacBio
+long-read truth set from three HGSVC samples. Two non-overlapping
+callsets were released: 458,106 SVs from 14,623 samples called natively
+on GRCh38 (B38) and 279,892 SVs from 8,417 samples called on GRCh37
+(B37). The site-frequency callsets span DELs, DUPs, INVs, mobile-element
+variants and breakends/translocations.</p>

 <p>
-For this UCSC track, VCF INFO fields were parsed and converted to BED9+
-format. Variants originally called on GRCh37 (B37 callset) were lifted
-to GRCh38 using the UCSC <tt>hg19ToHg38.over.chain.gz</tt> chain. See the
+The B38 and B37 site-frequency VCFs (plus BEDPE companion files) were
+downloaded from the authors' supplementary-data GitHub repository,
+<a href="https://github.com/hall-lab/sv_paper_042020" target="_blank">
+github.com/hall-lab/sv_paper_042020</a>. For the hg38 track, INFO fields
+were parsed into BED9+ columns; B37 records were lifted to hg38 with the
+UCSC <tt>hg19ToHg38.over.chain.gz</tt> chain (626 B37 records failed to
+lift, leaving 737,998 SVs total in the track).</p>
+
+<p>
+The step-by-step build commands (download, liftOver, format conversion,
+bigBed build) are recorded in the UCSC makeDoc for this track:
 <a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/abelSv.txt" target="_blank">
-track build documentation</a> for full details.</p>
+doc/hg38/abelSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
+</p>

 <h2>Data Access</h2>
 <p>The data can be explored interactively in table format with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a> and exported from there
 to spreadsheet or tab-sep tables. From scripts, the data can be accessed
 through our <a href="https://api.genome.ucsc.edu">API</a>, track=<i>abelSv</i>.</p>

 <p>For automated download and analysis, the annotation is stored in a bigBed file
 that can be downloaded from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/abelSv/" target="_blank">
 our download server</a>. The file for this track is called <tt>abelSv.bb</tt>.
 Individual regions or the whole genome annotation can