bac95a147f49cd331052e597006e04b3deee40fc
max
Wed Apr 22 10:43:20 2026 -0700
lrSv/srSv: human-readable SV type filter labels, script cleanups

Add human-readable labels to the supertrack-level svType filter on both
the lrSv and srSv supertracks using the "CODE|CODE (Long name)"
filterValues syntax: DEL -> "DEL (Deletion)", INS -> "INS (Insertion)",
etc. Labels keep the short code up front so users can match what hgTracks
shows next to each feature.

Also sweep in the in-progress converter/as-file cleanups under
scripts/lrSv/ and scripts/srSv/ (introduction of lrSvCommon.py helpers,
consistent insLen / svLen / AC column naming, tightened field-description
text) that had been piling up as an unstaged working tree.

refs #36258

diff --git src/hg/makeDb/trackDb/human/chirmade101Sv.html src/hg/makeDb/trackDb/human/chirmade101Sv.html
index 3f78a6c24dc..1ffe94e15a3 100644
--- src/hg/makeDb/trackDb/human/chirmade101Sv.html
+++ src/hg/makeDb/trackDb/human/chirmade101Sv.html
@@ -28,42 +28,64 @@
 <li><span style="color: rgb(230,140,0);">Inversions (inv)</span> - orange</li>
 <li><span style="color: rgb(140,0,200);">Complex</span> - purple</li>
 </ul>
 </p>
 <p>
 Filters are available for SV type, SV length and the number of overlapping
 genes. The detail page shows the full annotation row: gene-level constraint
 scores (per overlapping gene), ClinGen / Decipher / ClinVar region matches,
 OMIM phenotype annotations and gnomAD SV frequencies at >=90% reciprocal
 overlap. Because most genomic regions carry no clinical annotation, many
 columns will be blank for an arbitrary SV.
 </p>
 <h2>Methods</h2>
 <p>
-SVs were called from 101 long-read whole-genome sequencing samples and
-annotated as described in Chirmade et al. 2026. The annotation table used
-here (<tt>sv_annotations.tsv</tt>) is the companion data release for GWAS
-SVatalog, available from the Zenodo record linked below. Coordinates in
-the source TSV are 1-based closed and were converted to 0-based half-open
-BED for this track.
+Chirmade et al. 2026 called SVs from 101 whole-genome sequenced individuals
+enrolled in the CF Canada-SickKids Program in Individualized Therapy
+(CFIT), a predominantly-European cohort of people with cystic fibrosis.
+Each sample was sequenced with two long-read / linked-read technologies:
+PacBio continuous long reads on Sequel I (34 samples, 50x) or Sequel II
+(67 samples, 76x), and 10X Genomics linked reads on Illumina HiSeq X at
+~30x. SVs were called per sample with pbsv v2.2.2 (pbmm2 alignments) and
+Sniffles v1.0.11 (NGMLR alignments) on the PacBio CLR data, and with Long
+Ranger, CNVnator v0.4, ERDS v1.1 and Manta v1.6.0 on the 10XG data.
+Per-platform and cross-platform calls were merged in three steps using a
+50% reciprocal overlap rule (pbsv anchored, tagged by Sniffles on PacBio;
+Manta anchored, augmented by CNVnator, ERDS and Long Ranger deletions on
+10XG; then a cross-platform merge with PacBio coordinates preferred), and
+SV records present in fewer than three participants were dropped. The
+released catalog contains 87,183 SVs (42,435 deletions, 41,734 insertions,
+1,394 duplications, 912 inversions and 708 complex events); the
+pre-computed GWAS SVatalog LD analyses use a common-SV subset of 35,732
+sites against 116,870 GWAS-Catalog SNPs.
 </p>
 <p>
-Note that the SVatalog tool's pre-computed LD analyses use a common-SV
-subset (35,732 sites); the underlying long-read callset released in this
-TSV (87,183 SVs) is larger and includes rarer variants not used for LD
-visualisation.
+The annotation TSV <tt>sv_annotations.tsv</tt> was downloaded from the
+Zenodo companion record,
+<a href="https://zenodo.org/records/13367574" target="_blank">
+zenodo.org/records/13367574</a>. Coordinates in the TSV are 1-based closed
+and were converted to 0-based half-open BED for this track.
+</p>
+<p>
+The step-by-step build commands (download, coordinate shift, format
+conversion, bigBed build) are recorded in the UCSC makeDoc for this track
+container:
+<a href="https://github.com/ucscGenomeBrowser/kent/blob/master/src/hg/makeDb/doc/hg38/lrSv.txt" target="_blank">
+doc/hg38/lrSv.txt</a>. The conversion scripts and autoSql schemas live in
+<a href="https://github.com/ucscGenomeBrowser/kent/tree/master/src/hg/makeDb/scripts/lrSv" target="_blank">
+makeDb/scripts/lrSv</a>.
 </p>
 <h2>Data Access</h2>
 <p>
 The data can be explored interactively in table format with the
 <a href="../cgi-bin/hgTables">Table Browser</a> or the
 <a href="../cgi-bin/hgIntegrator">Data Integrator</a>, and accessed
 programmatically through our <a href="https://api.genome.ucsc.edu">API</a>,
 track=<i>chirmade101Sv</i>.
 </p>
 <p>
 The bigBed is available from
 <a href="http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/" target="_blank">our
 download server</a> as <tt>chirmade101.bb</tt>. Example:
 <tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/hg38/lrSv/chirmade101.bb
 -chrom=chr21 -start=0 -end=100000000 stdout</tt>.