8faeb3cba60c7cb842bc17c17a57c9b53ef1b478 max Tue Apr 21 02:51:32 2026 -0700
ncbiCloneEndsCH1073: add NCBI CH1073 BAC library clone end placements track on danRer11, refs #35059

210,777 unique-concordant clone-insert placements from NCBI's CH1073 (RZPD-1073 / DanioKey) library clone report. Separate from the existing bacEndPairsLift (danRer4 -> danRer11 UCSC-BLAT lift), which is left in place.

Co-Authored-By: Claude Opus 4.7 (1M context)

diff --git src/hg/makeDb/trackDb/zebrafish/danRer11/ncbiCloneEndsCH1073.html src/hg/makeDb/trackDb/zebrafish/danRer11/ncbiCloneEndsCH1073.html
new file mode 100644
index 00000000000..b18fa1a0867
--- /dev/null
+++ src/hg/makeDb/trackDb/zebrafish/danRer11/ncbiCloneEndsCH1073.html
@@ -0,0 +1,123 @@
+

Description

+

+Bacterial artificial chromosomes (BACs) are large inserts of genomic DNA +(typically 150–300 kb) carried in bacteria. Sequencing a single +short read from each end of a BAC and mapping those end sequences to a +reference genome yields the approximate start and stop of the full BAC +insert. These BAC end placements are useful for confirming the order, +orientation, and span of the reference assembly, for identifying large +structural variants that disrupt concordant pair placement, and for +locating a BAC containing a gene of interest for downstream laboratory +work. +

+

+This track shows placements from the NCBI CH1073 zebrafish BAC library (also +known as RZPD-1073 / DanioKey) that NCBI labels as +unique concordant: clones whose two end reads each place +uniquely in GRCz11, in the expected orientation and at approximately the +expected distance. Each item represents one clone insert inferred from the two +mapped ends; one clone may have several placements when its ends map +to an alt haplotype scaffold in addition to the primary assembly. +

+ +

Display Conventions and Configuration

+

+Each item is drawn as a single block spanning the inferred BAC insert +(start of the upstream end to end of the downstream end). Clicking an +item opens a details page showing the clone name, NCBI placement ID, +insert size, concordance and uniqueness flags, assembly unit +(Primary Assembly, ALT_DRER_TU_1, etc.), and an +oversize flag that is set for placements larger than +500 kb—far longer than a typical BAC—so users can +filter out likely-spurious mappings. +
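The span and oversize rule described above can be illustrated with a minimal Python sketch. The `EndHit` type and `infer_insert` function are hypothetical names introduced here for illustration; only the 500 kb threshold and the "start of the upstream end to end of the downstream end" rule come from the text, and this is not NCBI's actual placement code.

```python
from typing import NamedTuple, Optional


class EndHit(NamedTuple):
    """One mapped BAC end (hypothetical structure, 0-based half-open coordinates)."""
    chrom: str
    start: int
    end: int
    strand: str


def infer_insert(a: EndHit, b: EndHit, oversize_bp: int = 500_000) -> Optional[dict]:
    """Return the block spanning both ends, or None if they are on different sequences."""
    if a.chrom != b.chrom:
        return None
    # The upstream end is the one with the smaller start coordinate.
    upstream, downstream = sorted((a, b), key=lambda hit: hit.start)
    start = upstream.start
    end = max(upstream.end, downstream.end)
    size = end - start
    return {
        "chrom": a.chrom,
        "start": start,
        "end": end,
        "size": size,
        # Flag spans longer than 500 kb, as described for the details page.
        "oversize": size > oversize_bp,
    }
```

For example, ends at chr3:1000-1700 and chr3:180300-181000 would yield a 180,000 bp block that is not flagged as oversize.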

+

+The clone name links out to a ZFIN search for cross-reference information on the +clone. +

+

+Three categorical filters are available in the track configuration +interface: +

+By default no filter is applied. +

+ +

Methods

+

+The source data were produced by the NCBI Clone DB group from end +sequences of the CH1073 library. NCBI maps each end sequence to the +reference assembly and categorizes the pair as concordant (expected +orientation and insert size) or discordant, and as uniquely placed or +multiply placed. The full set of per-library placement reports is +available from the NCBI FTP server at +ftp.ncbi.nlm.nih.gov/repository/clone/reports/Danio_rerio/. +

+

+To build the UCSC track, the +CH1073.GCF_000002035.6.105.unique_concordant.gff file was +downloaded and converted to BED. RefSeq contig accessions in the GFF +(e.g. NC_007114.7, NW_018394540.1) were mapped to +UCSC-style chromosome names (e.g. chr3, +chr1_KZ114997v1_alt) using the NCBI GRCz11 assembly report. +An oversize flag was set on any insert longer than +500 kb; these records are retained so researchers can inspect +them, and they can be excluded with the track filter. The resulting +BED was converted to bigBed with bedToBigBed. +
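The conversion described above could look roughly like the following Python sketch. It is not the script used to build the track. It assumes standard nine-column GFF3 input, the usual tab-separated NCBI assembly report layout (RefSeq accession in column 7, UCSC-style name in column 10), a feature type of `clone_insert`, and a `Name` attribute holding the clone name; the last two are assumptions about the NCBI file, and the function names are hypothetical.

```python
OVERSIZE_BP = 500_000  # threshold stated in the track description


def load_ucsc_names(report_lines):
    """Map RefSeq accession -> UCSC-style name from an NCBI assembly report."""
    names = {}
    for line in report_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        refseq, ucsc = cols[6], cols[9]
        if ucsc != "na":  # the report uses "na" when no UCSC name exists
            names[refseq] = ucsc
    return names


def gff_to_bed(gff_lines, names, feature_type="clone_insert"):
    """Yield BED-like tuples; feature_type and the Name attribute are assumptions."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if cols[2] != feature_type or cols[0] not in names:
            continue
        start, end = int(cols[3]), int(cols[4])  # GFF is 1-based, inclusive
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        size = end - start + 1
        oversize = "yes" if size > OVERSIZE_BP else "no"
        # BED is 0-based, half-open: shift the start, keep the end.
        yield (names[cols[0]], start - 1, end, attrs.get("Name", "."),
               0, cols[6], size, oversize)
```

Writing the tuples out tab-separated and sorting by chromosome and start would then give a file suitable as input to bedToBigBed, together with an autoSql description of the extra columns.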

+ +

Data Access

+

+The data can be explored interactively in table format with the +Table Browser or the +Data Integrator and exported +from there to spreadsheet or tab-separated tables. From scripts, the data +can be accessed through our API with track=ncbiCloneEndsCH1073. +
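As an illustration of script access, the following Python sketch builds a request to the UCSC REST API `getData/track` endpoint for this track. The helper names are hypothetical, and reading the items from a response key named after the track is an assumption about the JSON layout that should be checked against the API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/track"


def track_url(chrom, start, end, genome="danRer11", track="ncbiCloneEndsCH1073"):
    """Build the query URL for one region of the track."""
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    return f"{API_BASE}?{urlencode(params)}"


def fetch_items(chrom, start, end, track="ncbiCloneEndsCH1073"):
    """Fetch items for a region (requires network access).

    Assumes the JSON response holds the items under a key named after the track.
    """
    with urlopen(track_url(chrom, start, end, track=track)) as resp:
        data = json.load(resp)
    return data.get(track, [])
```

Calling `fetch_items("chr1", 0, 10000000)` would request the same region used in the bigBedToBed example in this section.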

+

+For automated download and analysis, the annotation is stored in a +bigBed file that can be downloaded from +our download server. The file for this track is +CH1073.bb. Individual regions or the whole genome annotation +can be obtained using our tool bigBedToBed, which can be +compiled from the source code or downloaded as a precompiled binary +for your system. Instructions for downloading source code and +binaries can be found +here. The tool can also be used to obtain features +within a given range, e.g. +bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/danRer11/ncbiCloneEndsCH1073/CH1073.bb -chrom=chr1 -start=0 -end=10000000 stdout. +
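The bigBedToBed call above can also be driven from a script. The sketch below assumes the `bigBedToBed` binary is on the `PATH`; the helper names are hypothetical, and only the first four BED columns are named because the layout of the remaining fields is not given in this description.

```python
import subprocess

BB_URL = ("http://hgdownload.soe.ucsc.edu/gbdb/danRer11/"
          "ncbiCloneEndsCH1073/CH1073.bb")


def bigbed_command(chrom, start, end, url=BB_URL):
    """Build the argument list for the bigBedToBed example shown above."""
    return ["bigBedToBed", url, f"-chrom={chrom}",
            f"-start={start}", f"-end={end}", "stdout"]


def parse_bed(text):
    """Parse tab-separated BED text; fields past the name are kept unparsed."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        rows.append({
            "chrom": fields[0],
            "start": int(fields[1]),
            "end": int(fields[2]),
            "name": fields[3] if len(fields) > 3 else "",
            "extra": fields[4:],
        })
    return rows


def fetch_region(chrom, start, end):
    """Run bigBedToBed for one region and return parsed rows."""
    result = subprocess.run(bigbed_command(chrom, start, end),
                            check=True, capture_output=True, text=True)
    return parse_bed(result.stdout)
```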

+

+The original annotation can be downloaded from +NCBI's clone reports FTP directory. +

+ +

Credits

+

+Clone placements were produced by the NCBI Clone DB group. CH1073 +(RZPD-1073 / DanioKey) is a zebrafish BAC library that was +end-sequenced as part of large-scale +zebrafish genome and clone resource efforts. +The library was constructed by +Pieter de Jong +and colleagues at BACPAC Resources. +

+ +

References

+

+Schneider VA, Chen HC, Clausen C, Meric PA, Zhou Z, Bouk N, Husain N, Maglott DR, Church DM. + +Clone DB: an integrated NCBI resource for clone-associated data. +Nucleic Acids Res. 2013 Jan;41(Database issue):D1070-8. +PMID: 23193260; PMC: PMC3531087 +