src/hg/makeDb/trackDb/zebrafish/danRer11/ncbiCloneEndsCH1073.html 8faeb3cba60c7cb842bc17c17a57c9b53ef1b478

8faeb3cba60c7cb842bc17c17a57c9b53ef1b478
max
  Tue Apr 21 02:51:32 2026 -0700
ncbiCloneEndsCH1073: add NCBI CH1073 BAC library clone end placements track on danRer11, refs #35059

210,777 unique-concordant clone-insert placements from NCBI's CH1073
(RZPD-1073 / DanioKey) library clone report. Separate from the existing
bacEndPairsLift (danRer4 -> danRer11 UCSC-BLAT lift), which is left in place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

diff --git src/hg/makeDb/trackDb/zebrafish/danRer11/ncbiCloneEndsCH1073.html src/hg/makeDb/trackDb/zebrafish/danRer11/ncbiCloneEndsCH1073.html
new file mode 100644
index 00000000000..b18fa1a0867
--- /dev/null
+++ src/hg/makeDb/trackDb/zebrafish/danRer11/ncbiCloneEndsCH1073.html
@@ -0,0 +1,123 @@
+<h2>Description</h2>
+<p>
+Bacterial artificial chromosomes (BACs) are cloning vectors that carry
+large inserts of genomic DNA (typically 150&ndash;300&nbsp;kb) in bacteria.
+Sequencing a short read from each end of a BAC clone and mapping those
+end sequences to a reference genome yields the approximate start and end
+of the full BAC insert. These BAC end placements are useful for confirming
+the order, orientation, and span of the reference assembly, for identifying
+large structural variants that disrupt concordant pair placement, and for
+locating a BAC that contains a gene of interest for downstream laboratory
+work.
+</p>
+<p>
+This track shows placements of clones from the NCBI <b>CH1073</b> zebrafish
+BAC library (also known as RZPD-1073 / DanioKey) that NCBI labels
+<i>unique&nbsp;concordant</i>: clones whose two end reads place
+uniquely in GRCz11, in the expected orientation, and at approximately the
+expected distance. Each row represents one clone insert inferred from the
+two mapped ends. A clone may have several placements when its ends map
+to an alt haplotype scaffold in addition to the primary assembly.
+</p>
+
+<h2>Display Conventions and Configuration</h2>
+<p>
+Each item is drawn as a single block spanning the inferred BAC insert
+(from the start of the upstream end read to the end of the downstream end read).
+Clicking an item opens a details page showing the clone name, NCBI
+placement ID, insert size, concordance and uniqueness flags, assembly unit
+(<i>Primary&nbsp;Assembly</i>, <i>ALT_DRER_TU_1</i>, etc.), and an
+<i>oversize</i> flag. The oversize flag is set for placements longer than
+500&nbsp;kb, which is far longer than a typical BAC insert, so that likely
+spurious mappings can be filtered out.
+</p>
+<p>
+The clone name links out to a <a href="https://zfin.org/search"
+target="_blank">ZFIN</a> search for cross-reference information on the
+clone.
+</p>
+<p>
+Three categorical filters are available on the track configuration
+page:
+</p>
+<ul>
+  <li><b>End-pair concordance</b> &ndash; <tt>TRUE</tt>/<tt>FALSE</tt></li>
+  <li><b>Unique placement</b> &ndash; <tt>TRUE</tt>/<tt>FALSE</tt></li>
+  <li><b>Oversize placement (&gt;500&nbsp;kb)</b> &ndash; <tt>TRUE</tt>/<tt>FALSE</tt></li>
+</ul>
+<p>By default, no filter is applied.</p>
+
+<h2>Methods</h2>
+<p>
+The source data were produced by the NCBI Clone DB group from end
+sequences of the CH1073 library. NCBI maps each end sequence to the
+reference assembly and categorizes the pair as concordant (expected
+orientation and insert size) or discordant, and as uniquely placed or
+multiply placed. The full set of per-library placement reports is
+available from the NCBI FTP server at
+<a href="https://ftp.ncbi.nlm.nih.gov/repository/clone/reports/Danio_rerio/"
+target="_blank">ftp.ncbi.nlm.nih.gov/repository/clone/reports/Danio_rerio/</a>.
+</p>
+<p>
+To build the UCSC track, the
+<tt>CH1073.GCF_000002035.6.105.unique_concordant.gff</tt> file was
+downloaded and converted to BED. RefSeq sequence accessions in the GFF
+(e.g. <tt>NC_007114.7</tt>, <tt>NW_018394540.1</tt>) were mapped to
+UCSC-style sequence names (e.g. <tt>chr3</tt>,
+<tt>chr1_KZ114997v1_alt</tt>) using the NCBI GRCz11 assembly report.
+An <i>oversize</i> flag was set on every insert longer than
+500&nbsp;kb. These records are retained so that researchers can inspect
+them, and they are easy to exclude with the track filter. The resulting
+BED file was converted to bigBed with <tt>bedToBigBed</tt>.
+</p>
+
+<h2>Data Access</h2>
+<p>
+The data can be explored interactively in table format with the
+<a href="../cgi-bin/hgTables">Table Browser</a> or the
+<a href="../cgi-bin/hgIntegrator">Data Integrator</a> and exported
+from there as spreadsheet or tab-separated files. From scripts, the data
+can be accessed through our <a href="https://api.genome.ucsc.edu"
+target="_blank">API</a> with <tt>genome=danRer11;track=ncbiCloneEndsCH1073</tt>.
+</p>
+<p>
+For automated download and analysis, the annotation is stored in a
+bigBed file that can be downloaded from
+<a href="http://hgdownload.soe.ucsc.edu/gbdb/danRer11/ncbiCloneEndsCH1073/"
+target="_blank">our download server</a>. The file for this track is
+<tt>CH1073.bb</tt>. Individual regions or the whole genome annotation
+can be obtained using our tool <tt>bigBedToBed</tt>, which can be
+compiled from the source code or downloaded as a precompiled binary
+for your system. Instructions for downloading source code and
+binaries can be found
+<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads"
+target="_blank">here</a>. The tool can also be used to obtain features
+within a given range, e.g.
+<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/danRer11/ncbiCloneEndsCH1073/CH1073.bb&nbsp;-chrom=chr1&nbsp;-start=0&nbsp;-end=10000000&nbsp;stdout</tt>.
+</p>
+<p>
+The original annotation can be downloaded from
+<a href="https://ftp.ncbi.nlm.nih.gov/repository/clone/reports/Danio_rerio/CH1073.GCF_000002035.6.105.unique_concordant.gff"
+target="_blank">NCBI's clone reports FTP directory</a>.
+</p>
+
+<h2>Credits</h2>
+<p>
+Clone placements were produced by the NCBI Clone DB group. CH1073
+(RZPD-1073 / DanioKey) is a zebrafish BAC library that was
+end-sequenced as part of large-scale efforts to build zebrafish
+genome and clone resources.
+The CH1073 library was constructed by
+<a href="https://bacpacresources.org/" target="_blank">Pieter de Jong</a>
+and colleagues at BACPAC Resources.
+</p>
+
+<h2>References</h2>
+<p>
+Schneider VA, Chen HC, Clausen C, Meric PA, Zhou Z, Bouk N, Husain N, Maglott DR, Church DM.
+<a href="https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gks1164" target="_blank">
+Clone DB: an integrated NCBI resource for clone-associated data</a>.
+<em>Nucleic Acids Res</em>. 2013 Jan;41(Database issue):D1070-8.
+PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/23193260" target="_blank">23193260</a>; PMC: <a
+href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531087/" target="_blank">PMC3531087</a>
+</p>
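The Methods section of the track description above says that the NCBI GFF was converted to BED. During that conversion, RefSeq accessions were renamed through the GRCz11 assembly report, and inserts over 500 kb were flagged as oversize. The conversion script itself is not part of this commit.

The following is a minimal Python sketch of that kind of conversion. It is not the script used to build the track. It rests on these assumptions:
- The NCBI assembly report uses the standard tab-separated layout, with RefSeq-Accn in column 7 and UCSC-style-name in column 10.
- The clone name is stored in a GFF3 `Name` attribute.
- The function names, the `Name` key, and the optional `feature_type` filter are hypothetical.

```python
import sys
from typing import Dict, Iterable, Iterator, Optional, Tuple
from urllib.parse import unquote

# Threshold taken from the track description: inserts longer than 500 kb are flagged.
OVERSIZE_BP = 500_000

# chrom, chromStart, chromEnd, name, score, strand, oversize flag (BED6+1)
BedRow = Tuple[str, int, int, str, int, str, str]


def load_refseq_to_ucsc(report_lines: Iterable[str]) -> Dict[str, str]:
    """Map RefSeq accessions (column 7) to UCSC-style names (column 10).

    Assumes a tab-separated NCBI assembly report whose header lines start with '#'.
    Rows where either column is 'na' are skipped.
    """
    mapping: Dict[str, str] = {}
    for line in report_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 10 and cols[6] != "na" and cols[9] != "na":
            mapping[cols[6]] = cols[9]
    return mapping


def parse_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 column-9 string like 'ID=x;Name=y' into a dict, decoding %XX escapes."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = unquote(value.strip())
    return attrs


def gff_to_bed(gff_lines: Iterable[str], acc_to_chrom: Dict[str, str],
               feature_type: Optional[str] = None) -> Iterator[BedRow]:
    """Yield BED6+1 rows from GFF3 feature lines; the last column is the oversize flag."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # directives such as '##gff-version 3' and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a GFF3 feature line
        seqid, _source, ftype, start, end, _score, strand, _phase, attr = cols
        if feature_type is not None and ftype != feature_type:
            continue  # hypothetical filter in case the file mixes feature types
        chrom = acc_to_chrom.get(seqid)
        if chrom is None:
            continue  # accession missing from the report; a real build might log this
        chrom_start = int(start) - 1  # GFF3 is 1-based, closed; BED is 0-based, half-open
        chrom_end = int(end)
        name = parse_attributes(attr).get("Name", "unknown")  # 'Name' key is an assumption
        oversize = "TRUE" if chrom_end - chrom_start > OVERSIZE_BP else "FALSE"
        bed_strand = strand if strand in ("+", "-") else "."
        yield (chrom, chrom_start, chrom_end, name, 0, bed_strand, oversize)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python gff_to_bed.py assembly_report.txt placements.gff > placements.bed
    with open(sys.argv[1]) as report, open(sys.argv[2]) as gff:
        acc_map = load_refseq_to_ucsc(report)
        for row in gff_to_bed(gff, acc_map):
            print("\t".join(map(str, row)))
```

The rows are BED6 plus one extra column. Turning them into a bigBed would still need three more inputs:
- a coordinate sort, for example `sort -k1,1 -k2,2n`;
- a chrom.sizes file;
- an autoSql `.as` file passed to `bedToBigBed` together with `-type=bed6+1`.

The real track also carries more fields than this sketch emits: placement ID, concordance and uniqueness flags, and assembly unit.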
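The Data Access section of the track description mentions reading this track from scripts through the UCSC API.

The following is a minimal, standard-library-only Python sketch. It builds a `getData/track` request for one region. It then drops placements longer than 500 kb by computing `chromEnd - chromStart`, so it does not depend on the name of the oversize column. The sketch assumes that the JSON response lists items under a key named after the track, and that each item has `chromStart` and `chromEnd` fields. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

API_BASE = "https://api.genome.ucsc.edu/getData/track"
TRACK = "ncbiCloneEndsCH1073"
OVERSIZE_BP = 500_000  # same threshold as the track's oversize flag


def build_track_url(chrom: str, start: int, end: int,
                    genome: str = "danRer11", track: str = TRACK) -> str:
    """Return a getData/track URL for chrom:start-end (0-based start, half-open)."""
    params = {"genome": genome, "track": track,
              "chrom": chrom, "start": start, "end": end}
    return API_BASE + "?" + urllib.parse.urlencode(params)


def placements_from_response(payload: Dict[str, Any],
                             track: str = TRACK) -> List[Dict[str, Any]]:
    """Pull the item list out of a decoded response.

    Assumes the items are stored under a key named after the track.
    """
    return payload.get(track, [])


def drop_oversize(items: List[Dict[str, Any]],
                  limit: int = OVERSIZE_BP) -> List[Dict[str, Any]]:
    """Keep placements whose span (chromEnd - chromStart) is at most `limit` bp."""
    return [it for it in items if it["chromEnd"] - it["chromStart"] <= limit]


def fetch_region(chrom: str, start: int, end: int) -> List[Dict[str, Any]]:
    """Fetch one region over the network and return its placements."""
    url = build_track_url(chrom, start, end)
    with urllib.request.urlopen(url, timeout=60) as resp:
        return placements_from_response(json.load(resp))
```

For example, `drop_oversize(fetch_region("chr3", 0, 1000000))` would request the first megabase of chr3 and then keep only placements of 500 kb or less. This mirrors the track's oversize filter.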