d93c426ef1ad5fbb32b754408599eaf380a199e5 max Tue Apr 21 13:34:58 2026 -0700 choriCloneEnds: reorganize danRer11 CHORI BAC clone end placements as a superTrack, refs #35059 - Rename ncbiCloneEndsCH1073 to choriCloneEnds throughout (trackDb, HTML, makeDoc, scripts dir, /hive and /gbdb layout). User-visible label is now "CHORI Clones" since all three libraries (CH1073, CH73, CH211) are CHORI/BACPAC BAC libraries; data source (NCBI Clone DB) is cited in Methods. - Wrap the existing CH1073 track in a choriCloneEnds superTrack and add two new subtracks built from the parallel unique_concordant GFFs at ftp.ncbi.nih.gov/repository/clone/reports/Danio_rerio/ : CH73 (99,141 placements, 23 oversize) CH211 (70,231 placements, 46 oversize) CH1073 is rebuilt with the same pipeline (210,777 placements). - Build all three bigBeds with -extraIndex=name and register searchTable / searchType bigBed stanzas with searchIndex name on each subtrack, so clone names (CH1073-100A1, CH73-1A1, CH211-1A1, ...) resolve from the Genome Browser position box. - Single shared HTML description page; Methods now links to the NCBI FTP source and to the UCSC makeDoc and scripts dir on GitHub. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> diff --git src/hg/makeDb/trackDb/zebrafish/danRer11/ncbiCloneEndsCH1073.html src/hg/makeDb/trackDb/zebrafish/danRer11/ncbiCloneEndsCH1073.html deleted file mode 100644 index b18fa1a0867..00000000000 --- src/hg/makeDb/trackDb/zebrafish/danRer11/ncbiCloneEndsCH1073.html +++ /dev/null @@ -1,123 +0,0 @@ -<h2>Description</h2> -<p> -Bacterial artificial chromosomes (BACs) are large inserts of genomic DNA -(typically 150–300 kb) carried in bacteria. Sequencing a single -short read from each end of a BAC and mapping those end sequences to a -reference genome yields the approximate start and stop of the full BAC -insert. 
These BAC end placements are useful for confirming the order, -orientation, and span of the reference assembly, for identifying large -structural variants that disrupt concordant pair placement, and for -locating a BAC containing a gene of interest for downstream laboratory -work. -</p> -<p> -This track shows the NCBI <b>CH1073</b> zebrafish BAC library (also -known as RZPD-1073 / DanioKey) placements labeled by NCBI as -<i>unique concordant</i>—clones whose two end reads place -uniquely in GRCz11 and at the expected orientation and approximate -distance. Each row represents one clone insert inferred from the two -mapped ends; one clone may have several placements when the ends map -to an alt haplotype scaffold in addition to the primary assembly. -</p> - -<h2>Display Conventions and Configuration</h2> -<p> -Each item is drawn as a single block spanning the inferred BAC insert -(start of the upstream end to end of the downstream end). Clicking an -item opens a details page showing the clone name, NCBI placement ID, -insert size, concordance and uniqueness flags, assembly unit -(<i>Primary Assembly</i>, <i>ALT_DRER_TU_1</i>, etc.), and an -<i>oversize</i> flag that is set for placements larger than -500 kb—far longer than a typical BAC—so users can -filter out likely-spurious mappings. -</p> -<p> -The clone name links out to a <a href="https://zfin.org/search" -target="_blank">ZFIN</a> search for cross-reference information on the -clone. -</p> -<p> -Three categorical filters are available in the track configuration -interface: -<ul> - <li><b>End-pair concordance</b> – <tt>TRUE</tt>/<tt>FALSE</tt></li> - <li><b>Unique placement</b> – <tt>TRUE</tt>/<tt>FALSE</tt></li> - <li><b>Oversize placement (>500kb)</b> – <tt>TRUE</tt>/<tt>FALSE</tt></li> -</ul> -By default no filter is applied. -</p> - -<h2>Methods</h2> -<p> -The source data were produced by the NCBI Clone DB group from end -sequences of the CH1073 library. 
NCBI maps each end sequence to the -reference assembly and categorizes the pair as concordant (expected -orientation and insert size) or discordant, and as uniquely placed or -multiply placed. The full set of per-library placement reports is -available from the NCBI FTP server at -<a href="https://ftp.ncbi.nlm.nih.gov/repository/clone/reports/Danio_rerio/" -target="_blank">ftp.ncbi.nlm.nih.gov/repository/clone/reports/Danio_rerio/</a>. -</p> -<p> -To build the UCSC track, the -<tt>CH1073.GCF_000002035.6.105.unique_concordant.gff</tt> file was -downloaded and converted to BED. RefSeq contig accessions in the GFF -(e.g. <tt>NC_007114.7</tt>, <tt>NW_018394540.1</tt>) were mapped to -UCSC-style chromosome names (e.g. <tt>chr3</tt>, -<tt>chr1_KZ114997v1_alt</tt>) using the NCBI GRCz11 assembly report. -A fixed <i>oversize</i> flag was set on any insert longer than -500 kb; these records are retained so researchers can inspect -them but are easy to exclude via the track filter. The resulting -BED was converted to bigBed with <tt>bedToBigBed</tt>. -</p> - -<h2>Data Access</h2> -<p> -The data can be explored interactively in table format with the -<a href="../cgi-bin/hgTables">Table Browser</a> or the -<a href="../cgi-bin/hgIntegrator">Data Integrator</a> and exported -from there to spreadsheet or tab-sep tables. From scripts, the data -can be accessed through our <a href="https://api.genome.ucsc.edu" -target="_blank">API</a>, <tt>track=ncbiCloneEndsCH1073</tt>. -</p> -<p> -For automated download and analysis, the annotation is stored in a -bigBed file that can be downloaded from -<a href="http://hgdownload.soe.ucsc.edu/gbdb/danRer11/ncbiCloneEndsCH1073/" -target="_blank">our download server</a>. The file for this track is -<tt>CH1073.bb</tt>. Individual regions or the whole genome annotation -can be obtained using our tool <tt>bigBedToBed</tt>, which can be -compiled from the source code or downloaded as a precompiled binary -for your system. 
Instructions for downloading source code and -binaries can be found -<a href="http://hgdownload.soe.ucsc.edu/downloads.html#utilities_downloads" -target="_blank">here</a>. The tool can also be used to obtain features -within a given range, e.g. -<tt>bigBedToBed http://hgdownload.soe.ucsc.edu/gbdb/danRer11/ncbiCloneEndsCH1073/CH1073.bb -chrom=chr1 -start=0 -end=10000000 stdout</tt>. -</p> -<p> -The original annotation can be downloaded from -<a href="https://ftp.ncbi.nlm.nih.gov/repository/clone/reports/Danio_rerio/CH1073.GCF_000002035.6.105.unique_concordant.gff" -target="_blank">NCBI's clone reports FTP directory</a>. -</p> - -<h2>Credits</h2> -<p> -Clone placements produced by the NCBI Clone DB group. CH1073 -(RZPD-1073 / DanioKey) is a zebrafish BAC library originally -constructed and end-sequenced in the context of large-scale -zebrafish genome and clone resources. -The CH1073 library was constructed by -<a href="https://bacpacresources.org/" target="_blank">Pieter de Jong</a> -and colleagues at BACPAC Resources. -</p> - -<h2>References</h2> -<p> -Schneider VA, Chen HC, Clausen C, Meric PA, Zhou Z, Bouk N, Husain N, Maglott DR, Church DM. -<a href="https://academic.oup.com/nar/article-lookup/doi/10.1093/nar/gks1164" target="_blank"> -Clone DB: an integrated NCBI resource for clone-associated data</a>. -<em>Nucleic Acids Res</em>. 2013 Jan;41(Database issue):D1070-8. -PMID: <a href="https://www.ncbi.nlm.nih.gov/pubmed/23193260" target="_blank">23193260</a>; PMC: <a -href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531087/" target="_blank">PMC3531087</a> -</p>
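Note on the Methods text in the removed page: the GFF-to-BED step it describes (rename RefSeq accessions to UCSC names, flag inserts longer than 500 kb) can be sketched roughly as below. This is an illustrative sketch only, not the script used for the track. The function names, the use of a GFF `Name` attribute for the clone name, the five-column output layout, and the sample record are all assumptions made for the example. Only the 500 kb threshold and the two example accession/chromosome names come from the page, and pairing them in this order is itself an assumption.

```python
# Hypothetical sketch of the conversion described in Methods; not the UCSC makeDoc code.
OVERSIZE_BP = 500_000  # threshold stated on the track description page


def parse_attributes(field):
    """Split a GFF3 attribute column ("k=v;k=v") into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def gff_to_bed_row(line, acc_to_ucsc):
    """Convert one GFF line to (chrom, start, end, name, oversize), or None."""
    if not line.strip() or line.startswith("#"):
        return None  # skip blank lines and GFF header/comment lines
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        return None  # not a complete 9-column GFF record
    chrom = acc_to_ucsc.get(cols[0])
    if chrom is None:
        return None  # accession not present in the mapping
    start = int(cols[3]) - 1  # GFF is 1-based inclusive; BED is 0-based half-open
    end = int(cols[4])
    # Assumption: the clone name is carried in a "Name" attribute.
    name = parse_attributes(cols[8]).get("Name", ".")
    oversize = "TRUE" if end - start > OVERSIZE_BP else "FALSE"
    return (chrom, start, end, name, oversize)


if __name__ == "__main__":
    # Example mapping; in practice this would be read from the GRCz11 assembly report.
    mapping = {"NC_007114.7": "chr3", "NW_018394540.1": "chr1_KZ114997v1_alt"}
    # Made-up record for illustration only.
    sample = "NC_007114.7\tCloneDB\tclone_insert\t1001\t601000\t.\t+\t.\tID=x;Name=CH1073-100A1"
    print(gff_to_bed_row(sample, mapping))
```

Oversize records are kept and flagged, not dropped, which matches the page's statement that they are retained for inspection and excluded through the track filter.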