7df6e18265341f87a69fba808aa1f92f8ebca841 markd Wed Apr 15 13:39:42 2026 -0700 move copy of htslib diff --git src/htslib/faidx.5 src/htslib/faidx.5 deleted file mode 100644 index 471b853214d..00000000000 --- src/htslib/faidx.5 +++ /dev/null @@ -1,146 +0,0 @@ -'\" t -.TH faidx 5 "August 2015" "htslib" "Bioinformatics formats" -.SH NAME -faidx \- an index enabling random access to FASTA files -.\" -.\" Copyright (C) 2013, 2015 Genome Research Ltd. -.\" -.\" Author: John Marshall <jm18@sanger.ac.uk> -.\" -.\" Permission is hereby granted, free of charge, to any person obtaining a -.\" copy of this software and associated documentation files (the "Software"), -.\" to deal in the Software without restriction, including without limitation -.\" the rights to use, copy, modify, merge, publish, distribute, sublicense, -.\" and/or sell copies of the Software, and to permit persons to whom the -.\" Software is furnished to do so, subject to the following conditions: -.\" -.\" The above copyright notice and this permission notice shall be included in -.\" all copies or substantial portions of the Software. -.\" -.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL -.\" THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING -.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER -.\" DEALINGS IN THE SOFTWARE. -.\" -.SH SYNOPSIS -.IR file.fa .fai, -.IR file.fasta .fai -.SH DESCRIPTION -Using an \fBfai index\fP file in conjunction with a FASTA file containing -reference sequences enables efficient access to arbitrary regions within -those reference sequences. -The index file typically has the same filename as the corresponding FASTA -file, with \fB.fai\fP appended. -.P -An \fBfai index\fP file is a text file consisting of lines each with -five TAB-delimited columns: -.TS -lbl. -NAME Name of this reference sequence -LENGTH Total length of this reference sequence, in bases -OFFSET Offset within the FASTA file of this sequence's first base -LINEBASES The number of bases on each line -LINEWIDTH The number of bytes in each line, including the newline -.TE -.P -The \fBNAME\fP and \fBLENGTH\fP columns contain the same -data as would appear in the \fBSN\fP and \fBLN\fP fields of a -SAM \fB@SQ\fP header for the same reference sequence. -.P -The \fBOFFSET\fP column contains the offset within the FASTA file, in bytes -starting from zero, of the first base of this reference sequence, i.e., of -the character following the newline at the end of the "\fB>\fP" header line. -Typically the lines of a \fBfai index\fP file appear in the order in which the -reference sequences appear in the FASTA file, so \fB.fai\fP files are typically -sorted according to this column. -.P -The \fBLINEBASES\fP column contains the number of bases in each of the sequence -lines that form the body of this reference sequence, apart from the final line -which may be shorter. -The \fBLINEWIDTH\fP column contains the number of \fIbytes\fP in each of -the sequence lines (except perhaps the final line), thus differing from -\fBLINEBASES\fP in that it also counts the bytes forming the line terminator. -.SS FASTA Files -In order to be indexed with \fBsamtools faidx\fP, a FASTA file must be a text -file of the form -.LP -.RS -.RI > name -.RI [ description ...] -.br -ATGCATGCATGCATGCATGCATGCATGCAT -.br -GCATGCATGCATGCATGCATGCATGCATGC -.br -ATGCAT -.br -.RI > name -.RI [ description ...] -.br -ATGCATGCATGCAT -.br -GCATGCATGCATGC -.br -[...] -.RE -.LP -In particular, each reference sequence must be "well-formatted", i.e., all -of its sequence lines must be the same length, apart from the final sequence -line which may be shorter. -(While this sequence line length must be the same within each sequence, -it may vary between different reference sequences in the same FASTA file.) -.P -This also means that although the FASTA file may have Unix- or Windows-style -or other line termination, the newline characters present must be consistent, -at least within each reference sequence. -.P -The \fBsamtools\fP implementation uses the first word of the "\fB>\fP" header -line text (i.e., up to the first whitespace character, having skipped any -initial whitespace after the ">") as the \fBNAME\fP column. -.SH EXAMPLE -For example, given this FASTA file -.LP -.RS ->one -.br -ATGCATGCATGCATGCATGCATGCATGCAT -.br -GCATGCATGCATGCATGCATGCATGCATGC -.br -ATGCAT -.br ->two another chromosome -.br -ATGCATGCATGCAT -.br -GCATGCATGCATGC -.br -.RE -.LP -formatted with Unix-style (LF) line termination, the corresponding fai index -would be -.RS -.TS -lnnnn. -one 66 5 30 31 -two 28 98 14 15 -.TE -.RE -.LP -If the FASTA file were formatted with Windows-style (CR-LF) line termination, -the fai index would be -.RS -.TS -lnnnn. -one 66 6 30 32 -two 28 103 14 16 -.TE -.RE -.SH SEE ALSO -.IR samtools (1) -.TP -http://en.wikipedia.org/wiki/FASTA_format -Further description of the FASTA format