src/hg/htdocs/FAQ/FAQformat.html 24c983f5d9b1a0f4792eae2e4c1c9725efee919d

24c983f5d9b1a0f4792eae2e4c1c9725efee919d
dschmelt
  Tue Mar 1 16:56:08 2022 -0800
Adding note about ref genome requirement refs #29017

diff --git src/hg/htdocs/FAQ/FAQformat.html src/hg/htdocs/FAQ/FAQformat.html
index b53f554..8cb7874 100755
--- src/hg/htdocs/FAQ/FAQformat.html
+++ src/hg/htdocs/FAQ/FAQformat.html
@@ -572,40 +572,42 @@
 Also, review the enhanced  <a href="../goldenPath/help/interact.html">interact</a> format
 for information on how to visualize pairwise interactions as arcs in the browser.
 </p>
 
 <a name="format5"></a>
 <h2>MAF format</h2> 
 <p> 
 The multiple alignment format stores a series of multiple alignments in a format that is easy to 
 parse and relatively easy to read. This format stores multiple alignments at the DNA level between 
 entire genomes. Previously used formats are suitable for multiple alignments of single proteins or 
 regions of DNA without rearrangements, but would require considerable extension to cope with genomic
 issues such as forward and reverse strand directions, multiple pieces to the alignment, and so 
 forth.</p> 
 <p> 
 <strong>General Structure</strong><br> 
-The <em>.maf</em> format is line-oriented. Each multiple alignment ends with a blank line. Each 
+The <em>.maf</em> format is line-oriented. Each multiple alignment begins with the reference genome
+line and ends with a blank line. Each 
 sequence in an alignment is on a single line, which can get quite long, but there is no length 
 limit. Words in a line are delimited by any white space. Lines starting with # are considered to be 
 comments. Lines starting with ## can be ignored by most programs, but contain meta-data of one form 
 or another.</p> 
 <p> 
 The file is divided into paragraphs that terminate in a blank line.  Within a paragraph, the first 
 word of a line indicates its type. Each multiple alignment is in a separate paragraph that begins 
 with an &quot;a&quot; line and contains an &quot;s&quot; line for each sequence in the multiple 
-alignment. Some MAF files may contain other optional line types: </p>
+alignment. The first sequence must be the reference genome on which the rest of the sequences map. 
+Some MAF files may contain other optional line types: </p>
 <ul>
   <li>
   an &quot;i&quot; line containing information about what is in the aligned species DNA before and 
   after the immediately preceding &quot;s&quot; line</li>
   <li>
   an &quot;e&quot; line containing information about the size of the gap between the alignments 
   that span the current block</li>
   <li>
   a &quot;q&quot; line indicating the quality of each aligned base for the species</li>
 </ul>
 <p>
 Parsers may ignore any other types of paragraphs and other types of lines within an alignment 
 paragraph. </p> 
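
The paragraph structure described above can be sketched as a small reader. This is a minimal illustration in Python, not UCSC's parser: the function name `parse_maf_blocks`, the dictionary keys, and the placeholder "a" line in the sample are assumptions made for this example. It skips comment lines, starts a block at each "a" line, collects "s" lines by position, ignores other line types, and treats the first "s" line of each block as the reference genome.

```python
def parse_maf_blocks(lines):
    """Yield one list of 's' line records per alignment paragraph.

    Hypothetical helper for illustration; field names are assumptions.
    """
    block = None
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line terminates the current paragraph.
            if block:
                yield block
            block = None
            continue
        if line.startswith("#"):
            # Comments and ## meta-data lines are skipped here.
            continue
        words = line.split()  # words are delimited by any white space
        if words[0] == "a":
            if block:
                yield block
            block = []
        elif words[0] == "s" and block is not None:
            src, start, size, strand, src_size, text = words[1:7]
            block.append({
                "src": src,
                "start": int(start),
                "size": int(size),
                "strand": strand,
                "src_size": int(src_size),
                "text": text,
            })
        # Other line types ("i", "e", "q", ...) are ignored in this sketch.
    if block:
        yield block


# Sample block: the "a" line is a placeholder; the "s" lines are short examples.
sample = """\
# comment line
a score=0
s hg16.chr7    27707221 13 + 158545518 gcagctgaaaaca
s panTro1.chr6 28869787 13 + 161576975 gcagctgaaaaca
s baboon         249182 13 +   4622798 gcagctgaaaaca
s mm4.chr6     53310102 13 + 151104725 ACAGCTGAAAATA

"""

blocks = list(parse_maf_blocks(sample.splitlines()))
reference = blocks[0][0]  # the first "s" line is the reference genome
```

With this sample, `reference["src"]` is `hg16.chr7`, and each record's `size` matches the number of non-dash characters in its alignment text.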
 <p> 
 <strong>Custom Tracks</strong><br> 
@@ -681,32 +683,32 @@
   <li> 
   <strong>pass</strong> -- Optional. Positive integer value. For programs that do multiple pass 
   alignments such as blastz, this shows which pass this alignment came from. Typically, pass 1 will 
   find the strongest alignments genome-wide, and pass 2 will find weaker alignments between two 
   first-pass alignments.</li>
 </ul> 
 <p> 
 <strong>Lines starting with &quot;s&quot; -- a sequence within an alignment block</strong></p>
 <pre><code> s hg16.chr7    27707221 13 + 158545518 gcagctgaaaaca
  s panTro1.chr6 28869787 13 + 161576975 gcagctgaaaaca
  s baboon         249182 13 +   4622798 gcagctgaaaaca
  s mm4.chr6     53310102 13 + 151104725 ACAGCTGAAAATA
 </code></pre>
 <p> 
 The &quot;s&quot; lines together with the &quot;a&quot; lines define a multiple alignment. 
-The &quot;s&quot; lines have the following fields which are defined by position rather than 
-name=value pairs.</p> 
+The first &quot;s&quot; line must be the reference genome, hg16 in the above example.
+The &quot;s&quot; lines have the following fields which are defined by position.</p> 
 <ul> 
   <li> 
   <strong>src</strong> -- The name of one of the source sequences for the alignment. For sequences 
   that are resident in a browser assembly, the form 'database.chromosome' allows automatic creation 
   of links to other assemblies. Non-browser sequences are typically referenced by the species name 
   alone.</li>
   <li> 
   <strong>start</strong> -- The start of the aligning region in the source sequence. This is a 
   zero-based number. If the strand field is &quot;-&quot; then this is the start relative to the 
   reverse-complemented source sequence (see 
   <a href="http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms" target=blank>Coordinate 
   Transforms</a>).</li> 
   <li> 
   <strong>size</strong> -- The size of the aligning region in the source sequence. This number is 
   equal to the number of non-dash characters in the alignment text field below.</li>