src/hg/htdocs/goldenPath/datorg.html bbabbd5d2566d47d923d51dbe350634783455999

bbabbd5d2566d47d923d51dbe350634783455999
mspeir
  Sun Oct 26 12:14:52 2025 -0700
change soe to gi, refs #35031

diff --git src/hg/htdocs/goldenPath/datorg.html src/hg/htdocs/goldenPath/datorg.html
index e254e5de786..a20ce54bf29 100755
--- src/hg/htdocs/goldenPath/datorg.html
+++ src/hg/htdocs/goldenPath/datorg.html
@@ -1,112 +1,112 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser Data Organization" -->
 <!--#set var="ROOT" value=".." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <h1>Data Organization and Format</h1>
 
 <p>
 The data for the working draft are organized hierarchically by chromosome and by the 
 sequenced-clone contigs within each chromosome. At the top level there are 25 folders; 22 of these 
 are for the numbered chromosomes (autosomes), folders X and Y are for the sex chromosomes, and Un 
 is for clone contigs that cannot be placed confidently on a chromosome. Each of the 25 chromosomal 
 folders contains a separate clone contig folder for each of the clone contigs for that 
 chromosome.</p>
 <p> 
 There are two primary files in each clone contig folder; these have suffixes .fa and .agp 
 respectively. The .fa files gives the working draft sequence for the clone contig. The format is 
 fasta format:</p>
 <pre><code>&gt;NT_077768
 GAATTCTCTGTAACACTAAGCTCTCTTCCTCAAAACCAGAGGTAGATAGA
 ATGTGTAATAATTTACAGAATTTCTAGACTTCAACGATCTGATTTTTTAA
 ATTTATTTTTATTTTTTCAGGTTGAGACTGAGCTAAAGTTAATCTGTGGC
 ...</code></pre>
 <p>
 The .agp file is a kind of index that tells how the .fa file is built. It looks like this:</p>
 <pre><code>17/NT_077768    1       6538    1       D       AC021317.18     122280  128817  -
 17/NT_077768    6539    56206   2       D       AC021317.18     128918  178585  -
 17/NT_077768    56207   56306   3       N       100     fragment        yes
 17/NT_077768    56307   117971  4       D       AC021317.18     47188   108852  -
 17/NT_077768    117972  170563  5       F       AC115992.13     23659   76250   +
 17/NT_077768    170564  274979  6       D       AC124789.11     1       104416  -
 ...</code></pre>
 <p>
 Each line represents either an actual sequence record or a gap (unless it begins with &quot;#&quot;,
 in which case it is a comment.) If the line represents an actual sequence record then it has the 
 form:</p>
 <pre><code>&lt;chromosome/ctg&gt; &lt;start-in-ctg&gt; &lt;end-in-ctg&gt; &lt;number&gt; &lt;type&gt; &lt;accession&gt;.&lt;version&gt; &lt;start&gt; &lt;end&gt; &lt;orientation&gt;</code></pre>
 <p>
 If it represents a gap it has the form:</p>
 <pre><code>&lt;chromosome/ctg&gt; &lt;start-in-ctg&gt; &lt;end-in-ctg&gt; &lt;number&gt; N &lt;number-of-Ns&gt; &lt;kind&gt; &lt;bridged?&gt;</code></pre>
 <p>
 The positions &lt;start-in-ctg&gt; and &lt;end-in-ctg&gt; are the start and end positions where the 
 sequence is to be put in the .fa file. For a sequence record, the positions &lt;start&gt; and
 &lt;end&gt; are the start and end positions of where the sequence came from in the GenBank record 
 &lt;accession&gt;.&lt;version&gt;. The field &lt;orientation&gt; tells whether or not the sequence 
 must be reverse complemented before it is inserted into its place in the .fa file. For example, the 
 records above mean that to build the .fa file for clone contig NT_077768 from chromosome 17 you 
 take:</p>
 <pre><code>AC021317 version 18, residues 122280 to 128817, reverse complemented, followed by 
 AC021317 version 18, residues 128918 to 178585, reverse complemented, followed by 
 a gap of 100 Ns, followed by 
 AC021317 version 18, residues 47188 to 108852, reverse complemented, followed by 
 AC115992 version 13, residues 23659 to 76250, followed by 
 AC124789 version 11, residues 1 to 104416, reverse complemented, followed by 
 ...</code></pre>
 <p>
 The joins perfectly abut. In a sequence record, &lt;type&gt; can have the values:</p>
 <ul>
   <li>
   F - Finished</li>
   <li>
   A - in Active finishing</li> 
   <li>
   D - Draft</li>
   <li>
   P - PreDraft</li>
   <li>
   O - Other sequence</li>
 </ul>
 <p>
 In a gap record the type is always N. The &lt;number&gt; field sequentially numbers the records.</p>
 <p>
 In a gap record, &lt;number-of-Ns&gt; is the size of the gap and &lt;kind&gt; is:</p>
 <ul>
   <li>    
   fragment - a gap between two sequence contigs (also called a "sequence gap")</li>
   <li>    
   split_finished - a special sized gap between two finished sequence contigs</li>
   <li>    
   clone - a gap between two clones that do not overlap</li>
   <li>    
   contig - a gap between clone contigs in the genome layout (also called a 
   &quot;layout gap&quot;li>
   <li>    
   centromere - a gap inserted for the centromere </li>
   <li>    
   short_arm - a gap inserted at the start of an acrocentric chromosome </li>
   <li>    
   heterochromatin - a gap inserted for an especially large region of heterochromatin (may include 
   the centromere as well.)</li>
   <li>    
   telomere - a gap inserted for a telomere</li>
 </ul>
 <p>
 The &lt;bridged?&gt; value is &quot;yes&quot; if there is a cDNA or BACend pair or plasmid end 
 pair that spans the gap, else it is &quot;no&quot;.</p>
 <p>
-We provide three ways you can <a href="http://hgdownload.soe.ucsc.edu/downloads.html" 
+We provide three ways you can <a href="http://hgdownload.gi.ucsc.edu/downloads.html" 
 target="_blank">download</a> these .fa and .agp files:</p>  
 <ul>
   <li> 
   full data set: the entire hierarchy in a zipped format</li>
   <li> 
   by chromosome: one zipped file for each chromosome containing all the sequence ordered along that 
   chromosome</li>
   <li>
   by individual clone contig: separate files, not zipped, for each clone contig</li>
 </ul>
 
 <!--#include virtual="$ROOT/inc/gbPageEnd.html" -->