src/hg/htdocs/goldenPath/history.html 8e73439c3de6d6560d50b267be0ba4016cc40f86

8e73439c3de6d6560d50b267be0ba4016cc40f86
jnavarr5
  Mon Aug 19 14:43:06 2019 -0700
Adding a Table of Contents for the history page, refs #20314

diff --git src/hg/htdocs/goldenPath/history.html src/hg/htdocs/goldenPath/history.html
index a09de2b..c3d2803 100755
--- src/hg/htdocs/goldenPath/history.html
+++ src/hg/htdocs/goldenPath/history.html
@@ -1,24 +1,44 @@
 <!DOCTYPE html>
 <!--#set var="TITLE" value="Genome Browser History" -->
 <!--#set var="ROOT" value=".." -->
 
 <!-- Relative paths to support mirror sites with non-standard GB docs install -->
 <!--#include virtual="$ROOT/inc/gbPageStart.html" -->
 
 <h1>UCSC Genome Browser Project History</h1>
 
+<h2>Table of Contents</h2>
+<ul>
+  <li><a href="#overview">Genome Browser overview</a></li>
+  <li><a href="#race">Human Genome Project Race</a></li>
+    <ul class="gbsNoBullet">
+      <li><a href="#celera">New challenger, Celera Genomics</a></li>
+      <li><a href="#push">Push to the Finish Line</a></li>
+    </ul>
+  <li><a href="#ENCODE">The ENCODE Project</a></li>
+    <ul class="gbsNoBullet">
+      <li><a href="#ucsc">UC Santa Cruz's role</a></li>
+    </ul>
+  <li><a href="#primer">UCSC Genome Research Primer</a></li>
+    <ul class="gbsNoBullet">
+      <li><a href="#comparative">Comparative Genomics</a></li>
+      <li><a href="#health">Possibilities for Health</a></li>
+    </ul>
+</ul>
+
+<a name="overview"></a>
 <h2>Genome Browser overview</h2>
 <p>
 The UCSC Genome Browser is a web-based tool serving as a multi-powered microscope that allows
 researchers to view all 23 chromosomes of the human genome at any scale from a full chromosome down
 to an individual nucleotide. The browser integrates the work of countless scientists in laboratories
 worldwide, including work generated at UCSC, in an interactive, graphical display.</p>
 <p>
 Zoomed out, the coarse-level view shows early chromosome maps as determined by electron microscopy,
 then the browser drills down to levels of increasing detail, focusing first on chromosome bands,
 then on gene clusters (showing known genes, mostly those linked to diseases), then single genes, then
 the components of genes, and finally on the nucleotides: the As, Cs, Gs, and Ts that make up the
 genome alphabet. Not only does the browser show the genome sequence, but it also delineates known
 areas of the genome and offers supplementary information about the genes, in effect providing the
 word breaks and punctuation.</p>
 <p>
@@ -68,82 +88,86 @@
 critical step toward fully understanding the workings of the human genome, a project that will
 occupy science and medicine for many years. The browser platform has multiple potential uses that
 can improve diagnosis, prevention, and cures for disease. The usefulness of the UCSC Genome Browser
 led to spin-offs, or genome browser mirrors, such as the following:</p>
 <ul>
   <li><a href="https://news.ucsc.edu/2008/05/2242.html"
     target="_blank">The HIV Data Browser</a></li>
   <li><a href="https://xena.ucsc.edu/welcome-to-ucsc-xena/"
     target="_blank">The UCSC Cancer Genomics Browser</a></li>
   <li><a href="https://genome.ucsc.edu/encode/"
     target="_blank">The data collection center for the international ENCODE project</a></li>
   <li><a href="http://genome.ucsc.edu/ebolaPortal/"
     target="_blank">The UCSC Ebola Virus Genome Browser</a></li>
 </ul>
 
+<a name="race"></a>
 <h2>Human Genome Project Race</h2>
 <p>
 In December 1999, the International Human Genome Project (IHGP) came to UC Santa Cruz when Eric
 Lander, the director of the Whitehead sequencing center (Whitehead Institute/MIT Center for Genome
 Research), invited David Haussler to help annotate the human genome. In particular, Lander wanted
 help in discovering the locations of the genes, which make up only approximately 1.5% of the
 sequence. Haussler had previously applied a mathematical technique known as hidden Markov models
 (HMMs) to the task of computer gene-finding. This application of HMMs had quickly become the
 dominant gene-finding methodology and was used successfully on the <i>Drosophila melanogaster</i>
 (fruit fly) genome.</p>
 <p>
 At the time UCSC entered the International Human Genome Project (IHGP), the IHGP was assembling the
 sequence one piece (or, in the jargon of molecular biology, one &quot;clone&quot;) at a time, and
 intending to string the pieces together based on a precisely constructed clone map. This approach
 had been shown to work very well with <i>Caenorhabditis elegans</i> (a roundworm) and human
 chromosome 22. But the process of making sure every last part of the sequence is read and put
 together properly is quite labor-intensive.</p>
 <p>
 Haussler enlisted Jim Kent, then a graduate student at UCSC's Department of Molecular, Cell, &amp;
 Developmental Biology, along with systems engineer Patrick Gavin, and graduate students Terrence
 Furey and David Kulp (who had led the gene-finding effort on the <i>Drosophila</i> genome). This was the
 birth of the UCSC Genome Browser Group.</p>
 
-<h3>New challenger, Celera</h3>
+<a name="celera"></a>
+<h3>New challenger, Celera Genomics</h3>
 <p>
 It was a crucial time for the international project. A private company, Celera Genomics, had
 announced its intention to assemble the human genome sequence well in advance of the public effort,
 raising the fear that the sequence would be protected by patents and thus not be freely available
 to scientists. Celera Genomics was using an alternative approach, a so-called whole genome
 &quot;shotgun,&quot; where small bits of the sequence are read at random from the genome, and then a
 computer program assembles these bits into an approximation of the genome as a whole. By using this
 approach, Celera's assembly would still have numerous gaps and ambiguities, but the entire project
 from start to finish could be done in less than half the time the IHGP planned for their effort.</p>
 <p>
 An approach resulting in numerous gaps and ambiguities was necessary if the IHGP's draft sequence
 was to have similar utility to Celera's sequence, and in particular to prevent Celera and its
 clients from locking up significant portions of the human genome under patents. A number of groups
 within the IHGP were working on the second stage of assembly that would merge the approximately
 400,000 contigs into larger pieces and order them along the human chromosomes so that research
 groups could find the human genes. However, the process was slow and arduous. Even with the
 outstanding mapping information provided by Bob Waterston's group at Washington University, the
 second stage assembly turned out to be like an extremely difficult jigsaw puzzle, with many layers
 of conflicting evidence having similar-looking, non-contiguous, overlapping pieces.</p>
 <p>
 At least partly in response to competition from Celera, the IHGP changed its focus from producing
 finished clones to producing draft clones. To sequence a clone, the IHGP adopted a shotgun approach
 in miniature. Bits of a clone were read at random, and the bits were stitched together by a computer
 program into pieces called &quot;contigs.&quot; After the shotgun phase, a clone was typically in
 5-50 contigs, but the relative order of the contigs was not known. This was the state of the genome
 when David Haussler first attempted to locate the genes computationally, and he quickly discovered
 that computational gene-finding was nearly impossible, since the average size of a contig was
 considerably smaller than the average size of a human gene.</p>
+
+<a name="push"></a>
 <h3>Push to the Finish Line</h3>
 <p>
 Motivated to prevent Celera and its clients from locking up significant portions of the human genome
 in patents, Jim Kent dropped his other work in May of 2000 to focus on the assembly problem. In a
 remarkable display of energy and talent, Kent developed within 4 weeks a 10,000-line computer
 program that assembled the working draft of the human genome. The program, called GigAssembler,
 constructed the first working draft of the human genome on June 22, 2000, just days before Celera
 completed its first assembly. The IHGP working draft combined anonymous genomic information from
 human volunteers of diverse backgrounds, accepted on a first-come, first-taken basis. The Celera
 sequence was of a single individual. Since the public consortium finished the genome ahead of the
 private company, the genome and the information it contains are available free to researchers
 worldwide. Kent's assembly was celebrated at a White House ceremony on June 26, 2000, announcing the
 completion of the first drafts of the human genome by the IHGP and Celera.</p>
 <p>
 On July 7, 2000, after further examination by the principal scientists of the public genome project,
@@ -155,58 +179,60 @@
 there remained gaps where DNA sequence was missing, due either to a lack of raw sequence data or
 ambiguities in the positions of the fragments. With the gene assembly 90% complete, the assembled
 genome was published along with the findings of hundreds of researchers worldwide in the February
 15, 2001 issue of <i>Nature</i>, which was largely devoted to the human genome. In the months
 following the release of the working draft, the UCSC team worked with other researchers worldwide to
 fill in the gaps. The resulting finished sequence made its debut in April of 2003. It encompasses
 99% of the gene-containing regions of the human genome and is 99.99% accurate.</p>
 <p>
 The UCSC Genome Browser was designated as the official repository of the early human genome assembly
 iterations. Once the human genome sequence became available, other genome browsers also came online,
 most notably those at the National Center for Biotechnology Information (NCBI) and at the European
 Bioinformatics Institute (EBI). Reciprocal links provided on each of the three browsers allow
 researchers to jump from any place in the human genome to the same region on either of the other two
 browsers.</p>
 
+<a name="ENCODE"></a>
 <h2>The ENCODE Project</h2>
 <p>
 The human genome contains vast amounts of information, and all of the functions of a human cell are
 implicitly coded in the human genome. With the molecular sequence known, researchers have been
 mining it for clues as to how the body works in health and in disease, ultimately laying out the
 plan for the complex pathways of molecular interactions that the sequence orchestrates. The UCSC
 Genome Browser aids the worldwide scientific community in its challenge to understand the genome, to
 probe it with new experimental and informatics methodologies, and to decode the genetic program of
 the cell.</p>
 <p>
 After the sequence of the genome was first available, a researcher's ability to decode that sequence
 and tap into the wealth of information it holds was still quite limited. The next step beyond
 viewing the genome is gaining an understanding of the instructions encoded in it. Toward this end,
 the UCSC Genome Browser group participated as the data collection center for the
 <a href="https://www.encodeproject.org/" target="_blank">ENCyclopedia Of DNA Elements (ENCODE)
 project</a>, an international endeavor to generate a comprehensive parts list of all the functional
 components in the human genome.</p>
 <p>
 ENCODE is a scientific reconnaissance mission aimed at discovering all regions of the human genome
 crucial to biological function. Before ENCODE, scientists focused on finding the genes, or
 protein-coding regions in DNA sequences, but these account for only about 1.5% of the genetic
 material of humans and other mammals. Non-coding regions of the genome have important functions, and
 the ENCODE project is developing a comprehensive &quot;parts list&quot; by identifying and precisely
 locating all functional elements in the human genome. This project, sponsored by the
 <a href="https://www.genome.gov/" target="_blank">National Human Genome Research Institute
 (NHGRI)</a>, involves an international consortium of scientists from government, industry, and
 academia.</p>
 
+<a name="ucsc"></a>
 <h3>UC Santa Cruz's role</h3>
 <p>
 UC Santa Cruz developed and ran the data coordination center for the ENCODE project from its
 inception in 2003 through the end of the first production phase in 2012. During that time, the UCSC
 Genome Browser group, directed by Jim Kent with technical management by Kate Rosenbloom, provided the
 database and web interface for all sequence-related data for the ENCODE project. This included
 integrating the data into the UCSC Human Genome Browser (where it continues to reside) on
 specialized tracks, and providing further in-depth information on detail pages. UC Santa Cruz also
 developed, performed, and presented computational and comparative analyses to glean further genomic
 and functional information from the collective data.</p>
 <p>
 UC Santa Cruz worked closely with labs producing data for the ENCODE project and with data analysis
 groups to define data and metadata reporting standards for a broad range of genomics assays. They
 implemented data submission and validation pipelines, created and maintained the encodeproject.org
 website, developed user access tools for ENCODE data, exported all ENCODE data to repositories at
@@ -226,31 +252,34 @@
 target="_blank">ENCODE data in the UCSC Genome Browser: year 5 update.</a> Nucleic Acids Res. 2013
 Jan;41(Database issue):D56-63.</p>
 
 <h3>More about the ENCODE Project</h3>
 <ul>
   <li>
     <a href="https://www.encodeproject.org/" target="_blank">ENCODE data portal</a></li>
   <li>
     <a href="https://www.nytimes.com/2008/11/11/science/11gene.html" target="_blank">New York
     Times article: &quot;Now: the rest of the genome&quot;</a></li>
   <li>
     <a href="https://www.genome.gov/11009066/" target="_blank">NHGRI announcement of the ENCODE
     Project</a></li>
 </ul>
 
+<a name="primer"></a>
 <h2>UCSC Genome Research Primer</h2>
+
+<a name="comparative"></a>
 <h3>Comparative Genomics</h3>
 <p>
 Besides developing, supporting, and continuing to improve the genome browser, the UCSC Genome
 Browser group conducts research into the functional elements of the human genome that have evolved
 under natural selection. Since the first assembly of the human genome, the UCSC group has added a
 growing number of species to the UCSC Genome Browser, including roundworm, pufferfish, chicken,
 mouse, and chimpanzee. Interspecies alignments allow researchers to compare human genes to similar
 genes in other species. The UCSC Genome Browser allows rapid comparisons between species, which can
 lead to many different types of new discoveries:</p>
 <ul>
   <li>
     New gene discoveries can result from searching the human genome for sequences that match those
     with known functions in other organisms. The molecular genetics behind disease development and
     progression in model organisms can be leveraged to discover potential disease-related genes in
     humans, moving us closer to diagnostic advances and targeted treatments.</li>
@@ -259,30 +288,31 @@
     interspecies differences and of short segments in the human genome that have been extremely
     well-conserved over millions of years of evolution.</li>
   <li>
     By searching for the highly conserved segments in the human genome (those that are unchanged
     from like segments in the genomes of other organisms), we can begin to understand the essential
     elements of the blueprint for life. Researchers suspect that these highly conserved elements
     must be essential to function. Genes make up only a small percentage of the unchanged elements,
     suggesting that other unknown regulatory elements in the genome are also important for
     function.</li>
   <li>
     Searching for genes that have evolved with unusual speed from one organism to another will give
     clues to essential interspecies differences, such as differences between the human and
     chimpanzee brain.</li>
 </ul>
 
+<a name="health"></a>
 <h3>Possibilities for Health</h3>
 <p>
 As we begin to better understand the molecular mechanisms responsible for human disease, entirely
 new avenues of treatment will be possible. We are only now getting a first glimmer of the molecular
 functions of a healthy human cell or organ, and we are still a long way from understanding the often
 subtle and complex ways that these can go awry. Yet knowledge of the human genome puts us on the
 brink of a revolution in medicine.</p>
 <p>
 Rather than relying on trial and error to design and test new drugs, researchers will increasingly
 use their knowledge of the molecular causes of diseases to design new, targeted therapies. Research
 based on genome studies and new experimental methods like CRISPR, all viewable on the UCSC Genome
 Browser, will also form the basis for new diagnoses and therapies for human disease that will
 transform the practice of medicine in this century.</p>
 <p>
 The UCSC Genome Browser supports the latest endeavor of the National Human Genome Research Institute