UCSC Genome Browser Project History

0367f2212eef6c8e73cfa0b169759338263f1c04
jnavarr5
  Fri Aug 16 16:03:34 2019 -0700
Moving the history.html page to the /goldenPath directory. refs #20314

diff --git src/hg/htdocs/goldenPath/history.html src/hg/htdocs/goldenPath/history.html
new file mode 100644
index 0000000..463164c
--- /dev/null
+++ src/hg/htdocs/goldenPath/history.html
@@ -0,0 +1,301 @@
+<!DOCTYPE html>
+<!--#set var="TITLE" value="Genome Browser History" -->
+<!--#set var="ROOT" value=".." -->
+
+<!-- Relative paths to support mirror sites with non-standard GB docs install -->
+<!--#include virtual="$ROOT/inc/gbPageStart.html" -->
+
+<h1>UCSC Genome Browser Project History</h1>
+
+<h2>Genome Browser overview</h2>
+<p>
+The UCSC Genome Browser is a web-based tool serving as a multi-powered microscope that allows
+researchers to view every chromosome of the human genome at any scale from a full chromosome down
+to an individual nucleotide. The browser integrates the work of countless scientists in laboratories
+worldwide, including work generated at UCSC, in an interactive, graphical display.</p>
+<p>
+Zoomed out, the coarse-level view shows early chromosome maps as determined by electron microscopy,
+then the browser drills down to levels of increasing detail, focusing first on chromosome bands,
+then on gene clusters (showing known genes, mostly those linked to diseases), then single genes, then
+the components of genes, and finally on the nucleotides: the As, Cs, Gs, and Ts that make up the
+genome alphabet. Not only does the browser show the genome sequence, but it also delineates known
+areas of the genome and offers supplementary information about the genes, in effect providing the
+word breaks and punctuation.</p>
+<p>
+Genome sequences are difficult to read because they consist of letter strings with no breaks or
+punctuation. The example below contains 7 different letters (genomes contain only 4). Can you
+understand what it is saying?</p>
+<pre>
+THATTHATISISTHATTHATISNOTISNOTISTHATITITIS</pre>
+<p>
+With word breaks and punctuation, it starts to make sense:</p>
+<pre>
+THAT THAT IS, IS. THAT THAT IS NOT, IS NOT. IS THAT IT? IT IS!</pre>
+<p>
+The UCSC Genome Browser group played a pivotal role in bringing this extraordinary life script into
+the light of science. The browser presents both experimentally validated and computer-predicted
+genes along with dozens of lines of evidence that help scientists recognize the key features of
+genes and predict their function. The databases for the genome browser are updated nightly with new
+information generated by researchers throughout the world.</p>
+<p>
+When directed to focus on a particular segment of the genome, the browser displays a range of data
+that are stacked vertically. At the top, it shows the chromosome number and the current position on
+the chromosome. Underneath, it shows several rows of data about genes that have been found
+experimentally or have been predicted by a number of different methods. Below those are lines of
+information about gene expression and regulation, followed by comparisons with the genomes of other
+species and other information, such as single-nucleotide polymorphisms (SNPs).</p>
+<p>
+Far from simply displaying the genetic code, the UCSC browser brings the code to life by aligning
+relevant areas with experimental and computational data and images. It also links to international
+databases, giving researchers instant access to deeper information about the genome. An experienced
+user can form a hypothesis and verify it in minutes using this tool. Together, this information
+represents an extremely comprehensive view of the genome, helping scientists recognize important
+features of the sequence and providing strong evidence of function. For instance, the genome browser
+helps unravel the varied splicing patterns whereby one gene can make many different proteins. This
+process of alternative splicing is thought to explain how a human can be so complex, yet have only
+about twice as many genes as a roundworm.</p>
+<p>
+The UCSC Genome Browser group continues to add functions to the genome browser, such as the Track
+Collection Builder, which allows multiple continuous-value graphing tracks to be copied and grouped
+into one composite track or &quot;collection.&quot; Once the tracks are inside of a collection, the
+Track Collection Builder tool allows you to sort by similarity and magnitude, as well as alter the
+aggregate/overlay graphing view options to compare results. By merging experimental results from
+multiple sources, this powerful tool allows researchers to better understand how genes function.</p>
+<p>
+Today, the UCSC Genome Browser group continues to make the human genome sequence even more useful
+for science and medicine by identifying and annotating key functional genomic elements in such a way
+that they are easily accessible to researchers. This process of discovery and categorization is a
+critical step toward fully understanding the workings of the human genome, a project that will
+occupy science and medicine for many years. The browser platform has multiple potential uses that
+can improve diagnosis, prevention, and cures for disease. The usefulness of the UCSC Genome Browser
+led to spin-offs, or genome browser mirrors, such as the following:</p>
+<ul>
+  <li><a href="https://news.ucsc.edu/2008/05/2242.html"
+    target="_blank">The HIV Data Browser</a></li>
+  <li><a href="https://xena.ucsc.edu/welcome-to-ucsc-xena/"
+    target="_blank">The UCSC Cancer Genomics Browser</a></li>
+  <li><a href="https://genome.ucsc.edu/encode/"
+    target="_blank">The data coordination center for the international ENCODE project</a></li>
+  <li><a href="http://genome.ucsc.edu/ebolaPortal/"
+    target="_blank">The UCSC Ebola Virus Genome Browser</a></li>
+</ul>
+
+<h2>Human Genome Project Race</h2>
+<p>
+In December 1999, the International Human Genome Project (IHGP) came to UC Santa Cruz when Eric
+Lander, the director of the Whitehead sequencing center (Whitehead Institute/MIT Center for Genome
+Research), invited David Haussler to help annotate the human genome. In particular, Lander wanted
+help in discovering the locations of the genes, which make up only approximately 1.5% of the
+sequence. Haussler had previously applied a mathematical technique known as hidden Markov models
+(HMMs) to the task of computer gene-finding. This application of HMMs had quickly become the
+dominant gene-finding methodology and was used successfully on the <i>Drosophila melanogaster</i>
+(fruit fly) genome.</p>
+<p>
+At the time UCSC joined the project, the IHGP was assembling the
+sequence one piece (or, in the jargon of molecular biology, one &quot;clone&quot;) at a time, and
+intending to string the pieces together based on a precisely constructed clone map. This approach
+had been shown to work very well with <i>Caenorhabditis elegans</i> (a roundworm) and human
+chromosome 22. But the process of making sure every last part of the sequence is read and put
+together properly is quite labor-intensive.</p>
+<p>
+Haussler enlisted Jim Kent, then a graduate student at UCSC's Department of Molecular, Cell, &amp;
+Developmental Biology, along with systems engineer Patrick Gavin, and graduate students Terrence
+Furey and David Kulp (who had led the gene-finding effort on the <i>Drosophila</i> genome). This was the
+birth of the UCSC Genome Browser Group.</p>
+
+<h3>New challenger, Celera</h3>
+<p>
+It was a crucial time for the international project. A private company, Celera Genomics, had
+announced its intention to assemble the human genome sequence well in advance of the public effort,
+raising the fear that the sequence would be protected by patents and thus not be freely available
+to scientists. Celera Genomics was using an alternative approach, a so-called whole genome
+&quot;shotgun,&quot; where small bits of the sequence are read at random from the genome, and then a
+computer program assembles these bits into an approximation of the genome as a whole. With this
+approach, Celera's assembly would still have numerous gaps and ambiguities, but the entire project
+from start to finish could be done in less than half the time the IHGP planned for their effort.</p>
+<p>
+At least partly in response to competition from Celera, the IHGP changed its focus from producing
+finished clones to producing draft clones. To sequence a clone, the IHGP adopted a shotgun approach
+in miniature. Bits of a clone were read at random, and the bits were stitched together by a computer
+program into pieces called &quot;contigs.&quot; After the shotgun phase, a clone was typically in
+5-50 contigs, but the relative order of the contigs was not known. This was the state of the genome
+when David Haussler first attempted to locate the genes computationally, and he quickly discovered
+that computational gene-finding was nearly impossible, since the average size of a contig was
+considerably smaller than the average size of a human gene.</p>
+<p>
+A further stage of assembly was necessary if the IHGP's draft sequence was to have similar utility
+to Celera's sequence, and in particular to prevent Celera and its clients from locking up
+significant portions of the human genome under patents. A number of groups within the IHGP were
+working on this second stage of assembly, which would merge the approximately 400,000 contigs into
+larger pieces and order them along the human chromosomes so that research groups could find the
+human genes. However, the process was slow and arduous. Even with the outstanding mapping
+information provided by Bob Waterston's group at Washington University, the second-stage assembly
+turned out to be like an extremely difficult jigsaw puzzle, with similar-looking, non-contiguous,
+overlapping pieces and many layers of conflicting evidence.</p>
+<h3>Push to the Finish Line</h3>
+<p>
+Motivated to prevent Celera and its clients from locking up significant portions of the human genome
+in patents, Jim Kent dropped his other work in May of 2000 to focus on the assembly problem. In a
+remarkable display of energy and talent, Kent developed within 4 weeks a 10,000-line computer
+program that assembled the working draft of the human genome. The program, called GigAssembler,
+constructed the first working draft of the human genome on June 22, 2000, just days before Celera
+completed its first assembly. The IHGP working draft combined anonymous genomic information from
+human volunteers of diverse backgrounds, accepted on a first-come, first-taken basis. The Celera
+sequence was of a single individual. Since the public consortium finished the genome ahead of the
+private company, the genome and the information it contains are available free to researchers
+worldwide. Kent's assembly was celebrated at a White House ceremony on June 26, 2000, announcing the
+completion of the first drafts of the human genome by the IHGP and Celera.</p>
+<p>
+On July 7, 2000, after further examination by the principal scientists of the public genome project,
+and to facilitate the annotation process, the UCSC Genome Browser group released this first working
+draft on the web at <a href="https://genome.ucsc.edu" target="_blank">https://genome.ucsc.edu</a>.
+The scientific community downloaded one-half trillion bytes of information from the UCSC genome
+server in the first 24 hours of free and unrestricted access to the assembled blueprint of our human
+species. The initial assembled human genome sequence was referred to as a working draft because
+there remained gaps where DNA sequence was missing, due either to a lack of raw sequence data or
+ambiguities in the positions of the fragments. With the genome assembly 90% complete, the assembled
+genome was published along with the findings of hundreds of researchers worldwide in the February
+15, 2001 issue of <i>Nature</i>, which was largely devoted to the human genome. In the months
+following the release of the working draft, the UCSC team worked with other researchers worldwide to
+fill in the gaps. The resulting finished sequence made its debut in April of 2003. It encompasses
+99% of the gene-containing regions of the human genome and is 99.99% accurate.</p>
+<p>
+The UCSC Genome Browser was designated as the official repository of the early human genome assembly
+iterations. Once the human genome sequence became available, other genome browsers also came online,
+most notably those at the National Center for Biotechnology Information (NCBI) and at the European
+Bioinformatics Institute (EBI). Reciprocal links provided on each of the three browsers allow
+researchers to jump from any place in the human genome to the same region on either of the other two
+browsers.</p>
+
+<h2>The ENCODE Project</h2>
+<p>
+The human genome contains vast amounts of information, and all of the functions of a human cell are
+implicitly coded in the human genome. With the molecular sequence known, researchers have been
+mining it for clues as to how the body works in health and in disease, ultimately laying out the
+plan for the complex pathways of molecular interactions that the sequence orchestrates. The UCSC
+Genome Browser aids the worldwide scientific community in its challenge to understand the genome, to
+probe it with new experimental and informatics methodologies, and to decode the genetic program of
+the cell.</p>
+<p>
+After the sequence of the genome was first available, a researcher’s ability to decode that sequence
+and tap into the wealth of information it holds was still quite limited. The next step beyond
+viewing the genome is gaining an understanding of the instructions encoded in it. Toward this end,
+the UCSC Genome Browser group served as the data coordination center for the
+<a href="https://www.encodeproject.org/" target="_blank">ENCyclopedia Of DNA Elements (ENCODE)
+project</a>, an international endeavor to generate a comprehensive parts list of all the functional
+components in the human genome.</p>
+<p>
+ENCODE is a scientific reconnaissance mission aimed at discovering all regions of the human genome
+crucial to biological function. Before ENCODE, scientists focused on finding the genes, or
+protein-coding regions in DNA sequences, but these account for only about 1.5% of the genetic
+material of humans and other mammals. Non-coding regions of the genome have important functions, and
+the ENCODE project is developing a comprehensive &quot;parts list&quot; by identifying and precisely
+locating all functional elements in the human genome. This project, sponsored by the
+<a href="https://www.genome.gov/" target="_blank">National Human Genome Research Institute
+(NHGRI)</a>, involves an international consortium of scientists from government, industry, and
+academia.</p>
+
+<h3>UC Santa Cruz's role</h3>
+<p>
+UC Santa Cruz developed and ran the data coordination center for the ENCODE project from its
+inception in 2003 through the end of the first production phase in 2012. During that time, the UCSC
+Genome Browser group, directed by Jim Kent with technical management by Kate Rosenbloom, provided the
+database and web interface for all sequence-related data for the ENCODE project. This included
+integrating the data into the UCSC Human Genome Browser (where it continues to reside) on
+specialized tracks, and providing further in-depth information on detail pages. UC Santa Cruz also
+developed, performed, and presented computational and comparative analyses to glean further genomic
+and functional information from the collective data.</p>
+<p>
+UC Santa Cruz worked closely with labs producing data for the ENCODE project and with data analysis
+groups to define data and metadata reporting standards for a broad range of genomics assays. They
+implemented data submission and validation pipelines, created and maintained the encodeproject.org
+website, developed user access tools for ENCODE data, exported all ENCODE data to repositories at
+the National Center for Biotechnology Information (NCBI), and provided outreach and tutorial support
+for the project.</p>
+<p>
+The Michael Cherry laboratory at Stanford University took over the ENCODE data coordination center
+in late 2012. UC Santa Cruz continues to support existing ENCODE data and resources on the UCSC
+Genome Browser website. Newer ENCODE data of broad interest, particularly integrative and summary
+data, will be incorporated into the browser.</p>
+<p>
+<em>The following paper describes ENCODE resources at UC Santa Cruz:</em></p>
+<p class="text-center">
+Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, Wong MC, Maddren M, Fang R,
+Heitner SG, Lee BT, Barber GP, Harte RA, Diekhans M, Long JC, Wilder SP, Zweig AS, Karolchik D,
+Kuhn RM, Haussler D, Kent WJ. <a href="https://www.ncbi.nlm.nih.gov/pubmed/23193274"
+target="_blank">ENCODE data in the UCSC Genome Browser: year 5 update.</a> Nucleic Acids Res. 2013
+Jan;41(Database issue):D56-63.</p>
+
+<h3>More about the ENCODE Project</h3>
+<ul>
+  <li>
+    <a href="https://www.encodeproject.org/" target="_blank">ENCODE data portal</a></li>
+  <li>
+    <a href="https://www.nytimes.com/2008/11/11/science/11gene.html" target="_blank">New York
+    Times article: &quot;Now: the rest of the genome&quot;</a></li>
+  <li>
+    <a href="https://www.genome.gov/11009066/" target="_blank">NHGRI announcement of the ENCODE
+    Project</a></li>
+</ul>
+
+<h2>UCSC Genome Research Primer</h2>
+<h3>Comparative Genomics</h3>
+<p>
+Besides developing, supporting, and continuing to improve the genome browser, the UCSC Genome
+Browser group conducts research into the functional elements of the human genome that have evolved
+under natural selection. Since the first assembly of the human genome, the UCSC group has added a
+growing number of species to the UCSC Genome Browser, including roundworm, pufferfish, chicken,
+mouse, and chimpanzee. Interspecies alignments allow researchers to compare human genes to similar
+genes in other species. The UCSC Genome Browser allows rapid comparisons between species, which can
+lead to many different types of new discoveries:</p>
+<ul>
+  <li>
+    New gene discoveries can result from searching the human genome for sequences that match those
+    with known functions in other organisms. The molecular genetics behind disease development and
+    progression in model organisms can be leveraged to discover potential disease-related genes in
+    humans, moving us closer to diagnostic advances and targeted treatments.</li>
+  <li>
+    We can reconstruct the evolutionary history of the human genome by identifying the origins of
+    interspecies differences and of short segments in the human genome that have been extremely
+    well-conserved over millions of years of evolution.</li>
+  <li>
+    By searching for the highly conserved segments in the human genome (those that are unchanged
+    from like segments in the genomes of other organisms), we can begin to understand the essential
+    elements of the blueprint for life. Researchers suspect that these highly conserved elements
+    must be essential to function. Genes make up only a small percentage of the unchanged elements,
+    suggesting that other unknown regulatory elements in the genome are also important for
+    function.</li>
+  <li>
+    Searching for genes that have evolved with unusual speed from one organism to another will give
+    clues to essential interspecies differences, such as differences between the human and
+    chimpanzee brain.</li>
+</ul>
+
+<h3>Possibilities for Health</h3>
+<p>
+As we begin to better understand the molecular mechanisms responsible for human disease, entirely
+new avenues of treatments will be possible. We are only now getting a first glimmer of the molecular
+functions of a healthy human cell or organ, and we are still a long way from understanding the often
+subtle and complex ways that these can go awry. Yet knowledge of the human genome puts us on the
+brink of a revolution in medicine.</p>
+<p>
+Rather than relying on trial and error to design and test new drugs, researchers will increasingly
+use their knowledge of the molecular causes of diseases to design new, targeted therapies. Research
+based on genome studies and new experimental methods like CRISPR, all viewable on the UCSC Genome
+Browser, will also form the basis for new diagnoses and therapies for human disease that will
+transform the practice of medicine in this century.</p>
+<p>
+The UCSC Genome Browser supports the latest endeavor of the National Human Genome Research Institute
+(NHGRI), a medical sequencing project intended to amass data relating genes to health conditions.
+This project sets the stage for the time when it becomes affordable for an individual's genome to be
+sequenced. The information obtained will allow estimates of future disease risk and improve the
+prevention, diagnosis, and treatment of disease. The project focuses on rare Mendelian disorders,
+complex disorders, and normal human variation.</p>
+<p>
+The practice of medicine will become much more individualized, with therapies tailored to be most
+effective given an individual's genetic makeup. Medical tests are already available to identify
+individual genetic variations that affect a patient's response to commonly used medications. These
+tests can allow doctors to avoid adverse reactions and choose medications appropriate for specific
+individuals. Someday we may even be able to repair or replace the disease-causing genes,
+re-orchestrating the molecular pathways needed for health.</p>
+<!--#include virtual="$ROOT/inc/gbPageEnd.html" -->