UCSC Genome Browser Project History

0367f2212eef6c8e73cfa0b169759338263f1c04
jnavarr5
  Fri Aug 16 16:03:34 2019 -0700
Moving the history.html page to the /goldenPath directory. refs #20314

diff --git src/hg/htdocs/history.html src/hg/htdocs/history.html
deleted file mode 100644
index 463164c..0000000
--- src/hg/htdocs/history.html
+++ /dev/null
@@ -1,301 +0,0 @@
-<!DOCTYPE html>
-<!--#set var="TITLE" value="Genome Browser History" -->
-<!--#set var="ROOT" value="." -->
-
-<!-- Relative paths to support mirror sites with non-standard GB docs install -->
-<!--#include virtual="$ROOT/inc/gbPageStart.html" -->
-
-<h1>UCSC Genome Browser Project History</h1>
-
-<h2>Genome Browser overview</h2>
-<p>
-The UCSC Genome Browser is a web-based tool serving as a multi-powered microscope that allows
-researchers to view all 23 chromosomes of the human genome at any scale from a full chromosome down
-to an individual nucleotide. The browser integrates the work of countless scientists in laboratories
-worldwide, including work generated at UCSC, in an interactive, graphical display.</p>
-<p>
-Zoomed out, the coarse-level view shows early chromosome maps as determined by electron microscopy.
-The browser then drills down to levels of increasing detail, focusing first on chromosome bands,
-then on gene clusters (showing known genes, mostly those linked to diseases), then single genes, then
-the components of genes, and finally on the nucleotides: the As, Cs, Gs, and Ts that make up the
-genome alphabet. Not only does the browser show the genome sequence, but it also delineates known
-areas of the genome and offers supplementary information about the genes, in effect providing the
-word breaks and punctuation.</p>
-<p>
-Genome sequences are difficult to read because they consist of letter strings with no breaks or
-punctuation. The example below contains 7 different letters (genomes contain only 4). Can you
-understand what it is saying?</p>
-<pre>
-THATTHATISISTHATTHATISNOTISNOTISTHATITITIS</pre>
-<p>
-With word breaks and punctuation, it starts to make sense:</p>
-<pre>
-THAT THAT IS, IS. THAT THAT IS NOT, IS NOT. IS THAT IT? IT IS!</pre>
-<p>
-The UCSC Genome Browser group played a pivotal role in bringing this extraordinary life script into
-the light of science. The browser presents both experimentally validated and computer-predicted
-genes along with dozens of lines of evidence that help scientists recognize the key features of
-genes and predict their function. The databases for the genome browser are updated nightly with new
-information generated by researchers throughout the world.</p>
-<p>
-When directed to focus on a particular segment of the genome, the browser displays a range of data
-that are stacked vertically. At the top, it shows the chromosome number and the current position on
-the chromosome. Underneath, it shows several rows of data about genes that have been found
-experimentally or have been predicted by a number of different methods. Below those are lines of
-information about gene expression and regulation, followed by comparisons with the genomes of other
-species and other information, such as single-nucleotide polymorphisms (SNPs).</p>
-<p>
-Far from simply displaying the genetic code, the UCSC browser brings the code to life by aligning
-relevant areas with experimental and computational data and images. It also links to international
-databases, giving researchers instant access to deeper information about the genome. An experienced
-user can form a hypothesis and verify it in minutes using this tool. Together, this information
-represents an extremely comprehensive view of the genome, helping scientists recognize important
-features of the sequence and providing strong evidence of function. For instance, the genome browser
-helps unravel the varied splicing patterns whereby one gene can make many different proteins. This
-process of alternative splicing is thought to explain how a human can be so complex, yet have only
-about twice as many genes as a roundworm.</p>
-<p>
-The UCSC Genome Browser group continues to add functions to the genome browser, such as the Track
-Collection Builder, which allows multiple continuous-value graphing tracks to be copied and grouped
-into one composite track or &quot;collection.&quot; Once the tracks are inside of a collection, the
-Track Collection Builder tool allows you to sort by similarity and magnitude, as well as alter the
-aggregate/overlay graphing view options to compare results. By merging experimental results from
-multiple sources, this powerful tool allows researchers to better understand how genes function.</p>
-<p>
-Today, the UCSC Genome Browser group continues to make the human genome sequence even more useful
-for science and medicine by identifying and annotating key functional genomic elements in such a way
-that they are easily accessible to researchers. This process of discovery and categorization is a
-critical step toward fully understanding the workings of the human genome, a project that will
-occupy science and medicine for many years. The browser platform has multiple potential uses that
-can improve diagnosis, prevention, and cures for disease. The usefulness of the UCSC Genome Browser
-led to spin-offs, or genome browser mirrors, such as the following:</p>
-<ul>
-  <li><a href="https://news.ucsc.edu/2008/05/2242.html"
-    target="_blank">The HIV Data Browser</a></li>
-  <li><a href="https://xena.ucsc.edu/welcome-to-ucsc-xena/"
-    target="_blank">The UCSC Cancer Genomics Browser</a></li>
-  <li><a href="https://genome.ucsc.edu/encode/"
-    target="_blank">The data collection center for the international ENCODE project</a></li>
-  <li><a href="http://genome.ucsc.edu/ebolaPortal/"
-    target="_blank">The UCSC Ebola Virus Genome Browser</a></li>
-</ul>
-
-<h2>Human Genome Project Race</h2>
-<p>
-In December 1999, the International Human Genome Project (IHGP) came to UC Santa Cruz when Eric
-Lander, the director of the Whitehead sequencing center (Whitehead Institute/MIT Center for Genome
-Research), invited David Haussler to help annotate the human genome. In particular, Lander wanted
-help in discovering the locations of the genes, which make up only approximately 1.5% of the
-sequence. Haussler had previously applied a mathematical technique known as hidden Markov models
-(HMMs) to the task of computer gene-finding. This application of HMMs had quickly become the
-dominant gene-finding methodology and was used successfully on the <i>Drosophila melanogaster</i>
-(fruit fly) genome.</p>
-<p>
-At the time UCSC entered the International Human Genome Project (IHGP), the IHGP was assembling the
-sequence one piece (or, in the jargon of molecular biology, one &quot;clone&quot;) at a time, and
-intending to string the pieces together based on a precisely constructed clone map. This approach
-had been shown to work very well with <i>Caenorhabditis elegans</i> (a roundworm) and human
-chromosome 22. But the process of making sure every last part of the sequence is read and put
-together properly is quite labor-intensive.</p>
-<p>
-Haussler enlisted Jim Kent, then a graduate student at UCSC's Department of Molecular, Cell, &amp;
-Developmental Biology, along with systems engineer Patrick Gavin, and graduate students Terrence
-Furey and David Kulp (who had led the gene-finding effort on the <i>Drosophila</i> genome). This was the
-birth of the UCSC Genome Browser Group.</p>
-
-<h3>New challenger, Celera</h3>
-<p>
-It was a crucial time for the international project. A private company, Celera Genomics, had
-announced its intention to assemble the human genome sequence well in advance of the public effort,
-raising the fear that the sequence would be protected by patents and thus not be freely available
-to scientists. Celera Genomics was using an alternative approach, a so-called whole genome
-&quot;shotgun,&quot; where small bits of the sequence are read at random from the genome, and then a
-computer program assembles these bits into an approximation of the genome as a whole. By using this
-approach, Celera's assembly would still have numerous gaps and ambiguities, but the entire project
-from start to finish could be done in less than half the time the IHGP planned for their effort.</p>
-<p>
-An approach that tolerated numerous gaps and ambiguities was necessary if the IHGP's draft sequence
-was to have similar utility to Celera's sequence and, in particular, to prevent Celera and its
-clients from locking up significant portions of the human genome under patents. A number of groups
-within the IHGP were working on the second stage of assembly that would merge the approximately
-400,000 contigs (contiguous stretches of assembled sequence) into larger pieces and order them
-along the human chromosomes so that research groups could find the human genes. However, the
-process was slow and arduous. Even with the outstanding mapping information provided by Bob
-Waterston's group at Washington University, the second-stage assembly turned out to be like an
-extremely difficult jigsaw puzzle, with many layers of conflicting evidence and many
-similar-looking, non-contiguous, overlapping pieces.</p>
-<p>
-At least partly in response to competition from Celera, the IHGP changed its focus from producing
-finished clones to producing draft clones. To sequence a clone, the IHGP adopted a shotgun approach
-in miniature. Bits of a clone were read at random, and the bits were stitched together by a computer
-program into pieces called &quot;contigs.&quot; After the shotgun phase, a clone was typically in
-5-50 contigs, but the relative order of the contigs was not known. This was the state of the genome
-when David Haussler first attempted to locate the genes computationally, and he quickly discovered
-that computational gene-finding was nearly impossible, since the average size of a contig was
-considerably smaller than the average size of a human gene.</p>
-<h3>Push to the Finish Line</h3>
-<p>
-Motivated to prevent Celera and its clients from locking up significant portions of the human genome
-in patents, Jim Kent dropped his other work in May of 2000 to focus on the assembly problem. In a
-remarkable display of energy and talent, Kent developed within 4 weeks a 10,000-line computer
-program that assembled the working draft of the human genome. The program, called GigAssembler,
-constructed the first working draft of the human genome on June 22, 2000, just days before Celera
-completed its first assembly. The IHGP working draft combined anonymous genomic information from
-human volunteers of diverse backgrounds, accepted on a first-come, first-taken basis. The Celera
-sequence was of a single individual. Since the public consortium finished the genome ahead of the
-private company, the genome and the information it contains are available free to researchers
-worldwide. Kent's assembly was celebrated at a White House ceremony on June 26, 2000, announcing the
-completion of the first drafts of the human genome by the IHGP and Celera.</p>
-<p>
-On July 7, 2000, after further examination by the principal scientists of the public genome project,
-and to facilitate the annotation process, the UCSC Genome Browser group released this first working
-draft on the web at <a href="https://genome.ucsc.edu" target="_blank">https://genome.ucsc.edu</a>.
-The scientific community downloaded one-half trillion bytes of information from the UCSC genome
-server in the first 24 hours of free and unrestricted access to the assembled blueprint of our human
-species. The initial assembled human genome sequence was referred to as a working draft because
-there remained gaps where DNA sequence was missing, due either to a lack of raw sequence data or
-ambiguities in the positions of the fragments. With the genome assembly 90% complete, the assembled
-genome was published along with the findings of hundreds of researchers worldwide in the February
-15, 2001, issue of <i>Nature</i>, which was largely devoted to the human genome. In the months
-following the release of the working draft, the UCSC team worked with other researchers worldwide to
-fill in the gaps. The resulting finished sequence made its debut in April of 2003. It encompasses
-99% of the gene-containing regions of the human genome and is 99.99% accurate.</p>
-<p>
-The UCSC Genome Browser was designated as the official repository of the early human genome assembly
-iterations. Once the human genome sequence became available, other genome browsers also came online,
-most notably those at the National Center for Biotechnology Information (NCBI) and at the European
-Bioinformatics Institute (EBI). Reciprocal links provided on each of the three browsers allow
-researchers to jump from any place in the human genome to the same region on either of the other two
-browsers.</p>
-
-<h2>The ENCODE Project</h2>
-<p>
-The human genome contains vast amounts of information, and all of the functions of a human cell are
-implicitly coded in the human genome. With the molecular sequence known, researchers have been
-mining it for clues as to how the body works in health and in disease, ultimately laying out the
-plan for the complex pathways of molecular interactions that the sequence orchestrates. The UCSC
-Genome Browser aids the worldwide scientific community in its challenge to understand the genome, to
-probe it with new experimental and informatics methodologies, and to decode the genetic program of
-the cell.</p>
-<p>
-After the sequence of the genome was first available, a researcher’s ability to decode that sequence
-and tap into the wealth of information it holds was still quite limited. The next step beyond
-viewing the genome is gaining an understanding of the instructions encoded in it. Toward this end,
-the UCSC Genome Browser group served as the data coordination center for the
-<a href="https://www.encodeproject.org/" target="_blank">ENCyclopedia Of DNA Elements (ENCODE)
-project</a>, an international endeavor to generate a comprehensive parts list of all the functional
-components in the human genome.</p>
-<p>
-ENCODE is a scientific reconnaissance mission aimed at discovering all regions of the human genome
-crucial to biological function. Before ENCODE, scientists focused on finding the genes, or
-protein-coding regions in DNA sequences, but these account for only about 1.5% of the genetic
-material of humans and other mammals. Non-coding regions of the genome have important functions, and
-the ENCODE project is developing a comprehensive &quot;parts list&quot; by identifying and precisely
-locating all functional elements in the human genome. This project, sponsored by the
-<a href="https://www.genome.gov/" target="_blank">National Human Genome Research Institute
-(NHGRI)</a>, involves an international consortium of scientists from government, industry, and
-academia.</p>
-
-<h3>UC Santa Cruz's role</h3>
-<p>
-UC Santa Cruz developed and ran the data coordination center for the ENCODE project from its
-inception in 2003 through the end of the first production phase in 2012. During that time, the UCSC
-Genome Browser group, directed by Jim Kent with technical management by Kate Rosenbloom, provided the
-database and web interface for all sequence-related data for the ENCODE project. This included
-integrating the data into the UCSC Human Genome Browser (where it continues to reside) on
-specialized tracks, and providing further in-depth information on detail pages. UC Santa Cruz also
-developed, performed, and presented computational and comparative analyses to glean further genomic
-and functional information from the collective data.</p>
-<p>
-UC Santa Cruz worked closely with labs producing data for the ENCODE project and with data analysis
-groups to define data and metadata reporting standards for a broad range of genomics assays. They
-implemented data submission and validation pipelines, created and maintained the encodeproject.org
-website, developed user access tools for ENCODE data, exported all ENCODE data to repositories at
-the National Center for Biotechnology Information (NCBI), and provided outreach and tutorial support
-for the project.</p>
-<p>
-The Michael Cherry laboratory at Stanford University took over the ENCODE data coordination center
-in late 2012. UC Santa Cruz continues to support existing ENCODE data and resources on the UCSC
-Genome Browser website. Newer ENCODE data of broad interest, in particular, integrative and summary
-data, will be incorporated into the browser.</p>
-<p>
-<em>The following paper describes ENCODE resources at UC Santa Cruz:</em></p>
-<p class="text-center">
-Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, Wong MC, Maddren M, Fang R,
-Heitner SG, Lee BT, Barber GP, Harte RA, Diekhans M, Long JC, Wilder SP, Zweig AS, Karolchik D,
-Kuhn RM, Haussler D, Kent WJ. <a href="https://www.ncbi.nlm.nih.gov/pubmed/23193274"
-target="_blank">ENCODE data in the UCSC Genome Browser: year 5 update.</a> Nucleic Acids Res. 2013
-Jan;41(Database issue):D56-63.</p>
-
-<h3>More about the ENCODE Project</h3>
-<ul>
-  <li>
-    <a href="https://www.encodeproject.org/" target="_blank">ENCODE data portal</a></li>
-  <li>
-    <a href="https://www.nytimes.com/2008/11/11/science/11gene.html" target="_blank">New York
-    Times article: &quot;Now: the rest of the genome&quot;</a></li>
-  <li>
-    <a href="https://www.genome.gov/11009066/" target="_blank">NHGRI announcement of the ENCODE
-    Project</a></li>
-</ul>
-
-<h2>UCSC Genome Research Primer</h2>
-<h3>Comparative Genomics</h3>
-<p>
-Besides developing, supporting, and continuing to improve the genome browser, the UCSC Genome
-Browser group conducts research into the functional elements of the human genome that have evolved
-under natural selection. Since the first assembly of the human genome, the UCSC group has added a
-growing number of species to the UCSC Genome Browser, including roundworm, pufferfish, chicken,
-mouse, and chimpanzee. Interspecies alignments allow researchers to compare human genes to similar
-genes in other species. The UCSC Genome Browser allows rapid comparisons between species, which can
-lead to many different types of new discoveries:</p>
-<ul>
-  <li>
-    New gene discoveries can result from searching the human genome for sequences that match those
-    with known functions in other organisms. The molecular genetics behind disease development and
-    progression in model organisms can be leveraged to discover potential disease-related genes in
-    humans, moving us closer to diagnostic advances and targeted treatments.</li>
-  <li>
-    We can reconstruct the evolutionary history of the human genome by identifying the origins of
-    interspecies differences and of short segments in the human genome that have been extremely
-    well-conserved over millions of years of evolution.</li>
-  <li>
-    By searching for the highly conserved segments in the human genome (those that are unchanged
-    from like segments in the genomes of other organisms), we can begin to understand the essential
-    elements of the blueprint for life. Researchers suspect that these highly conserved elements
-    must be essential to function. Genes make up only a small percentage of the unchanged elements,
-    suggesting that other unknown regulatory elements in the genome are also important for
-    function.</li>
-  <li>
-    Searching for genes that have evolved with unusual speed from one organism to another will give
-    clues to essential interspecies differences, such as differences between the human and
-    chimpanzee brain.</li>
-</ul>
-
-<h3>Possibilities for Health</h3>
-<p>
-As we begin to better understand the molecular mechanisms responsible for human disease, entirely
-new avenues of treatment will be possible. We are only now getting a first glimmer of the molecular
-functions of a healthy human cell or organ, and we are still a long way from understanding the often
-subtle and complex ways that these can go awry. Yet knowledge of the human genome puts us on the
-brink of a revolution in medicine.</p>
-<p>
-Rather than relying on trial and error to design and test new drugs, researchers will increasingly
-use their knowledge of the molecular causes of diseases to design new, targeted therapies. Research
-based on genome studies and new experimental methods like CRISPR, all viewable on the UCSC Genome
-Browser, will also form the basis for new diagnoses and therapies for human disease that will
-transform the practice of medicine in this century.</p>
-<p>
-The UCSC Genome Browser supports the latest endeavor of the National Human Genome Research Institute
-(NHGRI), a medical sequencing project intended to amass data relating genes to health conditions.
-This project sets the stage for the time when it becomes affordable for an individual's genome to be
-sequenced. The information obtained will allow estimates of future disease risk and improve the
-prevention, diagnosis, and treatment of disease. The project focuses on rare Mendelian disorders,
-complex disorders, and normal human variation.</p>
-<p>
-The practice of medicine will become much more individualized, with therapies tailored to be most
-effective given an individual's genetic makeup. Medical tests are already available to identify
-individual genetic variations that affect a patient's response to commonly used medications. These
-tests can allow doctors to avoid adverse reactions and choose medications appropriate for specific
-individuals. Someday we may even be able to repair or replace the disease-causing genes,
-re-orchestrating the molecular pathways needed for health.</p>
-<!--#include virtual="$ROOT/inc/gbPageEnd.html" -->