bd20ab0b9b864dd251d23750e8d8f4a8e3606d0f jnavarr5 Thu Aug 29 16:00:37 2019 -0700 Adding links to other projects on the history page, refs #20314 diff --git src/hg/htdocs/goldenPath/history.html src/hg/htdocs/goldenPath/history.html index f651459..319875d 100755 --- src/hg/htdocs/goldenPath/history.html +++ src/hg/htdocs/goldenPath/history.html @@ -92,63 +92,66 @@ spin-offs, or genome browser mirrors, such as the following:
-In December 1999, the International Human Genome Project (IHGP) came to UC Santa Cruz when Eric +In December 1999, the International Human Genome Project (IHGP) came to UC Santa Cruz when Eric Lander, the director of the Whitehead sequencing center (Whitehead Institute/MIT Center for Genome Research), invited David Haussler to help annotate the human genome. In particular, Lander wanted help in discovering the locations of the genes, which make up only approximately 1.5% of the sequence. Haussler had previously applied a mathematical technique known as hidden Markov models (HMMs) to the task of computer gene-finding. This application of HMMs had quickly become the dominant gene-finding methodology and was used successfully on the Drosophila melanogaster (fruit fly) genome.
At the time UCSC entered the IHGP, the project was assembling the sequence one piece (or, in the jargon of molecular biology, one "clone") at a time, intending to string the pieces together based on a precisely constructed clone map. This approach had been shown to work very well with Caenorhabditis elegans (a roundworm) and human chromosome 22, but the process of making sure every last part of the sequence is read and put together properly is quite labor-intensive.
Haussler enlisted Jim Kent, then a graduate student at UCSC's Department of Molecular, Cell, & Developmental Biology, along with systems engineer Patrick Gavin, and graduate students Terrence Furey and David Kulp (who had led the gene-finding effort on the Drosophila genome). This was the birth of the UCSC Genome Browser group.
-It was a crucial time for the international project. A private company, Celera Genomics, had -announced its intention to assemble the human genome sequence well in advance of the public effort, -raising the fear that the sequence would be protected by patents and thus not be freely available -to scientists. Celera Genomics was using an alternative approach, a so-called whole genome -"shotgun" method, where small bits of the sequence are read at -random from the genome, and then a -computer program assembles these bits into an approximation of the genome as a whole. By using this -approach, Celera's assembly would still have numerous gaps and ambiguities, but the entire project -from start to finish could be done in less than half the time the IHGP planned for their effort.
+It was a crucial time for the international project. A private company, Celera Genomics, had announced its +intention to assemble the human genome sequence well in advance of the public effort, raising the +fear that the sequence would be protected by patents and thus not be freely available to scientists. +Celera Genomics was using an alternative approach, a so-called whole genome "shotgun" +method, where small bits of the sequence are read at random from the genome, and then a computer +program assembles these bits into an approximation of the genome as a whole. With this approach, +Celera's assembly would still have numerous gaps and ambiguities, but the entire project from start +to finish could be done in less than half the time the IHGP planned for their effort. A further +complication was the fact that Celera had access to the fruits of the public project, while keeping +its own results private.

An approach resulting in numerous gaps and ambiguities was necessary if the IHGP's draft sequence was to have similar utility to Celera's sequence, and in particular to prevent Celera and its clients from locking up significant portions of the human genome under patents. A number of groups within the IHGP were working on the second stage of assembly that would merge the approximately 400,000 contigs into larger pieces and order them along the human chromosomes so that research groups could find the human genes. However, the process was slow and arduous. Even with the outstanding mapping information provided by Bob Waterston's group at Washington University, the second-stage assembly turned out to be like an extremely difficult jigsaw puzzle, with many layers of conflicting evidence having similar-looking, non-contiguous, overlapping pieces.
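To make the shotgun idea concrete, the following is a minimal, purely illustrative Python sketch of greedy overlap assembly: short reads are repeatedly merged at their longest suffix-to-prefix overlap until contigs emerge. This is not the software Celera or the IHGP actually used; real assemblers must also handle sequencing errors, repeats, and read-pairing constraints, and the read strings and function names below are invented for the example.

<pre>
# Purely illustrative toy "shotgun assembly" sketch: merge reads greedily at
# their longest suffix/prefix overlap. The reads and function names are made
# up for this example; this is not Celera's or the IHGP's actual assembler.

def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b`, or 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)   # candidate start of the overlap in `a`
        if start == -1:
            return 0
        if b.startswith(a[start:]):          # the whole suffix of `a` matches `b`
            return len(a) - start
        start += 1

def greedy_assemble(reads):
    """Repeatedly merge the pair of reads with the largest overlap into one contig."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:                    # no overlaps left; remaining reads stay separate contigs
            break
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)] + [merged]
    return reads

# "Reads" sampled from a short made-up sequence; prints ['ATTAGACCTGCCGGAATAC'].
print(greedy_assemble(["ATTAGACCTG", "CCTGCCGGAA", "AGACCTGCCG", "GCCGGAATAC"]))
</pre>

Greedy merging works on a toy input like this, but with repeats and conflicting overlaps it becomes exactly the kind of jigsaw puzzle described above, which is why the second-stage assembly was so difficult.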
At least partly in response to competition from Celera, the IHGP changed its focus from producing finished clones to producing draft clones. To sequence a clone, the IHGP adopted a shotgun approach in miniature. Bits of a clone were read at random, and the bits were stitched together by a computer program into pieces called "contigs." After the shotgun phase, a clone was typically in @@ -177,34 +180,35 @@ draft on the web at https://genome.ucsc.edu. In the first 24 hours of free and unrestricted access to the human genome, the scientific community downloaded one-half trillion bytes of information from the assembled blueprint of our human species. The initial assembled human genome sequence was referred to as a working draft because there remained gaps where DNA sequence was missing, due either to a lack of raw sequence data or to ambiguities in the positions of the fragments. With the genome assembly 90% complete, the working draft was published along with the findings of hundreds of researchers worldwide in the February 15, 2001 issue of Nature, which was largely devoted to the human genome. In the months following the release of the working draft, the UCSC team worked with other researchers worldwide to fill in the gaps. The resulting sequence made its debut in April of 2003. It encompasses 99% of the gene-containing regions of the human genome and is 99.99% accurate.
The UCSC Genome Browser was designated as the official repository of the early human genome assembly iterations. Once the human genome sequence became available, other genome browsers also came online, -most notably those at the National Center for Biotechnology Information (NCBI) and at the European -Bioinformatics Institute (EBI). Reciprocal links provided on each of the three browsers allow -researchers to jump from any place in the human genome to the same region on either of the other two -browsers.
+most notably those at the National Center +for Biotechnology Information (NCBI) and at the +European Bioinformatics Institute (EBI). Reciprocal links provided on each of the three browsers +allow researchers to jump from any place in the human genome to the same region on either of the +other two browsers.

The human genome contains vast amounts of information, and all of the functions of a human cell are implicitly coded in the human genome. With the molecular sequence known, researchers have been mining it for clues as to how the body works in health and in disease, ultimately laying out the plan for the complex pathways of molecular interactions that the sequence orchestrates. The UCSC Genome Browser aids the worldwide scientific community in its challenge to understand the genome, to probe it with new experimental and informatics methodologies, and to decode the genetic program of the cell.
After the sequence of the genome first became available, a researcher's ability to decode that sequence and tap into the wealth of information it holds was still quite limited. The next step beyond viewing the genome is gaining an understanding of the instructions encoded in it. Toward this end, @@ -216,44 +220,46 @@ ENCODE is a scientific reconnaissance mission aimed at discovering all regions of the human genome crucial to biological function. Before ENCODE, scientists focused on finding the genes, or protein-coding regions, in DNA sequences, but these account for only about 1.5% of the genetic material of humans and other mammals. Non-coding regions of the genome have important functions, serving as the instruction set for when and in which tissues genes are turned on and off. The ENCODE project is developing a comprehensive "parts list" by identifying and precisely locating all functional elements in the human genome. This project, sponsored by the National Human Genome Research Institute (NHGRI), involves an international consortium of scientists from government, industry, and academia.
UC Santa Cruz developed and ran the data coordination center for the ENCODE project from its -inception in 2003 through the end of the first production phase in 2012. During that time, the UCSC -Genome Browser group, directed by Jim Kent with technical management by Kate Rosenbloom, provided the -database and web interface for all sequence-related data to the ENCODE project. This included -integrating the data into the UCSC Human Genome Browser (where it continues to reside) on -specialized tracks, and providing further in-depth information on detail pages. UC Santa Cruz also -developed, performed, and presented computational and comparative analyses to glean further genomic -and functional information from the collective data.
+inception in 2003 through the end of the +first production phase in 2012. During that time, the UCSC Genome Browser group, directed by +Jim Kent with technical management by Kate Rosenbloom, provided the database and web interface for +all sequence-related data to the ENCODE project. This included integrating the data into the UCSC +Human Genome Browser (where it continues to reside) on specialized tracks, and providing further +in-depth information on detail pages. UC Santa Cruz also developed, performed, and presented +computational and comparative analyses to glean further genomic and functional information from the +collective data.

UC Santa Cruz worked closely with labs producing data for the ENCODE project and with data analysis groups to define data and metadata reporting standards for a broad range of genomics assays. They -implemented data submission and validation pipelines, created and maintained the encodeproject.org -website, developed user access tools for ENCODE data, exported all ENCODE data to repositories at -the National Center for Biotechnology Information (NCBI), and provided outreach and tutorial support -for the project.
+implemented data submission and validation pipelines, created and maintained the +encodeproject.org website, developed +user access tools for ENCODE data, exported all ENCODE data to repositories at the National Center +for Biotechnology Information (NCBI), and provided outreach and tutorial support for the project. +The ENCODE data coordination was passed on to the Michael Cherry laboratory at Stanford University in late 2012. UC Santa Cruz, however, continues to support existing ENCODE data and resources on the UCSC Genome Browser website. Newer ENCODE data of broad interest, particularly integrative and summary data, will be incorporated into the browser.
The following paper describes ENCODE resources at UC Santa Cruz:
Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, Wong MC, Maddren M, Fang R, Heitner SG et al. ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res. 2013 Jan;41(Database issue):D56-63. PMID: 23193274; PMC: PMC3531152