0367f2212eef6c8e73cfa0b169759338263f1c04 jnavarr5 Fri Aug 16 16:03:34 2019 -0700 Moving the history.html page to the /goldenPath directory. refs #20314 diff --git src/hg/htdocs/goldenPath/history.html src/hg/htdocs/goldenPath/history.html new file mode 100644 index 0000000..463164c --- /dev/null +++ src/hg/htdocs/goldenPath/history.html @@ -0,0 +1,301 @@ + + + + + + + +

UCSC Genome Browser Project History

+ +

Genome Browser overview

+

+The UCSC Genome Browser is a web-based tool serving as a multi-powered microscope that allows +researchers to view all 23 chromosomes of the human genome at any scale from a full chromosome down +to an individual nucleotide. The browser integrates the work of countless scientists in laboratories +worldwide, including work generated at UCSC, in an interactive, graphical display.

+

+Zoomed out, the coarse-level view shows early chromosome maps as determined by electron microscopy. +The browser then drills down to levels of increasing detail, focusing first on chromosome bands, +then on gene clusters (showing known genes, mostly those linked to diseases), then single genes, then +the components of genes, and finally on the nucleotides: the As, Cs, Gs, and Ts that make up the +genome alphabet. Not only does the browser show the genome sequence, but it also delineates known +areas of the genome and offers supplementary information about the genes, in effect providing the +word breaks and punctuation.

+

+Genome sequences are difficult to read because they consist of letter strings with no breaks or +punctuation. The example below contains 7 different letters (genomes contain only 4). Can you +understand what it is saying?

+
+THATTHATISISTHATTHATISNOTISNOTISTHATITITIS
+

+With word breaks and punctuation, it starts to make sense:

+
+THAT THAT IS, IS. THAT THAT IS NOT, IS NOT. IS THAT IT? IT IS!
+

+The UCSC Genome Browser group played a pivotal role in bringing this extraordinary life script into +the light of science. The browser presents both experimentally validated and computer-predicted +genes along with dozens of lines of evidence that help scientists recognize the key features of +genes and predict their function. The databases for the genome browser are updated nightly with new +information generated by researchers throughout the world.

+

+When directed to focus on a particular segment of the genome, the browser displays a range of data +that are stacked vertically. At the top, it shows the chromosome number and the current position on +the chromosome. Underneath, it shows several rows of data about genes that have been found +experimentally or have been predicted by a number of different methods. Below those are lines of +information about gene expression and regulation, followed by comparisons with the genomes of other +species and other information, such as single-nucleotide polymorphisms (SNPs).

+

+Far from simply displaying the genetic code, the UCSC browser brings the code to life by aligning +relevant areas with experimental and computational data and images. It also links to international +databases, giving researchers instant access to deeper information about the genome. An experienced +user can form a hypothesis and verify it in minutes using this tool. Together this information +represents an extremely comprehensive view of the genome, helping scientists recognize important +features of the sequence and providing strong evidence of function. For instance, the genome browser +helps unravel the varied splicing patterns whereby one gene can make many different proteins. This +process of alternative splicing is thought to explain how a human can be so complex, yet have only +about twice as many genes as a roundworm.

+

+The UCSC Genome Browser group continues to add functions to the genome browser, such as the Track +Collection Builder, which allows multiple continuous-value graphing tracks to be copied and grouped +into one composite track or "collection." Once the tracks are inside of a collection, the +Track Collection Builder tool allows you to sort by similarity and magnitude, as well as alter the +aggregate/overlay graphing view options to compare results. By merging experimental results from +multiple sources, this powerful tool allows researchers to better understand how genes function.

+

+Today, the UCSC Genome Browser group continues to make the human genome sequence even more useful +for science and medicine by identifying and annotating key functional genomic elements in such a way +that they are easily accessible to researchers. This process of discovery and categorization is a +critical step toward fully understanding the workings of the human genome, a project that will +occupy science and medicine for many years. The browser platform has multiple potential uses that +can improve diagnosis, prevention, and cures for disease. The usefulness of the UCSC Genome Browser +has led to spin-offs, or genome browser mirrors, such as the following:

+ + +

Human Genome Project Race

+

+In December 1999, the International Human Genome Project (IHGP) came to UC Santa Cruz when Eric +Lander, the director of the Whitehead sequencing center (Whitehead Institute/MIT Center for Genome +Research), invited David Haussler to help annotate the human genome. In particular, Lander wanted +help in discovering the locations of the genes, which make up only approximately 1.5% of the +sequence. Haussler had previously applied a mathematical technique known as hidden Markov models +(HMMs) to the task of computer gene-finding. This application of HMMs had quickly become the +dominant gene-finding methodology and was used successfully on the Drosophila melanogaster +(fruit fly) genome.

+

+At the time UCSC entered the International Human Genome Project (IHGP), the IHGP was assembling the +sequence one piece (or, in the jargon of molecular biology, one "clone") at a time, and +intending to string the pieces together based on a precisely constructed clone map. This approach +had been shown to work very well with Caenorhabditis elegans (a roundworm) and human +chromosome 22. But the process of making sure every last part of the sequence is read and put +together properly is quite labor-intensive.

+

+Haussler enlisted Jim Kent, then a graduate student at UCSC's Department of Molecular, Cell, & +Developmental Biology, along with systems engineer Patrick Gavin, and graduate students Terrence +Furey and David Kulp (who had led the gene-finding effort on the Drosophila genome). This was the +birth of the UCSC Genome Browser Group.

+ +

New challenger, Celera

+

+It was a crucial time for the international project. A private company, Celera Genomics, had +announced its intention to assemble the human genome sequence well in advance of the public effort, +raising the fear that the sequence would be protected by patents and thus not be freely available +to scientists. Celera Genomics was using an alternative approach, a so-called whole genome +"shotgun," where small bits of the sequence are read at random from the genome, and then a +computer program assembles these bits into an approximation of the genome as a whole. With this +approach, Celera's assembly would still have numerous gaps and ambiguities, but the entire project +from start to finish could be done in less than half the time the IHGP had planned for its effort.

+

+An approach resulting in numerous gaps and ambiguities was necessary if the IHGP's draft sequence +was to have similar utility to Celera's sequence, and in particular to prevent Celera and its +clients from locking up significant portions of the human genome under patents. A number of groups +within the IHGP were working on the second stage of assembly that would merge the approximately +400,000 contigs into larger pieces and order them along the human chromosomes so that research +groups could find the human genes. However, the process was slow and arduous. Even with the +outstanding mapping information provided by Bob Waterston's group at Washington University, the +second stage assembly turned out to be like an extremely difficult jigsaw puzzle, with many layers +of conflicting evidence having similar-looking, non-contiguous, overlapping pieces.

+

+At least partly in response to competition from Celera, the IHGP changed its focus from producing +finished clones to producing draft clones. To sequence a clone, the IHGP adopted a shotgun approach +in miniature. Bits of a clone were read at random, and the bits were stitched together by a computer +program into pieces called "contigs." After the shotgun phase, a clone was typically in +5-50 contigs, but the relative order of the contigs was not known. This was the state of the genome +when David Haussler first attempted to locate the genes computationally, and he quickly discovered +that computational gene-finding was nearly impossible, since the average size of a contig was +considerably smaller than the average size of a human gene.
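
To make the idea of stitching shotgun reads into contigs concrete, here is a toy sketch in Python. It is an illustration only and is not the software used by the IHGP or Celera: the function names (`overlap`, `greedy_assemble`), the `min_len` threshold, and the greedy merge strategy are all assumptions chosen for brevity, and real assemblers must also cope with sequencing errors, repeats, and both DNA strands.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no such overlap of at least `min_len` characters exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def drop_redundant(seqs):
    """Remove duplicates and any sequence fully contained in another one."""
    unique = list(dict.fromkeys(seqs))
    return [s for s in unique if not any(s != o and s in o for o in unique)]


def greedy_assemble(reads, min_len=3):
    """Toy greedy assembly (hypothetical helper, not a production method).

    Repeatedly merges the pair of sequences with the longest suffix/prefix
    overlap until no overlap of at least `min_len` remains. The sequences
    left over play the role of "contigs"; their relative order is unknown.
    """
    contigs = drop_redundant(reads)
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        # Join the two sequences, keeping the shared overlap only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs = drop_redundant(rest + [merged])


if __name__ == "__main__":
    # Three overlapping reads collapse into a single contig.
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
    # A read with no overlap stays separate, leaving a gap between contigs.
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TTTTAAAA"]))
```

In the second call, the unrelated read cannot be joined to anything, so the result is two contigs with no information about their order. That mirrors, in miniature, the situation described above, where a clone ended up in several unordered contigs.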

+

Push to the Finish Line

+

+Motivated to prevent Celera and its clients from locking up significant portions of the human genome +in patents, Jim Kent dropped his other work in May of 2000 to focus on the assembly problem. In a +remarkable display of energy and talent, Kent developed within 4 weeks a 10,000-line computer +program that assembled the working draft of the human genome. The program, called GigAssembler, +constructed the first working draft of the human genome on June 22, 2000, just days before Celera +completed its first assembly. The IHGP working draft combined anonymous genomic information from +human volunteers of diverse backgrounds, accepted on a first-come, first-taken basis. The Celera +sequence was of a single individual. Since the public consortium finished the genome ahead of the +private company, the genome and the information it contains are available free to researchers +worldwide. Kent's assembly was celebrated at a White House ceremony on June 26, 2000, announcing the +completion of the first drafts of the human genome by the IHGP and Celera.

+

+On July 7, 2000, after further examination by the principal scientists of the public genome project, +and to facilitate the annotation process, the UCSC Genome Browser group released this first working +draft on the web at https://genome.ucsc.edu. +The scientific community downloaded one-half trillion bytes of information from the UCSC genome +server in the first 24 hours of free and unrestricted access to the assembled blueprint of our human +species. The initial assembled human genome sequence was referred to as a working draft because +there remained gaps where DNA sequence was missing, due either to a lack of raw sequence data or +ambiguities in the positions of the fragments. With the gene assembly 90% complete, the assembled +genome was published along with the findings of hundreds of researchers worldwide in the February +15, 2001 issue of Nature, which was largely devoted to the human genome. In the months +following the release of the working draft, the UCSC team worked with other researchers worldwide to +fill in the gaps. The resulting finished sequence made its debut in April of 2003. It encompasses +99% of the gene-containing regions of the human genome and is 99.99% accurate.

+

+The UCSC Genome Browser was designated as the official repository of the early human genome assembly +iterations. Once the human genome sequence became available, other genome browsers also came online, +most notably those at the National Center for Biotechnology Information (NCBI) and at the European +Bioinformatics Institute (EBI). Reciprocal links provided on each of the three browsers allow +researchers to jump from any place in the human genome to the same region on either of the other two +browsers.

+ +

The ENCODE Project

+

+The human genome contains vast amounts of information, and all of the functions of a human cell are +implicitly coded in the human genome. With the molecular sequence known, researchers have been +mining it for clues as to how the body works in health and in disease, with the ultimate aim of laying out the +plan for the complex pathways of molecular interactions that the sequence orchestrates. The UCSC +Genome Browser aids the worldwide scientific community in its challenge to understand the genome, to +probe it with new experimental and informatics methodologies, and to decode the genetic program of +the cell.

+

+After the sequence of the genome was first available, a researcher’s ability to decode that sequence +and tap into the wealth of information it holds was still quite limited. The next step beyond +viewing the genome is gaining an understanding of the instructions encoded in it. Toward this end, +the UCSC Genome Browser group participated as the data collection center for the +ENCyclopedia Of DNA Elements (ENCODE) +project, an international endeavor to generate a comprehensive parts list of all the functional +components in the human genome.

+

+ENCODE is a scientific reconnaissance mission aimed at discovering all regions of the human genome +crucial to biological function. Before ENCODE, scientists focused on finding the genes, or +protein-coding regions in DNA sequences, but these account for only about 1.5% of the genetic +material of humans and other mammals. Non-coding regions of the genome have important functions, and +the ENCODE project is developing a comprehensive "parts list" by identifying and precisely +locating all functional elements in the human genome. This project, sponsored by the +National Human Genome Research Institute +(NHGRI), involves an international consortium of scientists from government, industry, and +academia.

+ +

UC Santa Cruz's role

+

+UC Santa Cruz developed and ran the data coordination center for the ENCODE project from its +inception in 2003 through the end of the first production phase in 2012. During that time, the UCSC +Genome Browser group directed by Jim Kent with technical management by Kate Rosenbloom provided the +database and web interface for all sequence-related data for the ENCODE project. This included +integrating the data into the UCSC Human Genome Browser (where it continues to reside) on +specialized tracks, and providing further in-depth information on detail pages. UC Santa Cruz also +developed, performed, and presented computational and comparative analyses to glean further genomic +and functional information from the collective data.

+

+UC Santa Cruz worked closely with labs producing data for the ENCODE project and with data analysis +groups to define data and metadata reporting standards for a broad range of genomics assays. They +implemented data submission and validation pipelines, created and maintained the encodeproject.org +website, developed user access tools for ENCODE data, exported all ENCODE data to repositories at +the National Center for Biotechnology Information (NCBI), and provided outreach and tutorial support +for the project.

+

+The Michael Cherry laboratory at Stanford University took over the ENCODE data coordination center +in late 2012. UC Santa Cruz continues to support existing ENCODE data and resources on the UCSC +Genome Browser website. Newer ENCODE data of broad interest, in particular, integrative and summary +data, will be incorporated into the browser.

+

+The following paper describes ENCODE resources at UC Santa Cruz:

+

+Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, Wong MC, Maddren M, Fang R, +Heitner SG, Lee BT, Barber GP, Harte RA, Diekhans M, Long JC, Wilder SP, Zweig AS, Karolchik D, +Kuhn RM, Haussler D, Kent WJ. ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res. 2013 +Jan;41(Database issue):D56-63.

+ +

More about the ENCODE Project

+ + +

UCSC Genome Research Primer

+

Comparative Genomics

+

+Besides developing, supporting, and continuing to improve the genome browser, the UCSC Genome +Browser group conducts research into the functional elements of the human genome that have evolved +under natural selection. Since the first assembly of the human genome, the UCSC group has added a +growing number of species to the UCSC Genome Browser, including roundworm, pufferfish, chicken, +mouse, and chimpanzee. Interspecies alignments allow researchers to compare human genes to similar +genes in other species. The UCSC Genome Browser allows rapid comparisons between species, which can +lead to many different types of new discoveries:

+ + +

Possibilities for Health

+

+As we begin to better understand the molecular mechanisms responsible for human disease, entirely +new avenues of treatments will be possible. We are only now getting a first glimmer of the molecular +functions of a healthy human cell or organ, and we are still a long way from understanding the often +subtle and complex ways that these can go awry. Yet knowledge of the human genome puts us on the +brink of a revolution in medicine.

+

+Rather than relying on trial and error to design and test new drugs, researchers will increasingly +use their knowledge of the molecular causes of diseases to design new, targeted therapies. Research +based on genome studies and new experimental methods like CRISPR, all viewable on the UCSC Genome +Browser, will also form the basis for new diagnoses and therapies for human disease that will +transform the practice of medicine in this century.

+

+The UCSC Genome Browser supports the latest endeavor of the National Human Genome Research Institute +(NHGRI), a medical sequencing project intended to amass data relating genes to health conditions. +This project sets the stage for the time when it becomes affordable for an individual's genome to be +sequenced. The information obtained will allow estimates of future disease risk and improve the +prevention, diagnosis, and treatment of disease. The project focuses on rare Mendelian disorders, +complex disorders, and normal human variation.

+

+The practice of medicine will become much more individualized, with therapies tailored to be most +effective given an individual's genetic makeup. Medical tests are already available to identify +individual genetic variations that affect a patient's response to commonly used medications. These +tests can allow doctors to avoid adverse reactions and choose medications appropriate for specific +individuals. Someday we may even be able to repair or replace the disease-causing genes, +re-orchestrating the molecular pathways needed for health.

+