0367f2212eef6c8e73cfa0b169759338263f1c04 jnavarr5 Fri Aug 16 16:03:34 2019 -0700 Moving the history.html page to the /goldenPath directory. refs #20314 diff --git src/hg/htdocs/history.html src/hg/htdocs/history.html deleted file mode 100644 index 463164c..0000000 --- src/hg/htdocs/history.html +++ /dev/null @@ -1,301 +0,0 @@ -<!DOCTYPE html> -<!--#set var="TITLE" value="Genome Browser History" --> -<!--#set var="ROOT" value="." --> - -<!-- Relative paths to support mirror sites with non-standard GB docs install --> -<!--#include virtual="$ROOT/inc/gbPageStart.html" --> - -<h1>UCSC Genome Browser Project History</h1> - -<h2>Genome Browser overview</h2> -<p> -The UCSC Genome Browser is a web-based tool serving as a multi-powered microscope that allows -researchers to view all 23 chromosomes of the human genome at any scale from a full chromosome down -to an individual nucleotide. The browser integrates the work of countless scientists in laboratories -worldwide, including work generated at UCSC, in an interactive, graphical display.</p> -<p> -Zoomed out, the coarse-level view shows early chromosome maps as determined by electron microscopy, -then the browser drills down to levels of increasing detail, focusing first on chromosome bands, -then on gene clusters (showing known genes-mostly those linked to diseases), then single genes, then -the components of genes, and finally on the nucleotides-the As, Cs, Gs, and Ts that make up the -genome alphabet. Not only does the browser show the genome sequence, but it also delineates known -areas of the genome and offers supplementary information about the genes-in effect, providing the -word breaks and punctuation.</p> -<p> -Genome sequences are difficult to read because they consist of letter strings with no breaks or -punctuation. The example below contains 7 different letters (genomes contain only 4). 
Can you -understand what it is saying?</p> -<pre> -THATTHATISISTHATTHATISNOTISNOTISTHATITITIS</pre> -<p> -With word breaks and punctuation, it starts to make sense:</p> -<pre> -THAT THAT IS, IS. THAT THAT IS NOT, IS NOT. IS THAT IT? IT IS!</pre> -<p> -The UCSC Genome Browser group played a pivotal role in bringing this extraordinary life script into -the light of science. The browser presents both experimentally validated and computer-predicted -genes along with dozens of lines of evidence that help scientists recognize the key features of -genes and predict their function. The databases for the genome browser are updated nightly with new -information generated by researchers throughout the world.</p> -<p> -When directed to focus on a particular segment of the genome, the browser displays a range of data -that are stacked vertically. At the top, it shows the chromosome number and the current position on -the chromosome. Underneath, it shows several rows of data about genes that have been found -experimentally or have been predicted by a number of different methods. Below those are lines of -information about gene expression and regulation, followed by comparisons with the genomes of other -species and other information, such as single-nucleotide polymorphisms (SNPs).</p> -<p> -Far from simply displaying the genetic code, the UCSC browser brings the code to life by aligning -relevant areas with experimental and computational data and images. It also links to international -databases, giving researchers instant access to deeper information about the genome. An experienced -user can form a hypothesis and verify it in minutes using this tool. Together this information -represents an extremely comprehensive view of the genome, helping scientists recognize important -features of the sequence and providing strong evidence of function. For instance, the genome browser -helps unravel the varied splicing patterns whereby one gene can make many different proteins. 
This -process of alternative splicing is thought to explain how a human can be so complex, yet have only -about twice as many genes as a roundworm.</p> -<p> -The UCSC Genome Browser group continues to add functions to the genome browser, such as the Track -Collection Builder, which allows multiple continuous-value graphing tracks to be copied and grouped -into one composite track or "collection." Once the tracks are inside of a collection, the -Track Collection Builder tool allows you to sort by similarity and magnitude, as well as alter the -aggregate/overlay graphing view options to compare results. By merging experimental results from -multiple sources, this powerful tool allows researchers to better understand how genes function.</p> -<p> -Today, the UCSC Genome Browser group continues to make the human genome sequence even more useful -for science and medicine by identifying and annotating key functional genomic elements in such a way -that they are easily accessible to researchers. This process of discovery and categorization is a -critical step toward fully understanding the workings of the human genome, a project that will -occupy science and medicine for many years. The browser platform has multiple potential uses that -can improve diagnosis, prevention, and cures for disease. 
The usefulness of the UCSC Genome Browser -led to spin-offs, or genome browser mirrors, such as the following:</p> -<ul> - <li><a href="https://news.ucsc.edu/2008/05/2242.html" - target="_blank">The HIV Data Browser</a></li> - <li><a href="https://xena.ucsc.edu/welcome-to-ucsc-xena/" - target="_blank">The UCSC Cancer Genomics Browser</a></li> - <li><a href="https://genome.ucsc.edu/encode/" - target="_blank">The data collection center for the international ENCODE project</a></li> - <li><a href="http://genome.ucsc.edu/ebolaPortal/" - target="_blank">The UCSC Ebola Virus Genome Browser</a></li> -</ul> - -<h2>Human Genome Project Race</h2> -<p> -In December 1999, the International Human Genome Project (IHGP) came to UC Santa Cruz when Eric -Lander, the director of the Whitehead sequencing center (Whitehead Institute/MIT Center for Genome -Research), invited David Haussler to help annotate the human genome. In particular, Lander wanted -help in discovering the locations of the genes, which make up only approximately 1.5% of the -sequence. Haussler had previously applied a mathematical technique known as hidden Markov models -(HMMs) to the task of computer gene-finding. This application of HMMs had quickly become the -dominant gene-finding methodology and was used successfully on the <i>Drosophila melanogaster</i> -(fruit fly) genome.</p> -<p> -At the time UCSC entered the International Human Genome Project (IHGP), the IHGP was assembling the -sequence one piece (or, in the jargon of molecular biology, one "clone") at a time, and -intending to string the pieces together based on a precisely constructed clone map. This approach -had been shown to work very well with <i>Caenorhabditis elegans</i> (a roundworm) and human -chromosome 22. 
But the process of making sure every last part of the sequence is read and put -together properly is quite labor-intensive.</p> -<p> -Haussler enlisted Jim Kent, then a graduate student at UCSC's Department of Molecular, Cell, & -Developmental Biology, along with systems engineer Patrick Gavin, and graduate students Terrence -Furey and David Kulp (who had led the gene-finding effort on the Drosophila genome). This was the -birth of the UCSC Genome Browser Group.</p> - -<h3>New challenger, Celera</h3> -<p> -It was a crucial time for the international project. A private company, Celera Genomics, had -announced its intention to assemble the human genome sequence well in advance of the public effort, -raising the fear that the sequence would be protected by patents and thus not be freely available -to scientists. Celera Genomics was using an alternative approach, a so-called whole genome -"shotgun," where small bits of the sequence are read at random from the genome, and then a -computer program assembles these bits into an approximation of the genome as a whole. By using this -approach, Celera's assembly would still have numerous gaps and ambiguities, but the entire project -from start to finish could be done in less than half the time the IHGP planned for their effort.</p> -<p> -An approach resulting in numerous gaps and ambiguities was necessary if the IHGP's draft sequence -was to have similar utility to Celera's sequence, and in particular to prevent Celera and its -clients from locking up significant portions of the human genome under patents. A number of groups -within the IHGP were working on the second stage of assembly that would merge the approximately -400,000 contigs into larger pieces and order them along the human chromosomes so that research -groups could find the human genes. However, the process was slow and arduous. 
Even with the -outstanding mapping information provided by Bob Waterston's group at Washington University, the -second stage assembly turned out to be like an extremely difficult jigsaw puzzle, with many layers -of conflicting evidence having similar-looking, non-contiguous, overlapping pieces.</p> -<p> -At least partly in response to competition from Celera, the IHGP changed its focus from producing -finished clones to producing draft clones. To sequence a clone, the IHGP adopted a shotgun approach -in miniature. Bits of a clone were read at random, and the bits were stitched together by a computer -program into pieces called "contigs." After the shotgun phase, a clone was typically in -5-50 contigs, but the relative order of the contigs was not known. This was the state of the genome -when David Haussler first attempted to locate the genes computationally, and he quickly discovered -that computational gene-finding was nearly impossible, since the average size of a contig was -considerably smaller than the average size of a human gene.</p> -<h3>Push to the Finish Line</h3> -<p> -Motivated to prevent Celera and its clients from locking up significant portions of the human genome -in patents, Jim Kent dropped his other work in May of 2000 to focus on the assembly problem. In a -remarkable display of energy and talent, Kent developed within 4 weeks a 10,000-line computer -program that assembled the working draft of the human genome. The program, called GigAssembler, -constructed the first working draft of the human genome on June 22, 2000, just days before Celera -completed its first assembly. The IHGP working draft combined anonymous genomic information from -human volunteers of diverse backgrounds, accepted on a first-come, first-taken basis. The Celera -sequence was of a single individual. Since the public consortium finished the genome ahead of the -private company, the genome and the information it contains are available free to researchers -worldwide. 
Kent's assembly was celebrated at a White House ceremony on June 26, 2000, announcing the -completion of the first drafts of the human genome by the IHGP and Celera.</p> -<p> -On July 7, 2000, after further examination by the principal scientists of the public genome project, -and to facilitate the annotation process, the UCSC Genome Browser group released this first working -draft on the web at <a href="https://genome.ucsc.edu" target="_blank">https://genome.ucsc.edu</a>. -The scientific community downloaded one-half trillion bytes of information from the UCSC genome -server in the first 24 hours of free and unrestricted access to the assembled blueprint of our human -species. The initial assembled human genome sequence was referred to as a working draft because -there remained gaps where DNA sequence was missing, due either to a lack of raw sequence data or -ambiguities in the positions of the fragments. With the gene assembly 90% complete, the assembled -genome was published along with the findings of hundreds of researchers worldwide in the February -15, 2001 issue of <i>Nature</i>, which was largely devoted to the human genome. In the months -following the release of the working draft, the UCSC team worked with other researchers worldwide to -fill in the gaps. The resulting finished sequence made its debut in April of 2003. It encompasses -99% of the gene-containing regions of the human genome and is 99.99% accurate.</p> -<p> -The UCSC Genome Browser was designated as the official repository of the early human genome assembly -iterations. Once the human genome sequence became available, other genome browsers also came online, -most notably those at the National Center for Biotechnology Information (NCBI) and at the European -Bioinformatics Institute (EBI). 
Reciprocal links provided on each of the three browsers allow -researchers to jump from any place in the human genome to the same region on either of the other two -browsers.</p> - -<h2>The ENCODE Project</h2> -<p> -The human genome contains vast amounts of information, and all of the functions of a human cell are -implicitly coded in the human genome. With the molecular sequence known, researchers have been -mining it for clues as to how the body works in health and in disease, ultimately laying out the -plan for the complex pathways of molecular interactions that the sequence orchestrates. The UCSC -Genome Browser aids the worldwide scientific community in its challenge to understand the genome, to -probe it with new experimental and informatics methodologies, and to decode the genetic program of -the cell.</p> -<p> -After the sequence of the genome was first available, a researcher’s ability to decode that sequence -and tap into the wealth of information it holds was still quite limited. The next step beyond -viewing the genome is gaining an understanding of the instructions encoded in it. Toward this end, -the UCSC Genome Browser group participated as the data collection center for the -<a href="https://www.encodeproject.org/" target="_blank">ENCyclopedia Of DNA Elements (ENCODE) -project</a>, an international endeavor to generate a comprehensive parts list of all the functional -components in the human genome.</p> -<p> -ENCODE is a scientific reconnaissance mission aimed at discovering all regions of the human genome -crucial to biological function. Before ENCODE, scientists focused on finding the genes, or -protein-coding regions in DNA sequences, but these account for only about 1.5% of the genetic -material of humans and other mammals. Non-coding regions of the genome have important functions, and -the ENCODE project is developing a comprehensive "parts list" by identifying and precisely -locating all functional elements in the human genome. 
This project, sponsored by the -<a href="https://www.genome.gov/" target="_blank">National Human Genome Research Institute -(NHGRI)</a>, involves an international consortium of scientists from government, industry, and -academia.</p> - -<h3>UC Santa Cruz's role</h3> -<p> -UC Santa Cruz developed and ran the data coordination center for the ENCODE project from its -inception in 2003 through the end of the first production phase in 2012. During that time, the UCSC -Genome Browser group directed by Jim Kent with technical management by Kate Rosenbloom provided the -database and web interface for all sequence-related data for the ENCODE project. This included -integrating the data into the UCSC Human Genome Browser (where it continues to reside) on -specialized tracks, and providing further in-depth information on detail pages. UC Santa Cruz also -developed, performed, and presented computational and comparative analyses to glean further genomic -and functional information from the collective data.</p> -<p> -UC Santa Cruz worked closely with labs producing data for the ENCODE project and with data analysis -groups to define data and metadata reporting standards for a broad range of genomics assays. They -implemented data submission and validation pipelines, created and maintained the encodeproject.org -website, developed user access tools for ENCODE data, exported all ENCODE data to repositories at -the National Center for Biotechnology Information (NCBI), and provided outreach and tutorial support -for the project.</p> -<p> -The Michael Cherry laboratory at Stanford University took over the ENCODE data coordination center -in late 2012. UC Santa Cruz continues to support existing ENCODE data and resources on the UCSC -Genome Browser website. 
Newer ENCODE data of broad interest, in particular, integrative and summary -data, will be incorporated into the browser.</p> -<p> -<em>The following paper describes ENCODE resources at UC Santa Cruz:</em></p> -<p class="text-center"> -Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, Wong MC, Maddren M, Fang R, -Heitner SG, Lee BT, Barber GP, Harte RA, Diekhans M, Long JC, Wilder SP, Zweig AS, Karolchik D, -Kuhn RM, Haussler D, Kent WJ. <a href="https://www.ncbi.nlm.nih.gov/pubmed/23193274" -target="_blank">ENCODE data in the UCSC Genome Browser: year 5 update.</a> Nucleic Acids Res. 2013 -Jan;41(Database issue):D56-63.</p> - -<h3>More about the ENCODE Project</h3> -<ul> - <li> - <a href="https://www.encodeproject.org/" target="_blank">ENCODE data portal</a></li> - <li> - <a href="https://www.nytimes.com/2008/11/11/science/11gene.html" target="_blank">New York - Times article: "Now: the rest of the genome"</a></li> - <li> - <a href="https://www.genome.gov/11009066/" target="_blank">NHGRI announcement of the ENCODE - Project</a></li> -</ul> - -<h2>UCSC Genome Research Primer</h2> -<h3>Comparative Genomics</h3> -<p> -Besides developing, supporting, and continuing to improve the genome browser, the UCSC Genome -Browser group conducts research into the functional elements of the human genome that have evolved -under natural selection. Since the first assembly of the human genome, the UCSC group has added a -growing number of species to the UCSC Genome Browser, including roundworm, pufferfish, chicken, -mouse, and chimpanzee. Interspecies alignments allow researchers to compare human genes to similar -genes in other species. The UCSC Genome Browser allows rapid comparisons between species, which can -lead to many different types of new discoveries:</p> -<ul> - <li> - New gene discoveries can result from searching the human genome for sequences that match those - with known functions in other organisms. 
The molecular genetics behind disease development and - progression in model organisms can be leveraged to discover potential disease-related genes in - humans, moving us closer to diagnostic advances and targeted treatments.</li> - <li> - We can reconstruct the evolutionary history of the human genome by identifying the origins of - interspecies differences and of short segments in the human genome that have been extremely - well-conserved over millions of years of evolution.</li> - <li> - By searching for the highly conserved segments in the human genome, those that are unchanged - from like segments in the genomes of other organisms, we can begin to understand the essential - elements of the blueprint for life. Researchers suspect that these highly conserved elements - must be essential to function. Genes make up only a small percentage of the unchanged elements, - suggesting that other unknown regulatory elements in the genome are also important for - function.</li> - <li> - Searching for genes that have evolved with unusual speed from one organism to another will give - clues to essential interspecies differences, such as differences between the human and - chimpanzee brain.</li> -</ul> - -<h3>Possibilities for Health</h3> -<p> -As we begin to better understand the molecular mechanisms responsible for human disease, entirely -new avenues of treatments will be possible. We are only now getting a first glimmer of the molecular -functions of a healthy human cell or organ, and we are still a long way from understanding the often -subtle and complex ways that these can go awry. Yet knowledge of the human genome puts us on the -brink of a revolution in medicine.</p> -<p> -Rather than relying on trial and error to design and test new drugs, researchers will increasingly -use their knowledge of the molecular causes of diseases to design new, targeted therapies. 
Research -based on genome studies and new experimental methods like CRISPR, all viewable on the UCSC Genome -Browser, will also form the basis for new diagnoses and therapies for human disease that will -transform the practice of medicine in this century.</p> -<p> -The UCSC Genome Browser supports the latest endeavor of the National Human Genome Research Institute -(NHGRI), a medical sequencing project intended to amass data relating genes to health conditions. -This project sets the stage for the time when it becomes affordable for an individual's genome to be -sequenced. The information obtained will allow estimates of future disease risk and improve the -prevention, diagnosis, and treatment of disease. The project focuses on rare Mendelian disorders, -complex disorders, and normal human variation.</p> -<p> -The practice of medicine will become much more individualized, with therapies tailored to be most -effective given an individual's genetic makeup. Medical tests are already available to identify -individual genetic variations that affect a patient's response to commonly used medications. These -tests can allow doctors to avoid adverse reactions and choose medications appropriate for specific -individuals. Someday we may even be able to repair or replace the disease-causing genes, -re-orchestrating the molecular pathways needed for health.</p> -<!--#include virtual="$ROOT/inc/gbPageEnd.html" -->