0367f2212eef6c8e73cfa0b169759338263f1c04 jnavarr5 Fri Aug 16 16:03:34 2019 -0700 Moving the history.html page to the /goldenPath directory. refs #20314 diff --git src/hg/htdocs/history.html src/hg/htdocs/history.html deleted file mode 100644 index 463164c..0000000 --- src/hg/htdocs/history.html +++ /dev/null @@ -1,301 +0,0 @@ - - - - - - - -
-The UCSC Genome Browser is a web-based tool serving as a multi-powered microscope that allows -researchers to view all 23 pairs of chromosomes of the human genome at any scale from a full chromosome down -to an individual nucleotide. The browser integrates the work of countless scientists in laboratories -worldwide, including work generated at UCSC, in an interactive, graphical display.
--Zoomed out, the coarse-level view shows early chromosome maps as determined by electron microscopy, -then the browser drills down to levels of increasing detail, focusing first on chromosome bands, -then on gene clusters (showing known genes-mostly those linked to diseases), then single genes, then -the components of genes, and finally on the nucleotides-the As, Cs, Gs, and Ts that make up the -genome alphabet. Not only does the browser show the genome sequence, but it also delineates known -areas of the genome and offers supplementary information about the genes-in effect, providing the -word breaks and punctuation.
--Genome sequences are difficult to read because they consist of letter strings with no breaks or -punctuation. The example below contains 7 different letters (genomes contain only 4). Can you -understand what it is saying?
--THATTHATISISTHATTHATISNOTISNOTISTHATITITIS-
-With word breaks and punctuation, it starts to make sense:
--THAT THAT IS, IS. THAT THAT IS NOT, IS NOT. IS THAT IT? IT IS!-
-The UCSC Genome Browser group played a pivotal role in bringing this extraordinary life script into -the light of science. The browser presents both experimentally validated and computer-predicted -genes along with dozens of lines of evidence that help scientists recognize the key features of -genes and predict their function. The databases for the genome browser are updated nightly with new -information generated by researchers throughout the world.
--When directed to focus on a particular segment of the genome, the browser displays a range of data -that are stacked vertically. At the top, it shows the chromosome number and the current position on -the chromosome. Underneath, it shows several rows of data about genes that have been found -experimentally or have been predicted by a number of different methods. Below those are lines of -information about gene expression and regulation, followed by comparisons with the genomes of other -species and other information, such as single-nucleotide polymorphisms (SNPs).
--Far from simply displaying the genetic code, the UCSC browser brings the code to life by aligning -relevant areas with experimental and computational data and images. It also links to international -databases, giving researchers instant access to deeper information about the genome. An experienced -user can form a hypothesis and verify it in minutes using this tool. Together this information -represents an extremely comprehensive view of the genome, helping scientists recognize important -features of the sequence and providing strong evidence of function. For instance, the genome browser -helps unravel the varied splicing patterns whereby one gene can make many different proteins. This -process of alternative splicing is thought to explain how a human can be so complex, yet have only -about twice as many genes as a roundworm.
--The UCSC Genome Browser group continues to add functions to the genome browser, such as the Track -Collection Builder, which allows multiple continuous-value graphing tracks to be copied and grouped -into one composite track or "collection." Once the tracks are inside of a collection, the -Track Collection Builder tool allows you to sort by similarity and magnitude, as well as alter the -aggregate/overlay graphing view options to compare results. By merging experimental results from -multiple sources, this powerful tool allows researchers to better understand how genes function.
--Today, the UCSC Genome Browser group continues to make the human genome sequence even more useful -for science and medicine by identifying and annotating key functional genomic elements in such a way -that they are easily accessible to researchers. This process of discovery and categorization is a -critical step toward fully understanding the workings of the human genome, a project that will -occupy science and medicine for many years. The browser platform has multiple potential uses that -can improve diagnosis, prevention, and cures for disease. The usefulness of the UCSC Genome Browser -led to spin-offs, or genome browser mirrors, such as the following:
--In December 1999, the International Human Genome Project (IHGP) came to UC Santa Cruz when Eric -Lander, the director of the Whitehead sequencing center (Whitehead Institute/MIT Center for Genome -Research), invited David Haussler to help annotate the human genome. In particular, Lander wanted -help in discovering the locations of the genes, which make up only approximately 1.5% of the -sequence. Haussler had previously applied a mathematical technique known as hidden Markov models -(HMMs) to the task of computer gene-finding. This application of HMMs had quickly become the -dominant gene-finding methodology and was used successfully on the Drosophila melanogaster -(fruit fly) genome.
--At the time UCSC entered the International Human Genome Project (IHGP), the IHGP was assembling the -sequence one piece (or, in the jargon of molecular biology, one "clone") at a time, and -intending to string the pieces together based on a precisely constructed clone map. This approach -had been shown to work very well with Caenorhabditis elegans (a roundworm) and human -chromosome 22. But the process of making sure every last part of the sequence is read and put -together properly is quite labor-intensive.
--Haussler enlisted Jim Kent, then a graduate student at UCSC's Department of Molecular, Cell, & -Developmental Biology, along with systems engineer Patrick Gavin, and graduate students Terrence -Furey and David Kulp (who had led the gene-finding effort on the Drosophila genome). This was the -birth of the UCSC Genome Browser Group.
- --It was a crucial time for the international project. A private company, Celera Genomics, had -announced its intention to assemble the human genome sequence well in advance of the public effort, -raising the fear that the sequence would be protected by patents and thus not be freely available -to scientists. Celera Genomics was using an alternative approach, a so-called whole genome -"shotgun," where small bits of the sequence are read at random from the genome, and then a -computer program assembles these bits into an approximation of the genome as a whole. With this -approach, Celera's assembly would still have numerous gaps and ambiguities, but the entire project -from start to finish could be done in less than half the time the IHGP planned for its effort.
--An approach resulting in numerous gaps and ambiguities was necessary if the IHGP's draft sequence -was to have similar utility to Celera's sequence, and in particular to prevent Celera and its -clients from locking up significant portions of the human genome under patents. A number of groups -within the IHGP were working on the second stage of assembly that would merge the approximately -400,000 contigs into larger pieces and order them along the human chromosomes so that research -groups could find the human genes. However, the process was slow and arduous. Even with the -outstanding mapping information provided by Bob Waterston's group at Washington University, the -second stage assembly turned out to be like an extremely difficult jigsaw puzzle, with many layers -of conflicting evidence having similar-looking, non-contiguous, overlapping pieces.
--At least partly in response to competition from Celera, the IHGP changed its focus from producing -finished clones to producing draft clones. To sequence a clone, the IHGP adopted a shotgun approach -in miniature. Bits of a clone were read at random, and the bits were stitched together by a computer -program into pieces called "contigs." After the shotgun phase, a clone was typically in -5-50 contigs, but the relative order of the contigs was not known. This was the state of the genome -when David Haussler first attempted to locate the genes computationally, and he quickly discovered -that computational gene-finding was nearly impossible, since the average size of a contig was -considerably smaller than the average size of a human gene.
--Motivated to prevent Celera and its clients from locking up significant portions of the human genome -in patents, Jim Kent dropped his other work in May of 2000 to focus on the assembly problem. In a -remarkable display of energy and talent, Kent developed within 4 weeks a 10,000-line computer -program that assembled the working draft of the human genome. The program, called GigAssembler, -constructed the first working draft of the human genome on June 22, 2000, just days before Celera -completed its first assembly. The IHGP working draft combined anonymous genomic information from -human volunteers of diverse backgrounds, accepted on a first-come, first-taken basis. The Celera -sequence was of a single individual. Since the public consortium finished the genome ahead of the -private company, the genome and the information it contains are available free to researchers -worldwide. Kent's assembly was celebrated at a White House ceremony on June 26, 2000, announcing the -completion of the first drafts of the human genome by the IHGP and Celera.
--On July 7, 2000, after further examination by the principal scientists of the public genome project, -and to facilitate the annotation process, the UCSC Genome Browser group released this first working -draft on the web at https://genome.ucsc.edu. -The scientific community downloaded one-half trillion bytes of information from the UCSC genome -server in the first 24 hours of free and unrestricted access to the assembled blueprint of our human -species. The initial assembled human genome sequence was referred to as a working draft because -there remained gaps where DNA sequence was missing, due either to a lack of raw sequence data or -ambiguities in the positions of the fragments. With the gene assembly 90% complete, the assembled -genome was published along with the findings of hundreds of researchers worldwide in the February -15, 2001 issue of Nature, which was largely devoted to the human genome. In the months -following the release of the working draft, the UCSC team worked with other researchers worldwide to -fill in the gaps. The resulting finished sequence made its debut in April of 2003. It encompasses -99% of the gene-containing regions of the human genome and is 99.99% accurate.
--The UCSC Genome Browser was designated as the official repository of the early human genome assembly -iterations. Once the human genome sequence became available, other genome browsers also came online, -most notably those at the National Center for Biotechnology Information (NCBI) and at the European -Bioinformatics Institute (EBI). Reciprocal links provided on each of the three browsers allow -researchers to jump from any place in the human genome to the same region on either of the other two -browsers.
- --The human genome contains vast amounts of information, and all of the functions of a human cell are -implicitly coded in the human genome. With the molecular sequence known, researchers have been -mining it for clues as to how the body works in health and in disease, ultimately laying out the -plan for the complex pathways of molecular interactions that the sequence orchestrates. The UCSC -Genome Browser aids the worldwide scientific community in its challenge to understand the genome, to -probe it with new experimental and informatics methodologies, and to decode the genetic program of -the cell.
--After the sequence of the genome was first available, a researcher’s ability to decode that sequence -and tap into the wealth of information it holds was still quite limited. The next step beyond -viewing the genome is gaining an understanding of the instructions encoded in it. Toward this end, -the UCSC Genome Browser group participated as the data collection center for the -ENCyclopedia Of DNA Elements (ENCODE) -project, an international endeavor to generate a comprehensive parts list of all the functional -components in the human genome.
--ENCODE is a scientific reconnaissance mission aimed at discovering all regions of the human genome -crucial to biological function. Before ENCODE, scientists focused on finding the genes, or -protein-coding regions in DNA sequences, but these account for only about 1.5% of the genetic -material of humans and other mammals. Non-coding regions of the genome have important functions, and -the ENCODE project is developing a comprehensive "parts list" by identifying and precisely -locating all functional elements in the human genome. This project, sponsored by the -National Human Genome Research Institute -(NHGRI), involves an international consortium of scientists from government, industry, and -academia.
- --UC Santa Cruz developed and ran the data coordination center for the ENCODE project from its -inception in 2003 through the end of the first production phase in 2012. During that time, the UCSC -Genome Browser group directed by Jim Kent with technical management by Kate Rosenbloom provided the -database and web interface for all sequence-related data for the ENCODE project. This included -integrating the data into the UCSC Human Genome Browser (where it continues to reside) on -specialized tracks, and providing further in-depth information on detail pages. UC Santa Cruz also -developed, performed, and presented computational and comparative analyses to glean further genomic -and functional information from the collective data.
--UC Santa Cruz worked closely with labs producing data for the ENCODE project and with data analysis -groups to define data and metadata reporting standards for a broad range of genomics assays. They -implemented data submission and validation pipelines, created and maintained the encodeproject.org -website, developed user access tools for ENCODE data, exported all ENCODE data to repositories at -the National Center for Biotechnology Information (NCBI), and provided outreach and tutorial support -for the project.
--The Michael Cherry laboratory at Stanford University took over the ENCODE data coordination center -in late 2012. UC Santa Cruz continues to support existing ENCODE data and resources on the UCSC -Genome Browser website. Newer ENCODE data of broad interest, in particular, integrative and summary -data, will be incorporated into the browser.
--The following paper describes ENCODE resources at UC Santa Cruz:
--Rosenbloom KR, Sloan CA, Malladi VS, Dreszer TR, Learned K, Kirkup VM, Wong MC, Maddren M, Fang R, -Heitner SG, Lee BT, Barber GP, Harte RA, Diekhans M, Long JC, Wilder SP, Zweig AS, Karolchik D, -Kuhn RM, Haussler D, Kent WJ. ENCODE data in the UCSC Genome Browser: year 5 update. Nucleic Acids Res. 2013 -Jan;41(Database issue):D56-63.
- --Besides developing, supporting, and continuing to improve the genome browser, the UCSC Genome -Browser group conducts research into the functional elements of the human genome that have evolved -under natural selection. Since the first assembly of the human genome, the UCSC group has added a -growing number of species to the UCSC Genome Browser, including roundworm, pufferfish, chicken, -mouse, and chimpanzee. Interspecies alignments allow researchers to compare human genes to similar -genes in other species. The UCSC Genome Browser allows rapid comparisons between species, which can -lead to many different types of new discoveries:
--As we begin to better understand the molecular mechanisms responsible for human disease, entirely -new avenues of treatments will be possible. We are only now getting a first glimmer of the molecular -functions of a healthy human cell or organ, and we are still a long way from understanding the often -subtle and complex ways that these can go awry. Yet knowledge of the human genome puts us on the -brink of a revolution in medicine.
--Rather than relying on trial and error to design and test new drugs, researchers will increasingly -use their knowledge of the molecular causes of diseases to design new, targeted therapies. Research -based on genome studies and new experimental methods like CRISPR, all viewable on the UCSC Genome -Browser, will also form the basis for new diagnoses and therapies for human disease that will -transform the practice of medicine in this century.
--The UCSC Genome Browser supports the latest endeavor of the National Human Genome Research Institute -(NHGRI), a medical sequencing project intended to amass data relating genes to health conditions. -This project sets the stage for the time when it becomes affordable for an individual's genome to be -sequenced. The information obtained will allow estimates of future disease risk and improve the -prevention, diagnosis, and treatment of disease. The project focuses on rare Mendelian disorders, -complex disorders, and normal human variation.
--The practice of medicine will become much more individualized, with therapies tailored to be most -effective given an individual's genetic makeup. Medical tests are already available to identify -individual genetic variations that affect a patient's response to commonly used medications. These -tests can allow doctors to avoid adverse reactions and choose medications appropriate for specific -individuals. Someday we may even be able to repair or replace the disease-causing genes, -re-orchestrating the molecular pathways needed for health.
-