Microsoft SQL Server 2008 R2
Customer Solution Case Study

University Uses Business Intelligence Software to Boost Gene Research

Overview

Country or Region: Scotland
Industry: Education

Customer Profile
The University of Dundee is one of the UK’s leading universities, internationally recognised for its expertise across a range of disciplines, including science, medicine, engineering, and art.

Business Situation
The university needed a high-performance business intelligence solution to help analyse research data on how genes and chromosomes work.

Solution
The team supported its research with Microsoft SQL Server 2008 R2, Microsoft SQL Server 2008 Analysis Services, and Microsoft SQL Server PowerPivot for Microsoft Excel.

Benefits
− Helps analysis
− Supports advanced laboratory work
− Revolutionises research
− Offers user-friendly toolset
− Harnesses power of Microsoft technologies

“If we created a database to hold all the data from the many hundreds of experiments that are conducted … We could use the database to carry out what we’ve termed ‘super experiments.’”
Prof. Angus Lamond, Director of the Wellcome Trust Centre for Gene Regulation and Expression, University of Dundee

The University of Dundee needed a high-performance computing solution to help analyse data on how genes and chromosomes work. Microsoft has long recognised that business intelligence can be applied to any data, including large volumes of information collected in scientific work. The Wellcome Trust Centre for Gene Regulation and Expression, working with the university’s School of Computing, has built a solution with advanced Microsoft technologies. It is using Microsoft SQL Server 2008 R2, Microsoft SQL Server 2008 Analysis Services, and Microsoft SQL Server PowerPivot for Microsoft Excel.
The toolset has been highly effective for the research project in removing the previous data analysis bottleneck, and the team is now planning a new range of “super experiments.” These will help scientists understand the relationship between genes and diseases such as diabetes and cancer.

Situation

“Experiments can generate 600 gigabytes or more of raw data. These values keep increasing as researchers design more complex investigations with multiple time points.”
Prof. Angus Lamond, Director of the Wellcome Trust Centre for Gene Regulation and Expression, University of Dundee

The University of Dundee in Scotland was founded in 1881 and enjoys a global reputation as a centre for teaching and research. In recent years, its molecular biology, biochemistry, and cell biology research activities have grown to become among the most influential in the United Kingdom (U.K.), recently being awarded a Queen’s Anniversary Prize for drug discovery and development.

Among the many advanced research programmes at the university are experiments aimed at studying thousands of human proteins in parallel. Prof. Angus Lamond, Director of the Wellcome Trust Centre for Gene Regulation and Expression at the University of Dundee, says: “Our research is aimed at understanding basic cell mechanisms and how genes and chromosomes work. The entire human genome has now been sequenced, and, with that done, we can start to understand the relationship between genes, inherited disorders, and diseases, including cancer.”

In the past, it was usually only possible to study one or two proteins at a time. But the advent of more advanced biochemical techniques and, in particular, the use of mass spectrometers has changed the scenario. It’s now possible to study tens of thousands of proteins, and their associated genes, in a single experiment.

In principle, these experiments are straightforward.
Scientists grow human cells in the laboratory and then process them so that their proteins are broken down into fragments for analysis by a mass spectrometer. Prof. Lamond says: “Some experiments can now analyse more than 400,000 fragments from as many as 10,000 proteins. Such large experiments can generate 600 gigabytes or more of raw data. These values keep increasing as researchers design more complex investigations with multiple time points.”

While simple in principle, in practice these experiments are complex, expensive to deliver, and require detailed design and management. So, while a given researcher may be focused on studying responses affecting only a couple of proteins in any one experiment, that experiment still generates data for all the many thousands of proteins that are analysed.

This has several consequences. Each experiment generates a huge quantity of data that researchers must sort through to find the information they need. By default, the data from each experiment is generated, held, and analysed separately. Indeed, until recently, researchers often held the data on their local computers and, when they left the laboratory to work elsewhere, the data went with them.

Different researchers are interested in different genes, so it’s not unusual for one researcher to run an experiment and obtain the information they want. Six months later, another researcher who’s interested in a different gene might have to run exactly the same experiment to examine a different part of the output. Duplicating this work is an unnecessary waste of precious research funds.

Prof. Lamond saw that the collection and analysis of data was limiting scientific progress. “If we created a database to hold all the data from the many hundreds of experiments that are conducted over the years, we would, of course, preserve the data and reuse it,” he says.
“But we would also be able to combine the data from multiple experiments and extract information that simply isn’t available in the individual experiments considered in isolation. We realised that we could use the database to carry out what we’ve termed ‘super experiments.’”

Solution

By working with the Dean of the School of Computing, Prof. Peter Gregor, Prof. Lamond tapped into a considerable pool of business intelligence expertise. The two academics quickly realised that business intelligence techniques could be the key to running super experiments. Prof. Gregor says: “Led by Dr. Mark Whitehorn, we have a very strong focus on business intelligence, which we see as an increasingly important area.”

“The progress we’ve made in creating PepTracker has been transformational for us. It’s allowed us to extract value from the data that would have been impossible in any other way.”
Yasmeen Ahmad, Computer Scientist, Prof. Lamond’s laboratory

“PowerPivot is also valuable for interacting quickly and easily with data. I can formulate questions and get instant answers in a ‘train of thought’ analysis, without being dependent on a specialist.”
Yasmeen Ahmad, Computer Scientist, University of Dundee

The first step in the project was to create a customised suite of software, called PepTracker, which manages and analyses complex proteomics data. To this was added a data warehouse to store the data from the many experiments. PepTracker was built by Yasmeen Ahmad, a postgraduate student in the Wellcome Centre, who graduated from the School of Computing at the University of Dundee, and is now working as a computer scientist in the team of cell biologists and biochemists in Prof. Lamond’s laboratory.

Ahmad says: “The users start by performing an experiment in the laboratory. As well as the data output from the mass spectrometer, we also collect a great deal of metadata about each experiment.
Among other things, it includes information about the specific mass spectrometer that was used, the cell line, the time, and the researcher. The data and metadata are entered into PepTracker and then stored on a dedicated database server, the data warehouse.”

The Dundee research team chose Microsoft SQL Server 2008 R2 for its experiments because it offered features that were highly applicable to their work. Biomedical researchers were already using Microsoft SQL Server 2008 Analysis Services to help understand the proteome (the set of expressed proteins in an organism) and have recently added Microsoft SQL Server PowerPivot for Microsoft Excel to their business intelligence suite of applications.

PowerPivot for Excel gives users the power to create compelling self-service business intelligence solutions. It supports sharing and collaboration in a Microsoft SharePoint Server 2010 environment, and helps researchers increase operational efficiencies through SQL Server 2008 R2–based management tools.

Ahmad says: “The progress we’ve made in creating PepTracker has been transformational for us. It’s allowed us to extract value from the data that would have been impossible in any other way.”

That done, the team turned its attention to the multidimensional analysis of super experiments. Dr. Whitehorn says: “The metadata collected made it straightforward to perform the super experiments that Prof. Lamond had envisaged. We can combine the data outputs from hundreds of independent experiments and compare, for example, the accuracy of the different mass spectrometers. We could also use the combined data from multiple experiments to filter out the background ‘noise,’ which in turn is helping us extract even more information.”

Benefits

The use of Microsoft business intelligence software has been highly effective in the University of Dundee research project in removing the previous data analysis bottleneck. The team is now planning a new range of super experiments.
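
The super experiment idea described by Dr. Whitehorn, pooling measurements from many experiments and slicing them by the metadata recorded for each run, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the experiment identifiers, the instrument and cell-line labels, the mass-error figures, and the helper `mean_abs_error_by` are hypothetical and do not come from PepTracker or the Dundee data warehouse. The team's own analysis used SQL Server 2008 Analysis Services; plain Python stands in here only to show the grouping logic.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical metadata per experiment. The case study lists the mass
# spectrometer, cell line, time, and researcher among the metadata collected;
# only two of those dimensions are modelled here.
experiments = {
    "exp1": {"instrument": "MS-A", "cell_line": "line-1"},
    "exp2": {"instrument": "MS-B", "cell_line": "line-1"},
    "exp3": {"instrument": "MS-A", "cell_line": "line-2"},
}

# Hypothetical measurements: (experiment id, peptide label, mass error in ppm).
measurements = [
    ("exp1", "peptide-1", 1.0),
    ("exp1", "peptide-2", -3.0),
    ("exp2", "peptide-1", 4.0),
    ("exp2", "peptide-2", -6.0),
    ("exp3", "peptide-1", 2.0),
]


def mean_abs_error_by(dimension):
    """Average the absolute mass error, grouped by one metadata dimension."""
    groups = defaultdict(list)
    for exp_id, _peptide, error_ppm in measurements:
        # Look up the experiment's metadata to find which group it belongs to.
        groups[experiments[exp_id][dimension]].append(abs(error_ppm))
    return {key: mean(values) for key, values in groups.items()}


by_instrument = mean_abs_error_by("instrument")
by_cell_line = mean_abs_error_by("cell_line")
print(by_instrument)
print(by_cell_line)
```

Grouping by `"instrument"` compares the spectrometers across all pooled experiments, while passing `"cell_line"` reuses the same pooled data to ask a different question without rerunning anything in the laboratory.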
The researchers intend to run a series of experiments at different stages in the life cycle of the cell to help them track the changes that occur in the proteome.

Speed of Development Helps Advance Data Analysis

One factor that surprised the researchers was the speed of development of this analytical phase due to the Microsoft business intelligence technology, which they consider to be as applicable to the academic environment as it is to commerce. Ahmad says: “Our initial meeting was in August 2009, when we first discussed the project with the researchers. A lot of time was spent with the users, creating a logical model, and deciding on the measures and dimensions that would be useful in a multidimensional analysis.

“From there we created the online analytical processing (OLAP) cube in SQL Server 2008 Analysis Services and were able to see some interesting results, which were written up and submitted to the Molecular and Cellular Proteomics journal in just over two months. The paper was published in December 2009, and has already created considerable international interest in the application of business intelligence to proteomics.”

“The use of business intelligence technology has revolutionised the way we analyse proteomics data. Multidimensional analysis allows us to compare and integrate many independent experiments.”
Dr. Séverine Boulon, Post-Doctoral Fellow in the Wellcome Trust Centre, University of Dundee

PowerPivot Feature Makes Research Easier for Developers

PowerPivot for Excel is proving invaluable in continuing this work. It makes life much easier for developers. Ahmad says: “As a developer, I really like features in PowerPivot such as the ability to create relationships between tables. You can just add another tab of data, create a relationship, and start graphing the data, conducting comparisons between different datasets. In addition, the ability to connect to OLAP cubes is vital.
I can set up cubes and give the connection details to users and they can connect and look at the data immediately.”

Business Intelligence Revolutionises Data Analysis of Proteomics

The university users find Microsoft business intelligence tools remarkably flexible. Dr. Séverine Boulon, Post-Doctoral Fellow in the Wellcome Trust Centre, University of Dundee, says: “The use of business intelligence technology has revolutionised the way we analyse proteomics data. Multidimensional analysis allows us to compare and integrate many independent experiments. However, until now, business intelligence analysis has been the exclusive domain of developers. The great thing about PowerPivot for Excel is that it helps me carry out my own multidimensional analysis using a familiar and intuitive environment.”

Dr. Boulon adds: “PowerPivot is also valuable for interacting quickly and easily with data. I can formulate questions and get instant answers in a ‘train of thought’ analysis, without being dependent on a specialist. Using PepTracker and business intelligence technology, we have uncovered trends in specific types of experiments, for example in discriminating genuine protein interaction partners from the ‘experimental noise.’”

User-Friendly Toolset Helps Researchers Analyse Data

A major factor in user adoption of the toolset is the fact that PowerPivot is part of Excel. Ahmad says: “The researchers use Excel on a daily basis; it’s the tool of choice for them because they find it intuitive and easy to use. They can start charting data in the way that they’re used to doing already. In the future, I envisage we’ll create more multidimensional structures, holding different types of data and also bringing in more data from external sources. And PowerPivot will play a vital part in helping my users conduct that kind of analysis.
It will help them connect to the cubes, with no development time needed to generate custom graphical user interfaces.”

Universities Recognise the Power of Microsoft Technologies for Research

The word “business” in business intelligence implies that it is only applicable to commercial data. However, Microsoft has long recognised that business intelligence serves any data, including scientific work. David Hobbs-Mallyon, SQL Server Product Manager at Microsoft, says: “We’re always delighted when scientists choose Microsoft business intelligence products. Scientists tend to make innovative demands on the technology, which is great for us and for the products.”

Significantly, a research group at Cambridge University used SQL Server 2005 to help rewrite the history of how Charles Darwin developed the theory of evolution. The same group used the spatial mapping capabilities in the Community Technology Preview of SQL Server 2008 to uncover more about Darwin’s pioneering work.

For More Information

For more information about Microsoft products and services, call the Microsoft Sales Information Center at (800) 426-9400. In Canada, call the Microsoft Canada Information Centre at (877) 568-2495. Customers who are deaf or hard-of-hearing can reach Microsoft text telephone (TTY/TDD) services at (800) 892-5234 in the United States or (905) 568-9641 in Canada. Outside the 50 United States and Canada, please contact your local Microsoft subsidiary. To access information using the World Wide Web, go to: www.microsoft.com

Microsoft Server Product Portfolio

For more information about the Microsoft server product portfolio, go to: www.microsoft.com/servers/default.mspx

For more information about the University of Dundee, call or visit the Web site at: www.dundee.ac.uk

Software and Services

Microsoft Server Product Portfolio
− Microsoft SQL Server 2008 R2
− Microsoft SharePoint Server 2010

Technologies
− Microsoft SQL Server PowerPivot for Microsoft Excel
− Microsoft SQL Server 2008 Analysis Services

This case study is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.

Document published May 2010