Microsoft SQL Server 2008 R2
Customer Solution Case Study
University Uses Business Intelligence
Software to Boost Gene Research
Overview
Country or Region: Scotland
Industry: Education
Customer Profile
The University of Dundee is one of
the UK’s leading universities,
internationally recognised for its
expertise across a range of
disciplines, including science,
medicine, engineering, and art.
Business Situation
The university needed a high
performance business intelligence
solution to help analyse research data
on how genes and chromosomes
work.
Solution
The team supported its research with
Microsoft SQL Server 2008 R2,
Microsoft SQL Server 2008 Analysis
Services, and Microsoft SQL Server
PowerPivot for Microsoft Excel.
Benefits
− Helps analysis
− Supports advanced laboratory work
− Revolutionises research
− Offers user-friendly toolset
− Harnesses power of Microsoft
technologies
“If we created a database to hold all the data from the
many hundreds of experiments that are conducted
…We could use the database to carry out what we’ve
termed ‘super experiments.’”
Prof. Angus Lamond, Director of the Wellcome Trust Centre for Gene Regulation and
Expression, University of Dundee
The University of Dundee needed a high performance
computing solution to help analyse data on how genes and
chromosomes work. Microsoft has long recognised that
business intelligence can be applied to any data, including large
volumes of information collected in scientific work. The
Wellcome Trust Centre for Gene Regulation and Expression,
working with the university’s School of Computing, has built a
solution with advanced Microsoft technologies. It is using
Microsoft SQL Server 2008 R2, Microsoft SQL Server 2008
Analysis Services, and Microsoft SQL Server PowerPivot for
Microsoft Excel. The toolset has been highly effective for the
research project in removing the previous data analysis
bottleneck and the team is now planning a new range of “super
experiments.” These will help scientists understand the
relationship between genes and diseases such as diabetes and
cancer.
Situation
“Experiments can
generate 600 gigabytes
or more of raw data.
These values keep
increasing as researchers
design more complex
investigations with
multiple time points.”
Prof. Angus Lamond, Director of the
Wellcome Trust Centre for Gene
Regulation and Expression, University of
Dundee
The University of Dundee in Scotland was
founded in 1881 and enjoys a global
reputation as a centre for teaching and
research. In recent years, its molecular
biology, biochemistry, and cell biology
research activities have grown to become
among the most influential in the United
Kingdom (U.K.), recently being awarded a
Queen’s Anniversary Prize for drug
discovery and development. Among the
many advanced research programmes at
the university are experiments aimed at
studying thousands of human proteins in
parallel. Prof. Angus Lamond, Director of
the Wellcome Trust Centre for Gene
Regulation and Expression at the
University of Dundee, says: “Our research
is aimed at understanding basic cell
mechanisms and how genes and
chromosomes work. The entire human
genome has now been sequenced, and,
with that done, we can start to
understand the relationship between
genes, inherited disorders, and diseases,
including cancer.”
In the past, it was usually only possible to
study one or two proteins at a time. But
the advent of more advanced
biochemical techniques, and in
particular the use of mass spectrometers,
has changed the scenario. It’s now
possible to study tens of thousands of
proteins, and their associated genes, in a
single experiment. In principle, these
experiments are straightforward.
Scientists grow human cells in the
laboratory and then process them so that
their proteins are broken down into
fragments for analysis by a mass
spectrometer. Prof. Lamond says: “Some
experiments can now analyse more than
400,000 fragments from as many as
10,000 proteins. Such large experiments
can generate 600 gigabytes or more of
raw data. These values keep increasing as
researchers design more complex
investigations with multiple time points.”
While simple in principle, in practice
these experiments are complex,
expensive to deliver, and require detailed
design and management. So, while a
given researcher may be focused on
studying responses affecting only a
couple of proteins in any one
experiment, that experiment still
generates data for all the many
thousands of proteins that are analysed.
This has several consequences.
− Each experiment generates a huge
quantity of data that researchers must
sort through to find the information
they need.
− By default, the data from each
experiment is generated, held and
analysed separately. Indeed, until
recently, researchers often held the
data on their local computers and,
when they left the laboratory to work
elsewhere, the data went with them.
− Different researchers are interested in
different genes, so it’s not unusual for
one researcher to run an experiment
and obtain the information they want.
Six months later, another researcher
who’s interested in a different gene
might have to run exactly the same
experiment to examine a different part
of the output. Duplicating this work is
an unnecessary waste of precious
research funds.
Prof. Lamond saw that the collection and
analysis of data was limiting scientific
progress. “If we created a database to
hold all the data from the many
hundreds of experiments that are
conducted over the years, we would, of
course, preserve the data and reuse it,”
he says. “But we would also be able to
combine the data from multiple
experiments and extract information that
simply isn’t available in the individual
experiments considered in isolation. We
realised that we could use the database
to carry out what we’ve termed ‘super
experiments.’”
Solution
By working with the Dean of the School
of Computing, Prof. Peter Gregor, Prof.
Lamond tapped into a considerable pool
of business intelligence expertise. The
two academics quickly realised that
business intelligence techniques could
be the key to running super experiments.
Prof. Gregor says: “Led by Dr. Mark
Whitehorn, we have a very strong focus
on business intelligence, which we see as
an increasingly important area.”
The first step in the project was to create
a customised suite of software—called
PepTracker—which manages and
analyses complex proteomics data. To
this was added a data warehouse to
store the data from the many
experiments. PepTracker was built by
Yasmeen Ahmad, a postgraduate student
in the Wellcome Centre, who graduated
from the School of Computing at the
University of Dundee, and is now
working as a computer scientist in the
team of cell biologists and biochemists in
Prof. Lamond’s laboratory.
Ahmad says: “The users start by
performing an experiment in the
laboratory. As well as the data output
from the mass spectrometer, we also
collect a great deal of metadata about
each experiment. Among other things, it
includes information about the specific
mass spectrometer that was used, the
cell line, the time, and the researcher.
The data and metadata are entered into
PepTracker and then stored on a
dedicated database server—the data
warehouse.”
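The arrangement Ahmad describes, measurement data stored alongside metadata about the instrument, cell line, time, and researcher, can be pictured as a small star schema. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module rather than SQL Server, and every table name, column name, and sample value is hypothetical, not taken from PepTracker.

```python
import sqlite3

# Hypothetical star schema: a fact table of measurements plus dimension
# tables for the metadata listed in the text. All names are illustrative
# assumptions, not PepTracker's actual design.
SCHEMA = """
CREATE TABLE instrument (instrument_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE cell_line  (cell_line_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE researcher (researcher_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    run_time      TEXT NOT NULL,
    instrument_id INTEGER NOT NULL REFERENCES instrument(instrument_id),
    cell_line_id  INTEGER NOT NULL REFERENCES cell_line(cell_line_id),
    researcher_id INTEGER NOT NULL REFERENCES researcher(researcher_id)
);
CREATE TABLE measurement (
    experiment_id INTEGER NOT NULL REFERENCES experiment(experiment_id),
    protein       TEXT NOT NULL,
    intensity     REAL NOT NULL
);
"""


def build_warehouse():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def load_example(conn):
    """Insert one made-up experiment with two made-up measurements."""
    conn.execute("INSERT INTO instrument VALUES (1, 'MS-A')")
    conn.execute("INSERT INTO cell_line VALUES (1, 'line-1')")
    conn.execute("INSERT INTO researcher VALUES (1, 'researcher-1')")
    conn.execute(
        "INSERT INTO experiment VALUES (1, '2010-01-01T09:00', 1, 1, 1)"
    )
    conn.executemany(
        "INSERT INTO measurement VALUES (?, ?, ?)",
        [(1, "P1", 10.0), (1, "P2", 4.5)],
    )
    conn.commit()


def proteins_by_instrument(conn, instrument_name):
    """Return (protein, intensity) rows recorded on the named instrument."""
    return conn.execute(
        """
        SELECT m.protein, m.intensity
        FROM measurement AS m
        JOIN experiment AS e ON e.experiment_id = m.experiment_id
        JOIN instrument AS i ON i.instrument_id = e.instrument_id
        WHERE i.name = ?
        ORDER BY m.protein
        """,
        (instrument_name,),
    ).fetchall()
```

Because each measurement row points back to its experiment, and each experiment to its metadata, a query can slice results by any recorded attribute without re-running the laboratory work.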
The Dundee research team chose
Microsoft SQL Server 2008 R2 for its
experiments because it offered features
that were highly applicable to their work.
Biomedical researchers were already
using Microsoft SQL Server 2008 Analysis
Services to help understand the
proteome—the set of expressed proteins
in an organism—and have recently
added Microsoft SQL Server PowerPivot
for Microsoft Excel to their business
intelligence suite of applications.
PowerPivot for Excel gives users the
power to create compelling self-service
business intelligence solutions. It
supports sharing and collaboration in a
Microsoft SharePoint Server 2010
environment, and helps researchers
increase operational efficiencies through
SQL Server 2008 R2–based management
tools.
Ahmad says: “The progress we’ve made
in creating PepTracker has been
transformational for us. It’s allowed us to
extract value from the data that would
have been impossible in any other way.”
That done, the team turned its attention
to the multidimensional analysis of super
experiments. Dr. Whitehorn says: “The
metadata collected made it
straightforward to perform the super
experiments that Prof. Lamond had
envisaged. We can combine the data
outputs from hundreds of independent
experiments and compare, for example,
the accuracy of the different mass
spectrometers. We could also use the
combined data from multiple
experiments to filter out the background
‘noise,’ which in turn is helping us extract
even more information.”
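The case study does not say how the team filters background noise. One plausible approach, shown here purely as an assumption, is to treat a protein that turns up in nearly every unrelated experiment as a likely contaminant. The function names, the 0.8 threshold, and the protein labels below are all hypothetical.

```python
from collections import Counter


def background_candidates(experiments, min_fraction=0.8):
    """Return proteins detected in at least min_fraction of experiments.

    `experiments` maps an experiment name to the proteins it detected.
    The threshold is an illustrative assumption, not a published value.
    """
    if not experiments:
        return set()
    counts = Counter()
    for detected in experiments.values():
        counts.update(set(detected))  # count each protein once per experiment
    threshold = min_fraction * len(experiments)
    return {protein for protein, n in counts.items() if n >= threshold}


def filter_noise(experiments, min_fraction=0.8):
    """Drop the background candidates from every experiment's protein list."""
    noise = background_candidates(experiments, min_fraction)
    return {
        name: sorted(set(detected) - noise)
        for name, detected in experiments.items()
    }


# Made-up data: "BG" appears in all four experiments, "P1" in only two.
EXAMPLE = {
    "exp1": ["P1", "P2", "BG"],
    "exp2": ["P3", "BG"],
    "exp3": ["P1", "BG"],
    "exp4": ["P4", "BG"],
}
```

With the made-up data above, only "BG" is flagged, because no single experiment could have revealed that it appears everywhere. This is the kind of information that exists only in the combined data set.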
Benefits
The use of Microsoft business
intelligence software has been highly
effective in the University of Dundee
research project in removing the
previous data analysis bottleneck. The
team is now planning a new range of
super experiments. They intend to run a
series of experiments at different stages
in the life cycle of the cell to help them
track the changes that occur in the
proteome.
Speed of Development Helps Advance
Data Analysis
One factor that surprised the researchers
was the speed of development of this
analytical phase due to the Microsoft
business intelligence technology, which
they consider to be as applicable to the
academic environment as it is to
commerce.
Ahmad says: “Our initial meeting was in
August 2009, when we first discussed the
project with the researchers. A lot of time
was spent with the users, creating a
logical model, and deciding on the
measures and dimensions that would be
useful in a multidimensional analysis.
“From there we created the online
analytical processing (OLAP) cube in SQL
Server 2008 Analysis Services and were
able to see some interesting results,
which were written up and submitted to
the Molecular and Cellular Proteomics
journal in just over two months. The
paper was published in December 2009,
and has already created considerable
international interest in the application
of business intelligence to proteomics.”
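The measures and dimensions mentioned above can be illustrated with a toy roll-up: a measure is a number to aggregate, and a dimension is an attribute to group by. The sketch below is a plain-Python stand-in for what an OLAP cube precomputes. It is not Analysis Services code, and the dimension names, the "peptides" measure, and the values are hypothetical.

```python
from collections import defaultdict


def roll_up(facts, dimensions, measure):
    """Sum `measure` over `facts`, grouped by the given dimension names."""
    totals = defaultdict(float)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


# Made-up fact rows: two dimensions (instrument, cell_line), one measure.
FACTS = [
    {"instrument": "MS-A", "cell_line": "line-1", "peptides": 120},
    {"instrument": "MS-A", "cell_line": "line-2", "peptides": 80},
    {"instrument": "MS-B", "cell_line": "line-1", "peptides": 100},
]

# Grouping by one dimension collapses the other.
by_instrument = roll_up(FACTS, ["instrument"], "peptides")
# Grouping by no dimensions gives the grand total.
grand_total = roll_up(FACTS, [], "peptides")
```

Choosing which attributes become dimensions is the modelling decision the team spent time on with users, since it fixes which comparisons the cube can answer directly.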
PowerPivot Feature Makes Research
Easier for Developers
PowerPivot for Excel is proving
invaluable in continuing this work. It
makes life much easier for developers.
Ahmad says: “As a developer, I really like
features in PowerPivot such as the ability
to create relationships between tables.
You can just add another tab of data,
create a relationship, and start graphing
the data, conducting comparisons
between different datasets. In addition,
the ability to connect to OLAP cubes is
vital. I can set up cubes and give the
connection details to users and they can
connect and look at the data
immediately.”
Business Intelligence Revolutionises
Data Analysis of Proteomics
The university users find Microsoft
business intelligence tools remarkably
flexible. Dr. Séverine Boulon, Post-Doctoral
Fellow in the Wellcome Trust
Centre, University of Dundee, says: “The
use of business intelligence technology
has revolutionised the way we analyse
proteomics data. Multidimensional
analysis allows us to compare and
integrate many independent
experiments. However, until now,
business intelligence analysis has been
the exclusive domain of developers. The
great thing about PowerPivot for Excel is
that it helps me carry out my own
multidimensional analysis using a
familiar and intuitive environment.”
Dr. Boulon adds: “PowerPivot is also
valuable for interacting quickly and easily
with data. I can formulate questions and
get instant answers in a ‘train of thought’
analysis, without being dependent on a
specialist. Using PepTracker and business
intelligence technology, we have
uncovered trends in specific types of
experiments—for example, in
discriminating genuine protein
interaction partners from the
‘experimental noise.’”
User-Friendly Toolset Helps
Researchers Analyse Data
A major factor in user adoption of the
toolset is the fact that PowerPivot is part
of Excel. Ahmad says: “The researchers
use Excel on a daily basis; it’s the tool of
choice for them because they find it
intuitive and easy to use. They can start
charting data in the way that they’re
used to doing already. In the future, I
envisage we’ll create more
multidimensional structures, holding
different types of data and also bringing
in more data from external sources. And
PowerPivot will play a vital part in
helping my users conduct that kind of
analysis. It will help them connect to the
cubes, with no development time
needed to generate custom graphical
user interfaces.”
Universities Recognise the Power of
Microsoft Technologies for Research
The word “business” in business
intelligence implies that it is only
applicable to commercial data. However,
Microsoft has long recognised that
business intelligence serves any data,
including scientific work. David
Hobbs-Mallyon, SQL Server Product Manager at
Microsoft, says: “We’re always delighted
when scientists choose Microsoft
business intelligence products. Scientists
tend to make innovative demands on the
technology, which is great for us and for
the products.”
Significantly, a research group at
Cambridge University used SQL Server
2005 to help rewrite the history of how
Charles Darwin developed the theory of
evolution. The same group used the
spatial mapping capabilities in the
Community Technology Preview of SQL
Server 2008 to uncover more about
Darwin’s pioneering work.
For More Information
For more information about Microsoft
products and services, call the
Microsoft Sales Information Center at
(800) 426-9400. In Canada, call the
Microsoft Canada Information Centre
at (877) 568-2495. Customers who are
deaf or hard-of-hearing can reach
Microsoft text telephone (TTY/TDD)
services at (800) 892-5234 in the
United States or (905) 568-9641 in
Canada. Outside the 50 United States
and Canada, please contact your local
Microsoft subsidiary. To access
information using the World Wide
Web, go to:
www.microsoft.com
Microsoft Server Product Portfolio
For more information about the
Microsoft server product portfolio, go to:
www.microsoft.com/servers/default.mspx
For more information about the
University of Dundee, call or visit the
Web site at:
www.dundee.ac.uk
Software and Services
Microsoft Server Product Portfolio
− Microsoft SQL Server 2008 R2
− Microsoft SharePoint Server 2010
Technologies
− Microsoft SQL Server PowerPivot for
Microsoft Excel
− Microsoft SQL Server 2008 Analysis
Services
This case study is for informational purposes only.
MICROSOFT MAKES NO WARRANTIES, EXPRESS OR
IMPLIED, IN THIS SUMMARY.
Document published May 2010