Download Profile in Pan European Networks Science and Technology – Issue 15

PROFILE
EXCELLENT SCIENCE & INFRASTRUCTURE
MINING FOR NATURE’S DATA
There has been a lot of media hype surrounding Big Data, but the use
of powerful data mining tools requires profound theoretical expertise –
at least when it comes to natural sciences
Is it true that Big Data is bringing about a revolution in both
business and science? In many respects, yes. The
phrase ‘Big Data’ most often refers to data building up
in social media or electronic registers. When people buy, sell and
pay using the internet, they leave behind digital footprints that can
be utilised by various businesses and researchers: social scientists say that
sorting the data from YouTube might be a method of predicting
riots, Google can follow movements of flu epidemics based on the
analysis of searched keywords, and so on.
Nature never tweets
Natural scientists do, of course, utilise Big Data, as well. But in
fields like physics, chemistry or atmospheric sciences, data
mining is a more complicated tool to use than in business or
social sciences.
One reason for this is that when it comes to Nature – for example,
the atmosphere – the data does not flow into our computers by
itself. The molecules of the atmosphere do not gossip on
Facebook, they do not sign in to different registers, nor do they
answer polls. The same applies to practically all matters of the
physical universe.
To make Nature reveal something about itself, one of the first
things scientists have to do is construct and install various items
of equipment and apparatus that can capture a physical
phenomenon or certain chemical reaction. The problem is that
most often there are no instruments to be used for novel basic
research – they have to be planned and designed by the
scientists themselves.
What is more, experiments cannot be set up at random.
Scientists always start their work by evaluating the
existing theory: what are they looking for, and why?
First things first
Natural scientists do not even dream of data mining before they
have solid theoretical guidelines, installed instruments in the field
or in the laboratory, and expertise to turn physical measurement
signals, the raw data, into analysable data.
This is exactly the way many atmospheric scientists do their
everyday work. Atmospheric sciences is an ‘umbrella field’ where
physicists, chemists and forest scientists co-operate. Some of
them have special skills in data mining, too.
Heikki Junninen, PhD, is a physicist working at the University of
Helsinki. He unravels the effects that certain molecular
processes have on cloud formation. To find out how
nanometre-sized particles in the air behave and react with
each other, he needs a variety of special measurement
apparatus, like mass spectrometers.
“I am a data miner, yes, but the most time-consuming part of my
work is the substance – developing, installing and running our
instruments in the field to make the actual experiments succeed,”
emphasises Junninen.
When the data finally flows in and Junninen can concentrate on
analysing it, he uses factor analysis, in which algorithms search a
huge amount of data for correlations and for the hidden factors that
may explain them. He sometimes also resorts to so-called
‘self-organising maps’, algorithms that group similar samples
together so that they can be classified.
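As a rough illustration of the second technique, the sketch below trains a very small one-dimensional self-organising map in plain Python. It is not Junninen's code or data: the six two-value samples are made-up, normalised numbers, and the function names (`best_matching_unit`, `train_som`) and all parameter values are assumptions chosen only to keep the example short. Factor analysis is not shown.

```python
import math


def best_matching_unit(nodes, x):
    """Return the index of the node whose weight vector is closest to sample x."""
    distances = [sum((w_k - x_k) ** 2 for w_k, x_k in zip(w, x)) for w in nodes]
    return distances.index(min(distances))


def train_som(samples, n_nodes=2, epochs=50, lr0=0.5, radius0=1.0):
    """Train a tiny 1-D self-organising map and return its node weight vectors."""
    dim = len(samples[0])
    lo = [min(s[k] for s in samples) for k in range(dim)]
    hi = [max(s[k] for s in samples) for k in range(dim)]
    # Deterministic start: spread the nodes evenly between the data extremes.
    steps = max(n_nodes - 1, 1)
    nodes = [
        [lo[k] + (hi[k] - lo[k]) * j / steps for k in range(dim)]
        for j in range(n_nodes)
    ]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                      # learning rate decays to zero
        radius = max(radius0 * (1.0 - frac), 1e-3)   # neighbourhood shrinks
        for x in samples:
            bmu = best_matching_unit(nodes, x)
            for j, w in enumerate(nodes):
                # Gaussian neighbourhood: nodes near the winner move as well.
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
    return nodes


# Hypothetical, normalised measurements forming two loose groups.
samples = [
    [0.10, 0.20], [0.15, 0.25], [0.20, 0.10],
    [0.80, 0.90], [0.90, 0.80], [0.85, 0.95],
]
nodes = train_som(samples)
labels = [best_matching_unit(nodes, s) for s in samples]
print(labels)
```

Each sample ends up labelled with the index of its closest node, so samples that resemble each other share a label. The map only groups the data; deciding what a group means physically still requires knowledge of the phenomenon being measured.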
“Both are very powerful tools in atmospheric sciences, and they
will be utilised even more in the future. But they can also be
dangerous: if the scientist does not know the substance theory of
his research area, algorithms and data methods might turn out to
be a ‘black box’ and lead to totally wrong conclusions.
“To put this a bit differently, data mining tools are often statistical
models that learn from data without teaching the user about the
data. This feature is a disadvantage when one wants to learn the
system or phenomenon being modelled,” adds Junninen.
Observations from SMEAR-stations
Although Nature does not reveal its secrets as easily as we
humans do, atmospheric scientists are lucky to have a lot of data
at their disposal. This happy situation is owed to the excellent
observational stations established in Finland, Estonia and China.
The observational stations are large complexes of laboratories
located in both forest and urban areas measuring the material
and energy flows between the atmosphere and the rest of the
ecosystem. The first SMEAR-stations have been producing
long-term data since 1991.
At the moment, atmospheric scientists plan to establish several
new observational stations in the area of the Nordic Cap.
Sounds like Big Data is growing ever bigger for future analysts.
Heikki Junninen PhD
Research Scientist
Division of Atmospheric Sciences
Department of Physics
University of Helsinki
[email protected]
www.atm.helsinki.fi
Reproduced by kind permission of Pan European Networks Ltd, www.paneuropeannetworks.com
© Pan European Networks 2015