PROFILE: EXCELLENT SCIENCE & INFRASTRUCTURE

MINING FOR NATURE’S DATA

There has been a lot of media hype surrounding Big Data, but the use of powerful data mining tools requires profound theoretical expertise – at least when it comes to the natural sciences.

Is it true that Big Data is bringing about a revolution in both business and science? In many respects, yes. The phrase ‘Big Data’ most often refers to data building up in social media or electronic registers. When people buy, sell and pay using the internet, they leave behind digital footprints that can be utilised by businesses and researchers alike: social scientists say that sorting the data from YouTube might be a method of predicting riots, Google can follow the movements of flu epidemics by analysing searched keywords, and so on.

Nature never tweets

Natural scientists do, of course, utilise Big Data as well. But in fields like physics, chemistry or atmospheric sciences, data mining is a more complicated tool to use than in business or the social sciences. One reason for this is that when it comes to Nature – for example, the atmosphere – the data does not flow into our computers by itself. The molecules of the atmosphere do not gossip on Facebook, they do not sign in to different registers, nor do they answer polls. The same applies to practically all matters of the physical universe.

To make Nature reveal something about itself, one of the first things scientists have to do is construct and install equipment that can capture a physical phenomenon or a certain chemical reaction. The problem is that, most often, no ready-made instruments exist for novel basic research – they have to be planned and designed by the scientists themselves. What is more, experimental set-ups cannot be put together at random. The scientist always starts their work by evaluating the existing theory: what are they looking for, and why?
First things first

Natural scientists do not even dream of data mining before they have solid theoretical guidelines, instruments installed in the field or in the laboratory, and the expertise to turn physical measurement signals, the raw data, into analysable data.

This is exactly the way many atmospheric scientists do their everyday work. Atmospheric sciences is an ‘umbrella field’ where physicists, chemists and forest scientists co-operate. Some of them have special skills in data mining, too.

Heikki Junninen, PhD, is a physicist working at the University of Helsinki. He unravels the effects that certain molecular processes have on cloud formation. To find out how nanometre-sized particles in the air behave and react with each other, he needs a variety of special measurement apparatus, such as mass spectrometers.

“I am a data miner, yes, but the most time-consuming part of my work is the substance – developing, installing and running our instruments in the field to make the actual experiments succeed,” emphasises Junninen.

When the data finally flows in and Junninen can concentrate on analysing it, he uses factor analysis, in which algorithms search huge amounts of data for correlations and the underlying factors behind them. He may sometimes also resort to so-called ‘self-organising maps’, which are algorithms developed to classify the samples.

“Both are very powerful tools in atmospheric sciences, and they will be utilised even more in the future. But they can also be dangerous: if the scientist does not know the substance theory of his research area, algorithms and data methods might turn out to be a ‘black box’ and end up in totally wrong conclusions.

“To put this a bit differently, data mining tools are often statistical models that learn from data without teaching the user about the data. This feature is a disadvantage when one wants to learn the system or phenomenon being modelled,” adds Junninen.
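For readers curious about what a self-organising map does in practice, the following is a minimal sketch in Python. It is not Junninen's method or code: the function names, the parameter values and the six two-feature samples are all hypothetical, chosen only to illustrate the idea. A small row of ‘nodes’ is repeatedly nudged towards the samples, and each sample is finally assigned to its closest node.

```python
import math
import random


def best_matching_unit(nodes, x):
    """Return the index of the node whose weight vector is closest to x."""
    return min(
        range(len(nodes)),
        key=lambda i: sum((w - v) ** 2 for w, v in zip(nodes[i], x)),
    )


def train_som(samples, n_nodes=2, epochs=100, lr0=0.5, radius0=1.0, seed=0):
    """Train a one-dimensional self-organising map (illustrative settings)."""
    rng = random.Random(seed)
    # Start each node at a randomly chosen sample.
    nodes = [list(rng.choice(samples)) for _ in range(n_nodes)]
    for epoch in range(epochs):
        # Learning rate and neighbourhood radius shrink linearly over time.
        frac = 1.0 - epoch / epochs
        lr = lr0 * frac
        radius = radius0 * frac
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            bmu = best_matching_unit(nodes, x)
            for i, w in enumerate(nodes):
                # Gaussian neighbourhood: nodes near the winner move more.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])
    return nodes


# Hypothetical measurements: two features per sample, two obvious groups.
samples = [
    [0.1, 0.2], [0.0, 0.3], [0.2, 0.1],
    [5.0, 5.1], [5.2, 4.9], [4.9, 5.0],
]
nodes = train_som(samples)
labels = [best_matching_unit(nodes, x) for x in samples]
print(labels)
```

The labels are nothing more than node indices. The algorithm groups similar samples together, but it says nothing about what each group means physically, which is where the ‘black box’ risk described above comes in.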
Observations from SMEAR-stations

Although Nature does not reveal its secrets as easily as we humans do, atmospheric scientists are lucky to have a lot of data at their disposal. This happy situation is thanks to the excellent observational stations established in Finland, Estonia and China. The observational stations are large complexes of laboratories, located in both forest and urban areas, that measure the material and energy flows between the atmosphere and the rest of the ecosystem.

The first SMEAR-stations have been producing long-term data since 1991. At the moment, atmospheric scientists plan to establish several new observational stations in the area of the Nordic Cap. It sounds as though Big Data will keep growing for future analysts.

Heikki Junninen, PhD
Research Scientist
Division of Atmospheric Sciences
Department of Physics
University of Helsinki
www.atm.helsinki.fi

Reproduced by kind permission of Pan European Networks Ltd, www.paneuropeannetworks.com
© Pan European Networks 2015
Pan European Networks: Science & Technology 15