Data Mining, Data Science, Big Data

Data Science
* Data Science aims to extract insights from large data
* Less emphasis on algorithms
* More emphasis on ‘outreach’
* The term Data Science is about 10 years old and very popular nowadays
* Many people reinvent themselves as Data Scientists: data miners, statisticians, BI people, analysts, database developers

Data Mining & Data Science
[Diagram labels: Data Science, Data Mining, Statistics]
* Computational methods
* Dealing with large data
* Visualisation
* Involving domain knowledge
* Interpretable and interpreted results

Big Data
* Because you can… cheap storage
* Administrative/financial reasons
* Internet and social computing
* Internet of Things, ubiquitous computing

Cheap Storage
[Figure: cost per gigabyte in dollars, from $1,000,000 down to $0.01, against year, 1980 to 2010]
* 1956, IBM 350, 5 MB
* 90 TB

Big Data
* Many facets, often people focus on only one
* Very, very large data: CERN, Google, Facebook, Twitter, …
* Analytics
* Internet-generated social data
* Heterogeneous, unstructured data
* Large-scale technologies: MapReduce, Hadoop

Size-complexity trade-off
* Technological restrictions produce a trade-off
* Many Big Data projects are algorithmically not so complex
* Embarrassingly parallel
[Figure: size against complexity, with CERN marked]
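
As an illustration of what "embarrassingly parallel" and the MapReduce style mean in practice, here is a minimal sketch in Python. It is not taken from the slides: the word-count task, the function names (`map_chunk`, `merge_counts`, `word_count`) and the sample chunks are all hypothetical. Threads stand in for the cluster nodes that a system such as Hadoop would use.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(chunk):
    """Map step: count words in one chunk, independently of all other chunks."""
    return Counter(chunk.lower().split())


def merge_counts(left, right):
    """Reduce step: combine two partial counts into one."""
    return left + right


def word_count(chunks, workers=4):
    # The chunks share no state, so the map step is embarrassingly parallel.
    # Threads are used only to keep this sketch short (an assumption of the
    # example); a real system would spread chunks over processes or machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(map_chunk, chunks))
    return reduce(merge_counts, partial_counts, Counter())


# Hypothetical input: each string plays the role of one data partition.
chunks = ["big data big", "data science", "big science"]
totals = word_count(chunks)
```

The algorithm itself is trivial. All the difficulty lies in the data volume, which is the point of the size-complexity trade-off: the work scales by adding more chunks and more workers, not by making the per-chunk computation smarter.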