Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Lars Arge Lars Arge 1/12 Massive Data Pervasive use of computers and sensors Increased ability to acquire/store/process data → Massive data collected everywhere Society increasingly “data driven” → Access/process data anywhere any time Nature issue 2/06: “2020 – Future of computing” Trend continues as e.g. sensors increasingly pervasive → Exponential growth of scientific data Paradigm shift: Science will be about mining data → Computer science paramount in all sciences Increased data availability: “nano-technology-like” opportunity Lars Arge 2/12 Algorithm Inadequacy Importance of scalability/efficiency → Algorithmics core computer science area Traditional algorithmics: Transform input to output using simple machine model Inadequate with e.g. Massive data Small/diverse devices Continually arriving data → Software inadequacies! Communities addressing inadequacies have emerged But much work still needs to be done Lars Arge 3/12 Center for Massive Data Algorithmics Established 2007 Initially funded for 5 years by the national research foundation High level objectives: Advance algorithmic knowledge in “massive data” processing area Train researchers in world-leading international environment Be catalyst for multidisciplinary/industry collaboration Building on: Strong international team Establish vibrant international environment (focus on people) Focus areas: I/O-efficient, streaming, cache-oblivious algorithms Algorithm engineering Lars Arge 4/12 I/O-Efficient Algorithms Problems involving massive data on disk track read/write head read/write arm magnetic surface Disk access is 106 times slower than main memory access “The difference in speed between modern large CPU and diskof data Large access time amortized by transferring blocks technologies is analogous difference in of speed in → Important to store/access datato tothe take advantage blocks sharpening a pencil using a sharpener on one’s desk or by taking an airplane to the other side of the world and using a I/O-efficient algorithms: sharpener on someone else’s desk.” (D. Comer) Move as few disk blocks as possible to solve given problem Lars Arge 5/12 I/O-Efficient Algorithms Matters Example: Traversing linked list (List ranking) Array size N = 10 elements Disk block size B = 2 elements Main memory size M = 4 elements (2 blocks) 1 5 2 6 3 8 9 4 7 10 1 2 10 9 5 Algorithm 1: N=10 I/Os 6 3 4 8 7 Algorithm 2: N/B=5 I/Os Difference between N and N/B large since block size is large Example: N = 256 x 106, B = 8000 , 1ms disk access time N I/Os take 256 x 103 sec = 4266 min = 71 hr N/B I/Os take 256/8 sec = 32 sec Lars Arge 6/12 Streaming Algorithms Problems involving truly massive data track read/write head read/write arm magnetic surface Sequential read of disk blocks much faster than random read In many modern (sensor) applications data arrive continually → (Massive) problems often have to be solved in one sequential scan Streaming algorithms: Use single scan, handle each element fast, using small space Lars Arge 7/12 Cache-Oblivious Algorithms Problems to be solved on unknown and/or changing devices Block access important on all levels of memory hierarchy But memory hierarchies are very diverse Cache-oblivious algorithms: Use blocks efficiently on all levels of any hierarchy Lars Arge 8/12 Algorithm Engineering Algorithm engineering: Design/implementation of practical algorithms Experimentation Center motivated by theory inadequacy Center wants to promote interdisciplinary/industry work → Natural to consider algorithm engineering Algorithm engineering work can lead to practical breakthroughs Example: Flow simulation on massive terrain models ~18 billion points at 1 meter (>>1TB) Implementation of I/O-efficient alg. → two weeks to three hours!! Lars Arge 9/12 Center Team International core team of algorithms researchers Including top ranked US and European groups Arge Brodal Mehlhorn Meyer Demaine Indyk AU MIT MPI Leading expertise in focus areas AU: I/O, cache and algorithm engineering MPI: I/O (graph) and algorithm engineering MIT: Cache and streaming Lars Arge 10/12 Center Activities Visits of core researchers Exchange of AU, MPI and MIT Post Docs and PhD students Visiting faculty, Post doc and students from other institutions Frequent summer schools: Streaming algorithms this week! Major international events: 25th Annual Symposium on Computational Geometry in 2009 New conference: Symposium on Algorithms for Massive Datasets Theme Workshops to foster multidisciplinary/industry collaboration Lars Arge 11/12 Initially funded for 5 years by the National Research Foundation Collaboration between three leading international groups Addresses inadequacies of traditional theory When processing massive data When using diverse devices Initial research focus on I/O-efficient, streaming, cache-oblivious Also focus on algorithm engineering and multidisciplinary work Focus on people: Establish vibrant international environment www.madalgo.au.dk Lars Arge 12/12