Download Lars Arge - Department of Computer Science

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Cluster analysis wikipedia , lookup

Expectation–maximization algorithm wikipedia , lookup

Nonlinear dimensionality reduction wikipedia , lookup

K-nearest neighbors algorithm wikipedia , lookup

K-means clustering wikipedia , lookup

Transcript
Lars Arge
Lars Arge
1/12
Massive Data
 Pervasive use of computers and sensors
 Increased ability to acquire/store/process data
→ Massive data collected everywhere
 Society increasingly “data driven”
→ Access/process data anywhere any time
Nature issue 2/06: “2020 – Future of computing”
 Trend continues as e.g. sensors increasingly pervasive
→ Exponential growth of scientific data
 Paradigm shift: Science will be about mining data
→ Computer science paramount in all sciences
 Increased data availability: “nano-technology-like” opportunity
Lars Arge
2/12
Algorithm Inadequacy
 Importance of scalability/efficiency
→ Algorithmics core computer science area
 Traditional algorithmics:
Transform input to output using simple machine model
 Inadequate with e.g.
 Massive data
 Small/diverse devices
 Continually arriving data
→ Software inadequacies!
 Communities addressing inadequacies have emerged
 But much work still needs to be done
Lars Arge
3/12
Center for Massive Data Algorithmics
 Established 2007
 Initially funded for 5 years by the national research foundation
 High level objectives:
 Advance algorithmic knowledge in “massive data” processing area
 Train researchers in world-leading international environment
 Be catalyst for multidisciplinary/industry collaboration
 Building on:
 Strong international team
 Establish vibrant international environment (focus on people)
 Focus areas:
 I/O-efficient, streaming, cache-oblivious algorithms
 Algorithm engineering
Lars Arge
4/12
I/O-Efficient Algorithms
 Problems involving massive data on disk
track
read/write head
read/write arm
magnetic surface
 Disk access is 106 times slower than main memory access
“The
difference
in speed between
modern large
CPU and
diskof data
 Large
access
time amortized
by transferring
blocks
technologies
is analogous
difference
in of
speed
in
→ Important
to store/access
datato
tothe
take
advantage
blocks
sharpening a pencil using a sharpener on one’s desk or by
taking an airplane to the other side of the world and using a
 I/O-efficient
algorithms:
sharpener
on someone else’s desk.” (D. Comer)
 Move as few disk blocks as possible to solve given problem
Lars Arge
5/12
I/O-Efficient Algorithms Matters
 Example: Traversing linked list (List ranking)
 Array size N = 10 elements
 Disk block size B = 2 elements
 Main memory size M = 4 elements (2 blocks)
1 5 2 6 3
8 9 4 7 10
1 2 10 9 5
Algorithm 1: N=10 I/Os
6 3 4 8 7
Algorithm 2: N/B=5 I/Os
 Difference between N and N/B large since block size is large
 Example: N = 256 x 106, B = 8000 , 1ms disk access time
 N I/Os take 256 x 103 sec = 4266 min = 71 hr
 N/B I/Os take 256/8 sec = 32 sec
Lars Arge
6/12
Streaming Algorithms
 Problems involving truly massive data
track
read/write head
read/write arm
magnetic surface
 Sequential read of disk blocks much faster than random read
 In many modern (sensor) applications data arrive continually
→ (Massive) problems often have to be solved in one sequential scan
 Streaming algorithms:
 Use single scan, handle each element fast, using small space
Lars Arge
7/12
Cache-Oblivious Algorithms
 Problems to be solved on unknown and/or changing devices
 Block access important on all levels of memory hierarchy
 But memory hierarchies are very diverse
 Cache-oblivious algorithms:
 Use blocks efficiently on all levels of any hierarchy
Lars Arge
8/12
Algorithm Engineering
 Algorithm engineering:
 Design/implementation of practical algorithms
 Experimentation
 Center motivated by theory inadequacy
 Center wants to promote interdisciplinary/industry work
→ Natural to consider algorithm engineering
 Algorithm engineering work can lead to practical breakthroughs
 Example: Flow simulation on massive terrain models
~18 billion points at 1 meter (>>1TB)
Implementation of I/O-efficient alg.
→ two weeks to three hours!!
Lars Arge
9/12
Center Team
 International core team of algorithms researchers
 Including top ranked US and European groups
Arge
Brodal
Mehlhorn
Meyer
Demaine
Indyk
AU
MIT
MPI
 Leading expertise in focus areas
 AU: I/O, cache and algorithm engineering
 MPI: I/O (graph) and algorithm engineering
 MIT: Cache and streaming
Lars Arge
10/12
Center Activities
 Visits of core researchers
 Exchange of AU, MPI and MIT Post Docs and PhD students
 Visiting faculty, Post doc and students from other institutions
 Frequent summer schools: Streaming algorithms this week!
 Major international events:
 25th Annual Symposium on Computational Geometry in 2009
 New conference: Symposium on Algorithms for Massive Datasets
 Theme Workshops to foster multidisciplinary/industry collaboration
Lars Arge
11/12
 Initially funded for 5 years by the National Research Foundation
 Collaboration between three leading international groups
 Addresses inadequacies of traditional theory
 When processing massive data
 When using diverse devices
 Initial research focus on I/O-efficient, streaming, cache-oblivious
 Also focus on algorithm engineering and multidisciplinary work
 Focus on people: Establish vibrant international environment
www.madalgo.au.dk
Lars Arge
12/12