External Memory Algorithms
Saswata Shannigrahi
IIT Guwahati
Motivation
◮
Geographic Information Systems (GIS)
◮
Bioinformatics
◮
Social Network Analysis
◮
Any other research area that deals with massive datasets
Memory Hierarchy
◮
Cache (very fast, very expensive, limited storage)
◮
Random access memory (fast, expensive, limited storage)
◮
External memory (slow, cheap, massive storage)
External Memory Algorithms
◮
At present, we are dealing with datasets on the order of petabytes.
◮
Such a massive amount of data cannot be stored in RAM.
◮
There are two options: (i) store the data in an external memory
device, (ii) don’t store the data and use streaming algorithms.
◮
In this talk, we will focus on the first approach.
◮
In this approach, the main bottleneck is the time required to do I/O
operations between the RAM and the external memory.
◮
In this talk, we will study the design of I/O-efficient algorithms
for a number of problems.
Notations
◮
M = The capacity (number of bytes) of the RAM
◮
B = The size (number of bytes) of a block
◮
m = The capacity (number of blocks) of the RAM = M/B
◮
N = The amount of data (number of bytes) stored in the external
memory
◮
n = The amount of data (number of blocks) stored in the external
memory = N/B
Summary of the Talk
◮
Sorting: merge sort and distribution sort
◮
Graph algorithms: list-ranking, time forward processing, maximal
independent set
◮
I/O-efficient data structures: B-tree, B+-tree and buffer tree
◮
Lower bound on sorting
◮
Parallelism with multiple disks
External Memory Merge Sort
◮
Step 1: Load O(m) blocks into internal memory, sort them, and write
them back to the external memory as one sorted run. Repeat this
O(n/m) times to create O(n/m) runs.
◮
Step 2: Take R sorted runs and merge them in the internal memory.
Note that R can be at most m − 1, since one block per input run and
one block for the output must fit in memory.
◮
Step 3: Repeat Step 2 till there is only one run left.
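The three steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the "disk" is a Python list of blocks, the function name external_merge_sort and the I/O counter are hypothetical, and each pass is charged one read and one write per block.

```python
import heapq

def external_merge_sort(blocks, m):
    """Sort a simulated disk (a list of blocks) with m blocks of memory.

    Returns (sorted_blocks, io_count). Each block is a list of keys.
    """
    if m < 3:
        raise ValueError("need room for two input blocks and one output block")
    B = max((len(b) for b in blocks), default=1)   # block size in keys
    io = 0

    # Step 1: load m blocks at a time, sort them in memory, write one run.
    runs = []
    for i in range(0, len(blocks), m):
        chunk = blocks[i:i + m]
        io += 2 * len(chunk)                       # read and write each block once
        runs.append(sorted(x for b in chunk for x in b))

    # Steps 2 and 3: merge up to m - 1 runs at a time until one run is left.
    fan_in = m - 1
    while len(runs) > 1:
        merged = []
        for i in range(0, len(runs), fan_in):
            group = runs[i:i + fan_in]
            io += 2 * sum(-(-len(r) // B) for r in group)   # one pass over the group
            merged.append(list(heapq.merge(*group)))
        runs = merged

    out = runs[0] if runs else []
    return [out[i:i + B] for i in range(0, len(out), B)], io
```

For brevity the sketch holds whole runs as Python lists; an actual external-memory implementation would keep only one block per run in memory during a merge.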
Analysis of Merge Sort
◮
Step 1 requires O(n) I/O operations.
◮
Step 2 is repeated O(log_m n) times, and each pass performs O(n) I/O
operations.
◮
Total number of I/O operations = O(n log_m n)
◮
This is significantly better than the traditional two-way merge sort,
which requires O(n log_2 n) I/O operations.
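To get a feel for the gap between the two bounds, the following sketch evaluates n log_m n and n log_2 n for given N, M and B. The function name and the sample sizes in the usage note are hypothetical, and constant factors are ignored.

```python
import math

def sort_io_estimates(N, M, B):
    """Return (m, n, n*log_m(n), n*log_2(n)) for data size N, RAM size M, block size B."""
    m = M // B                    # RAM capacity in blocks
    n = math.ceil(N / B)          # data size in blocks
    if m < 2 or n < 1:
        raise ValueError("need at least two blocks of RAM and one block of data")
    multiway = n * math.log(n, m)     # m-way external merge sort
    two_way = n * math.log2(n)        # traditional two-way merge sort
    return m, n, multiway, two_way
```

With N = 2^40 bytes, M = 2^30 bytes and B = 2^20 bytes, this gives m = 1024 and n = 2^20, so log_m n = 2 while log_2 n = 20: roughly a tenfold difference in the number of passes over the data.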
Distribution Sort
◮
Step 1: Read the elements stored in the external memory, and find
S − 1 = Θ(m) partitioning elements.
◮
Step 2: Partition the input into S buckets of roughly equal size,
such that any element in bucket i is greater than or equal to the
(i − 1)-th partitioning element and less than or equal to the i-th
partitioning element.
◮
Step 3: Recursively apply Step 1 and Step 2 till the size of a bucket
becomes less than or equal to M.
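The recursion above can be sketched as follows. Note the assumptions: here M counts elements rather than bytes, the partitioning elements are taken from a sorted random sample (a simple stand-in, not the O(n)-I/O method discussed in the talk), and the name distribution_sort is hypothetical.

```python
import bisect
import random

def distribution_sort(items, M, S, rng=None):
    """Sort a list by recursive partitioning into at most S buckets."""
    if S < 2:
        raise ValueError("need at least two buckets")
    rng = rng or random.Random(0)
    if len(items) <= M:
        return sorted(items)                  # Step 3: bucket fits in memory

    # Step 1: choose up to S - 1 partitioning elements from a sorted sample.
    sample = sorted(rng.sample(items, min(len(items), 4 * S)))
    step = len(sample) / S
    pivots = sorted({sample[int(i * step)] for i in range(1, S)})

    # Step 2: one scan sends each element to its bucket.
    buckets = [[] for _ in range(len(pivots) + 1)]
    for x in items:
        buckets[bisect.bisect_left(pivots, x)].append(x)

    out = []
    for bucket in buckets:
        if len(bucket) == len(items):
            out.extend(sorted(bucket))        # no progress, e.g. all keys equal
        else:
            out.extend(distribution_sort(bucket, M, S, rng))
    return out
```

Sampling gives buckets of only roughly equal size, which is enough to illustrate the structure of the recursion but does not carry the worst-case guarantee.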
Analysis of Distribution Sort
◮
There is no known algorithm for finding m partitioning elements
using O(n) I/O operations. However, we will study an algorithm
that finds √m partitioning elements using O(n) I/O operations.
◮
The number of I/O operations needed for distribution sort is
therefore O(n log_√m n) = O(n log_m n), since log_√m n = 2 log_m n.
Algorithm to identify √m partitioning elements using O(n) I/Os
◮
Find the median of N elements using O(n) I/Os.
◮
Find the k-th smallest element using O(n) I/Os.
◮
Find √m partitioning elements using O(n) I/Os.
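Selection fits external memory well because it can be done with sequential scans only. The sketch below uses a random pivot, so it needs a linear number of I/Os only in expectation; it is a simplified stand-in for the deterministic algorithm the talk refers to, and the name kth_smallest is hypothetical.

```python
import random

def kth_smallest(items, k, rng=None):
    """Return the k-th smallest element (k = 1 is the minimum) using repeated scans."""
    if not 1 <= k <= len(items):
        raise ValueError("k out of range")
    rng = rng or random.Random(0)
    while True:
        pivot = rng.choice(items)
        smaller = [x for x in items if x < pivot]        # one scan
        equal = sum(1 for x in items if x == pivot)
        if k <= len(smaller):
            items = smaller                              # answer is below the pivot
        elif k <= len(smaller) + equal:
            return pivot
        else:
            k -= len(smaller) + equal                    # answer is above the pivot
            items = [x for x in items if x > pivot]
```

The median is the special case k = ⌈N/2⌉.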
I/O-efficient Graph Algorithm: List-ranking problem
◮
Problem statement: Calculate the weighted ranks of the elements
present in a linked list.
◮
The naive algorithm, which follows the list pointer by pointer, uses
Ω(N) I/Os in the worst case, since consecutive elements may lie in
different blocks.
◮
We will present an I/O-efficient algorithm that requires O(n log_m n)
I/Os.
◮
It uses an algorithm for finding a maximal independent set in a
graph as a subroutine.
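The following sketch shows why pointer chasing is expensive. It assumes that node i is stored in block i // B, that memory holds a single block, and that the weighted rank of a node is the total weight from the head up to and including that node; all three are illustrative assumptions, as is the name naive_list_rank.

```python
def naive_list_rank(succ, weight, head, B):
    """Rank a linked list by following successors.

    succ[i] is the index of the successor of node i (None at the tail).
    Returns (rank, block_reads).
    """
    rank = {}
    total = 0
    block_reads = 0
    current_block = None
    v = head
    while v is not None:
        if v // B != current_block:      # node v is not in the cached block
            current_block = v // B
            block_reads += 1
        total += weight[v]
        rank[v] = total
        v = succ[v]
    return rank, block_reads
```

When the list order is 0, 2, 4, 6, 1, 3, 5, 7 with B = 2, every step lands in a different block and the traversal costs N = 8 block reads; when the list order matches the storage order it costs only n = 4.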
Euler Tour Technique
◮
Rooting a tree
◮
Labelling rooted trees
◮
Computing the depth of every vertex in a rooted tree
◮
Computing the number of descendants for every vertex in a rooted
tree
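As an illustration of the last two applications, the sketch below builds an Euler tour of a rooted tree, gives downward edges weight +1 and upward edges weight −1, and reads depths and descendant counts off the prefix sums and tour positions. For brevity the tour is built here with an in-memory traversal; in external memory the prefix sums would come from list ranking. The function names and the children dictionary are hypothetical.

```python
from itertools import accumulate

def euler_tour(children, root):
    """Return the tour as a list of (from_vertex, to_vertex, weight) steps."""
    tour = []
    stack = [(root, iter(children.get(root, ())))]
    while stack:
        u, it = stack[-1]
        child = next(it, None)           # vertex labels are assumed not to be None
        if child is None:
            stack.pop()
            if stack:
                tour.append((u, stack[-1][0], -1))    # climb back to the parent
        else:
            tour.append((u, child, +1))               # descend to a child
            stack.append((child, iter(children.get(child, ()))))
    return tour

def depths_and_descendants(children, root):
    tour = euler_tour(children, root)
    prefix = list(accumulate(w for _, _, w in tour))
    depth = {root: 0}
    first = {root: -1}                   # position of the step entering each vertex
    last = {root: len(tour)}             # position of the step leaving each vertex
    for i, (u, v, w) in enumerate(tour):
        if w == +1:
            depth[v] = prefix[i]
            first[v] = i
        else:
            last[u] = i
    # Every proper descendant contributes two steps between entering and leaving.
    descendants = {v: (last[v] - first[v] - 1) // 2 for v in depth}
    return depth, descendants
```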
Time Forward Processing
◮
Helps evaluate a directed acyclic graph (DAG).
◮
The vertices of the directed acyclic graph are assumed to be stored
in a topologically sorted order.
◮
Note: There is no known I/O-efficient algorithm for topologically
sorting an arbitrary DAG.
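A minimal sketch of the idea: vertices are visited in topological order, and each vertex sends its computed value "forward in time" to its out-neighbours through a priority queue keyed by destination. The sketch assumes vertices are numbered 0..num_vertices−1 in topological order and uses Python's in-memory heapq where an external-memory priority queue would be used; the function and parameter names are hypothetical.

```python
import heapq

def time_forward_evaluate(num_vertices, edges, local_value, combine):
    """Evaluate a DAG whose edges (u, v) all satisfy u < v.

    combine(own, incoming) computes a vertex value from its own label and
    the list of values received from its in-neighbours.
    """
    if any(u >= v for u, v in edges):
        raise ValueError("edges must go from lower to higher vertex numbers")
    out_edges = sorted(edges)            # edges sorted by source vertex
    pq = []                              # entries: (destination, sequence, value)
    seq = 0                              # tie-breaker so values are never compared
    result = []
    e = 0
    for v in range(num_vertices):
        incoming = []
        while pq and pq[0][0] == v:      # collect everything sent to v
            incoming.append(heapq.heappop(pq)[2])
        value = combine(local_value[v], incoming)
        result.append(value)
        while e < len(out_edges) and out_edges[e][0] == v:
            heapq.heappush(pq, (out_edges[e][1], seq, value))   # send forward
            seq += 1
            e += 1
    return result
```

For example, with all labels 0 and combine returning one more than the largest incoming value, the result is the length of the longest path ending at each vertex.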
I/O-efficient Algorithm to Identify a Maximal
Independent Set in an Undirected Graph
◮
Step 1: Direct the edges of the undirected graph from lower-numbered
vertices to higher-numbered vertices.
◮
Step 2: Sort the vertices of the graph by their numbers and the
edges by the numbers of their source vertices.
◮
Step 3: For every vertex v of the graph in sorted order, do the
following: If no neighbour of v is in the independent set, then add v
to the independent set.
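The three steps can be sketched with time-forward processing: a vertex that joins the set notifies its higher-numbered neighbours through a priority queue, so each vertex only has to check whether a notification is waiting for it. The sketch assumes vertices are numbered 0..num_vertices−1 and uses an in-memory heap in place of an external-memory priority queue; the function name is hypothetical.

```python
import heapq

def maximal_independent_set(num_vertices, edges):
    """Return a maximal independent set as a sorted list of vertex numbers."""
    # Steps 1 and 2: direct each edge from lower to higher number, sort by source.
    directed = sorted((min(u, v), max(u, v)) for u, v in edges if u != v)
    pq = []                  # vertices that have a lower-numbered neighbour in the set
    in_set = []
    e = 0
    # Step 3: scan the vertices in sorted order.
    for v in range(num_vertices):
        blocked = False
        while pq and pq[0] == v:         # some neighbour in the set notified v
            heapq.heappop(pq)
            blocked = True
        if not blocked:
            in_set.append(v)
        while e < len(directed) and directed[e][0] == v:
            if not blocked:
                heapq.heappush(pq, directed[e][1])   # notify higher neighbour
            e += 1
    return in_set
```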
B-tree
◮
Search using O(log_B n) I/Os
◮
Insert using O(log_B n) I/Os
◮
Delete using O(log_B n) I/Os
B+-tree
◮
Search using O(log_B n) I/Os
◮
Insert using O(log_B n) I/Os
◮
Delete using O(log_B n) I/Os
◮
Supports range queries.
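To illustrate why range queries are cheap in a B+-tree, here is a sketch of a read-only tree built bottom-up over sorted, distinct keys: a query descends once from the root and then scans neighbouring leaves. The class name, the fanout parameter and the node-read counter are hypothetical, and insertions and deletions are left out.

```python
import bisect

class StaticBPlusTree:
    """Read-only B+-tree over sorted distinct keys, at most `fanout` entries per node."""

    def __init__(self, sorted_keys, fanout):
        if fanout < 2:
            raise ValueError("fanout must be at least 2")
        self.fanout = fanout
        self.leaves = [sorted_keys[i:i + fanout]
                       for i in range(0, len(sorted_keys), fanout)] or [[]]
        # Build index levels bottom-up from the smallest key of each node.
        self.levels = []
        mins = [leaf[0] for leaf in self.leaves if leaf]
        while len(mins) > 1:
            nodes = [mins[i:i + fanout] for i in range(0, len(mins), fanout)]
            self.levels.append(nodes)
            mins = [node[0] for node in nodes]
        self.levels.reverse()            # root level first

    def _find_leaf(self, key):
        """Return (leaf_index, node_reads) for the leaf that may contain key."""
        idx, reads = 0, 0
        for nodes in self.levels:
            node = nodes[idx]
            reads += 1
            pos = max(bisect.bisect_right(node, key) - 1, 0)
            idx = idx * self.fanout + pos
        return idx, reads + 1            # plus one read for the leaf itself

    def range_query(self, lo, hi):
        """Return (keys in [lo, hi], node_reads)."""
        idx, reads = self._find_leaf(lo)
        out = []
        while True:
            leaf = self.leaves[idx]
            out.extend(k for k in leaf if lo <= k <= hi)
            if not leaf or leaf[-1] >= hi or idx + 1 == len(self.leaves):
                break
            idx += 1                     # follow the link to the next leaf
            reads += 1
        return out, reads
```

With 100 keys and fanout 4 the tree has three index levels, so a query costs four node reads for the descent plus one read per additional leaf touched.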
Buffer Tree
◮
Built on the concept of (a, b)-trees.
◮
The amortized cost of an insert operation is O((log_m n)/B) I/Os.
◮
Can be used to sort N elements using O(n log_m n) I/Os, since N
inserts cost O(N (log_m n)/B) = O(n log_m n) I/Os in total.
Lower Bound on Sorting
◮
Rearranging N elements according to a given permutation requires
Ω(min{N, n log_m n}) I/O operations.
◮
Using a similar proof technique, we will show that sorting N
elements requires Ω(n log_m n) I/O operations.
◮
The external-memory merge and distribution sort algorithms achieve
this bound.
Multiple Disks
◮
Disk striping is useful in transforming sequential algorithms to
parallel algorithms.
◮
If there are D disks, the striped format allows N data items to be
input or output using O(n/D) I/Os.
◮
Using more sophisticated techniques, N elements can be sorted
using O((n/D) log_m n) I/Os.
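The O(n/D) bound for striped data can be seen from the layout itself. The sketch below assumes the usual round-robin placement, with block i on disk i mod D, and that one parallel I/O transfers one block per disk; the function names are hypothetical.

```python
def striped_layout(num_blocks, D):
    """Return (disk, stripe) for each block: block i goes to disk i % D, stripe i // D."""
    return [(i % D, i // D) for i in range(num_blocks)]

def parallel_ios_for_scan(num_blocks, D):
    """A full scan reads one stripe per parallel I/O, i.e. ceil(n / D) of them."""
    return -(-num_blocks // D)
```

For n = 10 blocks on D = 4 disks, the blocks occupy three stripes, so a scan takes three parallel I/Os instead of ten.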
References
J. S. Vitter. Algorithms and Data Structures for External Memory,
Now Publishers, 2008.
N. Zeh. I/O-efficient graph algorithms, Lecture notes from EEF
Summer School on Massive Data Sets, 2002.
L. Arge. The Buffer Tree: A New Technique for Optimal I/O
Algorithms, Proceedings of the fourth International Workshop on
Algorithms and Data Structures (WADS), 1995.
J. Erickson. Buffer Trees, Lecture notes at UIUC, 2002.
Thank you!