Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Anti-Persistence or History Independent Data Structures Moni Naor Weizmann Institute Vanessa Teague Stanford Why hide your history? Core dumps Losing your laptop – Emailing files – The entire memory representation of data structures is exposed The editing history may be exposed (e.g. Word) Maintaining lists of people – Sports teams, party invitees Making sure that nobody learns from history A data structure has: – – A “legitimate” interface: the set of operations allowed to be performed on it A memory representation The memory representation should reveal no information that cannot be obtained from the “legitimate” interface History of history independence Issue dealt with in Cryptographic and Data Structures communities Micciancio (1997): history independent trees – – – Motivation: incremental crypto Based on the “shape” of the data structure, not including memory representation Stronger performance model! Uniquely represented data structures – – Treaps (Seidel & Aragon), uniquely represented dictionaries Ordered hash tables (Amble & Knuth 1974) More History Persistent Data Structures: possible to reconstruct all previous states of the data structure (Sarnak and Tarjan) – We want the opposite: anti-persistence Oblivious RAM (Goldreich and Ostrovsky) Overview Definitions History independent open addressing hashing History independent dynamic perfect hashing – Memory Management (Union Find) Open problems Precise Definitions A data structure is – – history independent: if any two sequences of operations S1 and S2 that yield the same content induce the same probability distribution on memory representation strongly history independent: if given any two sets of breakpoints along S1 and S2 s.t. corresponding points have identical contents, S1 and S2 induce the same probability distributions on memory representation at those points Relaxations Statistical closeness Computational indistinguishability – Example where helpful: erasing Allow some information to be leaked – Total number of operations – n-history independent: identical distributions if the last n operations where identical as well Under-defined data structures: same query can yield several legitimate answers, – – e.g. approximate priority queue Define identical content: no suffix T such that set of permitted results returned by S1 T is different from the one returned by S2 T History independence is easy (sort of) If it is possible to decide the (lexicographically) “first” sequence of operations that produce a certain contents, just store the result of that This gives a history independent version of a huge class of data structures Efficiency is the problem… Dictionaries Operations are insert(x), lookup(x) and possibly delete(x) The content of a dictionary is the set of elements currently inserted (those that have been inserted but not deleted) Elements x U some universe Size of table/memory N Goal Find a history independent implementation of dictionaries with good provable performance. Develop general techniques for history independence Approaches Unique representation – – e.g. array in sorted order Yields strong history independence Secret randomness – – e.g. array in random order only history independence Open addressing: traditional version Each element x has a probe sequence h1(x), h2(x), h3(x), ... – – – Element is inserted into the first free space in its probe sequence – Linear probing: h2(x) = h1(x)+1, h3(x) = h1(x)+2, ... Double hashing Uniform hashing Search ends unsuccessfully at a free space Efficient space utilization – Almost all the table can be full Open addressing: traditional version Not history independent because later-inserted elements move further along in their probe sequence y x x arrived before y, so move y y y No clash, so insert y History independent version At each cell i, decide elements’ priorities independently of insertion order Call the priority function pi(x,y). If there is a clash, move the element of lower priority At each cell, priorities must form a total order Insertion y xy x x p2(x,y)? No, so move x Search Same as in the traditional algorithm In unsuccessful search, can quit as soon as you find a lower-priority element No deletions Problematic in open addressing anyway Strong history independence Claim: For all hash functions and priority functions, the final configuration of the table is independent of the order of insertion. Conclusion: Strongly history independent Proof of history independence A static insertion algorithm (clearly history independent): Gather up the rejects and restart x2 x1 x2 p1(x2,x1) so insert x2 x6x5x4 x45 pinsert 3(x4,xx5) and p3(x4,x6). Insert x54 and remove x5 x1 x4 x x1 x1 x3 x6 x4 x5 x2 x3 6 x3 p1(x6,x4) and p6(x3,x6), so insert x3 Proof of history independence Nothing moves further in the static algorithm than in the dynamic one – Vice versa – By induction on rounds of the static alg. By induction on the steps in the dynamic alg. Strongly history independent Some priority functions Global – Random – A single priority independent of cell Choose a random order at each cell Youth-rules – Call an element “younger” if it has moved less far along its probe sequence; younger elements get higher priority Youth-rules y p2(x,y) because x has taken fewer steps than y x y y x Use a tie-breaker if steps are equal This is a priority function Specifying a scheme Priority rule – – Choice of priority functions In Youth-rules – determined by probe sequence Probe functions – – – How are they chosen Maintained Computed Implementing Youth-rules Let each hi be chosen from a pair-wise independent collection – For any two x and y the r.v. hi(x) and hi(y) are uniform and independent. Let h1, h2, h3,… be chosen independently Example: hi(x) = (ai ·x mod U) + bi mod N Space: 2 elements per function Need only log N functions Performance Analysis Based on worst-case insertion sequence The important parameter: - the fraction of the table that is used ·N elements Analysis of expected insertion time and search time (number of probes to the table) – Have to distinguish successful and unsuccessful search Analysis via the Static Algorithm For insertions, the total number of probes in static and dynamic algorithm are identical – Easier to analyze the static algorithm Key point for Youth-rules: in the phase i all unsettled elements are in the ith probe in their sequence – Assures fresh randomness of hi (x) Performance For Youth-rules, implemented as specified: For any sequence of insertion the expected probe-time for insertion is at most 1/(1-) For any sequence of insertion the expected probe-time for successful or unsuccessful search is at most 1/(1-) Analysis based on static algorithm is the fraction of the table that is used Comparison to double hashing Analysis of double hashing with truly random functions [Guibas & Szemeredi, Lueker & Molodowitch] Can be replaced by log n wise independent functions (Schmidt & Siegel) log n wise independent is relatively expensive: – either a lot of space or log n time Youth-rules is a simple and provably efficient scheme with very little extra storage Extra benefit of considering history independence Other Priority Functions [Amble & Knuth] log(1/(1-)) for global – Truly random hash functions Experiments show about log(1/(1-)) for most priority functions tried Other types of data structures Memory management (dealing with pointers) – Memory Allocation Other state-related issues Dynamic perfect hashing: FKS scheme, dynamized n elements to be inserted x1 x3 x6 x5 x4 x2 Top-level table: O(n) space h0 s0 h1 h s1 sk Low-level tables: O(n) space total. Each gets about si2 hk The hi are perfect on their respective sets. Rechoose h or some hi to maintain perfection and linear space. A subtle problem: the intersection bias problem Suppose we have: – – – Keep a “current” h as states change Change h only if it is no longer “good”. – a set of states {1, 2, ...} a set of objects {h1, h2, ...} a way to decide whether hi is “good” for j. Choose uniformly from the “good” ones for . Then this is not history independent – h is biased towards the intersection of those good for current and for previous states. Dynamized FKS is not history independent Does not erase upon deletion Uses history-dependent memory allocation Hash functions (h, h1, h2, ...) are changed whenever they cease to be “good” – – Hence they suffer from the intersection bias problem, since they are biased towards functions that were “good” for previous sets of elements Hence they leak information about past sets of elements Making it history independent Use history independent memory allocation Upon deletion, erase the element and rechoose the appropriate hi. This solves the low-level intersection bias problem. Some other minor changes Solve the top-level intersection bias problem... Solving the top-level intersection bias problem Can’t afford a top-level rehash on every deletion Generate two “potential h”s 1 and 2 at the beginning – – – Always use the first “good” one If neither are good, rehash at every deletion If not using 1, keep a top-level table for it for easy “goodness” checking (likewise for 2) Proof of history independence Table’s state is defined by: The current set of elements Top-level hash functions – Low-level hash functions – Uniformly chosen from perfect functions Arrangement of sub-tables in memory – Always the first “good” i, or rechosen each step Use history-independent memory allocation Some other history independent things Performance Lookup takes two steps Insertion and deletion take expected amortized O(1) time – There is a 1/poly chance that they will take more Open Problems Better analysis for youth-rules as well as other priority functions with no random oracles. Efficient memory allocation – ours is O(s log s) Separations – – Between strong and weak history independence Between history independent and traditional versions e.g. for union find Can persistence and (computational) history independence co-exist efficiently? Conclusion History independence can be subtle We have two history independent hash tables: – Based on open addressing – Very space efficient but no deletion Dynamic perfect hashing Allows deletion, constant-time lookup Open addressing: implementing hash functions For all i, generate random independent ai, bi hi(x) = (aix mod U + bi) mod N – – U : size of universe; prime N : size of hash table x’s probe sequence is h1(x), h2(x), h3(x), ... We need log n hash functions – n is the number of elements in the table