Mining Multiple Private Databases
Top-k Queries Across Multiple Private Databases (2005)
Li Xiong (Emory University), Subramanyam Chitti (GA Tech), Ling Liu (GA Tech)
Presented by: Cesar Gutierrez

About Me
- ISYE senior with a CS minor
- Graduating December 2008
- Interested in humanitarian logistics and/or supply chain
- Originally from Lima, Peru
- Enjoys travel, paintball, and politics

Outline
- Introduction & Motivation
- Problem Definition
- Important Concepts & Examples
- Private Algorithm
- Conclusion

Introduction
- Technology has lowered the barriers to information sharing, which raises the need for distributed data-mining tools that preserve privacy.
- The design goal is a three-way trade-off among privacy, accuracy, and efficiency.

Motivating Scenarios
- The CDC needs to study insurance data to detect disease outbreaks: disease incidents, disease seriousness, and patient background.
- Legal and commercial concerns prevent insurers from releasing policy holders' information.

Motivating Scenarios (cont'd)
- Industrial trade group collaboration.
- Useful pattern: "manufacturing processes that use chemical supplies from supplier X have high failure rates."
- Trade secret: "manufacturing process Y gives a low failure rate."

Problem & Assumptions
- Model: n nodes holding a horizontally partitioned dataset.
- Semi-honest behavior is assumed: nodes follow the specified protocol, but may attempt to learn additional information about other nodes.

Challenges
- Why not use a trusted third party (TTP)? It is difficult to find one that all parties trust, and it increases the danger from a single point of compromise.
- Why not use secure multi-party computation techniques? They have high communication overhead and are feasible only for small inputs.

Recall Our 3-D Goal
- Accuracy, efficiency, and privacy.

Private Max (naive protocol)
- Actual data is sent on the first pass, and the static starting point is known.
[Diagram: a ring of four nodes with local values 30, 10, 40, and 20; because the starting node is fixed and sends its true value, each successor learns its predecessor's actual data while the running maximum (40) is computed.]

Multi-Round Max
- Randomly perturbed data is passed to the successor over multiple passes (see the protocol sketch after the Critique slide).
- With a randomized starting point, no successor can determine the actual data of its predecessor.
[Diagram: the same ring of nodes (D1-D4) exchanging perturbed values such as 18, 32, and 35 over several rounds before converging on the true maximum of 40.]

Evaluation Parameters
Parameter | Description
n         | number of nodes in the system
k         | kNN parameter
P0        | initial randomization probability in neighbor selection
d         | dampening factor in neighbor selection
r         | number of rounds in neighbor selection

- A large k helps avoid information leaks.
- A large d means more randomization and therefore more privacy; a small d is more deterministic and therefore more accurate.
- A large r makes the protocol "as accurate as an ordinary classifier."

Accuracy Results
[Chart slide: accuracy results; no recoverable data.]

Varying Rounds
[Chart slide: effect of varying the number of rounds; no recoverable data.]

Privacy Results
[Chart slide: privacy results; no recoverable data.]

Conclusion
- Problems tackled: preserving efficiency and accuracy while introducing provable privacy to the system.
- Improving a naive protocol.
- Reducing privacy risk in an efficient manner.

Critique
- The paper depends on other research papers for a full understanding.
- It has few or no illustrations.
- A real-life example would have made the charts easier to understand.
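
Protocol sketch. The Private Max and Multi-Round Max slides only gesture at the two protocols, so the following is a minimal Python sketch of the contrast. The function names, the perturbation rule (a node whose value exceeds the running maximum may report a random value between the two), and the geometric decay of the randomization probability are illustrative assumptions based on the slide descriptions, not the paper's exact algorithm.

```python
import random


def naive_max_ring(values):
    """Naive private max ("Private Max" slide): with a fixed, publicly known
    starting node, each node forwards max(received, own value).  The first
    node sends its actual data, so its successor learns it."""
    current = values[0]                      # true value revealed on first hop
    for v in values[1:]:
        current = max(current, v)
    return current


def multi_round_max(values, p0=0.5, d=0.5, rounds=4, seed=None):
    """Multi-round randomized max ("Multi-Round Max" slide), reconstructed.

    The starting node is randomized.  In early rounds, a node whose local
    value exceeds the received value may, with probability p0 * d**(round-1),
    inject a random value between the two instead of its true data.  The
    probability decays each round and the last round is deterministic, so the
    ring converges to the true maximum while a successor cannot tell whether
    a received value is real or perturbed."""
    rng = random.Random(seed)
    n = len(values)
    start = rng.randrange(n)                 # randomized starting point
    current = float("-inf")
    for r in range(1, rounds + 1):
        p_rand = 0.0 if r == rounds else p0 * d ** (r - 1)
        for step in range(n):
            v = values[(start + step) % n]
            if v > current:
                if current != float("-inf") and rng.random() < p_rand:
                    current = rng.uniform(current, v)   # perturbed report
                else:
                    current = v                          # true report
            # if v <= current, the received value is forwarded unchanged
    return current


if __name__ == "__main__":
    data = [30, 10, 40, 20]                  # local values from the slide diagram
    print(naive_max_ring(data))              # 40, but leaks node 1's value (30)
    print(multi_round_max(data, seed=7))     # 40, via perturbed intermediate messages
```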
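
Parameter schedule. The Evaluation Parameters slide relates P0, d, and r qualitatively but gives no formula. Assuming a geometric dampening schedule (randomization probability P0 * d**(r-1) in round r, which the slides do not state explicitly), the snippet below shows how quickly the randomization, and hence the privacy protection of early rounds, falls off:

```python
def randomization_schedule(p0=0.5, d=0.5, rounds=6):
    """Per-round randomization probability under the assumed geometric
    dampening: a larger d keeps randomization (privacy) high for longer,
    while more rounds r give more deterministic, accurate late rounds."""
    return [p0 * d ** (r - 1) for r in range(1, rounds + 1)]


# p0=0.5, d=0.5 -> [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625]
print(randomization_schedule())
```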