<Information and Data Privacy: An Indian Perspective>

Policy Brief

There has been considerable concern in developed countries over the use of a customer’s personal information or data for intrusive and malicious purposes. In developing countries like India, information and data privacy for individual customers has received little attention, primarily because of a lack of awareness among general consumers, law enforcement agencies and the organizations with which consumers interact, and also because the concept of privacy is perceived somewhat differently in most developing countries than in developed western countries. With recent advances in Data Mining, it is now possible for an individual to use data, sometimes freely available on the web, to extract patterns or information about a consumer that some organizations can then use to discriminate against that customer. Preserving an individual customer’s privacy while using Data Mining techniques to extract useful and meaningful information from customer data has therefore become even more significant. In this paper we look at the stated privacy policies of some leading companies operating in India in the telecom, banking and insurance sectors. We then introduce the concept of Privacy Preserving Data Mining (PPDM) and describe the main approaches that are prevalent. Finally we propose a framework that suggests which PPDM method may be applied in which domain.

Key Recommendations/Findings

Findings 1> In the telecommunications domain, Vodafone Essar is the only company that emphasizes the issue of sharing customers’ information outside India.

Findings 2> In the banking sector, only State Bank of India has a policy on limiting its employees’ access to customer information.
HDFC Bank’s privacy policy, on the other hand, allows it to share customers’ confidential information only as required by law, and not to protect its own interests (as ICICI Bank’s policy permits).

Findings 3> In the insurance sector, LIC’s privacy policy states that LIC may collect unnamed statistics which do not personally identify the user, and that LIC reserves the right to perform statistical analyses but will provide only aggregated data from these analyses to third parties. ICICI Lombard’s policy mentions that log files are analyzed in such a way that the individual user is not identified, while HDFC Standard Life’s policy retains the right to share aggregated non-personally identifiable information with third parties.

Recommendation 1> For the telecom domain we suggest Data Transformation/randomization under the Privacy Preserving Data Mining (PPDM) approach.

Recommendation 2> For the banking sector we suggest secure multiparty computation as the best suited of the PPDM methods.

Recommendation 3> For the insurance sector we suggest vertically partitioning the data, so that personal data that uniquely identifies a person and that person’s medical history are stored separately and cannot be brought together. This can be followed by a simple Data transformation of the private data for additional security.

Justification

In the telecommunications domain, looking at the policies of the three companies, we find that Vodafone Essar is the only company that emphasizes the issue of sharing customers’ information outside India. In our judgment this is a very important issue, since the question of whether Indian privacy policies apply to data held outside Indian jurisdiction changes the issue completely. The privacy laws of one country may or may not be applicable in other countries, and therefore an Indian customer’s privacy may be governed by the laws of the country where the data is stored.
A comparison of the privacy policies in the banking sector shows that HDFC Bank may disclose information about a customer only as permitted or required by law, unlike ICICI Bank, which may disclose the information provided by customers to “Protect and defend ICICI Bank's or its Affiliates' rights, interests or property”. In other words, ICICI Bank’s interests seem to be given more importance than the customer’s right to privacy, whereas HDFC Bank’s policy does not allow it to share customers’ confidential information to protect its own interests. SBI seems to be the only bank with a policy that limits bank employees’ access to customers’ information.

LIC’s policy, which states that only aggregated data will be given to third parties, illustrates one of the main points we would like to emphasize about the use of data mining techniques on large databases in the three domains we have chosen to look at. Our objective is to emphasize those data analysis and data mining techniques that can reveal hidden patterns and aggregate behaviours in the data without revealing individual identities. Though LIC does not state explicitly the steps it takes to protect individual identities, it implicitly recognizes the need to protect individual privacy while sharing aggregate information with third parties. ICICI Lombard’s policy mentions that log files are analyzed in such a way that the individual user is not identified, the goal being to analyze overall trends in user movements and demographic information without revealing the identity of a particular individual. In the case of HDFC SL, the policy states that it shall not share, rent or sell any personally identifiable information provided by the user, unless otherwise stated at the time of collection or otherwise.
HDFC SL retains the right to share aggregated non-personally identifiable information with third parties outside of the Website for business purposes, to assess website traffic, patterns and other such services.

Based on the PPDM methods that we have looked at, we now venture our suggestions for the three domains of interest.

In the telecom domain, the companies primarily collect personal data on the calling patterns of customers so that they can target their product recommendations. They also tend to conduct various surveys for planning their business, and customers may give more accurate information if they know that their privacy will be protected even when data is shared with other companies. For this we propose the Data transformation/randomization approach as a solution.

In the banking sector we suggest the second approach, i.e. secure multiparty computation. In this approach the different parties who own the data, here several banks, agree to disclose the result of certain data mining calculations performed on the joint data, which can be horizontally partitioned. The parties use a cryptographic protocol to exchange encrypted messages, which makes some calculations efficient while making other calculations computationally intractable. For example, two or more banks may share their individual data mining results on ATM frauds without any individual bank being able to trace the particular customers or ATMs associated with the frauds.

Lastly, in the insurance sector we tend to work with the most sensitive types of private data, such as health records. In many countries the privacy standards in this domain are protected by law, like HIPAA (Health Insurance Portability and Accountability Act) in the United States (Office for Civil Rights [OCR], 2003). In India an initiative in this direction has only recently been taken.
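To make the banking suggestion above more concrete, the following is a minimal Python sketch of one simple form of secure multiparty computation: a secure sum over horizontally partitioned data. It is an illustration under stated assumptions, not the protocol the brief has in mind. It swaps in additive secret sharing in place of encrypted message exchange, it assumes every bank follows the protocol honestly and that at least two banks take part, and the bank names, fraud counts, `MODULUS`, `split_into_shares` and `secure_sum` are all hypothetical.

```python
import secrets

# Public modulus agreed by all parties (hypothetical choice); it is assumed
# to be larger than any total the banks could jointly produce.
MODULUS = 2**61 - 1


def split_into_shares(value, n_parties, modulus=MODULUS):
    """Split a private value into n_parties random shares that add up to
    the value modulo `modulus`. Any n_parties - 1 of the shares on their
    own are uniformly random and reveal nothing about the value."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    last_share = (value - sum(shares)) % modulus
    return shares + [last_share]


def secure_sum(private_counts, modulus=MODULUS):
    """Simulate the protocol in one process: return the sum of all private
    counts, using only shares and per-party partial sums along the way."""
    n = len(private_counts)
    # Step 1: each bank i splits its own count; share j is sent to bank j.
    all_shares = [split_into_shares(c, n, modulus) for c in private_counts]
    # Step 2: each bank j adds the shares it received and publishes only
    # that partial sum, never the individual shares.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Step 3: anyone can add the published partial sums to get the total.
    return sum(partial_sums) % modulus


# Hypothetical ATM fraud counts held privately by three banks.
fraud_counts = {"bank_a": 12, "bank_b": 7, "bank_c": 30}
print(secure_sum(list(fraud_counts.values())))  # 49
```

In this sketch each bank learns the joint total of 49 frauds, but the values exchanged between banks are random shares and partial sums that do not expose any single bank's count, let alone the customers or ATMs behind it. A real deployment would also need authenticated channels and protection against colluding or dishonest parties, which are outside the scope of this sketch.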
Data mining over insurance records, particularly medical or health records, is important for pharmaceutical companies, for insurance companies themselves and for government policy makers. In view of the recent progress in DNA sequencing and DNA mapping, it should be made mandatory to store the DNA sequences, the personal data that uniquely identifies the individual, and the individual’s medical history in different data stores/repositories so that they cannot be brought together. We can then perform PPDM over the vertically partitioned data to calculate aggregate statistics while keeping the private data intact. As an additional level of security, we suggest a simple transformation of the private data before it is made available to third parties for extracting hidden patterns using data mining algorithms. In a country like India, with weak data privacy laws, this is important to ensure that there is no discrimination against an individual who applies for insurance, and one way of achieving it is the combination of data transformation and vertical partitioning of the sensitive data suggested above.

<R.P.Datta, [email protected], Indian Inst. Of Foreign Trade, J-1/14, Block EP & GP, Sec-5, Salt Lake City, Kolkata-700091, India>