<Information and Data Privacy: An Indian Perspective>
Policy Brief
Developed countries have seen considerable concern over the use of a customer's personal information or data for intrusive and
malicious purposes. In developing countries like India, the privacy of individual customers' information and data has received
little attention, primarily because of a lack of awareness among consumers, law enforcement agencies and the organizations with
which consumers interact, and also because the concept of privacy is perceived somewhat differently in most developing
countries than in developed western countries. With recent advances in Data Mining, it is now possible for an individual to use
data, sometimes freely available on the web, to extract patterns or information about a consumer that some organizations can
use to discriminate against that customer. Preserving an individual customer’s privacy while using Data Mining techniques to
extract useful and meaningful information from customer data has therefore become even more significant. In this paper we
look at the existing or stated privacy policies of some leading companies operating in India in the telecom, banking and
insurance sectors. We then introduce the concept of Privacy Preserving Data Mining (PPDM) and describe the main approaches
that are prevalent. Finally we propose a framework indicating which PPDM method may be applied in which domain.
Key Recommendations/Findings
Findings 1>
In the telecommunications domain, Vodafone Essar is the only company that emphasizes the issue of sharing customers’
information outside India.
Findings 2>
In the banking sector, we find that only State Bank of India has a policy on limiting its employees’ access to customer
information. HDFC Bank’s privacy policy, unlike ICICI Bank’s, does not allow it to share customers’ confidential information to
protect its own interests, but only as required by law.
Findings 3>
In the insurance sector LIC’s privacy policy states that LIC may collect unnamed statistics, which do not personally identify the user
and LIC reserves the right to perform statistical analyses but will provide only aggregated data from these analyses to third parties.
ICICI Lombard’s policy mentions that log files are analyzed such that individual users are not identified, while HDFC Standard
Life’s policy retains the right to share aggregated non-personally identifiable information with third parties.
Recommendation 1>
For the telecom domain we suggest Data Transformation/randomization under the Privacy Preserving Data Mining (PPDM) approach.
Recommendation 2>
For the banking sector we suggest secure multiparty computation as the best-suited PPDM method.
Recommendation 3>
For the insurance sector we suggest vertically partitioning the Data to ensure that personal data that identifies a person uniquely and
their medical history are stored separately and can't be brought together. This can be followed by a simple Data transformation of the
private data for additional security.
Justification
In the telecommunications domain, looking at the policies given by the three companies, we find that Vodafone Essar is the only
company that emphasizes the issue of sharing customers’ information outside India. This is a very important issue in our
judgment, since whether Indian privacy policies apply to data held outside Indian jurisdiction is a different question altogether.
Privacy laws in one country may or may not be applicable in other countries, and therefore an Indian customer’s privacy may be
governed by the laws of a different country where the data is stored.
A comparison of the privacy policies in the banking sector shows that HDFC Bank may disclose information about a customer only
as permitted or required by law, unlike ICICI Bank, which may disclose the information provided by customers to “Protect and
defend ICICI Bank's or its Affiliates' rights, interests or property”. In other words, ICICI Bank’s interests seem to be given more
importance than the customer’s right to privacy, whereas HDFC Bank’s privacy policy does not allow it to share customers’
confidential information to protect its own interests. SBI seems to be the only bank with a policy limiting bank employees’
access to customers’ information.
LIC’s policy, which states that only aggregated data will be given to third parties, illustrates one of the main points that we would
like to emphasize about the use of data mining techniques on large databases in the three domains that we have chosen to look at.
Our objective is to emphasize those data analysis and data mining techniques that can reveal hidden patterns and aggregate
behaviours in the data without revealing individual identities. Though LIC does not state explicitly the steps it takes to prevent the
identification of individuals, it implicitly recognizes the need to protect individual privacy while sharing aggregate information
with third parties. ICICI Lombard’s policy mentions that log files are analyzed such that individual users are not identified, the
goal being to analyze overall trends in user movements and demographic information without revealing the identity of a particular
individual. In the case of HDFC SL, the policy states that it shall not share, rent or sell any personally identifiable information
provided by the user, unless otherwise stated at the time of collection or otherwise. HDFC SL retains the right to share aggregated
non-personally identifiable information with third parties outside of the Website for business purposes, to assess website traffic,
patterns and other such services.
Based on the PPDM methods that we have looked at, we now offer our suggestions for the three domains of interest. In the telecom
domain, companies primarily collect personal data on customers’ calling patterns so that they can target their product
recommendations. They also tend to conduct various surveys for planning their business, and customers may give more accurate
information if they know that their privacy will be protected even when data is shared with other companies. For this we propose
the Data transformation/randomization approach as a solution.
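As an illustration only, the randomization idea can be sketched with the classic randomized-response technique for a yes/no survey question. This is one possible instance of data randomization, not necessarily the method the brief has in mind; the function names and the `p_truth` parameter are assumptions introduced here. Each customer reports the true answer only with a known probability, so no single report can be trusted, yet the overall "yes" rate can still be estimated.

```python
import random
from typing import Sequence


def randomize_answer(true_answer: bool, p_truth: float, rng: random.Random) -> bool:
    """Report the true answer with probability p_truth, otherwise its opposite."""
    return true_answer if rng.random() < p_truth else not true_answer


def estimate_true_rate(reported: Sequence[bool], p_truth: float) -> float:
    """Estimate the real 'yes' rate from randomized reports.

    If pi is the real rate, the expected reported rate is
    p_truth * pi + (1 - p_truth) * (1 - pi); this inverts that relation.
    """
    if p_truth == 0.5:
        raise ValueError("p_truth must differ from 0.5, or nothing can be recovered")
    observed = sum(reported) / len(reported)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed only to make the demo repeatable
    # Hypothetical survey: 30% of 20,000 customers would truthfully answer "yes".
    truth = [i < 6000 for i in range(20000)]
    reports = [randomize_answer(t, 0.75, rng) for t in truth]
    print(round(estimate_true_rate(reports, 0.75), 3))
```

The estimate is accurate only in aggregate and for reasonably large samples; a smaller `p_truth` gives each respondent more deniability at the cost of a noisier estimate.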
In the banking sector we suggest the second approach, i.e. secure multiparty computation. In this approach the different parties that
own the data, here stored at several banks, agree to disclose only the result of certain data mining calculations performed on the
joint data, which can be horizontally partitioned. The parties use a cryptographic protocol to exchange encrypted messages,
designed so that some calculations are efficient while others are computationally intractable. For example, two or more banks may
share their individual data mining results on ATM frauds without any individual bank being able to trace the particular customers or
ATMs associated with the frauds.
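A minimal sketch of one building block of secure multiparty computation, a secure sum based on additive secret sharing, is given below. It is a single-process simulation under stated assumptions: the banks are honest but curious, they do not collude, and the share exchange would in practice run over encrypted channels. The names `MODULUS`, `make_shares` and `secure_sum` are hypothetical, and secret sharing is used here in place of the encrypted-message protocol described above.

```python
import secrets
from typing import List

# Assumed public modulus; it must exceed any total the banks could produce.
MODULUS = 2**61 - 1


def make_shares(value: int, n_parties: int) -> List[int]:
    """Split value into n_parties random shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: List[int]) -> int:
    """Simulate banks jointly computing a total without pooling their raw values."""
    n = len(private_values)
    # Row i holds the shares that bank i hands out, one per bank (itself included).
    distributed = [make_shares(v, n) for v in private_values]
    # Bank j adds up the shares it received; each share on its own looks random.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Only the partial sums are published; together they give the overall total.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    # Hypothetical ATM fraud counts held privately by three banks.
    print(secure_sum([12, 7, 30]))
```

Only the combined fraud count becomes known; a production protocol would also need authentication and protection against colluding or malicious parties.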
Lastly, in the insurance sector we tend to work with the most sensitive types of private data, such as health records. In many
countries the privacy standards in this domain are protected by law, like HIPAA (Health Insurance Portability and
Accountability Act) in the United States (Office for Civil Rights [OCR], 2003). In India an initiative in this direction has only
recently been taken. Data mining over insurance records, particularly medical or health records, is important for pharmaceutical
companies, insurance companies themselves and also government policy makers. In view of the recent progress in DNA sequencing
and DNA mapping, it should be made mandatory to store the DNA sequences, the personal data that uniquely identifies the
individual, and their medical history in different data stores/repositories so that they cannot be brought together. We can then
perform PPDM over the vertically partitioned data to calculate aggregate statistics while keeping the private data intact. As an
additional level of security we suggest a simple transformation of the private data before it is made available to third parties for
extracting hidden patterns using data mining algorithms. In a country like India, with weak data privacy laws, this is important to
ensure there is no discrimination against an individual when he/she applies for insurance, and one way of doing that would be a
combination of data transformation and vertical partitioning of the sensitive data as suggested above.
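The combination of vertical partitioning and a simple transformation can be sketched as follows. This is an illustrative simplification, not a full PPDM protocol: the record fields (`name`, `policy_no`, `age`, `condition`, `claim_amount`) and all function names are assumptions, the "simple transformation" is taken to be coarsening exact ages into bands, and the aggregate is computed from the medical partition alone.

```python
import random
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def age_band(age: int, width: int = 10) -> str:
    """Simple transformation: replace an exact age with a coarse band."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def partition(records: List[dict]) -> Tuple[List[dict], List[dict]]:
    """Split records into an identity store and a medical store with no shared key."""
    identity_store = [{"name": r["name"], "policy_no": r["policy_no"]} for r in records]
    medical_store = [
        {
            "age_band": age_band(r["age"]),
            "condition": r["condition"],
            "claim_amount": r["claim_amount"],
        }
        for r in records
    ]
    # Shuffle so that row position cannot be used to re-link the two stores.
    random.SystemRandom().shuffle(medical_store)
    return identity_store, medical_store


def mean_claim_by_condition(medical_store: List[dict]) -> Dict[str, float]:
    """Aggregate statistic computed without touching the identity store."""
    grouped = defaultdict(list)
    for row in medical_store:
        grouped[row["condition"]].append(row["claim_amount"])
    return {condition: mean(amounts) for condition, amounts in grouped.items()}


if __name__ == "__main__":
    # Entirely made-up records for illustration.
    records = [
        {"name": "A", "policy_no": "P1", "age": 47, "condition": "diabetes", "claim_amount": 100},
        {"name": "B", "policy_no": "P2", "age": 52, "condition": "diabetes", "claim_amount": 300},
        {"name": "C", "policy_no": "P3", "age": 33, "condition": "asthma", "claim_amount": 150},
    ]
    _, medical = partition(records)
    print(mean_claim_by_condition(medical))
```

With very small groups, banded attributes can still single out an individual, so a real deployment would need further safeguards such as minimum group sizes.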
<R.P. Datta, [email protected], Indian Inst. of Foreign Trade, J-1/14, Block EP & GP, Sec-5, Salt Lake City, Kolkata-700091, India>