UNIVERSITÀ DEGLI STUDI DI ROMA "SAPIENZA" – Technical Report
Proceedings of the Second Italian Workshop on PRIvacy and SEcurity (PRISE 2007)
Wednesday, 6 June 2007 – Sheraton Roma Hotel, Viale Del Pattinaggio, Roma, Italia

PREFACE
Second Italian Workshop on PRivacy and SEcurity

Data and network security, by virtue of its impact on the country as a whole and in particular on the economy and on the safety of citizens, has by now become a central theme of the modern Information and Communication Society. Against this background, initiatives aimed at stimulating research, development and innovation in the field of information security have multiplied all over the world. The actors involved in these initiatives are not only universities and research institutes, but also private organisations and public administrations interested in building devices and applications that, besides innovating production processes, take the necessary security requirements into account.

In Italy too, recent years have seen a proliferation of the most diverse initiatives in this field. Several research groups have started working on specific topics in the area, Master's programmes and degree courses on the subject have been launched, and many companies are engaged in research projects on topics that are central, or very close, to information security.

After the success of the 2006 edition, the Second Italian Workshop on Privacy and Security (PRISE 2007) opens in Rome on 6 June 2007, sponsored by the Master in Sicurezza Informatica and the Master in Gestione della Sicurezza Informatica of the Computer Science Department of the Università di Roma "Sapienza", as well as by CLUSIT and Infosecurity. Thanks to the many submitted papers, it was possible to put together a programme covering the main research topics of information security. In particular, the topics addressed include:

• Anonymity and Pseudonymity
• Applied Cryptography
• Denial of Service
• Digital Forensics
• Electronic Privacy
• Intrusion Detection Systems
• Privacy-enhancing technology
• Peer-to-Peer Security
• Secure Hardware and Smartcards
• Wireless network security

I cannot but be pleased with the positive reception of PRISE 2007 by the Italian community, and my most sincere thanks therefore go to all those who devoted their time and valuable work to it. I would also like to express my deep gratitude to the members of the programme committee for their sound and timely help.

Workshop chair
Prof. Luigi V. Mancini - Università di Roma "Sapienza"

COMMITTEES

WORKSHOP CHAIR
• Luigi V. Mancini - Università di Roma "Sapienza"

PROGRAMME COMMITTEE
• Maurizio Aiello - IEIIT CNR Genova
• Cosimo Anglano - Università Piemonte Orientale
• Massimo Bernaschi - IAC CNR Roma
• Claudio Bettini - Università di Milano
• Danilo Bruschi - Università di Milano
• Giuseppe Corasaniti - Magistrato
• Roberto Di Pietro - Università di Roma Tre
• Roberto Gorrieri - Università di Bologna
• Pino Italiano - Università di Roma "Tor Vergata"
• Pino Persiano - Università di Salerno

CONTENTS

Gene Tsudik – University of California, Irvine, USA and Università di Roma "La Sapienza": On Privacy in Critical Internet Services
M. Esposito, C. Mazzariello, C. Sansone – Università di Napoli: A Network Traffic Anonymizer
S.P. Romano, L. Peluso, F. Oliviero – Università di Napoli: REFACING: an Autonomic approach to Network Security based on Multidimensional Trustworthiness
A. Dainotti, A. Pescapé, G. Ventre – Università di Napoli: Wavelet-based Detection of DoS Attacks
A. Botta, A. Dainotti, A. Pescapé, G. Ventre – Università di Napoli: Experimental Analysis of Attacks Against Intradomain Routing Protocols
E. De Cristofaro, C. Blundo, C. Galdi, G. Persiano – Università di Salerno e Università di Napoli: Validating Orchestration of Web Services with BPEL and Aggregate Signatures
Ivan Visconti – Dipartimento di Informatica e Applicazioni, Università di Salerno: PassePartout Certificates
L. Catuogno, C. Galdi – Università di Salerno e Università di Napoli: A Graphical PIN Authentication Mechanism for Smart Cards and Low-Cost Devices
M. Aiello, D. Avanzini, D. Chiarella, G. Papaleo – CNR IEIIT Genova: SMTP sniffing for intrusion detection purposes
C. Bettini, S. Mascetti, L. Pareschi – Università di Milano: The general problem of privacy in location-based services and some interesting research directions
C.A. Visaggio, G. Canfora, E. Costante, I. Pennino – Università del Sannio: Bottom up approach to manage data privacy policy through the front end filter paradigm
D. Ariu, I. Corona, G. Giacinto, R. Perdisci, F. Roli – Università di Cagliari: Intrusion Detection Systems based on Anomaly Detection techniques
D. Adami, C. Callegari, S. Giordano, M. Pagano – Università di Pisa: A Statistical Network Intrusion Detection System
C. Caruso, D. Malerba, G. Camporeale – Università di Bari: Aggregation of network sensors messages by alarm clustering method: choosing the parameters
A. Savoldi, P. Gubian – Università di Brescia: Embedded Forensics: An Ongoing Research about SIM/USIM Cards
S. Aterno – Studio Legale Aterno and Università di Roma "La Sapienza": Lavoro: le linee guida del Garante per posta elettronica e internet (Work: the Garante's guidelines on e-mail and Internet use)
A. Pasquinucci – UCCI.IT: A Practical Web-Voting System

On Privacy in Critical Internet Services

Professor Gene Tsudik
Computer Science Department, University of California Irvine
and Dipartimento di Informatica, Università degli Studi di Roma "La Sapienza"
EMAIL: gts(AT)ics.uci.edu

Recent advances in network security and cryptography have enabled private communication over the public Internet to some extent. For example, public key cryptography allows entities to establish secure communications without pre-shared secrets. Many applications build upon this capability. For example, ssh allows a user to log in to a remote host from anywhere without leaking its secrets (e.g., a password) to intermediate routers. (Similarly, SSL/TLS protects web client/server communication by creating – via public key techniques – secure session-layer tunnels.) The IPv4 security extension (IPsec [KA98]) takes this a step further by allowing the actual end-points of IP packets to be concealed, via its tunneling mode. More advanced anonymization techniques, such as onion routing and TOR [DMS04], allow network-layer (IP) addresses of communicating hosts to be hidden from any adversary with non-global observation powers. Such techniques clearly have their merits in preserving secrecy of communication and privacy of communicating end-points (with respect to intervening network elements, such as routers). However, they side-step privacy issues in the bootstrapping process that precedes actual communication between two hosts. Bootstrapping typically involves a domain name (DNS) query to resolve the hostname of the target.
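As a small illustrative sketch of this bootstrapping step (our own Python example, not part of the talk; the hostname and port below are the ones used as examples in the text), note that the stub resolver, and the name servers behind it, learn the target hostname before any encrypted channel exists:

```python
import socket
import ssl

target = "www.whitehouse.gov"                      # example target from the text
# The plaintext DNS query leaves here: the resolver sees the target hostname.
addr = socket.getaddrinfo(target, 443)[0][4][0]

ctx = ssl.create_default_context()
with socket.create_connection((addr, 443)) as raw:
    # Only now does an encrypted channel exist; the hostname is still sent
    # in the clear inside the TLS ClientHello (SNI) as well.
    with ctx.wrap_socket(raw, server_hostname=target) as tls:
        print(tls.version())
```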
Furthermore, it increasingly includes revocation checking of the target host’s credentials (Public Key Certificate or PKC), which is particularly the case with web-based communication. For example, to communicate with the web site http://www.whitehouse.gov, a web client (e.g., at a host my.freedom.nk) first sends a query to its DNS [MD88] resolver, asking for the IP address of www.whitehouse.gov. This resolver, in turn, queries upper-level name servers until the query is eventually resolved by the name server responsible for keeping the record for www.whitehouse.gov. In this process, privacy of both source and destination (my.freedom.nk and www.whitehouse.gov) hosts is compromised: the source host reveals to its resolver the target host it intends to communicate with; and the name query reveals to remote name server(s) that someone is interested in the targeted destination www.whitehouse.gov. Public key certificate (PKC) revocation checking faces the same privacy issue. An Internet user needs to verify that the target host’s credentials (e.g., a web server’s PKC) are still valid before sending important information, such as credit card numbers. (Note that this is different from establishing PKC authenticity, which is essentially a binding between a public key and some claimed identity; it is attained by verifying a CA’s signature on the PKC.) Despite appearing authentic, a PKC may be revoked prematurely for a number of reasons, such as the loss or compromise of a private key, a change of affiliation or job function, algorithm compromise, or a change in security policy. Therefore, a user must check the revocation status of a PKC before accepting it as valid [MA+99]. Similar to a DNS query, a revocation check reveals both the source and the target of communication to the (potentially un-trusted) components of the revocation checking system. In a modern society preoccupied with gradual erosion of electronic privacy, information leakage of this magnitude is a cause for concern. Consider, for example, certain countries with less-than-stellar human rights records where mere intent to communicate (indicated by revocation checking or a name query) with an unsanctioned or dissident host or website may be grounds for arrest or worse. In the same vein, a sharp increase in popularity (deduced from being a frequent target of revocation checking or a name query) of a website may lead unscrupulous authorities to conclude that something subversive is going on. The problem can also manifest itself in other less sinister settings. For example, many Internet service providers already keep detailed statistics and build elaborate profiles based on their clients’ communication patterns. Current name service and revocation checking methods – by revealing sources and targets of name or revocation queries – represent yet another set of sources of easily exploitable and potentially misused personal information. We therefore need to examine a clean-slate solution towards a privacy-preserving Internet name service as well as more realistic near-term solutions that lend themselves to DNS coexistence and gradual migration. Since hiding sources of name or revocation queries can be achieved with modern anonymization techniques, such as TOR, our research focuses on hiding the target of the query from the third-party service that answers the query. This is a challenging task because of the conflicting goals.
On the one hand, the service must have sufficient information to resolve a query; but, on the other hand, it must not know the target of a query so as to preserve the privacy of both source and target. As we have already made some progress in privacy-preserving revocation checking [ST06,NT07], our first step is to apply lessons learned in that domain to tackle the challenges of preserving privacy in the Internet name service. However, the latter is a more formidable task because of the greater scale: a revocation checking system only keeps records for certificates that are revoked, whereas an Internet-wide name service must keep a record for every hostname. Some recent proposals to modify or re-design DNS offer some hope, e.g., [HG05,DCW05]. We are currently exploring novel data structures and state-of-the-art cryptographic primitives to architect a scalable privacy-preserving name service. We are investigating applications of various techniques that include range queries, verifiable secret sharing (VSS) [GMW98] and private information retrieval (PIR) [CK+98].

References:
[CK+98] B. Chor, E. Kushilevitz, O. Goldreich and M. Sudan. Private Information Retrieval. J. ACM 45(6), pp. 965–981, 1998.
[DCW05] T. Deegan, J. Crowcroft, and A. Warfield. The main name system: An exercise in centralized computing. ACM SIGCOMM CCR, 35(5):5–13, Oct. 2005.
[DMS04] R. Dingledine, N. Mathewson, and P. Syverson. "Tor: The second-generation onion router". 13th USENIX Security Symposium, Aug. 2004.
[HG05] M. Handley and A. Greenhalgh. The case for pushing DNS. HotNets IV, November 2005.
[KA98] S. Kent and R. Atkinson. Security architecture for the Internet Protocol. Internet Request for Comments: RFC 2401, November 1998.
[MD88] P. Mockapetris and K. Dunlap. Development of the Domain Name System. ACM SIGCOMM 1988.
[MA+99] M. Myers, R. Ankney, A. Malpani, S. Galperin, and C. Adams. Internet public key infrastructure online certificate status protocol - OCSP. Internet Request for Comments: RFC 2560, 1999.
[ST06] J. Solis and G. Tsudik. Simple and flexible private revocation checking. In 2006 Workshop on Privacy Enhancing Technologies (PET'06), June 2006.
[NT07] M. Narasimha and G. Tsudik. Privacy-Preserving Revocation Checking. In 2007 EuroPKI Workshop, June 2007.
[GMW98] R. Gennaro, M. Rabin and T. Rabin. Simplified VSS and fast-track multiparty computations with applications to threshold cryptography. ACM PODC'98.

A Network Traffic Anonymizer

M. Esposito (1,2), C. Mazzariello (1), C. Sansone (1)
(1) Dipartimento di Informatica e Sistemistica, Università degli Studi di Napoli Federico II, Napoli (Italy), {mesposit,cmazzari,carlosan}@unina.it
(2) Università Campus Bio-Medico, Roma (Italy), [email protected]

Abstract. Research in networking often relies on the availability of huge archives of traffic. Unfortunately, due to the presence of sensitive information and to privacy issues, such archives cannot always be distributed. Hence, tests and results obtained by using them cannot be reproduced and validated. To this end, it is useful to have tools which eliminate sensitive information from network traffic, making traffic traces freely distributable. In this paper we will present an approach to network traffic anonymization by means of a tool which is flexible, easy to use, and multiplatform. By introducing additional options with respect to other well-known anonymization tools, we will show how it is still possible to keep the resource usage under reasonable limits.
Keywords: Anonymity, Privacy, Intrusion Detection Systems.

1. Introduction
By processing information contained in network traffic archives, it is possible to perform repeated experiments to gain knowledge about network traffic properties, and to develop techniques for exploiting such knowledge for several research purposes. However, such archives, due to laws protecting users' right to privacy, cannot be distributed as-is. To address this problem, some approaches propose methods aimed at avoiding the distribution of the traces, totally preventing information leakage [1]. These approaches, however, cannot be followed if traffic traces have to be used to simulate and evaluate the performance of security systems, such as Intrusion Detection Systems (IDS). In this field, traffic traces are typically used to estimate and possibly validate models of either normal or anomalous traffic for a particular network environment. In this case a different approach must be used, considering, however, that a preliminary phase, aimed at deleting all the private information from network traffic, is required by law in many countries. Network traffic archives, in fact, allow us to reconstruct the activities of each user, harvest passwords, bank account and credit card numbers, thus exposing the users to many risks. Hence, tools for deleting or disguising such sensitive information are needed. In this paper we will present our proposal for such a tool. In particular, we implemented the possibility of anonymizing header fields at the Data Link, Network and Transport layers. Furthermore, our tool also deletes sensitive information at the Application layer, by obfuscating the whole payload, though keeping the packet size intact. To this purpose, we substitute the whole payload content with meaningless random bytes, and reevaluate the packet checksum in order to obtain a traffic trace containing valid packets. What other tools usually do, in fact, is truncate the payload, thus obtaining malformed packets which are not always analyzable by means of traffic sniffers. The type of anonymization strategy adopted is tightly related to the application at hand. In Sections 2 and 3 we will describe the foreseen application context for our tool and, according to its requirements, we will introduce the techniques used in order to anonymize each packet accordingly. Finally, in Section 4 we will show the tool's functionalities, and we will compare its performance with that of two other well-known anonymization tools [2, 3].

2. Anonymization in practice

2.1 Header Anonymization
In order for two applications to exchange data correctly, and for the packets to be well-formed, their header fields must be correctly formatted. In the context of privacy enforcement, we want to avoid the possibility of reconstructing any activity performed by each of the hosts. We can choose to hide information about the type of hardware used, the position in the network, the subnet a host belongs to, and also about the type of service the communication is bound to. According to each of the aforementioned requirements, an anonymization tool has to be able to modify, respectively: the MAC address, the IP address and the ports. As for MAC address anonymization, as well as port numbers, the process is very simple: we only need to define an injective function for the transformation.
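As a minimal sketch of such an injective transformation (our own illustration, assuming a seeded pseudo-random permutation of the 16-bit port space; this is not the Anonymizer source code and all names are hypothetical):

```python
import random

def build_port_mapping(seed: int) -> dict:
    """Build an injective (in fact bijective) mapping over the 16-bit port space.

    A seeded shuffle yields a permutation, so two distinct original ports can
    never collide in the anonymized trace, and the same original port is always
    mapped consistently within a trace.
    """
    rng = random.Random(seed)
    ports = list(range(65536))
    shuffled = ports[:]          # copy of the original port values
    rng.shuffle(shuffled)
    return dict(zip(ports, shuffled))

port_map = build_port_mapping(seed=42)
assert len(set(port_map.values())) == 65536   # injectivity check
print(80, "->", port_map[80])
```

The same idea applies to MAC addresses, simply over a larger value space.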
For IP addresses, instead, by using our approach it is possible to choose among a random injective function, a class-preserving transformation, or a prefix-preserving transformation, as will be shown in the next Section.

2.2 Payload Anonymization
While the header contains information about the sending and receiving host of each packet, the payload contains data produced by the applications communicating through the network. Usually private data are exchanged, as well as unencrypted passwords and bank account and credit card numbers. The simplest approach to anonymization consists in completely deleting such information. Yet, the payload can also be substituted by random symbols. In the first case, malformed packets are generated, as the size field in the header will not correspond to the actual size of the packet, while in the second case the packet checksum will have to be recomputed, as its value is related to the actual content of the payload. Our tool implements the latter strategy, by using a technique for recomputing the checksum described in the next Section.

3. Implementation Details
In this section we will discuss some of the implementation details, motivating the choices we made while developing our anonymization tool. We will present two strategies for anonymizing IP addresses, and some issues related to payload anonymization. Yet, for the sake of brevity, we will not describe the live anonymization functionality in detail. The developed software, called Anonymizer, is available for download on SourceForge (http://anonymizer.sourceforge.net/).

3.1 Class Preserving
Class Preserving anonymization is a strategy mainly developed for IP address anonymization. Its aim is to implement an injective transformation between original and anonymized /8, /16 and /24 class subnets. This means that, for example, if two addresses belong to the same original /16 subnet, they will still belong to the same /16 anonymized subnet even after transformation. Furthermore, the class of IP addresses will be preserved after anonymization. Hence, the first bits of the addresses will be left unchanged in order to maintain the address class.

3.2 Prefix Preserving
This anonymization strategy allows us to keep IP addresses grouped according to the longest prefix they have in common. In other words, if two IP addresses share an M-bit-long common prefix, they will still share an M-bit-long common prefix after anonymization. This automatically allows us to transform IPs belonging to the same subnet into IPs belonging to the same anonymized subnet.

3.3 Signature Preserving
Since we want our anonymization tool to be used for network security problems, we want to allow transformed traffic traces to be used for the evaluation of signature-based IDS, too. Hence, given a database of known signatures (such as, for example, the ones used by Snort [4]), we enable our software to obfuscate all the payload but the desired signature. Then, in case a predefined string is found within the payload, it is not obfuscated but reproduced unchanged at the same position in the anonymized payload.

3.4 Checksum Correction
As stated before, we want our anonymizer to be able to keep the packet size unchanged, by obfuscating the information contained in the payload instead of deleting it. In such a case, though, it is necessary to recalculate the checksum for each packet.
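For illustration only, the following sketch recomputes the one's-complement Internet checksum (in the spirit of RFC 1071) over a payload that has been replaced with random bytes; it ignores the TCP/UDP pseudo-header for brevity and is not the tool's actual code:

```python
import os

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, RFC 1071 style."""
    if len(data) % 2:
        data += b"\x00"                             # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)    # fold the carry back in
    return ~total & 0xFFFF

anonymized_payload = os.urandom(64)                 # payload replaced by random bytes
print(hex(internet_checksum(anonymized_payload)))
```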
In order to keep the anonymization time for each trace as low as possible, we decided to implement a strategy for incremental checksum update, described in [5-7] and generally used when the TTL value is changed in the header field of the packet.

4. Experimental Results and Conclusions
In order to prove the effectiveness and usability of our tool, we compared it with two other well-known tools, namely tcpdpriv [2] and AnonTool [3]. To this aim, a brief summary of the functionalities of these tools is presented in Table 1, while Table 2 shows the execution time of each anonymization tool on different traffic traces.

TABLE 1. Functionalities of different anonymization tools (tcpdpriv [2], AnonTool [3], Anonymizer), compared in terms of: live anonymization, Linux and FreeBSD compatibility, MAC address anonymization, port anonymization, class-preserving and prefix-preserving IP anonymization, checksum correction, and pattern-matching payload darkening.

TABLE 2. Anonymization time using different strategies and operating systems.

  Linux - prefix preserving strategy
    Tool         0.5 GB   1 GB
    Anonymizer   46s      1m 46s
    tcpdpriv     46s      1m 43s

  FreeBSD - class preserving strategy
    Tool         0.5 GB   1 GB
    Anonymizer   26s      1m 53s
    AnonTool     26s      1m 52s

By considering the data reported in both tables, it is possible to conclude that we were actually able to develop a tool that extends the functionalities provided by [2] and [3], while keeping execution times practically unchanged.

Acknowledgments
This work has been partially supported by the Ministero dell'Università e della Ricerca (MiUR) in the framework of the PRIN Project "Robust and Efficient traffic Classification in IP nEtworks" (RECIPE).

References
[1] Mogul J. C., Arlitt M., "SC2D: An Alternative to Trace Anonymization", in Proceedings of the 2006 SIGCOMM Workshop on Mining Network Data, Pisa, Italy, 2006, pp. 323-328.
[2] tcpdpriv: A program for eliminating confidential information from traces. http://ita.ee.lbl.gov/html/contrib/tcpdpriv.html
[3] Koukis D., Antonatos S., Antoniades D., Trimintzios P., Markatos E.P., "A Generic Anonymization Framework for Network Traffic", in Proceedings of the IEEE International Conference on Communications 2006 (ICC 2006), vol. 5, June, Istanbul, Turkey, 2006, pp. 2302-2309.
[4] Beale J., Foster J.C., Snort 2.0 Intrusion Detection, Rockland, MA: Syngress Publishing, Inc., 2003.
[5] Braden R., Borman D., Partridge C., RFC 1071 - Computing the Internet Checksum, 1988.
[6] Mallory T., Kullberg A., RFC 1141 - Incremental Updating of the Internet Checksum, 1990.
[7] Rijsinghani A., RFC 1624 - Computation of the Internet Checksum via Incremental Update, 1994.

REFACING: an Autonomic approach to Network Security based on Multidimensional Trustworthiness

F. Oliviero, L. Peluso, S.P. Romano
University of Napoli "Federico II"

Abstract
Several research efforts have recently focused on achieving distributed anomaly detection in an effective way. As a result, new information fusion algorithms and models have been defined and applied in order to correlate information from multiple intrusion detection sensors distributed inside the network. In this field, an approach which is gaining momentum in the international research community relies on the exploitation of the Dempster-Shafer (D-S) theory. Dempster and Shafer have conceived a mathematical theory of evidence based on belief functions and plausible reasoning, which is used to combine separate pieces of information (evidence) to compute the probability of an event.
However, the adoption of the D-S theory to improve distributed anomaly detection efficiency generally involves facing some important issues. The most important challenge definitely consists in sorting the uncertainties in the problem into a priori independent items of evidence. We believe that this can be effectively carried out by looking at some of the principles of autonomic computing in a self-adaptive fashion, i.e. by introducing support for self-management, self-configuration and self-optimization functionality. In this paper we intend to tackle some of the above mentioned issues by proposing the application of the D-S theory to network information fusion. This will be done by proposing a model for a self-management supervising layer exploiting the innovative concept of multidimensional reputation, which we have called REFACING (RElationship-FAmiliarity-Confidence-INteGrity).

1 Introduction
As computer attacks become more and more sophisticated, the need to provide effective intrusion detection methods increases. Current best practices for protecting networks from malicious attacks rely on the deployment of an infrastructure that includes network intrusion detection systems. However, most such practices suffer from several deficiencies, such as the inability to detect distributed or coordinated attacks and the high false alarm rates. Indeed, detecting intrusions becomes a hard task in any networked environment, since a network naturally lends itself to a distributed exploitation of its resources. In such a scenario, the identification of a potential attack requires that information is gathered from many different sources and in many different places, since no locality principle (neither spatial nor temporal) can be fruitfully applied in the most general case. The classical approaches to distributed protection of a network rely on the effective dissemination of probes and classifiers/analyzers across the infrastructure. We claim that the current solutions to the above mentioned issues lack two fundamental features, namely dynamicity and trustworthiness. Indeed, in our view a network should be capable of self-protecting against attacks by means of an autonomic approach which highly depends on the effective exploitation, in each node, of on-line information coming both from local analysis of traffic and from synthetic information delivered by neighboring nodes. Self-organization demands an un-coordinated capability to appropriately orchestrate the behavior of a number of distributed components. Besides this, the second challenge we identify resides in the need for an agreed-upon means of deciding whether or not information coming from the outside world can be assumed to be reliable. In this paper we discuss the main issues related to improving network security through manipulating and combining data coming from multiple sources. We present a model for a self-management supervising layer exploiting the innovative concept of multidimensional reputation.

2 Detection from multiple sources
As soon as one starts spreading detection components across a network, the issue arises of how to appropriately orchestrate their operation. In fact, information retrieved from a single sensor is usually limited and sometimes provides low accuracy. The use of multiple sensors definitely represents a valid alternative to infer additional information about the environment in which the sensors operate.
To this aim, many research efforts have so far been conducted with the goal of defining effective approaches for the combination of information coming from multiple sources. Data fusion deals with the combination of information produced by different sensors, with the final aim of improving both the accuracy of the classification process and the reliability of the decision-making process. It is clear that any approach relying on information fusion brings in some contrasting aspects. In fact, if on one hand the data fusion process can highly improve the reliability of the detection, on the other hand it also makes a strong hypothesis on the reliability of the information which is subject to the analysis. Stated in different terms, as soon as I start relying on data coming from the outside world, I have to make sure that such data can be considered as reliable as my local information, in order to avoid the fusion process performing even worse than it would in the absence of cooperation. This adds a further level of complexity to the overall intrusion detection system. The ideal situation foresees the possibility of associating local and foreign decisions with a corresponding weight, which actually represents the current level of trustworthiness assigned to the corresponding originating source. In the depicted scenario, each decision in a single node would be taken by appropriately measuring a weighted combination of local and foreign data, with weights that should vary in time as a function of the reliability of all participating nodes along their past history. While simple in its formulation, the above scenario definitely looks ideal, in the sense that it is not at all easy to dynamically set weights in an ever-changing environment such as a network crossed by a variegated portfolio of potential traffic profiles (with each such profile subject to unpredictable changes in space and time). Hence, the contribution of our research aims to bring some insights specifically suited to tackle the above mentioned issue. To this aim, we propose to exploit the concept of weighted information fusion in a highly dynamic fashion. The key issue we are addressing is that of dynamically changing the values of the weights assigned to information sources in such a way as to let them concretely follow the current level of reliability of the sources themselves. The system we devise can be compared to a dynamic controller which appropriately sets the values of the parameters of a control function in which the variables to be tuned represent the decisions taken at different points of the network. By summarizing the above considerations, we can easily identify one major challenge, concerning the need to effectively measure the level of trustworthiness to be assigned to both local and foreign decisions. In the remainder of this paper we will touch upon the above issue. We will introduce a new model for determining the degree of loyalty of a node based on a multidimensional framework (REFACING – Relationship-FAmiliarity-Confidence-INteGrity) envisaging the thorough analysis of the relations between each pair of interacting nodes.

3 Autonomic Communications
In recent years, we have witnessed many radical changes in the way computer networks are conceived.
The on-going convergence of networked infrastructures and services, in fact, has changed the traditional view of the network from the simple wired interconnection of a few manually administered homogeneous nodes, to a complex infrastructure encompassing a multitude of different technologies, heterogeneous nodes, and diverse services. This situation has posed a challenge to the research community to engineer systems and architectures that will increase the robustness of the current and future internetwork whilst alleviating both management costs and operational complexity. The autonomic communications research community has been formed to respond to this challenge. From this perspective, Autonomic Communication (AC) represents a new emergent paradigm for today's networked cooperation. Many efforts have been devoted to proposing its most appropriate definition and application in different real-world scenarios. Based on interdisciplinary grounds, AC tries to tackle the problem by developing architectures and models of networked systems that can manage themselves in a reliable way, always fulfilling their service mission. In fact, the essence of autonomic computing systems consists in the self-management requirements, the intent of which is to free system administrators from the details of system operation and maintenance and to allow systems to manage themselves given high-level objectives. Independently of networked systems' behaviors and purposes, the following properties should be exhibited by any autonomic computing system in order to fulfill self-management needs:
• Automatic: this essentially means being able to self-control its internal functions and operations. As such, an autonomic system must be self-contained and able to operate without any external intervention;
• Adaptive: an autonomic system must be able to change its operation. This will allow the system to cope with temporal and spatial changes in its operational context, either long term (environment customization/optimization) or short term (exceptional conditions such as faults, attacks);
• Aware: an autonomic system must be able to monitor its operational context as well as its internal state in order to be able to assess whether its current operation serves its purpose. Awareness will control adaptation of its operational behavior in response to situation or state changes.
The sequence of the above mentioned properties highlights the basic principle of the Autonomic Computing paradigm. Any autonomic system must have a sensing capability in order to enable the overall system to observe its external operational context and to self-adapt its behavior to fit any environment changes.

3.1 Applying AC principles to Reputation Assessment
Autonomic Communication systems support dynamic coalitions of users or entities sharing common interests. In this context, self-management approaches become fundamental to enforce "law and order" through distributed and loosely coupled schemes based on democratic rules, therefore avoiding the complexity and rigidity of centralized control at one extreme, and the complete anarchy leading to irrelevant information, malicious or free-riding behavior at the other extreme. Therefore, the need arises to reach the following objectives: (i) to distribute community control into the community itself in order to allow self-management; (ii) to detect, remove and isolate malicious and malfunctioning components; (iii) to identify components that are overloaded or prone to failure or simply have lower capabilities.
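As a toy illustration of the reputation-weighted fusion of local and foreign decisions discussed in Section 2 (a sketch under our own simplifying assumptions, not the REFACING implementation; all names and update rules here are hypothetical):

```python
def fuse_decisions(scores: dict, reputation: dict) -> float:
    """Reputation-weighted average of per-node anomaly scores in [0, 1]."""
    total_weight = sum(reputation[n] for n in scores)
    return sum(scores[n] * reputation[n] for n in scores) / total_weight

def update_reputation(scores: dict, reputation: dict, fused: float, rate=0.1) -> dict:
    """Nodes that agree with the fused decision gain reputation, others lose it."""
    for n, s in scores.items():
        agreement = 1.0 - abs(s - round(fused))        # 1 = full agreement
        reputation[n] = (1 - rate) * reputation[n] + rate * agreement
    return reputation

scores = {"probe_A": 0.9, "probe_B": 0.8, "probe_C": 0.1}      # local + foreign decisions
reputation = {"probe_A": 1.0, "probe_B": 0.9, "probe_C": 0.3}  # current trust weights
fused = fuse_decisions(scores, reputation)
reputation = update_reputation(scores, reputation, fused)
print(round(fused, 2), {k: round(v, 2) for k, v in reputation.items()})
```

Nodes whose individual scores agree with the fused outcome see their weight grow over time, which loosely mirrors the confidence and integrity layers introduced in the next section.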
4 Dynamically renewing network nodes' reputation
The model we propose to assess the reputation of network components taking part in the distributed detection process is called REFACING (RElationship-FAmiliarity-Confidence-INteGrity) and is based on a multi-layered approach, as depicted in Fig. 1.

Fig. 1: The REFACING model (layers, from bottom to top: Relationship, Familiarity, Confidence, Integrity)

The lowermost layer provides information about the existence of some form of connection among detection components (probes, detection engines, decision engines, etc.). The absence of connection indicates the actual impossibility of carrying out any form of social relationship with the other nodes of the network. Otherwise, the second layer in the stack can prove useful to quantitatively measure the level of interaction existing between each pair of network nodes. The more we interact, the more familiar we become with each other. However, this does not necessarily imply that we trust each other: I can know you quite well, but (or even better, just because of this) I can hardly trust you if our past interactions showed me that you are not that reliable. This is the reason why we introduce the third layer of the trustworthiness stack, which deals with confidence. If I have relations with others, and if I am familiar with the others as well, I can much more objectively determine their level of trustworthiness with respect to our social interactions. This said, to further foster the capability of assessing someone else's loyalty level related to his/her interactions in the network, one more dimension should be taken into account to somehow reflect the variability in the behavioral interaction patterns of each node. To make things clearer, the fact that some node has shown blameless behavior in one single interaction does not necessarily mean that such a node will also be irreproachable in its subsequent interactions. Some form of estimation of the line of conduct over time is definitely needed for all nodes: the more coherent my behavior has been in the past, the less probable it will be that I will behave badly in the near future. This is dealt with at the uppermost layer, which provides information about the level of integrity of network nodes. We do believe that the adoption of such a multi-layered model helps add objectivity to the assessment of network nodes' reputation, since it takes into account a number of complementary, though highly correlated, facets. In our view, the REFACING methodology is implemented at the level of management of the overall infrastructure, as depicted in Fig. 2. The management layer has a global view of the physical topology of the network and is thus capable of determining whether or not there exists some form of relationship (layer 1 in the trustworthiness stack) between the network nodes. Furthermore, thanks to monitoring, it can also determine the frequency of the interactions among the network elements (layer 2 of the stack). Information pertaining to the third layer can be retrieved through a comparison between each evaluation provided by a single node and the global opinion of the system
(e.g. my confidence level gets higher if my personal evaluation was found in accordance with the final decision taken by the distributed detection system after analyzing all the single decisions coming from the network nodes). Finally, data at the fourth layer can be computed by statistically analyzing the information related to all past interactions among all underlying nodes (e.g. my integrity level gets higher if my confidence level has kept on growing over the past interactions).

Fig. 2: The REFACING methodology (the REFACING management layer assigns event labels to the REFACING peers, attackers and target involved in a detection event)

After each evaluation turn, the management layer can compute a set of labels (one for each network node involved in the detection process), which are assigned to the nodes through, for example, a policy-based approach. The label computation process can be as general as possible and will normally be influenced by information belonging to all of the layers in the trustworthiness stack (in a simplistic scenario, it might for example be a simple weighted sum of the values computed at each of the four layers). The labels are then used by all nodes whenever they start a new interaction. Each label acts like a business card for the node involved in the interaction and can be used by the other nodes in order to assign a weight to the information they receive from it.

5 Conclusions
In this short paper we presented a novel approach to distributed detection of network threats. The core of our contribution resides in having designed a self-management layer exploiting the concept of trustworthiness in order to make the detection process more reliable. The idea of dynamically tuning the currently estimated level of trust of each peer in the community proves fundamental during the information fusion process, which in our architecture is based on the application of an enhanced version of the well-known Dempster and Shafer theory of evidence. Such an enhanced version of the D-S formula proposes to appropriately weigh the various inputs to the information fusion process on the basis of their estimated impact on the final merged information. The REFACING approach has been tested through extensive measurements based on simulation (see http://sourceforge.net/refacing). The experimental campaign (which is not illustrated here for the sake of conciseness) has shown that our solution helps dramatically improve the overall performance of the detection process in a number of real-world operational scenarios. On the other hand, it has also helped us set the limits of our approach when applied to situations envisaging the presence of a high number of unreliable sensors whose responses can negatively bias the output of the information fusion process towards a faulty decision.

Wavelet-based Detection of DoS Attacks

Alberto Dainotti, Antonio Pescapé, and Giorgio Ventre
University of Napoli "Federico II" (Italy), {alberto,pescape,giorgio}@unina.it

I. INTRODUCTION
Accurate detection and classification of anomalies in IP networks is still an open issue due to the intrinsically complex nature of network traffic. Several anomaly detection systems (ADS) based on very different approaches and techniques have been proposed in the literature. Please refer to [1] for a list of related works. In this work we propose an approach to anomaly detection, based on the wavelet transform, which we tested against several types of DoS attacks. This approach presents several differences with respect to past works.
First, we make use of the Continuous Wavelet Transform (CWT), exploiting its interpretation as the cross-correlation function between the input signal and wavelets, and its redundancy in terms of available scales and coefficients. All previous works, instead, are based on the use of the Discrete Wavelet Transform (DWT), which is more oriented to the decomposition of the signal over a finite set of scales, each one with a reduced number of coefficients, in order to make the original signal reconstructable from them. This is typically done in a way that avoids redundancy. Second, our detection approach explicitly takes into account - besides hits and false alarms - the accuracy of the estimation of the time interval during which the anomalous event happens and the resolution (in terms of ability to distinguish between subsequent anomalies). In the context of security incidents, these aspects can be crucially important, for example to trace back the source of an attack, or during forensic analysis, etc. Third, we propose a cascade architecture made of two different systems - the first one based on classical ADS techniques for time series, the second one based on the analysis of wavelet coefficients - which allows more flexibility and performance improvements as regards the hits/false alarms trade-off. Finally, as a fourth point, we present an experimental analysis of the performance of the system under an extensive set of attack/traffic-trace combinations (≈ 15000).

II. AN ANALYTICAL BASIS
The Continuous Wavelet Transform (CWT) is defined as:

$f_{CWT}(a,b) = \int_{-\infty}^{+\infty} f(t)\,\psi^{*}_{ab}(t)\,dt = \langle f(t) \,|\, \psi_{ab}(t) \rangle$, with $\psi_{ab}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)$,   (1)

where f(·) is the signal under analysis, ψ(·) is a function of finite energy whose integral over R is 0, called the mother wavelet, and a and b are the scaling and translation factors respectively. Each (a, b) pair furnishes a wavelet coefficient, which can also be seen as the cross-correlation at lag b between f(t) and the mother wavelet function dilated to scaling factor a. An important difference between the CWT and the DWT is that the former calculates such correlation for each lag at every possible scale, whereas the DWT calculates a number of coefficients that decreases with the scaling factor. The scale of the coefficients' global maximum is where the input signal is most similar to the mother wavelet. This function is chosen to be oscillating but with a fast decay from the center to its sides, in order to have good scale (frequency) and time localization properties. This makes the CWT a good tool for analyzing transient signals such as network traffic time series. In the context of the study of wavelets and image processing, it has been proved that the local maxima of a wavelet transform can detect the location of irregular structures in the input signal [4]. Please refer to [1] and [4] for analytical details. Here we just summarize that by using the derivative of a smoothing function as a mother wavelet (e.g. derivatives of the Gaussian function), the zero-crossings or the local extrema of the wavelet transform applied to a signal indicate the locations of its sharp variation points and singularities. The CWT coefficient redundancy allows these points to be identified at every scale with the same time resolution as the input signal.

The content of this extended abstract is part of the paper: A. Dainotti, A. Pescapè, G. Ventre, "Wavelet-based Detection of DoS Attacks", 2006 IEEE GLOBECOM, Nov. 2006, San Francisco (CA, USA).
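To make the CWT computation concrete, here is a small illustrative sketch using PyWavelets and a first-derivative-of-Gaussian mother wavelet on a synthetic packet-rate series; the experiments described below rely on the Wavelab routines under Matlab, so this Python analogue is only an assumption-laden approximation, and the anomaly injected here is hypothetical:

```python
import numpy as np
import pywt

# Synthetic packet-rate time series: background traffic plus a sharp "attack" step.
rng = np.random.default_rng(0)
rate = rng.poisson(80, 3600).astype(float)
rate[1800:2100] += 200                      # hypothetical anomaly

# CWT with a derivative-of-Gaussian wavelet ('gaus1'); unlike the DWT, this
# yields one coefficient per input sample at every scale.
scales = np.arange(1, 129)
coefs, _ = pywt.cwt(rate, scales, "gaus1")

# Large |coefficients| mark the sharp variation points of the input signal.
scale_idx, time_idx = np.unravel_index(np.argmax(np.abs(coefs)), coefs.shape)
print("strongest variation near sample", time_idx, "at scale", scales[scale_idx])
```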
Fig. 1. Anomaly Detection System: Proposed Architecture (Rough Detection stage: normal behavior model construction and Detection-R; Fine Detection stage: signal analysis, threshold calculation, library of the anomalies, CWT computing and Detection-F).

III. ARCHITECTURE
In Fig. 1 a block diagram representing the two-stage architecture of the proposed ADS is shown. The ADS takes as input a time series of samples representing the packet rate and outputs an ON-OFF signal reporting the presence of an anomaly for each sample. The first stage, which we called Rough Detection, can be implemented using statistical anomaly detection techniques previously presented in the literature, and it is responsible just for detecting any suspicious change in the traffic trend and reporting an alarm to the second stage. Its output is equal to 0 or 1 for each input sample. Here we impose a high sensitivity, aiming at catching as many anomalies as possible, whereas the second stage, which we called Fine Detection, is designed to reduce the number of false alarms. For each detected anomaly, this stage also estimates the time interval during which it is present. As for the Rough Detection module, we adopted the two techniques proposed in [2] to detect SYN flooding attacks (an adaptive threshold algorithm and the CUSUM algorithm) and we applied them to generic traffic traces. A similar implementation of the CUSUM algorithm has also been proposed in [3] to detect different DoS attacks. More details on our implementation of these well-known algorithms are given in [1]. The CWT computing block computes the continuous wavelet transform of the whole input signal. We used the Wavelab [5] set of routines under the Matlab environment. The block output is a matrix W of M rows and N columns, where N is the number of samples of the input trace. Each row reports the wavelet coefficients at a different scale. The number of available scales M is given by the number of octaves, $J = \lfloor \log_2 N \rfloor - 1$, times the number of voices per octave. The CWT function implemented under Wavelab allowed us to work with 12 voices per octave. This matrix is fed as an input to the Detection-F block, which also receives as inputs a threshold level (that will be explained in the following) and the Rough Detection Signal. For each alert reported in the Rough Detection Signal, the Detection-F block basically operates by looking for the scale at which the coefficients reach the maximum variation. The use of the CWT guarantees that we have a coefficient for each input sample at every scale - differently from the DWT, where typically the number of coefficients decreases as the scale grows. This way, if an anomaly is recognized, we can identify with good precision the zero-crossing points of the wavelet coefficients at the scale where the anomaly is present. The choice of the threshold level for the wavelet coefficients (Threshold Calculation block) is based on the mean and standard deviation of the traffic trace, computed in the Signal Analysis block, and on the Library of Anomalies, which is a collection of signals representing some traffic anomalies (see Section IV).

IV. TRAFFIC TRACES AND ANOMALIES
To study and develop our ADS, we made several experiments under a broad range of situations.
Our approach was to generate traffic signals by superimposing anomaly profiles on real traffic traces in which no anomalies were present. This choice is partly due to the scarce availability of traffic traces containing classified anomalies along with all the necessary details. For example, the lack of information on the exact beginning and end of each anomaly would not allow us to evaluate the temporal precision of the detection system. We considered real traffic traces that were known not to contain any anomalies, obtaining a large and heterogeneous set of traces. In Table I the data sets we used are summarized. The first three groups of traces in Table I were derived from the DARPA/MIT Lincoln Laboratory off-line intrusion detection evaluation data set [6], which has been widely used for testing intrusion detection systems and has been referenced in many papers (e.g. [7] [8]). We used only traces from the weeks in which no attacks were present. The data set marked in Table I as UCLA refers to packet traces collected during August 2001 at the border router of the Computer Science Department, University of California Los Angeles [9]. They have been collected in the context of the D-WARD project [10]. Finally, the UNINA data set refers to traffic traces we captured by passively monitoring ingoing traffic at the WAN access router of the University of Napoli "Federico II". Table I contains details about the data sets, such as the number of traces for each group and the sampling period Ts used to calculate the packet rate time series. Also, indicative values of mean and standard deviation (std) are shown. All traces are composed of 3600 samples. Several kinds of anomaly profiles related to DoS attacks have been synthetically generated. We assigned labels to each anomaly we used (see Table II). Some anomaly profiles were obtained by generating traffic with real DDoS attack tools, TFN2K [11] and Stacheldraht [12]. We launched such tools with several different options and we captured the traffic that was generated by them. The anomaly profiles obtained were stored and labeled depending on the adopted attacking technique. Another group of anomalies has been obtained by synthetically generating the corresponding time series with Matlab, according to known profiles that have been considered in [13]. We considered 'Constant Rate', 'Increasing Rate', and 'Decreasing Rate' anomalies.

TABLE I. TRAFFIC TRACES.
  Data Set   Year   Ts   # Traces   Mean          Std
  Darpa 1    1999   2s   5          80 pkt        90 pkt
  Darpa 2    1999   5s   5          20 pkt        40 pkt
  Darpa 3    1999   5s   5          12 pkt        30 pkt
  UCLA       2001   2s   4          20 pkt        15 pkt
  UNINA      2004   2s   3          8·10^3 pkt    1.3·10^3 pkt

TABLE II. TESTED ANOMALIES.
  Tools          Anomalies
  Matlab         Constant rate, Increasing rate, Decreasing rate
  TFN2K          ICMP Ping flood, TCP SYN flood, UDP flood, Mix flood, Targa3 flood
  Stacheldraht   TCP ACK flood, TCP ACK NUL flood, TCP random header attack, Mstream (TCP DUP ACK), DOS flood, mass ICMP bombing, IP header attack, SYN flood, UDP flood

V. EXPERIMENTAL RESULTS
The experimental results shown have been obtained by performing a large set of automated tests. The results have been summarized and the following performance metrics have been calculated: (i) the Hit Rate, $HR = \frac{\text{number of hits}}{\text{number of tests}} \times 100$; (ii) the False Alarm Ratio, $FAR = \frac{\text{number of false alarms}}{\text{total number of alarms}} \times 100$; (iii) the estimation errors in the identification of the beginning and the end of the anomaly; (iv) the number of fragments when a single anomaly is recognized as several ones.
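A minimal sketch of how the first two metrics can be computed from a batch of test outcomes (our own illustrative code with hypothetical variable names, not the authors' evaluation scripts):

```python
def hit_rate(detected_flags):
    """HR: percentage of tests in which the injected anomaly was detected."""
    return 100.0 * sum(detected_flags) / len(detected_flags)

def false_alarm_ratio(false_alarms, total_alarms):
    """FAR: percentage of raised alarms that do not correspond to any anomaly."""
    return 100.0 * false_alarms / total_alarms

detected_flags = [True, True, False, True]    # one entry per automated test
print(hit_rate(detected_flags))               # 75.0
print(false_alarm_ratio(false_alarms=30, total_alarms=120))   # 25.0
```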
Our scripts generated traces containing anomalies with various combinations of parameters and ran the ADS on each of them. In order to test the ADS under more complicated situations (i.e. obfuscating the anomalies in the traces), when a trace and an anomaly profile are selected, the amplitude and the duration of the signal representing the anomaly are modified. Then the signal is superimposed on the traffic trace at a randomly selected point - at 1/4, 1/2, or 3/4 of the trace - and the detection system is executed. For a specific trace, the amplitude of an anomaly was scaled in order to make its maximum peak proportional to the root mean square of the original traffic trace. The proportionality factor varies from 0.5 to 2.0 in steps of 0.25. Anomaly durations range from 50 to 300 samples in steps of 50. Sampling and interpolation of the anomaly profiles were performed for expansion and shortening respectively. Thus we performed a number of tests given by the product (traces × anomalies × intensities × durations). With 22 traces and 16 anomalies, we performed about 15000 tests each time we tested a system configuration (i.e. with CUSUM, with AT, etc.). In Table III we show the system performance, in terms of HR and FAR, when the rough detection block is implemented with the AT and CUSUM algorithms. We report results obtained separately for each of the 5 trace data sets and, in the last row, we show global results obtained working with all the traces. The columns labeled FD(AT) and FD(CUSUM) report performance indicators derived from the output of the fine detection stage when the rough detection stage is AT or CUSUM respectively, whereas the performance results related just to the output of the rough detection stages are reported in the columns labeled RD(AT) and RD(CUSUM). This is to show how we tuned the rough detection stage with a very high sensitivity in order to catch as many anomalies as possible, at the expense of a high FAR. Indeed, passing from the rough detection output to the fine detection output, while HR remains almost the same, FAR decreases dramatically. This happens for all the sets of traces, and for both AT and CUSUM, and it represents one of the most important features of the proposed ADS. In order to sketch a comparison between the proposed two-stage ADS and AT or CUSUM used as standalone algorithms, in the columns labeled AT-sa and CUSUM-sa we show how they perform in terms of HR when tuned with approximately the same FAR as the proposed ADS. We see that, in the case of AT, the introduction of the second stage improves HR by about 10% for 3 out of 5 trace sets, while for CUSUM the improvements range from about 12%, for the fifth trace set, to almost 50%, for the first one.

TABLE III. HR/FAR TRADE-OFF RESULTS.
              RD(AT)       FD(AT)       RD(CUSUM)    FD(CUSUM)    AT-sa        CUSUM-sa
  Dataset     HR    FAR    HR    FAR    HR    FAR    HR    FAR    HR    FAR    HR    FAR
  Darpa 1     95.9  72.8   89.5  34.9   84.0  68.6   82.4  1.56   79.0  35.3   35.1  6.7
  Darpa 2     93.7  68.2   84.9  38.0   85.7  83.6   84.8  38.9   74.1  36.4   49.4  32.6
  Darpa 3     92.1  81.1   83.8  50.1   88.3  77.9   84.7  28.1   71.6  51.0   62.7  25.0
  UCLA        90.9  17.7   86.0  14.0   91.5  89.6   86.2  39.8   85.7  15.8   56.3  44.4
  UNINA       99.6  69.7   98.0  7.4    99.6  77.3   98.0  12.1   86.4  7.0    78.6  13.1
  All         94.2  70.9   87.7  34.1   83.7  86.2   86.3  27.2   79.4  33.1   49.2  33.9

In Fig. 2 we show how HR and FAR are influenced by the relative amplitude (left figures) and the duration (right) of the anomalies.
Top and bottom figures refer to the system with AT and CUSUM rough detection respectively. We evaluated performance separately for each anomaly profile. It can be observed that the increasing rate and decreasing rate anomalies (red and green lines respectively) are more difficult to detect than the other anomalies. However, it is interesting to note that the curves related to all the anomaly profiles follow approximately the same trends. The relative amplitude has more influence on HR and FAR than the anomaly duration. But, when the anomaly amplitude is tuned for peak values greater than the RMS of the trace (relative amplitude ≥ 1), HR does not increase anymore. A similar behavior happens for FAR in the AT case, while for the CUSUM implementation FAR tends to slowly decrease even after the relative amplitude is higher than 1. As regards the anomaly duration, while FAR always decreases when the anomaly lasts longer, HR inverts this trend after a certain duration. This behavior is accentuated in the CUSUM case.

Fig. 2. HR and FAR as functions of attacks' relative amplitude and duration (top: AT rough detection; bottom: CUSUM rough detection; left: relative amplitude; right: duration).

The diagrams in Fig. 3 show the percentage of correct estimates of the start and the end time of the anomalies, when the width of the confidence interval (expressed in number of samples) increases. We consider the estimate to be correct when the start/end time falls into the confidence interval.

Fig. 3. ADS accuracy: start and end time estimation accuracy vs. confidence interval width, for AT and CUSUM rough detection.

For a confidence interval of 30 samples, 70% of the start and end times are correctly identified. In general, we note a slightly better performance in the estimation of the start time compared to the end time. We also evaluated when the system did not correctly estimate the anomaly duration because the anomaly was recognized as several different anomalous events. This occurred rarely: for only 4.62% of the detections with the AT rough detection block, and 1.62% with CUSUM.

VI. CONCLUSION AND ISSUES FOR FUTURE RESEARCH
This paper proposed a cascade architecture based on the Continuous Wavelet Transform to detect volume-based network anomalies caused by DoS attacks. We showed how the proposed scheme is able to improve the trade-off existing between HR and FAR and, at the same time, to provide insights on anomaly duration (defining starting and ending time intervals) and on the identification of subsequent close anomalies. Our current work is focused on modifying the proposed system to work in a real-time (or on-line) fashion.

REFERENCES
[1] A. Dainotti, A. Pescapè, G. Ventre, "Wavelet-based Detection of DoS Attacks", 2006 IEEE GLOBECOM, Nov. 2006, San Francisco (CA, USA).
[2] V. A. Siris, F. Papagalou, "Application of Anomaly Detection Algorithms for Detecting SYN Flooding Attacks", IEEE GLOBECOM 2004, Nov. 2004, pp. 2050-2054.
[2] V. A. Siris, F. Papagalou, "Application of Anomaly Detection Algorithms for Detecting SYN Flooding Attacks", IEEE GLOBECOM 2004, Nov. 2004, pp. 2050-2054.
[3] R. B. Blazek, H. Kim, B. Rozovskii, A. Tartakovsky, "A Novel Approach to Detection of Denial-of-Service Attacks via Adaptive Sequential and Batch-Sequential Change-Point Detection Methods", IEEE Workshop on Information Assurance and Security, 2001, pp. 220-226.
[4] S. Mallat, W. L. Hwang, "Singularity Detection and Processing with Wavelets", IEEE Transactions on Information Theory, vol. 38, no. 2, Mar. 1992.
[5] http://www-stat.stanford.edu/~wavelab/
[6] R. Lippmann et al., "The 1999 DARPA Off-Line Intrusion Detection Evaluation", Computer Networks, 34(4):579-595, 2000. Data available at http://www.ll.mit.edu/IST/ideval/
[7] G. Vigna, R. Kemmerer, "NetSTAT: A Network-based Intrusion Detection System", Journal of Computer Security, 7(1), IOS Press, 1999.
[8] R. Sekar, A. Gupta, J. Frullo, T. Shanbhag, S. Zhou, A. Tiwari, H. Yang, "Specification-Based Anomaly Detection: A New Approach for Detecting Network Intrusions", ACM CCS, 2002.
[9] http://lever.cs.ucla.edu/ddos/traces
[10] J. Mirkovic, G. Prier, P. Reiher, "Attacking DDoS at the Source", ICNP 2002, pp. 312-321, Nov. 2002.
[11] CERT Coordination Center, "Denial-of-service tools", Advisory CA-1999-17, http://www.cert.org/advisories/CA-1999-17.html, Dec. 1999.
[12] CERT Coordination Center, "DoS Developments", Advisory CA-2000-01, http://www.cert.org/advisories/CA-2000-01.html, Jan. 2000.
[13] J. Yuan, K. Mills, "Monitoring the Macroscopic Effect of DDoS Flooding Attacks", IEEE Transactions on Dependable and Secure Computing, vol. 2, no. 4, 2005.

Experimental Analysis of Attacks Against Intradomain Routing Protocols

Alessio Botta, Alberto Dainotti, Antonio Pescapè, Giorgio Ventre
Dipartimento di Informatica e Sistemistica – University of Napoli "Federico II"
{a.botta,alberto,pescape,giorgio}@unina.it

Abstract— Nowadays, attacks against the routing infrastructure are gaining considerable importance. In this work we present a framework to conduct experimental analysis of routing attacks and, to prove its usefulness, we study three attacks against routing protocols: route flapping on RIP, Denial of Service on OSPF by means of the Max Age attack and, finally, route forcing on RIP. We present a qualitative analysis and a performance analysis that aims to quantify the effects of routing protocol attacks with respect to router resources and network traffic over controlled test beds.

I. INTRODUCTION

Routing protocols implement mechanisms to discover the optimal route between end points, describing peering relationships, methods of exchanging information, and other kinds of policies. Since network connectivity depends on proper routing, it follows that routing security is a critical issue for the entire network infrastructure. In spite of this, while other aspects of computer and communication security, such as network applications and system security, are the subject of many studies and of widespread interest, attacks against routing protocols are less known. As Bellovin states in [1], this is probably due to two fundamental reasons: effective protection of the routing infrastructure is a really hard problem, and it is outside the scope of traditional communications security communities. Moreover, most communications security failures happen because of buggy code or broken protocols, whereas routing security failures happen despite good code and functioning protocols.
For instance, in the case of the routing infrastructure, one or more dishonest or compromised routers can alter the routing process, and hop-by-hop authentication is not sufficient. In this abstract, however, we do not focus on particular countermeasure mechanisms, but rather on the effects of some routing protocol attacks. This work presents an approach to conduct experimental studies of attacks against the routing infrastructure, observing their effects and quantifying their impact with respect to network and device resources. More precisely, we focus our attention on an experimental study of three types of attacks against Interior Gateway Protocols (IGP). In order to evaluate these attacks, we used a controlled and fully configurable open test bed. In this way we were able to control as many variables as possible, as well as to configure several network topologies. This also allowed repeatability of the experiments, obtaining numerical results which show the degradation of the network configurations under test with high confidence intervals. Focusing our attention on router resources, network traffic, convergence time and, in general, on network behavior before, during and after the attacks, we found interesting results for all of the above attack scenarios.

II. RELATED WORKS AND BACKGROUND

For a careful analysis of related work, refer to [2]. RIP and OSPF are the most commonly deployed intra-domain routing protocols. Both these protocols describe methods for exchanging routing information (network topology and routing tables) between the routers of an Autonomous System. Both RIP and OSPF are mainly affected by the lack of mechanisms to guarantee integrity and authentication of the information exchanged. In the RIP scenario, by the term Route Flapping we mean an attack consisting of the advertisement and withdrawal of a route (or withdrawal and re-advertisement) alternating in rapid succession, thereby causing a route oscillation. With Route Forcing, instead, we mean forcing a path different from the optimal path indicated by the routing protocol, causing service degradation. In the OSPF scenario, when an attacker continually interleaves the legitimate advertisement packets of a given routing entity with spoofed packets in which the age is set to the maximum value, he causes network confusion and may contribute to a DoS condition. Such an attack not only consumes network bandwidth, but also makes the routing information database inconsistent, disrupting correct routing. In this case we have the Max Age attack. The selected attacks are quite different in nature, which clearly poses several limitations on the possibility of fixing them in a unified fashion. This work provides just a methodology to qualitatively and quantitatively analyze the impact that routing attacks have on the network infrastructure, leaving room to propose innovative mitigation strategies and restoration policies. (The content of this abstract is part of the following published paper: Antonio Pescapè, Giorgio Ventre, "Experimental analysis of attacks against intradomain routing protocols", Journal of Computer Security, Volume 13, Number 6, 2005, pp. 877-903.)

III. EXPERIMENTAL SCENARIO

In order to evaluate the above-mentioned attacks, we used as a proof-of-concept a controlled and fully configurable open test bed. In this way we were able to control as many variables as possible, as well as to configure on our own several network topologies.
We preferred to use a controlled test bed also to have full control of the network devices and network traffic. This aspect is important when several tests must be performed in order to obtain numerical results. Indeed, if we want to derive a general behavior and guarantee the repeatability of the experiments, we have to know and control the dependent variables. Our objective, indeed, is to demonstrate how it is possible to show and analyze what happens to a network under attack and, in particular, how to evaluate the quantitative impact on network resources. Thus, while the presented methodology is of general validity, the numerical results obtained in the following experiments are important to show the degradation of the specific network configurations under test. In Fig. 1 the experimental test bed used in our practical analysis is depicted, whereas Table I gives a complete description of the hardware and software characteristics. The experimental test bed is composed of eight networks and seven routers (with back-to-back connections). We called the routers Aphrodite, Cronus, Poseidon, Gaia, Zeus, Helios, and Calvin. In Figure 1, the numbers in the bullets represent the last field of the IP address related to the indicated network address (placed on the link between two routers). It is worth noting that, by changing the position of the links between each pair of routers and by consequently modifying (when needed) the addressing plan, we are able to produce a large number of network topologies. We used GNU Zebra [3] to implement the routing protocols on our Linux routers. As regards traffic generation, we used D-ITG [4, 5]. To passively capture the transferred packets, without influencing the systems constituting the network configurations under test, we worked with the Ethereal [6] network sniffer. With the term packet forgers, instead, we mean tools able to create and send packets of known protocols, filling the various fields with information chosen by the user, usually an attacker. Such software also allows the user to specify IP source and destination addresses in a totally arbitrary way, and takes care of computing the control fields that are derived from the data inserted in the packet. To perform "routing packet forgery" and conduct our simulated attacks, among the several available tools such as Spoof [7], Nemesis [8], IRPAS [9], and srip [10], we chose Spoof because of its bias towards routing protocols and because it was previously used in other works in the literature related to the security of routing protocols [11]. Using common PCs as well as open source and freeware tools guarantees that our experiments can be repeated more easily by other practitioners or researchers in the field of computer network security.

[Figure 1: Network test bed – Route Flapping on RIP]
[Figure 2: Route Flapping on RIP: average time of a stable path]
[Figure 3: Flapping on RIP: number of route changes during tests]

TABLE I - HARDWARE AND OS DESCRIPTION
CPU:           Intel PII 850 MHz
Memory:        RAM 128 MB – Cache 256 KB
OS:            Linux Red Hat 7.1 – Kernel 2.4.2-2
Network Cards: Ethernet Card 10/100 Mbps

IV. EXPERIMENTAL ANALYSIS

It is worth mentioning that we repeated each test several times. In the following subsections, in the case of numerical values, we present the average value over the experiment repetitions. Thanks to the use of a controlled test bed, we obtained a 96% confidence level.
Therefore, for each experiment, the numerical results found in this work can be considered very reliable in representing how an attack impacts the network configuration under test. This is an important aspect of the proposed methodology.

A. Route Flapping on RIPv2

In Fig. 2 the network scenario and the actors (subnets and routers) of this attack are depicted. In this test, by sending false information from Poseidon to Cronus, we forced RIPv2 to use different routes at different time intervals. Using spoof, we announced a longer distance (with respect to the real path) between Aphrodite and the 192.168.2.0 subnet; this was repeated at regular intervals. Therefore, the effect of the attack was to make Cronus believe that the shortest path to reach subnet 192.168.2.0 had Poseidon as its first hop instead of Aphrodite. RIPv2 reacted with a path oscillation (route flapping) between Cronus and subnet 192.168.2.0: at one moment the best path had Poseidon as the first hop, at the next it had Aphrodite. We carried out a high number of trials, varying the time interval between two malicious packets. All trials were also repeated in two different situations. In the first one, the false information (malicious packets) sent from Poseidon to Cronus specified a distance between Aphrodite and the 192.168.2.0 subnet equal to 10 hops. In the second one, the false information reported that the 192.168.2.0 subnet was unreachable from Aphrodite. The effect of the attack can also be monitored by looking at Cronus' routing table before and during the attack, or by issuing the traceroute command between Cronus and Calvin (192.168.2.2). Before analyzing the experimental results, we underline that in a normal situation RIP indicates a path between Cronus and the 192.168.2.0 subnet made of 3 hops, whereas during route flapping the path length oscillates between 3 and 6 hops. Each test lasted 200 seconds, and we repeated it for different time intervals between two successive malicious packets containing false information (3 s, 6 s, 9 s, ... up to 30 s). Over the 200 s interval we evaluated the number of route flaps and the length of a stable path. In Fig. 5, the number of route changes (route flaps) induced is reported. The first cycle is related to the attack that announces a distance between Aphrodite and the 192.168.2.0 subnet equal to 10 hops. The second cycle is related to the attack that announces a distance between Aphrodite and the 192.168.2.0 subnet equal to 16 hops (unreachable). We noticed that in the second cycle there are fewer route changes than in the first cycle. In addition, we observed that in both cycles the maximum number of route changes is obtained far from the bounds of the interval range (the highest value is obtained in the experiment with a time interval between two successive packets of around 12 s). Close to the bounds, a smaller number of oscillations is experienced. In Fig. 7, the average duration of the stable path with 6 hops (measured in seconds) is depicted as a function of the malicious packets' inter-departure time (IDT). Looking also at the diagram for the stable path made of 3 hops, we noticed that, in both cases, in the first cycle the average duration of the stable paths was independent of the malicious packets' IDT.
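Measurements of this kind (number of route changes and stable-path durations) can be reproduced by periodically polling the victim router's forwarding table. The sketch below is a minimal Python example for a Linux router, as in the test bed; the prefix, polling period, and observation window are chosen to mirror the experiment, and the script itself is ours, not part of the original tool set.

```python
import subprocess
import time

SUBNET = "192.168.2.0/24"   # prefix under attack in the experiment
DURATION = 200              # observation window in seconds, as in the tests
POLL = 1                    # polling period in seconds

def current_route(prefix):
    """Return the `ip route` line for the prefix (next hop included), or None."""
    out = subprocess.run(["ip", "route", "show", prefix],
                         capture_output=True, text=True).stdout.strip()
    return out or None

changes = 0
stable_periods = []          # durations (s) of each stable path
last = current_route(SUBNET)
last_change = time.time()

end = time.time() + DURATION
while time.time() < end:
    time.sleep(POLL)
    route = current_route(SUBNET)
    if route != last:
        now = time.time()
        stable_periods.append(now - last_change)
        changes += 1
        last, last_change = route, now

print(f"route changes in {DURATION}s: {changes}")
if stable_periods:
    print(f"average stable-path duration: "
          f"{sum(stable_periods)/len(stable_periods):.1f}s")
```

Running such a monitor on Cronus during the two attack cycles yields exactly the quantities plotted in the route-flapping figures: the count of route changes and the average time a path remains stable.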
[Figure 4: Route Flapping on RIP: OUT traffic (second cycle)]
[Figure 5: Denial of Service on OSPF: experimental test bed]

Finally, using SNMP, for each router in the test bed and for each experiment, the total amount of input and output traffic was calculated. We plotted the traffic amount, expressed as a percentage of the minimum number of bytes, for each interface of each router (in both the input and output directions). The figures contain a single line for each router. In all diagrams it is possible to observe a similar envelope for the Calvin, Helios, Aphrodite, and Cronus plots. Here, for lack of space, we show only the diagram related to the output traffic of the second cycle.

B. Denial of Service on OSPF

The experimental test bed is depicted in Fig. 12. The target of the OSPF Max Age attack is to cause a DoS. Using spoof we sent false LSAs from Poseidon to Calvin. These LSAs contained information on Zeus and Cronus and had the Age field set to the maximum value. The result of these actions was the loss of contact between Zeus/Cronus and Calvin/Helios. Indeed, on the basis of the forged information received from Poseidon, the routers Calvin and Helios deleted the entries for Zeus and Cronus from their LSA databases. The consequences of the attack are very simple to understand. We verified the success of the attack using the ping command from Calvin and Helios, directed to subnets 192.168.1.4, 192.168.1.8, and 192.168.1.64 (a network unreachable error was reported during the attack).

[Figure 6: Denial of Service on OSPF: traffic analysis]
[Figure 7: Route Forcing on RIP: TCP throughput]

An analysis of the Helios LSA database before and during the attack confirmed the described behavior. During this attack, by using the ospfSpfRuns SNMP variable, we counted the number of executions of the Dijkstra algorithm (in a single specific area) for each router. By taking into account the number of recalculations during the attack we could evaluate the computational load on each router. It is worth noting that in our tests we used PCs with 128 MB of RAM. In a real router we may find a smaller amount of memory; hence controlling the number of recalculations - with a large number of routing entries - is important, especially in a large network with a large number of routers. It is important to underline that during our first experiment (50 s) there were six executions of the Dijkstra algorithm for each router. This means that, besides the DoS effect described above, we were also able to increase the computational load of each router in the test bed. Finally, it is interesting to note that if we increase the duration of the experiment (100 s, 150 s, ...) the number of recalculations per second remains basically constant. After this first measurement we repeated the experiment eliminating all other traffic sources (ping, ...) from our experimental network. In this way we were able to measure the precise amount of traffic on the network during the attack. In order to have a reference value, over a time interval of 60 s we measured the traffic load with and without the Max Age attack on OSPF. Fig. 17 compares the network load measured while the systems were under attack, expressed as a percentage of the values found without the attack. The most important result is that the maximum increase in traffic occurs on the router interfaces attached to the subnet that was unreachable during the attack.

C. Route Forcing on RIP

Between Aphrodite and Cronus there is a single hop.
Using the Route Forcing attack, we forced the traffic to follow a longer path from Aphrodite to Cronus, to finally reach subnet 192.168.2.0. In order to obtain such a result we used spoof on both Zeus and Poseidon. By repeating this packet-sending activity every 3 s we forced the artificial route for the entire experimental time interval. Moreover, because of the Route Forcing on RIPv2, we moved from a path with a link bandwidth of 100 Mbps to a path made of 10 Mbps links. Using D-ITG we measured the throughput on both paths. During the attack, TCP and UDP flows experienced a saturated path, whereas after the attack both flows relied on a non-saturated path (Fig. 24).

V. CONCLUSION

In this work we presented a methodology to conduct experimental analysis of attacks against routing protocols. We showed how an attack can be performed and emulated, with which tools it is possible to carry out the emulation and analysis, and what the impacts are on the networks under attack. As for the effects, we focused our attention on router resources, network traffic, convergence time and on network behavior before, during and after the attacks.

REFERENCES
[1] S. M. Bellovin, "Routing Security", talk at the British Columbia Institute of Technology, June 2003.
[2] Antonio Pescapè, Giorgio Ventre, "Experimental analysis of attacks against intradomain routing protocols", Journal of Computer Security, Volume 13, Number 6, 2005, pp. 877-903.
[3] http://www.zebra.org/ (as of April 2007)
[4] http://www.grid.unina.it/software/ITG (as of April 2007)
[5] S. Avallone, D. Emma, A. Pescapè, and G. Ventre, "Performance evaluation of an open distributed platform for realistic traffic generation", Performance Evaluation: An International Journal (Elsevier), Volume 60, Issues 1-4, pp. 359-392, May 2005, ISSN: 0166-5316.
[6] http://www.ethereal.com (as of April 2007)
[7] http://www.ouah.org/protocol_level.htm (as of April 2007)
[8] The Nemesis packet injection tool-suite, http://nemesis.sourceforge.net/ (as of April 2007)
[9] FX, "IRPAS – Internetwork Routing Protocol Attack Suite", http://www.phenoelit.de/irpas/ (as of April 2007)
[10] Reliable Software Group Website (University of California, Santa Barbara), http://www.cs.ucsb.edu/~rsg/ (as of April 2007)
[11] V. Mittal, G. Vigna, "Sensor-Based Intrusion Detection for Intra-Domain Distance-Vector Routing", in Proceedings of CCS 2002, 9th ACM Conference on Computer and Communications Security, November 17-21, 2002, Washington, DC.

Validating Orchestration of Web Services with BPEL and Aggregate Signatures*

Carlo Blundo†, Clemente Galdi§, Emiliano De Cristofaro‡, Giuseppe Persiano¶

May 17, 2007

* This work has been partially supported by the European Commission through the IST program under contract FP6-1596 (AEOLUS).
† Dipartimento Informatica ed Applicazioni - Università di Salerno - [email protected]
‡ Dipartimento Informatica ed Applicazioni - Università di Salerno - [email protected]
§ Dipartimento di Scienze Fisiche - Università di Napoli Federico II - [email protected]
¶ Dipartimento Informatica ed Applicazioni - Università di Salerno - [email protected]

Abstract

In this paper, we present a framework providing integrity and authentication for secure workflow computation based on BPEL Web Service orchestration. We employ a recent cryptographic tool, aggregate signatures, to validate the orchestration by requiring all partners to sign the result of the computation. Security operations are performed during the orchestration and require no change to the service implementations.

1 Introduction

Web Services technology [12] provides software developers with a wide range of tools and models to produce innovative distributed applications. Interoperability, cross-platform communication, and language independence are only a part of the appealing characteristics of Web Services.
The standardization and the flexibility introduced by Web Services in the development of new applications translate into increased productivity and efficiency. Furthermore, the challenges of fast-growing and fast-evolving business processes lead to a higher level of interaction and to the need for application integration across organizational boundaries. Services are no longer designed as isolated processes, but are meant to be invoked by other services and to invoke other services themselves. This paradigm is often referred to as Service-Oriented Computing (SOC) [27]. Two different approaches can be considered: service orchestration, in which one particular service directs the logical order of all other services, and service choreography, in which individual services work together in a loosely coupled network [24]. Interactions should be driven by explicit process models. Therefore, the need for languages to model business processes arises, in particular for processes implemented by Web Services. Recently, many languages have been proposed, such as BPML [17], XLANG [11], WSCI [10], WS-BPEL [22], and WSCDL [9]. Among these, the Business Process Execution Language for Web Services (WS-BPEL) has emerged as the de-facto standard for "enabling users to describe business process activities as Web services and define how they can be connected to accomplish specific tasks" [22]. However, whereas security and access control policies for ordinary Web Services are well studied, Web Services composition still lacks a standard tool to validate the orchestration of processes, in terms of providing integrity and authenticity of the computation. In this paper, we rely on the standard tool for providing integrity and authentication in network interactions, i.e., cryptographic signatures. However, two different issues should be considered when using this tool to provide security within Web Services interactions. First, the number of signatures would grow linearly with the number of users and processes, raising issues related to the size of certificate chains and bandwidth overhead. Second, there could be applications requiring the processes to be performed in a specific order. To this aim, we adopt a recent cryptographic tool, aggregate signatures, first presented by Boneh et al. [15]. Aggregate signatures allow multiple signatures on distinct messages to be aggregated into a single short signature. Furthermore, we consider a variant of this primitive in which the set of signers is ordered. This scheme is referred to as sequential aggregate signatures and was presented by Lysyanskaya et al. [25]. In this paper, we present a framework which uses these two schemes to validate Web Services orchestration, by requiring each partner to sign the result of its computation.
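As a concrete illustration of the aggregation primitive just mentioned, the sketch below aggregates BLS signatures on distinct messages and verifies the aggregate. It is not the authors' implementation: it assumes the py_ecc library's BLS ciphersuite API (G2Basic with KeyGen, SkToPk, Sign, Aggregate and AggregateVerify), so the exact method names should be checked against the installed version.

```python
# Minimal aggregate-signature sketch (assumption: py_ecc exposes
# py_ecc.bls.G2Basic with the methods used below).
from py_ecc.bls import G2Basic as bls

# Three partners, each with its own key pair (32-byte seeds are toy values).
seeds = [b"partner-1-seed-0123456789abcdef!",
         b"partner-2-seed-0123456789abcdef!",
         b"partner-3-seed-0123456789abcdef!"]
sks = [bls.KeyGen(s) for s in seeds]
pks = [bls.SkToPk(sk) for sk in sks]

# Each partner signs the (distinct) result of its own computation.
messages = [b"result-of-service-A", b"result-of-service-B", b"result-of-service-C"]
sigs = [bls.Sign(sk, msg) for sk, msg in zip(sks, messages)]

# An untrusted aggregating party combines the signatures into one short value.
aggregate = bls.Aggregate(sigs)

# The orchestrator (or any partner) verifies the whole parallel step at once.
assert bls.AggregateVerify(pks, messages, aggregate)
print("aggregate signature verified for", len(messages), "partners")
```

The sequential variant (SAS) would instead have each partner extend the running aggregate in turn, which is what the framework uses for the sequential parts of the workflow.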
In particular, since Web Services interactions can be performed either in parallel or sequentially, we need both schemes: when the computation is carried out in parallel, we use the "standard" aggregate signature scheme presented in [15], while we use its sequential variant presented in [25] to validate sequential workflow computation. The rest of the paper is structured as follows. In Section 2, we give an overview of the technologies involved, while in Section 3 we present our framework.

2 Background

2.1 Workflow and WS-BPEL

A workflow describes the automation of a business process. During the process, documents, information, or roles are exchanged among actors in order to complete a task as specified by a well-defined set of rules. In other words, a workflow is composed of a set of tasks, related to each other through different types of relationships [20]. A workflow management system allows one to define, create, and manage the execution of a workflow through software executing on one or more workflow engines. A workflow manager interprets the formal definition of a process in order to interact with several actors by managing states and task coordination. Tasks, actors, and processes within the workflow can be designed as desired. In this paper, we focus on the Web Service technology and on WS-BPEL, the Business Process Execution Language for Web Services. For a thorough overview of Web Services, we refer to [21]. WS-BPEL is an XML-based language for describing the behavior of business processes based on Web Services [22]. It provides activities, which can be either basic or structured. A basic activity can communicate with the partners by messages (invoke, receive, reply), manipulate data (assign), wait for some time (wait), do nothing (empty), signal faults (throw), or end the entire process (terminate). A structured activity defines a causal order on the basic activities and can itself be nested in another structured activity. The structured activities include sequential execution (sequence), parallel execution (flow), data-dependent branching (switch), timeout- or message-dependent branching (pick), and repeated execution (while). The most important structured activity is a scope. It links an activity to transaction management and provides fault, compensation, and event handling. A process is the outermost scope of the described business process. For more details about BPEL, we refer to [28], [19], and [23]. WS-BPEL is a standard specification, and therefore it must be implemented. Nowadays, many implementations are available, such as Bexee [3], Oracle BPEL Process Manager [6], BPEL Maestro [7], or iGrafx [5]. Among these, the ActiveBPEL implementation [2] appears to be one of the most interesting solutions, as it is open source and freely available. ActiveBPEL is deployed as a servlet into Apache's Jakarta Tomcat container. It has extensive documentation and has been released by Active EndPoints into the public domain under a Lesser GNU Public License. It provides a full implementation of the BPEL 1.1 specification. Moreover, Active EndPoints released two useful and powerful tools: ActiveBPEL Designer and ActiveBPEL Enterprise. The former is a comprehensive visual tool for creating, testing, and deploying composite applications based on the BPEL standard, and it is a plug-in to the Eclipse development environment [4]. The latter is a complete BPEL engine, which provides many features to administer BPEL processes.
2.2 Aggregate Signatures

An aggregate signature scheme is a digital signature scheme that supports aggregation. Given n signatures on n distinct messages from n distinct users, it is possible to aggregate all these signatures into a single short signature. For the rest of the paper, we will use the acronym AS to refer to the signature scheme presented in [15]. Let 𝒰 be the set of possible users. Each user u ∈ 𝒰 holds a key pair (PK_u, SK_u). Consider a subset U ⊆ 𝒰. Each user u ∈ U signs a message M_u, obtaining σ_u. All the signatures are combined by an aggregating party into a single aggregate σ, whose length is equal to that of any of the σ_u. The aggregating party need not be one of the users in U, nor does it have to be trusted by them. The aggregating party needs to know the users' public keys, the messages, and the signatures on them, but no private keys. This scheme allows a verifier, given σ, the identities of the users, and the messages, to check whether each user signed his respective message. The AS scheme is based on co-gap Diffie-Hellman signatures (for details, see [16]) and bilinear maps, and it is proved secure in a model that gives the adversary the choice of the public keys and of the messages to forge. The authors added the constraint that an aggregate signature is valid only if it is an aggregation of signatures on distinct messages. However, in [13] a new analysis and proof are provided that overcome this limitation, yielding the first truly unrestricted aggregate signature scheme.

Moreover, a sequential aggregate signature scheme has been proposed in [25]. For the rest of the paper, we refer to it as SAS. In this scheme, the set of signers is ordered. The aggregate signature is computed by having each signer, in turn, add his signature to it. The construction proposed in [25] is based on families of certified trapdoor permutations. In the same paper the authors explicitly instantiate the proposed scheme based on RSA. This scheme builds on the full-domain hash signature scheme introduced in [14]; such a scheme works with any trapdoor one-way permutation family. However, this scheme is also affected by the restriction that distinct signers have to sign distinct messages. Once again, in [13] a new analysis and proof show that this restriction can be dropped, yielding an unrestricted sequential aggregate signature scheme. The main difference of sequential aggregate signatures is that each signer transforms a sequential aggregate into another one that includes a signature on a message of his choice. Signing and aggregation are a single operation; sequential aggregates are built in layers: the first signature in the aggregate is the innermost. As with non-sequential aggregate signatures, the resulting sequential aggregate has the same length as an ordinary signature. Such a scheme is proved secure in the random oracle model.

2.3 WS-Security

In the context of Web Services, many supplemental standards for guaranteeing security features have been released in recent years and collected under the WS-Security specification [26]. This specification (whose latest version was released at the beginning of 2006) is the result of the work of the OASIS Technical Committee [1].
It proposes a standard set of SOAP extensions that can be used when building secure Web Services to provide confidentiality (messages should be read only by the sender and the receiver), integrity and authentication (the receiver should be guaranteed that the message is the one sent by the sender and has not been altered), non-repudiation (the sender should not be able to deny having sent the message), and compatibility (messages should be processed according to the same rules by any node on the message path). Such mechanisms provide a building block that can be used in conjunction with other Web Service extensions and higher-level applications to accommodate a wide variety of security models and security technologies. Before the release of the WS-Security standard, such features had to be implemented by adding and managing customized ad-hoc headers in the SOAP messages exchanged during Web Services interactions. With WS-Security, instead, these headers become standard, as do the mechanisms to manage them. In fact, WS-Security implementations provide application developers with dedicated handlers to process security information in a transparent way.

3 Our framework

Our goal is to build a framework which provides integrity and authentication for secure workflow computation based on BPEL Web Service orchestration. In our scenario, the orchestrator defines the workflow and describes the Web Services composition through BPEL. We want to ensure that:
1. The workflow has been correctly followed by the defined partners.
2. The workflow has been correctly followed in the defined order.
3. The partners cannot repudiate their computation.
4. The partners can verify that their computation has been correctly inserted into the workflow.

[Figure 1: Combination of the SAS and AS schemes]
[Figure 2: Framework architecture]

To this aim, we rely on the aggregate signature schemes presented in Section 2 and referred to as AS and SAS. We hereby propose a novel aggregate signature scheme as the combination of the AS scheme in [15] and the SAS scheme in [25]. Actually, we need both schemes since composite Web Services execution can be performed either in parallel or sequentially. Therefore, we use SAS to sign messages sequentially produced by the partners. Instead, whenever a parallel execution is performed, we use AS to sign and combine the produced messages. As depicted in Figure 1, when an orchestration includes both parallel and sequential executions, we need to "map" SAS to AS, since the two signature schemes work on different fields. At the end of the computation, we expect the Orchestrator to perform signature verification to check its correctness. However, since AS and SAS allow every user to perform aggregate signature verification, every entity involved in the composition may verify that its computation has been correctly inserted in the workflow. To implement our scheme, it is necessary to modify the ActiveBPEL engine to support signature operations. We adopt the WS-Security standard to represent information in a standard way. In particular, we refer to the possibility of exchanging security data within the BinarySecurityToken field in the message header. In fact, WS-Security gives developers the ability to define new security tokens and store them as binary data in the BinarySecurityToken field. In particular, it is possible to constrain the value type of the token to respect an XML schema [8].
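To illustrate this last point, the sketch below builds a SOAP header that carries a base64-encoded aggregate signature inside a wsse:BinarySecurityToken element. The custom ValueType URI is hypothetical, the WS-Security namespace shown is the commonly used 1.0 value, and the EncodingType is abbreviated; all three should be adapted to the WS-Security profile actually deployed.

```python
import base64
import xml.etree.ElementTree as ET

# WS-Security 1.0 SOAP Message Security namespace (verify against the version
# actually deployed); the ValueType URI below is purely illustrative.
WSSE = "http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"
SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
AGG_SIG_VALUETYPE = "http://example.org/tokens#AggregateSignature"  # hypothetical

def add_aggregate_token(envelope: ET.Element, aggregate_sig: bytes) -> ET.Element:
    """Attach the aggregate signature to the SOAP header as a BinarySecurityToken."""
    header = envelope.find(f"{{{SOAP}}}Header")
    if header is None:
        header = ET.SubElement(envelope, f"{{{SOAP}}}Header")
    security = ET.SubElement(header, f"{{{WSSE}}}Security")
    token = ET.SubElement(security, f"{{{WSSE}}}BinarySecurityToken",
                          {"ValueType": AGG_SIG_VALUETYPE,
                           # abbreviated; WS-Security specifies a full URI here
                           "EncodingType": "Base64Binary"})
    token.text = base64.b64encode(aggregate_sig).decode("ascii")
    return envelope

# Example usage with a dummy envelope and a placeholder signature value.
env = ET.Element(f"{{{SOAP}}}Envelope")
add_aggregate_token(env, b"\x01\x02\x03-aggregate-signature-bytes")
print(ET.tostring(env, encoding="unicode"))
```

In the framework, a handler of this kind would be invoked by the addToken service mentioned below, after aggSign or seqAggSign has produced the aggregate value.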
In order to add this security layer, we need to modify ActiveBPEL to allow intermediate operations to be performed between service invocations. In fact, to the best of our knowledge, there is no BPEL engine supporting security features in the service composition. Therefore, we need to modify the process deployment operation by introducing a sign service to be invoked. Figure 2 shows the overall architecture of the framework. The core components are the BPEL process, the BPEL engine, and the security service [18]. The BPEL engine has been modified to add security services: aggSign for the AS feature, seqAggSign for the SAS feature, and addToken for adding the security information to the SOAP header.

References

[1] Organization for the Advancement of Structured Information Standards (OASIS). http://www.oasis-open.org/home/index.php.
[2] ActiveEndPoints ActiveBPEL Open Source Engine, BPEL Standard. http://www.activebpel.org/, Last visit, April 2007.
[3] Bexee: BPEL EXecution Engine. http://bexee.sourceforge.net/, Last visit, April 2007.
[4] Eclipse SDK. http://www.eclipse.org, Last visit, April 2007.
[5] iGrafx BPEL. http://www.igrafx.com/products/bpel/, Last visit, April 2007.
[6] Oracle BPEL Process Manager. http://www.oracle.com/technology/bpel/index.html, Last visit, April 2007.
[7] Parasoft BPEL Maestro. http://www.parasoft.com/BPELMaestro, Last visit, April 2007.
[8] W3C XML Schema. http://www.w3.org/XML/Schema, Last visit, April 2007.
[9] Web Service Choreography Description Language (WSCDL) 1.0. http://www.w3.org/TR/ws-cdl-10/, Last visit, April 2007.
[10] Web Service Choreography Interface (WSCI) 1.0. http://www.w3.org/TR/wsci/, Last visit, April 2007.
[11] XLANG: XML-based extension of WSDL. http://www.ebpml.org/xlang.htm, Last visit, April 2007.
[12] Gustavo Alonso, Fabio Casati, Harumi Kuno, and Vijay Machiraju. Web Services: Concepts, Architecture and Applications. Springer-Verlag, 2004.
[13] Mihir Bellare, Chanathip Namprempre, and Gregory Neven. Unrestricted Aggregate Signatures. Cryptology ePrint Archive, Report 2006/285, 2006. http://eprint.iacr.org/.
[14] Mihir Bellare and Phillip Rogaway. Random oracles are practical: a paradigm for designing efficient protocols. In CCS '93: Proceedings of the 1st ACM Conference on Computer and Communications Security, pages 62–73, New York, NY, USA, 1993. ACM Press.
[15] Dan Boneh, Craig Gentry, Ben Lynn, and Hovav Shacham. Aggregate and verifiably encrypted signatures from bilinear maps. In Proceedings of Advances in Cryptology – Eurocrypt 2003, volume 2656 of Lecture Notes in Computer Science, pages 416–432. Springer-Verlag, Berlin, 2003.
[16] Dan Boneh, Ben Lynn, and Hovav Shacham. Short signatures from the Weil pairing. In Proceedings of Advances in Cryptology – Asiacrypt 2001, volume 2248 of Lecture Notes in Computer Science, pages 514–532. Springer-Verlag, Berlin, 2001.
[17] Business Process Modeling Language. http://www.bpmi.org, Last visit, April 2007.
[18] Anis Charfi and Mira Mezini. Using aspects for security engineering of web service compositions. In ICWS '05: Proceedings of the IEEE International Conference on Web Services (ICWS '05), pages 59–66, Washington, DC, USA, 2005. IEEE Computer Society.
[19] Xiang Fu, Tevfik Bultan, and Jianwen Su. Analysis of interacting BPEL web services. In WWW '04: Proceedings of the 13th International Conference on World Wide Web, pages 621–630, New York, NY, USA, 2004. ACM Press.
[20] Dimitrios Georgakopoulos, Mark Hornick, and Amit Sheth.
An overview of workflow management: from process modeling to workflow automation infrastructure. Distrib. Parallel Databases, 3(2):119–153, 1995.
[21] Steve Graham, Simeon Simeonov, Toufic Boubez, Glen Daniels, Doug Davis, Yuichi Nakamura, and Ryo Neyama. Building Web Services with Java: Making sense of XML, SOAP, WSDL, and UDDI. 2001.
[22] IBM, Microsoft, and BEA Systems. Business process execution language for web services. http://www.ibm.com/developerworks/library/ws-bpel, August 2002.
[23] R. Khalaf, A. Keller, and F. Leymann. Business processes for Web Services: principles and applications. IBM Syst. J., 45(2):425–446, 2006.
[24] Niels Lohmann, Peter Massuthe, Christian Stahl, and Daniela Weinberg. Analyzing Interacting BPEL Processes. In Business Process Management, 4th International Conference, BPM 2006, Vienna, Austria, September 5-7, 2006, Proceedings, volume 4102 of Lecture Notes in Computer Science, pages 17–32. Springer-Verlag, September 2006.
[25] Anna Lysyanskaya, Silvio Micali, Leonid Reyzin, and Hovav Shacham. Sequential aggregate signatures from trapdoor permutations. In Proceedings of Advances in Cryptology – Eurocrypt 2004, volume 3027 of Lecture Notes in Computer Science, pages 74–90. Springer-Verlag, Berlin, 2004.
[26] Anthony Nadalin, Chris Kaler, Phillip Hallam-Baker, and Ronald Monzillo. Web Services Security: SOAP Message Security 1.1, OASIS, 2006. http://www.oasis-open.org/committees/download.php/16790/wssv1.1-spec-os-SOAPMessageSecurity.pdf
[27] Mike P. Papazoglou. Agent-oriented technology in support of e-business. Communications of the ACM, 44(4):71–77, 2001.
[28] Petia Wohed, Wil M. P. van der Aalst, Marlon Dumas, and Arthur H. M. ter Hofstede. Analysis of Web Services Composition Languages: The Case of BPEL4WS. In Proceedings of the 22nd International Conference on Conceptual Modeling (ER), pages 200–215, 2003.

PassePartout Certificates

Ivan Visconti
Dipartimento di Informatica ed Applicazioni
Università di Salerno
via Ponte Don Melillo
84084 Fisciano (SA) - ITALY
E-mail: [email protected]

Abstract

The invention of public-key cryptography revolutionized the design of secure systems, with a tremendous impact on the development of a cyberspace where digital transactions replace all activities of the physical world. The use of public-key cryptography made evident the need for public-key infrastructures and digital certificates. Such tools were developed and integrated into the de-facto standard security architectures on the Internet. After the invention of public-key cryptography, the crypto world continued to develop cryptographic gadgets that, when security, privacy and usability are considered as major requirements, can have an amazing impact on current technologies. The goal of this work is to show how to plug recent results of the crypto world into some of the current standards used by the information security world. In particular, we consider the use of "trapdoor commitments" and of "zero-knowledge sets" and show how to extend the features of current standard implementations of access control-based systems using these two crypto primitives, while still preserving (and in some cases improving) the efficiency and usability of current systems.

Keywords: PKIX, attribute certificates, commitment schemes, ZK sets.

The work of the author is supported in part by the European Commission through the IST program under Contract IST-2002-507932 ECRYPT, and in part by the European Commission through the FP6 program under contract FP6-1596 AEOLUS.
1 Introduction

The design of secure systems includes the use of cryptographic tools and their combination in a sophisticated process. Since cryptography evolves and new powerful gadgets are provided, security experts should reconsider the design of their systems, as they could benefit from such novel results. Unfortunately this is not always the case: the crypto world and the information security world do not seem to interact so strongly, and thus more attention should be paid to the blending of these two worlds. It is crucial that security experts ask cryptographers for new tools with specific properties, and that cryptographers inform security experts about recent and potentially interesting results.

Access control by means of digital certificates. In this work we focus on the problem of designing secure systems that need an access control functionality, i.e., the possibility of personalizing the execution of a service on the basis of the privileges of the parties. This is a typical security problem that is already widespread in the cyberspace and has thus already received central attention in the past. We observe that these crucial and popular features seem to suffer from the previously discussed gap between the crypto and the information security worlds. The widely used public-key infrastructure based on X509v3 certificates [7] and its integration by means of so-called attribute certificates [5] represents a major combination of cryptographic tools (i.e., signature schemes and collision-resistant hash functions) for realizing secure transactions. However, these crypto tools were already known a long time ago(1), and one should wonder whether new crypto tools could allow the design of better systems.

X509 and attribute certificates. An X509 certificate is a digital certificate that contains some mandatory information to identify the owner (e.g., the issuer, the subject, the expiration date, the signature algorithm), along with a digital signature that "certifies" the binding between such data and a public key. Moreover, it can contain some extensions to add new fields. Such certificates are currently the standard on the Web [4, 6] and for secure e-mail [14]. The possibility of adding extensions is a first method to perform access control. Indeed, the certification authority can verify possession of some credentials and can add some corresponding fields to the certificate. Then, the owner of the certificate can simply send his certificate to a service provider and, by proving (in general by means of a signature) the ownership of the certificate, he obtains all the privileges that correspond to the encoded credentials. This mechanism is very efficient but unfortunately it is not flexible. The set of valid credentials of a user is dynamic: credentials expire and new credentials could be obtained in the future. The previous solution is not only inflexible but also not privacy-aware. Indeed, adding all credentials to the digital certificate exposes user data to a privacy threat. Digital certificates are used for many purposes, and giving all credentials to a service provider that "should" not be interested in many of them can be dangerous. A solution that is both flexible and privacy-aware is to combine X509 certificates (without extensions) and attribute certificates. A user obtains an X509 certificate from a certification authority; then, when he needs a "certified" credential, he asks an attribute authority for an attribute certificate.
This authority verifies that the user owns such credentials and that the user owns the digital certificate; it then releases a specific certificate for those credentials which, however, is linked to the X509 certificate. Such a link guarantees that the attribute certificate cannot be used by other users. This solution is flexible because attribute certificates can be short-lived while X509 certificates can still be long-lived. Moreover, the user will send to a service provider only the attribute certificates that are needed to obtain the appropriate privileges.

(1) There are always updates regarding the specific implementations of these tools, which in turn trigger updates of the implementations of security architectures, but the design of such systems does not change.

The combined use of X509 and attribute certificates is thus a practical proposal that is used with success, as it seems to guarantee satisfactory security, privacy and flexibility.

Our contribution. Given this state of the art, we first discuss the power of such standard techniques for access-control based transactions, showing some weaknesses in concrete applications. Then, we discuss some useful additional tools produced by the crypto world, namely trapdoor commitment schemes and zero-knowledge sets. We show that these tools can be integrated in current standard technologies, thus making current systems more robust with respect to many concrete applications where security, privacy and usability are currently not satisfied.

2 Weaknesses of Current Standards

Assume an access-control policy is based on the nationality(2) of a user. This is a concrete example, as many access-control policies include such rules or variations of them. A citizen can get from his government (which works as an attribute authority) an attribute certificate, so that he can use it along with his X509 (identity) certificate to obtain the required privileges.

We need wildcards. There are special cases in which some specific users have strong privileges. A diplomatic representative at the United Nations would need for his missions a wildcard that allows him to have the privileges of all nationalities. This can be managed using current standards in two ways. The first way is to give him an attribute certificate where the nationality is a special string that corresponds to all nationalities. The second way is to give him an attribute certificate for each nation in the United Nations. Obviously the former solution generates a privacy threat: using this certificate, the diplomatic representative announces this special credential each time he uses the certificate, and this could be used against him. The latter solution is impractical as it requires a large batch of certificates, and in other applications (e.g., day and city of birth) this would correspond to millions of certificates. In Section 3.1 we show a solution based on a "trapdoor commitment". This crypto gadget allows the attribute authority to assign to a party a special value for the field corresponding to the nationality. The party can then only reveal his nationality to successfully complete the transaction. Instead, if a diplomatic representative receives from the attribute authority a specific trapdoor, then he will be able to use any nationality to successfully complete the transaction, and moreover the "verifier" will not notice any difference with respect to a transaction of a normal party.

We need to deal with sets.
There are more sophisticated, but still concrete and practically relevant, examples where both current standard technologies and solutions based on trapdoor commitments fail.

(2) This does not necessarily correspond to the "country" field of X509 certificates, since that value can be the nation in which the user lives, or the one in which he works, or the one in which he was born, etc. It is therefore always possible to give a specific example where the "country" does not concern the appropriate credential.

Consider again the nationality issue discussed so far. We actually have in the world citizens with more than one nationality (i.e., more than one passport). Again, having all nationalities in the same attribute certificate would represent a privacy threat (i.e., if access to a service requires Italian nationality, communicating the Italian+X nationality could be used against the user if the service provider does not like citizens from X). Moreover, having an attribute certificate for each nationality could be impractical. It is possible to use trapdoor commitments again to solve this problem. In this case the authority will not give the trapdoor to the user, but will only give him the information required to open the committed value to each nationality that belongs to the user. However, while in general a user has some nationalities, it could be the case that access to a system requires showing that the nationality is not a specific one (i.e., services from the U.S. could be restricted to people that are not from countries considered dangerous for U.S. security). Thus, having for each other country an attribute certificate stating that the user is not a citizen of that country would be really impractical(3). We finally note that the solution based on wildcards does not work here, as we do not want to give a party the power of claiming any nation, but only of affirmatively claiming some of them and negatively claiming the (exponentially many) other ones.

(3) In some cases, the set can have an exponential number of elements of which only a few are credentials owned by a user (e.g., for a k-bit string credential there are 2^k possibilities, but a user will own only poly(k) of them).

In Section 3.2 we show a solution based on "zero-knowledge sets". This crypto gadget allows an attribute authority to assign a special value for the field corresponding to the nationality. Later, the party can only reveal his nationalities and his non-nationalities to successfully complete the transactions. Moreover, this can be done even in case the size of the set of nationalities is exponential! Additionally, each time a nationality or non-nationality is shown, no additional information is learned by the service provider about the other nationalities/non-nationalities. This recently introduced primitive (it was introduced in [10] and later studied in [3] and in [2]) is quite powerful and can be of interest in other important secure systems.

The advances of the crypto world. X509 and attribute certificates simply use digital signatures and hash functions, crypto tools that were known a long time ago. Stronger results have been achieved more recently, in particular the so-called "anonymous credential systems" [9, 1]. These systems guarantee strong privacy, as a user can prove possession of credentials to many service providers while still preserving unlinkability. Such a property is not enjoyed by standard technologies, nor by the ones proposed in this work. Indeed, we discuss the standard systems where the "same" certificate is potentially sent to different service providers. Another recent result that is more compatible with standard systems is that of "oblivious attribute certificates" [8], where only qualified users obtain the services but service providers do not distinguish qualified users from unqualified ones. However, while anonymous credential systems and oblivious attribute certificates guarantee strong security, their efficiency and usability are still controversial when access control policies are complex and the space of possible credentials is large. We finally stress that these systems would have a stronger impact on the current standard technologies with respect to our work, which instead focuses on both integrating and improving current standards. We finally cite the notion of a "crypto certificate", introduced in [12, 13], where an encryption of
Another recent result that is more compatible with standard systems is that of “oblivious attribute certificates” [8] where only qualified users obtain the services but service providers do not distinguish qualified users from unqualified ones. However while anonymous credential systems and oblivious attribute certificates guarantee strong securities, their efficiency and usability is still controversial when access control policies are complex and the space of possible credentials is large. We finally stress that these systems would have a stronger impact on the current standard technologies with respect to our work that instead focuses on both integrating and improving current standards. We finally cite the notion of a “crypto certificate” introduced in [12, 13] where an encryption of 3 In some cases, the set can have an exponential number of elements where only few of them are credentials owned by a user (e.g., consider a k-bit string credential, there can be 2k possibilities but a user will have only poly(k) of them). 38 an attribute is stored in the certificate. The functionalities provided by crypto certificates are however properly included in the setting where trapdoor commitments are used instead. 3 PassePartout Certificates In this section we present our extensions for current X509 and attribute certificates in order to strength their usability, privacy and security. We use the term “PassePartout” to define a certificate that contains the special values that we discuss and that allow for much more powerful access-control based systems. The first extension is that of wildcards and is discussed in Section 3.1, while the second extension is that of zero-knowledge sets and is discussed in Section 3.2. 3.1 Adding Wildcards to Digital Certificates Here we show that an attribute certificate can contain a field that as value contains a string that does not immediately correspond to the credential, but however it binds the owner of the certificate to only one value (i.e., the credential). Such a value will be sent by the owner of the certificate in order to successfully complete the transaction. This game corresponds to a “commitment scheme” that we briefly discuss below. Commitment schemes. Intuitively a commitment scheme can be seen as the digital equivalent of a sealed envelope. If a party A wants to commit to some message m she just puts it into the sealed envelope. Whenever A wants to reveal the message, she simply opens the envelope. In order for such a mechanism to be useful, some basic requirements need to be met. The digital envelope should hide the message: no party other than A should be able to learn m from the commitment (this is often referred in the literature as the hiding property). Moreover, the digital envelope should be binding, in the sense that A can not change her mind about m, and, when checking the opening of the commitment, one can verify that the obtained value is actually the one A had in mind originally (this is often referred as the binding property). A trapdoor commitment scheme is a commitment scheme with associated a pair of public and private keys (the latter also called the trapdoor). Knowledge of the trapdoor allows the sender to open the commitment to any message of its choice (this is often referred as the equivocality property). On the other hand, without knowledge of the trapdoor, equivocality remains computationally infeasible. 
It is known how to construct an efficient trapdoor commitment scheme on top of the standard "discrete logarithm assumption" [11]. For additional details on commitment schemes, see Appendix A.

Implementing wildcards with trapdoor commitments. We propose to extend the possible values of a field of an attribute certificate by also considering trapdoor commitments. The public parameters of the scheme are chosen by the attribute authority, which also adds a trapdoor commitment to the certificate. Moreover, the authority gives the user the information to "open" such a commitment to the credential he owns. A commitment can also be opened to different values; in this case the authority gives the user additional data that allow him to open the commitment to those different values. Finally, the attribute authority can give the trapdoor to the user; in this case the user will be able to open the commitment to any value he wishes. The efficiency of the scheme proposed in [11] guarantees that such features can be used in practice without penalizing the efficiency of the system.

3.2 Adding Sets to Digital Certificates

Zero-knowledge sets. In [10], Micali et al. introduced the concept of a zero-knowledge set. There, a prover P commits to an arbitrary set S so that for any string x he can later prove to a verifier V that x ∈ S or x ∉ S. Such a proof is required to be both "sound" and "zero knowledge". The former requirement preserves the security of the verifier, since he cannot be convinced by a false proof given by an adversarial prover. The latter requirement preserves the security of the prover, since no adversarial verifier can learn more information than the mere truthfulness of the proved statements. In [2], on top of the work of [3], a general paradigm is shown for constructing efficient schemes implementing zero-knowledge sets. For additional details, see Appendix B.

Implementing sets with zero-knowledge sets. We propose to extend the possible values of a field of an attribute certificate by considering zero-knowledge sets. The crucial point is that we want to give the user a certificate that certifies both the credentials he owns and the ones he does not own, giving him the possibility of showing both possession and non-possession. The public parameters of the scheme are chosen by the attribute authority, which also adds to the certificate the commitment to a zero-knowledge set. The user can therefore send his certificate, showing that it actually contains some given credentials and also showing that it does not contain some other credentials. We stress that a given attribute authority gives only one attribute certificate to a user for a given type of credentials. This is the only way to give meaning to a proof that a user does not possess a credential certified by a given authority, when it is not encoded in the one attribute certificate released by that authority.

References

[1] J. Camenisch and A. Lysyanskaya. An Efficient Non-Transferable Anonymous Multi-Show Credential System with Optional Anonymity Revocation. In Advances in Cryptology – Eurocrypt '01, volume 2045 of Lecture Notes in Computer Science, pages 93–118. Springer-Verlag, 2001.
[2] D. Catalano, Y. Dodis, and I. Visconti. Mercurial commitments: Minimal assumptions and efficient constructions. In 3rd Theory of Cryptography Conference (TCC '06), Lecture Notes in Computer Science. Springer-Verlag, 2006.
[3] M. Chase, A. Healy, A. Lysyanskaya, T. Malkin, and L. Reyzin. Mercurial commitments with applications to zero-knowledge sets.
[3] M. Chase, A. Healy, A. Lysyanskaya, T. Malkin, and L. Reyzin. Mercurial commitments with applications to zero-knowledge sets. In Advances in Cryptology – Eurocrypt ’05, volume 3494 of Lecture Notes in Computer Science, pages 422–439. Springer-Verlag, 2005.

[4] T. Dierks and C. Allen. The TLS Protocol Version 1.0. Network Working Group, RFC 2246, 1999.

[5] S. Farrell and R. Housley. An Internet Attribute Certificate Profile for Authorization. Network Working Group, RFC 3281, April 2002.

[6] A. O. Freier, P. Karlton, and P. C. Kocher. The SSL Protocol Version 3.0. Transport Layer Security Working Group, Internet Draft, 1996. http://home.netscape.com/eng/ssl3.

[7] R. Housley, W. Polk, W. Ford, and D. Solo. Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile. Network Working Group, RFC 3280, April 2002.

[8] J. Li and N. Li. OACerts: Oblivious attribute certificates. In Proceedings of the 3rd Conference on Applied Cryptography and Network Security (ACNS), volume 3531 of Lecture Notes in Computer Science, pages 301–317. Springer-Verlag, 2005.

[9] A. Lysyanskaya, R. Rivest, A. Sahai, and S. Wolf. Pseudonym Systems. In Selected Areas in Cryptography (SAC ’99), volume 1758 of Lecture Notes in Computer Science. Springer-Verlag, 1999.

[10] S. Micali, M. Rabin, and J. Kilian. Zero-knowledge sets. In 44th IEEE Symposium on Foundations of Computer Science (FOCS ’03), pages 80–91, 2003.

[11] T. Pedersen. Non-interactive and information-theoretic secure verifiable secret sharing. In Advances in Cryptology – Crypto ’91, volume 576 of Lecture Notes in Computer Science, pages 129–140. Springer-Verlag, 1992.

[12] P. Persiano and I. Visconti. User Privacy Issues Regarding Certificates and the TLS Protocol (The Design and Implementation of the SPSL Protocol). In 7th ACM Conference on Computer and Communications Security (CCS ’00), pages 53–62. ACM, 2000.

[13] P. Persiano and I. Visconti. A secure and private system for subscription-based remote services. ACM Transactions on Information and System Security, 6(4):472–500, 2003.

[14] B. Ramsdell. S/MIME Version 3 Certificate Handling. http://www.ietf.org/rfc/rfc2632.txt, 1999.

A Commitment Schemes

A commitment scheme is a primitive to generate and open commitments. More precisely, a commitment scheme is a two-phase protocol between two probabilistic polynomial-time algorithms, sender and receiver. In the first phase (the commitment phase) sender commits to a bit b using some appropriate function Com, which takes as input b and some auxiliary value r and produces as output a value y. The value y is sent to receiver as a commitment to b. In the second phase (called the decommitment phase) sender “convinces” receiver that y is actually a valid commitment to b. The requirements that we make on a commitment scheme are the following. First, if both sender and receiver behave honestly, then at the end of the decommitment phase receiver is convinced that sender had committed to bit b with probability 1. This is often referred to as the correctness requirement. Second, a dishonest receiver cannot guess b with probability significantly better than 1/2. This is the so-called hiding property. Finally, a cheating sender should be able to open a commitment (i.e., to decommit) to both b and 1 − b only with very small (i.e., negligible) probability (this is the binding property). For readability, we will use “for all x” to mean any possible string x of length polynomial in the security parameter. We start with the standard notion of a commitment scheme and its two main variants (i.e., unconditionally binding and unconditionally hiding).
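Before the formal definitions, the following is a minimal sketch of the (Gen, Com, Ver) interface used below, instantiated here, purely as an illustrative assumption of ours, with a salted hash rather than with the number-theoretic constructions cited above; such a scheme is binding under collision resistance of the hash function and hiding only heuristically (e.g., modelling the hash as a random oracle).

    # Illustrative (Gen, Com, Ver) triple based on a salted hash (not the schemes cited above).
    import hashlib, secrets

    def gen(k=128):
        # Public commitment parameters: here just a random string acting as a domain separator.
        return secrets.token_bytes(k // 8)

    def com(crs, v: bytes):
        dec = secrets.token_bytes(16)                 # decommitment key r
        c = hashlib.sha256(crs + dec + v).digest()    # commitment y = H(crs, r, v)
        return c, dec

    def ver(crs, c, dec, v: bytes):
        return c == hashlib.sha256(crs + dec + v).digest()

    crs = gen()
    c, dec = com(crs, b"my credential")
    assert ver(crs, c, dec, b"my credential")         # correctness
    assert not ver(crs, c, dec, b"another value")     # a different value does not verify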
Note that all definitions will use a commitment generator function that outputs the commitment parameters. Therefore, such commitments have a straightforward implementation in the common reference string model where a trusted third party generates a reference string that is later received as common input by all parties. In some cases the commitment parameters can be deterministically extracted from a random string; in such cases the corresponding commitments can be implemented in the shared random string model which is a set-up assumption weaker than the common reference string model. For the sole sake of simplicity, in the following definitions, we consider the case in which the commitment parameters are used for computing a single commitment. However all the definitions can be extended (e.g., strengthening the computational indistinguishability so that it holds even in case the distinguisher has on input the trapdoor information) and then the same commitment parameters can be used for any polynomial number of commitments (and actually all our results hold in this stronger setting). Definition A.1 (Gen, Com, Ver) is a commitment scheme if: - efficiency: Gen, Com and Ver are polynomial-time algorithms; - correctness: for all v it holds that Prob crs ← Gen(1k ); (com, dec) ← Com(crs, v) : Ver(crs, com, dec, v) = 1 = 1; - binding: for any polynomial-time algorithm sender there is a negligible function ν such that Prob crs ← Gen(1k ); (com, v0 , v1 , dec0 , dec1 ) ← sender (crs) : Ver(crs, com, dec0 , v0 ) = Ver(crs, com, dec1 , v1 ) = 1) ≤ ν(k); - hiding: for all crs generated with non-zero probability by Gen(1k ), for all v0 , v1 where |v0 | = |v1 | the probability distributions: {(com0 , dec0 ) ← Com(crs, v0 ) : com0 } and {(com1 , dec1 ) ← Com(crs, v1 ) : com1 } are computationally indistinguishable. If the binding property holds with respect to any computationally unbounded algorithm sender, the commitment scheme is said unconditionally binding; if instead, the hiding property holds with respect to any computationally unbounded distinguisher algorithm receiver, the commitment scheme is said unconditionally hiding. We now give the definition of a trapdoor commitment scheme. 41 42 Definition A.2 (Gen, Com, TCom, TDec, Ver) is a trapdoor commitment scheme if Gen(1k ) outputs a pair (crs, aux), Gencrs is the related algorithm that restricts the output of Gen to the first element crs, (Gencrs , Com, Ver) is a commitment scheme and TCom and TDec are polynomial-time algorithms such that: - trapdoorness: for all v the probability distributions: {(crs, aux) ← Gen(1k ); (com, dec) ← Com(crs, v) : (crs, com, dec, v)} and {(crs, aux) ← Gen(1k ); (com , auxcom ) ← TCom(aux); dec ← TDec(auxcom , v) : (crs, com , dec , v)} are computationally indistinguishable. Other Notions of Commitments. In [3] Chase et al. considered two different ways for computing and opening commitments introducing the notion of mercurial commitment schemes. In such schemes, the sender is allowed to compute hard and soft commitments. An hard commitment is a classical unconditionally binding commitment. A soft commitment, on the other hand, can be teased (i.e., partially open) to any value by the sender, but can not be (fully) opened. In this sense, soft commitments are quite different than trapdoor commitments as they can be teased to any value but can not actually be opened to any of them. The sender can also tease an hard commitment as the same value that he can open. 
An important property of mercurial commitment schemes is that, by looking at a commitment, it is computationally infeasible to decide whether it is a hard or a soft commitment. More precisely, a mercurial commitment is secure if there exists a simulator that can produce commitments that it can later open or tease to any value and whose distribution remains indistinguishable from the distribution of the commitments produced by the legitimate sender. Non-interactive mercurial commitments have been constructed, in the shared random string model, under the assumption that non-interactive zero-knowledge proof systems exist [3]. Mercurial commitments can be used to construct zero-knowledge sets by only adding the assumption that collision-resistant hash functions exist (see below).

By looking at the properties of mercurial commitments it may seem that they are actually a more powerful primitive than hybrid commitments. This intuition might seem to explain the current gap between the complexity-based assumptions used to construct non-interactive mercurial commitments (i.e., NIZK proofs) and non-interactive hybrid trapdoor commitments (i.e., one-way functions) in the shared random string model. In this work, we show that this intuition is wrong by showing that non-interactive hybrid trapdoor commitments suffice for constructing non-interactive mercurial commitments in the shared random string model. Finally, we define mercurial commitments.

Definition A.3 (Mercurial Commitments) (Setup, Hard, Soft, Tease, Open, VerTease, VerOpen) is a mercurial commitment scheme if:

- efficiency: The algorithms Setup, Hard, Soft, Tease, Open, VerTease and VerOpen run in polynomial time.

- correctness: Let crs be the output of Setup on input the security parameter k. For all messages v it holds that

  Hard correctness:
  Pr[ (com, dec) ← Hard(crs, v); y ← Tease(v, dec); z ← Open(dec) : VerTease(crs, com, v, y) = 1 ∧ VerOpen(crs, com, v, z) = 1 ] = 1

  Soft correctness:
  Pr[ (com, dec) ← Soft(crs); y ← Tease(v, dec) : VerTease(crs, com, v, y) = 1 ] = 1

- binding: For any polynomial-time algorithm sender there is a negligible function ν such that
  Pr[ crs ← Setup(1^k); (com, dec, v0, v1, y, z) ← sender(crs) : VerOpen(crs, com, v0, z) = 1 ∧ (VerOpen(crs, com, v1, y) = 1 ∨ VerTease(crs, com, v1, y) = 1) ∧ (v0 ≠ v1) ] ≤ ν(k)

- hiding: There exist four polynomial-time algorithms (Sim-Setup, Sim-Com, Sim-Open, Sim-Tease) described as follows:
  Sim-Setup - This algorithm takes as input a security parameter k and produces as output the common parameters crs and some auxiliary information aux.
  Sim-Com - On input aux, it computes a simulated commitment C = (com, dec).
  Sim-Open - On input a message m and dec, it outputs a simulated decommitment key π.
  Sim-Tease - On input a message m and dec, it outputs the simulated teasing τ.

  We require that for all polynomially bounded receiver there exists a negligible function ν such that
  | Pr[ crs0 ← Setup(1^k) : receiver^O0(crs0) = 1 ] − Pr[ (crs1, aux1) ← Sim-Setup(1^k) : receiver^O1(crs1) = 1 ] | ≤ ν(k)
  where Ob operates as follows.

1. O0 initializes L as the empty list and answers hard commit, soft commit, tease and open queries as follows. On input (Hard, v) it computes (com, dec) = Hard(crs0, v), stores (Hard, com, dec, v) in L and outputs com. On input (Soft, v) it computes (com, dec) = Soft(crs0), stores (Soft, com, dec, v) in L and outputs com. On input (Tease, com, v') it checks if com ∈ L. If not it answers fail, otherwise it retrieves from L the corresponding information.
If com was a hard commitment on v', or if com was a soft commitment, O0 outputs y ← Tease(v', dec). Otherwise O0 outputs fail. On input (Open, com, v') it checks if com ∈ L. If not it answers fail, otherwise it retrieves from L the corresponding information. If com was a hard commitment on v', O0 outputs z ← Open(dec). Otherwise O0 outputs fail.

2. O1 initializes L as the empty list and answers the queries above as follows. On input (Hard, v) it computes (com, dec) = Sim-Com(aux1), stores (Hard, com, dec, v) in L and outputs com. On input (Soft, v) it computes (com, dec) = Sim-Com(aux1), stores (Soft, com, dec) in L and outputs com. On input (Tease, com, v') it checks if com ∈ L. If not it answers fail, otherwise it retrieves from L the corresponding information. If com was a hard commitment on v', or if com was a soft commitment, O1 outputs y ← Sim-Tease(v', dec). Otherwise O1 outputs fail. On input (Open, com, v') it checks if com ∈ L. If not it answers fail, otherwise it retrieves from L the corresponding information. If com was a hard commitment on v', O1 outputs z ← Sim-Open(v', dec). Otherwise O1 outputs fail.

B Zero-Knowledge Sets

In [10], Micali et al. introduced the concept of a zero-knowledge set. There a prover P commits to an arbitrary set S so that for any string x he can later prove to a verifier V that x ∈ S or x ∉ S. Such a proof is required to be both sound and zero knowledge. The former requirement preserves the security of the verifier, since he cannot be convinced by a false proof given by an adversarial prover. The latter requirement preserves the security of the prover, since no adversarial verifier can learn more information than the mere truthfulness of the proved statements. A slight variation (actually an extension) of zero-knowledge sets is that of zero-knowledge elementary databases, where x is considered a key and v(x) is the corresponding datum. In this case, for any key x the prover either proves that x ∉ S or proves that x ∈ S and v(x) = u, still preserving soundness and zero knowledge. We will focus on zero-knowledge sets, but all the discussions and results extend also to zero-knowledge elementary databases.

Mercurial commitments for zero-knowledge sets. In [3], Chase et al. introduced the concept of a mercurial commitment along with its application to the construction of zero-knowledge sets. More specifically, zero-knowledge sets are constructed by using collision-resistant hash functions and mercurial commitments. The concept of mercurial commitments has later been investigated in [2], where a general paradigm for constructing efficient mercurial commitments (and thus efficient zero-knowledge sets) is presented.

A Graphical PIN Authentication Mechanism for Smart Cards and Low-Cost Devices∗

Luigi Catuogno
Dipartimento di Informatica ed Applicazioni, Università di Salerno - ITALY
[[email protected]]

Clemente Galdi
Dipartimento di Scienze Fisiche, Università di Napoli “Federico II” - ITALY
[[email protected]]

∗ This work was partially supported by the European Union under IST FET Integrated Project AEOLUS (IST-015964).

Abstract

Passwords and PINs are still the most deployed authentication mechanisms and their protection is a classical branch of research in computer security. Several password schemes, as well as more sophisticated tokens, algorithms, and protocols, have been proposed during the last years. Some proposals require dedicated devices, such as biometric sensors, whereas others have high computational requirements.
Graphical passwords are a promising research branch, but the implementation of many proposed schemes often requires considerable resources (e.g., data storage, high quality displays), making their usage difficult on small devices, such as old-fashioned ATM terminals, smart cards and many low-price cellular phones. In this paper we present a graphical mechanism that handles authentication by means of a numerical PIN, which users have to type on the basis of a secret sequence of objects and a graphical challenge. The proposed scheme can be instantiated in a way that requires low computational capabilities, making it also suitable for small devices with limited resources. We prove that our scheme is effective against “shoulder surfing” attacks.

Introduction

Passwords and PINs are still the most deployed authentication mechanism, although they suffer from relevant and well-known weaknesses [1]. The protection of passwords is a classical branch of research in computer security. Several important improvements to the old-fashioned alphanumeric passwords, according to the context of different applications, have been proposed in the last years. Indeed, the literature on authentication and passwords is huge; here we just cite Kerberos [13] and S/Key [6]. Two important aspects in dealing with passwords are the following:

1. Passwords should be easy enough to be remembered but strong enough to avoid guessing attacks;

2. The authentication mechanism should be resilient against classical threats, like shoulder surfing attacks, i.e., the capability of recording the interaction between the user and the terminal; moreover, it should be light enough to be used also on small computers.

Consider for example the following scenario. To access ATM services, a user needs a magnetic strip card. In order to be authenticated, the user inserts her card (which carries only her identification data) into the ATM reader and types her four-digit PIN; afterwards, the ATM sends the user’s credentials to the remote authentication server through a PSTN network. This approach is really weak. Magnetic strip cards can be easily cloned and PINs can be collected in many ways. For example, an adversary could have placed a hidden micro-camera pointing at the ATM panel somewhere in the neighborhood. A recent tampering technique is accomplished by means of a skimmer, i.e., a reader equipped with an EPROM memory that is glued upon the ATM reader, so that the strips of passing cards can be dumped to the EPROM. A forged spotlight is also placed upon the keyboard in order to record the insertion of the PIN. The skimmer allows adversaries to collect a finite number of user sessions, obtaining all the information needed to clone user cards. Such information, coupled with the images taken by the camera, allows the attacker to correctly authenticate to the ATM. Such an attack is known in the literature as a “shoulder surfing” attack.

Graphical passwords [2, 10, 3, 7, 8, 12, 16, 9, 11, 15] are a promising authentication mechanism that addresses many drawbacks of old-style password/PIN-based schemes. The basic idea is to ask the user to click on some predefined parts of an image displayed on the screen by the system, according to a certain sequence. Such a method has been improved during the last years, in order to obtain schemes offering enhanced security and usability.
Despite its importance, little attention has been devoted to graphical password schemes resilient to shoulder surfing attacks. In particular, [11] first addressed this problem under restricted conditions. Subsequently, [15] presented a graphical password scheme that was claimed to be secure against shoulder surfing attacks. However, this scheme has been proved not to be secure in [5]. For a wider overview of research on graphical passwords, we refer the reader to the survey by Suo et al. [14] and to the web site of the “Graphical Passwords Project” [4] at Rutgers.

The majority of the proposed schemes require costly hardware (e.g., medium or high resolution displays and graphic adapters, touch screens, data storage, high computational resources, etc.). This makes some of the proposed schemes not suitable for implementation on low-cost equipment (e.g., current ATM terminals, which are still the overwhelming majority). In this paper we propose a graphical PIN scheme based on the challenge-response paradigm that can be instantiated in a way that requires low computational capabilities, making it also suitable for small devices with limited resources. The design of the scheme follows three important guidelines:

• The scheme should be independent of the specific set of objects that are used for the graphical challenge. In particular, our scheme can be deployed also on terminals that are equipped with small-sized or cheap displays, like the ones of cellular phones, or through the classical 10-inch CRT monitor (either color or monochrome) that still equips thousands of ATM terminals. Moreover, it should be possible to compose user responses with a simple keypad as well as with any sophisticated pointing device.

• The generation of challenges and the verification of the user’s responses should be affordable also by computers with limited computational resources (e.g., as in the “smart card scenario” described above).

• The user is simply required to recognize the position of some objects on the screen. She is not required to compute any function.

We present a strategy that can withstand shoulder surfing attacks. This strategy is independent of the specific set of objects that are used to construct the challenge.

1 Our Proposal

In this paper we assume that the terminal used by the user cannot be tampered with. In other words, an adversary is allowed to record the challenges displayed by the terminal and the activity of the user, but she is not allowed to alter in any way the behaviour of the parties. The protocols described in this paper belong to the family of challenge and response authentication schemes, where the system issues a random challenge to the user, who is required to compute a response according to the challenge and to a secret shared between the user and the system. More precisely, a challenge consists of a picture depicting a random arrangement of some objects (e.g., colored geometrical shapes) in a matrix. The challenge is displayed on the screen. We denote by O the set of all distinct objects and by q its cardinality. A challenge is represented as a sequence α = (o_1, ..., o_|α|), where o_i is an object drawn from O. During her authentication session, the user is required to type as PIN the position of a sequence of secret objects in the challenge matrix. It is clear that the PIN typed by the user changes in each session as the challenge changes, since it is simply the proof that the user knows the secret sequence of objects and can therefore correctly reply to the current challenge.
To be more precise, the secret is a sequence of m questions, called queries. Each query is a question of the following type: “On which row of the screen do you see the object o?”. Since the questions are chosen independently, the set of possible secrets has size |O|^m. Upon reception of a challenge, the user is required to compute a response, according to the secret queries shared with the system. A response is a vector β = (β_1, ..., β_m), where each β_i is a number drawn from a set A = {0, 1, ..., a − 1}, representing the answer to the i-th query according to the challenge. A session transcript is a pair τ = (α, β), where α is a challenge and β is the user response to α.

We stress that the set of objects used to construct the challenges has an impact on the usability of the scheme. For example, it is easier to remember a sequence of pictures like “home, dog, cat” than a sequence of geometrical shapes, like “blue triangle, green circle, yellow square”. On the other hand, complex objects cannot be displayed/managed on low-cost devices. Our scheme is independent of the specific set of objects. This makes it suitable for deployment both on complex and on simple devices.

1.1 Different authentication strategies.

Given the above authentication scheme, we have analyzed three different authentication strategies. In the first strategy, the user is required to correctly answer all the questions in her secret. A second strategy is to allow the user to correctly answer only a subset of her secret questions. We have considered the case in which the user correctly answers at least k out of m questions of her choice, while she is allowed to give random answers to the remaining queries. The last strategy we have analyzed consists in requiring the user to correctly answer exactly k out of m queries while giving wrong answers to the remaining ones.

Notice that the last two strategies differ in the sense that wrong answers do give information about the user secret, in contrast to random answers, which do not give any information on the user secret. For the above strategies we have evaluated the probability with which an adversary can extract the user secret as a function of the number of recorded sessions. Notice that the goal of the adversary may not be secret extraction but, more simply, a one-time authentication. We notice that, typically, in the scenario we consider the adversary cannot use a “brute force” attack since, for example, the strip card would be disabled after three unsuccessful authentications. For this reason the adversary should recover either the whole secret or “almost” the whole secret before trying the authentication.

2 Experimental Evaluation

In this section we give an experimental evaluation of the performance of the strategies presented above. For each strategy we report the number of session transcripts that the adversary needs to intercept in order to extract the user secret with probability either 0.95 or 0.99. In order to present concrete examples, we fix the number of objects to be either 36 or 80. The value 36 has been chosen so that all the objects can be displayed on a low-resolution display, e.g., the ATM case. The value 80 could be used in case the device used for displaying the objects is a more advanced one. Furthermore, we fix the number m of queries the user should answer to 15. This choice is due to the fact that (a) it should not be hard for a human to remember 15 objects and (b) the probability of a blind attack is negligible.
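As an illustration of the challenge-response mechanism just described, the following is a minimal sketch of challenge generation and response verification for the first (“always correct”) strategy; the matrix layout, the parameter values and the function names are our own assumptions and do not reproduce the authors’ prototype.

    # Sketch of the graphical PIN scheme: the secret is a sequence of m objects; the
    # challenge arranges the q objects in a matrix with `a` rows; the response is, for
    # each secret object, the row in which it currently appears.
    import random

    q, a, m = 36, 6, 15                      # number of objects, rows (answer set size), secret length
    objects = list(range(q))

    def new_secret():
        return [random.choice(objects) for _ in range(m)]      # queries chosen independently

    def new_challenge():
        arrangement = objects[:]
        random.shuffle(arrangement)
        # map each object to the row of the a x (q/a) matrix in which it lands
        return {o: arrangement.index(o) * a // q for o in objects}

    def response(secret, challenge):
        return [challenge[o] for o in secret]                  # what the honest user types as PIN

    def verify(secret, challenge, answer):
        return answer == response(secret, challenge)           # "always correct" strategy

    secret, challenge = new_secret(), new_challenge()
    assert verify(secret, challenge, response(secret, challenge))
    # A blind guess succeeds with probability a**(-m), e.g. 6**(-15) with these parameters.

The k-out-of-m variants mentioned above would simply relax verify() so that only a subset of the answers needs to match.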
Table 1 summarizes the performance of the first strategy, in which the user correctly answers all the questions in her secret. In particular, we report the number of sessions the adversary needs in order to successfully compute the user’s secret either with probability 0.95 or with probability 0.99.

Always Correct    q=36, a=2   q=36, a=6   q=80, a=6   q=80, a=8
p = 0.95              14           6          15           5
p = 0.99              16           7          17           6

Table 1: Number of sessions needed to extract the user secret with probability at least p in the m = 15 query case.

As for the second strategy, the results reported in Table 2 refer to the single query case. This means that the adversary needs to collect at least x correct answers from the user. If we extend to the multiple query case, we need to consider that in each session the user answers correctly only a fraction of the queries. The value of the fraction of correct answers depends, for some technicalities, on the size of the answer set A. As for the multiple query case, Table 3 reports the expected number of sessions that the adversary needs to collect in order to extract the user’s secret. The last row indicates the probability with which the user correctly answers a query. The multiple query case is strictly related to the Group Coupon Collector’s Problem. Since we are not aware of any result on this problem, we have obtained these results by simulation.

Correct & Random   q=36, a=2   q=36, a=6   q=80, a=2   q=80, a=8
p = 0.95               10           4          11           4
p = 0.99               12           5          13           5

Table 2: Number of correct sessions needed to extract the user secret with probability at least p in the single query case.

Correct & Random   q=36, a=2   q=36, a=6   q=80, a=2   q=80, a=8
p = 0.95               15           8          17          11
p = 0.99               18          11          20          15
c                     3/2           3         3/2           4

Table 3: Number of sessions needed to extract the user secret with probability at least p in the m = 15 query case.

As for the last strategy, let c be the number of questions the user correctly answers in each authentication. Table 4 reports the number of sessions an adversary needs to collect in order to extract the user secret with probability at least 0.95 or 0.99.

Correct & Wrong   q=36, a=2   q=36, a=2   q=36, a=6   q=36, a=6   q=80, a=2   q=80, a=2   q=80, a=8   q=80, a=8
p = 0.95              16          24          10          16          16          24          10          16
p = 0.99              20          36          12          16          24          36          10          16
c                    m/2         m/4         m/2         m/4         m/2         m/4         m/2         m/4

Table 4: Number of sessions needed to extract the user secret with probability at least p in the m = 15 query case.

3 Conclusion

In this paper we have presented a simple graphical PIN authentication mechanism that is resilient against shoulder surfing. Our scheme is independent of the specific set of objects used to construct the challenges. Depending on the specific strategy, the adversary may fail in impersonating the user even if she manages to obtain as many as 36 transcripts. The scheme may be implemented on low-cost devices and does not require any special training for the users. The user only needs to remember a small sequence of objects. Finally, the authentication requires a single round of interaction between the user and the terminal. We have also discussed a prototype implementation. The analysis of the scheme considers the probability of extracting the user’s secret rather than that of a successful “one-time” authentication. Since the number of attempts the adversary can try before the user is disabled is limited to three, we believe that the number of sessions needed by the adversary in the latter case does not differ significantly from the one needed for the former goal.

References
[1] Ross J. Anderson. Why cryptosystems fail. Commun. ACM, 37(11):32–40, 1994.

[2] G. E. Blonder. Graphical passwords. Lucent Technologies Inc., Murray Hill, NJ (US), US Patent no. 5559961, 1996.

[3] R. Dhamija and A. Perrig. Déjà Vu: A user study using images for authentication. In IX USENIX UNIX Security Symposium, Denver, Colorado (USA), August 14-17, 2000.

[4] J. C. Birget et al. Graphical Passwords Project. http://clam.rutgers.edu/~birget/grPssw, 2002.

[5] Philippe Golle and David Wagner. Cryptanalysis of a cognitive authentication scheme (short paper). In 2007 IEEE Symposium on Security and Privacy, to appear.

[6] Neil M. Haller. The S/KEY one-time password system. In Proceedings of the Symposium on Network and Distributed System Security, pages 151–157, 1994.

[7] W. Jansen, S. Gavrila, V. Korolev, R. Ayers, and R. Swanstrom. Picture password: a visual login technique for mobile devices. National Institute of Standards and Technology Interagency Report NISTIR 7030, 2003.

[8] I. Jermyn, A. Mayer, F. Monrose, M. K. Reiter, and A. D. Rubin. The design and analysis of graphical passwords. In Proceedings of the 8th USENIX Security Symposium, Washington D.C. (US), August 23-26, 1999.

[9] Shushuang Man, Dawei Hong, and Manton M. Matthews. A shoulder-surfing resistant graphical password scheme - WIW. In Proceedings of the International Conference on Security and Management, SAM ’03, June 23-26, 2003, Las Vegas, Nevada (US), volume 1, pages 105–111, June 2003.

[10] A. Perrig and D. Song. Hash visualization: A new technique to improve real-world security. In Proceedings of the 1999 International Workshop on Cryptographic Techniques and E-Commerce, 1999.

[11] Volker Roth, Kai Richter, and Rene Freidinger. A PIN-entry method resilient against shoulder surfing. In CCS ’04: Proceedings of the 11th ACM Conference on Computer and Communications Security, pages 236–245, New York, NY, USA, 2004. ACM Press.

[12] L. Sobrado and J. C. Birget. Graphical password. The Rutgers Scholar, an Electronic Bulletin for Undergraduate Research, 4, 2002.

[13] Jennifer G. Steiner, B. Clifford Neuman, and Jeffrey I. Schiller. Kerberos: An authentication service for open network systems. In USENIX Winter, pages 191–202, 1988.

[14] Xiaoyuan Suo, Ying Zhu, and G. Scott Owen. Graphical passwords: a survey. In Proceedings of the 21st Annual Computer Security Applications Conference (ACSAC 2005), December 5-9, Tucson, AZ (US), pages 463–472, December 2005.

[15] Daphna Weinshall. Cognitive authentication schemes safe against spyware (short paper). In IEEE Symposium on Security and Privacy, pages 295–300. IEEE Computer Society, 2006.

[16] S. Wiedenbeck, J. Waters, L. Sobrado, and J. C. Birget. Design and evaluation of a shoulder-surfing resistant graphical password scheme. In Proceedings of Advanced Visual Interfaces (AVI 2006), Venice, Italy, May 23-26, 2006.

SMTP sniffing for intrusion detection purposes

Maurizio Aiello*, David Avanzini*, Davide Chiarella†*, Gianluca Papaleo†*
*National Research Council, IEIIT, Genoa
†University of Genoa, Department of Computer and Information Sciences, Italy

Abstract. Internet e-mail has become one of the most important ways for people and enterprises to communicate with each other. However, this system is in some cases used for malicious purposes. A great problem is worm and spam spreading. A smart e-mail content checking system can help to detect these kinds of threats. We propose a way to capture, store and display e-mail transactions through SMTP packet sniffing.
We worked on pcap files dumped by a packet sniffer and containing the SMTP traffic packets of a real network. After reassembling the TCP streams and SMTP commands, we store the captured e-mails in a database: for privacy reasons, only e-mail headers are stored. Having a tool for clearly understanding and monitoring SMTP transactions may help in security management tasks.

Keywords: SMTP, E-mail, Intrusion Detection, Network Security, Worm Detection, sniffing.

1 Introduction

Nowadays e-mail has already changed people’s life and work habits thoroughly: in fact the majority of people use the e-mail system to share any type of information, ranging from business purposes to illegal ones. A pestering and energy-draining problem for mail domain administrators is answering users’ requests regarding e-mails that were sent but never arrived at their destination, or messages that they should have received but did not. However, the main and most important problem related to e-mail utilization is virus and spam diffusion: software like SpamAssassin [1] performs e-mail content checking in order to limit spam, while Amavis [2] scans e-mail attachments using third-party virus scanners in order to detect viruses; both, however, have to be installed on every mail server in order to work. In all these cases the use of a packet sniffer can be a good solution to the problem; in fact it centralizes the work, enabling administrators to monitor all the e-mail traffic of several servers with just one installation, making it possible to build a gateway defense system and permitting the identification of all SMTP traffic, including e-mails sent to unknown servers or from infected hosts (trojans, viruses, etc.). In fact, in some cases the misconfiguration of firewalls permits a normal user to install and use an SMTP server on his own machine without administrator authorization, or to use an SMTP server to send mail that is not internal to the company. The disadvantage of our solution is that the use of ESMTP-TLS and PGP makes the packet sniffer useless. While PGP is a server-independent solution, ESMTP is bound to both servers: source and destination have to use ESMTP with TLS enabled to permit encrypted e-mail traffic; if one of the two servers does not have TLS enabled, the communication is not encrypted.

The purpose of this paper is to present a smart solution to check e-mail for intrusion detection purposes. We propose a program based on SMTP flux reassembling [3][9][10] that can be useful for e-mail auditing and intrusion detection purposes.

2 Packet sniffing

To allow our system to work we need information about the packets within the network we are interested in monitoring. We implemented a packet sniffing program to analyze the SMTP traffic using the Perl library Net::Pcap [4]. The sniffer process begins by determining which interface to sniff on. The function lookupdev() is used to get the network interface and the function open_live() to set the interface to promiscuous mode for sniffing. Then the functions compile() and setfilter() are used to set a capture filter; we can filter the packet sniffing on port 25, the standard port on which mail servers listen for connections. If we have multiple networks, we can further filter our traffic, picking the network we are interested in. The function dump_open() allows the sniffer to dump the captured packets to a pcap file. Finally, the function loop() starts the capture.
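The authors’ implementation is based on the Perl Net::Pcap library. Purely as an illustration of the same capture workflow, the following sketch uses Python with the Scapy library (our own choice, not part of the original system): it filters TCP port 25 and writes overlapping pcap slices for the reassembler to pick up; the interface name and slice sizes are arbitrary assumptions.

    # Illustrative capture step: sniff SMTP traffic and dump it to rolling, overlapping pcap slices.
    from scapy.all import sniff, wrpcap

    SLICE = 1000      # packets per pcap slice (configurable, like the "every n packets" above)
    OVERLAP = 50      # packets shared between consecutive slices, so no stream is lost at a boundary

    buffer, slice_id = [], 0

    def handle(pkt):
        global buffer, slice_id
        buffer.append(pkt)
        if len(buffer) >= SLICE:
            wrpcap("smtp_%05d.pcap" % slice_id, buffer)   # dump the current slice for the reassembler
            slice_id += 1
            buffer = buffer[-OVERLAP:]                    # keep an overlap with the next slice

    # Promiscuous capture restricted to SMTP traffic via a BPF filter on TCP port 25.
    sniff(iface="eth0", filter="tcp port 25", prn=handle, store=False)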
The pcap files are then passed to the reassembler: the reassembler can be located on the same machine as the sniffer or, thanks to SMTPSniffer’s modularity, on another host. To allow the reassembler to read pcap files in on-line mode (almost in real time), the sniffer creates a pcap file every n packets sniffed on the network. The pcap files are overlapped to be sure not to lose packets between the different slices. The size of the pcap files is set by the user. The architecture of the system is shown in Figure 1.

Figure 1. System architecture

3 Reassembler

The next phase of our system is to take the pcap files produced by the sniffer and reconstruct the whole e-mail sent by a client. To do this we use the Perl library Net::Analysis [5]. To allow SMTPSniffer to work in on-line mode, a scheduler checks every minute whether there are new pcap files and then passes the new pcap files to the reassembler. The scheduler has been built using the Schedule::Cron [6] library. First of all, for every packet captured we store in a file the source IP and port, the destination IP and port, the timestamp and the data part of the packet. The timestamp represents the number of seconds between the present date and the Unix Epoch (January 1st, 1970).

Figure 2. Flux hash table

Once we have obtained all the information of interest about the TCP packets, we have to reconstruct every SMTP session. To do this, we use two hash tables: the first hash table is called flux (Figure 2) and represents all the SMTP sessions.

Figure 3. Timeflux hash table

The second hash table is called timeflux (Figure 3); it contains an ordered list of timestamps for every SMTP session and is used to reorder all the SMTP fluxes. For readability, the timestamp format of the timeflux hash table in Figure 3 is YYYY/MM/DD hh:mm:ss. During normal operations, if we find a TCP packet related to SMTP port 25, we check whether the key already exists in the flux table: if it does, we append the data part of the packet to the corresponding value of the flux hash table; otherwise we create a new entry in both the flux hash table and the timeflux hash table. Once we have the information of all the e-mails sent, the next step is to reorder all the fluxes according to their timestamps (this information is taken from the timeflux hash table). For privacy reasons, all message body parts are cut.

4 Database features

Once we have reassembled the flux streams of our e-mails, the system dumps the information about the e-mails into a MySQL database. The information about the e-mails stored in the database is: timestamp (converted into a human-readable format), mail client name (obtained by a reverse lookup), mail client IP, mail server IP, sender address, receiver address, SMTP code (e.g., 250, 450), and e-mail size. The usefulness of having all the mail stored in a database is the capability to search the information we are interested in at any time through DB queries. For example, you can find how and which e-mails have been sent by a user or to a particular user. Another choice is to see the whole e-mail traffic in a certain period of time or to list all the e-mails rejected by your e-mail server. In Figure 4 we can see an example of e-mails captured and stored in our database.

Figure 4. Database example

5 Test scenario

We tested SMTPSniffer in a network segmented into several subnets with a total of about 400 hosts.
We compared the results of reassembling the SMTP sessions by analyzing a single big pcap file (off-line mode) and by analyzing several overlapping pcap slices (on-line mode), and in both cases we were able to reconstruct all the e-mails exactly. In Figure 5 we can see a scheme of our test scenario. The SMTPSniffer PC is attached to our network through two interfaces; one interface is connected to the hub to allow SMTPSniffer to capture all the traffic leaving the network (this interface is set in promiscuous mode), while the other interface is connected to the switch to communicate with the hosts in our network. The purpose of having an interface connected to the switch is to control the SMTPSniffer PC remotely. The hub is not strictly required; in fact you can connect the promiscuous interface directly to a switch mirror port.

Figure 5. Scenario schema

6 Security purposes

SMTPSniffer is, as already said, a powerful tool for a system administrator, because it allows a fast check of the e-mail traffic in real time. This enables many possible uses. SMTPSniffer can be used for intrusion detection purposes, and in particular for indirect worm detection [7]. At present, in fact, worms continue to improve in terms of their sophistication and detrimental effect, and by exploiting the benefits of the e-mail system they spread very widely and very fast, exhausting network resources. When a worm infects a host, it tries to send the greatest amount of e-mail in the shortest time interval: this behaviour cannot pass unnoticed, because SMTPSniffer lists the e-mail activity and so you can try to discover why a host is sending a lot of e-mails. It is possible to analyze the data stored in the database to detect anomalies using statistical methods. Through DB queries you can filter the data before analyzing them, getting a more efficient result and making an instant raw anomaly detection possible. In fact, the haste of worm spreading produces a lot of e-mails rejected by the mail server; if you filter the traffic through a query that considers only rejected e-mails (in the database the field status must be 450 or 550) and then analyze the various peaks, there is a good chance of identifying worm activity on your network. Moreover, all these features allow you to anticipate the antivirus update: in fact viruses spread faster than virus signatures for anti-virus protection can be developed and distributed.

7 Wormpoacher integration

In [7] we describe a worm detection technique and system. The program we are developing using these techniques is called Wormpoacher. Wormpoacher uses a tool for mail log analysis called LMA [8]. LMA is bound to the specific mail server: if we install a mail server not supported by LMA, Wormpoacher cannot work properly. Moreover, many worms use their own SMTP engine to propagate, and in this case Wormpoacher is not able to analyze this type of traffic. To overcome the limits of LMA, we can add the SMTPSniffer features to Wormpoacher. This operation is very easy to achieve, because the Wormpoacher architecture is completely modular. Clearly, if we want to analyze communications between our hosts (internal communications) we have to place the sensor inside our internal network; the same concept holds if we want LMA to work on different servers.

8 Conclusion

In this paper we discussed a program useful for storing and displaying all the e-mails sent in a local network.
It can be used to prevent attack actions within a local network or to investigate mail server misconfigurations or mail traffic firewall settings. You can view all the e-mails almost in real time through the database feature and, through various kinds of queries, it is possible to perform different mail analyses. SMTPSniffer can be integrated into Wormpoacher in order to perform worm detection, and it can overcome the LMA constraints in order to have a total view of the network mail configuration and activity.

Acknowledgments

This work was partially supported by the National Research Council of Italy, the University of Genoa and the PRAI-FESR Programme, Innovative Actions of Liguria.

REFERENCES

[1] http://spamassassin.apache.org/

[2] http://www.amavis.org/

[3] Wang Zhimin, Jia Xiaolin. Restoration and audit of Internet e-mail based on TCP stream reassembling. In Communication Technology Proceedings (ICCT 2003), International Conference on, volume 1, 9-11 April 2003, pages 368–371.

[4] Net::Pcap - Interface to pcap LBL packet capture library. http://search.cpan.org/dist/Net-Pcap/Pcap.pm

[5] Net::Analysis - Modules for analysing network traffic. http://search.cpan.org/~worrall/Net-Analysis-0.06/lib/Net/Analysis.pm

[6] Schedule::Cron - cron-like scheduler for Perl subroutines. http://search.cpan.org/~roland/Schedule-Cron-0.97/Cron.pm

[7] Maurizio Aiello, David Avanzini, Davide Chiarella, Gianluca Papaleo. Worm Detection Using E-mail Data Mining. PRISE 2006, Primo Workshop Italiano su PRIvacy e SEcurity, 2006.

[8] Maurizio Aiello, David Avanzini, Davide Chiarella, Gianluca Papaleo. Log Mail Analyzer: Architecture and Practical Utilization, 2006.

[9] Shishi Liu, Jizhou Sun, Xiaoling Zhao, Zunce Wei. A general purpose application layer IDS. In Electrical and Computer Engineering (IEEE CCECE 2003), Canadian Conference on, volume 2, 4-7 May 2003, pages 927–930.

[10] Xiaoling Zhao, Jizhou Sun, Shishi Liu, Zunce Wei. A parallel algorithm for protocol reassembling. In Electrical and Computer Engineering (IEEE CCECE 2003), Canadian Conference on, volume 2, 4-7 May 2003, pages 901–904.

The general problem of privacy in location-based services and some interesting research directions

Claudio Bettini, Sergio Mascetti, Linda Pareschi
DICo, Università di Milano

(Extended Abstract)

The proliferation of location-aware devices will soon result in a diffusion of location-based services (LBS). Privacy preservation is a challenging research issue for this kind of service. In general, there is a privacy threat when an attacker is able to associate the identity of a user with information that the user considers sensitive. In the case of LBS, both the identity of a user and her sensitive information can possibly be derived from the requests issued to service providers. More precisely, the identity and the sensitive information of a single user can be derived from requests issued by a group of users. Figure 1 shows a graphical representation of this general view of privacy threats in LBS. In order to prevent an attacker from associating a user’s identity with her sensitive information, the ongoing research in this field is tackling two main subproblems: preventing the attacker from inferring the user’s identity and preventing the attacker from inferring the user’s sensitive information.
Since the general privacy threat is the association of a user’s identity with her sensitive information, in order to protect the user’s privacy it is sufficient to prevent the attacker from inferring either the identity or the sensitive information. Hence, although the solution of one of the two subproblems is sufficient to guarantee the user’s privacy, we argue that solving both subproblems could enable better techniques for privacy protection. However, the quality of the provided service could be affected by the introduction of stronger mechanisms for privacy preservation. Indeed, the obfuscation of request parameters usually involved in privacy protection techniques implies a degradation of the quality of service. A location-based privacy-preserving system that implements solutions for both subproblems can combine them in order to optimize the quality of service while preserving privacy.

Figure 1: General privacy threat in LBS

Most of the approaches proposed in the literature to protect LBS privacy consider scenarios that can be easily mapped to the one depicted in Figure 2. It involves three main entities:

• The User invokes or subscribes to location-based remote services that are going to be provided to her mobile device.

• The Location-aware Trusted Server (LTS) stores precise location data of all its users, using data directly provided by the users’ devices and/or acquired from the infrastructure.

• The Service Provider (SP) fulfills user requests and communicates with the user through the LTS.

Figure 2: The reference scenario.

In our model each request r is processed by the LTS, resulting in a request r' with the same logical components but appropriately generalized to guarantee the user’s privacy. Requests, once forwarded by the LTS, may be acquired by potential attackers in different ways: they may be stolen from SP storage, voluntarily published by the trusted parties, or acquired by eavesdropping on the communication lines. On the contrary, the communication between the user and the LTS is considered trusted, and the data stored at the LTS is not considered accessible by the attacker.

Research directions. Our current research effort is focused on the problem of preventing the inference of the user’s identity; we call this problem the LBS identity privacy problem. A different research direction focuses on the problem of preventing the inference of the user’s sensitive information; when the sensitive information is the specific user location, or can be inferred from that information, this problem is called the LBS location privacy problem and has been addressed, among others, in [5, 7].

Figure 3: The static case. (a) single-issuer; (b) multiple-issuers

In order to solve the identity privacy problem, several contributions have proposed different techniques [4, 9, 6, 2]. The common idea is to enforce the issuer of a request to be anonymous. This means that an attacker must not be able to associate any request with its issuer with likelihood greater than a threshold value. A particular case studied in these papers is when the attacker can acquire a single request issued by the user. More specifically, this case assumes that: (i) the attacker is not able to link a set of requests, i.e., to understand that the requests are issued by the same (anonymous) user; (ii) the attacker is not able to reason with requests issued by different users.
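To make the generalization step performed by the LTS concrete, the following is a minimal sketch of a grid-based spatial cloaking strategy in the spirit of [4]: the issuer’s exact position is replaced by a grid cell large enough to contain at least k users. The data structures, the parameter names and the cell-doubling strategy are illustrative assumptions of ours, not the algorithms compared in [8].

    # Illustrative grid-based spatial cloaking: generalize the issuer's position to a
    # grid cell containing at least k users, so the issuer is one among at least k candidates.
    from collections import Counter

    def cell(pos, size):
        x, y = pos
        return (int(x // size), int(y // size))

    def cloak(issuer_pos, all_positions, k, base_size=1.0, max_size=1024.0):
        size = base_size
        while size <= max_size:
            counts = Counter(cell(p, size) for p in all_positions)
            c = cell(issuer_pos, size)
            if counts[c] >= k:        # enough users share the cell: release the cell, not the point
                return c, size
            size *= 2                 # otherwise coarsen the grid and try again
        return None                   # k-anonymity cannot be satisfied: suppress the request

    # Example: the LTS rewrites r = (id, (12.3, 7.9), query) into
    # r' = (pseudo_id, cloak((12.3, 7.9), positions_known_to_LTS, k=5), query).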
In general, we can distinguish privacy threats according to two orthogonal dimensions: a) threats in static versus dynamic cases; b) threats involving requests from a single user (single-issuer case) versus threats involving requests from different users (multiple-issuer case). Figure 3(a) shows a graphical representation of the privacy threat in the static, single-issuer case. In all single-issuer cases, the LTS must ensure that an attacker cannot associate a request with its issuer with likelihood greater than a threshold value. Different papers have addressed the LBS identity privacy problem in this case, proposing different technical solutions. In [2] we presented a formal framework to model this problem; moreover, in [8] we have compared and empirically evaluated the solutions proposed in the literature as well as new algorithms that we have devised.

Example 1 shows that, in the multiple-issuer cases, users’ anonymity may not be sufficient to guarantee their privacy.

Example 1 Suppose a user u issues a request generalized into r' by the LTS. Assume that, considering r', an attacker is not able to identify u with likelihood greater than a value h within the set S of potential issuers. However, if many of the users in S issue requests from which the attacker can infer the same sensitive information inferred from r', then the attacker can associate that sensitive information with u with likelihood greater than h.

In the area of databases, the analogous privacy issue is known as the l-diversity problem. In the area of LBS, the problem is depicted in Figure 3(b). In the multiple-issuer case the attacker can gain information from the requests issued by different users, and the static case imposes that a single request is considered for each user. We were probably the first to study this problem in the area of LBS; our preliminary results can be found in [1].

In contrast with the static case, in the dynamic case it is assumed that the attacker is able to recognize that a set of requests has been issued by the same (anonymous) user. Several techniques exist to link different requests to the same user, the most trivial ones being the observation of the same identity or pseudo-id in the requests. We call a request trace a set of requests that the attacker can correctly associate with a single user. Figure 4 shows a graphical representation of the dynamic case. The corresponding techniques to preserve privacy face two problems. First, preventing the attacker from linking the requests (the linking problem); indeed, the longer a trace is, the higher the probability that the issuer loses her anonymity. Second, preventing the attacker from inferring the identity of the issuer (based, for example, on external knowledge about the position of users at different times) with likelihood greater than a threshold value.

Figure 4: The dynamic case

In [3] we introduced the notion of historical k-anonymity to formally model the dynamic, single-issuer case and we investigated how different techniques for solving the linking problem and the identity privacy problem can be combined to protect the user’s privacy. From those preliminary results we are working towards a general formal model and privacy protection techniques for the dynamic case, eventually covering also the multiple-issuer case.

References
[1] Claudio Bettini, Sushil Jajodia, and Linda Pareschi. Anonymity and diversity in LBS: a preliminary investigation. In Proc. of the 5th International Conference on Pervasive Computing and Communications (PerCom). IEEE Computer Society, 2007.

[2] Claudio Bettini, Sergio Mascetti, X. Sean Wang, and Sushil Jajodia. Anonymity in location-based services: towards a general framework. In Proc. of the 8th International Conference on Mobile Data Management (MDM). IEEE Computer Society, 2007.

[3] Claudio Bettini, X. Sean Wang, and Sushil Jajodia. Protecting privacy against location-based personal identification. In Proc. of the 2nd Workshop on Secure Data Management (SDM), volume 3674 of LNCS, pages 185–199. Springer, 2005.

[4] Marco Gruteser and Dirk Grunwald. Anonymous usage of location-based services through spatial and temporal cloaking. In Proc. of the 1st International Conference on Mobile Systems, Applications and Services (MobiSys). The USENIX Association, 2003.

[5] Marco Gruteser and Xuan Liu. Protecting privacy in continuous location-tracking applications. IEEE Security & Privacy, 2(2):28–34, 2004.

[6] Panos Kalnis, Gabriel Ghinita, Kyriakos Mouratidis, and Dimitris Papadias. Preserving anonymity in location based services. Technical Report B6/06, National University of Singapore, 2006.

[7] Hidetoshi Kido, Yutaka Yanagisawa, and Tetsuji Satoh. An anonymous communication technique using dummies for location-based services. In Proc. of the International Conference on Pervasive Services (ICPS), pages 88–97. IEEE Computer Society, 2005.

[8] Sergio Mascetti and Claudio Bettini. A comparison of spatial generalization algorithms for LBS privacy preservation. In Proc. of the 1st International Workshop on Privacy-Aware Location-based Mobile Services (PALMS). IEEE Computer Society, 2007.

[9] Mohamed F. Mokbel, Chi-Yin Chow, and Walid G. Aref. The New Casper: query processing for location services without compromising privacy. In Proc. of the 32nd International Conference on Very Large Data Bases (VLDB), pages 763–774. VLDB Endowment, 2006.

Bottom-up approach to manage data privacy policy through the front-end filter paradigm

Gerardo Canfora, Elisa Costante, Igino Pennino, Corrado Aaron Visaggio
Research Centre on Software Technology, University of Sannio – 82100 Benevento

Abstract

An increasing number of business services for private companies and citizens are accomplished through the web and mobile devices. Such a scenario is characterized by high dynamism and untrustworthiness, as a large number of applications exchange different kinds of data. This poses an urgent need for effective means of preserving data privacy. This paper proposes an approach, inspired by the front-end trust filter paradigm, to manage data privacy in a very flexible way. Preliminary experimentation suggests that the solution could be a promising path to follow for the web-based transactions that will be very widespread in the near future.

Keywords: data privacy, front end trust filter.

Introduction

The number and the complexity of the processes accomplished throughout the web are increasing. Confidential data are increasingly exposed to unlawful collection by humans, devices or software. The actors involved are often autonomous systems with a high degree of dynamism [15]; negotiations are performed among multiple actors and cross the boundaries of a single organization [10]. As a consequence, the privacy of personal and confidential data is exposed to several threats [13]. Different technologies have been devised in order to face this problem, such as anonymization [16], fine grain access control (FGAC) [2], and data randomization and perturbation [9].
These solutions show some limitations when applied in contexts characterized by high dynamism and few opportunities to control data exchange: they are scarcely scalable, they cannot be used in untrustworthy transactions, or they propose overly invasive data access mechanisms, which hinder flexibility. This discussion is developed in more depth in the related work section.

The realization and the adaptation of a data privacy policy is a process of transformation, which spans from the definition of strategies to properly protect data up to the design of a supporting technology which implements the established policies. Such a process includes three main stages. In the first stage, a data privacy policy is described in natural language in a document which contains the rules for disclosing sensitive data. In the second stage, the general policy must be refined into specific strategies, in order to understand which kinds of actions could be performed on certain categories of data by some categories of users, and under which conditions. Finally, the established strategies need to be implemented with a suitable technology ensuring that accesses to the data repository happen in accordance with the strategy.

This paper proposes a three-layered approach which aims at facilitating the management of data privacy in such a scenario. The main purpose is to provide the data manager with the capability of:

1. translating the privacy policies, expressed in natural language, into low-level protection rules, directly defined on database fields;

2. providing the database with an adaptive protection, which is able to change according to: (i) the current state of the database, and (ii) the knowledge that the user acquires by aggregating the information obtained through the queries submitted over time.

The paper proceeds as follows: in the next section related work is discussed; in the following section the solution is presented. Then, the results of the experimentation are provided and, finally, conclusions are drawn.

Related Work

Different technologies have been proposed to preserve data privacy, but some of them, although properly adoptable in many contexts, could be scarcely effective in highly dynamic systems. The W3C Consortium developed P3P [17]. It provides a method that permits a web site to codify within an XML file the purposes for which data are collected. It is based on comparing the privacy preferences of the information provider and of the requester. P3P is supported by different web browsers and lets a web site express its privacy policy with a standard structure, so that, according to this structure, it can be decided whether to deliver data or not. P3P synthesizes the purposes, treatment modes and retention period for data, but it does not guarantee that data are used in accordance with the declared policies. Consequently, it may be successful in trusted environments.

Researchers at IBM proposed the Hippocratic database model [1]: it supports the management of information sharing with third parties. It establishes ten rules for exchanging data; relying on these rules, queries are re-written, data are obfuscated and cryptography is put in place when needed. Hippocratic databases use metadata for designing an automatic model for privacy policy enforcement, named Privacy Metadata Schema. This technique degrades performance, as purposes and user authorizations must be checked at each transaction. Memory occupation is a further concern, as the metadata could grow fast.
Fine grain access control (FGAC) [2] is a mechanism designed for a complete integration with the overall system infrastructure. Constructs which implement this method must: (i) assure that access strategies are hidden from users; (ii) minimize the complexity of policies; and (iii) guarantee access to tables' rows, columns, or fields. Traditional implementations of FGAC use static views. This kind of solution can be used only when constraints on data are few. Further solutions, like EPAL [3] and the one proposed in [14], allow the actors of a transaction to exchange services and information within a trusted context. The trust is verified through the exchange of credentials or the verification of permissions to perform a certain action. Anonymization techniques let organizations retain sensitive information by changing the values of specific table fields. The underlying idea is to make data indistinguishable, as happens in the k-anonymity algorithm [16], through the perturbation of values within records. Other techniques make data less specific, as happens with generalization [5]. This technique seriously affects data quality and may leave the released data set in vulnerable states. Further mechanisms of data randomization and perturbation [9] hinder the retrieval of information at the individual level. These techniques are difficult to implement, as they are based on complex mathematics, and they are in any case invasive both for data and applications. Cryptography is the most widespread technique for securing data exchange [8], even if it shows some limitations: high costs for governing the distribution of keys, and low performance in complex and multi-user transactions.

Definitions
For a better understanding of this work it is necessary to give the following definitions. A privacy policy defines the sensitive data whose access must be denied; it is captured as a set of purposes. A protection rule (pr) defines whether the result set can be disclosed (Legal rule) or not (Illegal rule). For example, let us consider the following rules:
a. NO SELECT Fiscal_Code, Surname FROM Person;
b. SELECT Age, Zip_Code FROM Person.
Rule (a) is illegal and establishes that the pair Fiscal_Code – Surname cannot be disclosed; however, it does not explicitly deny access to the single attributes. Conversely, rule (b) makes the attributes Age and Zip_Code of the table Person accessible both in pairs and singularly. The state of the database is time dependent and is defined by the informative content of the database. It can be modified by means of insert, delete and alter operations. Depending on the database state, the privacy policy could be enforced or made less restrictive, as vulnerabilities and threats to privacy preservation could arise or disappear.

Approach
Two complementary approaches could be followed in order to meet the goal:
• Top-down, which derives a set of protection rules from the privacy strategy.
• Bottom-up, which derives the rules from the analysis of vulnerabilities and of aggregation-based inference.
The system acts like a filter between the user applications interrogating the database to be protected and the database itself: it captures the submitted queries, compares them with the protection rules and decides whether they are to be allowed or blocked.
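As an illustration of this filtering idea, the following sketch (hypothetical names, not the authors' implementation) shows how such a front-end filter might sit between the application and the database, checking each incoming query first against the illegal catalogue and then against the legal one; the unresolved case is handled by the aggregation analysis described below.

    # Hypothetical sketch of the front-end filter dispatch (not the authors' code).
    # "db" is any DB-API connection whose execute(...).fetchall() returns rows.
    class FrontEndFilter:
        def __init__(self, db, illegal_rules, legal_rules):
            self.db = db                          # protected database connection
            self.illegal_rules = illegal_rules    # catalogue of illegal protection rules (SQL bodies)
            self.legal_rules = legal_rules        # catalogue of legal protection rules (SQL bodies)

        def result_set(self, sql):
            # Result Matching compares result sets, not SQL text (see Query Filtering).
            return frozenset(self.db.execute(sql).fetchall())

        def submit(self, query):
            # 1) block the query if it discloses what an illegal rule protects
            if any(self.result_set(query) == self.result_set(r) for r in self.illegal_rules):
                return "BLOCKED"
            # 2) forward the query if it matches a rule of the legal catalogue
            if any(self.result_set(query) == self.result_set(r) for r in self.legal_rules):
                return self.db.execute(query).fetchall()
            # 3) otherwise the aggregation analysis / quarantine step is needed
            return "QUARANTINE"
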
The Top-down approach suffers from a major weakness: the rules can be eluded by exploiting specific vulnerabilities of the database or, more simply, by taking advantage of the flexibility of SQL, which allows a single query to be written in many different ways. Moreover, the growth of the user's knowledge can require the generation of new protection rules. The goal of the bottom-up approach is to solve these problems.

Query Filtering
The goal of the filtering is to establish whether a query is: Legal (to be allowed), when it does not disclose sensitive information; or Illegal (to be blocked), when it tries to access protected data. (Figure 1 – Filtering algorithm.) To make this possible it is necessary to evaluate whether the submitted query (q, from here on) matches a protection rule (pr, from here on). As shown in figure 1, the filtering process can be divided into three steps: 1. Query submission; 2. Search for a pr, belonging to the illegal catalogue, which matches q; if a correspondence is found, the query is blocked, otherwise the process proceeds with step 3; 3. Search for a match of q with a pr belonging to the legal catalogue; if a correspondence occurs, the query is forwarded to the database. In order to recognize the correspondence between a query and a rule, the Result Matching algorithm has been formulated. The comparison is based on the result of the interrogation rather than on the syntax used to write it. In this way it is guaranteed that queries expressed in different ways but disclosing the same data are considered equivalent and thus blocked as well. When a user submits a query, the system checks whether at least one rule involving the same tables as the query exists; it then forwards the found rules and the query to the database, captures the result sets of the rules and of the query, and, finally, compares them.

Analysis of Acquired Knowledge
If there is no match between the query and the set of rules, the system must establish whether the obtained result set can be disclosed on the basis of the information already released. In order to make this decision, the system must estimate whether the aggregation of the information that the user has acquired through the previous queries with the information released by the last query violates the established privacy policy. As a matter of fact, a sensitive piece of information can often be composed of several pieces of information with a lower sensitivity degree. For instance, let us consider the following illegal rule, which denies the disclosure of which patients are affected by Aids or Tuberculosis:
- NO SELECT Diagnosis, Patient FROM Illness WHERE Diagnosis = 'Aids' OR Diagnosis = 'Tuberculosis'; (r)
and the submission of these two different query combinations:
{ (q1) SELECT Diagnosis FROM Illness; (q2) SELECT Patient FROM Illness; }
{ (q3) SELECT Diagnosis FROM Illness ORDER BY Diagnosis; (q4) SELECT Patient FROM Illness ORDER BY Diagnosis; }
As shown in figure 2, the combination of q3 and q4 is more dangerous than the combination of q1 and q2. The former, in fact, allows the patient's id to be matched to his illness, because the result sets are ordered by the same criterion. Conversely, q1 and q2 do not expose any sorting rationale. (Figure 2 – Possible Resultset Aggregation.)

Log & Quarantine
The knowledge given by q4 is harmful only if the information released by q3 has already been obtained, and vice versa.
This means that it is not necessary to block both queries to avoid violating rule r: it is enough to block only the last submitted one. It is necessary to track the history of the user's interrogations over time in order to get a complete picture of the overall knowledge acquired by the user. To this end, all the queries forwarded to the database are logged in a file together with information about whether they were allowed or not. When a query is submitted, if it does not match any rule, i.e. it does not belong to the illegal catalogue, the system evaluates whether it can disclose sensitive information and, if so, alerts the administrator. To do this, the system combines the current query with the previously allowed ones (recorded in the log file), formulating a new query that represents the aggregation. If this query does not match an illegal rule, the current user query is allowed; otherwise it is suspended in a quarantine status. The whole filtering algorithm is described in figure 3 (Figure 3 – Complete filtering algorithm). The administrator can decide, for each suspended query, whether it is to be blocked or allowed in the future, generating a new protection rule. An "in-vitro" experimentation has been carried out in order to validate the approach; its outcomes are encouraging and have stimulated new directions for future research: the next steps consist of realizing a system for modeling the data domain from a privacy preservation perspective, and a system to capture the knowledge acquired by each user over time, in order to limit exploits based on inference.

Experimentation
The experimentation aims at evaluating the approach's effectiveness, in order to estimate the robustness of the data protection offered, as the semantic flexibility of SQL could allow the adopted privacy preservation mechanisms to be cheated; moreover, the experimentation aims at estimating the performance degradation of the system, in terms of response time, as the set of catalogued rules grows. Figure 4 shows the databases used as the experimental vitro (Figure 4 – Experimental Vitro). In order to test the effectiveness of the Result Matching algorithm, an experiment has been carried out which consisted of evaluating the percentage of blocked queries – which is expected to be 100% – within a set of queries forwarded to the target database. For each database the following have been formulated: 4 rules on 1 attribute of 1 table; 4 rules on 2 attributes of 1 table; 4 rules on 4 attributes of 1 table; 4 rules on 2 attributes of 2 tables. For each rule, 4 equivalent queries have been written and their effects have been observed. The matching algorithm proved to be robust and particularly effective in facing SQL flexibility. As a matter of fact, the algorithm successfully blocked the whole set of queries. The second part of the experiment analyzed how the performance of the solution changed, in terms of response time, with the increase of both the number of rules and the Resultset size. It is important to recall that, to make the result matching possible, it is necessary to submit to the database both the query to analyze and the set of rules against which the query is compared; not all the rules in the catalogue are involved when filtering a query. In order to carry out a more consistent experiment, rules that involve the same tables as the query have been formulated and catalogued.
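A minimal sketch of the Result Matching step exercised by this experiment (hypothetical names, not the authors' implementation): the query and a rule are both executed and their result sets are compared as unordered sets, so that syntactically different but equivalent queries are treated alike. Comparing by plain set equality is a simplifying assumption of this sketch.

    # Hypothetical sketch of Result Matching (not the authors' code).
    import sqlite3

    def result_sets_match(conn, query_sql, rule_sql):
        # Compare results as unordered sets, so that different SQL formulations
        # disclosing the same data are recognized as equivalent.
        return set(conn.execute(query_sql).fetchall()) == set(conn.execute(rule_sql).fetchall())

    # Toy usage on an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Person (Fiscal_Code TEXT, Surname TEXT, Age INT)")
    conn.execute("INSERT INTO Person VALUES ('ABC123', 'Rossi', 30)")
    rule = "SELECT Fiscal_Code, Surname FROM Person"                 # body of an illegal rule
    query = "SELECT Fiscal_Code, Surname FROM Person WHERE 1 = 1"    # equivalent rewriting
    print(result_sets_match(conn, query, rule))                      # True -> block the query
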
The following queries, with a growing number of attributes (and hence Resultset size), have been analyzed:
Query1: SELECT fiscal_code FROM person
Query2: SELECT fiscal_code, name FROM person
Query3: SELECT fiscal_code, name, surname FROM person
Query4: SELECT fiscal_code, name, surname, birth_place FROM person
Query5: SELECT fiscal_code, name, surname, birth_place, nationality FROM person
All the protection rules refer to the table PERSON, which has the following schema: PERSON(fiscal_code, name, surname, sex, birth_place, nationality). For each query, the response times have been measured for a catalogue with, respectively, 10 (5 legal and 5 illegal), 50 (25 legal and 25 illegal) and 80 (40 legal and 40 illegal) rules. Note that all the queries were allowed, with the exception of Query4 and Query5, which were blocked when the catalogues containing 50 and 80 protection rules were used. The following graphs show the obtained results (Figure 4 – Response time corresponding to an increase of the rules' set). It is possible to observe that Query1, Query2 and Query3 have the same trend, that is: the response time increases with the growth of the number of catalogued rules, because they produce the same outcome, namely they are always allowed. As expected, the performance seems to decrease proportionally with the growth of the catalogue's size, but the proportionality factor is not equal to one. In fact, corresponding to a 500% growth of the number of rules, a 25% increase of the response time was recorded. Moreover, corresponding to an 800% growth of the number of rules, a 40% increase of the response time was observed. Concerning Query4 and Query5, it is possible to observe a different behavior: the response times are smaller than the previous ones, because they match an illegal rule. This means that there are fewer comparisons to perform. As can be observed in figure 5, when the catalogue counted 80 rules, only 15 were actually used for comparison, which means that in the worst case less than 20% of the rules in the catalogue are effectively considered in the analysis (Figure 5 – Effectively compared rules).

Conclusion
With the growing migration of services towards the net, privacy should be managed within environments characterized by high dynamism: multiple applications are able to access different data sources, without trust-based mechanisms in place. As such scenarios foresee high scalability and loose control, existing solutions for data privacy management could be unfeasible, too costly or scarcely successful. This work introduces a novel approach to data privacy, inspired by the paradigm of front end trust filtering. According to this approach, data privacy is managed in a way which aims at reducing the control on transactions exchanging data sets, while keeping a high level of robustness in preserving data privacy. The proposed solution implements a bottom-up approach, which relies on the comparison of the result set produced by the forwarded query with the one containing the information which should be banned, according to the established privacy policy.
Furthermore, this solution helps discover new queries which could threaten the privacy of data but are not included in the catalogue's rules, through the quarantine management policy. A preliminary experimentation was carried out in order to prove the effectiveness and the efficacy of the approach. It emerged that the system is able to successfully face the semantic flexibility of SQL, and the degradation of performance with the growth of the number of rules is limited to 20% in the worst case. As future work we are planning a larger experimentation in order to detect further weaknesses of the solution and identify improvement opportunities.

References
[1]. Agrawal R., Kiernan J., Srikant R., and Xu Y., 2002. Hippocratic databases. In VLDB, the 28th Int'l Conference on Very Large Data Bases.
[2]. Agrawal R., Bird P., Grandison T., Kiernan J., Logan S., Rjaibt W., 2005. Extending Relational Database Systems to Automatically Enforce Privacy Policies. In ICDE'05, Int'l Conference on Data Engineering, IEEE Computer Society.
[3]. Ashley P., Hada S., Karjoth G., Powers C., Schunter M., 2003. Enterprise Privacy Authorization Language (EPAL 1.1). IBM Research Report. (available at: http://www.zurich.ibm.com/security/enterprice-privacy/epal – last access on 19.02.07)
[4]. Bayardo R.J., and Srikant R., 2003. Technology Solutions for Protecting Privacy. In Computer. IEEE Computer Society.
[5]. Fung C.M., Wang K., and Yu S.P., 2005. Top-Down Specialization for Information and Privacy Preservation. In ICDE'05, 21st International Conference on Data Engineering. IEEE Computer Society.
[6]. Langheinrich M., 2005. Personal Privacy in Ubiquitous Computing – Tools and System Support. PhD Dissertation, ETH Zurich.
[7]. Machanavajjhala A., Gehrke J., and Kifer D., 2006. l-Diversity: Privacy Beyond k-Anonymity. In ICDE'06, 22nd Int'l Conference on Data Engineering. IEEE Computer Society.
[8]. Maurer U., 2004. The Role of Cryptography in Database Security. In SIGMOD, Int'l Conference on Management of Data. ACM.
[9]. Muralidhar K., Parsa R., and Sarathy R., 1999. A General Additive Data Perturbation Method for Database Security. In Management Science, Vol. 45, No. 10.
[10]. Northrop L., 2006. Ultra-Large-Scale Systems: The Software Challenge of the Future. SEI Carnegie Mellon University Report (available at http://www.sei.cmu.edu/uls/ – last access on 19.02.07).
[11]. Oberholzer H.J.G., and Olivier M.S., 2005. Privacy Contracts as an Extension of Privacy Policy. In ICDE'05, 21st Int'l Conference on Data Engineering. IEEE Computer Society.
[12]. Pfleeger C.R., and Pfleeger S.L., 2002. Security in Computing. Prentice Hall.
[13]. Sackman S., Struker J., and Accorsi R., 2006. Personalization in Privacy-Aware Highly Dynamic Systems. Communications of the ACM, Vol. 49, No. 9. ACM.
[14]. Squicciarini A., Bertino E., Ferrari E., Ray I., 2006. Achieving Privacy in Trust Negotiations with an Ontology-Based Approach. In IEEE Transactions on Dependable and Secure Computing, IEEE CS.
[15]. Subirana B., and Bain M., 2006. Legal Programming. In Communications of the ACM, Vol. 49, No. 9. ACM.
[16]. Sweeney L., 2002. k-Anonymity: A Model for Protecting Privacy. In International Journal on Uncertainty, Fuzziness and Knowledge Based Systems, Vol. 10.
[17]. Platform for Privacy Preferences (P3P) Project, W3C, http://www.w3.org/P3P/ (last access on January 2007).
Intrusion Detection Systems based on Anomaly Detection techniques
Davide Ariu, Igino Corona, Giorgio Giacinto, Roberto Perdisci and Fabio Roli
University of Cagliari, DIEE Electrical and Electronic Engineering Department, Piazza d'Armi, 09123 Cagliari (Italy)
{davide.ariu, igino.corona, giacinto, roberto.perdisci, roli}@diee.unica.it

1 Introduction
Statistical Pattern Recognition approaches are currently investigated to provide an effective tool for Intrusion Detection Systems (IDS) based on Anomaly Detection. In particular, our activities are mainly aimed at the study and development of statistical Pattern Recognition approaches and Multiple Classifier Systems (MCS) for devising advanced techniques for detecting anomalies (i.e., potential intrusions) in the traffic over a TCP/IP network [1, 2]. These techniques also showed the ability to harden Anomaly Detection Systems in the presence of malicious errors [3]. New methodologies for clustering alarm messages from various IDS have also been proposed [4]. Recently, anomaly detection techniques based on Hidden Markov Models (HMM) have been proposed for detecting intrusions by analysing the commands exchanged between hosts for a given application (e.g., FTP, SMTP, etc.) [6]. In partnership with Tiscali S.p.A., this research activity produced an implementation of a module for Snort (the most important open source IDS, http://www.snort.org) implementing an anomaly detector based on Hidden Markov Models. In particular, the module has been developed for the FTP (File Transfer Protocol) and the SMTP (Simple Mail Transfer Protocol) services.

2 State of the art
Hidden Markov Models [5] have been successfully applied in a number of pattern recognition applications, and only recently have they also been applied to Intrusion Detection problems. HMM have been used for Intrusion Detection thanks to their ability to model time-series using a stateful approach, where the role and meaning of the internal states are "hidden". The vast majority of studies that proposed HMM to implement IDS are related to Host Based systems, i.e., IDS that analyze the actions performed on a single host to detect intrusion attempts. Only a few works have proposed HMM for analysing network traffic, by representing the traffic at the packet level. The application of Hidden Markov Models for structural inference in sequences of commands exchanged between hosts at the application level appears very interesting and still unexplored.

3 The Proposed HMM-based IDS
In order to perform anomaly detection by HMM, we propose to infer the structure of the sequences of legitimate commands for application protocols, e.g. the FTP and SMTP protocols. The basic assumption is that sophisticated attacks realized using these services may exhibit "anomalous" sequences of commands exchanged between a client host and a server host. We use the term anomalous for those sequences that are structurally different from those that can be considered legitimate or normal. These anomalies may be caused either by the attacker trying to perform a number of exploits, or by the characteristics of the steps completed during an attack. It is easy to see that it is necessary to define the criterion upon which a sequence of commands should be considered legitimate. In the following, two different techniques for the definition of legitimate sequences are proposed.
The technique described in section 3.1 is implemented in the module developed for Snort (and used in the reported experiments on SMTP traffic), while section 3.2 describes the technique used in the reported experiments on FTP traffic. Both techniques produce valid results and can be used alternatively.

3.1 Legitimate sequences are those that are accepted by the server
Definition 1 (accepted sequences) A sequence of n commands SEQ = {c1, c2, c3, . . . , cn} from one specific client to one specific server (in a connection) is considered accepted only if, for every command ci in the sequence, there is a corresponding positive server response(1) (i = 1, . . . , n).
Following Definition 1, we require that a legitimate sequence of commands must necessarily be accepted by the server. If the server successfully replied to all the commands in the sequence, then such a sequence is in agreement with the protocol rules, and therefore (with the exception of implementation bugs) it is considered to be a legitimate sequence. For each service, e.g., FTP and SMTP, an HMM is trained on a training set of accepted sequences using the Baum-Welch algorithm. In the operational phase, each HMM assigns a probability of normality to each of the analyzed sequences of commands, thus "rating" the structural likeness between the observed sequence and those supplied during the training phase. An alarm is then generated if the probability assigned to the sequence is smaller than a fixed decision threshold. Such a threshold is estimated on the training set and depends on the confidence in the hypothesis that all the training sequences are legitimate.

3.2 Legitimate sequences are those that are not filtered by a signature based IDS
In order to identify legitimate sequences of commands, the training traffic is first analysed by a signature based IDS, e.g. Snort. Then, the set of legitimate sequences is made up of all the sequences that have not raised alarms. This training set is then used to train an HMM (or an ensemble of HMMs) according to the same technique outlined in the previous section 3.1.
(1) R(ci, statei) is the function that determines the response ri of the server, having as inputs the i-th command ci and the term statei, that is the state of the connection at time i, which depends on the initial state and on all the past i − 1 commands.

Table 1: Results attained on the FTP dataset using a set of HMM with 20 hidden states. Performances are evaluated in terms of the Area Under Curve (AUC), the False Alarm rate (FA), and the Detection Rate (DR).

4 Experimental Results
In this section we present a summary of the experimental results on a set of traffic data provided by Tiscali S.p.A. The traffic data is related to two protocols, namely the FTP and the SMTP protocols. In particular, FTP data are related to sequences of commands generated by users that upload/download their personal Web pages, while SMTP sequences are generated by users sending/receiving email messages.

4.1 FTP traffic
In the training phase, we chose to build the dictionary of symbols (i.e., the set of commands) by including all the symbols in the training set. We then added a special symbol called NaS (Not a Symbol) to account for symbols in test data that were not present in the training set (further details can be found in [6]). The traffic in the test set is made up of normal sequences, as well as 20 attacks created by some automatic tools (e.g., IDS Informer).
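A minimal sketch of the sequence-scoring step of Section 3.1 follows (toy model and hypothetical names, not the Snort module; the paper's HMMs are trained with Baum-Welch, which is not shown here). The forward algorithm computes the likelihood of an encoded command sequence, and an alarm is raised when it falls below the decision threshold.

    # Hypothetical sketch of the operational phase of Section 3.1.
    import numpy as np

    def sequence_log_likelihood(pi, A, B, observations):
        # Forward algorithm: log P(o_1..o_T | HMM).
        alpha = pi * B[:, observations[0]]
        for o in observations[1:]:
            alpha = (alpha @ A) * B[:, o]
        return np.log(alpha.sum())

    # Toy model: 2 hidden states, 3 observable symbols (e.g. USER, PASS, NaS).
    pi = np.array([0.6, 0.4])                           # initial state probabilities
    A = np.array([[0.7, 0.3], [0.4, 0.6]])              # state transition probabilities
    B = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])    # symbol emission probabilities
    threshold = -6.0                                    # estimated on the training set
    seq = [0, 1, 0]                                     # encoded command sequence
    if sequence_log_likelihood(pi, A, B, seq) < threshold:
        print("anomalous sequence: raise an alarm")
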
The training set is made up of 32,000 sequences (subdivided into ten smaller subsets of 3,200 sequences), while the test set is made up of 8,000 sequences. As the performances of HMM are sensitive to the training set and to the initial values of the parameters (usually randomly chosen), in this work we explored the performances attained by combining an ensemble of HMM, each one trained on a different portion of the training set. To this end, we used three techniques for combining the outputs of HMM, namely the Arithmetic Mean, the Geometric Mean, and the Decision Templates. This solution allowed us to attain low false alarm rates and high detection rates [6]. Table 1 shows the performances of the best solution in a number of trials, where 100 HMM are trained and each HMM has 20 hidden states. Results are evaluated in terms of the AUC (i.e., the area under the ROC curve), false alarm rate and detection rate. The reported results clearly show that high values of AUC can be attained by combining the HMM with the Decision Template technique. If the decision threshold is set to the value that produces a 1% false alarm rate on the training set, the false alarm rate on the test set is always smaller than 1%. Thus, the threshold estimated on the training set produces similar results on the test set. With a 1% false alarm rate on the training set, the combination of HMM by the geometric mean provides the highest performance.

4.2 SMTP traffic
The SMTP traffic provided by Tiscali has been divided into a training set and two validation sets. Then, two test sets have been created by using each validation set and the traffic generated by the Nessus vulnerability scanner against an SMTP victim. The total traffic used for training is made up of 5,500 SMTP sessions, while the test traffic is made up of 5,500 legitimate sessions and 22 attack sessions.

Table 2: Results attained on the SMTP dataset using an HMM with 10 hidden states. Performances are related to test sets I and II. The number of alarms generated by the signature-based IDS has been approximately 1/3 with respect to those generated by the HMM-based module.

                                Anomaly Detection Alarms
                Validation set                       Attack set
    Test set    Unknown Command    Command Order     Unknown Command    Command Order
    I           67                 95                6                  1
    II          111                182               6                  1

Table 2 shows that the ratio between the number of alarms raised and the related number of sessions is higher for the attack set (7/22) than the same ratio computed for legitimate traffic (162/5,500 and 293/5,500). Thus, it can be claimed that the proposed anomaly-based module is able to discriminate between normal traffic and attacks. It is worth noting, however, that a number of alarms were also raised by a signature-based IDS on legitimate sessions in the validation set. The HMM-based module allowed detecting attacks that involved the use of commands not in the training set (6 unknown commands out of 7 alarms). In addition, HMM raised an alarm whenever the order of commands was suspicious. Further analysis is needed to test this module with more sophisticated attack sequences, as Nessus does not actually complete an attack sequence, but just aims at uncovering a vulnerability in the tested service.

5 Conclusions
We showed how anomaly detection can be performed at the network level by stateful techniques based on HMM. This detection mechanism is very promising, and showed a good tradeoff between detection rate and false positive rate. Further improvements are expected from the analysis of the content of SMTP/FTP commands.

References
[1] G.
Giacinto, F. Roli, L. Didaci, Fusion of multiple classifiers for intrusion detection in computer networks, Pattern Recognition Letters, 24(12), 2003, pp. 1795-1803
[2] G. Giacinto, R. Perdisci, M. Del Rio, F. Roli, Intrusion detection in computer networks by a modular ensemble of one-class classifiers, Information Fusion (in press)
[3] R. Perdisci, G. Gu, W. Lee, Using an Ensemble of One-Class SVM Classifiers to Harden Payload-based Anomaly Detection Systems, ICDM 2006
[4] R. Perdisci, G. Giacinto, F. Roli, Alarm clustering for intrusion detection systems in computer networks, Engineering Applications of Artificial Intelligence, 19 (2006), pp. 429-438
[5] L.R. Rabiner, A tutorial on Hidden Markov Models and selected applications in speech recognition, Proc. of the IEEE, vol. 77(2), pp. 257-286, February 1989
[6] D. Ariu, G. Giacinto, and R. Perdisci, Sensing attacks in computer networks with Hidden Markov Models, Proc. of MLDM 2007, Leipzig (D), July 18-20, 2007 (in press)

A Statistical Network Intrusion Detection System
D. Adami, C. Callegari, S. Giordano, and M. Pagano
Dept. of Information Engineering, University of Pisa
{d.adami,christian.callegari,m.pagano,s.giordano}@iet.unipi.it

1 Introduction
In this paper we present the design, the implementation, and the validation of a network Intrusion Detection System (IDS) [1], based on anomaly detection techniques. The system, which has been designed as an original modification of well-tested approaches, relies on supervised learning techniques. Given a training dataset, the IDS is able to build a model of the normal behavior of the network. Mainly for this reason, the system is named Self-Learning Intrusion Detection System (SLIDS) [2].

2 SLIDS
SLIDS is a software anomaly-based IDS, composed of several modules. The modular implementation has been chosen because it guarantees scalability and extensibility. The major feature of the SLIDS architecture (see figure 1, Figure 1: SLIDS Architecture) is that, differently from most current state-of-the-art IDSs, more than one approach for detecting anomalies has been used. In the following subsections we provide a description of the most important system modules.

2.1 TCP/Markov Module
The TCP/Markov module is based on the idea that TCP connections can be modeled by Markov chains [3]. SLIDS calculates one distinct matrix for each application (identified on the basis of the destination port number). The module only considers a few fields of the packet headers, more precisely the IP destination address and the destination port number (to identify a connection), and the TCP flags (to identify the chain transitions). During the training phase, the module reconstructs TCP connections and associates a value Sp = syn + 2·ack + 4·psh + 8·rst + 16·urg + 32·fin to each packet. Thus, each connection is represented by a sequence of symbols Si. These symbols can be considered as the states of a Markov chain. Hence, the module calculates the transition probability matrix A, where

a_{ij} = P[q_{t+1} = j \mid q_t = i] = \frac{P[q_t = i,\; q_{t+1} = j]}{P[q_t = i]}

In the detection phase the TCP/Markov module uses a sliding window (of dimension T) mechanism to process the packets. Thus, when processing the packets, the module computes T symbols S = {S_{R+1}, S_{R+2}, ..., S_{R+T}} and estimates the probability P[S|A], where A is the matrix obtained in the training phase.
Actually, the system calculates the logarithm of the Likelihood Function (LogLF) and its "temporal derivative" D_W (where the default value for the parameter W is 3):

\mathrm{LogLF}(t) = \sum_{t=R+1}^{T+R} \mathrm{Log}\left(a_{S_t S_{t+1}}\right), \qquad D_W(t) = \mathrm{LogLF}(t) - \frac{1}{W}\sum_{i=1}^{W} \mathrm{LogLF}(t-i)

A sequence of unexpected symbols produces a low probability: an anomaly determines a rapid decrease in the LogLF and a peak in D_W. If D_W exceeds a threshold (set by means of Monte Carlo simulation), a security event is generated. The security event has an anomaly score, which is calculated as Log(D_W). This module is very efficient in terms of memory and processing requirements. Given the nature of TCP, the number of actual flag configurations identified during the training phase is usually less than ten, which implies the storage of a matrix composed of less than one hundred elements (400 B) for each application. It is then necessary to store, during the detection phase, T bytes corresponding to the T flag values inside the sliding window (simulations have shown that a small T, around 30, is sufficient to reveal an intrusion quickly). The ease of computation of the Likelihood Function clearly shows that this method can be applied to on-line detection. After this phase the packets are forwarded to the TCP/SLR Module.

2.2 TCP/SLR and ICMP/SLR Modules
The SLR modules construct a protocol-specific rule-set by analysing the first 64 bytes of each training packet [4], i.e. the IP and TCP or ICMP headers plus some bytes of the payload. The first 64 bytes of the training packets, considered two by two, are called attributes. An attribute is denoted as Ai and its value is called vi. The training phase of these modules consists of constructing a random set of conditional rules of the form
if A1 = v1, A2 = v2, . . . , Ak = vk then A_{k+1} ∈ V = {v_{k+1}, v_{k+2}, . . . , v_{k+r}}
where A1 = v1, A2 = v2, . . . , Ak = vk is the antecedent, while A_{k+1} is the consequent. During the detection phase, the SLR modules analyze the first 64 bytes of each packet and check whether they break some rules. If so, the modules calculate an anomaly score for each broken rule. The total anomaly score is the sum of all those calculated for each rule. As for the previous module, this approach is very efficient in terms of memory and processing requirements. Experimental results have shown that, after the pruning procedure, a total of about one hundred rules are generated, which means that the memory occupation is of the order of a few KBs. Moreover, once the rule set has been constructed, the system works exactly as a rule-based IDS, which is suitable for on-line detection.

Figure 2: Experimental Results – (a) Different modules; (b) SLIDS

3 Performance Analysis
In this section we discuss the tests carried out to evaluate the performance of SLIDS, using the 1999 DARPA evaluation data set [5]. The results highlight that the combined use of different modules detects more anomalies than classic IDSs. In particular, for TCP packets, the two approaches, TCP/SLR and TCP/Markov, detect different anomalies, thus the combined use of both of them leads to a significant improvement in the IDS performance. The alarm threshold in SLIDS has been set, by means of Monte Carlo simulation, so as to obtain a false alarm rate of about 10%.
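For concreteness, the TCP/Markov detection statistic used in these tests can be sketched as follows (illustrative code with toy numbers, not the SLIDS implementation; the smoothing of unseen transitions with a small epsilon is an assumption of this sketch).

    # Hypothetical sketch of the TCP/Markov statistic of Section 2.1.
    import math

    def packet_symbol(syn, ack, psh, rst, urg, fin):
        # Sp = syn + 2*ack + 4*psh + 8*rst + 16*urg + 32*fin
        return syn + 2*ack + 4*psh + 8*rst + 16*urg + 32*fin

    def log_lf(symbols, A, eps=1e-9):
        # Sum of log transition probabilities over consecutive symbols in the window;
        # unseen transitions get a small probability eps (assumption).
        return sum(math.log(A.get((s, t), eps)) for s, t in zip(symbols, symbols[1:]))

    def d_w(loglf_history, W=3):
        # "Temporal derivative": current LogLF minus the mean of the previous W values.
        return loglf_history[-1] - sum(loglf_history[-1 - W:-1]) / W

    # Toy transition matrix learned in training: {(from_symbol, to_symbol): probability}
    A = {(2, 18): 0.5, (18, 24): 0.3, (24, 2): 0.2}
    window = [2, 18, 24, 2]                       # Sp symbols inside the sliding window
    history = [-2.0, -2.1, -2.3, log_lf(window, A)]
    score = d_w(history)                          # compared against the alarm threshold
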
Figures 2(a) and 2(b) show the results of our tests; the values inside the graph bars represent the number of detected/not detected attacks. The first figure shows the attacks detected by the different modules, while Fig. 2(b) shows the attacks revealed by SLIDS as a whole. As can be seen, the use of several modules, working in parallel on the packets, allows us to obtain much better performance than using only a single approach. As an example, let us consider the Data attacks: the use of a single module allows at most half of the attacks to be detected, while the complete system is able to detect all the Data attacks. The same consideration holds for the detection rate over all the attacks: the maximum achievable with a single module (TCP/SLR) is 52.3%, while using the whole system it is about 78%. To evaluate the performance we used a Receiver Operating Characteristic (ROC), which plots the true positive rate vs. the false positive rate. Figure 3 shows that the system can achieve really good results, detecting 80% of the attacks with only a 10-15% false alarm rate. (Figure 3: ROC Curve.) Another test session has been carried out using actual traffic traces, captured within our testbed network in November 2005. This analysis was performed to evaluate the capability of SLIDS to cope with current security attacks. The ROC curve obtained in this test session is analogous to the one shown in Figure 3. The test session carried out on real traffic has confirmed the good performance obtained on the DARPA dataset. The system detected 80% of the attacks with a false alarm rate of 15%.

4 Conclusions
In this paper we have presented the design and the implementation of SLIDS, an anomaly-based NIDS. Such an IDS has been realized taking extensibility and flexibility into account. The modularity of the system allows the network administrator to customize SLIDS according to the behavior of the network. Moreover, the parallel use of several modules permits the identification of a wide range of attacks.

Acknowledgments
This work was partially supported by the RECIPE project funded by MIUR.

References
[1] Denning D.E., An Intrusion-Detection Model, IEEE Transactions on Software Engineering, February 1987.
[2] Adami, D., Callegari, C., Giordano, S., Pagano, M., Design, Implementation, and Validation of a Self-Learning Intrusion Detection System, IEEE/IST MonAM 2006.
[3] Nong Ye, et al., Robustness of the Markov-Chain for Cyber-Attack Detection, IEEE Transactions on Reliability, March 2004.
[4] Mahoney, M., A Machine Learning Approach to Detecting Attacks by Identifying Anomalies in Network Traffic.
[5] MIT Lincoln Laboratory, DARPA intrusion detection evaluation, http://www.ll.mit.edu/IST/ideval/
Embedded Forensics: An Ongoing Research about SIM/USIM Cards
Antonio Savoldi and Paolo Gubian
Department of Electronics for Automation, University of Brescia, Via Branze 38, I25121 Brescia, Italy
{antonio.savoldi,paolo.gubian}@ing.unibs.it

Abstract
The main purpose of this paper is to describe the real filesystem of SIM and USIM cards, pinpointing what the official standard reference does not say. By analyzing the full filesystem of such embedded devices, it is possible to find a multitude of undocumented files usable to conceal sensitive and arbitrary information, which are unrecoverable with the standard tools normally used in the forensic field. In order to understand how it is possible to use SIMs/USIMs for data hiding purposes, the paper will present a tool capable of extracting the entire observable memory of these devices together with the effective filesystem structure.
Keywords: filesystem, SIM/USIM cards, imaging tool, data hiding

1 Introduction
There are many commercial tools used to extract and decode the raw data contained in SIM/USIM cards, although none of them is capable of revealing the real filesystem structure and, consequently, of discovering the multitude of non-standard files which are hidden in such devices. This is what can be done with SIMBrush(1) [7][8], a new open source tool, developed in ANSI C for Linux and Windows platforms, aimed at extracting the observable portion of the filesystem of a SIM/USIM card. The real novelty for digital forensics research is to know what is really concealed, and potentially concealable, in a standard SIM/USIM card filesystem, thus demonstrating that data hiding is possible in such devices. In the open source arena there are, to the authors' knowledge, only two examples of this kind of tool. The first, TULP2G [11][16], is a framework developed by the NFI (the Netherlands Forensic Institute), implemented in C#, whose utility lies in the recovery of data from electronic equipment such as cellular phones and SIM cards. The second example, BitPim [4], is a program that allows manipulating, at the logical level, many CDMA phones branded LG, Samsung, Sanyo and other manufacturers. More information about mobile forensics tools can be found in [14]. Throughout the rest of the paper we will briefly describe what is notable about this tool and how it is possible to reconstruct the entire SIM and USIM filesystem. After that, some user scenarios will be shown with regard to the presentation of notable data present in such devices. Finally, after having understood the real structure of the SIM/USIM filesystem, it will be shown how it is possible to use non-standard locations to conceal arbitrary data, giving a practical demonstration of the effectiveness of the method.
(1) The tool can be requested by emailing the author at [email protected]. More information is available at http://www.ing.unibs.it/~antonio.savoldi
2 SIM/USIM Filesystem
SIM stands for Subscriber Identity Module; together with the Mobile Equipment, that is the user's cellular phone, it constitutes the Mobile Station, which defines, in the GSM (Global System for Mobile Communications) system, the so-called "end user part". The evolution of such a pervasive system, which counts more than one billion users in the world, is the UMTS (Universal Mobile Telecommunications System), which increases, with respect to the GSM counterpart, the bandwidth for data exchange. In this case we must consider, in the end user part of the network, the USIM, that is the Universal Subscriber Identity Module, together with the Mobile Equipment. Substantially, there are no big differences between SIM and USIM from the filesystem structure point of view, although the USIM contains more data, defined by the ETSI standard reference. Every SIM/USIM card is a smart card, standardized by ISO: in particular, SIMs are contact (as opposed to contactless) smart cards, which are specified by ISO standard 7816 [12]. They contain a microprocessor, three types of memory (RAM, ROM and EEPROM) and, finally, some integrated logic to manage security issues. The SIM can be considered as a black box interfacing with the Mobile Equipment through a standard API (Application Program Interface). The filesystem is stored in an internal EEPROM and has a hierarchical structure with a root called Master File (MF). Basically, there are two types of files: directories, called Dedicated Files (DF), and files, called Elementary Files (EF). The main difference between these two types of files is that a DF contains only a header, whereas an EF contains a header and a body. The header contains all the meta-information that relates the file to the structure of the filesystem (available space under a DF, number of EFs and DFs which are direct children, length of a record, etc.) and security information (access conditions to a file), whereas the body contains information related to the application for which the card has been issued. In an ordinary SIM/USIM card three types of EF are possible, namely transparent, linear fixed and cyclic [15]. As said previously, there are a lot of files in an ordinary SIM/USIM card, which can be subdivided into information about the subscriber, acquaintances, SMS traffic, calls, the provider and the cellular system [15]. The operations allowed on the filesystem are coded into a set of commands [15] that the interface device (IFD), i.e. the physical device capable of interfacing with a SIM and setting up the communication session, issues to the SIM, then waiting for the responses; among these commands, only a few are really important for the SIMBrush tool. Some of these fundamental commands are SELECT, GET RESPONSE, READ BINARY, READ RECORD and others related to the management of the CHV1/CHV2 codes [15]. Thus, in SIMBrush's core algorithm, only these commands are used, preserving in this way the integrity of all data in the filesystem, since all data are extracted in read-only access mode. It is interesting to note the access conditions which are indicated in the file headers: these act as constraints on the execution of commands which protect files from unauthorized manipulation, and only for the duration of their authorization. In particular, these constraints, specified by some bytes in the header of each elementary file, are related to a set of specific commands issuable to a card, namely update, read, increase, rehabilitate and invalidate.
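To make the command exchange mentioned above more concrete, the following sketch (hypothetical helper functions, not part of SIMBrush) builds the SELECT and GET RESPONSE APDUs as they are coded in the SIM-ME interface specification [15]:

    # Illustrative sketch: constructing SELECT and GET RESPONSE APDUs for a SIM card.
    def select_apdu(file_id: int) -> bytes:
        # CLA=A0, INS=A4 (SELECT), P1=P2=00, Lc=02, data = 2-byte file identifier.
        return bytes([0xA0, 0xA4, 0x00, 0x00, 0x02,
                      (file_id >> 8) & 0xFF, file_id & 0xFF])

    def get_response_apdu(length: int) -> bytes:
        # CLA=A0, INS=C0 (GET RESPONSE), P1=P2=00, Le = number of header bytes to read.
        return bytes([0xA0, 0xC0, 0x00, 0x00, length])

    # Selecting the Master File ("3F00") and then DF_TELECOM ("7F10"):
    for fid in (0x3F00, 0x7F10):
        apdu = select_apdu(fid)          # sent to the card through the IFD
        print(apdu.hex())
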
As will be clear in a later section of this paper, the access conditions of non-standard files play a key role in concealing arbitrary data in such files. It is also worth mentioning that a necessary condition to extract all the sensitive content from the SIM/USIM card is to have the PIN1 (CHV1) code. Indeed, only standard methods have been used in order to extract all digital data. If this condition is not satisfied, it will be possible to recover only the meta-content of the filesystem.

3 The Filesystem Extraction
Standard programs, like that developed by Shenoi [5], can extract only the standard elementary files, starting from the selection rules defined in the reference standard [15]. First, the standard states that the file identifier is unique and, for example, "3F00" identifies the root of the filesystem, which is the master file of a SIM/USIM card. Second, the SELECT command may be executed with any file ID, with no restrictions. This leads to the opportunity to "brush" the logical ID space by issuing a SELECT command for each possible ID, from "0000" to "FFFF", obtaining a warning from the SIM when the ID does not exist in the filesystem, or the header of the file when it does. In a SIM's filesystem there is the concept of current file and current directory. The SIM's files are hierarchically selectable with certain constraints, as specified in the reference standard [15]. According to these rules, it is possible to select, for example, the file "6F3C" (SMS) by issuing two SELECT commands: the first to select the DF 7F10, which is the father of this EF, and the second to select the file, that is, in this case, the SMS elementary file. As can be seen from the standard reference, the SIM filesystem has an n-ary structure and it is easy to extract the standard part of every SIM/USIM, with only a few commands, by directly reading all the elementary files declared in the standard reference. This is, to the authors' knowledge, the approach used in all commercial and open-source tools. With the objective of acquiring all the observable memory of a SIM, that is the data accessible with standard methods, we must define the general selection rule by which it is possible to "brush" the entire logical address space of the EEPROM. By following the reference standard rules, it is possible to reconstruct the entire filesystem tree. Presently, SIMBrush is able to extract the body of those files whose access conditions are ALW and CHV1/CHV2, the latter case being possible only if the appropriate code, PIN1 (CHV1) or PIN2 (CHV2), is provided. The main algorithm is based on the construction of a binary tree, which is a suitable data structure for SIM card data, this structure being equivalent to an n-ary tree. We can explain the main elements of this pseudo-code:

    Procedure Build_Tree
        Expand_DF(PARENT_SET = 0, CURRENT_SET = {MF}, DF_SIBLINGS_SET = 0);
    End

    Procedure Expand_DF(PARENT_SET: NODE, CURRENT_SET: NODE, DF_SIBLINGS_SET: NODE)
        Select(CURRENT_SET);
        SELECTABLE_SET = Brush(CURRENT_SET);
        SONS_SET = SELECTABLE_SET \ (MF_SET U CURRENT_SET U PARENT_SET U DF_SIBLINGS_SET);
        For each node N belonging to SONS_SET,
            Place_in_tree(N);
            If N equal DF Then
                Expand_DF(PARENT_SET = CURRENT_SET, CURRENT_SET = N,
                          DF_SIBLINGS_SET = DF_SIBLINGS_SET \ {N});
    End

• Build Tree: this procedure initializes the parameters of the recursive function Expand DF.
• Expand DF: the recursive function that, starting from the filesystem root, brushes the ID space, searching for all existing EFs and DFs and finding all the sons of the current node, which are placed, dynamically, in a binary tree data structure. For each son, if it is an EF then it is placed in the data structure; otherwise, if it is a DF, the Expand DF function acts recursively, updating all the sets involved.
• NODE: defines the main data structure used to store all the filesystem's data.
• Select: sends a SELECT command to the SIM card.
• Expand DF: is the recursive function that, starting from the filesystem root, brushes the ID space, searching all existing EFs and DFs and finding all sons of the current node, which are placed, dynamically, in a binary tree data structure. For each son, if this is an EF then it is placed in the data structure; otherwise, if it is a DF then the Expand DF function acts recursively, updating all interested sets. • NODE: defines the main data structure to store all filesystem’s data. • Select: sends a SELECT command to the SIM card. 92 • Brush: this function selects a Dedicated File, passed as argument, which becomes the current DF, and brushes the entire logical ID’s space, obtaining the SELECTABLE set related to such DF as a result. The SIMBrush tool produces as output one XML (eXtensible Markup Language) file, which contains the raw data; the next step is to decode these data in a comprehensible form suitable for the forensics practitioners to derive useful data that could become digital evidence after the analysis. This part is done by a second tool, written in Perl, whose function is to translate clearly the rough data into a suitable form, which is the output XML file. SIMBrush and the interpretation tool have been tested on several SIM and USIM cards of different sizes, providers and ranging between 16 Kbytes and 128 Kbytes. It is possible to extract every kind of data, among those defined in the reference standard [15]. 4 Conclusions In this paper we have made a survey about the real features of the SIM/USIM card filesystem. In the appendix it is possible to have a look at the hidden part of the filesystem of such embedded device, showing how it is possible to use such non-standard part for data hiding purposes. References [1] Autopsy Forensics Browser. autopsy/. Software available at: http://www.sleuthkit.org/ [2] Document Object Model. Paper available at: http://www.w3.org/DOM/. [3] GSM Phone Card Viewer. Software available at: http://www.linuxnet.com/ applications/files/gsmcard_0.9.1.tar.gz. [4] R. Binns. BitPim. Software available at: http://bitpim.sourceforge.net/. [5] G. Manes C. Swenson and S. Shenoi. Imaging and analysis of gsm sim cards. In IFIP International Federation for Information Processing, Springer Boston, pages 205–216, 2006. [6] Paraben Corporation. Sim Card Seizure. Software available at: http://www.parabenforensics.com/catalog/. [7] A. Savoldi F. Casadei and P. Gubian. Simbrush: An open source tool for gsm and umts forensics analysis. In Proceedings of Systematic Approaches to Digital Forensic Engineering, First International Work-shop, Proc. IEEE, pages 105–119, 2005. [8] A. Savoldi F. Casadei and P. Gubian. Forensics and sim cards: An overview. International Journal of Digital Evidence, 5, 2006. [9] Susteen Inc. DataPilot. Software available at: http://www.susteen.com/. [10] Netherland Forensics Institue. Card4Labs. forensischinstituut.nl/NFI/nl. Software available at: http://www. [11] Netherland Forensics Institue. Tulp2G, Forensic Framework for Extracting and Decoding Data. Software available at: http://tulp2g.sourceforge.net/. 93 [12] ISO. Identification Cards - Integrated Circuit Cards with Contacts. Paper available at: http://www.cardwerk.com/smartcards/smartcard_standard_ ISO7816.aspx. [13] A. Savoldi and P. Gubian. A methodology to improve the detection accuracy in digital steganalysis. In Proceedings of International Conference on Intelligent Information Hiding and Multimedia Signal Processing, Proc. IEEE, pages 373–377, 2006. 
[14] e-evidence info The Electronic Evidence Information Center. BitPim. Software available at: http://www.e-evidence.info/index.html. [15] ETSI TS 100 977 v8.3.0. Specification of the Subscriber Identity Module - Mobile Equipment (SIM - ME) interface. Paper available at: http://www.id2.cz/normy/gsm1111v830. pdf. [16] J. van den Bos and R. van der Knijff. Tulp2g an open source forensic software framework for acquiring and decoding data stored in electronic devices. International Journal of Digital Evidence, 4, 2005. Appendix 1 Following, it is possible have a look at the raw as well the translated data related to the contents of one SMS record. With the help of reference standards [15] it is easy to extract all the information about the SMS message, such as the time (date and hour) when the message has been sent, the length of SMS, the number of the sender, the number of the service center, and finally, the textual message. <content> 01 07 91 94 61 00 D4 64 13 D3 EB 69 FE 06 B5 7C 1A 74 D9 61 90 22 87 41 D9 4D 97 2F 58 0D 97 41 CD 0D BA 06 A8 FC DD CD 6F 50 </content> 93 00 14 FA DF 0D 0C F3 BF 44 30 A1 7E 98 33 50 B6 1B ED 42 34 71 41 0E 1E 20 EB 0D 85 70 DB 44 B2 41 AF 58 69 83 9F 72 D3 8A 28 40 D3 0C 9B F4 BF 9E 36 A6 BE 1A 6F C5 02 90 F3 83 FE 34 DD 1E 68 F5 06 44 77 72 00 60 37 E2 06 48 65 87 16 B7 A1 4D 3A FF 04 25 E8 F5 35 5E 79 E5 7B BB 20 36 05 FF 04 80 2C F2 C3 3E BA 65 C1 2C 54 41 4A FF 85 A0 0F 9C 78 87 0C 50 70 4F DA 2F BA FF <content> <Date>04 Jul 05</Date> <Hour>09.06.52</Hour> <Length_SMS>160</Length_SMS> <Number>4916</Number> <Number_Service_Center> 393358822000 </Number_Service_Center> <Status>01</Status> <Text> TIM avviso gratuito Da questo momento Maxi WAP ti regala 2 suonerie da scaricare entro il 31/08/05 da SuonerieMaxiWAP (in WAP di TIM/Promozioni). </Text> </content> Appendix 2 The Hidden Part of Filesystem The non-standard part of the SIM/USIM filesystem has been discovered by the authors by using SIMbrush [7] [8], created with the main purpose to acquire the entire contents of a smart card memory. An example of a partial filesystem present in a 128 Kbyte SIM card can be seen in table 2. Each row of the table is relative to a node of the n-ary tree of the SIM/USIM card filesystem. In this way we can manage the huge quantity of information regarding the meta-data in a compact way. We can see seven fields which refer to ID, standard name of an EF or DF, file type (MF,DF,EF), privileges, which are related to the constraints on the execution of a set of commands, as already said, structure of 94 file (transparent, linear fixed or cyclic), the field related to father of nodes, important to see the real structure of the n-ary tree, and finally, the size of the elementary files. By analyzing, for example, the non-standard elementary files under DF “5FFF”, namely the EFs ranging from “1F0C” to “1F3F”, it is easy to see that these files are modifiable with the Update command, because the privilege for this command is CHV1. This means that everyone having the PIN1 of the card is authorized to store arbitrary data by replacing the contents of the existing files. Clearly, this is the worst case scenario: indeed, it is always possible to modify the contents of these files, if the card is not protected with the CHV1 code. In the present case a SIM/USIM card can act as a covert storage channels, because the data hiding is possible by using concealed storage locations in the filesystem. 
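In practice, the selection of such concealed storage locations could be sketched as follows (hypothetical field names and privilege encoding, not the authors' tooling): given a parsed FAT, the writable non-standard part is simply the set of non-standard elementary files whose Update access condition is ALW, or CHV1 when the PIN1 is available.

    # Hypothetical sketch: selecting the Writable Non-Standard Part (WNSP) from a FAT.
    def writable_non_standard(fat_entries, have_chv1=True):
        allowed = {"ALW"} | ({"CHV1"} if have_chv1 else set())
        return [e for e in fat_entries
                if e["name"] == "NS"            # non-standard file
                and e["type"] == "EF"
                and e["update"] in allowed]     # Update access condition

    fat = [
        {"id": "1F0C", "name": "NS", "type": "EF", "update": "CHV1", "size": 128},
        {"id": "2FE2", "name": "ICCID", "type": "EF", "update": "NEV", "size": 10},
    ]
    wnsp = writable_non_standard(fat)
    print(sum(e["size"] for e in wnsp), "bytes available for hiding")
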
The mentioned stego-object is represented by the SIM/USIM with the concealed message, which can be allocated in the non-standard part of the filesystem by using different strategies. Once a message is hidden in the SIM/USIM, it can be sent from Alice to Bob with a stego-key, thus acting as a covert channel. A representative diagram of what we have explained is shown in figure 1 (Figure 1: Transmission of information by using an ordinary SIM/USIM card as cover-object). Now, we are ready to see a possible framework usable to demonstrate that data hiding is possible in this kind of device.

A Framework for Data Hiding
As already explained, hiding data in SIM/USIM cards is based on the presence of a non-declared part of the filesystem that can be used to store arbitrary data if the privileges permit this. We will see a possible methodology to perform the data hiding and, subsequently, we will discuss best practices usable to recover the hidden message.

A Possible Data Hiding Procedure
In order to create the stego-object we need to embed the message in the cover-object, namely the SIM/USIM, by using a portion of the non-standard part of the filesystem. Here we present a possible scheme for this purpose.
• Extraction of the binary image by using SIMBrush: at this stage we need to deal with the important task of acquiring all the observable content from a SIM/USIM card. This is clearly possible, for example, by using the mentioned tool, which is able to analyze the entire logical space of the EEPROM, discovering the non-standard part.
• Creation of the File Allocation Table (FAT): having the complete set of headers related to the SIM/USIM filesystem, it is quite trivial to obtain the FAT, as shown in table 2.
• Selection of the Writable Non-standard Part (WNSP): by inspecting the privileges regarding the Update command, it is possible to discover all non-standard files that are arbitrarily modifiable, in the worst case with the user's privileges.
• Allocation of the message in the WNSP: the message that is going to be concealed needs to be broken into many chunks, according to the dimensions of the non-standard files that will be rewritten. At this stage, there are a lot of possible strategies that can be used. The selected non-standard files will constitute the steganographic key, usable to recover the hidden message.
In order to understand this procedure we can analyze an example, by considering the FAT presented in table 2. In this case, by adding up all file dimensions, the total space occupied is 56887 bytes, whereas the non-standard part is 42859 bytes. The effective writable non-standard part (WNSP) is 16549 bytes, about 29.1% of the total occupied space. Some of the analyzed SIMs/USIMs are reported in table 1.

Table 1: WNSP regarding some of the analyzed SIM/USIM cards.

    #  Provider   Country  EEPROM  Phase  Services  WNSP   NSP    TES
    1  TIM        Italy    16KB    2      GSM       0      151    6997
    2  Vodafone   Italy    32KB    2      GSM       0      531    8743
    3  BLU        Italy    64KB    2+     GSM       0      21122  31087
    4  Omnitel    Italy    64KB    2+     GSM       0      17427  25689
    5  Wind       Italy    64KB    2+     GPRS      96     4737   22651
    6  TIM        Italy    128KB   2+     GPRS      16549  42859  56887
    7  TIM        Italy    128KB   2+     GPRS      12478  25112  45729
    8  H3G        Italy    128KB   3      UMTS      107    21290  30826

Guidelines for Recovering the Hidden Message
Having demonstrated that data hiding is possible in such devices, it is necessary to outline some guidelines about which best practices can be used by the forensics practitioner in order to deal with this problem.
Guidelines for Recovering the Hidden Message

Having demonstrated that data hiding is possible in such devices, it is mandatory to trace some guidelines about which best practices can be used by the forensics practitioner in order to deal with this problem. Undoubtedly, the first thing to understand is that the current tools in the field of cellular forensics, whose aim is to extract the standard part, have a fundamental drawback: they are not able to acquire the entire memory content. Having said this, in the authors' opinion it is important to alert the forensics community in order to fill this gap. Assuming that the complete SIM/USIM memory image is available, we can see how one can deal with the problem of extracting sensitive data from this device.

• Extraction of the non-standard part from the image: this task is necessary in order to isolate all the potentially valuable data.

• Application of steganalysis methods: this is the most challenging step, because it is unknown whether there are any concealed data in the non-standard part, or which coding has been used for the hiding purpose.

The latter step can be really time-consuming and is very similar to the problem of detecting a hidden message in an ordinary digital image [13]. A possible way of tackling it is to apply a brute-force translation method, decoding the various chunks of non-standard content and looking for something intelligible (a naive sketch is given after Table 2).

Table 2: A partial list of standard and non-standard files extracted from a TIM 128 Kbyte SIM card.

ID   | Name      | File Type | Privileges (2)           | Structure    | Father | Size [byte]
3F00 | MF        | MF        | —                        | —            | —      | —
2F00 | NS        | EF        | ALW,ALW,ADM,NEV,NEV,NEV  | linear fixed | 3F00   | 46
2F05 | ELP       | EF        | ALW,CHV1,NEV,NEV,NEV     | transparent  | 3F00   | 4
2F06 | NS        | EF        | ALW,NEV,NEV,NEV,NEV      | linear fixed | 3F00   | 330
2FE2 | ICCID     | EF        | ALW,NEV,NEV,NEV,NEV      | transparent  | 3F00   | 10
2FE4 | NS        | EF        | ALW,NEV,NEV,NEV,NEV      | transparent  | 3F00   | 35
2FE5 | NS        | EF        | ALW,NEV,NEV,NEV,NEV      | transparent  | 3F00   | 6
2FFE | NS        | EF        | CHV1,ADM,NEV,NEV,NEV     | transparent  | 3F00   | 8
7F10 | DFTELECOM | DF        | —                        | —            | 3F00   | —
5F3A | NS        | DF        | —                        | —            | 7F10   | —
4F21 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5F3A   | 500
4F22 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | transparent  | 5F3A   | 4
4F23 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | transparent  | 5F3A   | 2
4F24 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | transparent  | 5F3A   | 2
4F25 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5F3A   | 500
4F26 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5F3A   | 1250
4F30 | SAI       | EF        | CHV1,ADM,NEV,ADM,ADM     | linear fixed | 5F3A   | 128
4F3A | NS        | EF        | CHV1,CHV1,NEV,CHV2,CHV2  | linear fixed | 5F3A   | 7000
4F3D | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5F3A   | 75
4F4A | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5F3A   | 39
4F4B | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5F3A   | 70
4F4C | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5F3A   | 70
4F50 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5F3A   | 1280
4F61 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5F3A   | 340
4F69 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5F3A   | 500
5FFF | NS        | DF        | —                        | —            | 7F10   | —
1F00 | NS        | EF        | ADM,ADM,NEV,ADM,ADM      | transparent  | 5FFF   | 105
1F01 | NS        | EF        | ADM,ADM,NEV,ADM,ADM      | transparent  | 5FFF   | 175
1F02 | NS        | EF        | CHV1,CHV1,NEV,NEV,NEV    | transparent  | 5FFF   | 11
1F03 | NS        | EF        | ALW,ADM,NEV,NEV,NEV      | linear fixed | 5FFF   | 40
1F04 | NS        | EF        | ALW,CHV1,NEV,NEV,NEV     | transparent  | 5FFF   | 4
1F05 | NS        | EF        | ADM,ADM,NEV,ADM,ADM      | linear fixed | 5FFF   | 640
1F06 | NS        | EF        | ADM,ADM,NEV,ADM,ADM      | linear fixed | 5FFF   | 420
1F07 | NS        | EF        | CHV1,ADM,NEV,ADM,ADM     | transparent  | 5FFF   | 20
1F08 | NS        | EF        | CHV1,CHV1,NEV,NEV,NEV    | transparent  | 5FFF   | 175
1F09 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | transparent  | 5FFF   | 100
1F0A | NS        | EF        | ADM,ADM,NEV,ADM,ADM      | linear fixed | 5FFF   | 16
1F0B | NS        | EF        | ADM,ADM,NEV,ADM,ADM      | transparent  | 5FFF   | 16
1F0C | NS        | EF        | CHV1,CHV1,NEV,CHV1,CHV1  | linear fixed | 5FFF   | 34
1F1E | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5FFF   | 70
1F1F | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5FFF   | 70
1F20 | NS        | EF        | CHV1,ADM,NEV,ADM,ADM     | linear fixed | 5FFF   | 128
1F21 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5FFF   | 1280
1F22 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5FFF   | 340
1F23 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5FFF   | 500
1F24 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5FFF   | 1250
1F34 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5FFF   | 500
1F38 | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | linear fixed | 5FFF   | 500
1F3D | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | transparent  | 5FFF   | 4
1F3E | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | transparent  | 5FFF   | 2
1F3F | NS        | EF        | CHV1,CHV1,NEV,ADM,ADM    | transparent  | 5FFF   | 2
1F40 | NS        | EF        | ADM,ADM,NEV,ADM,ADM      | transparent  | 5FFF   | 700

(2) The sequence of privileges is related, as explained in the paper, to the execution of a defined set of commands issuable to a SIM card, namely Read, Update, Increase, Rehabilitate and, finally, Invalidate.
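As a naive illustration of the brute-force translation mentioned above, the sketch below (ours, not part of any existing tool) tries a few common encodings on each extracted chunk and keeps the decodings that are mostly printable. The set of encodings and the printability threshold are arbitrary assumptions; a real tool would also try, for instance, the packed GSM 7-bit default alphabet used for SMS text.

```python
# Naive sketch (ours) of the brute-force translation step: try a few common
# encodings on each chunk of non-standard content and keep the decodings that
# look like intelligible text. The GSM 7-bit default alphabet and smarter
# statistical tests are deliberately omitted here.
import string

PRINTABLE = set(string.printable)

def candidate_decodings(chunk: bytes, threshold: float = 0.9):
    results = []
    for encoding in ("ascii", "utf-8", "latin-1", "utf-16-le"):
        try:
            text = chunk.decode(encoding)
        except UnicodeDecodeError:
            continue
        if text and sum(c in PRINTABLE for c in text) / len(text) >= threshold:
            results.append((encoding, text))
    return results

# Hypothetical usage on chunks read back from the non-standard part:
# for ef_id, chunk in extracted_chunks:
#     for encoding, text in candidate_decodings(chunk):
#         print(ef_id, encoding, repr(text))
```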
EMPLOYMENT: THE GARANTE'S GUIDELINES ON E-MAIL AND THE INTERNET

Stefano Aterno
Studio Legale Aterno; Lecturer in Legal Informatics, University of Rome "La Sapienza"

Abstract. The boundary between lawful and unlawful in the computer-based and telematic monitoring of workers' activity. How much can an employer monitor today? What may the employer do, and what must it communicate to the workers? The monitoring of e-mail. The difficult balance between information security and the protection of the employee in the workplace. The prohibition of monitoring the employee's working activity and the use of such data for disciplinary purposes.

Keywords: employee; employer; monitoring of working activity; e-mail monitoring; monitoring of Internet browsing; personal data; violation of correspondence; Statuto dei Lavoratori (Workers' Statute); company policies.

On 1 March of this year, the Garante per la protezione dei dati personali (the Italian Data Protection Authority) issued guidelines on the monitoring of employees by their employer. Did the Garante introduce anything new, or was everything already written down in statutes, judgments and consolidated acts (Testi Unici)?

First of all, it should be said that the Garante did not merely issue guidelines. The text adopted by the Authority is a genuine binding measure which, on certain points, imposes prohibitions and rules of conduct through the adoption of technical safeguards. For the rest, and more extensively, it sets out the so-called guidelines (which are nothing more than mere recommendations).

Public and private employers may not monitor their employees' e-mail and Internet browsing, except in exceptional cases. This principle stated by the Garante was already present in many national rules and laws, not least those of the Statuto dei Lavoratori (Workers' Statute) and the criminal code. The guidelines restate a concept that is by now settled, namely that it is up to the employer to define how such tools may be used, taking into account the rights of the workers and the rules on industrial relations (see Article 4 of the Workers' Statute and several judgments of the Court of Cassation in criminal and labour matters).

The needs of information and network security have forced companies to carry out precise and meticulous monitoring of traffic and data flows. This has made it possible to monitor a large amount of data (web connections, e-mail, log files) directly attributable to individual workers. The use of such data, or the access to workers' computers, has led to some complaints to the Garante and to some criminal proceedings. We will see how some of these trials ended.
The use of (company) e-mail for private or recreational (non-business) purposes, or the connection to (non-institutional and/or non-work-related) websites, has caused a proliferation of personal and, above all, sensitive data. From the analysis of the websites visited, even sensitive information about employees can be inferred, and e-mail messages may have private content.

Starting from the premise that these tools are company assets, and therefore instrumental to the achievement of work purposes, a solution is being sought to the problem of de facto monitoring of the employee's working activity. One solution is prevention and the adoption of rigorous internal policies. Communicating the rules and the policy to all employees, and informing them about them, should lead to a correct and balanced solution of the problem. "It is necessary to prevent arbitrary uses of company IT tools and the infringement of workers' privacy" (Garante per la protezione dei dati personali).

The Authority first of all requires employers to inform workers clearly and in detail about how the Internet and e-mail may be used and about the possibility that checks may be carried out. The Garante then prohibits the systematic reading and recording of e-mails, as well as the systematic monitoring of the web pages viewed by the worker, because this would amount to remote monitoring of the working activity, which is prohibited by the Workers' Statute. A whole series of technological and organizational measures is also indicated in order to limit the analysis of the content of Internet browsing and the opening of certain e-mail messages containing data needed by the company, which are allowed only in very limited cases.

The measure recommends that companies adopt an internal code of conduct, defined with the involvement of the trade union representatives as well, clearly setting out the rules for the use of the Internet and e-mail. The employer is also called upon to adopt every measure capable of preventing the risk of improper use, so as to reduce subsequent checks on the workers.

But can we consider sufficient what the Garante says regarding the Internet, namely that it is appropriate, for example, to identify in advance the sites considered related or unrelated to the work activity, and to use filters that prevent certain operations, such as access to sites placed on a sort of black list or the download of music or multimedia files? Do we not risk restricting yet another freedom, namely that of browsing the Internet freely anywhere? Is it really possible that there are no other solutions, perhaps stricter and more responsibility-oriented, but certainly not restrictive of fundamental freedoms?

One of the most frequent problems is the need to access e-mail in the employee's absence. Some solutions already adopted by companies have been taken up by the Guidelines. Are they sufficient? Some are certainly important and fundamental; others are harbingers of further problems. We will see together which ones. Should these preventive measures not be sufficient to avoid anomalous behaviour, any checks by the employer must be carried out gradually.
As a first step, checks should be carried out at the level of department, office or work group, so as to identify the area to be reminded to comply with the rules. Only later, if the anomaly is repeated, could one move on to checks on an individual basis.

One of the main points of the Guidelines is the reference to some principles already set out in the Privacy Code:
- the principle of necessity;
- the principle of relevance and non-excessiveness;
- the principle of fairness and information.

Beyond so many rules and principles, what are the true, concrete and real risks run by the workers, and what equally feared risks are run by companies and individual entrepreneurs?

A Practical Web-Voting System

Andrea Pasquinucci
UCCI.IT, via Olmo 26, I-23888 Rovagnate (LC), Italy

Abstract

We present a reasonably simple web-voting system. Simplicity, modularity, the voter's trust and the requirement of compatibility with current web browsers lead to a protocol which satisfies a set of security requirements for a ballot system which is not complete but sufficient in many cases. We discuss the requirements, the usability problem and the threats to this protocol (a demo is on-line at [6]).

1 Introduction

Recently E-voting has been one of the hottest topics in security and cryptography. In practice E-voting is traditional voting at a voting booth aided by a voting machine. Here instead we consider web-based voting, that is, voting using a web-browser and posting the vote on a web-site. Web-voting has some intrinsic features which make it inherently different from E-voting due to the delocalization of the voter. But if on the one hand delocalization of the voter can be seen as a weakness of web-voting, on the other hand it is also its strength, since with web-voting it is possible to express a vote practically anywhere and at any moment, potentially broadening people's participation in decision processes.

Here we present a simple, modular protocol [14, 13, 12], based on sound and common cryptography and previous results in the literature [1, 9, 20, 3], which can be implemented as a web service compatible with current web browsers and does not require any user education. Most of the cryptographic operations required in the protocol can be implemented with common tools like PGP or GnuPG [15, 8, 11]. Before describing the protocol, its implementation and the threat analysis, it is important to consider two other aspects of web-voting: the requirements that a web-voting system should satisfy, and the problems of user trust, user interface and usability.

2 Web-voting requirements

Among the most important requirements often used in defining an electronic voting system [1, 9, 20, 7], our protocol satisfies the following:

1. Unreusability (prevent double voting): no voter can vote twice
2. Privacy: nobody can get any information about the voter's vote
3. Completeness: all valid votes should be counted correctly
4. Soundness: any invalid vote should not be counted
5. Individual verifiability: each eligible voter can verify that her vote was counted correctly
6. Weak Eligibility: only eligible voters can get voting credentials from trusted authorities
7. User-friendliness: the system is easy (intuitive) to use

Our protocol satisfies only Weak Eligibility, not full Eligibility, and does not satisfy Receipt-freeness; as a consequence of both, a voter is able to prove to someone else how she has voted, and so to sell her vote. Our protocol also does not satisfy Incoercibility.
Indeed, any voting system in which voters do not express their vote in a controlled environment (the voting booth) cannot prevent coercion and vote-selling. In the case of web-voting, and the same is true for voting by ordinary mail, the voter can always take a picture, make a movie or vote in the presence of the person to whom the vote is sold. The voter can also sell the voting credentials, unless they are of a biometric type.

User-friendliness, instead, impacts other requirements, because obtaining higher security often requires the adoption of advanced cryptographic algorithms and protocols. Unfortunately these cryptographic protocols often require the intervention of the voter, which means that the voter should understand something, albeit little, of the protocol itself. This is often impossible if the voter does not have a background in cryptography, even if she is a computer expert (see for example the recommendations in [19] for the case of E-voting).

3 The human factor

Voting is not only a technical matter. It is most important that the voters trust the process and the result of the vote. In traditional voting, the electoral committee and the verification of the procedures by the representatives of the competing parties provide the controls necessary for the voter to trust the final result. In E-voting and web-voting, the voters, the electoral committee and the representatives of the competing parties must trust the software and hardware which implement the electronic voting system. As we all know well, the security of current electronic systems is weak, and it is certainly perceived as such by everybody. We cannot expect voters to trust a web-site unconditionally to protect their anonymity and count the votes correctly.

To make voters trust an electronic voting protocol, we need to give the voters themselves some way of checking the correctness of the process, for example by giving them a receipt with which they can verify that their vote has been counted correctly in the final tally. This of course can make vote-selling easier, but in web-voting this would be possible anyway. We believe that today it is somewhat more important to renounce some properties of a web-voting protocol in order to make the voter understand and trust the process. This can limit the applicability of a web-voting protocol, but it makes it easier to implement and to manage the associated risks.

4 The modular protocol

Following the traditional voting procedure, and also to reduce the risk of having both authentication data and votes on the same system, we decided to separate the authentication and voting phases and to implement them on different machines (for a similar approach see also [3]). In this way, only through the collusion of the managers of the two servers is it possible to associate a vote with a voter.

The voter first connects with her web-browser to the authentication web-server (all web connections are encrypted with SSL/TLS) and is asked for her voting credentials, a username-password pair or a client digital certificate. These credentials allow her to access the system. The voter then presents a Secret Token, which is unique per voter and per vote and can be used only once. The authentication server then creates a Vote Authentication, which is based on a random number, is encrypted with the public key of the vote server and is digitally signed. Alternatively, blind signatures can be used [5, 3]. With blind signatures the random number of the Vote Authentication is created by the voter and the authentication server does not know its value. This reduces the risk of collusion between the authentication and vote server managers, but it requires a custom-built web-browser, since current web-browsers do not support blind signatures, and it creates a (very small) possibility that two voters create the same Vote Authentication, in which case only one of them will be able to vote.
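As an illustration of the basic (non-blind) Vote Authentication just described, here is a minimal sketch. It is ours, not the paper's implementation; it uses the Python cryptography package with RSA-OAEP and RSA-PSS as arbitrary choices, whereas the paper notes that common tools such as PGP or GnuPG could be used instead.

```python
# Minimal sketch (ours) of the basic Vote Authentication: a random number
# encrypted with the vote server's public key and signed by the authentication
# server. Key handling and encoding details are assumptions.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

# Generated together here only to keep the sketch self-contained; in a real
# deployment the two key pairs live on different machines.
auth_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
vote_server_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

def create_vote_authentication():
    nonce = os.urandom(32)                       # the random number
    ciphertext = vote_server_key.public_key().encrypt(
        nonce,
        padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                     algorithm=hashes.SHA256(), label=None))
    signature = auth_key.sign(
        ciphertext,
        padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                    salt_length=padding.PSS.MAX_LENGTH),
        hashes.SHA256())
    return ciphertext, signature                 # handed to the voter
```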
The voter then connects to the vote web-server through an anonymizer service, which hides the IP address of the voter [4, 10]. As an extra anonymizing measure, a voter can also connect to the vote web-server using Tor [21, 17]. Note that, in any case, all current browsers in their default configuration leak some information about the voter, for example in the USER-AGENT field.

The voter sends the Vote Authentication to the vote server; the vote server verifies that the digital signature was made by the authentication server and that the Vote Authentication has not already been used. If this is the case, the vote is registered and the Vote Authentication is marked as used. (All data written on the authentication and vote servers is encrypted with the public key of the authentication and vote committee, respectively.) The vote server then sends the voter a cryptographic Vote Receipt which allows the voter to verify that her vote has been counted correctly in the final results.

At the end of the voting period, all encrypted votes are sent to the vote committee, which decrypts and counts the votes and posts the results together with each vote and its vote receipt. Analogously, the authentication committee decrypts all used Secret Tokens and publishes their list.
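The vote-server side of these checks could look like the following minimal sketch (ours, not the paper's implementation). The way the Vote Receipt is derived from the encrypted vote and a time-stamp is an assumption, loosely inspired by the remark on receipts in the Threats section below; the signature scheme matches the previous sketch.

```python
# Minimal sketch (ours) of the vote-server checks: verify that the Vote
# Authentication was signed by the authentication server, refuse reuse,
# store the encrypted vote and hand back a Vote Receipt.
import datetime
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding

used_authentications = set()   # fingerprints of already-spent Vote Authentications
ballot_box = []                # (encrypted vote, receipt) pairs, published at the end

def cast_vote(vote_auth, signature, encrypted_vote, auth_public_key):
    try:
        auth_public_key.verify(
            signature, vote_auth,
            padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                        salt_length=padding.PSS.MAX_LENGTH),
            hashes.SHA256())
    except InvalidSignature:
        raise ValueError("not issued by the authentication server")
    tag = hashlib.sha256(vote_auth).hexdigest()
    if tag in used_authentications:
        raise ValueError("Vote Authentication already used")
    used_authentications.add(tag)
    # the time-stamp is used only to build the receipt and is not stored
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    receipt = hashlib.sha256(encrypted_vote + stamp.encode()).hexdigest()
    ballot_box.append((encrypted_vote, receipt))
    return receipt
```

Since each vote is published together with its receipt in the final tally, the voter only needs to look up her receipt string to verify that her vote was counted.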
5 Threats

Besides the possibility of software bugs (the current implementation uses standard software such as a Linux distribution with a kernel hardened with RSBAC [18], apache [2], php [16] and standard cryptographic libraries [8, 11]) or the possibility of collusion between the managers of the two servers, among the most difficult threats to counter are timing attacks and impersonation. In general we can divide the attacks into two classes: attacks which try to modify or add votes, and attacks against anonymity.

The modification of a vote expressed by a voter will be discovered by the voter herself using the receipt. It is more difficult to discover whether someone has voted in place of some absentees. Indeed, it is improbable that absentees will check that their Secret Token has not been used, so it will be difficult to discover whether someone who has learned unused Secret Tokens has used them. Notice that only the hashes of the Secret Tokens appear on the authentication server, so the authentication manager cannot mount this attack. This is therefore a purely human attack, which cannot be prevented by electronic measures.

Timing attacks are more subtle but also more difficult to mount. First of all, the protocol does not require that the voter votes just after having received her Vote Authentication, but this is what most people will do. On the vote server no times of the votes are recorded, so that all votes appear as if they were submitted at the same moment. On the other hand, times are present in the protocol: the time of creation of the Vote Authentication is recorded, and the vote receipt is built with a cryptographic routine using a precise time-stamp as one of its ingredients. In practice the only way of violating the anonymity of the vote is to match the information known to the authentication manager with that potentially known to the vote manager: for each vote the vote manager must also record its time (the system does not record it, so the vote server must be modified to record this information) and then match it with what is recorded by the authentication manager. This attack requires the collusion of the two managers and the modification of one server.

Acknowledgments

We thank D. Bruschi and A. Lanzi for discussions.

References

[1] B. Adida. Advances in cryptographic voting systems. PhD thesis, MIT, 2006.
[2] Apache web server. http://httpd.apache.org/.
[3] A protocol for anonymous and accurate e-polling. In Proceedings of E-Government: Towards Electronic Democracy, International Conference TCGOV 2005, Bolzano, Italy, 2005. Lecture Notes in Computer Science 3416, Springer, 2005.
[4] D. Chaum. Untraceable electronic mail, return addresses, and digital pseudonyms. Comm. ACM 24(2), 84, 1981.
[5] D. Chaum. Security without identification: transaction systems to make big brother obsolete. Comm. ACM 28(10), 1030, 1985.
[6] Ucci.it eballot. http://eballot.ucci.it.
[7] E. Gerck. Voting system requirements. The Bell, 2001.
[8] GnuPG. http://www.gnupg.org/.
[9] D. Gritzalis, editor. Secure Electronic Voting. Kluwer Academic Publishers, 2002.
[10] M.K. Reiter and A.D. Rubin. Anonymous web transactions with Crowds. Comm. ACM 42(2), 32, 1999.
[11] OpenSSL. http://www.openssl.org/.
[12] A. Pasquinucci. Implementing the modular eballot system v1.0. http://eballot.ucci.it/, http://arxiv.org/abs/cs.CR/0611067, 2006.
[13] A. Pasquinucci. A modular eballot system. http://eballot.ucci.it/, http://arxiv.org/abs/cs.CR/0611066, 2006.
[14] A. Pasquinucci. Some considerations on trust in voting. http://eballot.ucci.it, 2006.
[15] PGP. See e.g. RFC 1991, RFC 2440, RFC 3156, http://www.pgp.com/.
[16] PHP. http://www.php.net/.
[17] R. Dingledine, N. Mathewson, and P. Syverson. Tor: the second-generation onion router. http://tor.eff.org/, 2004.
[18] RSBAC: Rule Set Based Access Control. http://www.rsbac.org/.
[19] R.G. Saltman. Independent verification: essential actions to assure integrity in the voting process. Preliminary review for NIST, 2006.
[20] B. Schneier. Applied Cryptography. Wiley, 1996.
[21] The onion router. http://tor.eff.org/.