Intrusion Detection Systems and IPv6*

Arrigo Triulzi
[email protected]

Abstract

In this paper we discuss the new challenges posed by the introduction of IPv6 to Intrusion Detection Systems. In particular we discuss how the perceived benefits of IPv6 are going to create new challenges for the designers of Intrusion Detection Systems, and how the current paradigm needs to be altered to confront these new threats. We propose that the use of sophisticated dNIDS might reduce the impact of the deployment of IPv6 on the current network security model.

Keywords: Intrusion Detection Systems, IDS, NIDS, Firewalls, Security, IPv6.

1 Introduction

The field of Intrusion Detection is generally divided into two large categories: Host-based Intrusion Detection Systems (HIDS) and Network-based Intrusion Detection Systems (NIDS). The HIDS label is often used for tools as diverse as anti-virus programs, the venerable UNIX syslog and intrusion detection software such as portsentry. The NIDS label is similarly abused, extending from firewalls to network intrusion detection software proper such as Snort [8]. A further specialisation is that of "distributed" NIDS, or dNIDS, which addresses large NIDS systems distributed over a wide network or designed following a distributed-computing paradigm. An excellent example of the latter category is Prelude IDS [10] (see footnote 1). We shall be discussing NIDS and in particular the rising need for dNIDS in the context of IPv6.

Historically, long before the appearance of Snort and other tools of the trade, most of the world was using tcpdump [11] for intrusion detection on the network: a "down to the bare metal" packet dumping utility. The very first Network Intrusion Detection System, called Network Security Monitor, was written by Todd Heberlein and colleagues at UC Davis under contract with LLNL between 1988 and 1991 [2, 1]. It was then extended to create NID, which found widespread use in the US military [5].
This was rapidly followed by Shadow, written by Stephen Northcutt and others during 1996 [3, 4, 6], again for use by the US military. From Shadow onwards there has been an explosion of free and commercial NIDS products which is beyond the scope of this brief introduction (the more historically minded reader will find more detailed information in [12]).

There has been little effort to expand the current set of NIDS to support the IPv6 protocol, mainly due to a lack of demand. Despite years of forecasts of "doom and gloom" when discussing the famous exhaustion of IPv4 addresses, there has been little uptake of IPv6, with the exception of countries such as Japan which have been very active in promoting it [13, 14]. This has meant that very little non-research traffic has actually travelled over IPv6 and hence the impetus for new attacks making use of IPv6 features has been absent. A trivial example is the author's personal mail server and IDS web site [15], which has been reachable over IPv6 since December 2001: there has been only a single SMTP connection over IPv6 in over a year, to a mail server with an average of a thousand connections per day. At first glance this might indicate that there is little point in pursuing IDS under IPv6, but it should instead be thought of as an opportunity to be ready before the storm hits, as opposed to catching up afterwards as in the IPv4 space.

2 IPv4 and IPv6

The discussions around IPv6 started a while back, despite what the current lack of acceptance might suggest, with the first request for white papers being issued in 1993 as RFC 1550 [16] under the name IPng, for "IP New Generation".

* Footnote to the title: or "Why giving a sentry an infinite block list is a bad idea".
Footnote 1: Although they do define themselves as a "hybrid" IDS, as they combine some HIDS facilities within a dNIDS design.

Security and Protection of Information 2003
By the end of 1995 the first version of the protocol specification was published as IETF RFC 1883 [17], which formed the basis for the current IPv6 deployment. For a detailed description of the header formats the reader is referred to [21]. The key differences between IPv6 and IPv4 can be summarised briefly as:

• Simplified header,
• Dramatically larger address space with 128-bit addresses,
• Built-in packet-level support for authentication and encryption,
• Simplified routing from the beginning,
• No checksum in the header,
• No fragmentation information in the header.

We shall now take a critical look at the specific differences which are relevant to the IDS professional.

2.1 Simplified header

The comparison between an IPv4 header and an IPv6 header is striking: the IPv6 header is cleaner, with fewer fields, and in particular everything is aligned to best suit current processors. The rationale behind this change is simple: memory is now cheap, and packing data only means harder decoding on processors which assume data is aligned on 4- or 8-byte boundaries. The header is simplified by removing all the fields which years of experience with IPv4 have shown to be of little or no use. The best way to visualise the cleaner and leaner IPv6 basic header is to look at some tcpdump output representing the same transaction (an ICMP Echo Request packet) between the same two hosts over IPv4 and IPv6.

14:39:29.071038 195.82.120.105 > 195.82.120.99: icmp: echo request (ttl 255, id 63432, len 84)
0x0000   4500 0054 f7c8 0000 ff01 4c6e c352 7869        E..T......Ln.Rxi
0x0010   c352 7863 0800 1c31 3678 0000 3e5f 6691        .Rxc...16x..>_f.
0x0020   0001 1562 0809 0a0b 0c0d 0e0f 1011 1213        ...b............
0x0030   1415 1617 1819 1a1b 1c1d 1e1f 2021 2223        .............!"#
0x0040   2425 2627 2829 2a2b 2c2d 2e2f 3031 3233        $%&'()*+,-./0123
0x0050   3435 3637                                      4567

14:40:04.096138 3ffe:8171:10:7::1 > 3ffe:8171:10:7::99: icmp6: echo request (len 16, hlim 64)
0x0000   6000 0000 0010 3a40 3ffe 8171 0010 0007        `.....:@?..q....
0x0010   0000 0000 0000 0001 3ffe 8171 0010 0007        ........?..q....
0x0020   0000 0000 0000 0099 8000 60fe 4efb 0000        ..........`.N...
0x0030   bc5e 5f3e 2f77 0100                            .^_>/w..

The first obvious difference is in the version nibble, a six instead of a four. We then notice how the IPv6 header appears to consist mainly of zeros. This is because the first eight bytes only contain essential data, everything else being relegated to so-called "extension headers" if need be. In this particular case the packet contains no extension headers and no flow label, simply the payload length, the next header (indicating an ICMPv6 header) and the hop limit (64, hex 40 in the packet dump). This is followed by the 128-bit addresses of source and destination, comfortably aligned on 8-byte offsets, making header disassembly efficient even on 64-bit CPUs.

From an IDS perspective this is excellent, because on modern CPUs taking apart the IPv4 header to detect subtle packet crafting is very inefficient due to the alignment of the data fields. With IPv6 the decomposition of the various fields of the header and extension headers can take place efficiently. This can only mean a gain in per-packet processing speed, an important measure when Gbit/s interfaces are brought into play. Performance is further enhanced by the lack of fragmentation information in the header: this means that the basic header does not need to be decoded for fragmentation information. Indeed, fragmentation is now dealt with by extension headers. This does not necessarily make the fragmentation attacks developed for IPv4 stacks obsolete (see [23] for a discussion of these attacks), as incorrect datagram reassembly can still take place.
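The fixed-size, well-aligned base header makes decoding trivial in practice. As an illustration (not from the paper), the 40-byte IPv6 base header from the tcpdump capture above can be taken apart with a handful of fixed-offset reads:

```python
import socket
import struct

# The 40 bytes of the IPv6 base header from the capture above.
hdr = bytes.fromhex(
    "600000000010" "3a40"                  # ver/tc/flow, payload len, NH, hlim
    "3ffe8171001000070000000000000001"     # source address
    "3ffe8171001000070000000000000099"     # destination address
)

# The whole fixed header decodes with one unpack and two address reads:
ver_tc_flow, payload_len, nxt, hlim = struct.unpack("!IHBB", hdr[:8])
version = ver_tc_flow >> 28
src = socket.inet_ntop(socket.AF_INET6, hdr[8:24])
dst = socket.inet_ntop(socket.AF_INET6, hdr[24:40])

print(version, payload_len, nxt, hlim)  # 6 16 58 64 (58 = ICMPv6)
print(src, dst)
```

Note how no variable-length options, no checksum and no fragmentation fields need to be consulted: the next-header byte alone tells the decoder where to go next.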
2.2 Larger address space

At first glance the fact that IPv6 offers 2^128 addresses might seem a blessing after all the problems with address exhaustion in IPv4. The indiscriminate allocation of IPv4 addresses, which were initially thought to be plentiful, brought us protocol violations such as Network Address Translation, at times also referred to as "masquerading", and other "patches" to work around the problem. If the computers currently connected to IPv4 were moved over to an IPv6 infrastructure, then in effect all that would happen is a renumbering of hosts. The intention, however, is for IPv6 to be more than a substitute for IPv4 and to bring us closer to the concept of "ubiquitous networking": the larger address space acts as an incitement to connect everything to the Internet. A number of companies are already working on so-called "intelligent homes" and in particular Internet-connected home appliances. The deployment of IPv6 will make it possible for households to be assigned ample addresses for each of their appliances to be directly connected to the Internet and report on the state of the fridge, the washing machine and so on.

Let us now wear the paranoid IDS implementor's hat: to connect to the Internet each of these appliances must run an embedded operating system with a TCP/IP stack of some description. Furthermore it needs sufficient processing power to offer something like a simple web server for configuration and user interaction. Let us further imagine that there is a security flaw in this software which allows a remote user to take control of the appliance and use it as a Distributed Denial of Service host (see [22] for an exhaustive discussion of DDoS). Assuming Internet-connected home appliances become widely used, we are effectively facing the possibility of an attack by an innumerable number of simple systems which will not be trivial to patch (see footnote 2).
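To put the scale of the address space into numbers, a back-of-the-envelope calculation (plain arithmetic, not from the paper) shows why per-address blocking rules cannot hope to keep up:

```python
# Scale of the IPv6 address space compared with IPv4.
ipv4_total = 2 ** 32    # all IPv4 addresses: about 4.3 billion
ipv6_total = 2 ** 128   # all IPv6 addresses: about 3.4e38

# A single /64, the standard allocation for one network segment, already
# contains the square of the entire IPv4 address space:
per_segment = 2 ** 64
ratio = ipv6_total // ipv4_total

print(per_segment == ipv4_total ** 2)  # True
print(ratio == 2 ** 96)                # True
```

In other words, a single subnet can hold more addresses than any block list could enumerate, which is exactly the sentry-with-an-infinite-block-list problem of the title.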
Furthermore IPv6 has, by design, simplified the way a device can find its neighbours on a network segment. A pre-defined link-local "neighbour discovery" procedure for allocating an IPv6 address using ICMPv6 has been drawn up, which uses the 48-bit Ethernet address as a differentiator between hosts. A NIDS now loses what little chance it had of discovering devices with a dynamically provided address, as the allocation is no longer "public": simply changing an Ethernet card can easily blind address-based rules (see footnote 3). From this we can draw the conclusion that a number of very useful features of IPv6 can seriously turn against the IDS community as the protocol gains a foothold in the Internet and more devices, in particular embedded devices, are produced with IPv6 Internet connectivity.
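The role of the 48-bit Ethernet address can be made concrete. The sketch below (an illustration, not part of the paper; the MAC address is a made-up example) shows the "modified EUI-64" derivation used by stateless address autoconfiguration: the host builds its link-local interface identifier directly from its MAC address, so a new Ethernet card means a new address with no registration step a NIDS could observe.

```python
# Modified EUI-64 derivation (RFC 4291): a 48-bit MAC address becomes the
# 64-bit interface identifier of a link-local IPv6 address.
def eui64_interface_id(mac: str) -> str:
    b = bytearray(int(x, 16) for x in mac.split(":"))
    b[0] ^= 0x02                 # flip the universal/local bit
    b[3:3] = b"\xff\xfe"         # insert ff:fe between the two MAC halves
    # Group the 8 bytes into four 16-bit hex groups, dropping leading zeros.
    return ":".join(f"{b[i] << 8 | b[i + 1]:x}" for i in range(0, 8, 2))

# Hypothetical example MAC address:
link_local = "fe80::" + eui64_interface_id("00:11:22:33:44:55")
print(link_local)  # fe80::211:22ff:fe33:4455
```

Since the identifier is a pure function of the hardware address, swapping the card silently changes the host's address, which is precisely what blinds address-based NIDS rules.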
2.3 Authentication and Encryption

Authentication and encryption are available in IPv4 through the use of IPsec [18, 19], which was not widely deployed until recently. The main field of application has been that of VPN tunnels, and this limited interest has meant that until most router vendors had implemented it there were few users; furthermore, these few were mainly users of implementations running on open source operating systems. The main factors blocking the widespread use of IPsec have been the complicated setup and key-exchange protocol which, although necessary for proper security, did require more knowledge than the average systems manager possessed. This has meant that the IDS community has been mainly concerned with channels protected by SSL or TLS (see [20] for a formal definition of TLS). For example, an attack over HTTP which is visible to a NIDS installed between the source and the destination becomes invisible if transmitted over HTTPS, as the standard "string in the payload" matching will fail.

A number of solutions to the encrypted channel problem have been postulated: from session key sharing by the web server, allowing "on-the-fly" decryption, to server-side storage of keys for later off-line decryption of captured packets. One solution is to use ssldump [30] to decode SSL, but of course one needs control of both endpoints to obtain the session keys. The deployment of IPv6 has the potential to worsen the "encrypted attack problem" quite dramatically, as IPv6 has had authentication and encryption built in since its inception (see footnote 4). One of the default extension headers for IPv6 is the "IPv6 Authentication Header" (AH), which is nothing other than the same AH mechanism as used in IPsec, but for IPv6. As IPv6 deployment increases it will be interesting to see if, unlike IPsec, the AH mechanism is more widely used and, in particular, if trust can be moved from the weak "secure web server" model (see footnote 5) to the protocol layer for a large number of applications. This is a double-edged sword for the IDS community. It is very tempting to think that widespread availability of encryption and authentication improves security, but all this does is move the goal posts.

Footnote 2: Remote updating of software does not improve the situation much, as various incidents with Microsoft's Windows Update facility have shown (see, amongst the many, [27]).
Footnote 3: Note that DHCP is being extended to IPv6 as DHCPv6, but the availability of "neighbour solicitation" by default provides a far greater challenge.
Footnote 4: One should note that a few features of IPv4 were rarely used as intended, for example Quality of Service and indeed, amongst the IP options, a "Security" option which is often referred to as "something used by the US DoD" and deals with classification. A quick look at a Unix system's include file /usr/include/netinet/ip.h is a recommended read.
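The key-distribution problem behind AH verification can be sketched in a few lines. This is a deliberately simplified stand-in (the real AH integrity check of RFC 2402 covers selected header fields and uses negotiated algorithms; the key and payload here are invented for illustration), but it shows why a NIDS that wants to even *verify* authenticated traffic must hold the key of every monitored host:

```python
import hashlib
import hmac

# Simplified stand-in for an AH-style integrity check value (ICV):
# HMAC over the packet payload under a per-host key.
def icv(key: bytes, payload: bytes) -> bytes:
    return hmac.new(key, payload, hashlib.sha256).digest()

host_key = b"per-host secret"                 # hypothetical shared key
payload = b"GET / HTTP/1.0\r\n\r\n"
tag = icv(host_key, payload)                  # computed by the sending host

# With the host's key the NIDS can verify the packet; without it, it cannot
# even check integrity, let alone decrypt an ESP payload.
with_key = hmac.compare_digest(tag, icv(host_key, payload))
without_key = hmac.compare_digest(tag, icv(b"wrong key", payload))
print(with_key, without_key)  # True False
```

Multiply this by every host on an IPv6 network and the "share the keys with the NIDS" approach becomes the nightmare the next paragraph describes.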
There is no guarantee that the traffic being carried by the authenticated and encrypted link is legitimate, and furthermore it will now be illegible as far as the NIDS is concerned. The solution of "sharing the keys" as in the web server scenario becomes the nightmare of sharing the keys of every host on an IPv6 network with the NIDS.

3 Intrusion Detection

Intrusion Detection is moving in two opposite directions: one is the "flattening" towards the low end of the market, with vendors attempting to sell "shrink-wrapped" IDS packages; the other is the "enterprise", attempting to pull together all security-related information for a large company. Both suffer from the same problem: customers are used to viewing IDS as "something which looks like an Anti-Virus". This view is strengthened by the behaviour of most systems: you have rules, they trigger, you attempt to fix the problem and you upgrade or download the rules. The fundamental problem is not so much the perception of IDS as the perpetuation of a dangerous methodology.

Let us consider an example from a real-life situation which is closest to IDS: the concept of a sentry at a checkpoint. When instructing a sentry one defines certain criteria under which something (be it a person or a vehicle of some description) is to be allowed to pass the checkpoint. One does not attempt to define all unauthorised behaviour or vehicles, as the list is fundamentally infinite. This works quite well, and improvements to the system are made by making the "pass rules" more stringent and well-defined.

3.1 White-listing – Describing normality

So why do IDS systems (and Anti-Virus systems for that matter) attempt to define all that is bad? The answer is not as simple as one would wish: it is a mixture of historical development and the lure of "attack analysis". Historically, in computing, Intrusion Detection has always meant alerting to something being amiss, for example "bad logins".
This was a good example of "white-listing", or alerting on anything which was not known. Unfortunately "white-listing" has the disadvantage of generating a lot of messages, and people started ignoring the output of syslog under Unix. As the alerting moved onto the network the idea somehow changed into the equivalent of an anti-virus (which at the time was already a well-developed concept for MS-DOS systems), losing the "white-listing" concept on the way.

Let us now elaborate the second point: the analysis of a new attack, or indeed the search for new attacks, is fashionable, interesting and challenging. It is therefore a matter of pride to be the first to publish the rule which will catch a new attack. Given the choice, a security analyst would much rather play with a new attack than spend his time laboriously analysing the network to write "white-listing" rules. If we consider the ruleset released with Snort version 1.9.0 (see [9]) we find definitions for 2321 rules. All of these rules are "alert" rules, defining traffic to be flagged. Then, towards the end of January 2003, the Internet was swamped by a new worm targeting Microsoft SQL Server installations (see [7] for an in-depth analysis). Anyone running the base Snort ruleset would never have noticed, but neither would someone who had just updated his rules, for the simple reason that no rule had yet been written. Furthermore there is no limit to this process: new attacks are published, new rules are written and added to the list. Hence there is no upper bound on the number of rules. How many sites really required remote access to their Microsoft SQL Server? Possibly a handful. So why was the IDS not instructed to alert on any incoming (or indeed, outgoing) connection to the relevant port?

The key paradigm shift which is required is precisely that NIDS move from a "flag what is known to be bad" mechanism to "flag what is not explicitly allowed". This mimics closely what has happened in firewall design: originally firewalls were set up to block specific ports, reflecting the academic nature of the Internet; in most installations they are now set up to block everything except what is deemed safe. The drawback of a white-listing setup is that the number of "false positives" will increase dramatically each time a network change takes place without the security team being informed. This does have the beneficial side-effect of enforcing proper communication between the networking and security groups. There are also sites for which white-listing will be fundamentally impossible (such as an ISP's co-location centre) until much more sophisticated "auto-whitelisting" software becomes available; but any site with a security policy, a small number of well-defined external access points and internal routing points should be able to define suitable white-lists.

Finally, a further benefit: the number of rules required to define white-lists is limited and known. This allows a correct measurement of the performance of a NIDS under realistic conditions. As more NIDS are connected to high-speed networks the issue of packet-processing speed becomes of paramount importance: the larger the set of rules which needs to be applied against every single packet, the slower the NIDS will perform. If a NIDS is not dropping packets with 1000 rules, there is no guarantee it will still keep up with double that number. In particular, as more and more content-based rules are deployed (which require expensive string matching), performance can only decrease further.

Footnote 5: The security of SSL certificates issued by so-called "trusted parties" has been sadly found lacking in the crucial step of verifying the identity of the person or company requesting a certificate. These failures in identity verification make it difficult to equate "trust" with SSL certificates despite the marketing efforts.
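The paradigm shift can be sketched in a few lines. The following is a hypothetical illustration (not Snort syntax; the policy entries are invented): a white-list flags anything not explicitly allowed, so a brand-new worm such as SQL Slammer, which travelled over UDP port 1434, is flagged even before any worm-specific signature exists.

```python
# White-list sketch: flag any flow that is not explicitly allowed, instead
# of matching thousands of "known bad" signatures.
ALLOWED = {            # (protocol, destination port) pairs the policy permits
    ("tcp", 25),       # inbound SMTP
    ("tcp", 80),       # inbound HTTP
}

def verdict(proto: str, dport: int) -> str:
    return "pass" if (proto, dport) in ALLOWED else "alert"

# SQL Slammer traffic (UDP/1434) is caught with zero signature updates:
print(verdict("udp", 1434))  # alert
print(verdict("tcp", 80))    # pass
```

Note the rule count here is bounded by the policy, not by the ever-growing set of published attacks, which is exactly the performance argument made above.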
One solution is of course to throw hardware at the problem, but that is only a palliative; a better solution is to attack the problem at its root by deploying white-lists.

3.2 Ubiquity – Monitoring needs to be pervasive

Once the paradigm has been shifted it needs to be completed. This entails the understanding that an isolated NIDS is of little use, just like a lone sentry asked to patrol an immense perimeter. For Intrusion Detection to be truly effective it is necessary to move from sporadic installations, designed to tick a box on the security policy, to a proper monitoring system. There should be NIDS installations at every entry point into the network, be it external or indeed internal. There should be no situation in which there is a negative answer to the request "pull the data from that network segment". Once a NIDS permeates the network it becomes possible to follow "alarm flows" and have more than the sporadic data points which a NIDS and maybe a few firewalls can offer. If ubiquitous monitoring is deployed with white-listing then it suddenly becomes possible to monitor abuse throughout the network and indeed "follow through" abuse in the network. It will no longer be possible for an internal issue to grow out of all proportion before it is noticed, often as it tries to "escape" via the firewalls.

3.3 Aggregating, correlating and reducing data

The larger the enterprise, the heavier the requirements of a ubiquitous NIDS deployment, in particular in terms of staffing needs. Once data is collected there is very little point in leaving it on the collecting devices. The first requirement for an advanced NIDS deployment is to have a central "aggregation engine" which takes all the data from the sensors. This should probably be a database, and most NIDS vendors these days offer this capability.
Having the data in a central location means that it is now available for correlation: a perimeter-wide scan for open ports should most definitely not be reported as a number of individual events but as a single instance of multiple events. The instance is a "port scan"; the multiple events are the individual alerts issued by the sensors. This correlation can be made as sophisticated as required, for example correlating by originating subnet rather than single host, or correlating using routing "autonomous system" numbers. Once the data is correlated it can be reduced: if the scan is to multiple locations but the mechanism is identical there is no need to store all the individual packets. It is sufficient to store a representative packet and then reference the collecting sensors to preserve the full picture of the event. Without these three crucial steps an advanced NIDS deployment can only fail under the weight of its own data and the expense of identifying and locating it across the whole network.

3.4 Network knowledge, advanced analysis and management

Transforming the collected data is still not enough. The bane of all NIDS installations is the incidence of false positives, and attempting to reduce them can often lead to the implementation of rulesets which are dangerously restricted in scope. As we discussed previously, white-listing can increase the incidence of false positives unless appropriate communication between the networking and security teams is in place. One solution to the false-positive problem is to add knowledge of the network to the NIDS. A large number of products used by networking teams store information about the systems on the network, from the system type to the operating system. This information should be fed into the NIDS to help it discriminate between attacks which should merely be logged as "informational" and those which instead require immediate action.
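The aggregate/correlate/reduce pipeline can be illustrated with a minimal in-memory sketch (the sensor names and addresses are invented for the example): many per-sensor alerts for the same source collapse into one "port scan" event, keeping only the set of reporting sensors rather than every packet.

```python
from collections import defaultdict

# Alerts as they arrive from distributed sensors: (sensor, source, dest port).
alerts = [
    ("sensor-a", "10.0.0.9", 22),
    ("sensor-b", "10.0.0.9", 23),
    ("sensor-c", "10.0.0.9", 25),
    ("sensor-a", "10.9.9.9", 80),
]

# Aggregate: group per source address; correlate: pool sensors and ports.
events = defaultdict(lambda: {"sensors": set(), "ports": set()})
for sensor, src, port in alerts:
    events[src]["sensors"].add(sensor)
    events[src]["ports"].add(port)

# Reduce: a source probing several ports is reported once, as a port scan.
scan_sources = [src for src, ev in events.items() if len(ev["ports"]) >= 3]
print(scan_sources)  # ['10.0.0.9']
```

Correlating by subnet or autonomous system, as suggested above, would simply change the grouping key.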
A trivial example is that of a web site running Apache. It is definitely of interest to catalogue attacks against Microsoft's IIS web server directed at this site, but clearly they do not pose much of a threat; conversely, an Apache attack should definitely command immediate attention. Even basic discrimination by "log level" can improve the quality of security analysis and response dramatically: it is much simpler to address a small number of problems made immediately obvious by a high log level than to look for the same problems in the midst of thousands of irrelevant attacks.

Once such a sophisticated system is in place, much more advanced analysis is possible. One interesting option is "Differential Firewall Analysis" [24], where the correctness of firewall operation is verified by means of NIDS placed on both sides of the system under monitoring. The rationale behind such an analysis is that the worst-case scenario for a firewall is a breach via a flaw in its software: there will be no record of the breach in the logs, but the intruder will have penetrated the system (or indeed, a user on the inside might be connecting to the outside) and will remain totally undetected. Differential Firewall Analysis attempts to alert to such flaws by verifying and correlating traffic on both sides of the firewall.

Finally, how does one manage a system of this size? It is clear that there is no hope of managing rulesets individually: there needs to be a centralised rule repository from which rules for the sensors are "pushed" and activated in unison. This prevents those situations in which half of the sensors are running the new ruleset and the other half the older version, rendering analysis impossible.
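The core of a differential analysis reduces to comparing flow records from the two sensors. The sketch below is a hypothetical illustration (flow tuples and policy invented; the real analysis in [24] is considerably richer): any flow observed by the inside sensor that the firewall policy says should have been blocked reveals a breach the firewall itself never logged.

```python
# Flow records as (source, destination, dest port) tuples from two sensors
# straddling the firewall.
outside = {("1.2.3.4", "10.0.0.5", 80), ("6.6.6.6", "10.0.0.5", 1434)}
inside = {("1.2.3.4", "10.0.0.5", 80), ("6.6.6.6", "10.0.0.5", 1434)}

blocked_ports = {1434}  # ports the firewall is configured to drop

# A flow seen on BOTH sides whose port should have been blocked means the
# firewall passed traffic its own policy forbids, with nothing in its logs.
leaked = {flow for flow in inside & outside if flow[2] in blocked_ports}
print(leaked)
```

In a real deployment the comparison would run continuously on the central aggregation engine, with the sensors shipping flow summaries rather than full packets.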
Direct interaction with the individual sensors should be prevented, with all management, from code updates to individually turning sensors on and off, controlled from a single centralised location where all state is kept. This is also a huge step towards 24x7x365 availability: if state is kept in a centralised location, it becomes much simpler to deploy replacement sensors or indeed fail over to spares when failures occur.

3.5 Deploying Intrusion Detection suitable for IPv6

The introduction of IPv6 into an environment with a sophisticated NIDS deployment such as the one we have been describing represents less of a worry. The most important hurdle is perhaps that of authentication and encryption: a sophisticated NIDS would want at least to verify the validity of the AH in each packet, if not check the contents of the ESP. This is perhaps the least tractable of problems: despite the presence of hardware cryptographic acceleration cards and support for them in a number of open source operating systems (in particular OpenBSD, see [25, 26]), there is a noticeable difference between offering fast crypto support for SSL and SSH key generation and decrypting packets on the fly at megabit/s rates.

4 Conclusions

Besides integrated cryptography there is little else to differentiate IPv6 from IPv4 technology from a NIDS point of view, with the exception of the larger address space. It is the uses of IPv6 technology which present the greatest challenges, as they might finally achieve what was the holy grail of a few years ago, "everything on the Internet", which IPv4 did not fulfil. This will make white-listing of paramount importance, as the definition of thousands of blocking rules based on source addresses will simply no longer be possible. Furthermore the proliferation of connected devices will make accurate, pervasive and timely monitoring a core necessity for any enterprise or large network.
The single largest contribution to network security in a large environment is education. There is no substitute for generating awareness of dangers such as the blind opening of attachments or the installation of unauthorised software. It has also been the security industry's greatest failure that, as more and more sophisticated tools became available, the issue of education has remained unaddressed.

The deployment of IPv6 will place increased pressure on the requirement for a paradigm change from the current localised solutions to a much more distributed system, and in particular from the "anti-virus lookalike" to a system finally resembling a proper sentry. A NIDS should only ever be deployed as part of an information security policy that it needs to monitor, just as a sentry is part of a physical security policy. Similarly, as we do not hand a sentry a list of every person banned from crossing a checkpoint, we should not attempt to define rules for every possible type of "bad traffic". We should instead concentrate on working out what traffic is allowed on a network and define everything else as bad. With IPv6 the size of the address range (2^128 possible addresses) and the much more dynamic nature of IPv6 addressing would make blacklisting in firewalls an almost impossible exercise.

It would be commendable if the current IPv6 test backbone, 6Bone, started deploying dNIDS to see what challenges await us before widespread deployment of the new protocol. If we consider the trend towards large distributed computing (the European Data Grid, Asian Data Grid and other similar projects, see [28, 29]), which will require more and more address space and network communication, then dNIDS will have to become the security monitoring solution. The amount of processing power and network bandwidth makes these grids a formidable opponent should they fall into the wrong hands (and DDoS is precisely about creating "attack" grids).
This threat means that in a distributed environment it is pointless to address monitoring at a few disconnected points on the grid. It has to be pervasive and ubiquitous, reporting centrally to the CERT responsible for that particular grid, which needs to be able to take prompt and informed action before the problem spreads.

It is perhaps surprising that NIDS design has come full circle. Careful reading of the early papers on NID and Shadow describes nothing other than dNIDS systems, with the exception of a centralised alert database and enterprise-class management facilities. It is the author's belief that the reason for this is that, finally, the understanding that a NIDS is not anti-virus software by a different name is starting to diffuse through the industry.

5 Acknowledgements

The author would like to thank the Programme Committee for the kind invitation to SPI2003, and Dr. Diana Bosio and Dr. Chris Pinnock for valuable comments on the text.

References

[1] Todd Heberlein. "NSM, NID and the origins of Network Intrusion Detection". Private communication, July 2002.
[2] Todd Heberlein et al. "A Network Security Monitor". Proceedings of the IEEE Computer Society Symposium, Research in Security and Privacy, pages 293-303, May 1990.
[3] Stephen Northcutt. "The History of Shadow". Private communication, July 2002.
[4] Stephen Northcutt and Judy Novak. Network Intrusion Detection. 2nd Edition, Chapter 11, pages 198-199 and 220-221. New Riders, 2001.
[5] Stephen Northcutt and Judy Novak. Network Intrusion Detection. 2nd Edition, Chapter 11, page 198. New Riders, 2001.
[6] Stephen Northcutt. Intrusion Detection Shadow Style. SANS Institute, 1999.
[7] Marc Maiffret. "SQL Sapphire Worm Analysis". eEye Digital Security, January 2003, http://www.eeye.com/html/Research/Flash/AL20030125.html.
[8] Marty Roesch et al. Snort – The Open Source NIDS. http://www.snort.org/.
[9] Marty Roesch et al. Snort – The Open Source NIDS.
Release 1.9.0, October 2002, http://www.snort.org/dl/snort-1.9.0.tar.gz.
[10] Yoann Vandoorselaere, Pablo Belin, Krzysztof Zaraska, Sylvain Gil, Laurent Oudot, Vincent Glaume and Philippe Biondi. Prelude IDS. http://www.prelude-ids.org/.
[11] tcpdump. http://www.tcpdump.org/.
[12] Paul Innella. "The Evolution of Intrusion Detection Systems". SecurityFocus, November 2001, http://www.securityfocus.com/infocus/1514.
[13] "Overview of IPv6 Projects around the World". IPv6 Forum, http://www.ipv6forum.org/navbar/links/v6projects.htm.
[14] "UK IPv6 Resource Centre – Whois service". Lancaster University and IPv6 Forum, http://www.cs-ipv6.lancs.ac.uk/ipv6/6Bone/Whois.
[15] Arrigo Triulzi. "IDS Europe". https://ids-europe.alchemistowl.org/.
[16] S. Bradner and A. Mankin. "IP: Next Generation (IPng) White Paper Solicitation". IETF, December 1993, http://www.faqs.org/rfcs/rfc1550.html.
[17] S. Deering and R. Hinden. "Internet Protocol, Version 6 (IPv6) Specification". IETF, December 1995, http://www.faqs.org/rfcs/rfc1883.html.
[18] S. Kent and R. Atkinson. "IP Authentication Header". IETF, November 1998, http://www.faqs.org/rfcs/rfc2402.html.
[19] S. Kent and R. Atkinson. "IP Encapsulating Security Payload (ESP)". IETF, November 1998, http://www.faqs.org/rfcs/rfc2406.html.
[20] T. Dierks and C. Allen. "The TLS Protocol Version 1.0". IETF, January 1999, http://www.faqs.org/rfcs/rfc2246.html.
[21] Adolfo Rodriguez, John Gatrell, John Karas and Roland Peschke. TCP/IP Tutorial and Technical Overview. Chapter 17. IBM & Prentice Hall, October 2001, http://www.redbooks.ibm.com.
[22] Dave Dittrich. "Distributed Denial of Service (DDoS) Attacks/tools". University of Washington, http://staff.washington.edu/dittrich/misc/ddos/.
[23] Jason Anderson. "An Analysis of Fragmentation Attacks". SANS Reading Room, March 2001, http://www.sans.org/rr/threats/frag_attacks.php.
[24] Arrigo Triulzi. "Differential Firewall Analysis".
In preparation, February 2003, http://www.alchemistowl.org/arrigo/Papers/differential-firewall-analysis.pdf.
[25] "Cryptography in OpenBSD". The OpenBSD project, http://www.openbsd.org/crypto.html.
[26] Theo de Raadt, Niklas Hallqvist, Artur Grabowski, Angelos D. Keromytis and Niels Provos. "Cryptography in OpenBSD: An Overview". Proceedings of USENIX, 1999, http://www.openbsd.org/papers/crypt-paper.ps.
[27] John Leyden. "Code Red bug hits Microsoft security update site". The Register, July 2001, http://www.theregister.co.uk/content/56/20545.html.
[28] "The European Data Grid Project". CERN, http://eu-datagrid.web.cern.ch/eu-datagrid/.
[29] "The Asian-Pacific Grid Project". http://www.apgrid.org/.
[30] Eric Rescorla. ssldump. http://www.rtfm.com/ssldump/.

Until now, there has been neither business nor scientific interest in file-sharing peer-to-peer networking. The main interest has come from (usually illegal) sharing of movies and music. Hence, most of these applications are developed by individuals. Some of the developers really do work for free, but others do it to earn some money, using a business model called 'adware': software paid for by advertisements. The application displays advertisements (fetched from an Internet server) while it is running. There is one big exception: Kazaa, which is developed not by an individual but by a company. This company's revenue also comes from the advertisements displayed by the application.
2 Protocol description

2.1 A three-tier architecture

Most peer-to-peer applications use a three-tier architecture:
• seed hosts: a few hosts with static IP addresses or host names, which maintain and serve the list of other peers that use dynamic IP addresses;
• search servers: multiple hosts with dynamic IP addresses; they maintain the list of shared files, and their main role is to support search operations;
• peer hosts: a huge number of hosts with dynamic IP addresses; they share their local files and retrieve files from other peers.

A few hosts, the seed hosts, must therefore always be active and addressable. They are either hard-coded in the application or can be configured by the user (using IRC, forums, the web, … as the source of information). The search servers are usually elected by the protocol based on CPU performance, network connectivity, and so on. For some applications, the peers and search servers can be collocated. While there are usually just a couple of seed hosts, there are hundreds of thousands of peer hosts.

2.2 Usual steps to execute

When a new computer wants to join the peer-to-peer network, it usually takes the following steps:
1. seed: connect to a seed host to get the IP addresses of a couple of other peers (and possibly search servers);
2. register: publish its own IP address to the seed host;
3. connect: to a couple of other peers (and possibly search servers);
4. publish: the list of local files to search servers;
5. search: for desired files on other peers or on search servers;
6. retrieve: files from several peers in parallel to go faster.

Of course, steps 5 and 6 can be repeated. Some protocols do not execute all of these steps; e.g., Gnutella has no concept of search servers, so step 4 does not exist.

2.3 Description of Gnutella

While Gnutella1 is not the most popular protocol, it is the most distributed one; there are only two kinds of nodes: peers and seed hosts.
This is the reason why only Gnutella is described in this paper.

1 Gnutella appears to be named after the famous chocolate brand!

The Gnutella protocol specification is publicly available, and there is even a long-term project to publish the protocol through the IETF. There are also future extensions to handle the next generation of the IP protocol, IPv6. Gnutella is implemented in several applications (Limewire, Bearshare, …) on both MS-Windows and Linux operating systems.

2.3.1 Seed and registration

Every Gnutella node (called a servent, from server and client) randomly generates its own unique identification, called the ClientID. Upon start, the servent connects to a user-configured seed host using a protocol called gwebcache. The seed host gives it the IP addresses of 20 other active servents.

2.3.2 Connection

The new servent then connects to other servents using TCP connections; each TCP connection is identified by a connection id, conn-id. There is neither authentication nor confidentiality on these TCP connections. Since every servent stays connected to several other servents through these persistent TCP connections, the Gnutella network is really a partially meshed network. As soon as the servent is connected to other servents, it also listens for incoming TCP connections from newer servents.

The Gnutella protocol is implemented with a couple of message types. Each message carries its own random identification, msg-id, which is assumed to be unique in the network. Note: a message usually does not contain any information about its originator (no IP address, no name, no ClientID, …) in order to provide anonymity. Every servent maintains a table recording each known message identification and the TCP connection on which it was received: <msg-id, conn-id>.
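The <msg-id, conn-id> table drives a simple flood-and-route-back mechanism, described in the next section. A minimal sketch in Python (the `Servent` class and its callback interface are illustrative assumptions, not taken from any real Gnutella implementation):

```python
class Servent:
    """Sketch of Gnutella-style flood routing (illustrative, not a real implementation)."""

    def __init__(self):
        self.routing_table = {}   # msg-id -> conn-id the message was first seen on
        self.connections = {}     # conn-id -> send(msg_id, ttl, payload) callable

    def on_message(self, msg_id, ttl, from_conn, payload):
        ttl -= 1                  # each forwarding servent decrements the TTL
        if ttl <= 0:
            return                # discard: stops messages looping forever
        if msg_id not in self.routing_table:
            # first sighting: remember the incoming connection, flood the rest
            self.routing_table[msg_id] = from_conn
            for conn_id, send in self.connections.items():
                if conn_id != from_conn:
                    send(msg_id, ttl, payload)
        else:
            # already-known msg-id: route back toward the originator (replies)
            back_conn = self.routing_table[msg_id]
            self.connections[back_conn](msg_id, ttl, payload)
```

A real servent would additionally have to bound the table size and expire old entries, since msg-ids arrive from untrusted peers.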
Upon receipt of a message:
• if the msg-id is not yet in the table, the message is flooded, i.e., forwarded on all other TCP connections, and a new entry <msg-id, conn-id> is added to the table;
• else, the message is forwarded on the connection associated with this msg-id. This is typically used for replies.

Note: this is very similar to the learning-bridge process of IEEE 802.1.

As the topology is partially meshed, there are loops in the Gnutella network; hence, all messages include a time-to-live (TTL) field. The TTL is initialised to a small value by the message originator and decremented by each forwarding servent. As soon as the TTL reaches 0, the message is simply discarded. This simple technique prevents a message from travelling forever in the network.

2.3.3 Search

When a servent wants to search for files matching a pattern (like the '*.avi' file specification), the search pattern is flooded over the network in a message. This servent will be called the consumer servent in this paper. All servents receiving and forwarding this search message must reply if they have files matching the search pattern. The reply message is routed back to the source based on the <msg-id, conn-id> tables. It contains the list of matching local files, together with the ClientID and IP address of the servent holding the files. Those servents will be called remote servents in this paper.

2.3.4 File retrieval

The consumer servent first tries to open a direct TCP connection to the remote servent(s). This connection uses a protocol very similar to HTTP. Obviously, if the remote servent is protected by a firewall preventing connections initiated from the Internet, this connection will fail. But Gnutella has a way to bypass firewalls: the push message. This message is sent over the Gnutella network to the destination ClientID. It is then up to the remote servent to open a TCP connection to the consumer servent.
This connection originates from the inside of the firewall and will usually be permitted. Note: two servents that are both protected by firewalls will never be able to exchange files.

3 The threats

This paper is not about the obvious copyright infringement when peer-to-peer is used to share copyrighted material; it rather describes other security threats.

3.1 Contents sharing

The configuration of a peer-to-peer application includes specifying which local file-system directory to share. Some users are naïve enough to share their whole disk, including confidential documents… This is worrying, since peers outside the firewall will be able to retrieve those documents through the push message. Some applications were also poorly written and shared more than expected! E.g., Bearshare could be exploited by a directory traversal when the file to be retrieved was called '..\..\config.sys' (the '..' is a shortcut for the parent directory).

The retrieved content can also contain viruses and Trojans, especially when the user is careless enough to execute a file retrieved anonymously from an unknown location! But this also applies to non-executable content; e.g., a multimedia .ASF file can dynamically open a URL (which could download a hostile Java applet, …).

3.2 Worms

Peer-to-peer networks are usually well connected and fast, which makes them a target of choice for worms. There have already been a couple of worms for these networks. The spread mechanism is easy: every infected peer simply replies 'yes, I have this file' to all search requests and sends the worm renamed as the searched file. Examples include Worm.Kazaa.Benjamin and Gnutella.Mandragore.

3.3 Covert channel

Hackers use some Internet Relay Chat (IRC) networks to control remote trojanized PCs. They use IRC networks as a covert channel to initiate denial-of-service attacks from those thousands of remote PCs.
As peer-to-peer networks are even larger than IRC ones, it can be expected that they will shortly be used as a covert channel to control trojanized PCs.

3.4 Bandwidth hog

The major issue with peer-to-peer networking, however, is network bandwidth utilization… This is especially applicable to schools and universities, where students download numerous large movie files (typically about 600 Mbytes). The existing networks were not provisioned for such an application, which means congested links and slower response times for all other applications. Internet Service Providers are also complaining, because their business model and network provisioning assumed that residential users (on ADSL or cable modems) mainly download files, while with peer-to-peer they also upload files. This is forcing some ISPs to change their tariff structure and move to a volume-based tariff, which is heavier to manage than the previous flat rate. A large European ISP measured that 40% of its traffic was identifiable as peer-to-peer in July 2002.

On the security side, this also means that security devices have to handle much more traffic than expected. This is of particular importance for network intrusion detection systems (NIDS). NIDS sniff all traffic on a network and check for attacks (for monitoring, alerting or pro-active prevention of attacks). As NIDS algorithms are quite complex, many NIDS will give up when the traffic is too high, so they can miss a real attack in the mass of innocent peer-to-peer traffic.

3.5 Application security

Peer-to-peer applications probably contain a couple of vulnerabilities (buffer overflows, directory traversals, …) due to bugs in their code. When those vulnerabilities are discovered and exploited through the peer-to-peer network, a remote attacker could gain control of thousands of hosts.
Those hacked hosts could be used to launch denial-of-service attacks or as stepping stones to crack into other machines. The fact that the source code of most of these applications is not available makes things even more dangerous, since nobody is able to check the security of the code.

3.6 Auto update

In order to make things simpler for their users, some peer-to-peer applications actually use the network itself to propagate newer versions of the application. In some cases, the user is not even notified that a new version has been installed in the background. As the new version is rarely authenticated, it could be possible to inject a faked new version containing Trojans or back doors. This is specifically applicable to adware, where the adware engine is independent of the rest of the application. Fortunately, there are a couple of tools that can remove adware from an existing host (notably Lavasoft).

4 Blocking peer-to-peer networking?

Nowadays, all peer-to-peer protocols can use dynamic and/or configurable TCP ports. This means that firewalls cannot block these applications when the user configures them on non-default ports such as port 80 (also used by HTTP).

4.1 Throttle strategy

If the default ports are blocked by a firewall or by a packet-filtering router, the user will notice that something is wrong, look on the web for a solution, and configure the application to use non-default ports. So the throttle strategy is simple:
• do not block the default ports [1], so users will keep using them;
• but rate-limit the traffic on those ports to a very small fraction of the available bandwidth.

It is pretty common for network devices like routers and switches to be able to throttle some traffic. This strategy addresses only the bandwidth threat; it does not prevent any of the other issues. It is mainly used by universities and schools.
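The rate limiting behind the throttle strategy is typically modelled as a token bucket. A minimal sketch in Python (class name and parameters are illustrative; real routers implement this per interface or per port in hardware or firmware):

```python
class TokenBucket:
    """Illustrative token-bucket rate limiter, as used to throttle a port."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s   # sustained rate allowed on the port
        self.capacity = burst_bytes    # maximum burst size
        self.tokens = burst_bytes      # bucket starts full
        self.last = 0.0                # timestamp of the previous packet

    def allow(self, packet_bytes, now):
        # refill tokens for the elapsed time, capped at the burst size
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bytes <= self.tokens:
            self.tokens -= packet_bytes
            return True    # forward the packet
        return False       # drop (or queue) it: traffic stays under the limit
```

With a small rate configured on the default peer-to-peer ports, the application still works, so users are not pushed to evade onto other ports, yet the bandwidth impact stays bounded.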
4.2 Block-all strategy

In the block-all strategy, the security policy is stricter: block all traffic to the Internet, open only some well-known ports (web, mail, …) and force the traffic through an application proxy. If the firewall allows only HTTP traffic, then even if the peer-to-peer application is configured to use TCP port 80, the proxy will block the connection, since the protocol is not fully HTTP-compliant. This strategy is already in place in banks and other security-savvy organizations, and peer-to-peer security issues on their premises do not impact them. Nevertheless, if their employees use the same laptop computers at work (with such a strict security policy) and at home (where there is no security-policy enforcement at all), the organization is still at risk. Hence, there is a real need for intrusion detection and prevention on the internal network.

4.3 Block seeding

The block-seeding strategy relies on the only static information used by peer-to-peer applications: the static IP addresses (or host names) of the seed hosts. If the security policy prevents any traffic to the seed hosts, then internal hosts cannot join the peer-to-peer network. The issue, of course, is to obtain a list of those seed hosts. For some applications, like Kazaa or WinMX, this is easy; but for Gnutella and eDonkey it is mostly impossible, as there are numerous seed hosts and they change every week or so. So this strategy remains mostly hypothetical until some vendor offers the list of those hosts as a service (like the black lists of spam relays).

4.4 Traffic pattern recognition

The last technique is traffic pattern recognition. As far as the author knows, there is no implementation of this technique yet. It is based on the fact that peer-to-peer communication has a typical pattern:
• long-lasting TCP connections;
• single connections to remote hosts;
• a lot of traffic.
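These three characteristics could be combined into a simple per-flow classifier. A sketch in Python, where the field names and threshold values are purely illustrative guesses rather than measurements from any deployed product:

```python
def looks_like_p2p(flow):
    """Heuristic flow classifier for peer-to-peer traffic (illustrative thresholds)."""
    long_lived  = flow["duration_s"] > 600        # long-lasting TCP connection
    single_conn = flow["conns_to_host"] == 1      # a single connection to the remote host
    heavy       = flow["bytes"] > 50_000_000      # a lot of traffic
    return long_lived and single_conn and heavy
```

A device applying this test to exported flow records could then feed the flagged IP addresses into a blocking or throttling policy.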
Routers and firewalls using techniques like NetFlow or IP flow export [3] could spot peer-to-peer hosts after a couple of minutes and either block or throttle the traffic addressed to those IP addresses.

5 Conclusions

Peer-to-peer networks are a relatively new paradigm for communication. They offer resilience, throughput, anonymity and performance. Some protocols, like Gnutella, are also well designed for their purpose. Besides the obvious copyright-infringement issue, they also pose some security threats. Alas, it is currently mostly impossible to identify IP traffic as peer-to-peer in order to apply a specific security policy to it. The current mitigation techniques include:
• Throttle strategy: do not try to block, but rather throttle the traffic, so that users keep using the default ports;
• Block-all strategy: block all traffic except an explicit list of TCP ports; peer-to-peer protocols will be blocked;
• Block seed hosts: block all IP traffic to the seed hosts, though it is difficult to obtain the list of all seed hosts;
• Traffic pattern recognition: recognize the typical traffic pattern and either block or throttle it.

In summary, this is still a research area, since the peer-to-peer developers are also trying to modify their protocols to evade any policy enforcement.

References

[1] Ballard, J. File Sharing Programs/Technologies. http://testweb.oofle.com/filesharing/index.htm, 2001.
[2] Seidel, E. The Audio Galaxy Satellite Communication Protocol. http://homepage.mac.com/macdomeeu/dev/current/openag/agprotocol.html, 2001.
[3] Quittek, J., Zseby, T., Claise, B. and Zander, S. Requirements for IP Flow Information Export. draft-ietf-ipfix-reqs-09.txt, work in progress, IETF, 2003.
[4] Pande, V. Folding@home Distributed Computing. http://folding.stanford.edu/, Stanford University, 2002.
[5] UC Berkeley. SETI@home Search for Extraterrestrial Intelligence. http://setiathome.ssl.berkeley.edu/, 2003.
[6] Global Grid Forum. http://www.gridforum.org/L_About/about.htm, February 2002.