Intrusion Detection Systems and IPv6∗
Arrigo Triulzi
[email protected]
Abstract
In this paper we will discuss the new challenges posed by the introduction of IPv6 to Intrusion Detection
Systems. In particular we will discuss how the perceived benefits of IPv6 are going to create new challenges
for the designers of Intrusion Detection Systems and how the current paradigm needs to be altered to
confront these new threats. We propose that the use of sophisticated dNIDS might reduce the impact of the deployment of IPv6 on the current network security model.
Keywords: Intrusion Detection Systems, IDS, NIDS, Firewalls, Security, IPv6.

1 Introduction
The field of Intrusion Detection is generally divided into two large categories: Host-based Intrusion Detection Systems (HIDS) and Network-based Intrusion Detection Systems (NIDS). The HIDS label is often used for tools as diverse as anti-virus programs, the venerable UNIX syslog and intrusion detection software such as portsentry. The NIDS label is similarly abused, extending from firewalls to network intrusion detection software proper such as Snort [8]. A further specialisation is that of “distributed” NIDS, or dNIDS, which denotes large NIDS installations distributed over a wide network or designed following a distributed-computing paradigm. An excellent example of the latter category is Prelude IDS [10]¹.
We shall be discussing NIDS and in particular the rising need for dNIDS in the context of IPv6.
Historically, long before the appearance of Snort and other tools of the trade, most of the world was
using tcpdump [11] for intrusion detection on the network: a “down to the bare metal” packet dumping
utility. The very first Network Intrusion Detection System, called Network Security Monitor, was written
by Todd Heberlein and colleagues at UC Davis under contract with LLNL between 1988 and 1991 [2, 1].
It was then extended to create NID which found widespread use in the US military [5]. This was rapidly
followed by Shadow, written by Stephen Northcutt and others during 1996 [3, 4, 6] again for use by the
US military. From Shadow onwards there has been an explosion of free and commercial NIDS products
which is beyond the scope of this brief introduction (the more historically minded reader will find more
detailed information in [12]).
There has been little effort to expand the current set of NIDS to support the IPv6 protocol, mainly due to a lack of demand. Despite years of “doom and gloom” forecasts about the famous exhaustion of IPv4 addresses, there has been little uptake of IPv6, with the exception of countries such as Japan which have been very active in promoting it [13, 14]. This has meant that very little non-research traffic has actually travelled over IPv6 and hence the impetus for new attacks making use of IPv6 features has been absent.
A trivial example is the author’s personal mail server and IDS web site [15] which has been reachable
on IPv6 since December 2001: there has been only a single SMTP connection over IPv6 in over a year.
This is to a mail server with an average of a thousand connections per day.
At first glance this might indicate that there is little point in pursuing IDS under IPv6 but this should
instead be thought of as an opportunity to be ready before the storm hits as opposed to catching up
afterwards as in the IPv4 space.
2 IPv4 and IPv6
The discussions around IPv6 started a while back, despite what the current lack of acceptance might suggest, with the first request for white papers being issued in 1993 as RFC1550 [16] using the name IPng, for “IP Next Generation”. By the end of 1995 the first version of the protocol specification was published as IETF RFC1883 [17], which formed the basis for the current IPv6 deployment. For a detailed description of the header formats the reader is referred to [21].

∗ or “Why giving a sentry an infinite block list is a bad idea”.
¹ Although they do define themselves as a “hybrid” IDS as they combine some HIDS facilities within a dNIDS design.
The key differences between IPv6 and IPv4 can be summarised briefly as:
• Simplified header,
• Dramatically larger address space with 128-bit addresses,
• Built-in authentication and encryption packet-level support,
• Simplified routing from the beginning,
• No checksum in the header,
• No fragmentation information in the header.
We shall now take a critical look at the specific differences which are relevant to the IDS professional.
2.1 Simplified header
The comparison between an IPv4 header and an IPv6 header is striking: the IPv6 header is cleaner, with fewer fields, and in particular everything is aligned to best suit current processors. The rationale behind this change is simple: memory is now cheap, and packing data tightly only makes decoding harder on processors which assume data is aligned on 4- or 8-byte boundaries.
The header is simplified by removing all the fields which years of experience with IPv4 have shown to be
of little or no use. The best way to visualise the cleaner and leaner IPv6 basic header is to look at some
tcpdump output representing the same transaction (an ICMP Echo Request packet) between the same
two hosts over IPv4 and IPv6.
14:39:29.071038 195.82.120.105 > 195.82.120.99: icmp: echo request (ttl 255, id 63432, len 84)
0x0000   4500 0054 f7c8 0000 ff01 4c6e c352 7869   E..T......Ln.Rxi
0x0010   c352 7863 0800 1c31 3678 0000 3e5f 6691   .Rxc...16x..>_f.
0x0020   0001 1562 0809 0a0b 0c0d 0e0f 1011 1213   ...b............
0x0030   1415 1617 1819 1a1b 1c1d 1e1f 2021 2223   .............!"#
0x0040   2425 2627 2829 2a2b 2c2d 2e2f 3031 3233   $%&'()*+,-./0123
0x0050   3435 3637                                 4567

14:40:04.096138 3ffe:8171:10:7::1 > 3ffe:8171:10:7::99: icmp6: echo request (len 16, hlim 64)
0x0000   6000 0000 0010 3a40 3ffe 8171 0010 0007   `.....:@?..q....
0x0010   0000 0000 0000 0001 3ffe 8171 0010 0007   ........?..q....
0x0020   0000 0000 0000 0099 8000 60fe 4efb 0000   ..........`.N...
0x0030   bc5e 5f3e 2f77 0100                       .^_>/w..
The first obvious difference is in the version nibble, a six instead of a four. We then notice how the IPv6
header appears to consist mainly of zeros. This is because the first eight bytes only contain essential data,
everything else being relegated to so-called “extension headers” if need be. In this particular case the
packet contains no extension headers, no flow label, simply the payload length, next header (indicating
an ICMPv6 header), and hop limit (64, hex 40 in the packet dump). This is followed by the 128-bit addresses of source and destination, comfortably aligned on 8-byte offsets, making header disassembly efficient even on 64-bit CPUs.
From an IDS perspective this is excellent because on modern CPUs taking apart the IPv4 header to detect subtle packet crafting is very inefficient due to the alignment of the data fields. With IPv6 the
decomposition of the various fields of the header and extension headers can take place efficiently. This
can only mean a gain in per-packet processing speed, an important measure when Gbit/s interfaces are
brought into play.
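As an illustration, the following minimal sketch (written for this text, not taken from any NIDS) decodes the 40-byte fixed IPv6 header with nothing more than Python's standard struct module; every field falls on a natural boundary.

import struct

def parse_ipv6_header(packet: bytes):
    # version (4 bits), traffic class (8 bits) and flow label (20 bits)
    # share the first 32-bit word; the remaining fields are whole bytes.
    vtf, payload_len, next_header, hop_limit = struct.unpack("!IHBB", packet[:8])
    return {
        "version": vtf >> 28,
        "traffic_class": (vtf >> 20) & 0xff,
        "flow_label": vtf & 0xfffff,
        "payload_length": payload_len,
        "next_header": next_header,   # 58 means ICMPv6
        "hop_limit": hop_limit,
        "src": packet[8:24],          # 128-bit source address, 8-byte aligned
        "dst": packet[24:40],         # 128-bit destination address
    }

# First 8 bytes of the IPv6 dump above: 6000 0000 0010 3a40
hdr = parse_ipv6_header(bytes.fromhex("600000000010" "3a40") + bytes(32))
assert hdr["version"] == 6 and hdr["next_header"] == 58 and hdr["hop_limit"] == 64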
Performance is further enhanced by the lack of fragmentation information in the header: this means that
the basic header does not need to be decoded for fragmentation information. Indeed, fragmentation is
now dealt with by extension headers. This does not necessarily make the fragmentation attacks developed
for IPv4 stacks obsolete (see [23] for a discussion of these attacks) as incorrect datagram reassembly can
still take place.
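A hedged sketch of what the extension-header walk looks like in practice: the loop below locates a Fragment header (type 44) by following the next-header chain. The header types and length encodings follow the RFCs (note that AH, type 51, counts 4-octet units minus 2, unlike the generic 8-octet encoding), while the function itself is purely illustrative.

FRAGMENT = 44
AH = 51
EXTENSION_HEADERS = {0, 43, 44, 51, 60}  # hop-by-hop, routing, fragment, AH, dest-opts

def find_fragment_header(next_header: int, payload: bytes):
    """Return the offset of the Fragment header within `payload`, or None."""
    offset = 0
    while next_header in EXTENSION_HEADERS:
        if next_header == FRAGMENT:
            return offset
        nh, hlen = payload[offset], payload[offset + 1]
        if next_header == AH:
            offset += (hlen + 2) * 4   # AH length is in 4-octet units minus 2
        else:
            offset += (hlen + 1) * 8   # generic: 8-octet units, excluding first 8
        next_header = nh
    return None  # upper-layer header reached: the datagram is not fragmented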
2.2 Larger address space
At first glance the fact that IPv6 offers 2^128 addresses might seem a blessing after all the problems with
address exhaustion in IPv4. The indiscriminate allocation of IPv4 addresses, which were initially thought
to be plentiful, brought us protocol violations such as Network Address Translation, at times also referred
to as “masquerading”, and other “patches” to work around the problem.
If the computers currently connected to IPv4 were moved over to an IPv6 infrastructure then in effect
all that would happen is a renumbering of hosts. IPv6 is intended to be more than a substitute for IPv4 and to bring us closer to the concept of “ubiquitous networking”: the larger address space is an incitement to connect everything to the Internet. A number of companies are already working on
so-called “intelligent homes” and in particular Internet-connected home appliances. The deployment of
IPv6 will make it possible for households to be assigned ample addresses for each of their appliances to
be directly connected to the Internet and report on the state of the fridge, the washing machine and so
on.
Let us now wear the paranoid IDS implementor’s hat: to connect to the Internet each of these appliances
must run an embedded operating system with a TCP/IP stack of some description. Furthermore it needs
to have sufficient processing power to offer something like a simple web server for configuration and user
interaction. Let us further imagine that there is a security flaw with this software which allows a remote
user to take over control of the appliance and use it as a Distributed Denial of Service host (see [22] for
an exhaustive discussion of DDoS). Assuming Internet-connected home appliances become widely used, we are effectively facing the possibility of an attack by an innumerable number of simple systems which will not be trivial to patch².
Furthermore IPv6 has, by design, simplified the way a device can find its neighbours on a network segment. A pre-defined link-local “neighbour discovery” procedure for allocating an IPv6 address using ICMPv6 has been drawn up, which uses the 48-bit Ethernet address as a differentiator between hosts. A NIDS now loses what little chance it had of discovering devices with a dynamically provided address, as the allocation is no longer “public”: simply changing an Ethernet card can easily blind address-based rules³.
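The EUI-64 derivation used by stateless autoconfiguration is simple enough to show in a few lines. The illustrative fragment below builds the link-local address from a MAC address, which also makes clear why swapping the Ethernet card changes the address.

import ipaddress

def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    octets = bytearray(int(b, 16) for b in mac.split(":"))
    octets[0] ^= 0x02                      # flip the universal/local bit
    eui64 = octets[:3] + b"\xff\xfe" + octets[3:]   # insert ff:fe in the middle
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + bytes(eui64))

print(link_local_from_mac("00:0a:95:9d:68:16"))  # fe80::20a:95ff:fe9d:6816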
From this we can draw the conclusion that a number of very useful features of IPv6 can seriously turn against the IDS community as the protocol gains a foothold in the Internet and more devices, in particular embedded devices, are produced with IPv6 Internet connectivity.
2.3 Authentication and Encryption
Authentication and encryption are available in IPv4 through the use of IPsec [18, 19] which has not been
widely deployed until recently. The main field of application has been that of VPN tunnels and this limited
interest has meant that until most router vendors had implemented it there were few users. Furthermore,
these few were mainly users of implementations running on open source operating systems. The main
factors blocking the widespread use of IPsec have been the complicated setup and key-exchange protocol
which, although necessary for proper security, did require more knowledge than the average systems
manager possessed.
This has meant that the IDS community has been mainly concerned with channels protected by SSL
or TLS (see [20] for a formal definition of TLS). For example an attack over HTTP which is visible to
a NIDS installed between the source and the destination becomes invisible if transmitted over HTTPS
as the standard “string in the payload” matching will fail. A number of solutions to the encrypted
channel problem have been postulated: from session key sharing by the web server allowing “on-the-fly”
decryption to server-side storage of keys for later off-line decryption of captured packets. One solution is
to use ssldump [30] to decode SSL but of course you need control of both endpoints to obtain the session
keys.
The deployment of IPv6 has the potential to worsen the “encrypted attack problem” quite dramatically
as IPv6 has had authentication and encryption built-in since its inception⁴. One of the default extension headers for IPv6 is the “IPv6 Authentication Header” (AH) which is nothing other than the same AH mechanism as used in IPsec, but for IPv6.

² Remote updating of software does not improve the situation much, as various incidents with Microsoft's Windows Update facility have shown (see, amongst the many, [27]).
³ Note that DHCP is being extended to IPv6 as DHCPv6, but the availability of “neighbour solicitation” by default provides a far greater challenge.
⁴ One should note that a few features of IPv4 were rarely used as intended, for example Quality of Service and indeed, amongst the IP options, a “Security” option which is often referred to as “something used by the US DoD” and deals with classification. A quick look at a Unix system's include file /usr/include/netinet/ip.h is recommended reading.
As IPv6 deployment increases it will be interesting to see if, unlike IPsec, the AH mechanism is more
widely used and, in particular, if trust can be moved from the weak “secure web server” model⁵ to the
protocol layer for a large number of applications.
This is a double-edged sword for the IDS community. It is very tempting to think that widespread
availability of encryption and authentication improves security but all this does is move the goal posts.
There is no guarantee that the traffic being carried by the authenticated and encrypted link is legitimate
and furthermore it will now be illegible as far as the NIDS is concerned. The solution of “sharing the
keys” as in the web server scenario becomes the nightmare of sharing the keys of every host on an IPv6
network with the NIDS.
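To give an idea of what per-packet verification would entail, here is a hedged sketch of an AH integrity check using HMAC-SHA1-96 (one of the mandatory AH algorithms). SPI lookup, replay checking and the zeroing of mutable fields are all elided, and the function name is ours; the point is that the NIDS would need every host's key to run it.

import hmac, hashlib

def verify_ah_icv(auth_key: bytes, packet_for_icv: bytes, received_icv: bytes) -> bool:
    # HMAC-SHA1 truncated to 96 bits, recomputed over the packet as prepared
    # for ICV calculation, then compared in constant time.
    mac = hmac.new(auth_key, packet_for_icv, hashlib.sha1).digest()[:12]
    return hmac.compare_digest(mac, received_icv)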
3 Intrusion Detection
Intrusion Detection is moving in two opposite directions: one is the “flattening” towards the low-end of
the market with vendors attempting to sell “shrink-wrapped” IDS packages, the other is the “enterprise”
attempting to pull together all security-related information for a large company.
Both suffer from the same problem: customers are used to viewing IDS as “something which looks like an
Anti-Virus”. This view is strengthened by the behaviour of most systems: you have rules, they trigger,
you attempt to fix the problem and you upgrade or download the rules. The fundamental problem is not
so much the perception of IDS as the perpetration of a dangerous methodology.
Let us consider an example from a real life situation which is closest to IDS: the concept of a sentry at a
checkpoint. When instructing a sentry one defines certain criteria under which something (be it a person
or a vehicle of some description) is to be allowed to pass the checkpoint. One does not attempt to define
all unauthorised behaviour or vehicles, as the list is fundamentally infinite. This works quite well, and improvements to the system come from making the “pass rules” more stringent and well-defined.
3.1 White-listing – Describing normality
So why do IDS systems (and Anti-Virus systems for that matter) attempt to define all that is bad?
The answer is not as simple as one would wish: it is a mixture of historical development and the lure of
“attack analysis”. Historically in computing Intrusion Detection has always been the alerting to something
being amiss, for example “bad logins”. This was a good example of “white-listing” or the alerting on
anything which was not known. Unfortunately “white-listing” has the disadvantage of generating a lot of
messages and people started ignoring the output of syslog under Unix. As the alerting moved onto the
network the idea somehow changed into the equivalent of an anti-virus (which at the time was already a
well-developed concept for MS-DOS systems) losing the “white-listing” concept on the way.
Let us now elaborate the second point: the analysis of a new attack or indeed the search for new attacks
is fashionable, interesting and challenging. It is therefore a matter of pride to be the first to publish the
rule which will catch a new attack. Given the choice a security analyst would much rather play with a
new attack than spend his time laboriously analysing the network to write “white-listing” rules.
If we consider the ruleset released with Snort version 1.9.0 (see [9]) we find definitions for 2321 rules. All of these rules are “alert” rules, defining traffic to be flagged as bad. Then towards the end of January 2003
the Internet was swamped by a new worm targeting Microsoft SQL server installations (see [7] for an
in-depth analysis). Anyone running the base Snort ruleset would have never noticed but neither would
someone who had just updated his rules for the simple reason that no rule had yet been written.
Furthermore there is no limit to this process: new attacks are published, new rules are written and added
to the list. Hence there is no upper bound on the number of rules.
How many sites really required remote access to their Microsoft SQL Server? Possibly a handful. So why
was the IDS not instructed to alert on any incoming (or indeed, outgoing) connection to the relevant
port?
⁵ The security of SSL certificates issued by so-called “trusted parties” has been sadly found lacking in the crucial step
of verifying the identity of the person or company requesting a certificate. These failures in identity verification make it
difficult to equate “trust” with SSL certificates despite the marketing efforts.
The key paradigm shift which is required is precisely that NIDS move from a “flag what is known to
be bad” mechanism to “flag what is not explicitly allowed”. This mimics closely what has happened in
firewall design. Originally firewalls were set up to block specific ports, reflecting the academic nature of the Internet; in most installations they are now set up to block everything except what is deemed safe. The
drawback of a white-listing setup is that the number of “false positives” will increase dramatically each
time a network change takes place without informing the security team. This does have the beneficial
side-effect of enforcing proper communication between the networking and security groups. There are
also sites for which white-listing will be fundamentally impossible (such as an ISP’s co-location centre)
until much more sophisticated “auto-whitelisting” software becomes available; but any site with a security
policy, a small number of well defined external access points and internal routing points should be able
to define suitable white-lists.
Finally a further benefit: the number of rules required to define white-lists is limited and known. This
allows a correct measurement of the performance of a NIDS under realistic conditions. As more NIDS are
connected to high-speed networks the issue of packet processing speed becomes of paramount importance.
The larger the set of rules which needs to be applied against every single packet the slower the NIDS
will perform. A NIDS which is not dropping packets with 1000 rules will not necessarily cope with double that number. In particular, as more and more content-based rules are deployed (which require expensive string-matching) the performance can only decrease further. One solution is of course to throw hardware at the problem, but that is only a palliative; a better solution is to attack the problem at its root by deploying white-lists.
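As a minimal sketch of the white-listing idea (the rules and addresses are illustrative, not a real policy), the rule set below describes what is allowed and flags everything else. Note that the Sapphire/Slammer connection attempt to UDP port 1434 is caught without any worm-specific rule ever being written.

ALLOWED = {
    # (destination address, destination port, protocol)
    ("195.82.120.99", 25, "tcp"),   # the site's SMTP server
    ("195.82.120.99", 80, "tcp"),   # the site's web server
    ("195.82.120.1", 53, "udp"),    # the site's DNS resolver
}

def check_flow(dst: str, dport: int, proto: str) -> str:
    if (dst, dport, proto) in ALLOWED:
        return "pass"
    return "alert"   # anything not explicitly allowed is suspicious

# No rule for UDP/1434 is ever needed: the flow is flagged simply
# because it is not on the white-list.
assert check_flow("195.82.120.99", 1434, "udp") == "alert"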
3.2 Ubiquity – Monitoring needs to be pervasive
Once the paradigm has been shifted it needs to be completed. This entails the understanding that an
isolated NIDS is of little use, just like a lone sentry asked to patrol an immense perimeter.
For Intrusion Detection to be truly effective it is necessary to move from the sporadic installations designed
to tick a box on the security policy to a proper monitoring system. There should be NIDS installations
at every entry point into the network, be it external or indeed internal. There should be no situation in
which there is a negative answer to the request “pull the data from that network segment”.
Once a NIDS permeates the network it becomes possible to follow “alarm flows” and have more than
the sporadic data points which a NIDS and maybe a few firewalls can offer. If ubiquitous monitoring is
deployed with white-listing then it suddenly becomes possible to monitor abuse throughout the network
and indeed “follow through” abuse in the network. It will no longer be possible for an internal issue to grow out of all proportion before it is noticed, often only as it tries to “escape” via the firewalls.
3.3 Aggregating, correlating and reducing data
The larger the enterprise, the heavier the requirements of a ubiquitous NIDS deployment, in particular in terms of staffing needs.
Once data is collected there is very little point in it being left on the collecting devices. The first
requirement for an advanced NIDS deployment is to have a central “aggregation engine” which takes all
the data from the sensors. This should probably be a database and most NIDS vendors these days offer
this capability.
Having the data in a central location means that it is now available for correlation: a perimeter-wide
scan for open ports should most definitely not be reported as a number of individual events but as a
single instance of multiple events. The instance is a “port scan”, the multiple events are the individual
alerts issued by the sensors. This correlation can be made as sophisticated as required, for example
correlating by originating subnet rather than single host or by correlating using routing “autonomous
system” numbers.
Once the data is correlated it can be reduced: if the scan is to multiple locations but the mechanism is identical there is no need to store all the individual packets. It is sufficient to store a representative packet and then reference the collecting sensors to preserve the entirety of the event.
Without these three crucial steps an advanced NIDS deployment can only fail under the weight of its
own data and the expense in identifying and locating it across the whole network.
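A toy illustration of the three steps (the field names are assumptions, not any vendor's schema): per-sensor alerts are aggregated centrally, correlated by originating host, and reduced to one representative packet plus a list of sensor references.

from collections import defaultdict

def correlate(alerts):
    """alerts: iterable of dicts with keys src, dst_port, sensor, packet."""
    events = defaultdict(lambda: {"ports": set(), "sensors": set(), "sample": None})
    for a in alerts:
        ev = events[a["src"]]              # correlate by originating host
        ev["ports"].add(a["dst_port"])
        ev["sensors"].add(a["sensor"])     # reference the sensors, don't duplicate
        if ev["sample"] is None:
            ev["sample"] = a["packet"]     # keep one representative packet
    # reduce: many individual alerts become one instance of multiple events
    return {
        src: {"type": "port scan" if len(ev["ports"]) > 10 else "probe", **ev}
        for src, ev in events.items()
    }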
3.4 Network knowledge, advanced analysis and management
Transforming the collected data is still not enough. The bane of all NIDS installations is the incidence
of false positives. Attempting to reduce false positives can often lead to the implementation of rulesets
which are dangerously restricted in scope. As we discussed previously white-listing can increase the
incidence of false positives unless appropriate communication between the networking and security teams
is in place.
One solution to the false positive problem is to add knowledge of the network to the NIDS. A large
number of products used by networking teams store information about the systems on the network from
the system type to the operating system. This information should be fed into the NIDS to help it
discriminate between attacks which should merely be logged as “informational” and those which instead
require immediate action.
A trivial example is that of a web site running Apache. It is definitely of interest to catalogue attacks
against Microsoft’s IIS web server being used against this web site but clearly they do not pose much
of a threat. Conversely an Apache attack should definitely command immediate attention. Even basic
discrimination by “log level” can improve the quality of security analysis and response dramatically. It is
much simpler to address a small number of problems which have been made immediately obvious by the
high log-level than having to look for the same problems in the midst of thousands of irrelevant attacks.
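A sketch of such asset-aware grading, with a hypothetical inventory format:

INVENTORY = {"www.example.org": {"os": "OpenBSD", "httpd": "Apache"}}

def grade(alert_target_software: str, host: str) -> str:
    asset = INVENTORY.get(host)
    if asset is None:
        return "warning"          # unknown host: the inventory is incomplete
    if alert_target_software == asset["httpd"]:
        return "critical"         # the attack matches what is actually running
    return "informational"        # catalogue it, but it poses little threat

print(grade("IIS", "www.example.org"))     # informational
print(grade("Apache", "www.example.org"))  # critical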
Once such a sophisticated system is in place then much more advanced analysis is possible. One interesting
option is that of “Differential Firewall Analysis” [24] where the correctness of firewall operation is verified
by means of NIDS placed on both sides of the system under monitoring. The rationale behind such an
analysis is that the worst case scenario for a firewall is that it is breached via a flaw in its software. There
will be no record in the logs of this breach but the intruder will have penetrated the system (or indeed,
a user on the inside might be connecting to the outside) and will remain totally undetected. Differential
Firewall Analysis attempts to alert to such flaws by verifying and correlating traffic on both sides of the
firewall.
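In outline, the comparison could look like the deliberately simplified sketch below (an illustration of the idea, not the analysis described in [24]): flows seen on both sides of the firewall which the policy forbids are exactly the breaches the firewall's own logs will never show.

def differential_check(outside_flows: set, inside_flows: set, permitted: set):
    # A flow observed by both the outside and the inside sensor, yet not
    # permitted by policy, crossed the firewall when it should not have.
    return {f for f in inside_flows if f in outside_flows and f not in permitted}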
Finally, how does one manage a system of this size? It is clear that there is no hope of managing rulesets
individually: there needs to be a centralised rule repository from which rules for the sensors are “pushed”
and activated in unison. This prevents those situations in which half of the sensors are running on the new ruleset and the other half on the older version, rendering analysis impossible. Direct interaction with the individual sensors should be prevented, with all management, from code updates to individually turning sensors on and off, controlled from a single centralised location where all state is kept. This is
also a huge step towards 24x7x365 availability: if state is kept in a centralised location then it becomes
much simpler to deploy replacement sensors or indeed fail-over to spares when failures occur.
3.5 Deploying Intrusion Detection suitable for IPv6
The introduction of IPv6 into an environment with a sophisticated NIDS deployment such as the one we have been describing represents less of a worry. The most important hurdle is perhaps that of authentication
and encryption: a sophisticated NIDS would want to at least verify the validity of the AH in each packet
if not check the contents of the ESP. This is perhaps the least tractable of problems: despite the presence
of hardware cryptographic acceleration cards and support for them in a number of open source operating
systems (in particular OpenBSD, see [25, 26]) there is a noticeable difference between offering fast crypto
support for SSL and SSH key generation and decrypting packets on the fly at megabit/s rates.
4 Conclusions
Besides integrated cryptography there is little else to differentiate IPv6 from IPv4 technology from a NIDS point of view, with the exception of the larger address set. It is the uses of IPv6 technology which present the greatest challenges, as they might finally achieve what used to be the holy grail a few years ago of “everything on the Internet”, which IPv4 did not fulfil. This will make “white listing” of paramount
importance as the definition of thousands of blocking rules based on source addresses will simply no longer
be possible. Furthermore the proliferation of connected devices will make accurate, pervasive and timely
monitoring a core necessity for any enterprise or large network.
The single largest contribution to network security in a large environment is education. There is no
substitute for generating awareness of dangers such as the blind opening of attachments or the installation of unauthorised software. It has also been the security industry's greatest failure that, as more and more sophisticated tools became available, the issue of education has remained neglected.
The deployment of IPv6 will place increased pressure on the requirement for a paradigm change from
the current localised solutions to a much more distributed system and in particular from the “anti-virus
lookalike” to a system finally resembling a proper sentry.
An NIDS should only ever be deployed as part of an information security policy that it needs to monitor,
just like a sentry is part of a physical security policy. Similarly, as we do not hand a sentry a list of every
person banned from crossing a checkpoint, we should not attempt to define rules for every possible type
of “bad traffic”. We should instead concentrate on working out what traffic is allowed on a network and
define everything else as bad. With IPv6 the size of the address range (2^128 possible addresses) and the much more dynamic nature of IPv6 addressing would make blacklisting in firewalls an almost impossible exercise.
It would be commendable if the current IPv6 test back-bone, 6Bone, started deploying dNIDS to see
what challenges await us before widespread deployment of the new protocol. If we consider the trend
towards large distributed computing (European Data Grid, Asian Data Grid and other similar projects,
see [28, 29]) which will require more and more address space and network communication, then dNIDS
will have to become the security monitoring solution. The amount of processing power and network bandwidth makes these grids a formidable opponent should they fall into the wrong hands (and DDoS
is precisely about creating “attack” grids). This threat means that in a distributed environment it is
pointless to address monitoring at a few, disconnected, points on the grid. It has to be pervasive and
ubiquitous, reporting centrally to the CERT responsible for that particular grid which needs to be able
to take prompt and informed action before the problem spreads.
It is perhaps surprising that NIDS design has come full circle. A careful reading of the early papers on NID and Shadow reveals nothing other than dNIDS systems, with the exception of a centralised alert database and enterprise-class management facilities. It is the author's belief that the reason for this is
that, finally, the understanding that a NIDS is not anti-virus software by a different name is starting to
diffuse in the industry.
5 Acknowledgements
The author would like to thank the Programme Committee for the kind invitation to SPI2003 and Dr.
Diana Bosio and Dr. Chris Pinnock for valuable comments to the text.
References
[1] Todd Heberlein. “NSM, NID and the origins of Network Intrusion Detection”. Private Communication, July 2002.
[2] Todd Heberlein et al. “A Network Security Monitor”. Proceedings of the IEEE Computer Society
Symposium, Research in Security and Privacy, pages 293-303, May 1990.
[3] Stephen Northcutt. “The History of Shadow”. Private Communication, July 2002.
[4] Stephen Northcutt and Judy Novak. Network Intrusion Detection. 2nd Edition, Chapter 11, pages
198-199 and 220-221. New Riders, 2001.
[5] Stephen Northcutt and Judy Novak. Network Intrusion Detection. 2nd Edition, Chapter 11, page
198. New Riders, 2001.
[6] Stephen Northcutt. Intrusion Detection Shadow Style. SANS Institute, 1999.
[7] Marc Maiffret. “SQL Sapphire Worm Analysis”. eEye Digital Security, January 2003,
http://www.eeye.com/html/Research/Flash/AL20030125.html.
[8] Marty Roesch et al. Snort – The Open Source NIDS.
http://www.snort.org/.
[9] Marty Roesch et al. Snort – The Open Source NIDS. Release 1.9.0, October 2002,
http://www.snort.org/dl/snort-1.9.0.tar.gz.
[10] Yoann Vandoorselaere, Pablo Belin, Krzysztof Zaraska, Sylvain Gil, Laurent Oudot, Vincent Glaume
and Philippe Biondi. Prelude IDS.
http://www.prelude-ids.org/.
[11] tcpdump. http://www.tcpdump.org/.
[12] Paul Innella. “The Evolution of Intrusion Detection Systems”. SecurityFocus, November 2001,
http://www.securityfocus.com/infocus/1514.
[13] “Overview of IPv6 Projects around the World”. IPv6 Forum,
http://www.ipv6forum.org/navbar/links/v6projects.htm.
[14] “UK IPv6 Resource Centre – Whois service”. Lancaster University and IPv6 Forum,
http://www.cs-ipv6.lancs.ac.uk/ipv6/6Bone/Whois
[15] Arrigo Triulzi. “IDS Europe”. https://ids-europe.alchemistowl.org/.
[16] S. Bradner and A. Mankin. “IP: Next Generation (IPng) White Paper Solicitation”. IETF, December
1993,
http://www.faqs.org/rfcs/rfc1550.html.
[17] S. Deering and R. Hinden. “Internet Protocol, Version 6 (IPv6) Specification”. IETF, December 1995,
http://www.faqs.org/rfcs/rfc1883.html.
[18] S. Kent and R. Atkinson. “IP Authentication Header”. IETF, November 1998,
http://www.faqs.org/rfcs/rfc2402.html.
[19] S. Kent and R. Atkinson. “IP Encapsulating Security Payload (ESP)”. IETF, November 1998,
http://www.faqs.org/rfcs/rfc2406.html.
[20] T. Dierks and C. Allen. “The TLS Protocol Version 1.0”. IETF, January 1999,
http://www.faqs.org/rfcs/rfc2246.html.
[21] Adolfo Rodriguez, John Gatrell, John Karas and Roland Peschke. TCP/IP Tutorial and Technical
Overview. Chapter 17. IBM & Prentice Hall, October 2001,
http://www.redbooks.ibm.com.
[22] Dave Dittrich. “Distributed Denial of Service (DDoS) Attacks/tools”. University of Washington.
http://staff.washington.edu/dittrich/misc/ddos/.
[23] Jason Anderson. “An Analysis of Fragmentation Attacks”. SANS Reading Room, March 2001,
http://www.sans.org/rr/threats/frag attacks.php.
[24] Arrigo Triulzi, “Differential Firewall Analysis”. In preparation, February 2003.
http://www.alchemistowl.org/arrigo/Papers/differential-firewall-analysis.pdf.
[25] “Cryptography in OpenBSD”. The OpenBSD project. http://www.openbsd.org/crypto.html.
[26] Theo de Raadt, Niklas Hallqvist, Artur Grabowski, Angelos D. Keromytis and Niels Provos. “Cryptography in OpenBSD: An Overview” in Proceedings of Usenix, 1999.
http://www.openbsd.org/papers/crypt-paper.ps.
[27] John Leyden. “Code Red bug hits Microsoft security update site”. The Register, July 2001.
http://www.theregister.co.uk/content/56/20545.html.
[28] “The European Data Grid Project”. CERN. http://eu-datagrid.web.cern.ch/eu-datagrid/.
[29] “The Asian-Pacific Grid Project”. http://www.apgrid.org/.
[30] Eric Rescorla. ssldump. http://www.rtfm.com/ssldump/
Until now, there has been neither business nor scientific interest in file-sharing peer-to-peer networking. The main interest has come from the (usually illegal) sharing of movies or music. Hence, most of those applications are developed by individuals.

Some of the developers really are doing it for free, but others are doing it to earn some money. They use a specific business model called 'adware', i.e., software paid for by advertisements: the application displays advertisements (fetched from an Internet server) while it is running.

There is one big exception: Kazaa, which is developed not by an individual but by a company. The revenue of this company also comes from advertisements displayed by the application.
2 Protocol description
2.1 A three-tier architecture
Most of the peer-to-peer applications use a three-tier architecture:

• seed hosts: a few hosts with static IP addresses or host names which maintain and serve the list of other peers that are using dynamic IP addresses;
• search servers: multiple hosts with dynamic IP addresses; they maintain the list of the shared files and their main role is to allow search operations;
• peer hosts: a huge number of hosts with dynamic IP addresses; they share their local files and retrieve files from other peers.
It can be seen that a few hosts, the seed hosts, must always be active and addressable. Those hosts are either hard-coded in the application or can be configured by the user (using IRC, forums, the web, etc. as the source of information).

The search servers are usually elected by the protocol based on CPU performance, network connectivity, and so on. For some applications, the peers and search servers can be collocated.

While there are usually just a couple of seed hosts, there are hundreds of thousands of peer hosts.
2.2 Usual steps to execute
When a new computer wants to join the peer-to-peer network, it usually takes the following steps:

1. seed: connect to a seed host to get the IP addresses of a couple of other peers (and possibly search servers);
2. register: publish its own IP address to the seed host;
3. connect: to a couple of other peers (and possibly search servers);
4. publish: the list of local files to search servers;
5. search: for desired files on other peers or on search servers;
6. retrieve: files from several peers in parallel to go faster.

Of course, operations 5 and 6 can be repeated.
Some protocols do not execute all those steps; e.g., Gnutella has no concept of search servers, so step 4 does not exist.
2.3 Description of Gnutella

While Gnutella¹ is not the most popular protocol, it is the most distributed one; there are only two kinds of nodes: peers and seed hosts. This is the reason why only Gnutella is described in this paper.

¹ Gnutella appears to be named after the famous chocolate brand!
The Gnutella protocol specification is publicly available and there is even a long-term project to publish this protocol through the IETF. There are also future extensions to handle the next generation of the IP protocol: IPv6. Gnutella is implemented in several applications (Limewire, Bearshare, and others) on both MS-Windows and Linux operating systems.
2.3.1 Seed and registration
Every Gnutella node (called a servent, from server and client) randomly generates its own unique identification. This identification is called the ClientID.

Upon start, the servent connects to a user-configured seed host using a protocol called gwebcache. The seed host will give it 20 IP addresses of other active servents.
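A hedged sketch of this bootstrap step: a gwebcache is a plain web script returning ip:port lines, and the fragment below fetches a peer list from one. The cache URL and the hostfile parameter follow common GWebCache convention but should be treated as assumptions here; real caches vary.

import urllib.request

def fetch_seed_peers(cache_url: str, limit: int = 20):
    # ask the cache for its host file: one "ip:port" entry per line
    with urllib.request.urlopen(cache_url + "?hostfile=1") as resp:
        lines = resp.read().decode("ascii", errors="replace").splitlines()
    peers = []
    for line in lines[:limit]:
        host, _, port = line.partition(":")
        if port.isdigit():
            peers.append((host, int(port)))
    return peers

# peers = fetch_seed_peers("http://cache.example.org/gcache.php")  # hypothetical cache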
2.3.2 Connection
The new servent will then connect to the other servents using TCP connections; each TCP connection is identified by a connection id, conn-id. There is neither authentication nor confidentiality provided on the TCP connection. Since servents keep persistent TCP connections to several other servents, the Gnutella network is really a partially meshed network.

As soon as the servent is connected to other servents, it will also listen for incoming TCP connections from newer servents.
The Gnutella protocol is implemented by a couple of messages. Those messages have their own random identification, msg-id, which is assumed to be unique in the network. Note: a message usually does not contain any information about the originator (no IP address, no name, no ClientID, …) in order to provide anonymity.

Every servent maintains a table recording each known message identification and the TCP connection on which it was received: <msg-id, conn-id>. Upon receipt of a message:

• if the msg-id is not yet in the table, the message is flooded, i.e., forwarded on all other TCP connections, and a new entry <msg-id, conn-id> is added to the table;
• else, the message is forwarded on the connection associated with this msg-id. This is typically used for replies.

Note: this is very similar to the learning bridge process of IEEE 802.1.
As the topology is partially meshed, there are loops in the Gnutella network; hence, all messages include a time to live, TTL, field. This TTL is initialised to a small value by the message originator and decremented by each forwarding servent. As soon as the TTL reaches 0, the message is simply discarded. This simple technique prevents a message from travelling forever in the network.
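The forwarding rule and the TTL check fit in a few lines. The following is an illustrative sketch (the connection objects and their send method are assumptions), not an extract from any servent:

routing_table = {}   # msg-id -> connection on which the message was first seen

def handle_message(msg_id, ttl, from_conn, connections):
    ttl -= 1                               # decremented by each forwarding servent
    if ttl <= 0:
        return                             # TTL exhausted: discard, no loops
    if msg_id not in routing_table:
        routing_table[msg_id] = from_conn  # learn, like an IEEE 802.1 bridge
        for conn in connections:
            if conn is not from_conn:      # flood on all other connections
                conn.send(msg_id, ttl)
    else:
        # same msg-id seen again: route it back towards the originator,
        # which is how replies find their way home
        routing_table[msg_id].send(msg_id, ttl)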
2.3.3 Search
When a servent wants to search for files matching a pattern (like the '*.avi' file specification), the search pattern is flooded over the network in a message. This servent will be called the consumer servent in this paper.

All servents receiving and forwarding this search message have to reply if they have files matching the search pattern. The reply message is routed back to the source based on the <msg-id, conn-id> tables. This message contains the list of matching local files, and the ClientID and IP address of the servent that has the files. Those servents will be called remote servents in this paper.
2.3.4 File retrieval
The consumer servent will first try to open a direct TCP connection to the remote servent(s). This connection
will use a protocol very similar to HTTP.
Obviously, if the remote servent is protected by a firewall preventing connections initiated from the Internet, this connection will fail. But Gnutella has a way to bypass firewalls: the push message. This message is sent
over the Gnutella network to the destination ClientID. Then, it is up to the remote servent to open a TCP
connection to the consumer servent. This connection originates from the inside of the firewall and will usually
be permitted.
Note: two servents protected by two firewalls will never be able to exchange files.
3 The threats
This paper is not about the obvious copyright infringement when peer-to-peer is used to share copyrighted material; it rather describes other security threats.
3.1 Contents sharing
The configuration of a peer-to-peer application includes the specification of the local file system directory to share. Some users are naïve enough to share their whole disk, including confidential documents… This is worrying since peers outside of the firewall will be able to retrieve those documents through the push message.

Some applications were also poorly written and shared more than expected! E.g., Bearshare could be exploited by a directory traversal when the file to be retrieved was called '..\..\config.sys' ('..' is the usual shortcut for the parent directory).
The retrieved content can also contain viruses and Trojans, especially when the user is dumb enough to execute a file retrieved anonymously from an unknown location!

But this also applies to non-executable content; e.g., a multimedia .ASF file can also dynamically open URLs (which could download a hostile Java applet, …).
3.2 Worms
Peer-to-peer networks are usually well connected and fast. This makes them a target of choice for worms.

There have already been a couple of worms for those networks. The spread mechanism is easy: every infected peer simply replies 'yes, I have this file' to all search requests and sends the worm renamed as the searched file. Examples include Worm.Kazaa.Benjamin and Gnutella.Mandragore.
3.3 Covert channel
Hackers use some Internet Relay Chat (IRC) networks to control remote trojanized PCs. They use IRC networks as a covert channel to initiate denial of service attacks from those thousands of remote PCs.

As peer-to-peer networks are even larger than IRC ones, it can be expected that they will shortly be used as a covert channel to control trojanized PCs.
3.4 Bandwidth hog
But the major issue with peer-to-peer networking is network bandwidth utilization…

This is especially applicable to schools and universities where students download numerous large movie files (typically about 600 Mbytes). The existing networks were not provisioned for such applications. This means congested links and slower response times for all other applications.
The Internet Service Providers are also complaining because their business model and network provisioning assumed that residential users (on ADSL or cable modems) were mainly downloading files, while with peer-to-peer they are also uploading files. This is forcing some ISPs to change their tariff structure and to go to a volume-based tariff, which is heavier to manage than the previous flat rate. A large European ISP measured that 40% of its traffic was identified as peer-to-peer in July 2002.
But, on the security side, this also means that security devices have to handle much more traffic than expected. This is of particular importance for network intrusion detection systems, NIDS. NIDS sniff all traffic on a network and check for attacks (for monitoring, alerting or pro-active prevention of attacks). As NIDS algorithms are quite complex, a lot of NIDS will give up when the traffic is too high. So, they can miss a real attack in the mass of innocent peer-to-peer traffic.
3.5 Application security
The peer-to-peer applications probably have a couple of vulnerabilities (buffer overflows, directory traversals, …) due to bugs in the software code.

When those vulnerabilities are discovered and exploited through the peer-to-peer network, a remote attacker could gain control of thousands of hosts. Those hacked hosts could be used to initiate a denial of service attack or as stepping-stones to crack into other machines.

The fact that the source code of most applications is not available makes things even more dangerous, since nobody is able to check the security of the code.
3.6 Auto Update
In order to make things simpler for their users, some peer-to-peer applications actually use the network to propagate newer versions of the application. In some cases, the user is not even notified that a new version is installed in the background.

As the new version is rarely authenticated, it could be possible to inject a fake new version containing Trojans or back doors.

This is specifically applicable to adware, where the adware engine is independent of the whole application. Fortunately, there are a couple of tools that can remove adware from an existing host (notably Lavasoft's Ad-aware).
4 Blocking peer-to-peer networking?
Nowadays, all peer-to-peer protocols can use dynamic and/or configurable TCP ports. This means that firewalls cannot block those applications when the user configures them on non-default ports like port 80 (also used by HTTP).
4.1 Throttle strategy
If the default ports are blocked by a firewall or by a packet filtering router, the user will notice that something is wrong. He will look on the web for a solution and will configure the application to use non-default ports.

So, the throttle strategy is simple:

• do not block the default ports [1]: the user will keep using those ports;
• but rate-limit the traffic on those ports to a very small amount of the available bandwidth.

It is pretty common for network devices like routers and switches to throttle some traffic. This strategy addresses only the bandwidth threat; it does not prevent any of the other issues. It is mainly used by universities and schools.
4.2 Block all strategy
In the block all strategy, the security policy is stricter: block all traffic to the Internet, only open some well-known ports (web, mail, …) and force the traffic to go through an application proxy. If the firewall allows only HTTP traffic, then even if the peer-to-peer application is configured to use TCP port 80, the proxy will block the connection since the protocol is not fully HTTP compliant.

This strategy is already in place at banks and other security-savvy organizations. Peer-to-peer security issues on their premises do not impact them.

Nevertheless, if their employees are using the same laptop computers at work (with such a strict security policy) and at home (where there is no security policy enforcement at all), the organization is at risk. Hence, there is a real need for intrusion detection and prevention in the internal network.
4.3 Block seeding
The block seeding strategy relies on the only static information used by peer-to-peer applications: the static IP addresses (or host names) of the seed hosts. If the security policy prevents any traffic to the seed hosts, then the internal hosts cannot join the peer-to-peer network.

The issue is of course to have a list of those seed hosts. For some applications like Kazaa or WinMX, it is easy. But for Gnutella and eDonkey this is mostly impossible, as there are numerous seed hosts and they change every week or so.

So, this strategy is mostly hypothetical until some vendors offer the list of those hosts as a service (like the black lists of spam relays).
4.4 Traffic pattern recognition
The last technique is traffic pattern recognition. As far as the author knows, there is no implementation of this technique yet.

It is based on the fact that peer-to-peer communication has a typical pattern:

• long-lasting TCP connections;
• single connections to remote hosts;
• a lot of traffic.

Routers and firewalls using techniques like Netflow or IP flow export [3] could spot the peer-to-peer hosts after a couple of minutes and either block or throttle the traffic addressed to those IP addresses.
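A possible heuristic over exported flow records might look like the sketch below; all three thresholds are invented for illustration and would need tuning against real traffic.

def looks_like_p2p(flows_for_host):
    """flows_for_host: list of (remote_ip, duration_s, bytes) for one local IP."""
    long_lived = [f for f in flows_for_host if f[1] > 600]   # > 10 minutes
    distinct_remotes = {f[0] for f in flows_for_host}
    total_bytes = sum(f[2] for f in flows_for_host)
    return (len(long_lived) >= 5                # several long-lasting connections
            and len(distinct_remotes) >= 20     # single connections, many peers
            and total_bytes > 500 * 2**20)      # a lot of traffic (> 500 MB)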
5 Conclusions
Peer-to-peer networks are a relatively new paradigm for communication. They offer resilience, throughput, anonymity, and performance. Some protocols, like Gnutella, are also well designed for their purpose.

Besides the obvious copyright infringement issue, they also pose some security threats. Alas, it is currently mostly impossible to identify IP traffic as peer-to-peer in order to apply a specific security policy to it.
The current mitigation techniques include:

• Throttle strategy: do not try to block, but rather throttle the traffic so that users will keep using the default ports;
• Block all strategy: block all traffic except an explicit list of TCP ports; peer-to-peer protocols will be blocked;
• Block seed hosts: block all IP traffic to the seed hosts, although it is difficult to obtain the list of all seed hosts;
• Traffic pattern recognition: recognize the typical traffic pattern and either block or throttle.
In summary, this is still a research area since the peer-to-peer developers are also trying to modify the protocol
to evade any policy enforcement.
References
[1] Ballard, J.: File Sharing Programs/Technologies, in http://testweb.oofle.com/filesharing/index.htm, 2001.
[2] Seidel, E.: The Audio Galaxy Satellite Communication Protocol, in http://homepage.mac.com/macdomeeu/dev/current/openag/agprotocol.html, 2001.
[3] Quitteck, J., Zseby, T., Claise, B. and Zander, S.: Requirements for IP Flow Information Export, draft-ietf-ipfix-reqs-09.txt, work in progress, IETF, 2003.
[4] Pande, V.: Folding@home Distributed Computing, in http://folding.stanford.edu/, Stanford University, 2002.
[5] UC Berkeley: SETI@Home Search for Extraterrestrial Intelligence, in http://setiathome.ssl.berkeley.edu/, 2003.
[6] Global Grid Forum, in http://www.gridforum.org/L_About/about.htm, February 2002.