FIFO Buffer for iNetmon to Monitor Gigabit Network
R. Sureswaran, S.J. Choi & B. Rahmat
School of Computer Sciences
Universiti Sains Malaysia
11800 Penang, Malaysia
E-mail: [email protected]
Abstract. iNetmon is a software application that provides an intelligent tool to
assist network and system administrators, supplying information for preventive
measures so that the potentially very costly damage caused by system or network
downtime is minimized. Such real-time network analysis helps to detect and resolve
network faults and performance problems quickly, and it can analyze multi-topology,
multi-protocol networks automatically. Packet capture is heavily system dependent,
so the implementation differs considerably across operating systems. The packet
capture section of the kernel should be quick and efficient: it must be able to
capture packets even on high-speed LANs with heavy traffic, limit packet losses,
and use only a small amount of system resources. It should also be general and
flexible so that it can be used by different types of applications such as
analyzers, network monitors, and network test applications. The user-level capture
application receives packets from the system, interprets and processes them, and
presents them to the user in a comprehensible and productive way. It should be easy
to use, system independent, modular, and expandable in order to support a large
number of protocols, and it should allow the set of decoded protocols to be extended
in a simple way. These features are essential because of the large number of network
protocols currently in use and the speed at which they change, so completeness and
expandability are important. Buffer overflow occurs when a program or process tries
to store more data in a buffer (a temporary data storage area) than it was intended
to hold. Since buffers are created to contain a finite amount of data, the extra
information, which has to go somewhere, can overflow into adjacent buffers,
corrupting or overwriting the valid data they hold. A real-time application such as
iNetmon must handle a minimum of 1,488,000 packets per second on Gigabit Ethernet.
At this rate, data must be buffered before it can be processed or else it will be
lost. To limit packet loss, the driver should be able to store the incoming packets
in a buffer, because the user-level application may not be ready to process them on
arrival; buffered packets are then transferred as soon as the application is ready.
This is where the storage buffer comes into play. A storage buffer is a memory
buffer in which, when the buffer fills, new data overwrites the oldest data.
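(As a rough check on this figure, assuming minimum-size 64-byte frames: each frame
occupies 64 bytes plus an 8-byte preamble and a 12-byte inter-frame gap, i.e. 84
bytes or 672 bits on the wire, so a 1 Gbit/s link carries at most about
10^9 / 672 ≈ 1,488,095 frames per second.)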
1 Introduction
Computer networks and telecommunication technologies are nowadays used in a wide
range of applications. The success of the Internet has brought networking into every
house and company, and every day new applications and technologies are created. In
this scenario, the power and complexity of computer networks are growing every day.
This expands the possibilities available to the end user, but makes the work of those
who have to design, maintain, and secure a network harder.
For this reason there is an increasing need for tools that are able to analyze,
diagnose, and test the functionality and security of networks. In order to perform
their work, these tools must be able to obtain the data transiting on a network,
capturing it while the network is operating. The capture process consists of
obtaining, by listening on the network, every transiting frame, independently of its
source or destination. The great number of transmission techniques and communication
protocols complicates this task. Moreover, performance is very important in order to
capture from fast networks at full speed without any data loss.
There are two main methods to capture data from a network: the first is based on the
use of dedicated hardware, while the second makes use of the hardware of a normal PC
or workstation connected to the communication channel. In this second method, the
network adapter of the computer is used to obtain the frames from the network, and
the software carries out a large part of the capture process.
Our solution to the problem is real-time network monitoring software called iNetmon.
Normally a software solution has the lowest performance, particularly on slow
machines, but it is cheaper and easier to modify and upgrade. For this reason it is
widely adopted on the most commonly used network architectures, where the performance
of dedicated hardware is not needed.
Due to the lower performance of this solution, problems arise. We found that the
capturing process is faster than the decoding process, even though both processes are
threaded. As soon as the capture process catches up with the decode process,
un-decoded packets are overwritten by newly captured ones, which means that packets
are lost and data analysis is inaccurate.
The proposed solution to the problem is to use a buffer to store the packets received
from the capture process while the decode process works through the un-decoded
packets.
The main goal of the project is to develop a suitable buffer for iNetmon in order to
solve iNetmon's packet loss problem. The buffer needs to have the following
characteristics:
a) It is able to arrange the incoming packets according to their arrival time.
b) It is able to expand and shrink to accommodate different network situations.
c) Read and write operations should be relatively fast.
Furthermore, a detailed analysis of the performance of the buffer will be done in
order to evaluate the usefulness of the solution. The analysis involves measuring how
well the buffer responds to different network loads. Besides that, the rate at which
the capturing process catches up with the decoding process will be determined, and
the optimum network load at which the capturing process and the decoding process can
run at the same rate will be determined as well.
2 Background Study on Methodology
2.1 Capturing Method
There are various approaches to capturing network traffic for a network monitoring
application such as iNetmon. The two common approaches are the passive and the active
approach. Both have their value and should be regarded as complementary; in fact they
can be used in conjunction with one another. The major difference between the two
approaches is that the active approach requires network communication to gather data
on network conditions, whereas the passive approach does not participate in any
network communication; it just quietly listens to all the network traffic that passes
through it.
2.1.1 Active Capturing Approach
For network devices with an internal RMON (Remote MONitoring) probe, the RMON
Extension™ gives access to the information gathered internally by the network
devices; for network devices with internal SNMP (Simple Network Management Protocol)
agents, the SNMP Extension™ gives access to SNMP alerts (traps) and the current
status of the network devices. These probes and agents can be built into other
devices such as routers, switches, or end-node hosts. An active monitor polls the
devices periodically and information is collected (in the case of SNMP devices the
data is extracted from Management Information Bases (MIBs)) to assess network
performance and status.
There are problems with the approaches mentioned above. SNMP version 1 only reports
whether devices are functioning properly; this can be too vague and does not pinpoint
the device causing a problem. Industry has attempted to define a new set of
protocols, SNMP version 2, that can provide additional information upon recognition
of a problem. However, standardization efforts have not been successful.
Although RMON can be a very useful network-monitoring tool, it is still plagued with
various problems besides high cost. Even with the introduction of RMON 2,
incompatibility between vendor implementations has still not been alleviated. Because
of this, RMON tool vendors have been adding proprietary extensions to their products
to make them more attractive to network managers who are demanding more
functionality, and there is still considerable risk of becoming dependent on a single
vendor.
2.1.2 Passive Capturing Approach
The packet driver is a low-level capturing component for a network monitoring
application. Running at the operating system kernel level, it interacts directly with
the Network Interface Card (NIC) and provides an interface for user-level
applications, as shown in the diagram below.
Fig. 1. A simple architecture for a passive capturing approach
The packet driver is the major component of the passive approach. Basically, it
provides a high-level Application Program Interface for packet capture (a set of
routines provided to a network monitoring application such as iNetmon to direct the
packet capturing procedures performed by the computer's operating system). One
particularly powerful aspect of packet drivers is their ability to place the hosting
machine's network adapter into "promiscuous mode." Network adapters running in
promiscuous mode receive not only the data directed to the machine hosting the
sniffing software, but also all the traffic on the physically connected local
network.
The passive approach does not add measurement traffic to the network, and it measures
real traffic. In contrast, the polling required to collect data and the traps and
alarms of the active approach (SNMP and RMON) all generate network traffic, which can
be substantial. Furthermore, the amount of data gathered can be substantial,
especially if one is doing flow analysis or trying to capture information on all
packets.
The passive approach is extremely valuable in network troubleshooting; however, it is
limited in its ability to emulate error scenarios or to isolate the exact fault
location. That is why an advanced network traffic monitoring application like iNetmon
has to work around the problems this approach faces.
2.1.2.1 Non-BPF Method
To capture packets transferred by a network, a capture application needs to interact
directly with the network hardware. For this reason, the operating system should
offer a set of capture primitives to communicate with and receive data directly from
the network adapter. The goal of these primitives is basically to capture packets
from the network (hiding the interaction with the network adapter) and transfer them
to the calling programs. This is heavily system dependent, so the implementation is
quite different in the various operating systems. The packet capture section of the
kernel should be quick and efficient because it must be able to capture packets even
on high-speed LANs with heavy traffic, limiting packet losses and using a small
amount of system resources. It should also be general and flexible in order to be
usable by different types of applications (analyzers, network monitors, network test
applications, etc.).
The user-level capture application receives packets from the system, interprets and
processes them, and outputs them to the user in a comprehensible and productive way.
It should be easy to use, system independent, modular, and expandable in order to
support a large number of protocols. Moreover, it should be possible to increase the
number of decoded protocols in a simple way. These features are essential because of
the large number of network protocols currently available and the speed at which they
change, so completeness and expandability are important.
WinDump.exe is only the upper part of a packet capture stack that is composed of a
module that runs at the kernel level and one that runs at the user level. These two
modules have different purposes and are independent and isolated from one another.
The first runs at ring 0 on Intel-based machines, while the second runs at ring 3
like a normal Windows program. The kernel part is Windows specific and differs
considerably across the various Windows flavors. The user-level part is very similar
to the UNIX implementation and is the same under Win95 and WinNT. The next figure
shows the structure of the capture stack from the network adapter to an application
like WinDump.
At the lowest level there is the network adapter. It is used to capture the packets
that circulate in the network. During a capture the network adapter usually works in
a particular mode ('promiscuous mode') that forces it to accept all the packets
instead of only the ones directed to it. The Packet Capture Driver is the lowest
level software module of the capture stack. It is the part that works at kernel level
and interacts with the network adapter to obtain the packets. It supplies the
applications with a set of functions used to read and write data from the network at
the data-link level.
PACKET.DLL works at the user level, but it is separate from the capture program. It
is a dynamic-link library that isolates the capture programs from the driver,
providing a system-independent capture interface. It allows WinDump to be executed on
different Windows flavors without being recompiled. The pcap library, or libpcap, is
a static library that is used by the packet capture part of the WinDump program. It
uses the services exported by PACKET.DLL and provides WinDump with a higher-level and
more powerful capture interface. Notice that it is statically linked with WinDump,
i.e. it is part of the WinDump.exe executable file.
The user interface is the topmost part of the WinDump program. It manages the
interaction with the user and displays the results of a capture. We will now describe
these modules, their behavior, and the architectural choices that affect them.
Fig. 2. A Non BPF-packet capture mechanism design
A basic network capture driver can be quite simple. It needs only to read the packets
from the network driver and copy them to the application. However, in order to obtain
acceptable performance, substantial improvements need to be made to this basic
structure. The most important are:
 To limit packet loss, the driver should be able to store incoming packets in a
buffer, because the user-level application may not be ready to process them on
arrival. Buffered packets will be transferred as soon as the application is ready.
 In order to minimize the number of context switches between the application (which
runs in user mode) and the driver (which runs in kernel mode), it should be possible
to transfer several packets from the buffer to the application using a single read
call.
 The user-level application must receive only the packets it is interested in,
usually a subset of the whole network traffic. An application must be able to specify
the type of packets it wants (for example, the packets generated by a particular
host), and the driver will send only these packets to it. In other words, the
application must be able to set a filter on the incoming packets and receive only the
subset of them that satisfies the filter. A packet filter is simply a function with a
Boolean return value applied to a packet: if the returned value is TRUE, the driver
copies the packet to the application; otherwise the packet is ignored. A minimal
sketch of such a predicate is given after this list.
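As an illustration only (not the driver's actual filtering interface), a filter of
this kind can be expressed as a Boolean predicate over the raw frame bytes. The
function name and the field offsets below are hypothetical simplifications that
assume an Ethernet II frame carrying IPv4:

#include <cstdint>
#include <cstddef>

// Hypothetical example: accept only IPv4 packets sent by a particular host.
// Assumes a 14-byte Ethernet II header followed by an IPv4 header, so the
// source address starts at offset 26 from the beginning of the frame.
static bool AcceptPacket(const uint8_t *frame, size_t length, uint32_t wantedSrc)
{
    if (length < 34)                          // too short for Ethernet + IPv4 headers
        return false;
    if (frame[12] != 0x08 || frame[13] != 0x00)
        return false;                         // EtherType is not IPv4
    uint32_t src = (uint32_t(frame[26]) << 24) | (uint32_t(frame[27]) << 16) |
                   (uint32_t(frame[28]) << 8)  |  uint32_t(frame[29]);
    return src == wantedSrc;                  // TRUE: driver copies the packet up
}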
The implementation of these features and the architecture of the driver were inspired
by the BSD Packet Filter (BPF) of the UNIX kernel, which we describe briefly in the
next section.
2.1.2.2 BSD Packet Filter
To allow capture tools such as those described above to be constructed, a kernel must
contain some facility that gives user-level programs access to raw, unprocessed
network traffic.
The BSD Packet Filter (BPF) uses a new, register-based filter evaluator that is up to
20 times faster than the original design. BPF also uses a straightforward buffering
strategy that makes its overall performance up to 100 times faster than Sun’s NIT
running on the same hardware.
The performance increase is the result of two architectural improvements. First, BPF
uses a redesigned, register-based 'filter machine' that can be implemented
efficiently on today's register-based RISC CPUs; CSPF used a memory-stack-based
filter machine that worked well on the PDP-11 but is a poor match for
memory-bottlenecked modern CPUs. Second, BPF uses a simple, non-shared buffer model
made possible by today's larger address spaces. The model is very efficient for the
'usual cases' of packet capture.
BPF has two main components: the network tap and the packet filter. The network
tap collects copies of packets from the network device drivers and delivers them to
listening applications. The filter decides if a packet should be accepted and, if so, how
much of it to copy to the listening application. Figure 3 illustrates BPF’s interface
with the rest of the system. When a packet arrives at a network interface the link level
device driver normally sends it up the system protocol stack. But when BPF is
listening on this interface, the driver first calls BPF. BPF feeds the packet to each
participating process’ filter. This user-defined filter decides whether a packet is to be
accepted and how many bytes of each packet should be saved. For each filter that
accepts the packet, BPF copies the requested amount of data to the buffer associated
with that filter. The device driver then regains control. If the packet was not addressed
to the local host, the driver returns from the interrupt. Otherwise, normal protocol
processing proceeds.
Fig. 3. BPF’s interface with the rest of the system
Since a process might want to look at every packet on a network and the time
between packets can be only a few microseconds, it is not possible to do a read
system call per packet and BPF must collect the data from several packets and return
it as a unit when the monitoring application does a read. To maintain packet
boundaries, BPF encapsulates the captured data from each packet with a header that
includes a time stamp, length, and offsets for data alignment.
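A per-packet record header of this kind can be sketched roughly as follows. The
struct and field names are illustrative only, modeled loosely on the description
above rather than on BPF's exact definitions:

#include <cstdint>

// Illustrative per-packet record header: each captured packet in the read buffer
// is preceded by a small header so that the application can recover packet
// boundaries from a single read call.
struct CaptureHeader {
    uint64_t timestampMicros;  // time the packet was captured, in microseconds
    uint32_t capturedLen;      // number of bytes actually stored in the buffer
    uint32_t originalLen;      // length of the packet as seen on the wire
    uint16_t headerLen;        // size of this header, so the data stays aligned
};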
Because network monitors often want only a small subset of network traffic, a
dramatic performance gain is realized by filtering out unwanted packets in interrupt
context. To minimize memory traffic, the major bottleneck in most modern
workstations, the packet should be filtered ‘in place’ (e.g., where the network
interface DMA engine put it) rather than copied to some other kernel buffer before
filtering. Thus, if the packet is not accepted, the host touches only those bytes
that were needed by the filtering process.
2.2 Buffer Method
2.2.1 Linked List
A linked list is a chain of structs or records called nodes. Each node has at least
two members, one of which points to the next item or node in the list. Lists whose
nodes point only to the next item, and not the previous one, are called singly linked
lists; those whose nodes point to both are called doubly linked lists, and lists
whose last node points back to the first are circular linked lists.
The advantage of a linked list is that it can grow or shrink, and it contains exactly
the number of entries you put in. It is easy to get at the first entry, and it is
easy to iterate from the front to the back. It can be used to hold anything we want.
The only restriction is that each record must be an instance of the same structure;
we could not mix, for example, a node holding a char with another node holding a
short, a char array, and a long. Another useful aspect is that each node can be
located anywhere in memory; the nodes do not have to be laid out contiguously.
Accessing entries in the middle is not a strong point of lists. Furthermore, as the
list gets longer the search time for a specific node increases dramatically. If these
characteristics meet your needs, a singly linked list is fine. If not, you might
prefer a doubly linked list, one with references to both the previous and the next
node. Iteration over such a list can move both forward and backward. The trade-off is
more data for each node and more code for managing both links.
Fig. 4. A simple linked list
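As a concrete illustration (a minimal sketch, not code from iNetmon), a doubly linked
node for the packet records discussed later could be declared roughly as follows,
with the node and member names chosen only for this example:

// Hypothetical node type for a doubly linked list of captured packets.
// Every node is an instance of the same structure, but the nodes themselves
// can live anywhere in memory; only the prev/next pointers tie them together.
struct PacketNode {
    void       *payload;  // pointer to the stored record (same type for every node)
    PacketNode *prev;     // previous node, or nullptr at the head
    PacketNode *next;     // next node, or nullptr at the tail
};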
2.2.2 Circular Buffer
A circular buffer is a block of memory with two associated pieces of data: a front
index and a rear index, each of which refers to a position in the block of memory.
Typically, circular buffers are used to implement queues, or FIFO (first-in,
first-out) buffers. When a sample is enqueued in the circular buffer, it is stored at
the location of the rear index. When a sample is dequeued from the circular buffer,
the value to which the front index points is returned. The buffer is called circular
because when the front or rear index reaches the logical end of the buffer, it is
simply reset to point to the first location in the buffer.
Fig. 5. A Circular Buffer
The buffer is usually implemented as an array, and pointers into this array indicate
the positions of the reader and the writer. When one of these pointers reaches the
end of the buffer, it wraps around to the start of the buffer and continues from
there. As a result, data is lost when the rear pointer overtakes the front pointer,
which happens when the write process is faster than the read process; data is read
multiple times if the front pointer overtakes the rear pointer, which happens when
the read process is faster than the write process. It is straightforward to use a
lock to avoid these situations; in that case, the lock makes the buffer blocking. A
lock can also be set on each data item in the buffer, in order to avoid concurrent
access to the same data. A minimal array-based sketch of such a buffer is given
below.
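The following is a minimal, single-threaded sketch of an array-based circular buffer.
The class name, fixed capacity, element type, and overwrite-on-full policy are
assumptions made for illustration (matching the storage-buffer behaviour described in
the abstract), not iNetmon's implementation:

#include <cstddef>

// Hypothetical fixed-size circular buffer of N pointers. When the buffer is
// full, Enqueue overwrites the oldest entry, which is exactly the data-loss
// behaviour described above for a fast writer and a slow reader.
template <size_t N>
class RingBuffer {
public:
    RingBuffer() : front(0), rear(0), count(0) {}

    void Enqueue(void *item) {
        slots[rear] = item;
        rear = (rear + 1) % N;         // wrap around at the logical end
        if (count == N)
            front = (front + 1) % N;   // full: the oldest element is overwritten
        else
            ++count;
    }

    void *Dequeue() {
        if (count == 0)
            return nullptr;            // empty: nothing to read
        void *item = slots[front];
        front = (front + 1) % N;
        --count;
        return item;
    }

private:
    void  *slots[N];
    size_t front, rear, count;
};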
Two common options for buffers (especially in real-time applications) are:
 Locking in memory. The memory used for the buffer should not be swapped out
of the physical RAM.
 Buffer Half Full interrupt. The reader and/or writer tasks can both raise a
software interrupt if the buffer is more than half full or half empty. This interrupt
will then wake up the other part of the IPC such that it can take the appropriate
actions to prevent the buffer from overflowing or getting empty.
Locking is a form of mutual exclusion used to synchronize concurrent operations. It
protects the consistency of a concurrent data structure by allowing only one process
(the holder of the lock on the data structure) to access it at a time, and by
blocking all other processes that try to access the structure concurrently. With
locking we can see problems such as priority inversion, deadlock scenarios, and
performance bottlenecks. The time that a process spends blocked while waiting to get
access to the critical section can form a substantial part of the algorithm's
execution time.
For an advanced circular buffer, instead of using one single shared memory array,
a swinging buffer uses two or more. The writer fills up one of the buffers, while the
reader empties another one. Every time one of the tasks reaches the end of its buffer,
it starts operating on a buffer that the other task is not using.
Besides an array, a circular buffer can also be implemented as a circular linked
list. This type of data structure might be useful, for example, if you wanted to
model a token ring or FDDI network, but it is less suitable for real-time
applications.
Fig. 6. Circular linked list implementation of a circular buffer
2.2.3 FIFO Queue
Fig. 7. A basic FIFO queue
This data structure is organized as "First In, First Out", or "FIFO": the first data
that enters the data structure is also the first data that will be removed. This is
how a line of people waiting in the post office behaves. It is also called a
First-Come-First-Served (FCFS) queue.
Data is written into the tail of the queue and read from the head of the queue. When
an enqueue command is given, the input data is loaded into the position that is
currently the tail of the queue; this is the newest data. When a dequeue command is
given, data is read from the position that is currently the head of the queue; the
head contains the oldest data. The positions of the head and tail and the empty-queue
status must be maintained. The FIFO queue can be implemented as a circular buffer, as
a RAM with two ring counters for head and tail selection, or as a flow-through FIFO
in which the FIFO must be filled completely and then emptied completely.
Basically, a FIFO is loss-free, non-blocking (except for the writer when the FIFO is
completely full, or for the reader when it is completely empty), exclusive (the
writer and reader are shielded from each other's accesses to the data in the FIFO),
1-to-1 (i.e., there is only one writer and one reader), and buffered (i.e., FIFOs put
data in a queue, where the writer adds data at one end and the reader reads it at the
other end).
Fig. 8. Data flow of the FIFO buffer
A doubly linked list (Fig. 7) is well suited to model a queue. Data is always
inserted at the tail of the list (the node that "Tail" points to) and is always
removed from the head (the node pointed to by "Head"). Note that we do not need
operations for inserting or removing a node in the middle of the list. Such a queue
is also useful for storing queued requests that should be responded to in the order
of their arrival, and in digital signal processing, where outputs are computed as
linear combinations of a finite set of previous inputs.
Since it is also a linked list, it has the same capabilities: it can grow or shrink
as needed, and it contains exactly the number of entries you put in. It is easy to
get at the first entry and to access the last entry, and it is easy to iterate from
the front to the back and from the back to the front.
With all these features, a FIFO queue implemented as a doubly linked list is the
better choice compared with the alternatives. Firstly, a simple linked list has only
one pointer, to the head node, and adding a node to the end of the list is
computationally expensive because each insertion requires a traversal from the head.
As for the circular buffer, packet loss will happen when the write operation is
faster than the read operation if no lock is introduced. If a lock is implemented, it
introduces new problems such as priority inversion, deadlock scenarios, and
performance bottlenecks. Besides that, if the process holding a lock is preempted,
any other process waiting for the lock is unable to perform any useful work until the
process that holds the lock is scheduled again. Locking also tends to produce a large
amount of memory contention, and locks become hot memory spots.
Finally, the FIFO queue is non-blocking, which gives it significant advantages over a
lock-based structure:
 It avoids lock convoys and contention.
 It provides high fault tolerance and eliminates deadlock scenarios, in which two or
more tasks wait for locks held by each other.
 It does not give rise to priority inversion scenarios.
The only drawback of a doubly linked list FIFO buffer is that a sequential search is
necessary to look up an element by value, and this search can be slow if the list is
long. Since search is not required in iNetmon, the FIFO queue is considered the most
suitable candidate to solve iNetmon's packet loss problem.
3 Proposed Design Methodology / Framework
3.1 Main Architecture Implementation
3.2 FIFO Queue Buffer
3.2.1 Writing Packet into Buffer
void CCircularBuffer::Write(CPacket * writePacket)
{
    // Capture thread: append the newly captured packet to the queue.
    if (FIFO buffer is empty)
        Create the FIFO buffer, with writePacket as its first node;
    else
        Add writePacket to the tail of the buffer;
}
3.2.2 Reading Packet from Buffer
CPacket * CCircularBuffer::Read()
{
    // Decode thread: remove and return the oldest buffered packet.
    if (FIFO buffer is empty) {
        return a NULL packet pointer;
    }
    else {
        // Detach the head node and advance the head pointer to the next node.
        Get the head node and point the head pointer to the next node;
        Return the head node;
    }
}
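The pseudocode above can be mapped onto a concrete doubly linked list in many ways;
the sketch below is one minimal, single-threaded interpretation. The CPacketFifo and
QueueNode names, the member layout, and the absence of locking are assumptions made
for illustration, not iNetmon's actual implementation:

class CPacket;                    // packet records produced by the capture thread

// Hypothetical FIFO queue node and queue, following the Write/Read pseudocode:
// Write appends at the tail, Read detaches from the head (oldest packet first).
struct QueueNode {
    CPacket   *packet;
    QueueNode *prev;
    QueueNode *next;
};

class CPacketFifo {
public:
    CPacketFifo() : head(nullptr), tail(nullptr) {}

    void Write(CPacket *writePacket) {
        QueueNode *node = new QueueNode{writePacket, tail, nullptr};
        if (head == nullptr)
            head = node;          // empty queue: the new node is also the head
        else
            tail->next = node;    // link the new node after the current tail
        tail = node;
    }

    CPacket *Read() {
        if (head == nullptr)
            return nullptr;       // empty queue: nothing to decode yet
        QueueNode *node = head;
        head = node->next;        // advance the head pointer to the next node
        if (head == nullptr)
            tail = nullptr;       // queue became empty
        else
            head->prev = nullptr;
        CPacket *packet = node->packet;
        delete node;              // release the detached node
        return packet;
    }

private:
    QueueNode *head;              // oldest packet (read side)
    QueueNode *tail;              // newest packet (write side)
};

A real deployment would additionally need to make Write and Read safe for concurrent
use by the capture and decode threads, for example with a single-producer,
single-consumer non-blocking design as discussed in Section 2.2.3.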
4 Conclusions
FIFO queues are ubiquitous in real-time programs, and their performance is a matter
of major concern. In this report, we have presented a FIFO queue design that is
simple, non-blocking, practical, and fast.
This work is part of a larger project called iNetmon that seeks to evaluate the
trade-offs among alternative data structures. The structures under consideration
include the linked list, the circular buffer, and the FIFO queue.
The objective of iNetmon is to create an intelligent and centralized tool to assist
network and system administrators by anticipating problems and giving intelligent
information for preventive measures, so that the potentially very costly damage
caused by system or network downtime is minimized. Without an effective and efficient
data structure to store data arriving as fast as 1 Gbit/s, iNetmon would be crippled
software, able to monitor the network only under low load and only while the network
has no problems.
The proposed solution using a FIFO queue has shown its ability and seems well suited
to solve iNetmon's problem, but the major issue that still needs to be addressed is a
much faster decoding operation, because this is where the bottleneck occurs. A decode
operation able to process data at rates as fast as 1 Gbit/s is needed; otherwise the
computer will eventually run out of memory as the capture operation outpaces the
decode operation.
Future research on this topic includes the analysis of other data structures that
could be possible solutions to iNetmon's packet loss problem.