Payload Attribution via Hierarchical Bloom Filters

ForNet: A Distributed Forensic Network
Kulesh Shanmugasundaram
http://isis.poly.edu/projects/fornet/
Security Fails….
What Mechanisms Do We Have?

Logging mechanisms and audit trails
• Alerts/Logs from security components:
  • Log only perceived security threats
• Host Logs:
  • First thing to get disabled upon intrusion
  • An insider would rather use a host without any logs
  • Mobility, wireless networks create new problems
• Packet Logs:
  • Usually at the edges, hence blinded easily
  • Can't keep data for long…
  • E.g.: Infinistream, NFR, NetWitness, SilentRunner
What do we need?


• A reliable mechanism for analysis and attribution
• An effective response model
  • Current response model is mostly manual
  • Response time is in days or weeks
  • Digital evidence disappears quickly
• Tools that complement security components
  • Need some safety net to fall back on
  • Forensics is that net
Our Solution…



Securely collect, store, disseminate, and
process synopses of network traffic.
In other words, a device analogous to a
surveillance camera for the network. Even
better: a co-operating network of such
surveillance cameras.
Goal of Project ForNet: development of tools,
techniques, and infrastructure to aid rapid
investigation and identification of cyber crimes
ForNet: A Unified, End-to-End Approach to Network
Forensics
ForNet Domain: A domain covered by
single monitoring and privacy policies.
Forensic Server: Responsible for
archiving synopses, query processing &
routing, enforcing monitoring, security
policies, for the domain.
SynApp: equipped routers or hosts.
Primary function is to create
synopses of network traffic. May
have limited query processing and
storage component as well.
Design Goals and Trade-Offs
1. "Complete" & "Correct" Evidence
2. Longevity
3. Succinctness
4. Accessibility of Evidence
5. Security & Privacy of Evidence
6. Ubiquity
7. Incremental Deployment
8. Modular
9. Scalable Design
Research Challenges
Systems
• High-speed packet capture
• Storage systems
• Databases
• GUI
Legal Issues
• Legally admissible evidence
• Chain of custody
• Privacy
Security
• Secure architecture
• Attacks on ForNet
• Use of ForNet
Networking, Distributed Systems
• Identification of events
• Architecture of ForNet
• Query Routing
• Fault Tolerance
Theory
• Design of Synopses
• Quantitative Analysis
Components of ForNet

Architectural components can be grouped
under following functions:
1. Data Collection
SynApp
2. Data Retention
Monitoring Policy
3. Data Dissemination
Query Language
Panorama
Forensic Server
Privacy Policy
Architecture of a SynApp
[Figure: SynApp architecture. Labels shown: Network Stream, Network Filter, Synopsis Controller, Synopses Library (HBF, Neoflow, Histograms), Monitoring Policy, Configuration Manager, Buffer Manager, Synopses Sent To Forensic Server.]
Synopses in ForNet
Data Type: Link
  Potential evidence (protocol layer): IPs, ports, protocol; TOS, TTL
  Synopses: Connection record, Neoflow, Header-dump
Data Type: Link Content
  Potential evidence (packet): type of payload (audio, video…); IM keywords; content; packet sizes; inter-arrival times
  Synopses: Neoflow-NG*, HBF, Packet-dump
Data Type: Mappings
  Potential evidence: MAC to IP; DNS to IP; VirtualHost to IP; AS-Name to IP
  Synopses: MAC Records, DNS Records, HTTP Records*, BGP Records*
Data Type: Aggregates
  Potential evidence: #Packets; delays, etc.
  Synopses: Histogram*, SCBF
* Currently being implemented
Architecture of Forensic Server
[Figure: Forensic Server architecture. Labels shown: Query From An Analyst, Response To The Query, Security Manager, Privacy Manager, Configuration Manager, Query Parser, Query Planner, XML Generator, Archive Manager; analysis modules: Steppingstones, Find Zombies, Scan Detection; archived data: BGP Tables, Histogram, DNS Records, Neoflow, HBF.]
Forensic Server

Forensic Server has the following functions:
• Data archiving
• Query processing
• Policy management and enforcement
Data Archiving
• Periodically receives data from SynApps and archives it on disk.
Forensic Server (cont…)

Query Processor:
• XML-based query language
• Does coalescence of synopses
  • Given a query, can locate appropriate synopses
  • Generates and executes a query plan to answer the query
• Distributed query processing
  • Currently being implemented to talk to nearby forensic servers to answer/corroborate queries/evidence
• No relational databases (uses Berkeley DB)
Policy Enforcer:
• Two types of policies in ForNet:
  • Monitoring Policy: informs user of what is being monitored
  • Privacy Policy: informs user of how/with whom data is shared, etc.
Architecture of Panorama

A user interface for:
• Building queries
• Data mining
• Visualization
[Figure: Panorama architecture. Labels shown: User Interface Manager, Communication Manager (queries and various commands to Forensic Server), XML Parser, Plug-in Manager, Chart Generator, Network Graphs, 3D Link Maps, Summary Views, Data Cleansing, Miners.]
Capturing Evidence on The Internet

Internet is a gigantic state machine:
• To support forensics we need to know its precise state at any given time!
• Therefore, we need to keep track of state transitions
State transitions can be characterized by:
1. Links/connections between nodes
2. Link content
3. Protocol mappings and aliases
4. Aggregates generated by state transitions
Archiving raw traffic for this information is overkill!
What is a Synopsis?
Properties of a Good Synopsis

• Contains enough data to answer certain classes of queries
  • Who sent payload "xyz"?
  • What did host bug.poly.edu send?
• Contains enough data to quantify confidence of its answers
  • I'm 99.37% sure bug.poly.edu sent "XYZ"
• Has a small memory footprint and is easy to update
  • Need 20GB/day to keep 1TB/day of raw network data
  • Need to compute two hashes per packet
  • Resource requirements are tunable
Advantages of Using Synopses
Can retain potential evidence for months!




Succinct representation of raw data makes it possible to transfer
network data to disks
Sharing/transferring raw data over network is impossible but
synopsis can be moved to remote sites
Query processing would be expensive with raw data
 What’s the frequency of traffic to port 80 in the past week?
(raw data vs. a histogram)
Easily adaptable to various resource requirements
 Can adapt the size and processing requirements of a Bloom
Filter to the available hardware resources and network
load
Allows for cascading different techniques!
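The port-80 question above is the kind of query a histogram synopsis can answer without touching raw packets. A minimal sketch, assuming hourly buckets keyed by destination port; the class name `PortHistogram` and the bucket width are illustrative choices, not ForNet's actual histogram format:

```python
from collections import Counter


class PortHistogram:
    """Per-hour packet counts keyed by (hour bucket, destination port)."""

    BUCKET_SECONDS = 3600  # assumed bucket width

    def __init__(self):
        self.counts = Counter()

    def observe(self, timestamp: float, dst_port: int) -> None:
        # One counter update per packet; the packet itself is not stored.
        bucket = int(timestamp // self.BUCKET_SECONDS)
        self.counts[(bucket, dst_port)] += 1

    def total(self, dst_port: int, start: float, end: float) -> int:
        """Packets to dst_port in buckets overlapping [start, end]."""
        first = int(start // self.BUCKET_SECONDS)
        last = int(end // self.BUCKET_SECONDS)
        return sum(
            n for (bucket, port), n in self.counts.items()
            if port == dst_port and first <= bucket <= last
        )


hist = PortHistogram()
hist.observe(10.0, 80)
hist.observe(3700.0, 80)
hist.observe(20.0, 22)
week = 7 * 24 * 3600
print(hist.total(80, 0, week))
```

Answers are only as precise as the bucket width, which is the tunable trade-off between storage and accuracy.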
Cascading of Synopses
[Figure: cascading of synopses across network tiers. Tiers shown: Core, Enterprise, Edge, Subnet; link speeds shown: 40Gbps, 1Gbps, 100Mbps, 50Mbps. Synopses shown: SCBF, Simple CR, Topology maps, CR-BF, Payload Sampling, MAC-layer maps, Neoflow, HBF, DNS mappings.]
Packet Digests & Bloom Filters


Snoeren et al. used them successfully in SPIE for single-packet
traceback ("Hash-Based IP Traceback")
Space Efficient:
• 16 bits per packet (m/n = 16) and 8 hashes (k = 8)
• False positive (FP) rate = 5.74 × 10^-4
• No false negatives!
However, suppose we don't have packets.
• We only have some excerpts of payload
• Don't know where the excerpt was aligned in the packet
Extend Bloom Filters to support excerpt/substring matching
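The false-positive figure quoted above is consistent with the standard Bloom filter estimate FP ≈ (1 − e^(−k·n/m))^k. A minimal sketch, assuming that approximation and a toy filter whose k positions come from SHA-256; the class and its hashing scheme are illustrative, not the SPIE or ForNet implementation:

```python
import hashlib
import math


def bloom_fp_rate(bits_per_item: float, k: int) -> float:
    """Standard approximation of the false-positive rate for m/n bits per item."""
    return (1.0 - math.exp(-k / bits_per_item)) ** k


class BloomFilter:
    """Toy Bloom filter: m bits, k positions per item derived from SHA-256."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: bytes):
        # Prefix the item with a one-byte index to get k different hashes.
        for i in range(self.k):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# m/n = 16 bits per packet and k = 8 hashes, as on the slide.
print(bloom_fp_rate(16, 8))

packets = [f"packet-{i}".encode() for i in range(100)]
bf = BloomFilter(m=16 * len(packets), k=8)
for p in packets:
    bf.add(p)
# Inserted items are always reported as present: no false negatives.
print(all(p in bf for p in packets))
```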
Block-based Bloom Filter
Insertion of payload P (n bytes):
1. Create s-byte blocks of payload: P = p0 p1 p2 … p(n/s)
2. Append each block's offset (in payload): P = (p0|0) (p1|s) (p2|2s) … (p(n/s)|n/s)
3. Insert each block into a Bloom Filter
BBF = (p0|0) (p1|s) (p2|2s) (p3|3s) … (p(n/s)|n/s)
Query for string Q (l bytes):
1. Create s-byte blocks of query string: Q = q0 q1 q2 … q(l/s)
2. Try all possible offsets:
   (q0|0): H1(q0|0)=1, H2(q0|0)=1, H3(q0|0)=0, so no match (X)
   (q0|s): match, q0=p1
   (q1|2s): match, q1=p2
   (q2|3s): match, q2=p3
3. "q0q1q2" was seen in a payload at offset 's'
"Offset Collisions"
P1 = A B R A C A → (A|0) (B|s) (R|2s) (A|3s) (C|4s) (A|5s)
P2 = C D A B R A → (C|0) (D|s) (A|2s) (B|3s) (R|4s) (A|5s)
BBF = (A|0) (B|s) (R|2s) (A|3s) (C|4s) (A|5s) (C|0) (D|s) (A|2s) (B|3s) (R|4s) (A|5s)
For query strings "AD", "CB", "DR", "AA", etc., BBF
falsely identifies them as seen in a payload,
because BBF cannot distinguish between blocks of P1 and blocks of P2.
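The offset collisions between P1 and P2 can be reproduced in a few lines. A minimal sketch, assuming a block size of one byte and using an exact Python set in place of the Bloom filter, so that every false match shown comes from the block/offset scheme itself and not from hash collisions; the class name `BlockFilter` is hypothetical:

```python
class BlockFilter:
    """Block-based filter; a Python set stands in for the Bloom filter."""

    def __init__(self, block_size: int = 1):
        self.s = block_size
        self.seen = set()   # holds (block, offset) pairs
        self.max_len = 0    # longest payload inserted so far

    def insert(self, payload: bytes) -> None:
        self.max_len = max(self.max_len, len(payload))
        for off in range(0, len(payload) - self.s + 1, self.s):
            self.seen.add((payload[off:off + self.s], off))

    def query(self, excerpt: bytes) -> list:
        """Return every offset at which all blocks of the excerpt are found."""
        hits = []
        block_starts = range(0, len(excerpt) - self.s + 1, self.s)
        for base in range(0, self.max_len - len(excerpt) + 1, self.s):
            if all((excerpt[i:i + self.s], base + i) in self.seen
                   for i in block_starts):
                hits.append(base)
        return hits


bbf = BlockFilter(block_size=1)
bbf.insert(b"ABRACA")  # P1
bbf.insert(b"CDABRA")  # P2

# None of these strings occurs in P1 or P2, yet each query reports a hit,
# because neighbouring blocks may come from different payloads.
for q in (b"AD", b"CB", b"DR", b"AA"):
    print(q, bbf.query(q))
```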
Hierarchical Bloom Filter
• An HBF is basically a set of BBFs for geometrically
increasing block sizes.
Payload P:
level-2: (p0p1p2p3|0) …
level-1: (p0p1|0) (p2p3|2s) … (p(n/s-1)p(n/s)|(n/s-1))
level-0: (p0|0) (p1|s) (p2|2s) … (p(n/s)|n/s)
Hierarchical Bloom Filter
Query Q:
level-2: (q0q1q2q3|0)
level-1: (q0q1|0) (q2q3|2s) … (q(l/s-1)q(l/s)|(l/s-1))
level-0: (q0|0) (q1|s) (q2|2s) (q3|3s) … (q(l/s)|l/s)
• Querying is similar to BBF.
• Matches at each level can be confirmed a level above.
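Confirming a match one level up can be sketched as follows. This is a simplified illustration under stated assumptions: block size of one byte, three levels, an exact set in place of each Bloom filter, and trailing partial blocks skipped; the class name `HierarchicalBlockFilter` is hypothetical and this is not the ForNet implementation.

```python
class HierarchicalBlockFilter:
    """Blocks of width s, 2s, 4s, ... stored with their payload offsets."""

    def __init__(self, block_size: int = 1, levels: int = 3):
        self.s = block_size
        self.levels = levels
        self.seen = set()   # holds (level, block, offset) triples
        self.max_len = 0

    def insert(self, payload: bytes) -> None:
        self.max_len = max(self.max_len, len(payload))
        for level in range(self.levels):
            width = self.s * (2 ** level)
            for off in range(0, len(payload) - width + 1, width):
                self.seen.add((level, payload[off:off + width], off))

    def _matches_at(self, excerpt: bytes, base: int) -> bool:
        for level in range(self.levels):
            width = self.s * (2 ** level)
            first = -(-base // width) * width  # first aligned offset >= base
            # Check every aligned block of this level that the excerpt covers.
            for off in range(first, base + len(excerpt) - width + 1, width):
                chunk = excerpt[off - base:off - base + width]
                if (level, chunk, off) not in self.seen:
                    return False
        return True

    def query(self, excerpt: bytes) -> list:
        return [base
                for base in range(0, self.max_len - len(excerpt) + 1, self.s)
                if self._matches_at(excerpt, base)]


hbf = HierarchicalBlockFilter(block_size=1, levels=3)
hbf.insert(b"ABRACA")  # P1
hbf.insert(b"CDABRA")  # P2

print(hbf.query(b"AD"))    # level-0 blocks match, but (AD|0) fails at level-1
print(hbf.query(b"ABRA"))  # a genuine excerpt of both payloads
```

In this sketch an excerpt too short to cover an aligned higher-level block (for example "DR" at offset s) cannot be confirmed above level-0, so longer excerpts give stronger answers.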
Adapting an HBF for ForNet



• So far an HBF can attest to the presence of a bit-string in payloads
• We need to tie this bit-string to source and/or destination hosts
Our Approach:
• Similar to tying an offset to a block/bit-string
• In addition to inserting (block||offset), also insert (block||offset||hostid)
• Hostid could be (srcIP||dstIP)
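The two kinds of keys can be sketched as byte strings. A minimal illustration, assuming 4-byte blocks, a 4-byte big-endian offset, and hostid = (srcIP||dstIP) in packed form; the function name `attribution_keys` and the encodings are assumptions for illustration only:

```python
import ipaddress


def attribution_keys(payload: bytes, src_ip: str, dst_ip: str,
                     block_size: int = 4):
    """Yield the keys to insert for one payload.

    For each block two keys are produced: (block||offset) and
    (block||offset||hostid), where hostid = (srcIP||dstIP).
    """
    hostid = (ipaddress.ip_address(src_ip).packed
              + ipaddress.ip_address(dst_ip).packed)
    for off in range(0, len(payload) - block_size + 1, block_size):
        block = payload[off:off + block_size]
        base_key = block + off.to_bytes(4, "big")
        yield base_key            # was this block seen at this offset?
        yield base_key + hostid   # was it seen between these two hosts?


for key in attribution_keys(b"DEADBEEF", "10.0.0.1", "10.0.0.2"):
    print(key.hex())
```

Each yielded key would then be inserted into the filter in the same way as a plain (block||offset) entry.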
Coalescence of Synopses

HBF requires:
• Source IP, destination IP, excerpt
• But where do we get Source IP, Destination IP?
Connection Record/Neoflow
• Given two timestamps, can give us a list of source, destination IPs
Coalesce these synopses
Example query: Who sent "DEADBEEF" between time T1 and T2?
1. Query Processor to Connection record/Neoflow: Give all connections between time T1 and T2
2. Connection record/Neoflow to Query Processor: (source, destination) IPs of connections between time T1 and T2
3. Query Processor to HBF: Did (source, destination) IP send "DEADBEEF" between time T1 and T2?
4. HBF to Query Processor: YES/NO
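The query flow above can be sketched as a two-step lookup. A minimal illustration, assuming in-memory connection records and an exact set standing in for the HBF; all names (`ConnectionRecord`, `PayloadSynopsis`, `who_sent`) are hypothetical, and time scoping of the HBF itself is omitted:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionRecord:
    start: float  # connection start time
    src: str
    dst: str


class PayloadSynopsis:
    """Stand-in for an HBF keyed by (excerpt, src, dst)."""

    def __init__(self):
        self._seen = set()

    def record(self, excerpt: bytes, src: str, dst: str) -> None:
        self._seen.add((excerpt, src, dst))

    def maybe_sent(self, excerpt: bytes, src: str, dst: str) -> bool:
        return (excerpt, src, dst) in self._seen


def who_sent(excerpt, t1, t2, connections, payloads):
    # Step 1: connection records supply the candidate (src, dst) pairs.
    candidates = {(c.src, c.dst) for c in connections if t1 <= c.start <= t2}
    # Step 2: the payload synopsis answers YES/NO for each candidate pair.
    return sorted(pair for pair in candidates
                  if payloads.maybe_sent(excerpt, *pair))


conns = [
    ConnectionRecord(5.0, "10.0.0.1", "10.0.0.9"),
    ConnectionRecord(6.0, "10.0.0.2", "10.0.0.9"),
]
synopsis = PayloadSynopsis()
synopsis.record(b"DEADBEEF", "10.0.0.2", "10.0.0.9")
print(who_sent(b"DEADBEEF", 0.0, 10.0, conns, synopsis))
```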
ForNet in Intranet Usage

Investigation based on payload characteristics




Investigation based on connection characteristics



Detection of zombies
Detection of malware based on connection pattern
Investigation based on aggregate characteristics



Determine victims of worms, trojans, and other malware.
Detection of potential victims of phishing and spyware
Source of IP theft
Insider abuse
Network resource abuse
Etc Etc …
ForNet Deployed on Internet

• Investigation based on payload characteristics
  • Traceback based on partial content of single packets
  • Source of malware, worms, etc.
• Investigation based on connection characteristics
  • Stepping stone detection
• Investigation based on aggregate characteristics
  • Attack attribution
Current Status

• Implemented a PC-based SynApp device for placement within an intranet.
  • Connection records, HBF, DNS, Neoflow, Mappings
• Implemented Forensic Server with simple querying capabilities.
  • Current Forensic Server has 1.3TB of storage with over 3 months' worth of data from the edge router and two subnets
  • Normal bandwidth consumption of the network is about 1–2 TB/day
  • Synopses reduce this traffic to about 20GB/day
  • A 4TB Forensic Server will take over operations in July
• Implemented Panorama (GUI Client)
Tracking MyDoom

• Recorded all email traffic for a week
  • Using HBF and raw traffic
  • Was not aware of MyDoom during this collection
• When signatures became available we used them to query the system
  • To find hosts that are infected in our network
  • How the hosts were infected
• Some statistics:
  • 679 hosts originated at least one copy of the virus
    • 52 of which were in our network
  • These hosts sent out copies of the virus to 2011 hosts outside our network boundary
Analyzing MyDoom Infections…
MyDoom’s Weekly Progress…
Neoflow-NG

Keeps track of everything Neoflow has plus:
• Individual packet sizes
  • Be able to recall the packet sizes in the same sequence
• Inter-arrival times of packets
  • Be able to recall the arrival times of individual packets
• Content type of flow
  • Be able to recall types of content carried by flows
  • Audio, Video, Plain-Text, Encrypted, Compressed etc.
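The three additions can be pictured as extra fields on a per-flow record. A minimal sketch: the record layout, field names, and the `flow_key` placeholder for the existing Neoflow fields are all hypothetical, since the transcript does not give the actual format.

```python
from dataclasses import dataclass, field


@dataclass
class NeoflowNGRecord:
    flow_key: tuple                                     # placeholder for the Neoflow fields
    packet_sizes: list = field(default_factory=list)    # sizes in arrival order
    inter_arrival: list = field(default_factory=list)   # gaps between packets, seconds
    content_type: str = "unknown"                       # e.g. audio, video, plain-text
    _last_seen: float = None

    def add_packet(self, timestamp: float, size: int) -> None:
        # Keep sizes in sequence so the order can be recalled later.
        self.packet_sizes.append(size)
        if self._last_seen is not None:
            self.inter_arrival.append(timestamp - self._last_seen)
        self._last_seen = timestamp


rec = NeoflowNGRecord(flow_key=("10.0.0.1", "10.0.0.2", 80))
rec.add_packet(1.0, 100)
rec.add_packet(1.5, 40)
print(rec.packet_sizes, rec.inter_arrival)
```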
Investigating Network Resource
Abuse
Total Network Bandwidth
Composition
Detection of Network Proxies
Zombie Detection

Are there any zombies in the Poly network? And if
so, how do we identify them?
Criterion: Hosts that portscan and use IRC are likely zombies.
• Tabulate the amount of IRC activity for each host.
• Tabulate the amount of portscan activity for each host.
• For each host: ZOMBIE_LIKELIHOOD = PORTSCANS * IRC_FLOWS
• Sort all hosts by their ZOMBIE_LIKELIHOOD.
• List the top 200 hosts.
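The ranking steps above can be sketched directly. A minimal illustration, assuming the per-host IRC flow counts and portscan counts have already been tabulated into dictionaries; the function name and the tie-breaking by host name are assumptions:

```python
def rank_zombie_candidates(irc_flows: dict, portscans: dict,
                           top_n: int = 200) -> list:
    """Score each host as PORTSCANS * IRC_FLOWS and return the top_n."""
    hosts = set(irc_flows) | set(portscans)
    scores = {
        host: portscans.get(host, 0) * irc_flows.get(host, 0)
        for host in hosts
    }
    # Highest likelihood first; ties broken by host name for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


irc = {"host-a": 3, "host-b": 0, "host-c": 5}
scans = {"host-a": 2, "host-b": 9, "host-d": 4}
print(rank_zombie_candidates(irc, scans, top_n=2))
```

Because the score is a product, a host needs both portscan and IRC activity to rank above zero.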
Work currently in progress…
• Identification of useful network events
  • Network is the virtual crime scene that holds evidence in the form of network events
• Developing efficient synopses
  • Handling connection-oriented & connectionless traffic
  • Techniques for payload attribution (esp. IRC/IM)
  • Automatic cascading of various synopsis techniques
• Analysis and mining techniques
  • Zombie detection
  • Stepping stone detection
Work currently in progress…

• Integration of information from synopses across networks
  • Development of a protocol for secure communication of various ForNet components
• Query processing and storage of synopses
  • A query language transparent of various underlying synopsis techniques
  • A query management system to interpret the language for the underlying database
  • Various storage and garbage collection strategies for collected-SynApps
  • Storage and query processing infrastructure for Forensic Servers
Research Challenges

• Integration of information from synopses across networks
  • Real power of ForNet is realized when information from SynApps is fused to answer queries
  • Development of a protocol for secure communication of various ForNet components
• Query processing and storage of synopses
  • A query language transparent of various underlying synopsis techniques
  • A query management system to interpret the language for the underlying database
• Analysis
  • A framework to compare effectiveness/trade-offs of synopses
Attacks on ForNet

• Resource Exhaustion:
  • Flood the network with random bits of data
• Malicious Transformation:
  • Create packets of length (blocksize – 1)
• Stuffing:
  • Stuff every other block with application-dependent escape characters
  • For smaller blocks we can try to guess; for larger blocks it is not possible!
• Exploiting Collisions:
  • Hash collisions
    • Very unlikely for strong hash functions
    • We use a random seed for every HBF, which makes it more difficult
  • Packet collisions
    • A possibility in Block-based Bloom Filters but not in HBFs
• Streaming transformations:
  • Encryption, compression
For more information visit:
http://isis.poly.edu/projects/fornet/