RAIDM: Router-based Anomaly/Intrusion Detection and Mitigation
Zhichun Li
EECS Department
Northwestern University
2008-04-29
Thesis Proposal
Outline
• Motivation
• RAIDM System Design
• Finished Work
• Proposed Work
• Research Plan
Motivation
[Figure: attackers, botnets, and worms]
Motivation
• According to a survey of 395 senior executives conducted by AT&T, network security has been recognized as the single most important attribute of their networks.
• Many newly emerging threats make the situation even worse.
RAIDM: a network-based attack defense system
Network Level Defense
• Network gateways/routers are the vantage points for detecting large-scale attacks.
• Host-based detection/prevention alone is not enough for modern enterprise networks.
– Enterprises may not want to rely only on their end users for security protection.
– Users may not want to interrupt their work to reboot machines or restart applications to apply patches.
Research Questions
• How can we achieve online anomaly detection for high-speed networks?
• How can we respond to zero-day polymorphic worms in their early stages?
• Given known vulnerabilities, how can we protect high-speed networks from exploits accurately and efficiently?
• How can we provide quality information for network situational awareness?
System Framework
[Figure: RAIDM architecture. Streaming packet data flows along the data path; modules sit either on the critical path or on the non-critical path, and a control path connects them.
Part I, Sketch-based monitoring & detection: reversible k-ary sketch monitoring; local sketch records, sent out for aggregation; remote aggregated sketch records; sketch-based statistical anomaly detection (SSAD).
Part II, Polymorphic worm signature generation: Token-Based Signature Generation (TOSG) and Length-Based Signature Generation (LESG).
Part III, Signature matching engines: content-based signature matching and protocol semantic signature matching.
Part IV, Network Situational Awareness.
Also shown: traffic to unused IP blocks and honeynets/honeyfarms.]
Current Status
• Part I: Sketch-based monitoring & detection
– Results in [Infocom06, ToN, ICDCS06]
• Part II: Polymorphic worm signature generation
– Results in [Oakland06, ICNP07]
• Part III: Signature matching engines
– Work in progress; the focus of this talk
• Part IV: Network Situational Awareness
– Work in progress
Part I: Sketch-based monitoring & detection
• Reversible Sketches (included for completeness)
– Use intelligent hash function design to identify the popular keys and recover their aggregated values from a series of (key, value) updates.
– Publications:
– Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reversible sketches: Enabling monitoring and analysis over high speed data streams, IEEE/ACM Transactions on Networking, Volume 15, Issue 5, Oct. 2007
– Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluations, and Applications, in Proc. of IEEE INFOCOM 2006 (252/1400=18%)
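Reversible sketches build on the k-ary sketch. The Python sketch below shows only that underlying structure. Each (key, value) update is added to one counter per hash row. A key's aggregated value is then estimated from its counters, taking the median across rows. The class name, hash parameters, and sizes are hypothetical. The reverse-hashing step, which recovers popular keys without enumerating all keys, is not shown.

```python
import statistics
from typing import List


class KArySketch:
    """Basic k-ary sketch: `rows` hash rows of `buckets` counters (illustrative only)."""

    PRIME = 2_147_483_647  # 2^31 - 1

    def __init__(self, rows: int = 5, buckets: int = 64) -> None:
        self.K = buckets
        # Fixed, hypothetical (a, b) hash parameters so results are deterministic.
        self.params = [(1_000_003 * (i + 1), 7_919 * (i + 1)) for i in range(rows)]
        self.table: List[List[float]] = [[0.0] * buckets for _ in range(rows)]
        self.total = 0.0  # sum of all update values seen so far

    def _bucket(self, row: int, key: int) -> int:
        a, b = self.params[row]
        return ((a * key + b) % self.PRIME) % self.K

    def update(self, key: int, value: float) -> None:
        # Add the value to one counter in every row.
        self.total += value
        for i, row in enumerate(self.table):
            row[self._bucket(i, key)] += value

    def estimate(self, key: int) -> float:
        # Per-row estimate corrected for the average collision mass,
        # then the median across rows.
        per_row = [
            (row[self._bucket(i, key)] - self.total / self.K) / (1 - 1 / self.K)
            for i, row in enumerate(self.table)
        ]
        return statistics.median(per_row)
```

For example, after 100 keys each receive an update of 1 and key 42 receives an extra 1000, `estimate(42)` lands close to 1001.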
Part I: Sketch-based monitoring & detection
• Sketch-based Anomaly Detection
– Built anomaly detection engines based on reversible sketches to detect horizontal scans, vertical scans, and TCP SYN flooding attacks.
– Further proposed 2D sketches to differentiate between types of attacks.
– Publications
– Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks, in Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS) 2006 (75/536=14%) (Alphabetical order)
Part II: Polymorphic worm signature generation
• TOSG (Token-Based Signature Generation)
– Uses a conjunction of tokens (substrings) as the signature for polymorphic worms
– Advantages
• Does not require protocol knowledge or information about the vulnerable program
• Fast and noise tolerant
• Has an analytical attack-resilience bound under certain assumptions
– Limitation
• Lacks good resilience to the deliberate noise injection attack [Perdisci 2006]
– Publication
Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao, Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience, in Proc. of IEEE Symposium on Security and Privacy, 2006 (23/251=9%)
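Here is a minimal sketch of what a token-conjunction signature checks. It assumes a flow matches when every token occurs somewhere in its payload. It ignores token order and occurrence counts, which Hamsa's actual signature format may constrain. The function name, tokens, and payloads are hypothetical.

```python
from typing import Iterable


def matches_token_conjunction(payload: bytes, tokens: Iterable[bytes]) -> bool:
    """True if every token (substring) occurs somewhere in the payload.

    Simplified: token order, offsets, and occurrence counts are ignored.
    """
    return all(token in payload for token in tokens)


# Hypothetical signature and payloads, for illustration only.
signature = [b"GET ", b".ida?", b"%u9090"]
worm_like = b"GET /default.ida?NNNNNNNN%u9090%u6858 HTTP/1.0"
benign = b"GET /index.html HTTP/1.0"
```

Because the check is a conjunction, a benign request that shares some tokens (here `GET `) still does not match unless it contains all of them.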
Part II: Polymorphic worm signature generation
• LESG (Length-Based Signature Generation)
– Uses a set of field lengths from the vulnerable program's protocol as the signature
– Mainly works for buffer overflow worms
– Advantages:
• Fast and noise tolerant
• Has an analytical attack-resilience bound under certain assumptions
• The bound holds under all recently proposed attacks
– Publication
Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms, in Proc. of IEEE International Conference on Network Protocols (ICNP) 2007 (32/220=14%)
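Here is a minimal sketch of matching a length-based signature. It assumes the signature maps each protocol field to a length threshold, and that a flow is flagged when any listed field exceeds its threshold. The function name, field names, and thresholds are hypothetical, and signature generation itself is not shown.

```python
from typing import Dict


def matches_length_signature(field_lengths: Dict[str, int],
                             signature: Dict[str, int]) -> bool:
    """Assumed semantics: flag the flow if any signature field is longer
    than its threshold, as a buffer-overflow exploit would require.
    Fields absent from the flow are treated as length 0."""
    return any(field_lengths.get(field, 0) > limit
               for field, limit in signature.items())


# Hypothetical thresholds (in bytes) for two fields of some protocol.
signature = {"host": 300, "filename": 255}
```

Since only lengths are compared, the payload bytes themselves can be fully polymorphic without affecting the check.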
Proposed Work
• Part III: Signature Matching Engine
– NetShield, a protocol semantic vulnerability signature matching engine (the focus of this talk)
– Report
Zhichun Li, Gao Xia, Yi Tang, Ying He, Yan Chen and Bin Liu, NetShield: Towards High Performance Network-based Semantic Signature Matching
Proposed Work
• Part IV: Network Situational Awareness
– Botnet Inference:
• Infer scan properties from honeynet traffic: trend, uniform, hitlist, and collaboration
• Extrapolate the global scan scope and the global number of bots from limited local observations; this can be used to detect targeted attacks
• Report
Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson, Towards Situational Awareness of Large-Scale Botnet Events using Honeynets
– P2P Misconfiguration Diagnosis
• Found that P2P misconfiguration traffic is one of the major sources of Internet background radiation
• eMule P2P misconfiguration is due to byte ordering
• For BitTorrent, we found that anti-P2P companies deliberately inject bogus peers
• Report
Zhichun Li, Anup Goyal, Yan Chen and Aleksandar Kuzmanovic, P2P Doctor: Measurement and Diagnosis of Misconfigured Peer-to-Peer Traffic
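The slide does not say which eMule field is affected. As a generic illustration only, the sketch below shows how a byte-order error changes an IPv4 address. The address arrives in network (big-endian) order, is read as a little-endian integer, and is then written back in network order. The result is a different, unrelated address, which is the kind of error that could send traffic to hosts that never ran the application. The function name is hypothetical.

```python
import socket
import struct


def misordered(ip: str) -> str:
    """Address produced by reading a network-order IPv4 address with the
    wrong (little-endian) byte order and writing it back in network order."""
    value = struct.unpack("<I", socket.inet_aton(ip))[0]  # wrong byte order
    return socket.inet_ntoa(struct.pack(">I", value))
```

For example, `misordered("1.2.3.4")` yields `"4.3.2.1"`.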
NetShield Overview
• Goal
• Feasibility Study: a Measurement Approach
• High Speed Parsing
• High Speed Matching for Large Rulesets
• Preliminary Evaluation
• Discussion
Signature Matching Engine
• Accuracy (especially for IPS)
– False positives
– False negatives
• Speed
• Coverage: large ruleset
Regular expression vs. vulnerability signatures:
– Accuracy: Poor vs. Much better
– Speed: Good vs. Good
– Coverage: Good vs. Good
Reason
• RE: cannot express the exact condition (X)
• Shield: can express the exact condition
Regular expressions are not powerful enough to capture the exact vulnerability condition!
Feasibility Study
• Protocol semantics can help (the Shield project [SIGCOMM04])
• How much can they help a NIDS/IPS?
– Given that a NIDS/NIPS has a large ruleset,
– what percentage of the rules could be improved with protocol semantic vulnerability signatures?
Measure Snort rules
• Semi-manually classify the rules
– First by CVE ID
– Then manually examine each vulnerability
• Results
– 86.7% of rules can be improved by protocol semantic vulnerability signatures.
– 9.9% of rules are related to web DHTML and scripts, which are not suitable for a signature-based approach.
– On average, 4.5 Snort rules reduce to one vulnerability signature.
– Binary protocols have larger reduction ratios than text-based protocols.
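The reduction ratio is the number of Snort rules divided by the number of vulnerability signatures they collapse into. Here is a minimal sketch of that arithmetic. It assumes, simplistically, that each distinct CVE ID yields one vulnerability signature. The study also examined each vulnerability manually, and the rule list below is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def reduction_ratio(rules: List[Tuple[int, str]]) -> float:
    """Average number of rules per vulnerability signature, assuming each
    distinct CVE ID becomes one vulnerability signature."""
    by_cve: Dict[str, List[int]] = defaultdict(list)
    for rule_id, cve in rules:
        by_cve[cve].append(rule_id)
    return len(rules) / len(by_cve)


# Hypothetical (rule ID, CVE ID) pairs: 9 rules covering 2 vulnerabilities.
rules = [(i, "CVE-A") for i in range(1, 7)] + [(i, "CVE-B") for i in range(7, 10)]
```

With these 9 rules over 2 vulnerabilities the ratio is 4.5, the same figure as the measured average, though the data here is made up.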
Towards high speed parsing
• Protocol parsing problem formulation
– Given a PDU and the state carried over from previous PDUs, output the set of fields required for matching.
• Observation
• Parsing State Machine
Observation
• PDU → parse tree
• Leaf nodes (basic fields) are integers or strings
• Vulnerability signatures are mostly based on basic fields
• We only need to parse out the fields related to signatures
[Figure: a PDU parse tree, including an array node]
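The sketch below illustrates parsing out only the fields a signature needs. It reads two fields from a WINRPC (DCE/RPC connection-oriented) common header and never decodes the rest. The offsets follow our reading of the DCE/RPC header layout: rpc_vers, rpc_vers_minor, ptype, pfc_flags, a 4-byte packed_drep, frag_length, and so on. They should be checked against the specification. The function name is hypothetical.

```python
import struct
from typing import Dict


def parse_rpc_header(pdu: bytes) -> Dict[str, int]:
    """Extract only ptype and frag_length from a 16-byte DCE/RPC
    connection-oriented header; the other header fields are skipped."""
    if len(pdu) < 16:
        raise ValueError("truncated header")
    # High nibble of packed_drep[0] gives the integer byte order (1 = little-endian).
    little_endian = (pdu[4] >> 4) == 1
    fmt = "<H" if little_endian else ">H"
    ptype = pdu[2]                                      # offset 2: PDU type
    (frag_length,) = struct.unpack_from(fmt, pdu, 8)    # offset 8: frag_length
    return {"ptype": ptype, "frag_length": frag_length}


# A hand-built little-endian BIND header (ptype 11) with frag_length = 72.
header = bytes([5, 0, 11, 3, 0x10, 0, 0, 0]) + struct.pack("<HHI", 72, 0, 1)
```

The parser jumps straight to fixed offsets instead of building a full parse tree, which is the point of the observation.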
Parsing State Machine
• Studied eight popular protocols (HTTP, FTP, SMTP, eMule, BitTorrent, WINRPC, SNMP, and DNS) and their vulnerability signatures.
• Protocol semantics are context sensitive.
• Common relationships among basic fields: (a) sequential, (b) branch, (c) loop, (d) derive
[Figure: state-machine patterns for (a) sequential, (b) branch, (c) loop, and (d) derive]
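This toy example shows three of these relationships on a made-up record format, not one of the eight protocols. Fields are read in sequence, the type field selects a branch, and the length of the name field is derived from an earlier length field. Loops are not shown here. The format and function name are hypothetical.

```python
import struct
from typing import Dict, Union


def parse_record(buf: bytes) -> Dict[str, Union[int, bytes]]:
    """Parse a hypothetical toy record (big-endian, not a real protocol)."""
    fields: Dict[str, Union[int, bytes]] = {}
    kind = buf[0]                         # sequential: the type field comes first
    fields["type"] = kind
    if kind == 1:                         # branch: layout depends on "type"
        (length,) = struct.unpack_from(">H", buf, 1)
        fields["length"] = length
        fields["name"] = buf[3:3 + length]    # derive: size comes from "length"
    else:
        fields["value"] = struct.unpack_from(">I", buf, 1)[0]
    return fields
```

Each relationship maps to a state-machine construct: sequential fields become a chain of states, a branch becomes a conditional transition, and a derived length sets how many bytes the next state consumes.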
Example for WINRPC
• Nodes
• States: S1 .. Sn
• 0.61 instructions/byte for BIND PDU
[Figure: WINRPC parsing state machine. The header is parsed as rpc_vers, rpc_vers_minor, ptype, pfc_flags, packed_drep, and frag_length, and then branches to Bind or Bind-ACK. The Bind branch parses ncontext, padding, ID, n_tran_syn, padding, UUID (16), UUID_ver (4), and tran_syn. Loop bookkeeping: S2 ← 0, S3 ← ncontext, S2++, S2 ≤ S3. Other labels: states S0 to S4, S1-16, merge1 to merge3.]
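The loop counters in the figure (S2 counts iterations, S3 holds ncontext) can be sketched as follows. The byte layout is an assumption that loosely follows the figure's field order, in little-endian:

- a 1-byte ncontext plus 3 padding bytes;
- then, per context, a 2-byte ID, 1-byte n_tran_syn, 1 padding byte, 16-byte UUID, and 4-byte UUID_ver;
- then n_tran_syn transfer syntaxes of 20 bytes each.

Transfer syntaxes are skipped rather than parsed. The function name is hypothetical.

```python
import struct
from typing import List, Tuple


def parse_contexts(buf: bytes, off: int = 0) -> List[Tuple[int, bytes]]:
    """Return (context ID, UUID) for each context, using an assumed layout."""
    s3 = buf[off]          # S3 <- ncontext: loop bound
    off += 4               # skip ncontext and its 3 padding bytes
    s2 = 0                 # S2 <- 0: loop counter
    out: List[Tuple[int, bytes]] = []
    while s2 < s3:         # one iteration per context
        ctx_id, n_tran_syn = struct.unpack_from("<HB", buf, off)
        off += 4                          # ID + n_tran_syn + padding
        uuid = buf[off:off + 16]          # a field a signature may need
        off += 16 + 4                     # UUID + UUID_ver
        off += 20 * n_tran_syn            # skip transfer syntaxes unparsed
        out.append((ctx_id, uuid))
        s2 += 1                           # S2++
    return out
```

Only the loop bound and per-iteration lengths are tracked, so skipped regions cost a pointer increment rather than a full decode.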
High speed matching
• Problem formulation
• Observation
• Candidate Selection Algorithm
• Algorithm Refinement
Matching Problem Formulation
• Data representation
– For all the vulnerability signatures we studied, we need integers and strings
– Integer operators: ==, >, <
– String operators: ==, match_re(.,.), len(.)
• Buffer constraint
– String fields could be too long to buffer.
– This influences whether we can change the matching order.
• Field dependency
– Arrays (e.g., DNS_questions or RR records)
– Associative arrays (e.g., HTTP headers)
– Mutually exclusive fields
Matching Problem Formulation (2)
• PDU-level protocol state machine
– Needed for complex stateful protocols
– For most stateful protocols, the state machine is quite simple
[Figure: WINRPC example state machine with states BIND request, BIND-ACK request, CALL request, CALL-ACK request, and error]
Matching problems (cont.)
• Example signature for Blaster worm
• Single PDU matching problem (SPM)
• Multiple PDU matching problem (MPM)
Single PDU Matching
• Suppose we have n signatures, each defined on k matching dimensions (matchers)
– A matcher is a two-tuple (field, operation), or a four-tuple for associative array elements.
– For example:
• (Filename, RE)
• (Version, Range_check)
– Version > 3
– Version == 1
• k is the number of all possible matchers for the n signatures.
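Table Representation
• We use an n×k table to represent the rules: n rows (signatures) by k columns (matchers). The cell for Sig i and matcher j holds that signature's condition on the matcher, or * for "don't care".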
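The most direct way to evaluate such a table is exhaustive search. Every non-* cell of every row is tested against the parsed PDU, which costs O(n·k) matcher evaluations per PDU. The sketch below uses hypothetical matchers echoing the (Version, Range_check) and (Filename, RE) examples. It is a baseline for comparison, not the NetShield algorithm.

```python
import re
from typing import Any, Callable, Dict, List, Optional

Matcher = Callable[[Any], bool]


def exhaustive_match(table: List[List[Optional[Matcher]]],
                     columns: List[str],
                     pdu: Dict[str, Any]) -> List[int]:
    """Return the row indices of all signatures whose cells all hold.
    None plays the role of '*' (don't care); a missing field fails a cell."""
    hits = []
    for sig_id, row in enumerate(table):
        if all(cell is None or (col in pdu and cell(pdu[col]))
               for col, cell in zip(columns, row)):
            hits.append(sig_id)
    return hits


# Hypothetical 3 x 2 rule table.
columns = ["Version", "Filename"]
table: List[List[Optional[Matcher]]] = [
    [lambda v: v > 3, None],                                          # Version > 3
    [lambda v: v == 1, lambda f: re.search(r"\.\./", f) is not None],  # Version == 1 and "../"
    [None, lambda f: len(f) > 100],                                   # long Filename
]
```

Every PDU pays for every row and column, even though most traffic matches nothing. That gap is what the candidate selection approach targets.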
Requirements for SPM
• Large number of signatures n
• Large number of matchers k
• Large number of "don't cares"
• Cannot reorder the matchers arbitrarily (buffer constraint)
• Field dependency
– Arrays
– Associative arrays
– Mutually exclusive fields
Comparison to packet classification
• Similarity: both problems are defined on k matching dimensions and allow wildcards
• Differences:
– Large k and a large number of "don't cares"
– Buffer constraint
– Regular expression matchers
– Field dependency
• Related work on packet classification
– Exhaustive search
– Decision tree
– Tuple space
– Divide and conquer (decomposition)
Difficulty
• A more complex problem than packet classification
• Theoretical worst-case bound for packet classification
– Based on computational geometry
– O((log N)^(k-1)) worst-case time or O(N^k) worst-case memory, for N rules in k dimensions
• Solution: exploit characteristics observed in real traces
Observation
• Observation 1: most matchers are good.
– After matching against them, only a small number of signatures (candidates) can pass.
– String matchers are all good; most integer matchers are good.
– We can buffer the bad matchers to change the matching order.
• Observation 2: real-world traffic mostly does not match any signature. Even more strongly, in most cases no matcher matches any rule.
• Observation 3: the NIDS/IPS reports all matched rules regardless of their ordering, unlike firewall rules.
Basic idea
• Decide the matcher order during precomputation, buffering the bad matchers and moving them to the end if possible
• When a PDU arrives, match it against each matcher (column) for all the signatures simultaneously, yielding the possible candidates for the next step
• Combine the candidate sets to get the final matched signatures
Matching a single matcher
• Integer range checking: binary search tree
• String exact matching: trie
• String regular expression matching: DFA
• String length checking: binary search tree
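The slide lists a binary search tree for integer range checking. This minimal sketch of the same idea uses a sorted array of breakpoints and Python's `bisect` instead of an explicit tree. All signatures' ranges on one integer field are split into elementary intervals, and each interval is precomputed with the set of signatures it satisfies, so a lookup is a single O(log n) search. The function names, signature IDs, and ranges are hypothetical.

```python
import bisect
from typing import Dict, List, Set, Tuple


def build_range_index(ranges: Dict[int, Tuple[int, int]]) -> Tuple[List[int], List[Set[int]]]:
    """ranges: signature id -> inclusive (lo, hi) on one integer field.
    Interval i covers [points[i], points[i+1]); sets[i] holds its signatures."""
    points = sorted({lo for lo, _ in ranges.values()} |
                    {hi + 1 for _, hi in ranges.values()})
    # Membership is constant inside an elementary interval, so test its left end.
    sets = [{sid for sid, (lo, hi) in ranges.items() if lo <= p <= hi}
            for p in points]
    return points, sets


def query(points: List[int], sets: List[Set[int]], value: int) -> Set[int]:
    i = bisect.bisect_right(points, value) - 1
    return sets[i] if i >= 0 else set()


# Hypothetical matchers on Version: sig 1 is "Version > 3" (up to 255),
# sig 2 is "Version == 1", sig 3 is "0 <= Version <= 10".
ranges = {1: (4, 255), 2: (1, 1), 3: (0, 10)}
points, sets = build_range_index(ranges)
```

This trades precomputation memory (one set per elementary interval) for a query cost that does not grow with the number of signatures sharing the field.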
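Candidate Selection for SPM
• Basic algorithm: pre-computation
[Figure: pre-computation splits the rules into groups ER1, ER2, ER3, ER4, ... by good matcher: rules using Good Matcher 1; rules that don't care about Good Matcher 1, extended by Good Matcher 2; rules that don't care about both Good Matchers 1 & 2; and so on, down to rules that don't care about any of Good Matchers 1 to n.]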
Matching Illustration
PDU = {Method=POST, Filename=fp40reg.dll, VARs: name="file"; value~".*\.\./.*", Headers: name="host"; len(value)=450}
Rule blocks: RB1 = {1, 2, 3}, RB2 = {4, 5, 6}, RB3 = {7}, RB4 = {8}, RB5 = {9}
A_i: candidates from earlier rule blocks that match matcher i; B_i: candidates from RB_i that match matcher i. In S_{i-1} ∩ A_i, a candidate that does not care about matcher i passes through unchanged.
S1 = {3}
S2 = (S1 ∩ A2) ∪ B2 = ({3} ∩ {}) ∪ {6} = {} ∪ {6} = {6}
S3 = (S2 ∩ A3) ∪ B3 = ({6} ∩ {}) ∪ {} = {6} ∪ {} = {6}
S4 = (S3 ∩ A4) ∪ B4 = ({6} ∩ {4}) ∪ {} = {6} ∪ {} = {6}
S5 = (S4 ∩ A5) ∪ B5 = ({6} ∩ {6}) ∪ {} = {6} ∪ {} = {6}
Matching Illustration (cont.)
• Computing the operations
– Explicit calculation
• Use an n×k bitmap to decide whether each element of S_i requires the next matcher.
• For those that do, check whether they are also in A_{i+1}.
– Implicit calculation (for bad matchers)
• Do not compute A_{i+1}, since it could be large.
• Instead, check sequentially whether each candidate in S_i matches matcher i+1.
• Because the bad matchers are buffered to the end, B will be small.
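This minimal sketch of the explicit calculation reproduces the S1 to S5 sequence from the Matching Illustration slide. Assumptions:

- The care bitmap is given as a set of matcher indices per rule, and a rule's block is its first cared-about matcher.
- The per-matcher hit sets are supplied as input rather than computed by tries, DFAs, or search trees.

The care sets and hit sets below are hypothetical, chosen to be consistent with the illustration. The function name is also hypothetical.

```python
from typing import Dict, List, Set


def candidate_selection(care: Dict[int, Set[int]],
                        hits: Dict[int, Set[int]],
                        k: int) -> List[Set[int]]:
    """care[r]: matchers rule r cares about; hits[i]: rules (caring about
    matcher i) whose condition on matcher i holds. Returns S_1 .. S_k."""
    first = {r: min(ms) for r, ms in care.items()}   # rule block of each rule
    history: List[Set[int]] = []
    s: Set[int] = set()
    for i in range(1, k + 1):
        hit = hits.get(i, set())
        a = {r for r in hit if first[r] < i}          # A_i: earlier blocks
        b = {r for r in hit if first[r] == i}         # B_i: rule block RB_i
        # Keep candidates that don't care about matcher i, or that are in A_i.
        kept = {r for r in s if i not in care[r] or r in a}
        s = kept | b
        history.append(set(s))
    return history


care = {1: {1}, 2: {1}, 3: {1, 2}, 4: {2, 4}, 5: {2}, 6: {2, 5},
        7: {3}, 8: {4}, 9: {5}}
hits = {1: {3}, 2: {6}, 4: {4}, 5: {6}}
```

Running `candidate_selection(care, hits, 5)` gives S1 = {3} followed by {6} four times, matching the illustration. Rule 3 drops out at matcher 2, which it cares about but fails. Rule 6 passes matchers 3 and 4 as "don't care".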
Refinement
• SPM improvements
– Allow negative conditions
– Handle arrays
– Handle associative arrays
– Handle mutually exclusive fields
– Report matched rules as early as possible
• Extend to MPM
– Allow checkpoints
Results
• Traces from Tsinghua Univ. (TH) and Northwestern Univ. (NU)
• PDUs are preloaded into memory after TCP reassembly
• For DNS, we only evaluate parsing.
• For WINRPC, we have 45 vulnerability signatures, which cover 3,519 Snort rules.
• For HTTP, we have 791 vulnerability signatures, which cover 941 Snort rules.
Discussion
• So far, we have found that the candidate selection algorithm works well in practice
• Further thoughts
– How can we rely more on hardware assistance?
• TCAM?
• Bitmaps to express set operations?
– Can traffic statistics be used to further improve efficiency?
Publications
• Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms, in Proc. of IEEE ICNP 2007.
• Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reversible sketches: Enabling monitoring and analysis over high speed data streams, IEEE/ACM Transactions on Networking, Volume 15, Issue 5, Oct. 2007.
• Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao, Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience, in Proc. of IEEE Symposium on Security and Privacy, 2006.
• Zhichun Li, Yan Chen and Aaron Beach, Towards Scalable and Robust Distributed Intrusion Alert Fusion with Good Load Balancing, in Proc. of ACM SIGCOMM LSAD 2006.
• Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks, in Proc. of IEEE ICDCS 2006.
• Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluations, and Applications, in Proc. of IEEE INFOCOM 2006.
Research Time Plan
• Apr 2008 – Jun 2008:
– Finish the remaining network situational awareness experiments
• Sep 2008 – Mar 2009:
– Refine the vulnerability signature matching algorithm
– Fully implement, deploy, and evaluate the NetShield prototype
– Prepare job applications and interviews
• Apr 2009 – Jun 2009:
– PhD dissertation writing
– Thesis defense
Q&A
Thanks!
Backup
Outline
• Motivation
• Feasibility Study: a measurement approach
• Problem Statement
• High Speed Parsing
• High Speed Matching for a Large Number of Vulnerability Signatures
• Evaluation
• Conclusions
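Limitations of Regular Expression Signatures
[Figure: Internet traffic passes through filtering with the signature 10.*01. The inputs 1010101 and 10111101 match and are blocked (X), while the polymorphic variants 11111100 and 00010111 do not match and reach our network.]
Polymorphism! A polymorphic attack (worm/botnet) might not have an exact regular-expression-based signature.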
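The regular-expression behavior in the figure can be checked directly with Python's `re` module. The signature 10.*01 matches the first two inputs and misses the other two. Whether the missed strings are attack variants is the figure's scenario, not something the regex can tell.

```python
import re

SIGNATURE = re.compile(r"10.*01")   # the example signature from the figure

samples = ["1010101", "10111101", "11111100", "00010111"]
for s in samples:
    verdict = "blocked" if SIGNATURE.search(s) else "passes"
    print(s, verdict)
```

In 00010111 the only "10" is followed by "111", so no non-overlapping "01" remains. In 11111100 no "01" appears at all.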
What do we do?
• Build a NIDS/NIPS with much better accuracy and similar speed compared with regular-expression-based approaches
– Feasibility: 86.7% of the Snort ruleset (6,735 signatures) can be improved by vulnerability signatures.
– High-speed parsing: 2.7 to 12 Gbps
– High-speed matching:
• Efficient algorithm for matching a large number of vulnerability rules
• HTTP: 791 vulnerability signatures at ~1 Gbps
Problem Formulation
• Parsing problem formulation
– Given a PDU and the protocol specification as input, output the set of fields required for matching.
Current Status
• Part I: Sketch-based monitoring & detection
– Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reversible sketches: Enabling monitoring and analysis over high speed data streams, IEEE/ACM Transactions on Networking, Volume 15, Issue 5, Oct. 2007
– Robert Schweller, Zhichun Li, Yan Chen, Yan Gao, Ashish Gupta, Elliot Parons, Yin Zhang, Peter Dinda, Ming-Yang Kao, and Gokhan Memik, Reverse Hashing for High-speed Network Monitoring: Algorithms, Evaluations, and Applications, in Proc. of IEEE INFOCOM 2006 (252/1400=18%)
– Yan Gao, Zhichun Li and Yan Chen, A DoS Resilient Flow-level Intrusion Detection Approach for High-speed Networks, in Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS) 2006 (75/536=14%) (Alphabetical order)
• Part II: Polymorphic worm signature generation
– TOSG: Zhichun Li, Manan Sanghi, Brian Chavez, Yan Chen and Ming-Yang Kao, Hamsa: Fast Signature Generation for Zero-day Polymorphic Worms with Provable Attack Resilience, in Proc. of IEEE Symposium on Security and Privacy, 2006 (23/251=9%)
– LESG: Zhichun Li, Lanjia Wang, Yan Chen and Zhi (Judy) Fu, Network-based and Attack-resilient Length Signature Generation for Zero-day Polymorphic Worms, in Proc. of IEEE International Conference on Network Protocols (ICNP) 2007 (32/220=14%)
Current Status
• Part III: Signature matching engines
– Work in progress; the focus of this talk
– Zhichun Li, Gao Xia, Yi Tang, Jian Chen, Ying He, Yan Chen and Bin Liu, NetShield: Towards High Performance Network-based Semantic Signature Matching, in submission
• Part IV: Network Situational Awareness
– Work in progress
– Zhichun Li, Anup Goyal, Yan Chen and Vern Paxson, Towards Situational Awareness of Large-Scale Botnet Events using Honeynets, in preparation
– Zhichun Li, Anup Goyal, Yan Chen and Aleksandar Kuzmanovic, P2P Doctor: Measurement and Diagnosis of Misconfigured Peer-to-Peer Traffic, in submission