Masters Project Defense
Investigating Techniques For Identifying Thread Behavior and Evaluating
Alternative Automatic Classification Methods in a Realistic Middleware System
Robert C Broeckelmann Jr.
29 Nov 2007
Where We Started…
• Began meeting with Dr. Gill in September, 2006.
• Officially, started working in January, 2007.
• Explored how an OS Scheduler could be extended to
determine, before a scheduling decision, if
– a thread is displaying an undesirable behavior
– operating outside of a predefined range
• Can information available to an IDS system be fed to an OS
scheduler?[22,23,24,25,26]
2
Where We Went…
• What other information is available to make
such decisions?
• How do we gather & process this information?
• How do you classify the high-level function
of threads based upon this data?
• Practical use in Industry.
3
Original Test Environment
• Spent Spring Semester building a test environment.
– Several dead ends.
• Original Test Environment consisted of:
– VMWare Workstation 5.x[10]
– Fedora Core 6[11]
– Custom build of KURT Linux 2.6.18,.19,.20/STREAMs[12]
– Custom Linux Kernel build 2.6.18[13]
– Java 1.5[1,3,5,8,9]
– Strace[14]
• KURT Linux incapable of capturing System Calls per lwp out-of-the-box.
• Explored using strace on Linux—problems with high-thread counts.
4
Final Test Environment
• Final Test environment
– 2 Dell PCs(2 CPU, 2GB memory)
– OpenSolaris 2.11[28]
– Java 1.6[4,5,6,8,9]
– JBoss 3.2.8b[15]
– MySQL v5.0 Community Server[29]
– JMeter v2.2[31]
– Java PetStore eCommerce application (J2EE Spec v1.3).[32]
– PetStore configuration adaptation for JBoss [33]
• Had to move to Java 1.6.0 that ships with OpenSolaris 2.11 in order to
utilize plug points with DTrace
• This Master's project was completed almost entirely with Open Source tools.
– Note, OpenSolaris, DTrace released under the OpenSolaris Binary
License & CDDL (OSI approved) license[34,35].
– Java 1.6 is not Open Source. JDK 1.7 will be Open Source[36].
5
What information is available?
• Available Information
– System Calls[41]
– File Descriptor, I/O SysCall patterns[42]
– CPU utilization
• Traditional (User, Kernel, Idle, I/O Wait)[30]
• Micro-State Accounting information[30]
• Other information is available, limiting scope.
• Must be gathered with minimal overhead.
6
Gathering Information
• Each type of data has a tool of choice
– System Calls -> DTrace/DTruss[7,27,37,38]
– Traditional CPU Utilization-> vmstat,
prstat[40,43,44]
– Micro-State Accounting -> prstat[36,40]
• This project focuses on the use of System Call
sequences (broken down per thread).
7
Practical Uses
• The techniques developed here could have a
practical use in industry.
• For example, a System Administrator or
Performance Engineer managing/monitoring a
complex J2EE installation.
– Such as BEA Weblogic, IBM Websphere, or Redhat
JBoss[45,46,47]
– Similar, multithreaded-middleware environment
8
Our Approach
• Original Goal: build an OS Scheduler with an ability to distinguish between
a thread whose behavior is desirable and one that is undesirable.
• Chose to focus on a prerequisite.
– What data do we gather?
– Techniques for gathering and processing that data.
• Focused on the classification of threads within a multithreaded process by
data that can be gathered
– on a per-thread basis at run-time.
– efficiently
• First step towards building this enhanced OS Scheduler.
9
Previous Work
• Work in the areas of OS Security research and IDS systems
has used system call analysis heavily[19,20,21,48,49,43,42].
– SubDomain™: Parsimonious Server Security
– Improving Host Security with System Call Policies
– Traps and Pitfalls: Practical Problems in System Call Interposition
Based Security Tools
– A Secure Environment for Untrusted Helper Applications: Confining
the Wily Hacker
– Ostia: A Delegating Architecture for Secure System Call Interposition
10
Classifying Thread Functions
• Using System Call information, I created a
method to visually classify a thread’s function.
• Experimented with different machine-learning
algorithms to try to accurately predict thread
function.
11
The Method For Classifying
Threads
• Basis for a new method that can be used to
classify threads and determine if they are
behaving correctly.
• Produces a visual fingerprint of a thread’s
behavior.
• Produces a representation of run-time
characteristics that would otherwise be
difficult to analyze, visualize, & bring together.
12
Subject Of The Method – Modern
Middleware
• Modern (especially Java-based)
middleware involves one or more processes
with many threads & moving pieces.
• Capturing the behavior of a single thread or
interaction between the constituent pieces
can be challenging.
• We used JBoss 3.2.8SP1 as a representative
piece of modern middleware for this project.
13
JBoss Internals
• JMX Micro-Kernel Architecture
– All the major J2EE subsystems are JMX beans.
• JBoss 3.x fully supports the J2EE 1.3 Spec.
– 3.2.8SP1 was chosen for its maturity and for the version
of the Java PetStore application used.
– For more information, see [2].
• Note, JBoss 5.x is a complete architectural
redesign.
14
High-Level Classification Of
Threads In A JBoss J2EE Container
15
Data Gathering & Processing
• Tools used to gather data during a load test
– DTrace[7]
– DTrace Toolkit[37]
– Dtruss[38]
– Bash Shell Scripting[39]
– GNU Tools[18]
– Prstat[40]
• Tools used to process data after a load test
– GNUPlot[17]
– Bash Shell & other GNU tools[18]
– RapidMiner[16]
16
Thread groups I am studying.
• Thread Groups (collections of threads that perform
similar functions)
– HTTP Processor
– JMS Thread(3)
– JMS Session Workers
– Connection Consumer
– JBoss MQ Cache Reference Softener
– Scanner Thread
– Young GC Threads
– Old Gen GC Thread
– JIT Compiler
– HSQLDB Timer
– TimeOut Factory Thread
• Why were the other threads left out?
– Couldn’t capture thread type via a Java Thread
Dump.
– Insufficient number of System Calls made by
thread during load test.
17
Result 1 – SysCall Graphs
• Hypothesis:
– We can use OS data (such as system call usage) to
build a graphical representation (histogram) that
uniquely identifies each type of thread (Thread
Group).
• Result:
– For many thread types, yes. Classifying system call
sequences using Thread Dumps shows that there
is an identifiable pattern of System Calls in many
thread types.
18
Data Processing—SysCall Graphs
• Split into individual threads.
• Replace system call names with numbers.
• Produce frequency counts.
• Build GNUPlot files.
• Generate PNG images.
• Generate HTML page.
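The first three steps can be sketched in Python as below. This is an illustration only, not the project's Bash and GNU tool scripts. It assumes a hypothetical trace format of one "lwpid syscall" pair per line; the function names and sample data are made up for the example.

```python
from collections import Counter, defaultdict

def split_by_thread(lines):
    """Group system call names by thread (LWP) ID, preserving call order."""
    per_thread = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        lwpid, syscall = parts
        per_thread[lwpid].append(syscall)
    return per_thread

def number_syscalls(per_thread):
    """Assign each distinct system call name a stable integer code."""
    names = sorted({name for calls in per_thread.values() for name in calls})
    return {name: code for code, name in enumerate(names, start=1)}

def frequency_counts(per_thread, codes):
    """Count how often each numbered system call occurs in each thread."""
    return {
        lwpid: Counter(codes[name] for name in calls)
        for lwpid, calls in per_thread.items()
    }

# Hypothetical trace: "<lwpid> <syscall>" per line (assumed format).
trace = ["12 read", "12 write", "13 pollsys", "12 read", "13 pollsys", "13 lwp_park"]
threads = split_by_thread(trace)
codes = number_syscalls(threads)
counts = frequency_counts(threads, codes)
```

The per-thread counters are the kind of data a plotting step would then turn into one histogram per thread.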
19
How To Map Threads & Graphs
• The Sun JVM can pause all threads and print,
for each thread, its
– full call stack
– thread description
– native LWP ID.
• Several thread dumps were captured during
load tests.
• Matched LWPIDs to NIDs (Native IDs) in the Thread
Dump.
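A minimal Python sketch of the matching step follows. It assumes the usual HotSpot thread dump layout, where each thread header line starts with the quoted thread name and carries a hexadecimal nid=0x... field, and that this value is the LWP ID in decimal form. The sample dump text and the function name are invented for illustration.

```python
import re

# Header line of one thread in a HotSpot thread dump, for example:
#   "Some Thread" daemon prio=3 tid=0x0812a400 nid=0x1a runnable [...]
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)".*?\bnid=(?P<nid>0x[0-9a-fA-F]+)')

def map_lwpid_to_thread_name(dump_text):
    """Return {decimal LWP ID: Java thread name} parsed from a thread dump."""
    mapping = {}
    for line in dump_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            # nid is printed in hex; OS tools report the LWP ID in decimal.
            mapping[int(match.group("nid"), 16)] = match.group("name")
    return mapping

# Invented sample input (not taken from the project's load tests).
sample_dump = '''
"http-0.0.0.0-8080-Processor25" daemon prio=3 tid=0x08a1c400 nid=0x4f runnable [0xb4e7f000..0xb4e7fd70]
   at java.net.PlainSocketImpl.socketAccept(Native Method)
"VM Periodic Task Thread" prio=3 tid=0x080c1000 nid=0x12 waiting on condition
'''
lwp_names = map_lwpid_to_thread_name(sample_dump)
```

With such a mapping, each per-thread system call trace keyed by LWP ID can be labeled with a thread name and, from there, a Thread Group.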
20
Graph Format
• 3-Dimensional
– X – Time*
– Y – System Call Type
– Z – Frequency
• *Relative time-frames are not represented.
21
[Figure: SysCall Graphs for Cache Reference Softener, Connection Consumer, Young GC Thread, HSQLDB Timer, HTTP Processor, and TimeOutFactory]
22
[Figure: SysCall Graphs for JIT Compiler, JMS Thread(3), Session Worker, Old GC Thread, and Scanner Thread]
23
Results
• Using these results we are able to categorize many of
the threads with the SysCall Graphs.
• From there, we were able to compare SysCall Graphs
within a single run and between different runs.
• Visually-recognizable pattern for each of the Thread
Types that we are looking at.
– This pattern holds for threads of the same type in each
run.
– This pattern holds for threads of the same type in different
runs.
24
Comparisons between Runs:
Connection Consumers/
JMS Session Workers/
HTTP Processor
25
Statistical Analysis of Data
• Tried Nearest Neighbor on the actual sequence using
Euclidean & Nominal measures—Unsuccessful.
– Different length sequences
• Experimented with Hyper Planes—Unsuccessful.
• Experimented with 1st Order Markov Chains—Unsuccessful.
• Tried NN on SysCall counts of a thread using Euclidean
Measure.
– Greatest success
– Not perfect
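To make the last approach concrete, here is a small Python sketch of 1-nearest-neighbor classification on per-thread system call count vectors with a Euclidean measure. It is not the RapidMiner setup used in the project. The count vectors are invented, and each position is assumed to hold the count of one system call type.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length count vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(training, query):
    """Return the label of the training vector closest to the query.

    training: list of (count_vector, label) pairs.
    """
    _, best_label = min(training, key=lambda item: euclidean(item[0], query))
    return best_label

# Invented counts per system call type (hypothetical data).
training = [
    ([120, 118, 3, 0], "HTTP Processor"),
    ([2, 1, 95, 40], "JMS Session Worker"),
    ([0, 0, 10, 300], "Scanner Thread"),
]
busy_thread = [100, 90, 5, 1]
predicted = nearest_neighbor(training, busy_thread)
```

Count vectors have the same length for every thread, which avoids the different-length problem of raw sequences. Raw counts still scale with how much work a thread did, so a lightly loaded thread can sit closer to another group than to its own.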
26
Result 2 – Nearest Neighbor
• Hypothesis 2:
– We can apply machine learning techniques to
predict the different thread types using the data
we have gathered.
• Result:
– Using Nearest Neighbor on the system call counts
we can partially do this.
27
Data Processing(Result 2)
• RapidMiner Data Files[16]
– Define an ARFF model definition file
– Define an AML test data definition file.
– Put test data into a space-delimited file.
– Define Nearest Neighbor XML file
• Produces a RapidMiner model file.
– Define ModelLoader XML file.
• Loads a model file and test data.
• Forms predictions regarding test data.
– Produces a data file that lists predictions and confidence
values for each row in data file.
28
Results
Thread Type               # of threads  Run1 Correct  Run2 Correct  Run3 Correct  Run4 Correct  Run5 Correct
Cache Reference Softener       1             1             1             1             1             1
Connection Consumer            6             0             0             0             0             0
HSQLDB Timer                   1             0             1             1             1             1
HTTP Processor                 9             8             8             8             8             8
JIT Compiler                   2             2             2             2             2             2
JMS Thread(3)                 10            10            10            10            10            10
Old GC Thread                  1             0             1             1             1             1
Scanner Thread                 1             0             1             1             1             1
Session Worker                15             6            15            15            15            15
TimeOut Factory                1             0             0             0             0             1
Young GC Thread                2             2             2             2             2             2
29
What It Did NOT Accurately Predict
• No Connection Consumer threads were accurately
predicted with Nearest Neighbor.
– System Call counts very similar to other threads.
• One HTTP Processor thread mispredicted.
– This thread handled very little traffic. As a result
its system call counts were significantly different.
• Shows shortcomings of Nearest Neighbor (Euclidean
Distance Measure) algorithm for our purposes.
30
Future Directions
• Rth-Level Markov Chain modeling of system
call sequences to accurately predict Thread
Functions[48,54].
• Using Micro-State Accounting data to
fingerprint/predict thread types[36].
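As a rough illustration of the first direction, the Python sketch below estimates order-R transition probabilities from one thread's system call sequence. It is a generic textbook construction, not the method of [48]. The function name and the sample sequence are assumptions made for the example.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence, order=1):
    """Estimate P(next call | previous `order` calls) from one sequence."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - order):
        state = tuple(sequence[i:i + order])      # the last `order` calls
        counts[state][sequence[i + order]] += 1   # the call that followed
    return {
        state: {nxt: n / sum(following.values()) for nxt, n in following.items()}
        for state, following in counts.items()
    }

# Invented system call sequence for a single thread.
calls = ["read", "write", "read", "write", "read", "pollsys"]
first_order = transition_probabilities(calls, order=1)
second_order = transition_probabilities(calls, order=2)
```

A classifier could fit one such model per Thread Group and score an unlabeled thread's sequence against each model. Larger values of R capture longer patterns but need more data per state.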
31
Questions?
• Thank you.
32
Reference
1. Java™ 2 Platform Standard Edition 5.0 API Specification. 29 Sept. 2004. Sun Microsystems Inc. 13 Jan. 2007 <http://java.sun.com/j2se/1.5.0/docs/api/>
2. Research Project: An Analysis of JBoss Architecture. Liu, Jenny. 29 Apr. 2002. School of Information Technologies, University of Sydney. 13 Jan. 2007 <http://www.huihoo.org/jboss/jboss.html.>
3. JDK™ 5.0 Documentation. 29 Sept. 2004. Sun Microsystems Inc. 13 Jan. 2007 <http://java.sun.com/j2se/1.5.0/docs/>
4. Java™ Platform, Standard Edition 6 API Specification. 12 Dec. 2006. Sun Microsystems Inc. 1 Apr. 2007 <http://java.sun.com/javase/6/docs/api/>
5. HotSpot Runtime Overview. OpenJDK Project. 15 Apr 2007. <https://openjdk.dev.java.net/hotspot/docs/RuntimeOverview.html>
6. JDK™ 6 Documentation. 12 Dec. 2006. Sun Microsystems Inc. 1 Apr. 2007 <http://java.sun.com/javase/6/docs/>
7. OpenSolaris Community: Dtrace. OpenSolaris. Sun Microsystems Inc. 1 Apr 2007 <http://www.opensolaris.org/os/community/dtrace/>
8. The Java Language Specification, Third Edition. 1 Jan 2005. Gosling, James. Joy, Bill. Steele, Guy. 13 Jan. 2007 <http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html>
9. The Java™ Virtual Machine Specification Second Edition. 1999. Lindholm, Tim. Yellin, Frank. 13 Jan. 2007 <http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html>
10. Workstation 5 User’s Manual. 16 Sept. 2005. Vmware, Inc. 13 Jan. 2007. <http://www.vmware.com/pdf/ws5_manual.pdf>
11. Fedora Project – Fedora Core 6. 22 Oct 2006. RedHat, Inc. 13 Jan. 2007. <http://www.fedoraproject.org/>
12. KU System Programming. The University of Kansas. 13 Jan. 2007 <http://wiki.ittc.ku.edu/kusp_wiki/index.php/Main_Page>
13. The Linux Kernel Archive. 12 Jan 2007. Linux Kernel Organization, Inc. 13 Jan 2007 <http://www.kernel.org/>
14. Strace Project. 13 Jan 2007. Strace Project <http://sourceforge.net/projects/strace/>
15. JBoss Admin Development Guide. 2004. JBoss, Inc. 13 Jan 2007. <http://docs.jboss.org/jbossas/admindevel326/html/>
33
Reference
16. Mierswa, I. and Wurst, M. and Klinkenberg, R. and Scholz, M. and Euler, T., Yale (now: RapidMiner): Rapid Prototyping for Complex Data Mining Tasks. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), 2006.
17. gnuplot homepage. 15 Apr 2007. Williams, Thomas. Kelley, Colin. <http://www.gnuplot.info>
18. The GNU Operating system - the GNU project - Free Software Foundation - Free as in Freedom - GNU/Linux. 15 Apr 2007. Free Software Foundation. <http://www.gnu.org>
19. Design and Performance of Configurable Endsystem Scheduling Mechanisms
20. The Design, Modeling, and Implementation of Group Scheduling for Isolation of Computations from Adversarial Interference.
21. Group Scheduling in SELinux to Mitigate CPU-Focused Denial of Service Attacks.
22. SubDomain™: Parsimonious Server Security
23. Improving Host Security with System Call Policies
24. Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools
25. A Secure Environment for Untrusted Helper Applications: Confining the Wily Hacker
26. Ostia: A Delegating Architecture for Secure System Call Interposition
27. Solaris Dynamic Tracing Guide. 5 Sep. 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://docs.sun.com/app/docs/doc/817-6223>
28. OpenSolaris v2.11 Home at OpenSolaris.org. 1 Jun. 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://www.opensolaris.org/os/>
29. MySQL 5.0 Reference Manual. MySQL AB. 1 Apr 2007. <http://dev.mysql.com/doc/refman/5.0/en/manual-info.html>
30. Solaris Internals CPU/Processor. 15 July 2007. Solaris Internals. 1 Nov. 2007. <http://www.solarisinternals.com/wiki/index.php/CPU/Processor>
34
Reference
31. JMeter: Users Manual. 1 Jun. 2006. Apache Jakarta Project. 15 Apr 2007 <http://jakarta.apache.org/jmeter/usermanual/intro.html>
32. Java Pet Store Demo 1.3.2. 4 Aug. 2003. Sun Microsystems, Inc. 13 Jan 2007 <http://java.sun.com/blueprints/code/jps132/docs/index.html>
33. Java Petstore Tutorial. MobileFish. 13 Jan 2007 <http://www.mobilefish.com/tutorials/petstore_1_3_2/petstore_1_3_2_quickguide_jbossmysql.html>
34. OpenSolaris Binary License. 4 Nov. 2005. Sun MicroSystems. 1 Apr. 2007. <http://opensolaris.org/os/licensing/opensolaris_binary_license/>
35. COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL). 24 Jan 2004. Sun Microsystems, Inc. 15 Apr. 2007 <http://www.sun.com/cddl/cddl.html>
36. The GNU General Public License, Version 2. 1 Jun. 1991. Free Software Foundation. 1 Nov 2007 <http://www.fsf.org/licensing/licenses/info/GPLv2.html>
37. OpenSolaris Community: Dtrace. 1 Jun. 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://www.opensolaris.org/os/community/dtrace>
38. DTraceToolkit at OpenSolaris.org. 1 Jun 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://www.opensolaris.org/os/community/dtrace/dtracetoolkit>
39. Bash Reference Manual. 15 Jul. 2002. Free Software Foundation. 1 Apr 2007 <http://www.gnu.org/software/bash/manual/bashref.html>
40. prstat(1M). 4 Jan. 2001. Sun Microsystems, Inc. 1 Apr 2007 <http://docs.sun.com/app/docs/doc/816-0211/6m6nc673u?a=view>
41. man pages section 2: System Calls. 4 Oct 2005. Sun Microsystems, Inc. 1 Apr 2007. <http://docs.sun.com/app/docs/doc/816-5167?l=en>
42. S. Zanero, Unsupervised Learning Algorithms for Intrusion Detection, Ph.D. Thesis, DEI Politecnico di Milano, 2006
43. The Design, Modeling, and Implementation of Group Scheduling for Isolation of Computations from Adversarial Interference
44. vmstat(1M). 20 Dec. 2004. Sun Microsystems, Inc. 1 Apr 2007 <http://docs.sun.com/app/docs/doc/816-5166/6mbb1kqjv?a=view>
45. BEA Weblogic Server 10.0. 13 Dec. 2006. BEA Systems, Inc. 1 Nov 2007 <http://edocs.bea.com/wls/docs100/index.html>
35
Reference
46. WebSphere Application Server documentation. 29 May 2006. IBM Inc. 1 Nov 2007 <http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.base.doc/info/welcome_base.html>
47. JBoss.org: Community Documentation. 2004. Redhat, Inc. 13 Jan 2007 <http://labs.jboss.com/projects/docs/>
48. Markov Chain paper
49. Group Scheduling in SELinux to Mitigate CPU-Focused Denial of Service Attacks
36