Masters Project Defense
Investigating Techniques For Identifying Thread Behavior and Evaluating Alternative Automatic Classification Methods in a Realistic Middleware System
Robert C Broeckelmann Jr.
29 Nov 2007

Where We Started…
• Began meeting with Dr. Gill in September, 2006.
• Officially started working in January, 2007.
• Explored how an OS Scheduler could be extended to determine, before a scheduling decision, if a thread is
  – displaying an undesirable behavior
  – operating outside of a predefined range
• Can information available to an IDS system be fed to an OS scheduler?[22,23,24,25,26]

Where We Went…
• What other information is available to make such decisions?
• How do we gather & process this information?
• How do you classify the high-level function of threads based upon this data?
• Practical use in industry.

Original Test Environment
• Spent Spring Semester building a test environment.
  – Several dead ends.
• Original Test Environment consisted of:
  – VMWare Workstation 5.x[10]
  – Fedora Core 6[11]
  – Custom build of KURT Linux 2.6.18, .19, .20/STREAMs[12]
  – Custom Linux Kernel build 2.6.18[13]
  – Java 1.5[1,3,5,8,9]
  – Strace[14]
• KURT Linux is incapable of capturing System Calls per lwp out-of-the-box.
• Explored using strace on Linux; ran into problems with high thread counts.

Final Test Environment
• Final Test Environment
  – 2 Dell PCs (2 CPU, 2GB memory)
  – OpenSolaris 2.11[28]
  – Java 1.6[4,5,6,8,9]
  – JBoss 3.2.8b[15]
  – MySQL v5.0 Community Server[29]
  – JMeter v2.2[31]
  – Java PetStore eCommerce application (J2EE Spec v1.3)[32]
  – PetStore configuration adaptation for JBoss[33]
• Had to move to the Java 1.6.0 that ships with OpenSolaris 2.11 in order to utilize plug points with DTrace.
• This Masters Project was completed using almost entirely Open Source tools.
  – Note: OpenSolaris and DTrace are released under the OpenSolaris Binary License & CDDL (OSI approved) license[34,35].
  – Java 1.6 is not Open Source. JDK 1.7 will be Open Source[36].

What information is available?
• Available Information
  – System Calls[41]
  – File Descriptor, I/O SysCall patterns[42]
  – CPU utilization
    • Traditional (User, Kernel, Idle, I/O Wait)[30]
    • Micro-State Accounting information[30]
• Other information is available; scope was limited to the above.
• Must be gathered with minimal overhead.

Gathering Information
• Each type of data has a tool of choice
  – System Calls -> DTrace/dtruss[7,27,37,38]
  – Traditional CPU Utilization -> vmstat, prstat[40,43,44]
  – Micro-State Accounting -> prstat[36,40]
• This project focuses on the use of System Call sequences (broken down per thread).

Practical Uses
• The techniques developed here could have a practical use in industry.
• For example, a System Administrator or Performance Engineer managing/monitoring a complex J2EE installation.
  – Such as BEA WebLogic, IBM WebSphere, or Red Hat JBoss[45,46,47]
  – Or a similar multithreaded middleware environment

Our Approach
• Original Goal: build an OS Scheduler with an ability to distinguish between a thread whose behavior is desirable and one whose behavior is undesirable.
• Chose to focus on a prerequisite.
  – What data do we gather?
  – Techniques for gathering and processing that data.
• Focused on the classification of threads within a multithreaded process by data that can be gathered
  – on a per-thread basis at run-time
  – efficiently
• First step towards building this enhanced OS Scheduler.

Previous Work
• Work in the areas of OS Security research and IDS systems has used system call analysis heavily[19,20,21,48,49,43,42].
  – SubDomain™: Parsimonious Server Security
  – Improving Host Security with System Call Policies
  – Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools
  – A Secure Environment for Untrusted Helper Applications Confining the Wily Hacker
  – Ostia: A Delegating Architecture for Secure System Call Interposition

Classifying Thread Functions
• Using System Call information, I created a method to visually classify a thread’s function.
• Experimented with different machine-learning algorithms to try to accurately predict thread function.

The Method For Classifying Threads
• Basis for a new method that can be used to classify threads and determine if they are behaving correctly.
• Produces a visual fingerprint of a thread’s behavior.
• Produces a representation of run-time characteristics that would otherwise be difficult to analyze, visualize, & bring together.

Subject Of The Method – Modern Middleware
• Modern (especially Java-based) middleware involves one or more processes with many threads & moving pieces.
• Capturing the behavior of a single thread or the interaction between the constituent pieces can be challenging.
• We used JBoss 3.2.8SP1 as a representative piece of modern middleware for this project.

JBoss Internals
• JMX Micro-Kernel Architecture
  – All the major J2EE subsystems are JMX beans.
• JBoss 3.x fully supports the J2EE 1.3 Spec.
  – 3.2.8SP1 was used for its maturity and to match the Java PetStore application version used.
  – For more information, see [2].
• Note: JBoss 5.x is a complete architectural redesign.

High-Level Classification Of Threads In A JBoss J2EE Container
[Diagram slide]

Data Gathering & Processing
• Tools used to gather data during a load test
  – DTrace[7]
  – DTrace Toolkit[37]
  – dtruss[38]
  – Bash Shell Scripting[39]
  – GNU Tools[18]
  – prstat[40]
• Tools used to process data after a load test
  – GNUPlot[17]
  – Bash Shell & other GNU tools[18]
  – RapidMiner[16]

Thread groups I am studying.
• Thread Groups (collections of threads that perform similar functions)
  – HTTP Processor
  – JMS Thread(3)
  – JMS Session Workers
  – Connection Consumer
  – JBoss MQ Cache Reference Softener
  – Scanner Thread
  – Young GC Threads
  – Old Gen GC Thread
  – JIT Compiler
  – HSQLDB Timer
  – TimeOut Factory Thread
• Why were the other threads left out?
  – Couldn’t capture thread type via a Java Thread Dump.
  – Insufficient number of System Calls made by thread during load test.
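
The gathering step and the exclusion rule above (threads that made too few system calls were left out) can be sketched roughly as follows. This is a minimal illustration, not the project's Bash/DTrace tooling: the two-column "lwpid syscall" input format, the sample data, and the MIN_CALLS threshold are all assumptions made for the example, and real dtruss output carries more columns.

```python
from collections import Counter, defaultdict

# Hypothetical trace format: one "<lwpid> <syscall>" pair per line.
# Real dtruss/DTrace output has more columns; this is a simplification.
SAMPLE_TRACE = """\
12 read
12 write
12 read
15 pollsys
12 lwp_park
15 pollsys
20 read
"""

MIN_CALLS = 2  # assumed threshold; the slides do not state the actual cutoff


def count_syscalls_per_thread(trace_text):
    """Return {lwpid: Counter({syscall: count})} for a trace."""
    counts = defaultdict(Counter)
    for line in trace_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        lwpid, syscall = fields
        counts[int(lwpid)][syscall] += 1
    return dict(counts)


def drop_quiet_threads(counts, min_calls=MIN_CALLS):
    """Keep only threads that made at least min_calls system calls."""
    return {lwp: c for lwp, c in counts.items() if sum(c.values()) >= min_calls}


per_thread = count_syscalls_per_thread(SAMPLE_TRACE)
kept = drop_quiet_threads(per_thread)
for lwp in sorted(kept):
    print(lwp, dict(kept[lwp]))
```

In this sample, lwp 20 makes a single call and is dropped, mirroring the reason some thread types could not be studied.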

Result 1 – SysCall Graphs
• Hypothesis:
  – We can use OS data (such as system call usage) to build a graphical representation (histogram) that uniquely identifies each type of thread (Thread Group).
• Result:
  – For many thread types, yes. Classifying system call sequences using Thread Dumps shows that there is an identifiable pattern of System Calls in many thread types.

Data Processing: SysCall Graphs
• Split into individual threads.
• Replace system call names with #'s
• Produce frequency counts
• Build GNUPlot files
• Generate PNG images
• Generate HTML page

How To Map Threads & Graphs
• The Sun JVM has the ability to pause all threads and print, for each:
  – full call stack
  – thread description
  – native lwpid
• Several thread dumps were captured during load tests.
• Matched LWPIDs to NIDs (Native IDs) in the Thread Dump.

Graph Format
• 3-Dimensional
  – X: Time*
  – Y: System Call Type
  – Z: Frequency
• *Relative time-frames are not represented.

[SysCall graphs: Cache Reference Softener / Connection Consumer / Young GC Thread; HSQLDB Timer / HTTP Processor / TimeOutFactory]

[SysCall graphs: JIT Compiler / JMS Thread(3) / Session Worker; Old GC Thread / Scanner Thread]

Results
• Using these results we are able to categorize many of the threads with the SysCall Graphs.
• From there, we were able to compare SysCall Graphs within a single run and between different runs.
• Visually recognizable pattern for each of the Thread Types that we are looking at.
  – This pattern holds for threads of the same type in each run.
  – This pattern holds for threads of the same type in different runs.

[SysCall graphs, comparisons between runs: Connection Consumers / JMS Session Workers / HTTP Processor]

Statistical Analysis of Data
• Tried Nearest Neighbor on the actual sequence using Euclidean & Nominal measures. Unsuccessful.
  – Different length sequences
• Experimented with Hyper Planes. Unsuccessful.
• Experimented with 1st Order Markov Chains. Unsuccessful.
• Tried NN on SysCall counts of a thread using Euclidean Measure.
  – Greatest success
  – Not perfect

Result 2 – Nearest Neighbor
• Hypothesis 2:
  – We can apply machine learning techniques to predict the different thread types using the data we have gathered.
• Result:
  – Using Nearest Neighbor on the system call counts we can partially do this.

Data Processing (Result 2)
• RapidMiner Data Files[16]
  – Define an ARFF model definition file.
  – Define an AML test data definition file.
  – Put test data into a space-delimited file.
  – Define Nearest Neighbor XML file.
    • Produces a RapidMiner model file.
  – Define ModelLoader XML file.
    • Loads a model file and test data.
    • Forms predictions regarding test data.
  – Produces a data file that lists predictions and confidence values for each row in the data file.

Results
Thread Type               # of Threads  Run1 Correct  Run2 Correct  Run3 Correct  Run4 Correct  Run5 Correct
Cache Reference Softener  1             1             1             1             1             1
Connection Consumer       6             0             0             0             0             0
HSQLDB Timer              1             0             1             1             1             1
HTTP Processor            9             8             8             8             8             8
JIT Compiler              2             2             2             2             2             2
JMS Thread(3)             10            10            10            10            10            10
Old GC Thread             1             0             1             1             1             1
Scanner Thread            1             0             1             1             1             1
Session Worker            15            6             15            15            15            15
Timeout Factory           1             0             0             0             0             1
Young GC Thread           2             2             2             2             2             2

What It Did NOT Accurately Predict
• No Connection Consumer threads were accurately predicted with Nearest Neighbor.
  – System Call counts very similar to other threads.
• One HTTP Processor thread mispredicted.
  – This thread handled very little traffic. As a result, its system call counts were significantly different.
• Shows shortcomings of the Nearest Neighbor (Euclidean Distance Measure) algorithm for our purposes.

Future Directions
• Rth-Level Markov Chain modeling of system call sequences to accurately predict Thread Functions[48,54].
• Using Micro-State Accounting data to fingerprint/predict thread types[36].

Questions?
• Thank you.

References
1. Java™ 2 Platform Standard Edition 5.0 API Specification. 29 Sept. 2004. Sun Microsystems Inc. 13 Jan. 2007 <http://java.sun.com/j2se/1.5.0/docs/api/>
2. Research Project: An Analysis of JBoss Architecture. Liu, Jenny. 29 Apr. 2002. School of Information Technologies, University of Sydney. 13 Jan. 2007 <http://www.huihoo.org/jboss/jboss.html.>
3. JDK™ 5.0 Documentation. 29 Sept. 2004. Sun Microsystems Inc. 13 Jan. 2007 <http://java.sun.com/j2se/1.5.0/docs/>
4. Java™ Platform, Standard Edition 6 API Specification. 12 Dec. 2006. Sun Microsystems Inc. 1 Apr. 2007 <http://java.sun.com/javase/6/docs/api/>
5. HotSpot Runtime Overview. OpenJDK Project. 15 Apr 2007. <https://openjdk.dev.java.net/hotspot/docs/RuntimeOverview.html>
6. JDK™ 6 Documentation. 12 Dec. 2006. Sun Microsystems Inc. 1 Apr. 2007 <http://java.sun.com/javase/6/docs/>
7. OpenSolaris Community: Dtrace. OpenSolaris. Sun Microsystems Inc. 1 Apr 2007 <http://www.opensolaris.org/os/community/dtrace/>
8. The Java Language Specification, Third Edition. 1 Jan 2005. Gosling, James. Joy, Bill. Steele, Guy. 13 Jan. 2007 <http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html>
9. The Java™ Virtual Machine Specification Second Edition. 1999. Lindholm, Tim. Yellin, Frank. 13 Jan. 2007 <http://java.sun.com/docs/books/jvms/second_edition/html/VMSpecTOC.doc.html>
10. Workstation 5 User’s Manual. 16 Sept. 2005. Vmware, Inc. 13 Jan. 2007. <http://www.vmware.com/pdf/ws5_manual.pdf>
11. Fedora Project – Fedora Core 6. 22 Oct 2006. RedHat, Inc. 13 Jan. 2007. <http://www.fedoraproject.org/>
12. KU System Programming. The University of Kansas. 13 Jan. 2007 <http://wiki.ittc.ku.edu/kusp_wiki/index.php/Main_Page>
13. The Linux Kernel Archive. 12 Jan 2007. Linux Kernel Organization, Inc. 13 Jan 2007 <http://www.kernel.org/>
14. Strace Project. 13 Jan 2007. Strace Project <http://sourceforge.net/projects/strace/>
15. JBoss Admin Development Guide. 2004. JBoss, Inc. 13 Jan 2007. <http://docs.jboss.org/jbossas/admindevel326/html/>
16. Mierswa, I. and Wurst, M. and Klinkenberg, R. and Scholz, M. and Euler, T., Yale (now: RapidMiner): Rapid Prototyping for Complex Data Mining Tasks. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2006), 2006.
17. gnuplot homepage. 15 Apr 2007. Williams, Thomas. Kelley, Colin. <http://www.gnuplot.info>
18. The GNU Operating system - the GNU project - Free Software Foundation - Free as in Freedom - GNU/Linux. 15 Apr 2007. Free Software Foundation. <http://www.gnu.org>
19. Design and Performance of Configurable Endsystem Scheduling Mechanisms
20. The Design, Modeling, and Implementation of Group Scheduling for Isolation of Computations from Adversarial Interference.
21. Group Scheduling in SELinux to Mitigate CPU-Focused Denial of Service Attacks.
22. SubDomain™: Parsimonious Server Security
23. Improving Host Security with System Call Policies
24. Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools
25. A Secure Environment for Untrusted Helper Applications Confining the Wily Hacker
26. Ostia: A Delegating Architecture for Secure System Call Interposition
27. Solaris Dynamic Tracing Guide. 5 Sep. 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://docs.sun.com/app/docs/doc/817-6223>
28. OpenSolaris v2.11 Home at OpenSolaris.org. 1 Jun. 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://www.opensolaris.org/os/>
29. MySQL 5.0 Reference Manual. MySQL AB. 1 Apr 2007. <http://dev.mysql.com/doc/refman/5.0/en/manual-info.html>
30. Solaris Internals CPU/Processor. 15 July 2007. Solaris Internals. 1 Nov. 2007. <http://www.solarisinternals.com/wiki/index.php/CPU/Processor>
31. JMeter: Users Manual. 1 Jun. 2006. Apache Jakarta Project. 15 Apr 2007 <http://jakarta.apache.org/jmeter/usermanual/intro.html>
32. Java Pet Store Demo 1.3.2. 4 Aug. 2003. Sun Microsystems, Inc. 13 Jan 2007 <http://java.sun.com/blueprints/code/jps132/docs/index.html>
33. Java Petstore Tutorial. MobileFish. 13 Jan 2007 <http://www.mobilefish.com/tutorials/petstore_1_3_2/petstore_1_3_2_quickguide_jbossmysql.html>
34. OpenSolaris Binary License. 4 Nov. 2005. Sun MicroSystems. 1 Apr. 2007. <http://opensolaris.org/os/licensing/opensolaris_binary_license/>
35. COMMON DEVELOPMENT AND DISTRIBUTION LICENSE (CDDL). 24 Jan 2004. Sun Microsystems, Inc. 15 Apr. 2007 <http://www.sun.com/cddl/cddl.html>
36. The GNU General Public License, Version 2. 1 Jun. 1991. Free Software Foundation. 1 Nov 2007 <http://www.fsf.org/licensing/licenses/info/GPLv2.html>
37. OpenSolaris Community: Dtrace. 1 Jun. 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://www.opensolaris.org/os/community/dtrace>
38. DTraceToolkit at OpenSolaris.org. 1 Jun 2005. Sun Microsystems, Inc. 1 Apr 2007 <http://www.opensolaris.org/os/community/dtrace/dtracetoolkit>
39. Bash Reference Manual. 15 Jul. 2002. Free Software Foundation. 1 Apr 2007 <http://www.gnu.org/software/bash/manual/bashref.html>
40. prstat(1M). 4 Jan. 2001. Sun Microsystems, Inc. 1 Apr 2007 <http://docs.sun.com/app/docs/doc/816-0211/6m6nc673u?a=view>
41. man pages section 2: System Calls. 4 Oct 2005. Sun Microsystems, Inc. 1 Apr 2007. <http://docs.sun.com/app/docs/doc/816-5167?l=en>
42. S. Zanero, Unsupervised Learning Algorithms for Intrusion Detection, Ph.D. Thesis, DEI Politecnico di Milano, 2006
43. The Design, Modeling, and Implementation of Group Scheduling for Isolation of Computations from Adversarial Interference
44. vmstat(1M). 20 Dec. 2004. Sun Microsystems, Inc. 1 Apr 2007 <http://docs.sun.com/app/docs/doc/816-5166/6mbb1kqjv?a=view>
45. BEA Weblogic Server 10.0. 13 Dec. 2006. BEA Systems, Inc. 1 Nov 2007 <http://edocs.bea.com/wls/docs100/index.html>
46. WebSphere Application Server documentation. 29 May 2006. IBM Inc. 1 Nov 2007 <http://publib.boulder.ibm.com/infocenter/wasinfo/v6r1/index.jsp?topic=/com.ibm.websphere.base.doc/info/welcome_base.html>
47. JBoss.org: Community Documentation. 2004. Redhat, Inc. 13 Jan 2007 <http://labs.jboss.com/projects/docs/>
48. Markov Chain paper
49. Group Scheduling in SELinux to Mitigate CPU-Focused Denial of Service Attacks
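
As a supplement to the Result 2 slides, the Nearest Neighbor idea (Euclidean distance over per-thread system call counts) can be sketched as below. This is a plain-Python stand-in for illustration only; the project itself used RapidMiner, and the thread labels' count vectors here are invented, not taken from the load tests. The function names are hypothetical.

```python
import math

# Hypothetical training data: (thread type, {syscall: count}).
# The counts are invented for illustration, not measured values.
TRAINING = [
    ("HTTP Processor", {"read": 120, "write": 110, "pollsys": 40}),
    ("JMS Thread", {"read": 15, "write": 10, "lwp_park": 200}),
    ("JIT Compiler", {"mmap": 30, "lwp_park": 25}),
]


def euclidean(a, b):
    """Euclidean distance between two sparse count vectors (dicts)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0) - b.get(k, 0)) ** 2 for k in keys))


def predict(sample, training=TRAINING):
    """Return the label of the training vector closest to sample (1-NN)."""
    label, _ = min(training, key=lambda item: euclidean(sample, item[1]))
    return label


# A busy HTTP thread sits close to the HTTP training vector.
busy_http = {"read": 100, "write": 95, "pollsys": 35}
# A nearly idle HTTP thread has small counts and lands nearer another type.
idle_http = {"read": 3, "write": 2, "pollsys": 1}

print(predict(busy_http))
print(predict(idle_http))
```

With these made-up numbers the idle thread is assigned to a different type than the busy one, which is the same kind of failure the slides report for the HTTP Processor thread that handled very little traffic: raw counts under a Euclidean measure are sensitive to how much work a thread did, not only to what kind of work it did.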