Eureka: A Framework for Enabling Static Malware Analysis
13th European Symposium on Research in Computer Security (ESORICS), 2008
WANG Zhi

Outline
1 Overview of Generic Unpacker
2 System Call Level Heuristic
3 Statistics-Based Unpacking
4 Evaluation Metrics

Overview of Unpacker
- Static analyses: decompile and analyze the logical structure, flow, and data stored within the binary itself.
- Dynamic analyses: monitor the behavior of the malware binary at runtime.
  - Fine-grained monitor (instruction-level)
  - Coarse-grained monitor (page-level)

Generic Automatic Unpackers

Unpacker     Granularity          Trigger                     Speed
PolyUnpack   Instruction-level    Model-based                 slow
Renovo       Instruction-level    Heuristic                   slow
OmniUnpack   Page-level           Heuristic                   fast
Eureka       System-call-level    Heuristic and statistical   fast

The variability in unpacking strategies comes from the granularity at which unpacking behavior is tracked.

Eureka
- Coarse-grained execution tracing (NtTerminateProcess, NtCreateProcess)
- Statistical bigram analysis

Coarse-grained Execution Tracing
- Eureka uses the program-exit event as a trigger.
- A call to NtTerminateProcess implies that the unpacked malicious payload has been successfully decrypted.
- A large fraction of current malware uses a new process (NtCreateProcess) to execute the unpacked malicious payload.

Problems
- Not all malware exits; some keeps an executing version resident in memory.
- Packers can generate spurious process-creation events.
- Malware authors can simply avoid exiting the malware process.
- The two simple heuristics above may work for a large fraction of today's malware (as much as 80%), but this may not hold for future malware.

Evaluation

Statistical bigram analysis
- Mine statistical patterns in x86 code.
- Use simple n-gram analysis.
- Use IDA Pro to extract the regions of an executable that are marked as functions.
- Look for the most common bigrams (opcode pairs or 2-byte opcodes) and space bigrams (byte pairs separated by 1 or more bytes).
- Finding: FF 15 (call), FF 75 (push), E8---00 and E8---FF are prevalent in x86 code.

Occurrence summary of bigrams

Bigram           calc   explorer   notepad   ping   shutdown
FF 15 (call)      246       3045       415     58        132
FF 75 (push)      235       2494       245     41         85
E8---FF (call)   1583       2201       180     87         49
E8---00 (call)    746       1091       108     57         66

Bigram Counts
[Figure: bigram counts during execution of a goat file packed with ASPack]

Bigram Counts
[Figure: bigram counts during execution of a goat file packed with MoleBox]

Bigram Counts
[Figure: bigram counts during execution of a goat file packed with Armadillo]

Bigram Counts
- There are consistent and significant shifts in the bigram counts.
- The simple bigram counting approach had over a 95% success rate in distinguishing between packed and unpacked malware instances.

Evaluation Metrics

Code-to-data ratio
- An observable difference between packed and unpacked code is the amount of identifiable code and data found in the binary.
- Use IDA Pro to identify valid code sequences.
- In IDA Pro, data are represented by db, dw or dd.
- In packed executables, the ratio is below 3%.
- In unpacked executables, the ratio is above 50%.

Code-to-data ratio
[Figure: code-to-data ratio, packed vs. unpacked]

Code-to-data ratio
[Figure: memory space of the original notepad.exe and the packed notepad.exe; grey areas stand for data, blue areas stand for code]
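
The bigram counting described on the "Statistical bigram analysis" slides could be sketched roughly as follows. This is a minimal illustration, not Eureka's implementation: the function name count_bigrams is hypothetical, and the gap of three bytes in the E8 patterns is an assumption (one intervening byte per dash in "E8---00").

```python
from collections import Counter

# Bigrams named on the slides.
ADJACENT = {(0xFF, 0x15): "FF 15", (0xFF, 0x75): "FF 75"}
SPACED = {(0xE8, 0x00): "E8---00", (0xE8, 0xFF): "E8---FF"}
GAP = 3  # assumption: three bytes sit between E8 and the trailing 00/FF


def count_bigrams(buf: bytes) -> Counter:
    """Count the adjacent and spaced bigrams of interest in a byte buffer."""
    counts = Counter()
    for i in range(len(buf) - 1):
        # Adjacent pair: bytes at i and i + 1.
        name = ADJACENT.get((buf[i], buf[i + 1]))
        if name:
            counts[name] += 1
        # Spaced pair: bytes at i and i + GAP + 1.
        j = i + GAP + 1
        if j < len(buf):
            name = SPACED.get((buf[i], buf[j]))
            if name:
                counts[name] += 1
    return counts


# Hand-written sample bytes (not taken from any real binary).
sample = bytes.fromhex("ff1500104000" "e810000000" "ff7508" "e8f0ffffff")
print(count_bigrams(sample))
```

Running the counter over snapshots of process memory taken at different points in execution would give the kind of count series shown in the "Bigram Counts" figures; how Eureka samples memory is not specified on the slides.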
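
The thresholds on the "Code-to-data ratio" slide can be turned into a small check. The sketch below is illustrative only: the slides do not define how the ratio is computed, so it assumes the ratio is code bytes as a fraction of all identified code and data bytes, and the names code_fraction and classify are hypothetical. The byte counts are assumed to come from a disassembler such as IDA Pro.

```python
def code_fraction(code_bytes: int, data_bytes: int) -> float:
    """Fraction of identified bytes that are code (assumed definition)."""
    total = code_bytes + data_bytes
    if total == 0:
        raise ValueError("no code or data bytes identified")
    return code_bytes / total


def classify(code_bytes: int, data_bytes: int) -> str:
    """Apply the slide's thresholds: below 3% packed, above 50% unpacked."""
    fraction = code_fraction(code_bytes, data_bytes)
    if fraction < 0.03:
        return "likely packed"
    if fraction > 0.50:
        return "likely unpacked"
    # The slides give no guidance for values between the two thresholds.
    return "inconclusive"


# Made-up byte counts for illustration.
print(classify(2_000, 98_000))
print(classify(60_000, 40_000))
```

Values between the two thresholds are reported as inconclusive here because the slides state no rule for that range.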