SRE Basics
In this Section…
We briefly cover the following topics
o Assembly code
o Virtual machine/Java bytecode
o Windows PE file format
Assembly Code
High Level Languages
First, high level languages…
Ancient high level languages
o Basic --- little structure
o FORTRAN --- limited structure
o C --- “structured” language
C was designed to deal with complexity
o OO languages take this one step further
Above languages are considered primitive today
High Level Languages
Object-oriented (OO) languages
o “Object” groups code and data together
o Considered the best way to handle complexity
(at least for now…)
Important OO ideas include
o Encapsulation, inheritance, polymorphism
High Level Languages
Program must deal with code and data
Data
o Variables, data structures, files, etc.
Code
o Reverser must study control flow
o Conditionals, switches, loops, etc.
High Level Languages
High level languages --- different users
want different things
o Goes back (at least) to C vs FORTRAN
Today, major tradeoff is between
simplicity and flexibility
o Simplicity --- easy to write short program to do
exactly what you want (e.g., C)
o Flexibility --- language has it all (e.g., Java)
High Level Languages
Some languages compiled into native code
o exe is specific to the hardware
o C, C++, FORTRAN, etc.
Other languages “compiled” into “code”,
which is interpreted by a virtual machine
o Java, C#
o Often possible to make compiled version
For reverser, this distinction is far more
important than OO or not
Intro to Assembly
At the lowest level, machine binary
Assembly code lives between binary
and high level languages
When reversing native code, we must
deal with assembly code
o Why assembly code?
o Why not “reverse” binary to, say, C?
Intro to Assembly
Reverser would like to deal with high
level, but is stuck with low level
Ideally, want to create mental “link”
from low level to high level
o Easier for code written in C
o Harder for OO code, such as C++
o Why?
Intro to Assembly
Perhaps biggest difference at assembly
level is dealing with data
o High level languages hide lots and lots of details
on data manipulations
o For example, loading and storing
Also, low level instructions are primitive
o Each instruction does not do very much
Intro to Assembly
Consider the following simple C program
int multiply(int x, int y)
{
    int z;
    z = x * y;
    return z;
}
Simple, but far higher level than
assembly code
Intro to Assembly
int multiply(int x, int y)
{
    int z;
    z = x * y;
    return z;
}
In assembly code…
1. Store state before entering function
2. Allocate memory for z
3. Load x and y into registers
4. Multiply x by y and store result in register
5. Copy result back to memory for z (optional)
6. Restore state that was stored in step 1
7. Return z
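The seven steps can be sketched in C with a toy model of a stack and two registers. This is an illustration only, not real IA-32 behavior: the names ToyCpu, toy_init, and toy_multiply are invented here, the return address that CALL would push is left out, and the argument order on the stack is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_STACK_WORDS 16

/* Toy model (hypothetical): a small stack plus two "registers". */
typedef struct {
    int32_t eax, ecx;                 /* stand-ins for EAX and ECX       */
    int32_t stack[TOY_STACK_WORDS];   /* stack grows toward index 0      */
    int sp;                           /* index of current top of stack   */
    int bp;                           /* frame base, like EBP            */
} ToyCpu;

void toy_init(ToyCpu *cpu)
{
    *cpu = (ToyCpu){0};
    cpu->sp = TOY_STACK_WORDS;        /* empty stack */
    cpu->bp = TOY_STACK_WORDS;
}

static void push(ToyCpu *cpu, int32_t v)
{
    assert(cpu->sp > 0);
    cpu->stack[--cpu->sp] = v;
}

static int32_t pop(ToyCpu *cpu)
{
    assert(cpu->sp < TOY_STACK_WORDS);
    return cpu->stack[cpu->sp++];
}

int32_t toy_multiply(ToyCpu *cpu, int32_t x, int32_t y)
{
    /* Caller side: parameters go on the stack (y first, then x). */
    push(cpu, y);
    push(cpu, x);

    push(cpu, cpu->bp);                   /* 1. store state (old frame base) */
    cpu->bp = cpu->sp;
    cpu->sp -= 1;                         /* 2. allocate one slot for z      */
    cpu->eax = cpu->stack[cpu->bp + 1];   /* 3. load x ...                   */
    cpu->ecx = cpu->stack[cpu->bp + 2];   /*    ... and y into registers     */
    cpu->eax = cpu->eax * cpu->ecx;       /* 4. multiply, result in register */
    cpu->stack[cpu->bp - 1] = cpu->eax;   /* 5. copy result to z (optional)  */
    cpu->sp = cpu->bp;                    /* 6. restore state from step 1    */
    cpu->bp = pop(cpu);
    cpu->sp += 2;                         /*    caller discards parameters   */
    return cpu->eax;                      /* 7. return z in the register     */
}
```

After toy_init, a call such as toy_multiply(&cpu, 6, 7) leaves the toy stack exactly as it found it, which is the point of steps 1 and 6.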
Intro to Assembly
Why are things so complicated at low level?
It’s all about efficiency!
Reading memory and storing are slow
Often no single asm instruction to read memory,
operate on it, and store result
o But this is common in high level languages
Intro to Assembly
Registers --- “local” processor memory
o So don’t have to read and write RAM
Stack --- “scratch paper” (in RAM)
o Holds register values, local variables, function
parameters and return values
o E.g., storage for “z” in multiply example
Heap --- dynamic, variable-sized data
Data section --- e.g., string constants
Control flow --- high level “if” or “while” are much
more complex at low level
Registers
Registers used in most instructions
Specifics here deal with “IA-32”
o Intel Architecture, 32-bit
o Used in “Wintel” machines
o We use IA-32 notation
o AT&T notation also exists
Eight 32-bit registers (next slide)
o All 8 start with “E”
o Also several system registers
Registers
EAX, EBX, EDX --- generic, used for int,
Boolean, …, memory operations
ECX --- generic, used as counter
ESI/EDI --- generic, source/destination
pointers when copying memory
o SI == source index, DI == destination index
EBP --- generic, stack “base” pointer
o Usually, stack position after return address
ESP --- stack pointer
o Current stack frame is between ESP and EBP
Flags
EFLAGS --- special register
o Status flags updated by various operations to
“record” outcomes
o System flags too, but we don’t care about them
Flags are basic tool for conditionals
For example, a TEST followed by a jump
instruction
o TEST sets various flags, jump determines
action to take, based on those flags
Instruction Format
Most instructions consist of…
o Opcode --- the “instruction”
o One or two operands --- “parameter(s)”
Operands (parameters) are data
Operands come in 3 flavors
o Register name --- for example, EAX
o Immediate --- e.g., hard-coded constant
o Memory address --- enclosed in [brackets]
Operand Examples
EAX
o Read from (or write to) EAX register,
depending on opcode
0x30004040
o Immediate --- number is embedded in code
o Usually a constant in high-level code
[0x4000349e]
o This is a memory address
o Could be a global variable in high level code
Basic Instructions
We cover a few common instructions
o First we give general format
o Later, we give a few simple examples
There are lots of assembly instructions
But, most assembly code uses only a few
o About 14 assembly instructions account for more
than 90% of all code
Opcode Counts
Typical opcode counts, “normal” code
[Figure: chart of opcode counts; image not available]
Opcode Counts
Opcode counts, typical virus code
[Figure: chart of opcode counts; image not available]
Instructions
We consider the following operations
o Moving data
o Arithmetic
o Comparisons
o Conditional branches
o Function calls
Moving Data
MOV is the most popular opcode
2 operands, destination and source:
o MOV DestOperand, SourceOperand
Note the order
o Destination first, source second
Arithmetic
Six integer arithmetic operations
o ADD, SUB, MUL, DIV, IMUL, IDIV
Many variations based on operands
o ADD Op1, Op2 ; add, store result in Op1
o SUB Op1, Op2 ; sub Op2 from Op1 --> Op1
o MUL Op ; mul Op by EAX ---> EDX:EAX
o DIV Op ; div EDX:EAX by Op
quotient ---> EAX, remainder ---> EDX
o IMUL, IDIV --- like MUL and DIV, but signed
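The EDX:EAX convention for MUL and DIV is easy to misread, so here is a small C model of the 32-bit unsigned forms. It is a sketch, not an emulator: Regs, model_mul, and model_div are names invented for this example, and the divide-error fault a real processor raises is reduced to a return value of 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register pair used by the one-operand MUL and DIV. */
typedef struct {
    uint32_t eax;
    uint32_t edx;
} Regs;

/* MUL Op: unsigned EAX * Op, 64-bit product split across EDX:EAX. */
void model_mul(Regs *r, uint32_t op)
{
    uint64_t product = (uint64_t)r->eax * op;
    r->eax = (uint32_t)product;           /* low 32 bits  */
    r->edx = (uint32_t)(product >> 32);   /* high 32 bits */
}

/* DIV Op: unsigned EDX:EAX / Op, quotient to EAX, remainder to EDX.
 * Returns 0 where real hardware would raise a divide error
 * (zero divisor, or quotient too large for 32 bits), else 1. */
int model_div(Regs *r, uint32_t op)
{
    if (op == 0)
        return 0;
    uint64_t dividend = ((uint64_t)r->edx << 32) | r->eax;
    uint64_t quotient = dividend / op;
    if (quotient > UINT32_MAX)
        return 0;
    r->edx = (uint32_t)(dividend % op);
    r->eax = (uint32_t)quotient;
    return 1;
}
```

Multiplying 0xFFFFFFFF by 2 overflows 32 bits, so the carry lands in EDX; dividing the pair by 2 again recovers the original EAX.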
Comparisons
CMP opcode has 2 operands
o CMP Operand1, Operand2
Subtracts Operand2 from Operand1
Result “stored” in flag bits
o If 0 then ZF flag is set
o Other flags can be used to tell which is
greater, depending on signed or unsigned
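As a rough sketch of how the flag bits encode the outcome, the C model below computes the four status flags that the Jcc instructions consult after a 32-bit CMP. Flags, model_cmp, unsigned_below, and signed_less are names made up for this example; the conditions match JB (CF set) and JL (SF differs from OF).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the status flags set by CMP. */
typedef struct {
    int zf;   /* zero flag: operands equal                    */
    int cf;   /* carry flag: unsigned borrow occurred         */
    int sf;   /* sign flag: top bit of the difference         */
    int of;   /* overflow flag: signed subtraction overflowed */
} Flags;

/* CMP a, b: compute a - b, keep only the flags. */
Flags model_cmp(uint32_t a, uint32_t b)
{
    uint32_t diff = a - b;
    Flags f;
    f.zf = (diff == 0);
    f.cf = (a < b);
    f.sf = (int)((diff >> 31) & 1u);
    /* Overflow: operands had different signs and the result's
     * sign differs from the first operand's sign. */
    f.of = (int)((((a ^ b) & (a ^ diff)) >> 31) & 1u);
    return f;
}

/* Unsigned "a below b", as tested by JB. */
int unsigned_below(Flags f) { return f.cf; }

/* Signed "a less than b", as tested by JL. */
int signed_less(Flags f) { return f.sf != f.of; }
```

The same bit pattern 0xFFFFFFFF compared with 1 is "less" when read as signed (it is -1) but not "below" when read as unsigned, which is why different flags are used for the two cases.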
Conditional Branches
Conditional branches use “Jcc” family
of instructions (je, jne, jz, jnz, etc.)
Format is
o Jcc TargetAddress
If Jcc true, goto TargetAddress
o Otherwise, what happens?
Function Calls
Use CALL and RET
o CALL FunctionAddress
……
o RET ; pops return address
RET can be told to increment ESP
o Need to reset stack pointer
o Why?
Examples
cmp ebx,0xf020
jnz 10026509
What does this do?
Compares value in EBX with constant
Jumps to specified address if
operands are not same
o Note: JNE and JNZ are same instruction
Examples
mov edi,[ecx+0x5b0]
mov ebx,[ecx+0x5b4]
imul edi,ebx
What does this do?
First, add 0x5b0 to ECX register, get value
at that memory and put in EDI
Next, add 0x5b4 to ECX, get value at that
memory and put in EBX
o Note that ECX points to some data structure
Finally, EDI = EDI * EBX
o Note there are different forms of IMUL
Examples
push eax
push edi
push ebx
push esi
push dword ptr [esp+0x24]
call 0x10026eeb
What does this do?
PUSH four register values
PUSH something related to stack ptr
o Probably, parameter or local variable
o Would need to look at more code to decide
o Note “dword ptr” is effectively a cast
CALL a function
Examples
mov eax, dword ptr [ebp - 0x20]
shl eax, 4
mov ecx, dword ptr [ebp - 0x24]
cmp dword ptr [eax+ecx+4], 0
call 0x10026eeb
What does this do?
Maybe “data structure in an array”
The cmp line
o ECX --- gets base pointer
o EAX --- current offset into the array
(index shifted left by 4, so 16-byte elements)
o Add 4 to get specific member of structure
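One C-level reading of that address arithmetic is sketched below. The Record layout and every field name are guesses for illustration; all the disassembly actually tells us is a 16-byte stride and a 4-byte member at offset 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte record; field names are invented. */
typedef struct {
    uint32_t first;     /* offset 0                              */
    uint32_t flag;      /* offset 4: the member the cmp looks at */
    uint32_t rest[2];   /* pads the record out to 16 bytes       */
} Record;

/* Mirrors [eax+ecx+4]: base (ECX) + (index << 4) (EAX) + 4. */
uint32_t *flag_address(Record *base, uint32_t index)
{
    uintptr_t offset = (uintptr_t)index << 4;
    return (uint32_t *)((unsigned char *)base + offset + 4);
}

/* What the source code probably looked like: array[index].flag == 0 */
int flag_is_zero(const Record *base, uint32_t index)
{
    return base[index].flag == 0;
}
```

Both functions touch the same memory; the first spells out what the compiler generated, the second is what a reverser would hope to reconstruct.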
Examples
AT&T syntax
pushl $14
pushl $helloWorld
pushl $1
movl $4, %eax
pushl %eax
int $0x80
addl $16, %esp
pushl $0
movl $1, %eax
pushl %eax
int $0x80
Compilation
Converts high level representation of code
to binary
Front end --- lexical analysis
o Verify syntax, etc.
Intermediate representation
Optimization
o Improve structure, eliminate redundancy, …
Compilation
Back end --- generates the actual code
o Instruction selection
o Register allocation
o Instruction scheduling --- pipelining, parallelism
Back end process might make disassembly
hard to read
o Optimization too
Each compiler has its own quirks
o Can you automatically determine compiler?
Virtual Machines & Bytecode
Virtual Machines
Some languages instead generate
intermediate bytecode
Bytecode runs in a virtual machine
o Virtual machine is a program that
(historically) interprets bytecode
o Translates bytecode for the hardware
Bytecode analogous to assembly code
Virtual Machines
Advantages?
o Hardware independent
Disadvantages?
o Slow
Today, usually just-in-time compilers
instead of interpreters
o Compile snippets of bytecode into native code
as needed
Reversing Bytecode
Reversing bytecode is easy
o Unless special precautions are taken
o Even then, easier than native code
Bytecode usually contains lots of metadata
o Possible to reconstruct highly accurate high
level language
Bytecode can be obfuscated
o In worst case, reverser must learn bytecode
o But bytecode is easier than native code
Windows PE Files
Windows PE File Format
Designed to be standard executable
file format for all versions of OS…
o …on all supported processors
Only small changes since PE format
was introduced
o E.g., support for 64-bit Windows
Windows PE Files
Trivia
o Q: What’s the difference between exe and dll?
o A: Not much --- one bit differs in PE files
o Q: What is size of smallest possible PE file?
o A: 133 bytes
PE file on disk is a file
o Once loaded into memory, it’s a module
o File is mapped to module
o Address where module begins is HMODULE
o PE file may not all be mapped to module
Windows PE Files
WINNT.H is final word on what PE file
looks like
Tools to examine PE files
o Dumpbin (Visual Studio)
o Depends
o PE Browse Professional
In spite of its name, it’s free
o PEDUMP (by author of article)
PE File Sections
Each section is “chunk of code or data that
logically belongs together”
o For example, all import tables in one section
Code is in .text section
o Code is code, but many types of data
Data examples
o Program data (e.g., .rdata for read-only)
o API import/export tables
o Resources, relocation info, etc.
Can specify section names in C++ source
PE File Sections
When mapped, module starts on a
page boundary
Linker can be told to merge sections
o E.g., to merge .text and .rdata:
/MERGE:.rdata=.text
o Some sections commonly merged
o Some sections cannot be merged
Relative Virtual Addresses
Exe file specifies in-memory addresses
PE file specifies preferred load location
o But DLL can actually load just about anywhere
So, PE specifies addresses in a way that is
independent of where it loads
o No hardcoded addresses in PE
o Instead, Relative Virtual Addresses (RVAs)
o RVA is an offset relative to where PE is loaded
Relative Virtual Addresses
To find actual memory location, add RVA to
the actual load address
For example, suppose
o Exe file is loaded at 0x400000
o And RVA is 0x1000
o Then code (.text) starts at 0x401000
In Windows terminology, actual address is
known as Virtual Address (VA)
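The RVA arithmetic is a single addition, shown here as a minimal C sketch for 32-bit images. The function names rva_to_va and va_to_rva are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* VA = actual load address + RVA (32-bit image assumed). */
uint32_t rva_to_va(uint32_t load_address, uint32_t rva)
{
    return load_address + rva;
}

/* Inverse: recover the RVA from a VA inside the loaded module. */
uint32_t va_to_rva(uint32_t load_address, uint32_t va)
{
    assert(va >= load_address);   /* VA must not lie below the module */
    return va - load_address;
}
```

With the numbers from the example, rva_to_va(0x400000, 0x1000) gives 0x401000; if the same file loaded at 0x500000 instead, the RVA would be unchanged and only the VA would move.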
Data Directory
There are many data structures within exe
o For efficiency, must be loaded quickly
o E.g., imports, exports, resources, base
relocations, etc.
DataDirectory
o Array of 16 data structures
o #define IMAGE_DIRECTORY_ENTRY_xxx
defines array indexes (0 to 15)
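A portable C sketch of the DataDirectory idea follows. DataDirEntry has the same two-field layout as IMAGE_DATA_DIRECTORY in WINNT.H, and the index values match the IMAGE_DIRECTORY_ENTRY_xxx constants, but the shortened DIR_ names and the dir_present helper are made up so the example compiles without Windows headers.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as IMAGE_DATA_DIRECTORY: an RVA and a byte count. */
typedef struct {
    uint32_t VirtualAddress;
    uint32_t Size;
} DataDirEntry;

/* A few of the 16 indexes (hypothetical short names). */
enum {
    DIR_EXPORT    = 0,
    DIR_IMPORT    = 1,
    DIR_RESOURCE  = 2,
    DIR_EXCEPTION = 3,
    DIR_BASERELOC = 5,
    DIR_DEBUG     = 6,
    DIR_TLS       = 9,
    DIR_IAT       = 12,
    DIR_COUNT     = 16
};

/* An entry that is all zero means the structure is absent. */
int dir_present(const DataDirEntry *dirs, int index)
{
    assert(index >= 0 && index < DIR_COUNT);
    return dirs[index].VirtualAddress != 0 && dirs[index].Size != 0;
}
```

A tool walking a PE file would index this array first, then convert the RVA it finds into a file offset or memory address before reading the structure itself.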
Importing Functions
To use code or data from another DLL,
must import it
When PE file loads, Windows loader locates
imported functions/data
o Usually automatic, when program first starts
o Imported DLLs may import others
o For example, any program created with Visual
C++ imports KERNEL32.DLL…
o …and KERNEL32.DLL imports from NTDLL.DLL
Importing Functions
Each PE has Import Address Table (IAT)
o IAT contains arrays of function pointers
o One array per imported DLL
Each imported API has spot in IAT
o The only place where API address stored
o So, all calls to API go thru one function ptr
o E.g., CALL DWORD PTR [0x00405030]
o But, by default it’s a little more complex…
PE File Structure
Next slides describe PE file structure
Note that all of these data structures
defined in WINNT.H
Usually, 32-bit and 64-bit versions
For example,
o IMAGE_NT_HEADERS32
o IMAGE_NT_HEADERS64
o Identical except for widened fields for 64-bit
MS-DOS Header
Every PE begins with small MS-DOS exe
o Prints message saying Windows required
MS-DOS Header
o IMAGE_DOS_HEADER
o 2 “important” values
o e_lfanew --- file offset of PE header
o e_magic --- 0x5A4D, “MZ” in ASCII… Why MZ?
IMAGE_NT_HEADERS Header
Primary location for PE specifics
Location in file given by e_lfanew
One version for 32-bit exes and
another for 64-bit exes
o Only minor differences between them
o Magic field in optional header specifies 32-bit (0x10B) or 64-bit (0x20B)
IMAGE_NT_HEADERS Header
Has 3 fields
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
In valid PE, Signature is 0x00004550
o In ASCII, this is “PE\0\0” (“PE” followed by two zero bytes)
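Putting the two magic values together, a minimal C sketch for locating the NT headers in a raw file buffer might look like this. It reads fields byte by byte instead of using WINNT.H, so it runs anywhere; find_nt_headers and the read helpers are names invented here, and the only layout facts relied on are the 64-byte DOS header with e_lfanew at offset 0x3C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers, so the sketch does not depend on host byte order. */
static uint16_t read_u16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the file offset of IMAGE_NT_HEADERS, or -1 if the buffer
 * does not look like a PE file. */
long find_nt_headers(const unsigned char *file, size_t size)
{
    assert(file != NULL);
    if (size < 0x40)                        /* DOS header is 64 bytes */
        return -1;
    if (read_u16(file) != 0x5A4D)           /* e_magic, "MZ"          */
        return -1;
    uint32_t e_lfanew = read_u32(file + 0x3C);
    if (e_lfanew > size || size - e_lfanew < 4)
        return -1;
    if (read_u32(file + e_lfanew) != 0x00004550u)   /* "PE\0\0" */
        return -1;
    return (long)e_lfanew;
}
```

The bounds check on e_lfanew matters in practice: the field is attacker-controlled in a malformed or malicious file.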
IMAGE_NT_HEADERS Header
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
IMAGE_FILE_HEADER predates PE
o Struct containing basic info about file
o Most important info is size of “optional data”
that follows (not really optional)
IMAGE_NT_HEADERS Header
typedef struct _IMAGE_NT_HEADERS {
    DWORD Signature;
    IMAGE_FILE_HEADER FileHeader;
    IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
IMAGE_OPTIONAL_HEADER
o DataDirectory array (at end) is “address book”
of important locations in exe
o Each entry contains RVA and size of data
PE Sections
Recall, section is “chunk of code or
data that logically belongs together”
For example
o All data for exe’s import tables are in
one section
Section Table
Section table contains array of
IMAGE_SECTION_HEADER structs
An IMAGE_SECTION_HEADER has
info about associated section
o Location, length, and characteristics
o Number of such headers given by field:
IMAGE_NT_HEADERS.FileHeader.NumberOfSections
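The section table is what lets a tool translate an RVA into a position in the file on disk. The C sketch below assumes the headers have already been parsed into a simplified Section record (an invented type, not IMAGE_SECTION_HEADER itself); the 4-byte signature and 20-byte IMAGE_FILE_HEADER sizes are the standard ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, already-parsed view of one IMAGE_SECTION_HEADER. */
typedef struct {
    char     name[9];           /* 8-byte name plus terminator      */
    uint32_t virtual_size;      /* size once mapped into memory     */
    uint32_t virtual_address;   /* RVA where the section is mapped  */
    uint32_t raw_size;          /* bytes present in the file        */
    uint32_t raw_offset;        /* file offset of those bytes       */
} Section;

/* Section table follows the signature (4 bytes), the file header
 * (20 bytes), and the optional header (size given in file header). */
uint32_t section_table_offset(uint32_t e_lfanew, uint16_t size_of_optional_header)
{
    return e_lfanew + 4u + 20u + size_of_optional_header;
}

/* Map an RVA to a file offset, or -1 if no section backs it on disk. */
long rva_to_file_offset(const Section *sections, size_t count, uint32_t rva)
{
    assert(sections != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        const Section *s = &sections[i];
        if (rva < s->virtual_address)
            continue;
        uint32_t delta = rva - s->virtual_address;
        if (delta < s->virtual_size && delta < s->raw_size)
            return (long)(s->raw_offset + delta);
    }
    return -1;
}
```

The -1 case covers both addresses outside every section and addresses in the part of a section that exists only in memory, such as uninitialized data.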
Alignment of Sections
Visual Studio 6.0
o 4KB sections by default
Visual Studio .NET
o 4KB by default, except small files,
which use 0x200-byte alignment
o Also, .NET spec requires 8KB in-memory
alignment (for IA-64 compatibility)
PE Sections
So far, overview of PE file format
Now, look inside important sections…
o …and some data structures within sections
Then we finish with look at PEDUMP
o Recall there are other similar utilities
Section Names
.text --- The default code section.
.data --- The default read/write data
section. Global variables typically go here.
.rdata --- The default read-only data
section. String literals and C++/COM
vtables are examples of items put into
.rdata.
Section Names
.idata --- The imports table. It has become
common practice (explicitly, or via linker default
behavior) to merge .idata into another section,
typically .rdata. By default, the linker only merges
the .idata section into another section when
creating a release mode exe.
.edata --- The exports table. When creating an
executable that exports APIs or data, the linker
creates an .EXP file which contains an .edata
section that's added into the final executable. Like
the .idata section, the .edata section is often
found merged into the .text or .rdata sections.
Section Names
.rsrc --- The resources. This section is read-only.
However, it should not be renamed and should not
be merged into other sections.
.bss --- Uninitialized data. Rarely found in exes
created with recent linkers. Instead, the
VirtualSize of the exe's .data section is expanded
to make room for uninitialized data.
.crt --- Data added for supporting the C++ runtime
(CRT). A good example is the function pointers
that are used to call the constructors and
destructors of static C++ objects.
Section Names
.tls --- Data for supporting thread local storage variables
declared with __declspec(thread). This includes the initial
value of the data, as well as additional variables needed by
the runtime.
.reloc --- Base relocations in an exe. Base relocations are
generally only needed for DLLs and not EXEs. In release
mode, the linker doesn't emit base relocations for EXE
files. Relocations can be removed when linking with the
/FIXED switch.
.sdata --- "Short" read/write data that can be addressed
relative to the global pointer. Used for IA-64 and other
architectures that use a global pointer register.
Regular-sized global variables on the IA-64 will go in this section.
Section Names
.srdata --- "Short" read-only data that can be addressed
relative to the global pointer. Used on the IA-64 and other
architectures that use a global pointer register.
.pdata --- The exception table. Contains an array of
IMAGE_RUNTIME_FUNCTION_ENTRY structs, CPU-specific.
Pointed to by IMAGE_DIRECTORY_ENTRY_EXCEPTION slot
in the DataDirectory. Used for architectures with table-based
exception handling, such as the IA-64. The only architecture
that doesn't use table-based exception handling is the x86.
.didat --- Delayload import data. Found in exes built in
nonrelease mode. In release mode, the delayload data is merged
into another section.
Exports Section
Exe may export code or data
o Makes it available to other exes
o Refer to an exported thing as a symbol
At minimum, to export symbol, must
specify its address in defined way
o Keyword ORDINAL tells linker to use numbers,
not names, for symbols
o After all, names just a convenience for coders
IMAGE_EXPORT_DIRECTORY
Points to 3 arrays
o And a table of ASCII strings containing
symbol names
Only required array is Export Address
Table (EAT)
o Array of function pointers
o Addresses of exported functions
o Export ordinal is an index into this array
IMAGE_EXPORT_DIRECTORY
Structure example
[Figure: IMAGE_EXPORT_DIRECTORY structure diagram; image not available]
Example
Exports table:
  Name:            KERNEL32.dll
  Characteristics: 00000000
  TimeDateStamp:   3B7DDFD8 -> Fri Aug 17 23:24:08 2001
  Version:         0.00
  Ordinal base:    00000001
  # of functions:  000003A0
  # of Names:      000003A0

  Entry Pt  Ordn  Name
  00012ADA     1  ActivateActCtx
  000082C2     2  AddAtomA
  •••remainder of exports omitted
Example
Suppose we call GetProcAddress on AddAtomA API
o System locates KERNEL32’s
IMAGE_EXPORT_DIRECTORY
o Gets start address of Export Names Table (ENT)
o It finds there are 0x3A0 entries in ENT
o Does binary search for AddAtomA
o Suppose AddAtomA is 2nd entry…
o …loader reads 2nd value from export ordinal table
Example (Continued)
Call GetProcAddress on AddAtomA API
o … AddAtomA has export ordinal 2
o Use this as index into EAT (taking into
account base field value)
o Finds AddAtomA has RVA of 0x82C2
o Add 0x82C2 to load address of KERNEL32
to get actual address of AddAtomA
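The lookup walk above can be sketched in C against small in-memory tables. ExportTable and lookup_export are invented for this example and simplify the real layout: names are plain C strings here, whereas the file stores RVAs of strings. One detail worth hedging: in this sketch the ordinal table holds zero-based EAT indexes, and the public ordinal is that index plus the ordinal base, which is one way to read "taking into account base field value".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical view of the data behind IMAGE_EXPORT_DIRECTORY. */
typedef struct {
    uint32_t base;                   /* ordinal base                     */
    uint32_t number_of_functions;    /* entries in the EAT               */
    uint32_t number_of_names;        /* entries in ENT and ordinal table */
    const uint32_t *functions;       /* EAT: RVAs of exported functions  */
    const char *const *names;        /* ENT: sorted symbol names         */
    const uint16_t *name_ordinals;   /* per name: zero-based EAT index   */
} ExportTable;

/* Returns the address of the named export, or 0 if it is not found. */
uint32_t lookup_export(const ExportTable *t, uint32_t load_address, const char *name)
{
    uint32_t lo = 0, hi = t->number_of_names;
    while (lo < hi) {                            /* binary search of the ENT */
        uint32_t mid = lo + (hi - lo) / 2;
        int c = strcmp(name, t->names[mid]);
        if (c == 0) {
            uint16_t index = t->name_ordinals[mid];
            if (index >= t->number_of_functions)
                return 0;                        /* malformed table */
            return load_address + t->functions[index];   /* RVA to VA */
        }
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return 0;
}

/* Public ordinal of the name at a given ENT position. */
uint32_t export_ordinal(const ExportTable *t, uint32_t name_position)
{
    assert(name_position < t->number_of_names);
    return t->base + t->name_ordinals[name_position];
}
```

Using the two entries from the exports table example and an arbitrary load address, looking up "AddAtomA" yields the load address plus 0x82C2, and its ordinal comes out as 2. Forwarded exports are not handled in this sketch.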
Export Forwarding
Can forward export to another DLL
o That is, must find it at “forward” address
Example
o KERNEL32 HeapAlloc function forwarded to
RtlAllocHeap function exported by NTDLL
o In EXPORTS section of KERNEL32, find
EXPORTS
…
HeapAlloc = NTDLL.RtlAllocHeap
Imports Section
Importing is opposite of exporting
IMAGE_IMPORT_DESCRIPTOR
o Points to 2 essentially identical arrays
o Import Address Table & Import Name Table
IAT and INT
o Contain ordinal, address, forwarding info
o After binding, IAT rewritten, INT retains
original (pre-binding) info
o Binding discussed next…
Imports Section
Example
o Importing APIs from USER32.DLL
[Figure: import structures for USER32.DLL; image not available]
Binding
Binding means IAT overwritten with
actual addresses
o VAs overwrite RVAs
Why do this?
o Increased efficiency
Loader checks whether binding valid
Delayload Data
Hybrid between implicit & explicit importing
Not an OS issue
o A linker issue, at runtime
There is IAT and INT for the DLL
o Identical to regular IAT and INT
o But read by runtime library code instead of OS
Benefit? Calls then go directly to API…
Resources Section
For resources such as…
o icons, bitmaps, dialogs, etc.
Most complicated section to navigate
Organized like a file system…
Base Relocations
Executable has many memory addresses
As mentioned, PE file specifies preferred
memory address to load the module
o ImageBase field in IMAGE_OPTIONAL_HEADER
If DLL loaded elsewhere, all addresses will
be incorrect
o Base relocations tell loader all locations that
need to be modified
o Note that this is extra work for the loader
What about EXE, which is not a DLL?
Base Relocation Example
Consider the following line of code
00401020: 8B 0D 34 D4 40 00 mov ecx,dword ptr [0x0040D434]
Note that “8B 0D” specifies opcode
o Also note the address 0x0040D434
Suppose preferred load is at 0x00400000
If it loads at that address, it runs as-is
Suppose instead it loads at 0x00500000
Then code above needs to change to
8B 0D 34 D4 50 00 mov ecx,dword ptr [0x0050D434]
Base Relocation Example
If not loaded at preferred address, then
loader computes delta
For example on previous slide…
o delta = 0x00500000 - 0x00400000
o So, delta is 0x00100000
Also, there would be base relocation
specifying location 0x00401022
(the address operand, just past the 2-byte opcode)
o Loader modifies address located here by delta
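The loader's fix-up for this kind of relocation amounts to adding delta to a 32-bit value stored in the image. A minimal C sketch, assuming a little-endian 32-bit address and an invented function name apply_reloc32:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add delta to the little-endian 32-bit address stored at image[offset].
 * Returns 1 on success, 0 if the location is out of bounds. */
int apply_reloc32(unsigned char *image, size_t size, size_t offset, uint32_t delta)
{
    assert(image != NULL);
    if (offset > size || size - offset < 4)
        return 0;
    uint32_t value = (uint32_t)image[offset] |
                     ((uint32_t)image[offset + 1] << 8) |
                     ((uint32_t)image[offset + 2] << 16) |
                     ((uint32_t)image[offset + 3] << 24);
    value += delta;                               /* actual base minus preferred base */
    image[offset]     = (unsigned char)(value & 0xFF);
    image[offset + 1] = (unsigned char)((value >> 8) & 0xFF);
    image[offset + 2] = (unsigned char)((value >> 16) & 0xFF);
    image[offset + 3] = (unsigned char)((value >> 24) & 0xFF);
    return 1;
}
```

Applied to the bytes 8B 0D 34 D4 40 00 at offset 2 with delta 0x00100000, the operand becomes 34 D4 50 00 and the two opcode bytes are left alone.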
Debug Directory
Contains debug info
Not required to run the program
o But useful for development
Can be multiple forms of debug info
o Most common is PDB file
.NET Header
.NET executables are PE files
However, code/data is minimal
Purpose of PE is simply to get .NET-specific
info into memory
o Metadata, intermediate language (IL)
o MSCOREE.DLL at start of a .NET process
o This dll “takes charge” and uses metadata and
IL from executable
o So PE has stub to get MSCOREE.DLL going
TLS Initialization
Thread Local Storage (TLS)
o .tls section for thread local variables
New threads initialized using .tls data
Presence of TLS data indicated by nonzero
IMAGE_DIRECTORY_ENTRY_TLS in
DataDirectory
o Points to IMAGE_TLS_DIRECTORY struct
o Contains virtual addresses, VAs (not RVAs)
o The actual struct is in .rdata, not in .tls
Program Exception Data
x86 architecture uses frame-based
exception handling
o A fairly complex way to handle exceptions
IA-64 and others use table-based approach
o Table containing info about every function that
might be affected by exception unwinding
o Table entry includes start and end addresses,
how and where exception to be handled
o When exception occurs, search thru table…
PEDUMP
Tools for analyzing PE files
o Dumpbin (Visual Studio)
o Depends
o PE Browse Professional
In spite of its name, it’s free
o PEDUMP (by author of article)