C Programming and Assembly Language
Janakiraman V
NITK Surathkal
2nd August 2014

Motivation
Do you know how all this is implemented in assembly?

Agenda
• Brief introduction to the 8086 processor architecture
• Describe commonly used assembly instructions
• Use of the stack and related instructions
• Translate high-level function calls into low-level assembly language
• Become familiar with the calling conventions
• Explain how variables are passed and accessed

8086 Architecture
• ALU – Arithmetic and Logical Unit – The heart of the processor
• Control Unit – Decodes instructions, controls the execution flow
• Registers – Storage locations within the processor, addressed implicitly by name
• Registers – Serve as operands to most operations
• Flags – ALU operations set particular flag bits after execution

Registers
• EAX – Stores integer return values
• ECX – Stores the counters for loops and also stores the "this" pointer
• EIP – Instruction pointer. Stores the address of the next instruction to be executed
• ESP – The stack pointer. Implicitly changed during CALL/RET instructions.
• EBP – Base pointer. Used to access local variables and function parameters.

Registers Contd…
• EBX – A general purpose register
• ESI – The source index register for string instructions
• EDI – The destination index register for string instructions
• EFL – Flag register. Stores the flag bits of various flags like Carry, Zero, etc.
• Segment registers point to a segment of memory: DS, SS, ES, CS
• EDX – Stores the high 32 bits of 64-bit values

Instruction Set
• Data transfer
• Arithmetic and logical
• Stack operations
• Branching and looping
• Function calls
• String instructions
• Prefix to instructions

Data transfer instructions
MOV Destination, Source – Format
» Data transfer is always from RIGHT to LEFT.
» The source register is unaffected.
LEA – Load effective address.
» Loads the offset address of the specified variable into the destination.
» Equivalent of int* y = &x;

Arithmetic and Logical instructions
• Operation destination, source – Format
» ADD AX, BX
» SUB AX, [BX]
» OR AX, [BX+4]
» XOR AX, AX – Fastest way to clear registers

Exercise 1
Write an assembly program to evaluate the following expressions. (All variables are 32-bit integers.)
» EAX = x*y + a – b
» EBX = (x^y) | (a&b)

    int x=4, y=6, a=3, b=2;
    __asm
    {
        MOV EAX, x
        MUL y
        ADD EAX, a
        SUB EAX, b
        MOV EBX, x
        XOR EBX, y
        MOV ECX, a
        AND ECX, b
        OR  EBX, ECX
    }

Branching and Looping
• JMP Addr – Loads EIP with Addr
• Conditional jumps
» Transfer control based on a condition
» Based on the state of one or more flags
» ALU operations set the flags

Exercise 2
Write an assembly program to evaluate the expression "z = x * y" using
» Repeated addition
» MUL instruction

Multiplication by repeated addition.

    int x=9, y=10, z=0;
    __asm
    {
              XOR EAX, EAX
              MOV EBX, y
        MULT: ADD EAX, x
              DEC EBX
              JNZ MULT
              MOV z, EAX
    }

Write an assembly program to calculate the string length of a constant string.

String length of a constant string

    char* pChar = "Test data";
    int len = 0;
    __asm
    {
                  MOV EDI, pChar
                  XOR ECX, ECX
        COMPARE:  CMP BYTE PTR [EDI], 0
                  JNZ INCREASE
                  JMP DONE
        INCREASE: INC ECX
                  INC EDI
                  JMP COMPARE
        DONE:     MOV len, ECX
    }

Stack Operations
PUSH: PUSH EAX
» ESP decreases by 4 (or by 2 for a 16-bit operand)
» Data is copied onto the top of the stack
» Used extensively to pass parameters to functions.
POP: POP EAX
» ESP increases by 4 (or by 2 for a 16-bit operand)
» Data is copied to the destination
» Complement of PUSH

Exercise 3
Write an assembly program to swap two integers x and y.

Swap two integers.

    int x=4, y=5;
    __asm
    {
        PUSH x
        PUSH y
        POP x
        POP y
    }

Write a C program to swap two numbers using a function Swap(int* pX, int* pY). Implement the Swap function directly in assembly language.

Function to swap variables

    void swap(int* pX, int* pY)
    {
        __asm
        {
            MOV EAX, pX
            MOV EBX, pY
            PUSH DWORD PTR [EAX]
            PUSH DWORD PTR [EBX]
            POP DWORD PTR [EAX]
            POP DWORD PTR [EBX]
        }
    }

Function calls
CALL – CALL ADDR
» Used for function calls.
» Implicitly pushes the EIP (the return address) onto the stack.
» Reads the address specified (ADDR) and loads EIP with ADDR.
RET – RET n
» Used to return to the calling function.
» Implicitly pops the DWORD at the top of the stack into EIP.
» 'n' specifies the number of bytes to be added to ESP after returning. Used for stack clean up.

Compile the C program!!

    int g_iVar = 5;

    int Fn(int x, int y)
    {
        int z=0;
        z = x + y;
        return z;
    }

    void main()
    {
        int z=0;
        z = Fn(2,4);
        g_iVar = z;
    }

C and assembly language - FAQ
• How are function calls in 'C' translated into assembly?
• How are parameters passed to the function?
• What does it mean to say local variables are stored on the stack? Scope of local variables!
• How are global variables accessed?

C and Assembly language Contd….
• Cannot pass many parameters in registers
• Scope – Desirable feature
• Stack – Ideal to store local variables
• ESP cannot be used to access the local variables, because it changes with every PUSH and POP
• EBP is used to access them!!!

Parameters, Local and Global variables
• Before a function is called, parameters are pushed onto the stack
• Parameters are accessed by [EBP+n]
• Local variables are accessed by [EBP–n]
• Integers are returned in EAX
• Global variables are accessed by direct address values

Compile the C program Contd…

    void main()                     int Fn(int x, int y)
    {                               {
      int z=0;                        int z=0;
          MOV z, 0                        MOV z, 0
      z = Fn(2,4);                    z = x + y;
          PUSH 0x00000004                 MOV EAX, x
          PUSH 0x00000002                 ADD EAX, y
          CALL Fn                         MOV z, EAX
          MOV z, EAX                  return z;
      g_iVar = z;                         RET
          MOV [g_iVar], EAX         }
    }

Compile the C Program Contd….
CODE SEGMENT – Function – main()

    int z = 0;
        C100  MOV [EBP-4], 0
    z = Fn(2,4);
        C101  PUSH 0x00000004
        C102  PUSH 0x00000002
        C103  CALL C200
        C104  MOV [EBP-4], EAX
    g_iVar = z;
        C105  MOV [g_iVar], EAX

STACK SEGMENT (top of stack first, just after the CALL)

    ESP ->  C104          return address
            0x00000002    parameter x
            0x00000004    parameter y
            0x00000000    local var z of main(), at [EBP-4]

Compile the C Program Contd….
CODE SEGMENT – Function – Fn()

        C200  MOV EBP, ESP
        C201  SUB ESP, 0x40
    int z=0;
        C202  MOV [EBP-4], 0
    z = x + y
        C203  MOV EAX, [EBP+4]
        C204  ADD EAX, [EBP+8]
        C205  MOV [EBP-4], EAX
    return z;
        C206  ADD ESP, 0x40
        C207  RET

STACK SEGMENT (top of stack first)

    ESP ->  ...           local variable space (0x40 bytes)
            0x00000000    local var z of Fn(), at [EBP-4]; becomes 0x00000006
    EBP ->  C104          return address
            0x00000002    parameter x, at [EBP+4]
            0x00000004    parameter y, at [EBP+8]
            0x00000000    local var z of main()

CODE SEGMENT – Function – main()

    int z = 0;
        C100  MOV [EBP-4], 0
    z = Fn(2,4);
        C101  PUSH 0x00000004
        C102  PUSH 0x00000002
        C103  CALL C200
        C104  MOV [EBP-4], EAX      <- You have accessed the stack of the function "Fn()"
    g_iVar = z;
        C105  MOV [g_iVar], EAX
        C106  RET

Stack corruption!!!!! Your computer will now REBOOT!!!!!
Fn() overwrote EBP without saving it, so after the return main() still addresses [EBP-4] relative to Fn()'s frame.

STACK SEGMENT (top of stack first)

            0x00000006    written through the stale [EBP-4]
            C104
            0x00000002
            0x00000004
            0x00000000    local var z of main(), never updated

Compile the C Program Contd….
CODE SEGMENT – Function – main()

    int z = 0;
        C100  MOV [EBP-4], 0
    z = Fn(2,4);
        C101  PUSH 0x00000004
        C102  PUSH 0x00000002
        C103  CALL C200
        C104  MOV [EBP-4], EAX
    g_iVar = z;
        C105  MOV [g_iVar], EAX

STACK SEGMENT (top of stack first, just after the CALL)

    ESP ->  C104          return address
            0x00000002    parameter x
            0x00000004    parameter y
            0x00000000    local var z of main(), at [EBP-4]

Compile the C Program Contd….
CODE SEGMENT – Function – Fn()

        C200  PUSH EBP
        C202  MOV EBP, ESP
        C203  SUB ESP, 0x40
    int z=0;
        C204  MOV [EBP-4], 0
    z = x + y
        C205  MOV EAX, [EBP+8]
        C206  ADD EAX, [EBP+12]
        C207  MOV [EBP-4], EAX
    return z;
        C208  ADD ESP, 0x40
        C209  POP EBP
        C20A  RET 8

STACK SEGMENT (top of stack first)

    ESP ->  ...           local variable space (0x40 bytes)
            0x00000000    local var z of Fn(), at [EBP-4]; becomes 0x00000006
    EBP ->  EBP - main()  saved EBP of main()
            C104          return address, at [EBP+4]
            0x00000002    parameter x, at [EBP+8]
            0x00000004    parameter y, at [EBP+12]
            0x00000000    local var z of main()

CODE SEGMENT – Function – main()

    int z = 0;
        C100  MOV [EBP-4], 0
    z = Fn(2,4);
        C101  PUSH 0x00000004
        C102  PUSH 0x00000002
        C103  CALL C200
        C104  MOV [EBP-4], EAX
    g_iVar = z;
        C105  MOV [g_iVar], EAX
        C106  Epilogue

STACK SEGMENT (top of stack first)

            0x00000006    stale, released by the epilogue of Fn()
            C104          stale, popped by RET
            0x00000002    stale, released by RET 8
            0x00000004    stale, released by RET 8
            0x00000000    local var z of main(), at [EBP-4]; becomes 0x00000006

Function calls in C - Summary
Function call gets translated to CALL addr
Prologue
» Store the current EBP on the stack
» Set up the stack - initialize the EBP
» Allocate space for local variables.
Execute the function accordingly
Epilogue
» Set the ESP to its original value
» Set the EBP back to its original value

Stack clean up
• When?
» Happens after returning from a function
• Why?
» Undo the effect of pushing parameters
• How?
» RET N or ADD ESP, N

    C Program                       Assembly

    void main()                     Prologue
    {
      int z = 0;                    MOV [EBP-4], 0
      z = Function(2, 4);           PUSH 0x00000004
                                    PUSH 0x00000002
                                    CALL Function
                                    MOV [EBP-4], EAX
    }                               Epilogue

Contd…

    C Program                       Assembly

    int Function(int a, int b)      PUSH EBP             --- Prologue
    {                               MOV EBP, ESP
                                    SUB ESP, N
      int c=0;                      MOV [EBP-4], 0       --- Body
      c = a + b;                    MOV EAX, [EBP+8]
                                    ADD EAX, [EBP+12]
                                    MOV [EBP-4], EAX
      return c;                     ADD ESP, N           --- Epilogue
    }                               POP EBP
                                    RET 8

Calling conventions
__cdecl
» Default calling convention of C functions
» Needed for variable argument lists
» Caller cleans the stack - ADD ESP, N instruction
__stdcall
» Typically gives smaller code than __cdecl, since the clean up appears once in the callee instead of after every call
» Callee cleans the stack - RET N instruction

Back to Exercise 3
Write a C program to swap two numbers using a function Swap(int* pX, int* pY). Implement the Swap function directly in assembly language.

Function to swap variables

    void swap(int* pX, int* pY)
    {
        __asm
        {
            PUSH DWORD PTR [pX]
            PUSH DWORD PTR [pY]
            POP DWORD PTR [pX]
            POP DWORD PTR [pY]
        }
    }

» Swaps the pointer parameters themselves, not the integers they point to

    void swap(int* pX, int* pY)
    {
        __asm
        {
            PUSH DWORD PTR [[EBP+8]]
            PUSH DWORD PTR [[EBP+12]]
            POP DWORD PTR [[EBP+8]]
            POP DWORD PTR [[EBP+12]]
        }
    }

» Double indirection is not a valid instruction

    void swap(int* pX, int* pY)
    {
        __asm
        {
            MOV EAX, DWORD PTR [EBP+8]
            MOV EBX, DWORD PTR [EBP+12]
            PUSH DWORD PTR [EAX]
            PUSH DWORD PTR [EBX]
            POP DWORD PTR [EAX]
            POP DWORD PTR [EBX]
        }
    }

What about C++?
    struct stTest                   class clsTest
    {                               {
      int x;                          int x;
      int y;                          int y;
    };                              public:
                                      void FnTest()
    void FnTest(stTest* pSt)          {
    {                                   x = 0; y = 1;
      pSt->x = 0; pSt->y = 1;         }
    }                               };

    void main()                     void main()
    {                               {
      stTest obj;                     clsTest obj;
      FnTest(&obj);                   obj.FnTest();
    }                               }

Calling convention Contd…
__thiscall – The C++ calling convention
» Arguments are pushed right to left, as in __cdecl; in MSVC the callee cleans the stack, as in __stdcall
» The this pointer is passed in the ECX register
» The callee typically stores the this pointer at [EBP-4] on the stack

String Instructions
• Use ESI and EDI as their operands.
• After the operation, ESI and EDI are automatically incremented or decremented, depending on the direction flag.
• Usually used with the prefix instructions.
• Very efficient for simple loops over blocks of memory.

Prefix to instructions
REP – REP MOVSB
» Used to repeat instructions unconditionally
» Implicitly decrements ECX by 1 after each execution
» Stops once ECX = 0
REPNE/ REPE – REPE SCASB
» Used to repeat instructions conditionally
» Implicitly decrements ECX by 1 after each execution
» Stops once ECX = 0, or once the ZERO flag is set (REPNE) / reset (REPE)

Optimized C functions
• memcpy
• strlen
• memset
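
The slides do not show how the three functions map onto the string instructions, so the following is only a minimal sketch in portable C, not the code of any real C library. It models REP MOVSB, REP STOSB and REPNE SCASB as loops over a small register struct and assumes the direction flag is clear (ESI and EDI count upwards). The names Regs, rep_movsb, rep_stosb, repne_scasb, sketch_memcpy, sketch_memset, sketch_strlen and check_sketches are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register model: only what the string instructions touch. */
typedef struct {
    const uint8_t *esi; /* source index */
    uint8_t       *edi; /* destination index */
    uint32_t       ecx; /* repeat counter */
    uint8_t        al;  /* byte operand of STOSB / SCASB */
    int            zf;  /* zero flag, written by SCASB */
} Regs;

/* REP MOVSB: copy [ESI] to [EDI], advance both, decrement ECX, stop at ECX == 0. */
static void rep_movsb(Regs *r)
{
    while (r->ecx != 0) {
        *r->edi++ = *r->esi++;
        r->ecx--;
    }
}

/* REP STOSB: store AL at [EDI], advance EDI, decrement ECX, stop at ECX == 0. */
static void rep_stosb(Regs *r)
{
    while (r->ecx != 0) {
        *r->edi++ = r->al;
        r->ecx--;
    }
}

/* REPNE SCASB: compare AL with [EDI], advance EDI, decrement ECX;
   stop once ECX == 0 or the zero flag is set (a match was found). */
static void repne_scasb(Regs *r)
{
    while (r->ecx != 0) {
        r->zf = (*r->edi == r->al);
        r->edi++;
        r->ecx--;
        if (r->zf)
            break;
    }
}

void *sketch_memcpy(void *dst, const void *src, uint32_t n)
{
    Regs r = { .esi = src, .edi = dst, .ecx = n };
    rep_movsb(&r);
    return dst;
}

void *sketch_memset(void *dst, int value, uint32_t n)
{
    Regs r = { .edi = dst, .ecx = n, .al = (uint8_t)value };
    rep_stosb(&r);
    return dst;
}

uint32_t sketch_strlen(const char *s)
{
    /* const is cast away only because the model has one EDI field; SCASB just reads. */
    Regs r = { .edi = (uint8_t *)s, .ecx = UINT32_MAX, .al = 0 };
    repne_scasb(&r);
    /* ECX dropped once per byte scanned, terminator included: NOT ECX, then subtract 1. */
    return (uint32_t)~r.ecx - 1u;
}

/* Small self-check comparing the sketches with the C library. */
void check_sketches(void)
{
    char buf[16];
    sketch_memset(buf, 'x', sizeof buf);
    sketch_memcpy(buf, "Test data", 10); /* 9 characters plus the terminator */
    assert(strcmp(buf, "Test data") == 0);
    assert(buf[10] == 'x');
    assert(sketch_strlen(buf) == strlen(buf));
}
```

Calling check_sketches() from a test program exercises all three sketches. The strlen variant follows the common idiom of loading ECX with all ones before REPNE SCASB, so that the length can be recovered from how far the counter has dropped.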