Low level Programming
Linux ABI
• System Calls
– Everything distills into a system call
• /sys, /dev, /proc → read() & write() syscalls
• What is a system call?
– Special purpose function call
• Elevates privilege
• Executes function in kernel
– But what is a function call?
What is a function call?
• Special form of jmp
– Execute a block of code at a given address
– Special instruction: call <fn-address>
– Why not just use jmp?
• What do function calls need?
– int foo(int arg1, char * arg2);
• Location: foo()
• Arguments: arg1, arg2, …
• Return code: int
– Must be implemented at hardware level
Hardware implementation
int foo(int arg1, char * arg2) { return 0; }
0000000000000107 <foo>:
107: 55                push %rbp
108: 48 89 e5          mov %rsp,%rbp
10b: 89 7d fc          mov %edi,-0x4(%rbp)
10e: 48 89 75 f0       mov %rsi,-0x10(%rbp)
112: b8 00 00 00 00    mov $0x0,%eax
117: c9                leaveq
118: c3                retq
• Location
– Address of function + ret instruction
• Arguments
– Passed in registers (which ones? And why those?)
• Return code
– Stored in register: EAX
• To understand this we need to know about assembly programming…
Assembly basics
• What makes up assembly code?
– Instructions
• Architecture specific
– Operands
• Registers
• Memory (specified as an address)
• Immediates
– Conventions
• Rules of the road and/or behavior models
Registers
• General purpose
– 16 bit: AX, BX, CX, DX, SI, DI
– 32 bit: EAX, EBX, ECX, EDX, ESI, EDI
– 64 bit: RAX, RBX, RCX, RDX, RSI, RDI + others
• Environmental
– RSP, RIP
– RBP = frame pointer, defines local scope
• Special uses
– Calling conventions
• RAX == return code
• RDI, RSI, RDX, RCX… == ordered arguments
– Hardware defined
• Some instructions implicitly use specific registers
– RSI/RDI → String instructions
– RBP → leaveq
Memory
• X86 provides complex memory addressing capabilities
– Immediate (absolute) addressing
• mov %rsi, 0xfff000
– Direct addressing (address held in a register)
• mov %rsi, (%rbp)
– Offset addressing
• mov %rsi, 0x8(%rax)
• Base + (Index * Scale) + Displacement
– A.K.A. SIB
– Occasionally seen
– Hardly ever used by hand
– movl %ebp, (%rdi,%rsi,4)
• Address = rdi + rsi * 4
– A more complicated example
• segment:disp(base, index, scale)
8/16/32/64 bit operands
• Programmer explicitly specifies operand length in the instruction (suffix b, w, l, q)
• Example: mov reg, reg
– 8 bits: movb %al, %bl
– 16 bits: movw %ax, %bx
– 32 bits: movl %eax, %ebx
– 64 bits: movq %rax, %rbx
• What about “movl %ebx, (%rdi)”?
Function call implementation
We can now decode what is going on here
int foo(int arg1, char * arg2) { return 0; }
0000000000000107 <foo>:
107: 55                push %rbp
108: 48 89 e5          mov %rsp,%rbp
10b: 89 7d fc          mov %edi,-0x4(%rbp)
10e: 48 89 75 f0       mov %rsi,-0x10(%rbp)
112: b8 00 00 00 00    mov $0x0,%eax
117: c9                leaveq
118: c3                retq
• Location
– Address of function + ret instruction
• Arguments
– Passed in registers (which ones? And why those?)
• Return code
– Stored in register: EAX
OS development requires assembly programming
• OS operations are not typically expressible with a higher level language
– Examples: atomic operations, page table management, configuring segments
• System calls(!)
• How to mix assembly with OS code (in C)
– Compile with assembler and link with C code
• .S files compiled with gas
– Inline w/ compiler support
• .c files compiled with gcc
Implementing assembler functions
• C functions:
– Location, args, return code
• ASM functions:
– Location only
– Programmer must implement everything else
• Arguments, context, return values
• Everything in foo() from before + function body
• Programmer takes place of compiler
– Must match calling conventions
Calling assembler functions
• Programmer implements calling convention
– Behaves just like a regular function
• Only need location
– Linker takes care of the rest
foo.S:
.globl foo    # Defines a global symbol
foo:
    push %rbp
    mov %rsp, %rbp
    …
main.c:
extern int foo(int, char *);
int main() {
    int x = foo(1, "test");
}
Inline
• OS only needs a few full blown assembly functions
– Context switches, interrupt handling, a few others
• Most of the time just need to execute a single instruction
– e.g. set a bit in a control register
• GCC provides ability to incorporate inline assembly instructions into a regular .c file
– Not a function
– Compiler handles argument marshaling
Overview
• Inline assembly includes 2 components
– Assembly code
– Compiler directives for operand marshaling
asm ( assembler template
    : output operands                /* optional */
    : input operands                 /* optional */
    : list of clobbered registers    /* optional */
);
Inline assembly execution
• Sequence of individual assembly instructions
– Can execute any hardware instruction
– Can reference any register or memory location
– Can reference specified variables in C code
• 3 Stages of execution
1. Load C variables into correct registers or memory
2. Execute assembly instructions
3. Copy register and memory contents into C variables
Specifying inline operands
• How does compiler copy C variables to/from registers?
• C variables and registers are explicitly linked in asm specification
– Sections for input and output operands
– Compiler handles copying to and from variables before and after assembly executed
– Assembly code references marshaled values (index of operand) instead of raw registers
Operand Codes
• Wide range of operand codes ("constraints") are available
– Input: "code"(c-variable)
– Output: "=code"(c-variable)
Explicit register codes
a = %rax, %eax, %ax
b = %rbx, %ebx, %bx
c = %rcx, %ecx, %cx
d = %rdx, %edx, %dx
S = %rsi, %esi, %si
D = %rdi, %edi, %di
Other operand codes
r = Any register
q = a, b, c, d regs
m = memory operand
f = floating point reg
i = immediate
g = anything
And many more….
Register example
int foo(int arg1, char * arg2) {
    int a=10, b;
    asm ("movl %1, %%ecx;\n"
         "movl %%ecx, %0;\n"
         : "=b"(b)    /* output */
         : "a"(a)     /* input */
         : );
    return 0;
}
What does this do?
0000000000000107 <foo>:
107: 55                      push %rbp
108: 48 89 e5                mov %rsp,%rbp
10b: 53                      push %rbx
10c: 89 7d e4                mov %edi,-0x1c(%rbp)
10f: 48 89 75 d8             mov %rsi,-0x28(%rbp)
113: c7 45 f0 0a 00 00 00    movl $0xa,-0x10(%rbp)
11a: 8b 45 f0                mov -0x10(%rbp),%eax
11d: 89 c1                   mov %eax,%ecx
11f: 89 cb                   mov %ecx,%ebx
121: 89 d8                   mov %ebx,%eax
123: 89 45 f4                mov %eax,-0xc(%rbp)
126: b8 00 00 00 00          mov $0x0,%eax
12b: 5b                      pop %rbx
12c: c9                      leaveq
12d: c3                      retq
Memory example
• X86 can also use memory (SIB, etc) operands
– “m” operand code
int foo(int arg1, char * arg2) {
    int a=10, b;
    asm ("movl %1, %%ecx;\n"
         "movl %%ecx, %0;\n"
         : "=m"(b)
         : "m"(a)
         : );
    return 0;
}
0000000000000000 <foo>:
 0: 55                      push %rbp
 1: 48 89 e5                mov %rsp,%rbp
 4: 89 7d ec                mov %edi,-0x14(%rbp)
 7: 48 89 75 e0             mov %rsi,-0x20(%rbp)
 b: c7 45 fc 0a 00 00 00    movl $0xa,-0x4(%rbp)
12: 8b 4d fc                mov -0x4(%rbp),%ecx
15: 89 4d f8                mov %ecx,-0x8(%rbp)
18: b8 00 00 00 00          mov $0x0,%eax
1d: c9                      leaveq
1e: c3                      retq
Input/output operands
• Sometimes input and output operands are the same variable
– Transform input variable in some way
int foo(int arg1, char * arg2) {
    int a=10, b=5;
    asm ("addl %1, %0;\n"
         : "=r"(b)
         : "m"(a), "0"(b)
         : );
    return 0;
}
0000000000000000 <foo>:
 0: 55                      push %rbp
 1: 48 89 e5                mov %rsp,%rbp
 4: 89 7d ec                mov %edi,-0x14(%rbp)
 7: 48 89 75 e0             mov %rsi,-0x20(%rbp)
 b: c7 45 f8 0a 00 00 00    movl $0xa,-0x8(%rbp)
12: c7 45 fc 05 00 00 00    movl $0x5,-0x4(%rbp)
19: 8b 45 fc                mov -0x4(%rbp),%eax
1c: 03 45 f8                add -0x8(%rbp),%eax
1f: 89 45 fc                mov %eax,-0x4(%rbp)
22: b8 00 00 00 00          mov $0x0,%eax
27: c9                      leaveq
28: c3                      retq
Input/output operands (2)
• Input/output operands can also be specified with "+"
int foo(int arg1, char * arg2) {
    int a=10, b=5;
    asm ("addl %1, %0;\n"
         : "+r"(b)
         : "m"(a)
         : );
    return 0;
}
0000000000000000 <foo>:
 0: 55                      push %rbp
 1: 48 89 e5                mov %rsp,%rbp
 4: 89 7d ec                mov %edi,-0x14(%rbp)
 7: 48 89 75 e0             mov %rsi,-0x20(%rbp)
 b: c7 45 f8 0a 00 00 00    movl $0xa,-0x8(%rbp)
12: c7 45 fc 05 00 00 00    movl $0x5,-0x4(%rbp)
19: 8b 45 fc                mov -0x4(%rbp),%eax
1c: 03 45 f8                add -0x8(%rbp),%eax
1f: 89 45 fc                mov %eax,-0x4(%rbp)
22: b8 00 00 00 00          mov $0x0,%eax
27: c9                      leaveq
28: c3                      retq
Clobbered list
• We cheated earlier…
int foo(int arg1, char * arg2) {
    int a=10, b;
    asm ("movl %1, %%ecx;\n"
         "movl %%ecx, %0;\n"
         : "=m"(b)
         : "m"(a)
         : );
    return 0;
}
• How does compiler know to save/restore ECX?
– It doesn’t
• We must explicitly tell compiler what registers have been implicitly messed with
– In this case ECX, but other instructions have implicit operands (CHECK THE MANUALS)
• Second set of constraints to inline assembly
– Clobber list: Operands not used as either input or output but still must be saved/restored by compiler
Why clobber list?
• Why do we need this?
– Compilers try to optimize performance
• Cache intermediate values and assume values don’t change
• Compiler cannot inspect ASM behavior
– outside scope of compiler
• Clobber lists tell compiler:
– “You cannot trust the contents of these resources after this point”
– Or “Do not perform optimizations that span this block on these resources”
Using clobber lists
int foo(int arg1, char * arg2) {
    int a=10, b;
    asm ("movl %1, %%ecx;\n"
         "movl %%ecx, %0;\n"
         : "=m"(b)
         : "m"(a)
         : "ecx", "memory" );
    return 0;
}
• ECX is used implicitly so its value must be saved/restored
• What about “memory”?
Back to system calls
• Function calls not that special
– Just an abstraction built on top of hardware
• System calls are basically function calls
– With a few minor changes
• Privilege elevation
• Constrained entry points
– Functions can call to any address
– System calls must go through “gates”
Implementing system calls
• System calls are implemented as a single function call: syscall()
– read() and write() actually just invoke syscall()
• What does syscall do?
– Enters into the kernel at a known location
– Elevates privilege
– Instantiates kernel level environment
• Once inside the kernel, an appropriate system call handler is invoked based on arguments to syscall()
x86 and Linux
• Number of different mechanisms for implementing syscall
– Legacy: int 0x80 – Invokes a single interrupt handler
– 32 bit: SYSENTER – Special instruction that sets up preset kernel environment
– 64 bit: SYSCALL – 64 bit version of SYSENTER
• All jump to a preconfigured execution environment inside kernel space
– Either interrupt context or OS defined context
• What about arguments?
– syscall(int syscall_num, args…)
Specific system calls
• Each system call has a number assigned to it
– Index into a system call table
• Function pointers referencing each syscall handler
• syscall(int syscall_num, args…)
– Sets up kernel environment
– Invokes syscall_table[syscall_num](args…);
– Returns to user space:
• Resets environment to state before call