CSC 660: Advanced OS
System Calls
CSC 660: Advanced Operating Systems
Slide #1
A Different Kind of C
1. No access to C library.
2. ISO C99 + GNU C extensions.
3. No memory protection.
4. Small fixed-size (8KB) stack.
5. Limited floating point support.
6. Concurrency and synchronization.
7. Portability.
8. Coding style and idioms.
9. Debugging.
No access to C library
Why not?
Bootstrapping (C library uses system calls…)
Performance and size.
Kernel equivalent functions
Use lib/string.c for string operations.
Use printk() instead of printf()
ISO C 99
Inline Functions
static inline void dog(int tail)
Designated Initializers (struct initialization)
struct file_operations fops = {
    .read = device_read,
    .write = device_write,
    .open = device_open,
    .release = device_release
};
GNU C
Inline Assembly (asm or __asm__ keyword)
asm ( assembler template
    : output operands
    : input operands
    : list of clobbered registers
    );
Example from arch/i386/kernel/signal.c:
__asm__("movl %%gs,%0" : "=r"(tmp): "0"(tmp));
Branch Annotation
Optimize branch for most likely decision.
likely() and unlikely() macros
GNU C
asmlinkage
Function attribute that allows C functions to be called from assembly language (prevents parameters from being placed in registers).
volatile
Warns the compiler that a variable may change asynchronously, for example through an interrupt handler or another thread (prevents the compiler from optimizing away reads).
static inline
Inline function expansion to improve speed.
No Memory Protection
Kernel traps illegal memory access for users
Sends SIGSEGV to kill offending process.
No one to look out for kernel.
Memory violations result in kernel oops.
Kernel memory is not pageable.
Uses physical memory, not swap space.
Small Fixed Stack
Kernel stack is two 4KB pages (8KB).
Cannot create many local variables.
No deep recursion.
Floating Point
Floating point used to be handled by a separate FPU chip.
Integrated into CPU with 80486DX.
Still performed with ESCAPE instructions.
FPU has own FP registers.
Shared with MMX unit.
Not saved by default on context switch.
Must use FP carefully in kernel
Call kernel_fpu_begin() before using FPU.
Call kernel_fpu_end() after using FPU.
Concurrency
Asynchronous interrupts
Interrupt handlers may access resources at the same time as your function.
Multiprocessing
Another processor may be executing the same function at the same time.
Preemptive kernel
Scheduler can preempt your kernel thread in favor of another thread.
Synchronization Solutions
Spinlocks
Semaphores
Portability
Kernel runs on 22 architectures.
Different endianness.
Different word sizes.
Different page sizes.
Kernel code must be
Endian neutral.
64-bit clean.
Free of assumptions about word or page size.
Portability
A char is always 8 bits (may be signed or unsigned).
A short is currently 16 bits on all archs.
An int is currently 32 bits on all archs.
A long may be 32 or 64 bits.
A pointer may be 32 or 64 bits.
Use explicitly sized types when necessary:
s8,u8,s16,u16,s32,u32,s64,u64
Use opaque types for portability
atomic_t, pid_t
Coding Style
Indentation
Tabs that are 8 characters wide.
Braces
Conditionals/loops: initial { at end of statement
if (foo) {
        …
} else {
        …
}
Functions: { on separate line
int foo(void)
{
        …
}
Coding Style
Naming
Lower case, words separated by underscores.
Use descriptive names, especially for globals.
Functions
No longer than 2 screens of text.
Fewer than 10 local variables.
Comments
Describe what and why, not how your code works.
Ifdefs
Restrict them to header (.h) files.
Idioms
do { stmt1; stmt2; } while (0)
Found in macros.
Allows multi-statement macros to be used safely in if/else.
Heavy use of bit operators
and(&), or(|), xor(^), not(~)
Heavy use of goto
Often used to exit control structures on error.
Kernel Debugging: Oops
An oops is a major kernel failure.
Ex: dereferencing a null pointer
If kernel cannot recover, a panic results.
Information sent to console
Text description
Register contents
Stack backtrace
Kernel Debugging: Oops
Unable to handle kernel NULL pointer dereference at virtual address 00000000
c0203c18
EIP:    0060:[<c0203c18>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010086
eax: c137a800   ebx: c0e80200   ecx: c1379050   edx: 00000000
esi: c137a800   edi: c13d0000   ebp: 00000246   esp: c13d1f2c
ds: 007b   es: 007b   ss: 0068
Stack:
c1379050 00000002 c137a800 00000008 00000000 c137a800 c02060b3
c137a800 0001221e 00000000 c030b004 c030b000 c13fdc10 c02037c0
c137a800 00000293 c0125b6d 00000000 c13fdc28 c13fdc20 c13d0000
c13d0000 c13d0000 00000000
Call Trace:
[<c02060b3>] is_complete+0x2c3/0x310
[<c02037c0>] run+0x30/0x40
[<c0125b6d>] worker_thread+0x1bd/0x2b0
[<c0203790>] run+0x0/0x40
[<c0113b10>] default_wake_function+0x0/0x20
[<c0108fd6>] ret_from_fork+0x6/0x20
[<c0113b10>] default_wake_function+0x0/0x20
[<c01259b0>] worker_thread+0x0/0x2b0
printk()
Robust and callable except early in boot
Enable early_printk() option for that.
Circular log buffer
klogd reads /proc/kmsg
syslogd gets data from klogd
writes to /var/log/syslog
can also access with dmesg
Message priorities
0 (high) .. 7 (low)
Named: KERN_EMERG, KERN_ALERT, KERN_CRIT, KERN_ERR, KERN_WARNING, KERN_NOTICE, KERN_INFO, KERN_DEBUG
Printing Debugging Information
printk()
Assertions
BUG_ON(bad_condition) causes oops
Panics
if (terrible_condition)
    panic("Terrible condition!");
Stack traces
if (!debug_check) {
    printk(KERN_DEBUG "Check x failed\n");
    dump_stack();
}
System Calls
System calls provide the interface between user programs and the kernel.
1. Abstracted hardware interface.
2. Security and stability.
3. Allows virtualization.
Hello World
> cat >hello.c
#include <stdio.h>
int main(int argc, char *argv[]) {
    printf("Hello world!\n");
    return 0;
}
> gcc -o hello hello.c
> ltrace ./hello
__libc_start_main(0x8048394, 1, 0xbffff914, 0x80483b8, 0x8048400 <unfinished ...>
printf("Hello world!\n"Hello world!
) = 13
+++ exited (status 0) +++
Hello World
> strace ./hello
execve("./hello", ["./hello"], [/* 40 vars */]) = 0
uname({sys="Linux", node="tara", ...}) = 0
brk(0) = 0x804a000
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
old_mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe9000
open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=50648, ...}) = 0
old_mmap(NULL, 50648, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7fdc000
close(3) = 0
access("/etc/ld.so.nohwcap", F_OK) = -1 ENOENT (No such file or directory)
open("/lib/tls/i686/cmov/libc.so.6", O_RDONLY) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\215Y\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=1222116, ...}) = 0
Hello World
old_mmap(NULL, 1232428, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0xb7eaf000
old_mmap(0xb7fd1000, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x121000) = 0xb7fd1000
old_mmap(0xb7fda000, 7724, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7fda000
close(3) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7eae000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7eae080, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
munmap(0xb7fdc000, 50648) = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe8000
write(1, "Hello world!\n", 13Hello world!
) = 13
munmap(0xb7fe8000, 4096) = 0
exit_group(0) = ?
Using a System Call
Application
Calls printf()
C library (glibc)
printf() function issues write() system call.
Kernel
write() system call manages output.
On error, returns a negative error code, which the C library stores in the global errno variable.
Returns to user application.
Making a System Call
Software Interrupt
Historically: int $0x80
Modern: sysenter
System Call Number
Put in %eax register before interrupt
sys_call_table in arch/i386/kernel/entry.S
Parameters
1-5 args: %ebx, %ecx, %edx, %esi, %edi
6+ args: one register has pointer to user space params
Returning
Return from software interrupt: iret or sysexit
Return value stored in %eax register.
System Call Macros
include/asm-i386/unistd.h
#define _syscall0(type,name) \
type name(void) \
{ \
    long __res; \
    __asm__ volatile ("int $0x80" \
        : "=a" (__res) \
        : "0" (__NR_##name)); \
    __syscall_return(type,__res); \
}
#define _syscall2(type,name,type1,arg1,type2,arg2) \
type name(type1 arg1,type2 arg2) \
{ \
    long __res; \
    __asm__ volatile ("int $0x80" \
        : "=a" (__res) \
        : "0" (__NR_##name),"b" ((long)(arg1)),"c" ((long)(arg2))); \
    __syscall_return(type,__res); \
}
Kernel System Call
arch/i386/kernel/entry.S
ENTRY(system_call)
    pushl %eax                      # save orig_eax
    SAVE_ALL
    GET_THREAD_INFO(%ebp)
                                    # system call tracing in operation
    testb $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT),TI_flags(%ebp)
    jnz syscall_trace_entry
    cmpl $(nr_syscalls), %eax
    jae syscall_badsys
syscall_call:
    call *sys_call_table(,%eax,4)
    movl %eax,EAX(%esp)             # store return value
syscall_exit:
    cli
    movl TI_flags(%ebp), %ecx
    testw $_TIF_ALLWORK_MASK, %cx   # current->work
    jne syscall_exit_work
restore_all:
    RESTORE_ALL
Defining a System Call
System call name: getpid()
System call function: sys_getpid()
asmlinkage long sys_getpid(void)
{
return current->tgid;
}
Adding a System Call
1. Write system call function
2. Add entry to end of sys_call_table
   In arch/i386/kernel/entry.S add
   .long sys_mycall
3. Define system call number for user
   In include/asm-i386/unistd.h
   #define __NR_mycall 289
4. Compile kernel
Calling your new syscall
#include <linux/unistd.h>
#include <errno.h>   /* the _syscall0 expansion sets errno on failure */
#define __NR_current_time 289
_syscall0(long, current_time)
#include <stdio.h>
int main(void)
{
    long retval = 1;
    retval = current_time();
    printf("The return value is %ld\n", retval);
    return 0;
}