Page-Faults in Linux
How can we study the handling of page-fault exceptions?

Why page-faults happen
• Trying to access a virtual memory-address
• Instruction-operand / instruction-address
• Read-data/write-data, or fetch-instruction
• Maybe page is ‘not present’
• Maybe page is ‘not readable’
• Maybe page is ‘not writable’
• Maybe page is ‘not visible’

Page-fault examples
    movl  %eax, (%ebx)    ; writable?
    movl  (%ebx), %eax    ; readable?
    jmp   ahead           ; present?
Everything depends on the entries in the current page-directory and page-tables, and on the cpu’s Current Privilege Level

Current Privilege Level (CPL)
Layout of segment-register contents (16 bits)
     15                           3   2   1   0
    +------------------------------+----+-------+
    |       segment-selector       | TI |  RPL  |
    +------------------------------+----+-------+
TI = Table-Indicator
RPL = Requested Privilege Level
CPL is determined by the value of the RPL field in CS and SS

What does the CPU do?
• Whenever the cpu detects a page-fault, its action depends on the Current Privilege Level
• If CPL == 0 (executing in kernel mode):
    1) push EFLAGS register
    2) push CS register
    3) push EIP register
    4) push error-code
    5) jump to page-fault service-routine

Alternative action in user-mode
• If CPL == 3 (executing in user mode) the CPU will switch to its kernel-mode stack:
    0) push SS and ESP
    1) push EFLAGS
    2) push CS
    3) push EIP
    4) push error-code
    5) jump to the page-fault service-routine

How CPU finds new stack
• Special CPU segment-register: TR
• TR is the ‘Task Register’
• TR holds ‘selector’ for a GDT descriptor
• Descriptor is for a ‘Task State Segment’
• So TR points indirectly to current TSS
• TSS stores address of kernel-mode stack

Stack Switching mechanism
[Figure: IDTR locates the INTERRUPT DESCRIPTOR TABLE, whose gate descriptor leads to kernel code; GDTR locates the GLOBAL DESCRIPTOR TABLE, where TR selects the TSS descriptor; the TASK STATE SEGMENT holds SS0 and ESP0 for the kernel stack. CS:EIP and SS:ESP refer to user code and the user stack in user-space.]

Let’s ‘intercept’ page-faults
• Use our systems programming knowledge
• We build a ‘new’ Interrupt Descriptor Table
• With our own ‘customized’ interrupt-gates
• Use a ‘new’ gate for page-fault exceptions
• Other existing gates we can simply copy
• Why not just modify the existing IDT?
• It’s ‘write-protected’ in some Linux kernels
• But we can still ‘read’ it (i.e., for copying)

Very delicate to implement
• Will need to use some assembly language
• Using C language doesn’t give full control
• C Compiler designers didn’t plan for this!
• (except they did allow for using assembly)
• Assembly requires us to be very precise
• So try keeping assembly to a minimum
• We can use a mixture of assembly and C

Allocate a mapped page
• Device interrupts are ‘asynchronous’
• CPU requires instant access to the IDT
• We must ensure CPU can find new IDT
• Cannot risk putting it in ‘high memory’
• We can use ‘get_free_page()’ function
• With flags: GFP_KERNEL and GFP_DMA
• (This ensures page will always be mapped)
• No memory available? Cannot continue.

Must find address of current IDT
• We’ll need it for copying the existing gates
• We’ll need it for restoring old IDT upon exit
• We can use the ‘sidt’ instruction to find it
• But ‘sidt’ needs a 48-bit memory-operand
• No such type is directly supported in C
• We could use a 64-bit type (i.e., long long)
• Better to use array of three 16-bit values

Getting hold of current IDT
• We need to declare a global variable
• Because ‘init_module()’ needs it
• And also ‘cleanup_module()’ needs it
• Use ‘static’ to make it private
• Use ‘short’ to get 16-bit array-entries
• Use ‘unsigned’ to avoid sign-extensions
    static unsigned short oldidtr[ 3 ];

Activating a ‘new’ IDT
• When we’re ready, we can use ‘lidt’ (‘sidt’ only stores the register; ‘lidt’ loads it)
• Instruction will change the IDTR register
• Instruction needs 48-bit memory operand
• So again we will declare a suitable array
    static unsigned short newidtr[ 3 ];

Initializations
• We need to initialize our ‘idtr’ array
• We need to initialize new Descriptor Table
• Use ‘memcpy()’ for copying within kernel
• Page-Fault’s gate-descriptor must be built
• Must conform to CPU’s expected layout
• Need to use a local 64-bit variable
    unsigned long long gate_desc;

Format for a Gate Descriptor
Quadword (64-bits)
     63              48 47          32 31              16 15             0
    +------------------+--------------+------------------+----------------+
    | offset[ 31…16 ]  |  gate type   | segment-selector | offset[ 15…0 ] |
    +------------------+--------------+------------------+----------------+
The address of the fault-handler is ‘split’ into a hiword and a loword

Declaring our fault-handler
• Tell the C compiler our handler’s name:
    asmlinkage void isr0x0E( void );
• Its type and value are set by assembler:
    asm(" .text ");
    asm(" .type isr0x0E, @function ");
    asm("isr0x0E: ");

Save/Restore cpu registers
• Upon entering:
    asm(" pushal ");
    asm(" pushl %ds ");
    asm(" pushl %es ");
• Upon leaving:
    asm(" popl %es ");
    asm(" popl %ds ");
    asm(" popal ");
    asm(" jmp *old_isr ");

Handler must access kernel data
• Registers CS and SS get set up by the CPU
• But it’s our job to set up DS and ES registers
• Linux uses same segments for data and stack
    asm(" mov %ss, %eax ");
    asm(" mov %eax, %ds ");
    asm(" mov %eax, %es ");
• (Current kernel version doesn’t use FS or GS)

Transfer to a C function
• Handler will need some info from the stack
• The ‘error-code’ will be needed for sure
• So C function will need an ‘argument’
• So here’s our C function prototype:
    static void handler( unsigned long *tos );
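
Example: what the C function might find on the stack
The slides say the C function needs the error-code from the stack but do not show how it is located. The sketch below is a user-space illustration, not the module's actual code. It assumes that ‘tos’ points at the stack top right after the three entry pushes (pushal, pushl %ds, pushl %es), so the saved ES is slot 0 and the CPU's error-code is slot 10. The slot names, the struct and the function name are hypothetical. The meaning of the three low error-code bits is the standard x86 one.

```c
#include <stdint.h>

/* Low three bits of the x86 page-fault error-code. */
#define PF_PROT   0x1u   /* 0: page not present, 1: protection violation */
#define PF_WRITE  0x2u   /* 0: read or fetch,    1: write                */
#define PF_USER   0x4u   /* 0: fault at CPL < 3, 1: fault at CPL == 3    */

/* Hypothetical slot numbers (32-bit words), ASSUMING tos points at the
 * saved ES: 2 segment registers + 8 pushal registers precede the
 * error-code pushed by the CPU. */
enum {
    SLOT_ES = 0, SLOT_DS = 1, SLOT_PUSHAL = 2,
    SLOT_ERRCODE = 10, SLOT_EIP = 11, SLOT_CS = 12, SLOT_EFLAGS = 13
};

struct fault_info {
    int      present;   /* 1 if the page was present (protection fault) */
    int      write;     /* 1 if the access was a write                  */
    int      user;      /* 1 if the fault occurred in user mode         */
    uint32_t eip;       /* address of the faulting instruction          */
};

/* Decode the saved frame; uint32_t is used so the sketch also behaves
 * the same on a 64-bit host, where 'unsigned long' is wider. */
static struct fault_info decode_fault(const uint32_t *tos)
{
    struct fault_info f;
    uint32_t err = tos[SLOT_ERRCODE];

    f.present = (err & PF_PROT)  ? 1 : 0;
    f.write   = (err & PF_WRITE) ? 1 : 0;
    f.user    = (err & PF_USER)  ? 1 : 0;
    f.eip     = tos[SLOT_EIP];
    return f;
}
```

An error-code of 6, for instance, would describe a user-mode write to a page that is not present.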
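
Example: building the 64-bit gate descriptor
The gate-descriptor format above splits the handler's address into a hiword and a loword. As a rough sketch of that packing, written as ordinary user-space C, the function below assembles the quadword from its four 16-bit fields. The function names are hypothetical. The selector and the ‘gate type’ word are passed in by the caller, because their values depend on the kernel in use; 0x8E00 (present, DPL 0, 32-bit interrupt-gate) is shown only as a typical value.

```c
#include <stdint.h>

/* Pack a 32-bit gate descriptor:
 *   bits 15..0   offset[15..0]
 *   bits 31..16  segment-selector
 *   bits 47..32  gate type / attributes
 *   bits 63..48  offset[31..16]                                   */
static uint64_t make_gate(uint32_t handler, uint16_t selector,
                          uint16_t type_attr)
{
    uint64_t gate_desc = 0;

    gate_desc |= (uint64_t)(handler & 0xFFFFu);      /* loword of address */
    gate_desc |= (uint64_t)selector  << 16;
    gate_desc |= (uint64_t)type_attr << 32;
    gate_desc |= (uint64_t)(handler >> 16) << 48;    /* hiword of address */
    return gate_desc;
}

/* Recover the handler address from an existing gate, e.g. to remember
 * the old handler before replacing its gate. */
static uint32_t gate_offset(uint64_t gate_desc)
{
    uint32_t lo = (uint32_t)(gate_desc & 0xFFFFu);
    uint32_t hi = (uint32_t)(gate_desc >> 48);

    return (hi << 16) | lo;
}
```

With a handler at 0xC0123456, selector 0x0010 and type word 0x8E00, make_gate() yields 0xC0128E0000103456.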
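
Example: the 48-bit IDTR image as three 16-bit values
The slides store the ‘sidt’ / ‘lidt’ operand in an array of three unsigned shorts. The sketch below shows how such an array could be filled and read back, assuming the usual x86 layout: a 16-bit limit in the first word, followed by the 32-bit base-address as two words, low half first. The helper names are hypothetical, and this is user-space C that only manipulates the array; it does not execute ‘sidt’ or ‘lidt’.

```c
#include <stdint.h>

/* Fill a 48-bit IDTR image: idtr[0] = limit, idtr[1] = base[15..0],
 * idtr[2] = base[31..16]. */
static void pack_idtr(unsigned short idtr[3], uint32_t base,
                      unsigned short limit)
{
    idtr[0] = limit;
    idtr[1] = (unsigned short)(base & 0xFFFFu);
    idtr[2] = (unsigned short)(base >> 16);
}

/* Base-address of the table described by an IDTR image. */
static uint32_t idtr_base(const unsigned short idtr[3])
{
    return ((uint32_t)idtr[2] << 16) | idtr[1];
}

/* Limit = size of the table in bytes, minus one. */
static unsigned short idtr_limit(const unsigned short idtr[3])
{
    return idtr[0];
}
```

A full table of 256 eight-byte gates has a limit of 256 * 8 - 1 = 0x7FF. The base recovered from ‘oldidtr’ is the address to copy the existing gates from.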
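
Example: reading the privilege level from a selector
The segment-register layout shown earlier (index in bits 15..3, TI in bit 2, RPL in bits 1..0) can be unpacked with a few shifts and masks. This is a minimal user-space sketch; the struct and function names are hypothetical, and the selector values mentioned afterwards are illustrative only.

```c
#include <stdint.h>

struct selector_fields {
    unsigned index;   /* bits 15..3: descriptor-table index      */
    unsigned ti;      /* bit 2: Table-Indicator (0 = GDT, 1 = LDT) */
    unsigned rpl;     /* bits 1..0: Requested Privilege Level    */
};

/* Split a 16-bit segment-selector into its three fields. */
static struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;

    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}
```

For a selector of 0x23 this gives index 4, TI 0, RPL 3 (user mode); for 0x10 it gives index 2, TI 0, RPL 0 (kernel mode).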
