Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
OPERATING SYSTEMS: DESIGN AND IMPLEMENTATION Second Edition ANDREW S. TANENBAUM Vrije Universiteit Amsterdam, The Netherlands ALBERT S. WOODHULL Hampshire College Amherst, Massachusetts PRENTICE HALL Upper Saddle River, NJ 07458 1 INTRODUCTION 1.1 1.2 1.3 1.4 1.5 WHAT IS AN OPERATING SYSTEM? HISTORY OF OPERATING SYSTEMS OPERATING SYSTEM CONCEPTS SYSTEM CALLS OPERATING SYSTEM STRUCTURE Banking system Airline reservation Compilers Editors Web browser Command Application programs interpreter System programs Operating system Machine language Microprogramming Hardware Physical devices Figure 1-1. A computer system consists of hardware, system programs, and application programs. Card reader Tape drive Input tape Output tape Printer 1401 (a) System tape (b) 7094 (c) (d) 1401 (e) (f) Figure 1-2. An early batch system. (a) Programmers bring cards to 1401. (b) 1401 reads batch of jobs onto tape. (c) Operator carries input tape to 7094. (d) 7094 does computing. (e) Operator carries output tape to 1401. (f) 1401 prints output. $END Data for program $RUN $LOAD Fortran Program $FORTRAN $JOB, 10,6610802, MARVIN TANENBAUM Figure 1-3. Structure of a typical FMS job. Job 3 Job 2 Job 1 Memory partitions Operating system Figure 1-4. A multiprogramming system with three jobs in memory. A B D E C F Figure 1-5. A process tree. Process A created two child processes, B and C. Process B created three child processes, D, E, and F. Root directory Students Robbert Matty Faculty Leo Prof. Brown Courses CS101 Papers CS105 Prof. Green Grants Prof. White Committees SOSP COST-11 Files Figure 1-6. A file system for a university department. Root a c Drive 0 b x d y a c (a) d b x (b) Figure 1-7. (a) Before mounting, the files on drive 0 are not accessible. (b) After mounting, they are part of the file hierarchy. y Process Process Pipe A B Figure 1-8. Two processes connected by a pipe. Process management pid = fork() pid = waitpid(pid, &statloc, opts) s = wait(&status) s = execve(name, argv, envp) exit(status) size = brk(addr) pid = getpid() pid = getpgrp() pid = setsid() l = ptrace(req, pid, addr, data) Create a child process identical to the parent Wait for a child to terminate Old version of waitpid Replace a process core image Terminate process execution and return status Set the size of the data segment Return the caller’s process id Return the id of the caller’s process group Create a new session and return its process group id Used for debugging s = sigreturn(&context) s = sigprocmask(how, &set, &old) s = sigpending(set) s = sigsuspend(sigmask) s = kill(pid, sig) residual = alarm(seconds) s = pause() Return from a signal Examine or change the signal mask Get the set of blocked signals Replace the signal mask and suspend the process Send a signal to a process Set the alarm clock Suspend the caller until the next signal fd = creat(name, mode) fd = mknod(name, mode, addr) fd = open(file, how, ...) s = close(fd) n = read(fd, buffer, nbytes) n = write(fd, buffer, nbytes) pos = lseek(fd, offset, whence) s = stat(name, &buf) s = fstat(fd, &buf) fd = dup(fd) s = pipe(&fd[0]) s = ioctl(fd, request, argp) s = access(name, amode) s = rename(old, new) s = fcntl(fd, cmd, ...) Obsolete way to create a new file Create a regular, special, or directory i-node Open a file for reading, writing or both Close an open file Read data from a file into a buffer Write data from a buffer into a file Move the file pointer Get a file’s status information Get a file’s status information Allocate a new file descriptor for an open file Create a pipe Perform special operations on a file Check a file’s accessibility Give a file a new name File locking and other operations s = mkdir(name, mode) s = rmdir(name) s = link(name1, name2) s = unlink(name) s = mount(special, name, flag) s = umount(special) s = sync() s = chdir(dirname) s = chroot(dirname) Create a new directory Remove an empty directory Create a new entry, name2, pointing to name1 Remove a directory entry Mount a file system Unmount a file system Flush all cached blocks to the disk Change the working directory Change the root directory uid = getuid() gid = getgid() s = setuid(uid) s = setgid(gid) s = chown(name, owner, group) oldmask = umask(complmode) Get the caller’s uid Get the caller’s gid Set the caller’s uid Set the caller’s gid Change a file’s owner and group Change the mode mask seconds = time(&seconds) s = stime(tp) s = utime(file, timep) s = times(buffer) Get the elapsed time since Jan. 1, 1970 Set the elapsed time since Jan. 1, 1970 Set a file’s "last access" time Get the user and system times used so far Signals s = sigaction(sig, &act, &oldact) Define action to take on signals File Management Directory & File System Management Protection s = chmod(name, mode) Change a file’s protection bits Time Management Figure 1-9. The MINIX system calls. while (TRUE) { /* repeat forever */ read command(command, parameters);/ * read input from terminal */ if (fork() != 0) { /* fork off child process */ /* Parent code. */ waitpid(−1, &status, 0); /* wait for child to exit */ } else { /* Child code. */ execve(command, parameters, 0);/* execute command */ } } Figure 1-10. A stripped-down shell. Throughout this book, TRUE is assumed to be defined as 1. Address (hex) FFFF Stack Gap Data Text 0000 Figure 1-11. Processes have three segments: text, data, and stack. In this example, all three are in one address space, but separate instruction and data space is also supported. struct stat { short st dev; /* device where i-node belongs */ unsigned short st ino; /* i-node number */ unsigned short st mode; /* mode word */ short st nlink; /* number of links */ short st uid; /* user id */ short st gid; /* group id */ short st rdev; /* major/minor device for special files */ long st size; /* file size */ /* time of last access */ long st atime; long st mtime; /* time of last modification */ long st ctime; /* time of last change to i-node */ }; Figure 1-12. The structure used to return information for the STAT and FSTAT system calls. In the actual code, symbolic names are used for some of the types. #define STD INPUT 0 #define STD OUTPUT 1 /* file descriptor for standard input */ /* file descriptor for standard output */ pipeline(process1, process2) char *process1, *process2; /* pointers to program names */ { int fd[2]; pipe(&fd[0]); /* create a pipe */ if (fork() != 0) { /* The parent process executes these statements. */ close(fd[0]); /* process 1 does not need to read from pipe */ close(STD OUTPUT); /* prepare for new standard output */ dup(fd[1]); /* set standard output to fd[1] */ close(fd[1]); /* this file descriptor not needed any more */ execl(process1, process1, 0); } else { /* The child process executes these statements. */ close(fd[1]); /* process 2 does not need to write to pipe */ close(STD INPUT); /* prepare for new standard input */ dup(fd[0]); /* set standard input to fd[0] */ close(fd[0]); /* this file descriptor not needed any more */ execl(process2, process2, 0); } } Figure 1-13. A skeleton for setting up a two-process pipeline. /usr/ast 16 mail 81 games 40 test /usr/jim 31 70 59 38 (a) bin memo f.c. prog1 /usr/ast 16 81 40 70 mail games test note /usr/jim 31 70 59 38 bin memo f.c. prog1 (b) Figure 1-14. (a) Two directories before linking /usr/jim/memo to ast’s directory. (b) The same directories after linking. bin dev lib (a) mnt usr bin dev usr (b) Figure 1-15. (a) File system before the mount. (b) File system after the mount. User program 2 User programs run in user mode Main memory User program 1 Kernel call 4 3 Service procedure 1 Dispatch table Operating system runs in kernel mode 2 Figure 1-16. How a system call can be made: (1) User program traps to the kernel. (2) Operating system determines service number required. (3) Operating system calls service procedure. (4) Control is returned to user program. Main procedure Service procedures Utility procedures Figure 1-17. A simple structuring model for a monolithic system. Layer Function The operator 5 User programs 4 Input/output management 3 Operator-process communication 2 Memory and drum management 1 Processor allocation and multiprogramming 0 Figure 1-18. Structure of the THE operating system. Virtual 370s System calls here I/O instructions here Trap here CMS CMS CMS Trap here VM/370 370 Bare hardware Figure 1-19. The structure of VM/370 with CMS. Client process Client process Process server Terminal server File server Memory server User mode Kernel mode Kernel Client obtains service by sending messages to server processes Figure 1-20. The client-server model. Machine 1 Machine 2 Machine 3 Machine 4 Client File server Process server Terminal server Kernel Kernel Kernel Kernel Network Message from client to server Figure 1-21. The client-server model in a distributed system. 2 PROCESSES 2.1 2.2 2.3 2.4 2.5 2.6 INTRODUCTION TO PROCESSES INTERPROCESS COMMUNICATION CLASSICAL IPC PROBLEMS PROCESS SCHEDULING OVERVIEW OF PROCESSES IN MINIX IMPLEMENTATION OF PROCESSES IN MINIX One program counter Process A Four program counters Process switch B C A B C D D C B A D (a) Time (b) (c) Figure 2-1. (a) Multiprogramming of four programs. (b) Conceptual model of four independent, sequential processes. (c) Only one program is active at any instant. Running 1 Blocked 3 2 4 Ready 1. Process blocks for input 2. Scheduler picks another process 3. Scheduler picks this process 4. Input becomes available Figure 2-2. A process can be in running, blocked, or ready state. Transitions between these states are as shown. Processes 0 1 n–2 n–1 Scheduler Figure 2-3. The lowest layer of a process-structured operating system handles interrupts and scheduling. Above that layer are sequential processes. Process management Memory management File management Registers Pointer to text segment UMASK mask Program counter Pointer to data segment Root directory Program status word Pointer to bss segment Working directory Stack pointer Exit status File descriptors Process state Signal status Effective uid Time when process started Process id Effective gid CPU time used Parent process System call parameters Children’s CPU time Process group Various flag bits Time of next alarm Real uid Message queue pointers Effective Pending signal bits Real gid Process id Effective gid Various flag bits Bit maps for signals Various flag bits Figure 2-4. Some of the fields of the MINIX process table. 1. Hardware stacks program counter, etc. 2. Hardware loads new program counter from interrupt vector. 3. Assembly language procedure saves registers. 4. Assembly language procedure sets up new stack. 5. C interrupt service runs (typically reads and buffers input). 6. Scheduler marks waiting task as ready. 7. Scheduler decides which process is to run next. 8. C procedure returns to the assembly code. 9. Assembly language procedure starts up new current process. Figure 2-5. Skeleton of what the lowest level of the operating system does when an interrupt occurs. Computer Program counter Computer Thread (a) Process (b) Figure 2-6. (a) Three processes each with one thread. (b) One process with three threads. Process A Spooler directory 4 abc 5 prog. c 6 prog. n 7 out = 4 in = 7 Process B Figure 2-7. Two processes want to access shared memory at the same time. while (TRUE) { while (TRUE) { while (turn != 0) /* wait */ ; while (turn != 1) /* wait */ ; critical region(); critical region(); turn = 1; turn = 0; noncritical region(); noncritical region(); } } (a) (b) Figure 2-8. A proposed solution to the critical region problem. #define FALSE #define TRUE #define N 2 0 1 /* number of processes */ int turn; int interested[N]; /* whose turn is it? */ /* all values initially 0 (FALSE) */ void enter region(int process); /* process is 0 or 1 */ { int other; /* number of the other process */ } other = 1 − process; /* the opposite of process */ interested[process] = TRUE; /* show that you are interested */ turn = process; /* set flag */ while (turn == process && interested[other] == TRUE) /* null statement */ ; void leave region(int process) /* process: who is leaving */ { interested[process] = FALSE; /* indicate departure from critical region */ } Figure 2-9. Peterson’s solution for achieving mutual exclusion. enter region: tsl register,lock cmp register,#0 jne enter region ret | copy lock to register and set lock to 1 | was lock zero? | if it was non zero, lock was set, so loop | return to caller; critical region entered leave region: move lock,#0 ret | store a 0 in lock | return to caller Figure 2-10. Setting and clearing locks using TSL. #define N 100 int count = 0; /* number of slots in the buffer */ /* number of items in the buffer */ void producer(void) { while (TRUE) { /* repeat forever */ produce item(); /* generate next item */ if (count == N) sleep(); /* if buffer is full, go to sleep */ enter item(); /* put item in buffer */ count = count + 1; /* increment count of items in buffer */ if (count == 1) wakeup(consumer);/* was buffer empty? */ } } void consumer(void) { while (TRUE) { /* repeat forever */ if (count == 0) sleep(); /* if buffer is empty, got to sleep */ remove item(); /* take item out of buffer */ count = count − 1; /* decrement count of items in buffer */ if (count == N−1) wakeup(producer);/* was buffer full? */ consume item(); /* print item */ } } Figure 2-11. The producer-consumer problem with a fatal race condition. #define N 100 typedef int semaphore; semaphore mutex = 1; semaphore empty = N; semaphore full = 0; /* number of slots in the buffer */ /* semaphores are a special kind of int */ /* controls access to critical region */ /* counts empty buffer slots */ /* counts full buffer slots */ void producer(void) { int item; while (TRUE) { produce item(&item); down(&empty); down(&mutex); enter item(item); up(&mutex); up(&full); } /* TRUE is the constant 1 */ /* generate something to put in buffer */ /* decrement empty count */ /* enter critical region */ /* put new item in buffer */ /* leave critical region */ /* increment count of full slots */ } void consumer(void) { int item; while (TRUE) { down(&full); down(&mutex); remove item(&item); up(&mutex); up(&empty); consume item(item); } /* infinite loop */ /* decrement full count */ /* enter critical region */ /* take item from buffer */ /* leave critical region */ /* increment count of empty slots */ /* do something with the item */ } Figure 2-12. The producer-consumer problem using semaphores. monitor example integer i; condition c; procedure producer(x); .. . end; procedure consumer(x); .. . end; end monitor; Figure 2-13. A monitor. monitor ProducerConsumer condition full, empty; integer count; procedure enter; begin if count = N then wait(full); enter item; count := count + 1; if count = 1 then signal(empty) end; procedure remove; begin if count = 0 then wait(empty); remove item; count := count − 1; if count = N − 1 then signal(full) end; count := 0; end monitor; procedure producer; begin while true do begin produce item; ProducerConsumer.enter end end; procedure consumer; begin while true do begin ProducerConsumer.remove; consume item end end; Figure 2-14. The producer-consumer problem with monitors. #define N 100 /* number of slots in the buffer */ void producer(void) { int item; message m; /* message buffer */ while (TRUE) { produce item(&item); /* generate something to put in buffer */ receive(consumer, &m); /* wait for an empty to arrive */ build message(&m, item); /* construct a message to send */ send(consumer, &m); /* send item to consumer */ } } void consumer(void) { int item, i; message m; for (i = 0; i < N; i++) send(producer, &m);/* send N empties */ while (TRUE) { receive(producer, &m); /* get message containing item */ extract item(&m, &item); /* extract item from message */ send(producer, &m); /* send back empty reply */ consume item(item); /* do something with the item */ } } Figure 2-15. The producer-consumer problem with N messages. Figure 2-16. Lunch time in the Philosophy Department. #define N 5 /* number of philosophers */ void philosopher(int i) { while (TRUE) { think(); take fork(i); take fork((i+1) % N); eat(); put fork(i); put fork((i+1) % N); } } /* i: philosopher number, from 0 to 4 */ /* philosopher is thinking */ /* take left fork */ /* take right fork; % is modulo operator */ /* yum-yum, spaghetti */ /* put left fork back on the table */ /* put right fork back on the table */ Figure 2-17. A nonsolution to the dining philosophers problem. #define N 5 #define LEFT (i-1)%N #define RIGHT (i+1)%N #define THINKING 0 #define HUNGRY 1 #define EATING 2 /* number of philosophers */ /* number of i’s left neighbor */ /* number of i’s right neighbor */ /* philosopher is thinking */ /* philosopher is trying to get forks */ /* philosopher is eating */ typedef int semaphore; int state[N]; semaphore mutex = 1; semaphore s[N]; /* semaphores are a special kind of int */ /* array to keep track of everyone’s state */ /* mutual exclusion for critical regions */ /* one semaphore per philosopher */ void philosopher(int i) { while (TRUE) { think(); take forks(i); eat(); put forks(i); } } /* i: philosopher number, from 0 to N-1 */ /* repeat forever */ /* philosopher is thinking */ /* acquire two forks or block */ /* yum-yum, spaghetti */ /* put both forks back on table */ void take forks(int i) { down(&mutex); state[i] = HUNGRY; test(i); up(&mutex); down(&s[i]); } /* i: philosopher number, from 0 to N-1 */ /* enter critical region */ /* record fact that philosopher i is hungry */ /* try to acquire 2 forks */ /* exit critical region */ /* block if forks were not acquired */ void put forks(i) { down(&mutex); state[i] = THINKING; test(LEFT); test(RIGHT); up(&mutex); } /* i: philosopher number, from 0 to N-1 */ /* enter critical region */ /* philosopher has finished eating */ /* see if left neighbor can now eat */ /* see if right neighbor can now eat */ /* exit critical region */ void test(i) /* i: philosopher number, from 0 to N-1 */ { if (state[i] == HUNGRY && state[LEFT] != EATING && state[RIGHT] != EATING) { state[i] = EATING; up(&s[i]); } } Figure 2-18. A solution to the dining philosopher’s problem. typedef int semaphore; semaphore mutex = 1; semaphore db = 1; int rc = 0; /* use your imagination */ /* controls access to ’rc’ */ /* controls access to the data base */ /* # of processes reading or wanting to */ void reader(void) { while (TRUE) { down(&mutex); rc = rc + 1; if (rc == 1) down(&db); up(&mutex); read data base(); down(&mutex); rc = rc - 1; if (rc == 0) up(&db); up(&mutex); use data read(); } } /* repeat forever */ /* get exclusive access to ’rc’ */ /* one reader more now */ /* if this is the first reader ... */ /* release exclusive access to ’rc’ */ /* access the data */ /* get exclusive access to ’rc’ */ /* one reader fewer now */ /* if this is the last reader ... */ /* release exclusive access to ’rc’ */ /* noncritical region */ void writer(void) { while (TRUE) { think up data(); down(&db); write data base(); up(&db); } } /* repeat forever */ /* noncritical region */ /* get exclusive access */ /* update the data */ /* release exclusive access */ Figure 2-19. A solution to the readers and writers problem. Figure 2-20. The sleeping barber. #define CHAIRS 5 /* # chairs for waiting customers */ typedef int semaphore; /* use your imagination */ semaphore customers = 0; semaphore barbers = 0; semaphore mutex = 1; int waiting = 0; /* # of customers waiting for service */ /* # of barbers waiting for customers */ /* for mutual exclusion */ /* customers are waiting (not being cut) */ void barber(void) { while (TRUE) { down(customers); down(mutex); waiting = waiting - 1; up(barbers); up(mutex); cut hair(); } } void customer(void) { down(mutex); if (waiting < CHAIRS) { waiting = waiting + 1; up(customers); up(mutex); down(barbers); get haircut(); } else { up(mutex); } } /* go to sleep if # of customers is 0 */ /* acquire access to ’waiting’ */ /* decrement count of waiting customers */ /* one barber is now ready to cut hair */ /* release ’waiting’ */ /* cut hair (outside critical region) */ /* enter critical region */ /* if there are no free chairs, leave */ /* increment count of waiting customers */ /* wake up barber if necessary */ /* release access to ’waiting’ */ /* go to sleep if # of free barbers is 0 */ /* be seated and be serviced */ /* shop is full; do not wait */ Figure 2-21. A solution to the sleeping barber problem. Current process B Next process F D (a) Current process G A F D G A (b) Figure 2-22. Round robin scheduling. (a) The list of runnable processes. (b) The list of runnable processes after B uses up its quantum. B Queue headers Runable processes Priority 4 (Highest priority) Priority 3 Priority 2 Priority 1 (Lowest priority) Figure 2-23. A scheduling algorithm with four priority classes. 8 4 4 4 4 4 4 8 A B C D B C D A (a) (b) Figure 2-24. An example of shortest job first scheduling. a, b, c, d e, f, g, h (a) Processes in main memory Processes on disk e, f, g, h a, b, c, d (b) b, c, f, g a, d, e, h (c) Figure 2-25. A two-level scheduler must move processes between disk and memory and also choose processes to run from among those in memory. Three different instants of time are represented by (a), (b), and (c) . Layer User process 4 Init 3 Memory manager 2 Disk task 1 User process File system Tty task Clock task User process … Network server System task User processes … Ethernet task Server processes … I/O tasks Process management Figure 2-26. MINIX is structured in four layers. Limit of memory Memory available for user programs src/tools/init Init 2383 K 2372 K src/inet/inet (optional) src/fs/fs Inet task File system src/mm/mm Memory manager Read only memory and I/O adapter memory (unavailable to Minix) 2198 K (Depends on number of buffers included in file system) 1077 K 1024 K 640 K Memory available for user programs Ethernet task 129 K (Depends on number of I/O tasks) Printer task Terminal task src/kernel/kernel Memory task Clock task Disk task Kernel Unused Interupt vectors 2 K Start of kernel 1K 0 Figure 2-27. Memory layout after MINIX has been loaded from the disk into memory. #include <minix/config.h> /* MUST be first */ #include <ansi.h> /* MUST be second */ #include <sys/types.h> #include <minix/const.h> #include <minix/type.h> #include <limits.h> #include <errno.h> #include <minix/syslib.h> Figure 2-28. Part of a master header which ensures inclusion of header files needed by all C source files. Type 16-Bit MINIX 32-Bit MINIX gid t 8 8 dev t 16 16 pid t 16 32 ino t 16 32 Figure 2-29. The size, in bits, of some types on 16-bit and 32-bit systems. m_source m_source m_source m_source m_source m_source m_type m_type m_type m_type m_type m_type m1_i1 m2_i1 m3_i1 m4_l1 m5_c2 m5_c1 m6_i1 m5_i1 m1_i2 m2_i2 m3_i2 m4_l2 m6_i2 m5_i2 m1_i3 m2_i3 m3_p1 m4_l3 m6_i3 m5_l1 m1_p1 m2_l1 m4_l4 m6_l1 m5_l2 m1_p2 m2_l2 m3_ca1 m4_l5 m6_f1 m5_l3 m1_p3 m2_p1 Figure 2-30. The six messages types used in MINIX. The sizes of message elements will vary, depending upon the architecture of the machine; this diagram illustrates sizes on a machine with 32-bit pointers, such as the Pentium (Pro). Boo4t 5p6 r7 8 og9 r: a; m boot reco rd & s te r . /0 a p! M 1 bootbl1o2 c3 k a" r# t$ i% t& ' ition io( n t r <= )* Pa Jon K2 LbM N O P lo> a? d@ s ootbQ r titi lR oS cT A Pa lo\ad] s B +, - b ta le Z EF HG rogram for pa rtit io 1 XY CD ot p Bo on 2 rtiti pa (a) VW rogram for ot p Bo [ k U n ock tbl s o Bo load I (b) Figure 2-31. Disk structures used for bootstrapping. (a) Unpartitioned disk. The first sector is the bootblock. (b) Partitioned disk. The first sector is the master boot record. #include <minix/config.h> #if WORD SIZE == 2 #include "mpx88.s" #else #include "mpx386.s" #endif Figure 2-32. How alternative assembly language source files are selected. Interrupt INT CPU s y s^ t e m INTA IRQ 0 (clock) IRQ 1 (keyboard) INT Master interrupt controller IRQ 3 (tty 2) IRQ 4 (tty 1) IRQ 5 (XT Winchester) IRQ 6 (floppy) IRQ 7 (printer) ACK Interrupt ack d a^ t a b u s INT Slave interrupt controller ACK IRQ 8 (real time clock) IRQ 9 (redirected IRQ 2) IRQ 10 IRQ 11 IRQ 12 IRQ 13 (FPU exception) IRQ 14 (AT Winchester) IRQ 15 Figure 2-33. Interrupt processing hardware on a 32-bit Intel PC. Device: Send electrical signal to interrupt controller. Controller: 1. Interrupt CPU. 2. Send digital identification of interrupting device. Caller: 1. Put message pointer and destination of message into CPU registers. 2. Execute software interrupt instruction. Kernel: 1. Save registers. 2. Execute driver software to read I/O device. 3. Send message. 4. Restart a process (not necessarily interrupted process). Kernel: 1. Save registers. 2. Send and/or receive message. 3. Restart a process (not necessarily calling process). (a) (b) Figure 2-34. (a) How a hardware interrupt is processed. (b) How a system call is made. 4 _ 3 P_Sendlink = 0 P_Sendlink = 0 P_Sendlink P_Callerq P_Callerq (a) (b) 2 1 0 Figure 2-35. Queueing of processes trying to send to process 0. Rdy_tail Rdy_head ` 2 g 1 0 f a USER_Q c SERVER_Q TASK_Q d b 3 FS Clock e 5 MM ` 4 f USER_Q 2 SERVER_Q 1 TASK_Q g 0 Figure 2-36. The scheduler maintains three queues, one per priority level. h h Base 24-31 G D 0 Limit 16-19 P DPL S Base 0-15 Type Base 16-23 Limit 0-15 32 Bits Figure 2-37. The format of an Intel segment descriptor. 4 0 Relative address 3 INPUT/OUTPUT 3.1 3.2 3.3 3.4 3.5 3.6 PRINCIPLES OF I/O HARDWARE PRINCIPLES OF I/O SOFTWARE DEADLOCKS OVERVIEW OF I/O IN MINIX BLOCK DEVICES IN MINIX RAM DISKS 3.7 DISKS 3.8 CLOCKS 3.9 TERMINALS 3.10 THE SYSTEM TASK IN MINIX Disk drives Printer Disk controller Printer controller Controller-device interface CPU Memory Other controllers System bus Figure 3-1. A model for connecting the CPU, memory, controllers, and I/O devices. I/O controller I/O address Hardware IRQ Interrupt vector 0 8 Clock 040 – 043 1 9 Keyboard 060 – 063 14 118 Hard disk 1F0 – 1F7 3 11 Secondary RS232 2F8 – 2FF 7 15 Printer 378 – 37F 6 14 Floppy disk 3F0 – 3F7 4 12 Primary RS232 3F8 – 3FF Figure 3-2. Some examples of controllers, their I/O addresses, their hardware interrupt lines, and their interrupt vectors on a typical PC running MS-DOS . CPU Memory Disk controller Drive Buffer DMA registers Count Memory address Count System bus Figure 3-3. A DMA transfer is done entirely by the controller. 7 0 7 0 5 0 6 1 3 4 2 3 5 2 6 1 7 6 4 3 (a) 2 5 (b) 4 1 (c) Figure 3-4. (a) No interleaving. (b) Single interleaving. (c) Double interleaving. Uniform interfacing for device drivers Device naming Device protection Providing a device-independent block size Buffering Storage allocation on block devices Allocating and releasing dedicated devices Error reporting Figure 3-5. Functions of the device-independent I/O software. Layer I/O request User processes Device-independent software I/O reply I/O functions Make I/O call; format I/O; spooling Naming, protection, blocking, buffering, allocation Device drivers Set up device registers; check status Interrupt handlers Wake up driver when I/O completed Hardware Perform I/O operation Figure 3-6. Layers of the I/O system and the main functions of each layer. A S D T (a) B R (b) C (c) Figure 3-7. Resource allocation graphs. (a) Holding a resource. (b) Requesting a resource. (c) Deadlock. U 1. A requests R 2. B requests S 3. C requests T 4. A requests S 5. B requests T 6. C requests R deadlock A Request R Request S Release R Release S B Request S Request T Release S Release T (a) (b) C Request T Request R Release T Release R (c) A B C A B C A B C R S T R S T R S T (d) (e) (f) (g) A B C A B C A B C R S T R S T R S T (h) (i) (j) 1. A requests R 2. C requests T 3. A requests S 4. C requests R 5. A releases R 6. A releases S no deadlock A B C A B C A B C R S T R S T R S T (k) (l) (m) (n) A B C A B C A B C R S T R S T R S T (o) (p) (q) Figure 3-8. An example of how deadlock occurs and how it can be avoided. 1. CD-ROM 2. Printer 3. Plotter 4. Tape drive 5. Robot arm (a) A B i j (b) Figure 3-9. (a) Numerically ordered resources. (b) A resource graph. Condition Approach Mutual exclusion Spool everything Hold and wait Request all resources initially No preemption Take resources away Circular wait Order resources numerically Figure 3-10. Summary of approaches to deadlock prevention. Used Maximum Name Used Maximum Name Used Maximum Name Andy 0 6 Andy 1 6 Andy 1 6 Barbara 0 5 Barbara 1 5 Barbara 2 5 Marvin 0 4 Marvin 2 4 Marvin 2 4 Suzanne 0 7 Suzanne 4 7 Suzanne 4 7 Available: 10 Available: 2 Available: 1 (a) (b) (c) Figure 3-11. Three resource allocation states: (a) Safe. (b) Safe. (c) Unsafe. B Printer u (Both processes finished) I8 I7 I6 t Plotter I5 r s A p I1 q I2 I3 I4 Printer Plotter Figure 3-12. Two process resource trajectories. B 0 3 0 1 C 1 1 D 1 1 E 0 0 1 0 1 0 1 A 0 B 0 0 Pr oc e Ta ss pe Pl driv ot te es P r rs int e CD rs -R OM S Pr oc e Ta ss pe Pl driv ot te es Pr rs int e CD rs -R OM S A C 1 D 0 E Resources assigned 1 1 0 1 3 1 0 2 0 1 0 0 2 0 1 1 1 E = (6342) P = (5322) A = (1020) 0 0 0 Resources still needed Figure 3-13. The banker’s algorithm with multiple resources. Process-structured system Monolithic system Processes A process 1 File system User process User space 4 2 File system 3 Kernel space Device driver 1–4 are request and reply messages between three independent processes. (a) Userspace part Device driver The user-space part calls the kernel-space part by trapping. The file system calls the device driver as a procedure. The entire operating system is part of each process (b) Figure 3-14. Two ways of structuring user-system communication. Requests Field Meaning Type Operation requested m.m type int use m.DEVICE int device to Minor m.PROC NR int Process requesting the I/O m.COUNT int Byte count or ioctl code m.POSITION long Position on device m.ADDRESS char* Address within requesting process Replies Field Type Meaning type Always TASK int REPLY m.m m.REP PROC NR int Same as PROC inrequest NR Bytes m.REP STATUS int transferred or error number Figure 3-15. Fields of the messages sent by the file system to the block device drivers and fields of the replies sent back. message mess; /* message buffer */ void io task() { initialize(); /* only done once, during system init. */ while (TRUE) { receive(ANY, &mess);/* wait for a request for work */ caller = mess.source;/* process from whom message came */ switch(mess.type) { case READ: rcode = dev read(&mess); break; case WRITE: rcode = dev write(&mess); break; /* Other cases go here, including OPEN, CLOSE, and IOCTL */ default: rcode = ERROR; } mess.type = TASK REPLY; mess.status = rcode; /* result code */ send(caller, &mess); /* send reply message back to caller */ } } Figure 3-16. Outline of the main procedure of an I/O task. message mess; /* message buffer */ void shared io task(struct driver table *entry points) { /* initialization is done by each task before calling this */ while (TRUE) { receive(ANY, &mess); caller = mess.source; switch(mess.type) { case READ: rcode = (*entry points >dev read)(&mess); break; case WRITE: rcode = (*entry points >dev write)(&mess); break; /* Other cases go here, including OPEN, CLOSE, and IOCTL */ default: rcode = ERROR; } mess.type = TASK REPLY; mess.status = rcode; /* result code */ send(caller, &mess); } } Figure 3-17. A shared I/O task main procedure using indirect calls. Main Memory (RAM) RAM disk … User programs RAM disk block 1 Read and writes of RAM block 0 use this memory Operating system Figure 3-18. A RAM disk. Parameter IBM 360-KB floppy disk WD 540-MB hard disk Number of cylinders 40 1048 Tracks per cylinder 2 4 Sectors per track 9 252 Sectors per disk 720 1056384 Bytes per sector 512 512 Bytes per disk 368640 540868608 Seek time (adjacent cylinders) 6 msec 4 msec Seek time (average case) 77 msec 11 msec Rotation time 200 msec 13 msec Motor stop/start time 250 msec 9 sec Time to transfer 1 sector 22 msec 53 µsec Figure 3-19. Disk parameters for the original IBM PC 360-KB floppy disk and a Western Digital WD AC2540 540-MB hard disk. Initial position X Time 0 X 5 Pending requests X X 10 X 15 X 20 25 30 X 35 Cylinder Sequence of seeks Figure 3-20. Shortest Seek First (SSF) disk scheduling algorithm. Initial position X Time 0 X 5 X X 10 X 15 X 20 25 30 X 35 Cylinder Sequence of seeks Figure 3-21. The elevator algorithm for scheduling disk requests. Register 0 1 2 3 4 5 6 7 Read Function Data Error Sector Count Sector Number (0-7) Cylinder Low (8-15) Cylinder High (16-23) Select Drive/Head (24-27) Status Write Function Data Write Precompensation Sector Count Number Sector (0-7) Cylinder Low (8-15) Cylinder High (16-23) Select Drive/Head (24-27) Command (a) 7 1 LBA: D: HSn: 6 LBA 5 1 4 3 2 D HS3 HS2 1 HS1 0 HS0 0 = Cylinder/Head/Sector Mode 1 = Logical Block Addressing Mode 0 = master drive 1 = slave drive CHS mode: Head Select in CHS mode LBA mode: Block select bits 24 - 27 (b) Figure 3-22. (a) The control registers of an IDE hard disk controller. The numbers in parentheses are the bits of the logical block address selected by each register in LBA mode. (b) The fields of the Select Drive/Head register. Crystal oscillator Counter is decremented at each pulse Holding register is used to load the counter Figure 3-23. A programmable clock. 64 bits 32 bits 32 bits Time of day in ticks Counter in ticks Time of day in seconds Number of ticks in current second System boot time in seconds (a) (b) Figure 3-24. Three ways to maintain the time of day. (c) Current time Next signal 4200 Clock header 3 4 6 3 2 Figure 3-25. Simulating multiple timers with a single clock. ! 1 X Service Access Response Clients Gettime System call Message Any process Uptime System call Message Any process Uptime Function call Function value Kernel or task Alarm System call Signal Any process Alarm System call Watchdog activation Task Synchronous alarm System call Message Server process Milli" delay Function call Busy wait Kernel or task Milli" elapsed Function call Function value Kernel or task Figure 3-26. The clock code supports a number of timerelated services. Terminals RS-232 interface Memory-mapped interface Character oriented Bit oriented Glass tty Intelligent # terminal Figure 3-27. Terminal types. Network interface X terminal $ CPU % % Video RAM ' card Memory Monitor ( & ' Video controller ) Bus + Analog * video signal (e.g., 16 MHz) Parallel , por t - Keyboard Figure 3-28. Memory-mapped terminals write directly into video RAM. 1 Video RAM RAM address Screen 2 3 4 5 A / BC 6 D 7 0123 6 …×3×2×1×0 …×D×C×B×A 160 characters . (a) / / 25 lines 0×B00A0 0×B0000 0 80 characters . (a) Figure 3-29. (a) A video RAM image for the IBM monochrome display. (b) The corresponding screen. The ×s are attribute bytes. 9 9 CPU Computer : RS-232 interface 8 card Memory : Receive line ; Bus UART Terminal Transmit line UART Figure 3-30. An RS-232 terminal communicates with a computer over a communication line, one bit at a time. The computer and the terminal are completely independent. X terminal Remote host Window Monitor QEWEQ asdsd X client process < QEWEQ asdsd X terminal’s processor QEWEQ asdsd X server process Window manager Mouse Keyboard Network Figure 3-31. Clients and servers in the M.I.T. X Window System. Terminal data structure Terminal data structure Central buffer pool Terminal Terminal 0 1 0 Buffer area for terminal 0 1 Buffer area for terminal 1 2 3 (a) (b) Figure 3-32. (a) Central buffer pool. (b) Dedicated buffer for each terminal. Character POSIX name Comment CTRL-D EOF End of file EOL End of line (undefined) CTRL-H ERASE Backspace one character DEL INTR Interrupt process (SIGINT) CTRL-U KILL Erase entire line being typed CTRL-\ QUIT Force core dump (SIGQUIT) CTRL-Q START Start output CTRL-S STOP Stop output CTRL-R REPRINT Redisplay input (MINIX extension) CTRL-V LNEXT Literal next (MINIX extension) CTRL-O DISCARD Discard output (MINIX extension) CTRL-M CR Carriage return (unchangeable) CTRL-J NL Linefeed (unchangeable) Figure 3-33. Characters that are handled specially in canonical mode. struct termios { tcflag = t c = iflag; tcflag = t c = oflag; tcflag = t c = cflag; tcflag = t c = lflag; speed = t c = ispeed; speed = t c = ospeed; cc = t c = cc[NCCS]; }; /* input modes */ /* output modes */ /* control modes */ /* local modes */ /* input speed */ /* output speed */ /* control characters */ Figure 3-34. The termios structure. In MINIX tc flag t is a short, speed t is an int, cc t is a char. ?> > > > > > > > > > ? > ?> > > > > > > > > > ? > ? MIN = 0 ? ?> > > > > > > > > > ? > ? MIN > 0 ? ? ? ? ? >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > TIME > > > > > => > 0> > > > > > > > > > > > > > ? > > > > > > > > > > > > > > > > TIME > > > > > > >> 0> > > > > > > > > > > > > > > > > Return immediately with whatever ? Timer starts immediately. Return with first > is> > available, > > > > > > > > > 0> to > > > N> > bytes > > > > > > > > > > > > > > > ?? > byte > > > > >entered > > > > > > > or > > > with > > > > > 0> bytes > > > > > > after > > > > timeout > > > > > >> > Return with at least MIN and up to ? Interbyte timer starts after first byte. Return N bytes. Possible indefinite block. ? N bytes if received by timeout, or at least > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > 1> > byte > > > > at > > >timeout. > > > > > > > > Possible > > > > > > > > indefinite > > > > > > > > > block > > >> > Figure 3-35. MIN and TIME determine when a call to read returns in noncanonical mode. N is the number of bytes requested. ? ? ? ? ? ? ? Escape sequence Meaning ESC [ n A Move up n lines ESC [ n B Move down n lines ESC [ n C Move right n spaces ESC [ n D Move left n spaces ESC [ m ; n H Move cursor to (m,n) ESC [ s J Clear screen from cursor (0 to end, 1 from start, 2 all) ESC [ s K Clear line from cursor (0 to end, 1 from start, 2 all) ESC [ n L Insert n lines at cursor ESC [ n M Delete n lines at cursor ESC [ n P Delete n chars at cursor ESC [ n @ Insert n chars at cursor ESC [ n m Enable rendition n (0=normal, 4=bold, 5=blinking, 7=reverse) ESC M Scroll the screen backward if the cursor is on the top line Figure 3-36. The ANSI escape sequences accepted by the terminal driver on output. ESC denotes the ASCII escape character (0x1B), and n, m, and s are optional numeric parameters. User @ 1 8 A 6 A 6 … FS B 2 D 3 E 7 TTY C 4 F … TTY Interrupt 5 Clock Interrupt Figure 3-37. Read request from terminal when no characters are pending. FS is the file system. TTY is the terminal task. The interrupt handler for the terminal queues characters as they are entered, but it is the clock interrupt handler that awakens TTY. tty_task Receive message from user via FS Receive message from clock do_read handle_events in_transfer handle_events kb_read in_transfer Other functions kb_read in_transfer Other functions Figure 3-38. Input handling in the terminal driver. The left branch of the tree is taken to process a request to read characters. The right branch is taken when a character-has-been-typed message is sent to the driver. tty_task do_write handle_events cons_write out_char “Easy” characters Escape Special sequences characters pause_escape End of line scroll_screen flush Figure 3-39. Major procedures used on terminal output. The dashed line indicates characters copied directly to ramqueue by cons write. Field Meaning c " start Start of video memory for this console c " limit Limit of video memory for this console c " column Current column (0-79) with 0 at left c " row Current row (0-24) with 0 at top c " cur Offset into video RAM for cursor c " org Location in RAM pointed to by 6845 base register Figure 3-40. Fields of the console structure that relate to the current screen position. ?> > > > > > > > > > > > > ? > ? ? code ? ?> > Scan > > > > > > > > > > > ? ?> ? ? 00 ?> > > > > > > > > > > > > ? > ? ? 01 ? ? ?> > > > > > > > > > > > > ? > ? ? ?> > > > > > 02 > > > > > > > ?> ? ? ? ? 13 ?> > > > > > > > > > > > > ? > ? ? ?> > > > > > 16 > > > > > > > ?? > ? ? ? 28 ?> > > > > > > > > > > > > ? > ? ? 29 ? ? ?> > > > > > > > > > > > > ? > ? ? ?> > > > > > 59 > > > > > > > ?> ? ? ? ? 127 >> > >> > > > > > > > > > > > > > > > > > > > >?> ? Character ? > > > > > > > > > > >?> ? none > > > > > > > > > > >?> ? ESC ? > > > > > > > > > > >?> ? ’1’ > > > > > > > > > > >?> ? ? ’=’ > > > > > > > > > > >?> ? ’q’ > > > > > > > > > > > ?? > ? CR/LF > > > > > > > > > > >?> ? CTRL ? > > > > > > > > > > >?> ? F1 > > > > > > > > > > >?> ? ? ??? > > > > > > > > > > > > > > > > > > > > > ?> ? Regular ? > > > > > > > > > ?> ? 0 > > > > > > > > > ?> ? C(’[’) ? > > > > > > > > > ?> ? ’1’ > > > > > > > > > ?> ? ? ’=’ > > > > > > > > > ?> ? L(’q’) ? > > > > > > > > > ?> C(’M’) ? > > > > > > > > > ?> ? CTRL ? > > > > > > > > > ?> ? F1 > > > > > > > > > ?> ? ? 0 > >> > > > > >> > > > > > > > > >? > ? SHIFT > > > > > > > > ?? > ? 0 > > > > > > > >? > ? C(’[’) ? > > > > > > > >? > ? ’!’ > > > > > > > >? > ? ? ’+’ > > > > > > > >? > ? ’Q’ > > > > > > > > ?? > C(’M’) ? > > > > > > > >? > ? CTRL ? > > > > > > > >? > ? SF1 > > > > > > > >? > ? ? 0 > > > > > > > > > > > > > > > > > >? > ? ALT1 > > > > > > > > > ?? > ? 0 > > > > > > > > >? > ? CA(’[’) ? > > > > > > > > >? > A(’1’) ? > > > > > > > > >? > ? A(’=’) ? > > > > > > > > >? > ? A(’q’) ? > > > > > > > > >? > CA(’M’) ? > > > > > > > > >? > ? CTRL ? > > > > > > > > >? > ? AF1 > > > > > > > > >? > ? ? 0 > > > > > > >> > > > > > > > > > > ?> ? ALT2 > > > > > > > > ??> ? 0 > > > > > > > > ?> ? CA(’[’) ? > > > > > > > > ?> A(’1’) ? > > > > > > > > ?> ? A(’=’) ? > > > > > > > > ?> ? A(’q’) ? > > > > > > > > ?> CA(’M’) ? > > > > > > > > ?> ? CTRL ? > > > > > > > > ?> ? AF1 > > > > > > > > ?> ? ? 0 > > >> > > > > > > > > > > > > > > > > > > ?> ? ALT+SHIFT > > > > > > > > > > > > > ?? > ? 0 > > > > > > > > > > > > > ?> ? CA(’[’) ? > > > > > > > > > > > > > ?> ? A(’!’) > > > > > > > > > > > > > ?> ? ? A(’+’) > > > > > > > > > > > > > ?> ? A(’Q’) > > > > > > > > > > > > > ?? > ? CA(’M’) > > > > > > > > > > > > > ?> ? CTRL ? > > > > > > > > > > > > > ?> ? ASF1 > > > > > > > > > > > > > ?> ? ? 0 > > > > > >> > > > > >> > Figure 3-41. A few entries from a keymap source file. > > > >> > >? ? > > > > > > > ?? CTRL ? 0 > > > >> > >? ? ? C(’[’) > > > >> > >? ? C(’A’) > > > >> > >? ? ? C(’@’) > > > >> > >? C(’Q’) ? C(’J’) ? > > > > > > > ?? > > > >> > >? ? ? CTRL > > > >> > >? ? CF1 > > > >> > >? 0 > > > >> > > ? ? Field Default values c" iflag BRKINT ICRNL IXON IXANY c" oflag OPOST ONLCR c" cflag CREAD CS8 HUPCL c" lflag ISIG IEXTEN ICANON ECHO ECHOE Figure 3-42. Default termios flag values. POSIX function POSIX operation IOCTL type IOCTL parameter tcdrain (none) TCDRAIN (none) tcflow TCOOFF TCFLOW int=TCOOFF tcflow TCOON TCFLOW int=TCOON tcflow TCIOFF TCFLOW int=TCIOFF tcflow TCION TCFLOW int=TCION tcflush TCIFLUSH TCFLSH int=TCIFLUSH tcflush TCOFLUSH TCFLSH int=TCOFLUSH tcflush TCIOFLUSH TCFLSH int=TCIOFLUSH tcgetattr (none) TCGETS termios tcsetattr TCSANOW TCSETS termios tcsetattr TCSADRAIN TCSETSW termios tcsetattr TCSAFLUSH TCSETSF termios tcsendbreak (none) TCSBRK int=duration Figure 3-43. POSIX calls and IOCTL operations. 0 V D N c c c c 7 6 5 4 3 2 1 0 V: IN " ESC, escaped by LNEXT (CTRL-V) D: N: IN " EOF, end of file (CTRL-D) IN " EOT, line break (NL and others) cccc: 7: count of characters echoed Bit 7, may be zeroed if ISTRIP is set 6-0: Bits 0-6, ASCII code Figure 3-44. The fields in a character code as it is placed into the input queue. ?> > > > > ? > > > > > ?? ? 42 ? 35 > > > > > > > > > > ?> > > > > > > ? > > > > ?? ? 170 ? 18 > > > > > > > > > > > >?> > > > ?? 38 > > > > > > ?> > > > > ?? 38 > > > > > > ?> > > > > > ? > > > > ?? ? 24 ? 57 > > > > > > > > > > >? > > > > ?? 54 > > >> > > ?> > > > > ?? 17 > > > >> > ?> > > > > > > ? > > > > ?? ? 182 ? 24 > > > >> > > > > >> >?> > > > > ?? 19 > > > > >> ? > > > > > ?> > > > > ?? ? 38 ? 32 > > > > >> > > > > Figure 3-45. Scan codes in the input buffer, with corresponding key presses below, for a line of text entered at the keyboard. L+, L-, R+, and R- represent, respectively, pressing and releasing the left and right Shift keys. The code for a key release is 128 more than the code for a press of the same key. >? > > > > ? ?? ? 28 ? >> > > > Key Scan code ‘‘ASCII’’ Escape sequence Home 71 0x101 ESC [ H Up Arrow 72 0x103 ESC [ A Pg Up 73 0x107 ESC [ V − 74 0x10A ESC [ S Left Arrow 75 0x105 ESC [ D 5 76 0x109 ESC [ G Right Arrow 77 0x106 ESC [ C + 78 0x10B ESC [ T End 79 0x102 ESC [ Y Down Arrow 80 0x104 ESC [ B Pg Dn 81 0x108 ESC [ U Ins 82 0x10C ESC [ @ Figure 3-46. Escape codes generated by the numeric keypad. When scan codes for ordinary keys are translated into ASCII codes the special keys are assigned ‘‘pseudo ASCII’’ codes with values greater than 0xFF. Key Purpose F1 Display process table F2 Display details of process memory use F3 Toggle between hardware and software scrolling F5 Show Ethernet statistics (if network support compiled) CF7 Send SIGQUIT, same effect as CTRL-\ CF8 Send SIGINT, same effect as DEL CF9 Send SIGKILL, same effect as CTRL-U Figure 3-47. The function keys detected by func key(). c_esc_state = 1 numeric or “;” “[” ESC Not “[” c_esc_state = 0 Not ESC call do_escape call do_escape c_esc_state = 2 Not numeric or “;” collect numeric parameters Figure 3-48. Finite state machine for processing escape sequences. Registers Function 10 – 11 Cursor size 12 – 13 Start address for drawing screen 14 – 15 Cursor position Figure 3-49. Some of the 6845’s registers. Message type From Meaning SYS " FORK MM A process has forked SYS " NEWMAP MM Install memory map for a new process SYS " GETMAP MM MM wants memory map of a process SYS " EXEC MM Set stack pointer after EXEC call SYS " XIT MM A process has exited SYS " TIMES FS FS wants a process’ execution times SYS " ABORT Both Panic: MINIX is unable to continue SYS " SENDSIG MM Send a signal to a process SYS " SIGRETURN MM Cleanup after completion of a signal. SYS " KILL FS Send signal to a process after KILL call SYS " ENDSIG MM Cleanup after a signal from the kernel SYS " COPY Both Copy data between processes SYS " VCOPY Both Copy multiple blocks of data between processes SYS " GBOOT FS Get boot parameters SYS " MEM MM MM wants next free chunk of physical memory SYS " UMAP FS Convert virtual address to physical address SYS " TRACE MM Carry out an operation of the PTRACE call Figure 3-50. The message types accepted by the system task. User User 1 1 7 FS 2 4 4 FS 5 Disk task 2 6 System task Disk task 3 System task 3 Interrupt Interrupt (a) (b) Figure 3-51. (a) Worst case for reading a block requires seven messages. (b) Best case for reading a block requires four messages. 4 MEMORY MANAGEMENT 4.1 4.2 4.3 4.4 4.5 4.6 BASIC MEMORY MANAGEMENT SWAPPING VIRTUAL MEMORY PAGE REPLACEMENT ALGORITHMS DESIGN ISSUES FOR PAGING SYSTEMS SEGMENTATION 4.7 OVERVIEW OF MEMORY MANAGEMENT IN MINIX 4.8 IMPLEMENTATION OF MEMORY MANAGEMENT IN MINIX 0xFFF … Device drivers in ROM Operating system in ROM User program User program User program Operating system in RAM Operating system in RAM 0 (a) 0 (b) 0 (c) Figure 4-1. Three simple ways of organizing memory with an operating system and one user process. Other possibilities also exist. Multiple input queues Partition 4 Partition 4 700K Single input queue Partition 3 Partition 3 400K Partition 2 Partition 2 Partition 1 Operating system (a) Partition 1 100K 0 Operating system (b) Figure 4-2. (a) Fixed memory partitions with separate input queues for each partition. (b) Fixed memory partitions with a single input queue. Time B C C C B B B C C E A A A D D D Operating system Operating system Operating system Operating system Operating system Operating system Operating system (a) (b) (c) (d) (e) (f) (g) Figure 4-3. Memory allocation changes as processes come into memory and leave it. The shaded regions are unused memory. B-Stack Room for growth Room for growth B B-Data Actually in use B-Program A-Stack Room for growth Room for growth A-Data A Actually in use A-Program Operating system Operating system (a) (b) Figure 4-4. (a) Allocating space for a growing data segment. (b) Allocating space for a growing stack and a growing data segment. A B C 8 D 16 E 24 (a) 11111000 P 0 5 H 5 3 P 8 6 P 14 4 P 26 3 H 29 3 11111111 11001111 11111000 H 18 2 Hole Starts Length at 18 2 (b) P 20 6 Process (c) Figure 4-5. (a) A part of memory with five processes and three holes. The tick marks show the memory allocation units. The shaded regions (0 in the bit map) are free. (b) The corresponding bit map. (c) The same information as a list. X Before X terminates (a) A X (b) A X (c) (d) X X B After X terminates becomes A A becomes B becomes becomes B B Figure 4-6. Four neighbor combinations for the terminating process, X. CPU card The CPU sends virtual addresses to the MMU CPU Memory management unit Memory Disk controller The MMU sends physical addresses to the memory Figure 4-7. The position and function of the MMU. Bus Virtual address space 60K-64K x 56K-60K X 52K-56K X 48K-52K X 44K-48K 7 40K-44K X 36K-40K 5 32K-36K 28K-32K 24K-28K X X 28K-32K X 24K-28K 3 20K-24K 4 16K-20K 16K-20K 8K-12K Physical memory address 20K-24K 12K-16K Virtual page 12K-16K 0 6 4K-8K 1 0K-4K 2 8K-12K 4K-8K 0K-4K Page frame Figure 4-8. The relation between virtual addresses and physical memory addresses is given by the page table. 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 Page table 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 000 000 000 000 111 000 101 000 000 000 011 100 000 110 001 010 0 0 0 0 1 0 1 0 0 0 1 1 1 1 1 1 Outgoing physical address (24580) 12-bit offset copied directly from input to output 110 Present/ absent bit Virtual page = 2 is used as an index into the page table 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 Incoming virtual address (8196) Figure 4-9. The internal operation of the MMU with 16 4K pages. Second-level page tables Page table for the top 4M of memory Top-level page table 1023 Bits 10 PT1 10 12 PT2 Offset (a) 6 5 4 3 2 1 0 1023 6 5 4 3 2 1 0 To pages (b) Figure 4-10. (a) A 32-bit address with two page table fields. (b) Two-level page tables. Caching disabled Modified Present/absent Page frame number Referenced Protection Figure 4-11. A typical page table entry. Valid Virtual page Modified Protection Page frame 140 1 31 1 RW 20 0 38 1 R X 130 1 29 1 RW 129 1 62 1 RW 19 0 50 1 R X 21 0 45 1 R X 860 1 14 1 RW 861 1 75 1 RW Figure 4-12. A TLB to speed up paging. Page loaded first 0 3 7 8 12 14 15 18 A B C D E F G H Most recently loaded page (a) 3 7 8 12 14 15 18 20 B C D E F G H A A is treated like a newly loaded page (b) Figure 4-13. Operation of second chance. (a) Pages sorted in FIFO order. (b) Page list if a page fault occurs at time 20 and A has its R bit set. A B L K C J D I E H When a page fault occurs, the page the hand is pointing to is inspected. The action taken depends on the R bit: R = 0: Evict the page R = 1: Clear R and advance hand F G Figure 4-14. The clock page replacement algorithm. 0 Page 1 2 3 0 Page 1 2 3 0 Page 1 2 3 0 Page 1 2 3 0 Page 1 2 3 0 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 1 1 0 0 1 1 0 0 0 1 0 0 0 2 0 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 1 1 0 1 3 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 1 0 0 (b) (a) (c) (d) (e) 0 0 0 0 0 1 1 1 0 1 1 0 0 1 0 0 0 1 0 0 1 0 1 1 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 1 1 0 1 1 1 0 0 1 0 0 0 0 0 0 0 1 1 1 0 1 1 0 0 1 1 1 0 (f) (g) (h) (i) Figure 4-15. LRU using a matrix. (j) R bits for pages 0-5, clock tick 0 R bits for pages 0-5, clock tick 1 R bits for pages 0-5, clock tick 2 R bits for pages 0-5, clock tick 3 R bits for pages 0-5, clock tick 4 1 0 1 0 1 1 1 1 0 0 1 0 1 1 0 1 0 1 1 0 0 0 1 0 0 1 1 0 0 0 0 10000000 11000000 11100000 11110000 01111000 1 00000000 10000000 11000000 01100000 10110000 2 10000000 01000000 00100000 00100000 10001000 3 00000000 00000000 10000000 01000000 00100000 4 10000000 11000000 01100000 10110000 01011000 5 10000000 01000000 10100000 01010000 00101000 (a) (b) (c) (d) (e) Page Figure 4-16. The aging algorithm simulates LRU in software. Shown are six pages for five clock ticks. The five clock ticks are represented by (a) to (e). A0 A1 A2 A3 A4 A5 B0 B1 B2 B3 B4 B5 B6 C1 C2 C3 (a) Age 10 7 5 4 6 3 9 4 6 2 5 6 12 3 5 6 A0 A1 A2 A3 A4 A6 B0 B1 B2 B3 B4 B5 B6 C1 C2 C3 A0 A1 A2 A3 A4 A5 B0 B1 B2 A6 B4 B5 B6 C1 C2 C3 (b) (c) Figure 4-17. Local versus global page replacement. (a) Original configuration. (b) Local page replacement. (c) Global page replacement. Page faults/sec A B Number of page frames assigned Figure 4-18. Page fault rate as a function of the number of page frames assigned. # Virtual address space ) Call stack Free & ! Address space allocated to the ( parse tree ' Parse tree ) ! " Space currently being used by the parse tree Constant table Source text ! ! Symbol table $ Symbol table has bumped into the % source text table Figure 4-19. In a one-dimensional address space with growing tables, one table may bump into another. 20K 16K 16K 12K 12K 12K 8K 8K 12K Symbol table 8K 4K 4K 0K 0K Segment 0 Source text 4K 0K Segment 1 Parse tree Constants Segment 2 8K 4K 0K Call stack 0K Segment 3 Segment 4 Figure 4-20. A segmented memory allows each table to grow or shrink independently of the other tables. Paging Consideration Segmentation Need the programmer be aware that this technique is being used? No Yes How many linear address spaces are there? 1 Many Can the total address space exceed the size of physical memory? Yes Yes Can procedures and data be distinguished and separately protected? No Yes Can tables whose size fluctuates be accommodated easily? No Yes Is sharing of procedures between users facilitated? No Yes Why was this technique invented? To get a large linear address space without having to buy more physical memory To allow programs and data to be broken up into logically independent address spaces and to aid sharing and protection Figure 4-21. Comparison of paging and segmentation. * * * * Segment 4 (7K) * Segment 4 (7K) * * (3K) (3K) Segment 5 (4K) * Segment 5 (4K) (10K) (4K) Segment 3 (8K) Segment 3 (8K) Segment 3 (8K) Segment 2 * (5K) Segment 2 (5K) Segment 2 * (5K) Segment 2 (5K) (3K) (3K) (3K) Segment 2 (5K) Segment 7 (5K) Segment 7 (5K) Segment 7 (5K) Segment 7 (5K) Segment 0 (4K) Segment 0 (4K) Segment 0 (4K) Segment 0 (4K) Segment 0 (4K) (a) (b) (c) (d) (e) Segment 1 (8K) Segment 6 (4K) Segment 5 (4K) Segment 6 (4K) Figure 4-22. (a)-(d) Development of checkerboarding. (e) Removal of the checkerboarding by compaction. 36 bits Page 2 entry Page 1 entry Segment 6 descriptor Page 0 entry Segment 5 descriptor Page table for segment 3 Segment 4 descriptor Segment 3 descriptor Segment 2 descriptor Segment 1 descriptor Page 2 entry Segment 0 descriptor Page 1 entry Descriptor segment Page 0 entry Page table for segment 1 + (a) 18 9 Main memory address of the page table Segment length (in pages) 1 1 1 3 3 Page size: 0 = 1024 words 1 = 64 words 0 = segment is paged 1 = segment is not paged Miscellaneous bits Protection bits (b) Figure 4-23. The MULTICS virtual memory. (a) The descriptor segment points to the page tables. (b) A segment descriptor. The numbers are the field lengths. Address within the segment Segment number Page number Offset within the page 18 6 10 Figure 4-24. A 34-bit MULTICS virtual address. MULTICS virtual address Page number Segment number Offset Word Descriptor Segment number Descriptor segment Page frame Page number Offset Page table Page Figure 4-25. Conversion of a two-part MULTICS address into a main memory address. , . Is this 0 entry used? Compar ison / field Segment number Virtual page Page frame 4 1 7 Read/write 13 1 2 Read only 10 1 1 Read/write 1 2 12 4 0 6 3 3 Protection Age 4 2 2 1 2 1 2 2 2 0 Execute only 12 Execute only 5 0 7 1 9 1 Figure 4-26. A simplified version of the MULTICS TLB. The existence of two page sizes makes the actual TLB more complicated. Bits 13 1 2 Index 6 0 = GDT/1 = LDT Privilege level (0-3) Figure 4-27. A Pentium selector. 7 0: Segment is absent from memory 1: Segment is present in memory 0: 16-Bit segment 1: 32-Bit segment 0: Li is in bytes 1: Li is in pages Base 24-31 Privilege level (0-3) 0: System 1: Application Segment type and protection 7 G D 0 Limit 16-19 P DPL S Base 0-15 Type Limit 0-15 32 Bits Base 16-23 4 0 Relative address Figure 4-28. Pentium code segment descriptor. Data segments differ slightly. : 8 Selector Offset Descriptor + Base address Limit 8 Other fields 9 32-Bit linear address Figure 4-29. Conversion of a (selector, offset) pair to a linear address. Bits 10 Linear address 10 12 Dir Page Offset (a) Page directory Page table Page frame Word selected 1024 Entries Off Dir Page Directory entry points to page table Page table entry points to word (b) Figure 4-30. Mapping of a linear address onto a physical address. F G rog User p rams C d libraDrieE e r a s Sh = Typical uses of the levels ? @ stem calAlB s y S > Kernel ; 0 1 2 < 3 Level Figure 4-31. Protection on the Pentium. K K N Upper memory O limit K K K K K P H 0 A’s child P (a) H K P (b) B J A MINIX I C B J A MINIX I L B J M J K H A MINIX I (c) Figure 4-32. Memory allocation. (a) Originally. (b) After a SY FORK. (c) After the child does an EXEC . The shaded regions are unused memory. The process is a common I&D one. Q Q Stack Stack segment grows downward Symbols Size of disk file Data Text Total memory use Gap Data + bss Header Text (a) (b) Data segment grows upward (or downward) when BRK calls are made. Figure 4-33. (a) A program as stored in a disk file. (b) Internal memory layout for a single process. In both parts of the figure the lowest disk or memory address is at the bottom and the highest address is at the top. SR R R R R R R R R R R R R R R R S Message type SR R R R R R R R R R R R R R R R S FORK SR R R R R R R R R R R R R R R R SR R EXIT S R R R R R R R RR R R R R R SR R WAIT R R R R R R R RR R R R R R S WAITPID SR R R R R R R R R R R R R R R R S BRK SR R R R R R R R R R R R R R R R S EXEC SR R R R R R R R R R R R R R R R SR R KILL S R R R R R R R RR R R R R R SR R ALARM R R R R R R R RR R R R R R S PAUSE SR R R R R R R R R R R R R R R R S SIGACTION SR R R R R R R R R R R R R R R R S SIGSUSPEND SR R R R R R R R R R R R R R R R SR R SIGPENDING S R R R R R R R RR R R R R R SR R SIGMASK R R R R R R R RR R R R R R S SIGRETURN SR R R R R R R R R R R R R R R R S GETUID SR R R R R R R R R R R R R R R R S GETGID SR R R R R R R R R R R R R R R R SR R GETPID S R R R R R R R RR R R R R R SR R SETUID R R R R R R R RR R R R R R S SETGID SR R R R R R R R R R R R R R R R S SETSID SR R R R R R R R R R R R R R R R S GETPGRP SR R R R R R R R R R R R R R R R SR R PTRACE S R R R R R R R RR R R R R R SR R REBOOT R R R R R R R RR R R R R R S KSIG RR R R R R R R R RR R R R R R S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S S Input parameters Reply value S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S (none) S Child’s pid, (to child: 0) S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R status S R R Exit S (No reply if successful) S R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S R R (none) R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R Status R R R R R R R R R R R R R R R R R R R R R S (none) S Status S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S New size S New size S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S Pointer to initial stack S (No reply if successful) S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R identifier and signal S R R Process S Status S R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S R R Number R R R R R R R R ofR R seconds R R R R R R R R toR R wait R R R R R R R R RS R Residual R R R R R R R R time R R R R R R R R R R R R R S (none) S (No reply if successful) S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S Sig. number, action, old action S Status S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S Signal mask S (No reply if successful) S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S R R (none) S Status S R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S R R How, R R R R R set, R R R R old R R R set R R R R R R R R R R R R R R R R RS R Status R R R R R R R R R R R R R R R R R R R R R S Context S Status S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S (none) S Uid, effective uid S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S (none) S Gid, effective gid S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S R R (none) S Pid, parent pid S R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S R R New R R R R Ruid R R R R R R R R R R R R R R R R R R R R R R R RS R Status R R R R R R R R R R R R R R R R R R R R R S New gid S Status S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S New sid S Process group S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S New gid S Process group S R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R pid, address, data S R R Request, S Status S R R R R R R R R R R R R R R R R R R R R R R R R R R R R RS R R R R R R R R R R R R R R R R R R R R R R S R R How R R R R R(halt, R R R R Rreboot, R R R R R R or R R R panic) R R R R R R R R R RS R (No R R R R reply R R R R R ifR successful) R R R R R R R R R R R S Process slot and signals S (No reply) R R R R R RR R R R R R R R R R R RR R R R R RR R R R R RR R R R R R R R R R R R R R R R R R R R R R R S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S S Figure 4-34. The message types, input parameters, and reply values used for communicating with the memory manager. X T [ Y U Y Z V Stack 210K (0x34800) V V V 0x340 V 0 Text 208K (0x34000) U 207K (0x33c00) V 0x20 Data Stack \ Virtual Physical Length Address (hex) V 0x320 V 0 0x8 0x1c V 0x320 W 0 (b) Data ] U [ 203K (0x32c00) Y Text U 200K (0x32000) Stack (a) V \ V 0x14 V V 0 V 0x340 V 0 Data Text W Z Virtual Physical Length V 0x32c V 0x10 0x320 W 0x8 (c) Figure 4-35. (a) A process in memory. (b) Its memory representation for nonseparate I and D space. (c) Its memory representation for separate I and D space. V 0xc ^ 0x3dc00 Stack (proc 2) ^ Gap 0x3d400 0x3d000 Data (proc 2) Virtual Physical Length ^ 0x3c000 0x34800 ^ Virtual Physical Length Stack 0x14 0x340 0 0x32c 0x10 Text 0 0x320 0xc (a) Gap 0x8 Data Process 1 Stack (proc 1) 0x34000 0x33c00 Stack 0x14 0x3d4 0x8 Data 0 0x3c0 0x10 Text 0 0x320 0xc Process 2 (c) Data (proc 1) 0x32c00 Text (shared) 0x32000 (b) Figure 4-36. (a) The memory map of a separate I and D space process, as in the previous figure. (b) The layout in memory after a second process starts, executing the same program image with shared text. (c) The memory map of the second process. `_ ______________________________________________________ ` 1. Check to see if process table is full. `_ ______________________________________________________ ` 2. Try to allocate memory for the child’s data and stack. `_ ______________________________________________________ ` 3. Copy the parent’s data and stack to the child’s memory. `_ ______________________________________________________ ` 4. Find a free process slot and copy parent’s slot to it. `_ ______________________________________________________ ` 5. Enter child’s memory map in process table. `_ ______________________________________________________ ` 6. Choose a pid for the child. `_ ______________________________________________________ ` 7. Tell kernel and file system about child. `_ ______________________________________________________ ` 8. Report child’s memory map to kernel. `_ ______________________________________________________ ` 9. Send reply messages to parent and child. _ ______________________________________________________ ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` Figure 4-37. The steps required to carry out the FORK system call. `_ ____________________________________________________________ ` 1. Check permissions—is the file executable? `_ ____________________________________________________________ ` 2. Read the header to get the segment and total sizes. `_ ____________________________________________________________ ` 3. Fetch the arguments and environment from the caller. `_ ____________________________________________________________ ` 4. Allocate new memory and release unneeded old memory. `_ ____________________________________________________________ ` 5. Copy stack to new memory image. ` _ ____________________________________________________________ ` 6. Copy data (and possibly text) segment to new memory image. ` _ ____________________________________________________________ ` 7. Check for and handle setuid, setgid bits. `_ ____________________________________________________________ ` 8. Fix up process table entry. `_ ____________________________________________________________ ` 9. Tell kernel that process is now runnable. _ ____________________________________________________________ ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` Figure 4-38. The steps required to carry out the EXEC system call. a \0 t s a 52 / r s u 48 / = E M 44 b Environment array b 0 \0 t s a 8188 / r s u 8184 / = E M 8180 b b a a \0 t s a 8188 / r s u 8184 / = E M 8180 b b a O H \0 c 40 O H \0 c 8176 O H \0 c 8176 . \0 c 36 . \0 c 8172 . \0 c 8172 \0 l 32 . \0 l 8168 . \0 l 8168 s l 28 – s l 8164 – s l 8164 . HOME = /usr/ast a a – a a g a f \0 a a g a f \0 a a g a f \0 0 24 0 8160 0 8160 42 20 8178 8156 8178 8156 0 16 0 8152 0 8152 Argument array 38 12 8174 8148 8174 8148 34 8 8170 8144 8170 8144 0 31 4 8167 8140 8167 8140 28 0 8164 8136 8164 8136 g.c (a) f.c envp 8156 8132 –l argv 8136 8128 ls argc 4 8124 return 8120 (b) (c) (d) Figure 4-39. (a) The arrays passed to execve. (b) The stack built by execve. (c) The stack after relocation by the memory manager. (d) The stack as it appears to main at the start of execution. push push push call c main push call c exit hlt ecx! push environ edx! push argv eax! push argc ! main(argc, argv, envp) eax! push exit status ! force a trap if exit fails Figure 4-40. The key part of the C run-time, start-off routine. SR R R R R R R R R R R R S R S Signal S SR R R R R R R R R R R R S R S SIGHUP S SR R R R R R R R R R R R S R SR R SIGINT S S R R R R R R R R R R SR SR R SIGQUIT R R R R R R R R R R SR S SIGILL S SR R R R R R R R R R R R S R S SIGTRAP S SR R R R R R R R R R R R S R S SIGABRT S SR R R R R R R R R R R R S R SR R SIGFPE S S R R R R R R R R R R SR SR R SIGKILL R R R R R R R R R R SR S SIGUSR1 S SR R R R R R R R R R R R S R S SIGSEGV S SR R R R R R R R R R R R S R SR R SIGUSR2 S S R R R R R R R R R R SR SR R SIGPIPE S S R R R R R R R R R R SR SR R SIGALRM R R R R R R R R R R SR S SIGTERM S SR R R R R R R R R R R R S R S SIGCHLD S SR R R R R R R R R R R R S R SR R SIGCONT S S R R R R R R R R R R SR SR R SIGSTOP S S R R R R R R R R R R SR SR R SIGTSTP R R R R R R R R R R SR S SIGTTIN S SR R R R R R R R R R R R S R S SIGTTOU S RR R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R SR S R R R R R R R R R R R R Description R R R R R R R R R R R R R R R R R R R R R R R SR S R Hangup R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R SR R Interrupt R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R SS R R Quit R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R SR S R Illegal R R R R R R instruction R R R R R R R R R R R R R R R R R R R R R R R R R R R R SR S R Trace R R R R R R trap R R R R R R R R R R R R R R R R R R R R R R R R R R R R SR S R Abnormal R R R R R R R R R termination R R R R R R R R R R R R R R R R R R R R R R R R R SR R Floating R R R R R R R R point R R R R R exception R R R R R R R R R R R R R R R R R R R R R SS R R Kill R R R (cannot R R R R R R R be R R R caught R R R R R R R or R R ignored) R R R R R R R R R R R R SR S User-defined signal # 1 R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R R SR S R Segmentation R R R R R R R R R R R R R violation R R R R R R R R R R R R R R R R R R R R R SR R User R R R R R defined R R R R R R R signal R R R R R R #R 2R R R R R R R R R R R R R R R SS R R Write R R R R R on R R R aR R pipe R R R R with R R R R no R R R one R R R R toR R read R R R R R itR R SS R R Alarm R R R R R R clock, R R R R R timeout R R R R R R R R R R R R R R R R R R R R R R R SR S R Software R R R R R R R R termination R R R R R R R R R R R signal R R R R R R from R R R R kill R R R R R SR S R Child R R R R R process R R R R R R R R terminated R R R R R R R R R R orR R stopped R R R R R R R R R SR R Continue R R R R R R R R R ifR stopped R R R R R R R R R R R R R R R R R R R R R R R R SS R R Stop R R R R R signal R R R R R R R R R R R R R R R R R R R R R R R R R R R R R SS R R Interactive R R R R R R R R R R stop R R R R signal R R R R R R R R R R R R R R R R R R R R SR S Background process R R R R R R R R R R R R R R R R R R R R wants R R R R R R toR R read R R R R R R R SR S R Background R R R R R R R R R R R process R R R R R R R R wants R R R R R R toR R write R R R R RR R R R R RR R R R R RR R R R R R R R Generated R R R R R R R R R R by R R R R S S R S S R KILL R R R R Rsystem R R R R R R call R R RR S Kernel R R R R R R R R R R R R R R R R SS R Kernel R RR R R R R RR R R R R RR S S R Kernel R R R R R R (*) R RR R R R R RR S S R Kernel R R R R R R (M) R RR R R R R RR S S Kernel R R RR R R R R RR R R R R RR S R Kernel R R R R R R (*) R R R R R R R R R SS R KILL R R R R Rsystem R R R R R R call R R RR S S Not supported R R RR R R R R RR R R R R RR S S R Kernel R R R R R R (*) R RR R R R R RR S R Not R R R Rsupported R R R R R R R R R R R SS R Kernel R R R R R R R R R R R R R R R SS R Kernel R RR R R R R RR R R R R RR S S R KILL R R R R Rsystem R R R R R R call R R RR S S R Not R R R Rsupported R R R RR R R R R RR S R Not R R R Rsupported R R R R R R R R R R R SS R Not R R R Rsupported R R R R R R R R R R R SS R Not R R R Rsupported R R R RR R R R R RR S S Not supported R R RR R R R R RR R R R R RR S S R Not R R R Rsupported R R R RR R R R R RR Figure 4-41. Signals defined by POSIX and MINIX . Signals indicated by (*) depend upon hardware support. Signals marked (M) are not defined by POSIX , but are defined by MINIX for compatibility with older programs. Several obsolete names and synonyms are not listed here. Ret addr Ret addr Ret addr Ret addr Local vars (process) Local vars (process) Local vars (process) Local vars (process) Stackframe (CPU regs) (original) Stackframe (CPU regs) (original) Ret addr 2 Ret addr 2 Sigframe structure Local vars (Sigreturn) Stack Ret addr 1 Local vars (handler) Stackframe (CPU regs) (original) Stackframe (CPU regs) (modified, ip = handler) Stackframe (CPU regs) (modified, ip = handler) Stackframe (CPU regs) (original) Before Handler executing Sigreturn executing Back to normal (a) (b) (c) (d) Process table Figure 4-42. A process’ stack (above) and its stackframe in the process table (below) corresponding to phases in handling a signal. (a) State as process is taken out of execution. (b) State as handler begins execution. (c) State while SIGRETURN is executing. (d) State after SIGRETURN completes execution. Waiting d INIT 6 e 7 INIT 8 g d Waiting 6 7 e 8 f 52 f 12 f 52 (a) Exiting 53 h Zombie (b) Figure 4-43. (a) The situation as process 12 is about to exit. (b) The situation after it has exited. f 53 `_ __________________` _________________________________________ ` System call ` Purpose `_ __________________` _________________________________________ ` SIGACTION ` Modify response to future signal `_ __________________` _________________________________________ ` SIGPROCMASK ` Change set of blocked signals `_ __________________` _________________________________________ ` KILL ` Send signal to another process `_ __________________` _________________________________________ ` ALARM ` Send ALRM signal to self after delay `_ __________________` _________________________________________ ` PAUSE ` Suspend self until future signal `_ __________________` _________________________________________ ` SIGSUSPEND ` Change set of blocked signals, then PAUSE `_ __________________` _________________________________________ ` SIGPENDING ` Examine set of pending (blocked) signals `_ __________________` _________________________________________ ` SIGRETURN ` Clean up after signal handler _ ___________________________________________________________ Figure 4-44. System calls relating to signals. ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` Layer r i j Handler for ALRM User q k 1 s m l 4 Clock 7 3 MM 2 m3 v 4 o p 8 5 6 t n 9 u 2 System Process management Figure 4-45. Messages for an alarm. The most important are: (1) User does ALARM. (4) After the set time has elapsed, the signal arrives. (7) Handler terminates with call to SIGRETURN. See text for details. q 1 Processor registers (64 bytes) Mask Sigcontext structure Sigframe structure Flags Pointer to sigcontext Return address Frame pointer Pointer to sigcontext Code (floating point) Signal number w Address of sigreturn Figure 4-46. The sigcontext and sigframe structures pushed on the stack to prepare for a signal handler. The processor registers are a copy of the stackframe used during a context switch. `_ ______________` ___________________________________ ` System Call ` Description `_ ______________` ___________________________________ ` GETUID ` Return real and effective uid `_ ______________` ___________________________________ ` GETGID ` Return real and effective gid `_ ______________` ___________________________________ ` GETPID ` Return pids of process and its parent `_ ______________` ___________________________________ ` SETUID ` Set caller’s real and effective uid `_ ______________` ___________________________________ ` SETGID ` Set caller’s real and effective gid `_ ______________` ___________________________________ ` SETSID ` Create new session, return pid `_ ______________` ___________________________________ ` GETPGRP ` Return ID of process group _ _________________________________________________ ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` Figure 4-47. The system calls supported in mm/getset.c. `_ _______________` _______________________________________ ` Command ` Description `_ _______________` _______________________________________ ` T x STOP ` Stop the process `_ _______________` _______________________________________ ` T x OK ` Enable tracing by parent for this process `_ _______________` _______________________________________ ` T x GETINS ` Return value from text (instruction) space `_ _______________` _______________________________________ ` T x GETDATA ` Return value from data space `_ _______________` _______________________________________ ` T x GETUSER ` Return value from user process table `_ _______________` _______________________________________ ` T x SETINS ` Set value in instruction space `_ _______________` _______________________________________ ` T x SETDATA ` Set value in data space `_ _______________` _______________________________________ ` T x SETUSER ` Set value in user process table `_ _______________` _______________________________________ ` T x RESUME ` Resume execution `_ _______________` _______________________________________ ` T x EXIT ` Exit `_ _______________` _______________________________________ ` T x STEP ` Set trace bit _ ______________________________________________________ ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` ` Figure 4-48. Debugging commands supported by mm/trace.c. 5 FILE SYSTEMS 5.1 5.2 5.3 5.4 5.5 5.6 FILES DIRECTORIES FILE SYSTEM IMPLEMENTATION SECURITY PROTECTION MECHANISMS OVERVIEW OF THE MINIX FILE SYSTEM 5.7 IMPLEMENTATION OF THE MINIX FILE SYSTEM Extension Meaning file.bak Backup file file.c C source program file.f77 Fortran 77 program Compuserve Graphical Interchange Format image file.gif file.hlp Help file file.html World Wide Web HyperText Markup Language document Movie encoded with the MPEG standard file.mpg file.o Object file (compiler output, not yet linked) file.ps PostScript file file.tex Input for the TEX formatting program file.txt General text file file.zip Compressed archive Figure 5-1. Some typical file extensions. 1 Byte 1 Record Cat Cow Dog Hen (a) (b) Ibis Ant Fox Pig Goat Lion Owl Pony Rat Lamb (c) Figure 5-2. Three kinds of files. (a) Byte sequence. (b) Record sequence. (c) Tree. Worm 16 Bits Magic number Module name Header Text size Header Data size BSS size Date Symbol table size Object module Owner Protection Entry point Size Flags Header Text Object module Data Header Relocation bits Symbol table (a) Object module (b) Figure 5-3. (a) An executable file. (b) An archive. Field Protection Password Creator Owner Read-only flag Hidden flag System flag Archive flag ASCII/binary flag Random access flag Temporary flag Lock flags Record length Key position Key length Creation time Time of last access Time of last change Current size Maximum size Meaning Who can access the file and in what way Password needed to access the file Id of the person who created the file Current owner 0 for read/write; 1 for read only 0 for normal; 1 for do not display in listings 0 for normal files; 1 for system file 0 for has been backed up; 1 for needs to be backed up 0 for ASCII file; 1 for binary file 0 for sequential access only; 1 for random access 0 for normal; 1 for delete file on process exit 0 for unlocked; nonzero for locked Number of bytes in a record Offset of the key within each record Number of bytes in the key field Date and time the file was created Date and time the file was last accessed Date and time the file has last changed Number of bytes in the file Number of bytes the file may grow to Figure 5-4. Some possible file attributes. games attributes games mail attributes mail news attributes news work attributes (a) work (b) Data structure containing the attributes Figure 5-5. (a) Attributes in the directory entry. (b) Attributes elsewhere. Directory Root directory A A B File Root directory C A A B A B C C C Files User directory C Root directory A A B B C B B B C C C C User subdirectories C (a) (b) C (c) Figure 5-6. Three file system designs. (a) Single directory shared by all users. (b) One directory per user. (c) Arbitrary tree per user. The letters indicate the directory or file’s owner. C C / bin Root directory etc lib usr tmp bin etc lib tmp usr ast jim ast Figure 5-7. A UNIX directory tree. jim /usr/jim File A Physical block File block File 0 block 1 4 7 File block File block File block 2 10 12 2 3 4 File B File block 0 Physical block 6 File block 1 3 0 File block 2 11 0 File block 3 14 Figure 5-8. Storing a file as a linked list of disk blocks. Physical block 0 1 2 10 3 11 4 7 File A starts here 3 File B starts here 5 6 7 2 8 9 10 12 11 14 12 0 13 14 15 0 Unused block Figure 5-9. Linked list allocation using a table in main memory. I-node Disk addresses Attributes Single indirect block Addresses of data blocks Double indirect block Triple indirect block Figure 5-10. An i-node. Bytes 1 8 3 1 2 16 File name User code File type (extension) Extent Block count Disk block numbers Figure 5-11. A directory entry that contains the disk block numbers for each file. Bytes 8 3 1 10 2 2 File name Extension Attributes 2 4 Size Reserved Time Date Figure 5-12. The MS-DOS directory entry. First block number Bytes 2 14 File name I-node number Figure 5-13. A UNIX directory entry. Root directory 1 . .. 4 bin 7 dev 1 I-node 6 is for /usr Mode size times 132 Block 132 is /usr directory 6 1 30 Block 406 is /usr/ast directory 26 Mode size times 6 64 grants 92 books jim 60 mbox dick 19 I-node 26 is for /usr/ast erik 406 14 lib 51 9 etc 26 ast 81 minix 6 usr 45 bal 17 src 8 tmp Looking up usr yields i-node 6 I-node 6 says that /usr is in block 132 /usr/ast is i-node 26 I-node 26 says that /usr/ast is in block 406 Figure 5-14. The steps in looking up /usr/ast/mbox. /usr/ast/mbox is i-node 60 100 150 75 100 50 25 50 Data rate 0 0 128 256 512 1K Block size 2K 4K 8K Figure 5-15. The solid curve (left-hand scale) gives the data rate of a disk. The dashed curve (right-hand scale) gives the disk space efficiency. All files are 1K. Disk space utilization (in percent) Data rate (Kbytes/sec) Disk space utilization 200 Free disk blocks: 16, 17, 18 42 230 86 1001101101101100 136 162 234 0110110111110111 210 612 897 1010110110110110 97 342 422 0110110110111011 41 214 140 1110111011101111 63 160 223 1101101010001111 21 664 223 0000111011010111 48 216 160 1011101101101111 262 320 126 1100100011101111 310 180 142 0111011101110111 516 482 141 1101111101110111 A 1K disk block can hold 256 32-bit disk block numbers A bit map (a) (b) Figure 5-16. (a) Storing the free list on a linked list. (b) A bit map. Disk 0 Disk 1 Backup of Data 1 Backup Data 0 Data 0 Data 1 CPU Figure 5-17. Backing up each drive on the other one wastes half the storage. Block number Block number 0 1 2 3 4 5 6 7 8 9 101112131415 0 1 2 3 4 5 6 7 8 9 101112131415 1 1 0 1 0 1 1 1 1 0 0 1 1 1 0 0 Blocks in use 1 1 0 1 0 1 1 1 1 0 0 1 1 1 0 0 Blocks in use 0 0 1 0 1 0 0 0 0 1 1 0 0 0 1 1 Free blocks 0 0 0 0 1 0 0 0 0 1 1 0 0 0 1 1 Free blocks (a) (b) 0 1 2 3 4 5 6 7 8 9 101112131415 0 1 2 3 4 5 6 7 8 9 101112131415 1 1 0 1 0 1 1 1 1 0 0 1 1 1 0 0 Blocks in use 1 1 0 1 0 2 1 1 1 0 0 1 1 1 0 0 Blocks in use 0 0 1 0 2 0 0 0 0 1 1 0 0 0 1 1 Free blocks 0 0 1 0 1 0 0 0 0 1 1 0 0 0 1 1 Free blocks (c) (d) Figure 5-18. File system states. (a) Consistent. (b) Missing block. (c) Duplicate block in free list. (d) Duplicate data block. I-nodes are located near the start of the disk Disk is divided into cylinder groups, each with its own i-nodes Cylinder group (a) (b) Figure 5-19. (a) I-nodes placed at the start of the disk. (b) Disk divided into cylinder groups, each with its own blocks and i-nodes. First page (in memory) F A B A A A A A A A A A A A A A A (a) (b) A Second page (not in memory) Page boundary Figure 5-20. The TENEX password problem. (c) ! Spring Pressure plate Figure 5-21. A device for measuring finger length. " Domain 1 File1[ R ] File2 [ RW ] Domain 2 " " File3 [ R ] File4 [ RW X ] File5 [ RW ] # Domain 3 " File6 [ RW X ] Printer1 [ W ] Figure 5-22. Three protection domains. Plotter2 [ W ] Object File1 File2 Read Read Write File3 File4 File5 Read Read Write Execute Read Write File6 Printer1 Plotter2 Domain 1 2 3 Write Read Write Execute Figure 5-23. A protection matrix. Write Write Object File1 File2 Read Read Write File3 File4 File5 File6 Printer1 Plotter2 Domain1 Domain2 Domain3 Domain 1 2 3 Enter Read Read Write Execute Read Write Write Read Write Execute Write Write Figure 5-24. A protection matrix with domains as objects. &$ $%$%$%&$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & # & Type & Rights & Object &$ $%$%$%&$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & 0 & File & R – – & Pointer to File3 &$ $%$%$%&$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & 1 & File & RWX & Pointer to File4 &$ $%$%$%&$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & 2 & File & RW– & Pointer to File5 &$ $%$%$%&$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & 3 & Pointer & –W– & Pointer to Printer1 $ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & & & & & & & & & Figure 5-25. The capability list for domain 2 in Fig. 5-23. Client Server Collaborator Encapsulated server Kernel Kernel (a) (b) Covert channel Figure 5-26. (a) The client, server, and collaborator processes. (b) The encapsulated server can still leak to the collaborator via covert channels. Messages from users ACCESS CHDIR CHMOD CHOWN CHROOT CLOSE CREAT DUP FCNTL FSTAT IOCTL LINK LSEEK MKDIR MKNOD MOUNT OPEN PIPE READ RENAME RMDIR STAT STIME SYNC TIME TIMES UMASK UMOUNT UNLINK UTIME WRITE Messages from MM EXEC EXIT FORK SETGID SETSID SETUID Other messages REVIVE UNPAUSE Input parameters Reply value File name, access mode Status Name of new working directory Status File name, new mode Status File name, new owner, group Status Name of new root directory Status File descriptor of file to close Status Name of file to be created, mode File descriptor File descriptor (for dup2, two fds) New file descriptor File descriptor, function code, arg Depends on function Name of file, buffer Status File descriptor, function code, arg Status Name of file to link to, name of link Status File descriptor, offset, whence New position File name, mode Status Name of dir or special, mode, address Status Special file, where to mount, ro flag Status Name of file to open, r/w flag File descriptor Pointer to 2 file descriptors (modified) Status File descriptor, buffer, how many bytes # Bytes read File name, file name Status File name Status File name, status buffer Status Pointer to current time Status (None) Always OK Pointer to place where current time goes Status Pointer to buffer for process and child times Status Complement of mode mask Always OK Name of special file to unmount Status Name of file to unlink Status File name, file times Always OK File descriptor, buffer, how many bytes # Bytes written Input parameters Reply value Pid Status Pid Status Parent pid, child pid Status Pid, real and effective gid Status Pid Status Pid, real and effective uid Status Input parameters Reply value Process to revive (No reply) Process to check (See text) Figure 5-27. File system messages. ' ( Boot block ) Super ( block * I-nodes + One disk block , (I-nodes bit map Data ( Zone bit map Figure 5-28. Disk layout for the simplest disk: a 360K floppy disk, with 128 i-nodes and a 1K block size (i.e., two consecutive 512-byte sectors are treated as a single block). Number of nodes Number of zones (V1) Number of i-node bit map blocks Number of zone bit map blocks Present on disk and in memory First data zone Log2 (block/zone) Maximum file size Magic number Padding Number of zones (V2) Pointer to i-node for root of mounted file system Pointer to i-node mounted upon I-nodes/block Present in memory but not on disk Device number Read-only flag Big-endian FS flag FS version Direct zones/i-node Indirect zones/indirect block First free bit in i-node bit map First free bit in zone bit map Figure 5-29. The MINIX super-block. 16 bits Mode File type and rwx bits Number of links Directory entries for this file Uid Identifies user who owns file Gid Owner’s group File size Number of bytes in the file -Access time Modification time Times are all in seconds since Jan 1, 1970 Status change time Zone 0 Zone 1 64 bytes Zone 2 Zone 3 Zone 4 Zone numbers for the first seven data zones in the file Zone 5 Zone 6 Indirect zone Used for files larger than 7 zones Double indirect zone Unused (Could be used for triple indirect zone) Figure 5-30. The MINIX i-node. Hash table Front (LRU) Figure 5-31. The linked lists used by the block cache. Rear (MRU) Unmounted file system / / /bin /usr /bal After mounting / /jim / . /ast /lib /bin /usr … /lib Root file system /ast/f1 /ast/f2 /usr/bal /usr/ast /usr/ast/f2 (a) (b) (c) Figure 5-32. (a) Root file system. (b) An unmounted file system. (c) The result of mounting the file system of (b) on /usr. 0 Flip table Process 1 table 0 2 I-node 1 table File position I-node pointer Parent 3 Child Figure 5-33. How file positions are shared between a parent and a child. &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & Procedure & & Function &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & get 4 block & Fetch a block for reading or writing & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & put 4 block & Return a block previously requested with get 4 block & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & alloc 4 zone & Allocate a new zone (to make a file longer) & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & free 4 zone & Release a zone (when a file is removed) & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & rw 4 block & Transfer a block between disk and cache & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & invalidate & Purge all the cache blocks for some device & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & flushall & Flush all dirty blocks for one device & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & rw 4 scattered & Read or write scattered data from or to a device & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & rm 4 lru & Remove a block from its LRU chain & $ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ Figure 5-34. Procedures used for block management. &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & Procedure & & Function &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & get 4 inode & Fetch an i-node into memory & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & put 4 inode & Return an i-node that is no longer needed & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & alloc 4 inode & Allocate a new i-node (for a new file) & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & wipe 4 inode & Clear some fields in an i-node & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & free 4 inode & Release an i-node (when a file is removed) & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & update 4 times & Update time fields in an i-node & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & rw 4 inode & Transfer an i-node between memory and disk & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & old 4 icopy & Convert i-node contents to write to V1 disk i-node & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & new 4 icopy & Convert data read from V1 file system disk i-node & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & dup 4 inode & Indicate that someone else is using an i-node & $ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ Figure 5-35. Procedures used for i-node management. &$ $%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & Procedure & & Function &$ $%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & alloc 4 bit & Allocate a bit from the zone or i-node map & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & free 4 bit & Free a bit in the zone or i-node map & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & get 4 super & Search the super-block table for a device & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & mounted & Report whether given i-node is on a mounted (or root) f.s. & &$ $%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & read 4 super & Read a super-block & $ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ Figure 5-36. Procedures used to manage the super-block and bit maps. &$ $%$%$%$%$%$%$%$%$%$%$%$%$%&$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & Operation & Meaning &$ $%$%$%$%$%$%$%$%$%$%$%$%$%&$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & F 4 SETLK & Lock region for both reading and writing &$ $%$%$%$%$%$%$%$%$%$%$%$%$%&$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & F 4 SETLKW & Lock region for writing &$ $%$%$%$%$%$%$%$%$%$%$%$%$%&$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & F 4 GETLK & Report if region is locked $ $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & & & & & & & Figure 5-37. The POSIX advisory record locking operations. These operations are requested by using an FCNTL system call. 0 k … … Front n Rear NULL NULL NULL (a) Front k NULL NULL … … 0 Rear n NULL (b) Rear Front NULL 0 k n … … NULL NULL (c) Figure 5-38. Block cache initialization. (a) Before any buffers have been used. (b) After one block has been requested. (c) After the block has been released. 5 Byte number 5 0 1 2 3 4 5 6 7 8 5 Block 9 10 11 12 13 14 15 Block Current position = 1 Chunk = 6 Current position = 6 5 Chunk = 2 Current position = 9 Chunk = 1 Figure 5-39. Three examples of how the first chunk size is determined for a 10-byte file. The block size is 8 bytes, and the number of bytes requested is 6. The chunk is shown shaded. 16 Entry points do_read do_write read_write Main procedure for reading/writing dev_io pipe_check Special files Pipes read_map Look up disk address rw_chunk rahead rd_indir get _block Get indirect block address (if necessary) rw_block Read or write one block sys_copy put _block Transfer from FS to user Return block to cache Search the cache dev_io rw_dev Listed in dmap table sendrec Sends message to the kernel Figure 5-40. Some of the procedures involved in reading a file. (a) 24 (b) 24 25 (c) 24 25 40 (d) 24 25 40 41 (e) 24 25 40 41 62 (f) Free zones: 12 20 31 36 … Block number 24 25 40 41 62 63 Figure 5-41. (a) - (f) The successive allocation of 1K blocks with a 2K zone. eat_path last _dir Loops on path components Get final directory Convert path to i-node advance Process one component search_dir get_inode read_map get_block put _block Look up disk address Find block in cache Return block to cache advance Load i-node Figure 5-42. Some of the procedures used in looking up path names. &$ %$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & Call & Function &$ %$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & UTIME & Set a file’s time of last modification &$ %$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & TIME & Set the current real time in seconds &$ %$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & STIME & Set the real time clock &$ %$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & TIMES & Get the process accounting times $ %$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & & & & & & & & & Figure 5-43. The four system calls involving time. &$ %$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & Operation & Meaning &$ %$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & F 4 DUPFD & Duplicate a file descriptor &$ %$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & F 4 GETFD & Get the close-on-exec flag &$ %$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & F 4 SETFD & Set the close-on-exec flag &$ %$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & F 4 GETFL & Get file status flags &$ %$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & F 4 SETFL & Set file status flags &$ %$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & F 4 GETLK & Get lock status of a file &$ %$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & F 4 SETLK & Set read/write lock on a file &$ %$%$%$%$%$%$%$%$%$%$%$%$%$%& $%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & F 4 SETLKW & Set write lock on a file $ %$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$%$ & & & & & & & & & & & & & & & & & & Figure 5-44. The POSIX request parameters for the FCNTL system call.