Survey
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Go (programming language) wikipedia , lookup
Stream processing wikipedia , lookup
Reactive programming wikipedia , lookup
Supercomputer architecture wikipedia , lookup
Computer file wikipedia , lookup
One-pass compiler wikipedia , lookup
Structured programming wikipedia , lookup
Interpreter (computing) wikipedia , lookup
Dynamic-link library wikipedia , lookup
CS252: Systems Programming Ninghui Li Based on Slides by Prof. Gustavo Rodriguez-Rivera Topic 1: Introduction to the Course and Program Structure General Information Web Page: http://www.cs.purdue.edu/homes/cs252 Office: LWSN2142K Office hours: Tentative: Tuesdays and Fridays between 1:30pm and 2:30pm. Talk with me after class, drop by for short questions, or email to make appointment E-mail: [email protected] Textbooks No required textbook. Primary source is lecture notes Recommended (not required): Introduction to Systems Programming: a Hands-on Approach by by Gustavo A. Junipero Rodriguez-Rivera and Justin Ennen https://www.cs.purdue.edu/homes/grr/SystemsProgrammingBook/ Operating Systems: Three Easy Pieces http://pages.cs.wisc.edu/~remzi/OSTEP/ The Linux Programming Interface: A Linux and UNIX System Programming Handbook Useful for some projects, and good as a reference book How Linux Works (2nd Edition) by Brian Ward The Linux Command Line by William E. Shotts, Jr. Course Communications Use Piazza for questions/answers, announcements, etc. Students have been added to Piazza Use Blackboard Learn for grade distribution Labs There is no lab the first week. The projects will be explained in the lab sessions. E-mail administrative questions to [email protected] Post project questions on Piazza TAs office hours will be posted in the web page. Grading Grade allocation Midterm: 18% (Mon Mar 6, 8pm to 9:30pm, PHYS 114, PHYS 112) Final: 24% Projects/labs: 48% Six labs, weighted approximately by #of weeks Attendance as measured by clicker quizs: 10% Exams also include questions about the labs. Policy Regarding Cheating • Cheating defined as • Copy your answers or code for the lab or exam from another source other than provided by the course • Consequence: receives −𝒙 in the lab, where x is the original grade not considering impact of cheating • Give your code to others to copy • 𝒙 𝟐 Consequence: Receives in the lab • Every cheating case will be referred to Dean of Students and recorded • What happened in the past? What is Systems Programming? • How do computers differ from other machines? • Computers are general-purpose machines that can be programmed to do many, many things • System programming is about implementing software systems to support programming, i.e., how to provide abstraction and virtualization. Roles of This Course • Provide introduction to the following core CS topics • • • • • • • Compilers (CS 352) Programming language (CS 456) Operating systems (CS 354) Networking (CS 422) Security (CS 426) Information systems (CS 348) Theory of computing (CS 483) Roles of This Course • Learn what to expect in upper-division CS courses • Ability to complete (which includes debug) large programming projects independently Course Organization: Part 1: C Programming Review • • • • Process of Compiling, Assembling, Linking, Loading, Runtime Linker, Static and Shared Libraries; Address space. Structure of a Program. Review of pointers, arrays, pointers to functions, memory management Debugging using GDB. Stack frames and buffer-overflow vulnerabilities Lab 1 (2 weeks): Implementing memory allocation Course Organization: Part 2: Language Interpreter • • • • Learn a simple function programming language that uses only non-negative integers, and has 5 keywords: • inc, dec, ifz, define, halt Learn a simple fragment of LISP/Scheme Learn concept of regular expressions, finite state automata, grammars Learn to use Lex and YACC to create parsers Lab 2 (2 weeks): Extend a language interpreter for a simple functional language. Course Organization: Part 3: Unix Systems Programming I • • • • • Unix history & file system fundamentals Common Unix utilities Unix Shell scripting Structure of a Unix Shell Useful system calls: file creation, read, write, close, file mode, IO redirection, pipes, fork, wait, waitpid, signals, etc. Lab 3 (3 weeks): Writing a Unix shell. Course Organization: Part 4: Unix Systems Programming II • • • • • Introduction to operating systems OS kernel fundamentals: user mode, kernel mode, interrupts, system calls Processes, scheduling, Programming with threads, thread creation. Race Conditions, Mutex locks. Other synchronization methods Lab 4 (1 week): Introduction to Threads. Course Organization: Part 5: Unix Network Programming • • • Internet overview: ARP, IP, DNS, TCP, UDP, NAT Socket Programming. Server architecture Lab 5 (3 weeks): Implementing a Web Server Course Organization: Part 6: Miscellaneous topics • More UNIX Systems Programming • Introduction to SQL • Concept of Turing Machines, Computability, Deterministic and Nondeterministic Turing Machines • Concept of P=NP Lab 6 (Team Project, 3 weeks): Implement a nondeterministic Turing Machine Problem of the Week • Questions taken from the books • Cracking the Coding Interview: 150 Programming Questions and Solutions • Elements of Programming Interviews • Not officially part of the grade equation From Source Files to Programs and Processes What Happens From a C Source Program to Program Execution Building a program, i.e, generating an executable file from source code What are the steps? What does an executable file look like? Loading a program Each time when you execute a program, a process is created. In Unix/Linux, use “ps” command to show processes in the system Building a Program The programmer writes a program hello.c The preprocessor expands #define, #include, #ifdef etc preprocessor statements and generates a hello.i file. The compiler compiles hello.i, optimizes it and generates an assembly instruction listing hello.s The assembler (as) assembles hello.s and generates an object file hello.o The compiler (cc or gcc) by default hides all these intermediate steps. You can use compiler options to run each step independently. Original file hello.c #include <stdio.h> main() { printf("Hello\n"); } After preprocessor gcc -E hello.c > hello.i (-E stops compiler after running preprocessor) hello.i: /* Expanded /usr/include/stdio.h */ typedef void *__va_list; typedef struct __FILE __FILE; typedef int ssize_t; struct FILE {…}; extern int fprintf(FILE *, const char *, ...); extern int fscanf(FILE *, const char *, ...); extern int printf(const char *, ...); /* and more */ main() { printf("Hello\n"); } After compiling gcc -S hello.c (-S stops compiler after generating assembly code) Resulting file is hello.s Actual code depends on the system LC0: .ascii "Hello\0" .text .globl _main .def _main; .scl _main: pushl movl andl subl call movl call leave ret %ebp %esp, %ebp $-16, %esp $16, %esp ___main $LC0, (%esp) _puts 2; .type 32; .endef After compiling & assembling “gcc -c hello.c” generates hello.o The main function already has a value in the object file hello.o hello.o has undefined symbols, like the _puts function call that we don’t know where it is placed. The command “nm” can lists the symbols from object files Output of “nm hello.o” 0000000000000000 b .bss 0000000000000000 d .data 0000000000000000 t .text U __main 0000000000000000 T main U puts // uninitilized data // Global and static vars // entry point of program // main function defined in code __main and puts are undefined in “hello.o” They are provided by the libraries Building a program (continued) The linker puts together all object files as well as the object files in static libraries. The linker also takes the definitions in shared libraries and verifies that the symbols (functions and variables) needed by the program are completely satisfied. If there is symbol that is not defined in either the executable or shared libraries, the linker will give an error. Static libraries (.a files) are added to the executable. shared libraries (.so files) are not added to the executable file. Static and Shared Libraries Shared libraries are shared across different processes. There is only one instance of each shared library for the entire system. Static libraries are not shared. There is an instance of an static library for each process. After linking “gcc –o hello hello.c” generates the hello executable The hello.o object code is statically linked with libraries to include code of library functions In linking, “static = compilation/building time” Sometimes, not all functions’ code are included, some code are stored in shared libraries and dynamically linked. In linking, “dynamic = loading/execution time” Building a Program hello.c hello.i C Preprocessor Editor Programmer Optimizer hello.s hello.o Assembler (as) Compiler (cc) (static) Linker (ld) Other .o files Static libraries (.a files) They add to the size of the executable. Executable File (hello) What is a program? A program is a file in a special format that contains all the necessary information to load an application into memory and make it run. A program file includes: machine instructions initialized data List of library dependencies List of memory sections that the program will use List of undefined values in the executable that will be known until the program is loaded into memory. Executable File Formats There are different executable file formats ELF – Executable Link File It is used in most UNIX systems (Solaris, Linux) Can use elfdump to see information in binary file COFF – Common Object File Format It is used in Windows systems a.out – Used in BSD (Berkeley Standard Distribution) and early UNIX It was very restrictive. It is not used anymore. Note: BSD UNIX and AT&T UNIX are the predecessors of the modern UNIX flavors like Solaris and Linux. Loading a Program After one types ./hello in a shell, the shell creates a new process and load the file hello. The loader is a program that is used to run an executable file in a process. Before the program starts running, the loader allocates space for all the sections of the executable file (text, data, bss etc) It loads into memory the executable and shared libraries (if not loaded yet) Loading a Program It also writes (resolves) any values in the executable to point to the functions/variables in the shared libraries.(E.g. calls to printf in hello.c) Once memory image is ready, the loader jumps to the _start entry point that calls init() of all libraries and initializes static constructors. Then it calls main() and the program begins. _start also calls exit() when main() returns. The loader is also called “runtime linker”. Loading a Program Executable File Loader (runtime linker) (/usr/lib/ld.so.1) Shared libraries (.so, .dll) Executable in memory Static and Shared Libraries Shared libraries are shared across different processes. There is only one instance of each shared library for the entire system. Static libraries are not shared. There is an instance of an static library for each process. Static and Dynamic Static – Events that happen during program building. Example: Static linker Dynamic – Events that happen while program is running. Also called “Runtime”. Example: Dynamic linker Memory Structure of a Process Memory of a Program A program sees memory as an array of bytes that goes from address 0 to 264-1 (0x0000000000000000-0xFFFFFFFFFFFFFFFF) That is assuming a 64-bit architecture. 264-1 0xFFFFFFFFFFFFFFFF 0 0x0000000000000000 Memory Sections The memory is organized into sections called “memory mappings”. 264-1 Stack Shared Libs Heap 0 Bss Data Text Memory Sections Each section has different permissions: read/write/execute or a combination of them. Text- Instructions that the program runs Data – Initialized global variables. Bss – Uninitialized global variables. They are initialized to zeroes. Heap – Memory returned when calling malloc/new. It grows upwards. Stack – It stores local variables and return addresses. It grows downwards. Memory Sections Dynamic libraries – They are libraries shared with other processes. Each dynamic library has its own text, data, and bss. Each program has its own view of the memory that is independent of each other. Virtual memory, mapped by OS to physical memory This view is called the “Address Space” of the program. If a process modifies a byte in its own address space, it will not modify the address space of another process. Where things are located Program hello.c int a = 5; // Stored in int b[20]; // Stored in int main() { // Stored in int x; // Stored in int *p =(int*) malloc(sizeof(int)); } data section bss text stack //In heap Memory Gaps Between each memory section there may be gaps that do not have any memory mapping. If the program tries to access a memory gap, the OS will send a SEGV signal that by default kills the program and dumps a core file. The core file contains the value of the variables global and local at the time of the SEGV. The core file can be used for “post mortem” debugging. gdb program-name core gdb> where Review Steps of building a program Static vs. shared library Static vs. dynamic linking Memory structure of a process: text, data, stack, heap where different variables occur Upcoming • Topic 2: Memory Management