* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download threads
Structured programming wikipedia , lookup
Functional programming wikipedia , lookup
Falcon (programming language) wikipedia , lookup
C Sharp (programming language) wikipedia , lookup
Go (programming language) wikipedia , lookup
Monad (functional programming) wikipedia , lookup
Corecursion wikipedia , lookup
Parallel computing wikipedia , lookup
Abstraction (computer science) wikipedia , lookup
Application Interface Specification wikipedia , lookup
Join-pattern wikipedia , lookup
Scheduling (computing) wikipedia , lookup
Combining Events and Threads for Scalable Network Services Peng Li and Steve Zdancewic University of Pennsylvania PLDI 2007, San Diego 1 Overview A Haskell framework for massively concurrent network applications Servers, P2P systems, load generators Massive concurrency ::= 1,000 threads? (easy) A lazy, purely functional language | 10,000 programming threads? (common) | 100,000http://www.haskell.org threads? (challenging) | 1,000,000 threads? (20 years later?) | 10,000,000 threads? (in 15 minutes) How to write such programs? The very first decision to make: the programming model Shall we use threads or events? 2 Threads vs. Events The event-driven model: The multithreaded model One thread ↔ 10000 clients Asynchronous I/O Scheduling: programmer One thread ↔ one client Synchronous I/O Scheduling: OS/runtime libs “Why threads a bad int send_data(int fd1, are int fd2) { idea (for most purposes)” while (!EOF(fd1)) { [USENIX ATC 1999] size = read_chunk(fd, buf, count); write_chunk(fd, buf, size); } events are a bad idea while(1)“Why { (for high-concurrency servers)” nfds=epoll_wait(kdpfd, events, MAXEVT,-1); for(n=0; n<nfds; ++n) 2003] [HotOS handle_event(events[n]); … … Threads Events Expressiveness and Abstraction (for programming each client) Synchronous I/O + intuitive Finite state machines / control flow primitives Continuation-passing style (CPS) programming Flexibility and Control (for resource scheduling) Baked into OS/runtime, difficult to customize Programmer has complete control – tailored to each application’s needs 3 Can we get the best of both worlds? Programming with each client: threads • Synchronous I/O • Intuitive control-flow primitives One application program The bridge between threads/events? (some kind of “continuation” support) Resource scheduling: events •Written as part of the application • Tailored to application’s needs 4 Roads to lightweight, application-level concurrency Direct language support for continuations: Good if you have them Source-to-source CPS translations Requires hacking on compiler/runtime Often not very elegant Other solutions? (no language support) (no compiler/runtime hacks) 5 The poor man’s concurrency monad “A poor man’s concurrency monad” by Koen Claessen, JFP 1999. (Functional Pearl) The thread interface: The CPS monad The event interface: A lazy, tree-like data structure called “trace” Internal Representation Multithreaded code Scheduler code SYS_NBIO(accept) CPS Monad Event Abstraction SYS_EPOLL_WAIT(s) Thread Abstraction server_loop s = do { sock <- sock_accept s; sys_fork (client sock); server_loop; } client_loop sock = do { sock_send sock data; sock_close sock; } sock_send sock data = do { ... n<-sys_nbio (write_nb ...); ...; sys_epoll_wait sock EPOLL_READ; ... foo; ... n<-sys_nbio (write_nb ...); ... } Trace SYS_NBIO(accept) SYS_FORK SYS_NBIO(write_nb) SYS_EPOLL_WAIT(sock) SYS_NBIO(write_nb) scheduler = do { ... trace <- fetch_thread; execute trace; ... } execute trace = case trace of SYS_NBIO c -> do { cont <- c; execute cont; } SYS_FORK t1 t2 -> ... 6 Questions on the poor man’s approach Does it work for high-performance network services? (using a pure, lazy, functional language?) How does the design scale up to real systems? Symmetrical multiprocessing? Synchronization? I/O? How cheap is it? How much does a poor man’s thread cost? How poor is it? Does it offer acceptable performance? 7 Our experiment A high-performance Haskell framework for massivelyconcurrent network services!!! Supported features: Linux Asynchronous IO (AIO) epoll() and nonblocking IO OS thread pools SMP support Thread synchronization primitives Applications developed IO benchmarks on FIFO pipes / Disk head scheduling A simple web server for static files HTTP load generator Prototype of an application-level TCP stack We used the Glasglow Haskell Compiler (GHC) 8 Multithreaded code example Nested function calls Exception handling Conditional branches Synchronous call to I/O lib Recursion 9 Event-driven code example An event loop running in a separate OS thread A wrapper function to the C library call using the Haskell Foreign Function Interface (FFI) Put events in queues for processing in other OS threads 10 A complete event-driven I/O One “virtual subsystem processor” event loop for each ready_queue CPU System call completion, thread ready to run Fetch threads OS thread pool for CPS thread execution worker_main OS thread pool for blocking I/O Context switch / fork worker_blio blio_queue Fetch thread worker_main worker_main worker_main Haskell Foreign Function Inteface (FFI) worker_blio SYS_BLIO Register event handler Submit AIO request with event handler Epoll interface AIO interface Event notification AIO completion worker_epoll worker_aio Each event loop runs in a separate OS thread 11 Modular and customizable I/O system (add a TCP stack if you like) ready_queue System call completion, thread ready to run Fetch threads OS thread pool for CPS thread execution worker_main OS thread pool for blocking I/O Context switch / fork worker_blio blio_queue SYS_BLIO Fetch thread worker_main Define / interpret Register TCPevent handler worker_main syscalls (22 lines) Epoll interface Event Event notification worker_blio worker_epoll loop for incoming packets (7 lines) worker_main Submit AIO request with event handler AIO interface worker_main Blocking TCP User requests TCP stack states AIO completion Request Completion Request Completion Event loop for timers (9 lines) worker_aio worker_tcp_input worker_tcp_timer 12 How cheap is a poor man’s thread? Minimal memory consumption: 48 bytes 48 bytes Each thread just loops and does nothing Actual size determined by thread-local states Even an ethernet packet can be >1,000 bytes… Pay as you go --- only pay for things needed In contrast: A Linux POSIX thread’s stack has 2MB by default The state-of-the-art user-level thread system (Capriccio) use at least a few KBs for each thread Observation: The poor man’s thread is extremely memory-efficient (Challenging most event-driven systems) 13 I/O scalability test Comparison against the Linux POSIX Thread Library (NPTL) Highly optimized OS thread implementation Each NPTL thread’s stack limited to 32KB Mini-benchmarks used: Disk head scheduling (all threads running) FIFO pipe scalability with idle threads (128 threads running) 14 A simple web server 15 How poor is the poor man’s monad? Not too shabby Benchmarks shows comparable (if not higher) performance to existing, optimized systems An elegant design is more important than 10% performance improvement type safety for Added benefit: many dangerous things Continuations, thread queues, schedulers, asynchronous I/O 16 Related Work We are motivated by two projects: Twisted: the python event-driven framework for scalable internet applications - The programmer must write code in CPS Capriccio: a high-performance user-level thread system for network servers - Requires C compiler hacks - Difficult to customize (e.g. adding SMP support) Continuation-based concurrency [Wand 80], [Shivers 97], … Other languages and programming models: CML, Erlang, … 17 Conclusion Haskell and The Poor Man’s Concurrency Monad are a promising solution for highperformance, massively-concurrent networking applications: Get the best of both threads and events! This poor man’s approach is actually very cheap, and not so poor! http://www.cis.upenn.edu/~lipeng/homepage/unify.html 18