Introduction to Threads
 Overview
 Multithreading Models
 Thread Libraries
 Threading Issues
 Operating System Examples
 Windows XP Threads
 Linux Threads
Threads
 A Thread is just a sequence of instructions to execute
 Threads share the same memory space as other threads in the same
application – so they automatically share data and variables.
 Threads can run on different processor cores on a multicore processor –
this can make applications faster and more responsive
 Even on a single core processor threads make an application more
responsive – if one thread blocks waiting for I/O, other threads can still run
 Each process has its own virtual memory address space, and the OS takes a
lot longer to switch between processes than between threads. Sharing data
between processes requires additional steps, so processes carry a lot more
overhead than threads in many applications. Most applications have one
process with several threads.
 In C/C++, a thread typically runs the code in a C/C++ function and a special
API call starts up a new thread running that function.
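As a minimal sketch of that last point, the C++ standard library's std::thread can play the role of the "special API call". The names fill_slot and run_workers are hypothetical, chosen only for this illustration.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker function: the code each new thread runs.
void fill_slot(std::vector<int>& results, int index) {
    results[index] = index * index;  // each thread writes only its own slot
}

// Starts n threads running fill_slot and waits for all of them to finish.
std::vector<int> run_workers(int n) {
    std::vector<int> results(n, 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        // Constructing a std::thread starts a new thread running the function.
        threads.emplace_back(fill_slot, std::ref(results), i);
    }
    for (std::thread& t : threads) {
        t.join();  // wait until that thread's function has returned
    }
    return results;
}
```

Because all threads share one address space, each worker can write straight into the shared results vector; no copying between threads is needed.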
Single and Multithreaded Processes
Benefits of Threads
 Responsiveness
 Applications can run up to N times faster on an N core processor
 Resource Sharing
 Economy
 Scalability
Multicore Programming
 A single-threaded application runs on only one processor core at a time.
To use more cores it needs multiple threads
 Multicore systems are putting more pressure on programmers to use
threads. Multithreaded application challenges include:

Dividing activities

Balancing the Computational Load

Data splitting

Data dependency

Testing and debugging
Concurrent Execution on a Single-core System
OS can time slice between the four Threads T1…T4
Parallel Execution on a Multicore System
OS can time slice the four Threads T1…T4 on two
processor cores. Two threads can run in parallel on
different cores. Application could run up to twice as
fast. Without threads, an application can run on only
one core!
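To illustrate how work might be divided so that two cores can be used, here is a small sketch that splits a sum into two halves, one per thread. The function name parallel_sum is hypothetical, and any actual speedup depends on the data size and the hardware.

```cpp
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Sums a vector using two threads, each handling one half of the data.
long long parallel_sum(const std::vector<int>& data) {
    const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(data.size() / 2);
    long long low = 0;
    long long high = 0;
    // Each thread reads a disjoint half and writes its own result variable,
    // so no locking is needed here.
    std::thread t1([&] { low = std::accumulate(data.begin(), data.begin() + mid, 0LL); });
    std::thread t2([&] { high = std::accumulate(data.begin() + mid, data.end(), 0LL); });
    t1.join();
    t2.join();
    return low + high;  // combine the partial results after both threads finish
}
```

The two halves have no data dependency on each other, which is what makes this split straightforward.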
User Threads
 Thread management done by a user-level threads library
 Three primary thread libraries:

POSIX Pthreads

Win32 threads

Java and C# threads
Thread Libraries
 Thread library provides programmer with API for creating and managing
threads
 Two primary ways of implementing

Library entirely in user space

Kernel-level library supported by the OS
Pthreads
 A POSIX standard (IEEE 1003.1c) API for thread creation
and synchronization
 API specifies behavior of the thread library, implementation
is up to the developers of the library
 Common in UNIX operating systems (Solaris, Linux, Mac
OS X)
 Can also be added to Windows by installing the optional
Pthreads library
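A minimal sketch of the Pthreads API, using only pthread_create and pthread_join. The struct SquareJob and the functions square_worker and square_in_thread are hypothetical names for this example, and it assumes a POSIX system.

```cpp
#include <pthread.h>

// Hypothetical job record passed to the thread through the void* argument.
struct SquareJob {
    int input;
    int output;
};

// Thread start routine: Pthreads requires the signature void* (*)(void*).
extern "C" void* square_worker(void* arg) {
    SquareJob* job = static_cast<SquareJob*>(arg);
    job->output = job->input * job->input;
    return nullptr;
}

// Squares a value in a new thread; returns false if a Pthreads call fails.
bool square_in_thread(int value, int& result) {
    SquareJob job{value, 0};
    pthread_t tid;
    if (pthread_create(&tid, nullptr, square_worker, &job) != 0) {
        return false;
    }
    if (pthread_join(tid, nullptr) != 0) {  // wait for the thread to finish
        return false;
    }
    result = job.output;
    return true;
}
```

On many systems this must be compiled with the -pthread option.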
Java and C# Threads
 Thread support is built into these newer languages with
keywords
 Java threads are managed by the JVM
 C# thread support is in the .NET Framework, whose Common Language
Runtime (CLR) plays the role that the JVM plays for Java
 Typically implemented using the threads model provided by
underlying OS
 Java threads may be created by:

Extending Thread class

Implementing the Runnable interface
 C# threads are created by passing a delegate (the method to run) to a
Thread object
Threading Issues
 Semantics of fork() and exec() system calls
 Thread cancellation of target thread

Asynchronous or deferred
 Signal handling
 Thread pools
 Thread-specific data
 Scheduler activations
Thread Cancellation
 Terminating a thread before it has finished
 Two general approaches:

Asynchronous cancellation terminates the target
thread immediately

Deferred cancellation allows the target thread to
periodically check if it should be cancelled
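Deferred cancellation can be sketched with a shared flag that the target thread checks at a safe point in its loop. This is an illustration in standard C++ rather than any particular library's cancellation API; run_until_cancelled and the variable names are hypothetical.

```cpp
#include <atomic>
#include <thread>

// Runs a worker until it observes a cancellation request.
// Returns how many loop iterations the worker completed.
long run_until_cancelled() {
    std::atomic<bool> cancel_requested{false};
    std::atomic<long> iterations{0};

    std::thread worker([&] {
        // The loop condition is the cancellation point: the worker only
        // stops here, never in the middle of an iteration.
        while (!cancel_requested.load()) {
            ++iterations;  // stand-in for one unit of real work
        }
        // Cleanup (releasing resources, unlocking, etc.) would go here.
    });

    // Wait until the worker has done at least one iteration.
    while (iterations.load() == 0) {
        std::this_thread::yield();
    }
    cancel_requested.store(true);  // request cancellation
    worker.join();                 // the worker exits at its next check
    return iterations.load();
}
```

Because the worker chooses where it checks the flag, it can always leave shared data in a consistent state, which asynchronous cancellation cannot guarantee.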
Signal Handling

Signals are used in UNIX systems to notify a process that a
particular event has occurred

A signal handler is used to process signals

1. Signal is generated by particular event
2. Signal is delivered to a process
3. Signal is handled
Options:

Deliver the signal to the thread to which the signal applies

Deliver the signal to every thread in the process

Deliver the signal to certain threads in the process

Assign a specific thread to receive all signals for the
process
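The last option, a dedicated signal-receiving thread, might look like the following on a POSIX system. It is a sketch under assumptions: SIGUSR1 is chosen arbitrarily, the names signal_thread and receive_one_signal are hypothetical, and every thread in the process must have the signal blocked for it to be safe.

```cpp
#include <pthread.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Dedicated thread: waits synchronously for SIGUSR1 and records it.
extern "C" void* signal_thread(void* arg) {
    int* received = static_cast<int*>(arg);
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    int sig = 0;
    if (sigwait(&set, &sig) == 0) {  // blocks until the signal is pending
        *received = sig;
    }
    return nullptr;
}

// Blocks SIGUSR1, starts the receiver thread, sends the process one
// SIGUSR1, and returns the signal number the receiver saw (or -1).
int receive_one_signal() {
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    // Block the signal in the calling thread. Threads created afterwards
    // inherit this mask, so no thread takes the default action.
    pthread_sigmask(SIG_BLOCK, &set, nullptr);

    int received = 0;
    pthread_t tid;
    if (pthread_create(&tid, nullptr, signal_thread, &received) != 0) {
        return -1;
    }
    kill(getpid(), SIGUSR1);  // process-directed signal
    pthread_join(tid, nullptr);
    return received;
}
```

The signal stays pending until the dedicated thread calls sigwait, so it does not matter whether the signal is sent before or after that thread starts waiting.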
Thread Pools
 Create a number of threads in a pool where they await work
 Advantages:

Usually slightly faster to service a request with an existing thread
than create a new thread

Allows the number of threads in the application(s) to be bound to
the size of the pool
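A thread pool can be sketched in standard C++ with a task queue, a mutex, and a condition variable. The class ThreadPool and the function count_tasks_with_pool are hypothetical names for this illustration, not part of any library mentioned in these slides.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class ThreadPool {
public:
    // Creates the worker threads once; they then wait for work.
    explicit ThreadPool(std::size_t thread_count) {
        for (std::size_t i = 0; i < thread_count; ++i) {
            workers_.emplace_back([this] { worker_loop(); });
        }
    }

    // Lets workers finish the queued tasks, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        work_available_.notify_all();
        for (std::thread& w : workers_) {
            w.join();
        }
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Queues a task; an existing idle thread picks it up.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        work_available_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                work_available_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) {
                    return;  // stopping and no work left
                }
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run the task outside the lock
        }
    }

    std::mutex mutex_;
    std::condition_variable work_available_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Usage sketch: 20 small tasks serviced by a pool of 3 threads.
int count_tasks_with_pool() {
    std::atomic<int> completed{0};
    {
        ThreadPool pool(3);
        for (int i = 0; i < 20; ++i) {
            pool.submit([&completed] { ++completed; });
        }
    }  // the destructor waits for all queued tasks to finish
    return completed.load();
}
```

No matter how many tasks are submitted, only three threads ever exist, which is the bound on thread count described above.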
Windows Threads
 Implements the one-to-one mapping, kernel-level
 Each thread contains

A thread id

Register set

Separate user and kernel stacks

Private data storage area
 The register set, stacks, and private storage area are known
as the context of the thread
Linux Threads
 Linux refers to them as tasks rather than threads
 Thread creation is done through clone() system call
 clone() allows a child task to share the address space
of the parent task (process)
Background on the need for Synchronization
• Threads may need to wait for other threads to
finish an operation
• Additionally concurrent access to shared data with
threads may result in data inconsistency (i.e.,
incorrect values)
• Maintaining data consistency requires
mechanisms to ensure the orderly execution of
cooperating processes (or threads)
Example Problem
• Suppose two threads share a common buffer
array. The producer puts items in the buffer and
the consumer removes them.
• A solution to the two-thread producer-consumer
problem that can fill all the buffer space keeps an
integer count of the number of full buffers.
Initially, count is set to 0. It is incremented by the
producer after it produces a new buffer and is
decremented by the consumer after it consumes
a buffer.
Producer
while (true) {
    /* produce an item and put in nextProduced */
    while (count == BUFFER_SIZE)
        ; // do nothing
    buffer[in] = nextProduced;
    in = (in + 1) % BUFFER_SIZE;
    count++;
}
Consumer
while (true) {
    while (count == 0)
        ; // do nothing
    nextConsumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    count--;
    // consume the item in nextConsumed
}
Critical Section
• A code segment that reads and writes global
data shared between threads or processes is
called a “critical section”
• Race condition bugs are possible on global variable
values – example will follow
• The OS synchronization API is used to solve this
• Must be careful and use OS synchronization
primitives to control access to a critical section
or hidden bugs will appear in code
Race Condition on Count
• count++ could be implemented as
    register1 = count
    register1 = register1 + 1
    count = register1
• count-- could be implemented as
    register2 = count
    register2 = register2 - 1
    count = register2
Consider this execution interleaving with “count = 5” initially:
S0: producer executes register1 = count {register1 = 5}
S1: producer executes register1 = register1 + 1 {register1 = 6}
S2: consumer executes register2 = count {register2 = 5}
S3: consumer executes register2 = register2 - 1 {register2 = 4}
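S4: producer executes count = register1 {count = 6}
S5: consumer executes count = register2 {count = 4}
The final count is 4, but after one produce and one consume it should still
be 5: the producer's update has been lost.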
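The interleaving S0 to S5 can be replayed deterministically in ordinary single-threaded code, with local variables standing in for the two CPU registers. The function name replay_interleaving is hypothetical.

```cpp
// Replays steps S0..S5 in the order listed, starting from the given count.
int replay_interleaving(int count) {
    int register1 = 0;  // producer's register
    int register2 = 0;  // consumer's register

    register1 = count;          // S0: producer loads count
    register1 = register1 + 1;  // S1: producer increments its copy
    register2 = count;          // S2: consumer loads the OLD count
    register2 = register2 - 1;  // S3: consumer decrements its copy
    count = register1;          // S4: producer stores its result
    count = register2;          // S5: consumer overwrites the producer's store
    return count;
}
```

Calling replay_interleaving(5) returns 4. Swapping the order of S4 and S5 would instead leave 6, so the result depends on timing, which is what makes this a race condition.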
Need an Atomic Operation
• The count++ and count-- code must run to completion
before the OS switches to the other thread to avoid bugs
• Atomic operation here means a basic
operation which cannot be stopped or
interrupted in the middle to switch to another
thread
• Race conditions show up more often on systems
with multiple processors since threads are
running in parallel
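One way to get an atomic increment and decrement in standard C++ is std::atomic. The sketch below is an illustration rather than the approach used later in these slides; the name atomic_count_demo is hypothetical.

```cpp
#include <atomic>
#include <thread>

// One thread increments and one decrements a shared atomic counter the
// same number of times; returns the final value of the counter.
int atomic_count_demo(int operations) {
    std::atomic<int> count{0};

    std::thread producer([&] {
        for (int i = 0; i < operations; ++i) {
            count++;  // atomic read-modify-write, cannot be split
        }
    });
    std::thread consumer([&] {
        for (int i = 0; i < operations; ++i) {
            count--;  // atomic read-modify-write, cannot be split
        }
    });

    producer.join();
    consumer.join();
    return count.load();
}
```

With a plain int in place of std::atomic<int>, updates could be lost and the result could differ from run to run; with the atomic type the result is always 0.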
Solution to Critical-Section Problem
1. Mutual Exclusion (Mutex) - If process Pi is executing in its
critical section, then no other processes can be executing in
their critical sections
2. Progress - If no process is executing in its critical section and
there exist some processes that wish to enter their critical
section, then the selection of the processes that will enter
the critical section next cannot be postponed indefinitely
3. Bounded Waiting - A bound must exist on the number of
times that other processes are allowed to enter their critical
sections after a process has made a request to enter its
critical section and before that request is granted
 Assume that each process executes at a nonzero speed
 No assumption concerning relative speed of the N processes
Solution to Critical-section Problem Using Mutex Locks
do {
    acquire lock
        critical section
    release lock
        remainder section
} while (TRUE);
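The acquire / critical section / release pattern can be sketched with std::mutex from the C++ standard library. The names mutex_count_demo and count_lock are hypothetical.

```cpp
#include <mutex>
#include <thread>

// Two threads update a shared counter, one adding and one subtracting,
// with a mutex protecting the critical section. Returns the final count.
int mutex_count_demo(int operations) {
    int count = 0;
    std::mutex count_lock;

    auto adjust = [&](int delta) {
        for (int i = 0; i < operations; ++i) {
            std::lock_guard<std::mutex> guard(count_lock);  // acquire lock
            count += delta;                                 // critical section
        }  // lock released when guard goes out of scope at the end of each iteration
    };

    std::thread producer(adjust, +1);
    std::thread consumer(adjust, -1);
    producer.join();
    consumer.join();
    return count;
}
```

Using std::lock_guard rather than calling lock() and unlock() directly means the lock is released even if the critical section exits early.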
Deadlock and Starvation
• Deadlock – two or more processes or threads are waiting indefinitely for
an event that can be caused by only one of the waiting processes
• Let S and Q be two semaphores initialized to 1 (i.e. a mutual exclusion
lock)
        P0                  P1
    wait (S);           wait (Q);
    wait (Q);           wait (S);
       .                   .
       .                   .
       .                   .
    signal (S);         signal (Q);
    signal (Q);         signal (S);
• Starvation – indefinite blocking. A process may never be removed from
the semaphore queue in which it is suspended
• Priority Inversion - Scheduling problem when a lower-priority process holds
a lock needed by a higher-priority process. The lower-priority process
may need to run first so that it can release the lock, which upsets the
intended priority order of the processes
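One common way to avoid the S and Q deadlock is to acquire both locks as a single step. The sketch below uses std::lock from the C++ standard library, which locks several mutexes without deadlocking regardless of the order in which they are named. It substitutes mutexes for the semaphores in the example above, and lock_both_demo is a hypothetical name.

```cpp
#include <mutex>
#include <thread>

// P0 names the locks as (S, Q) and P1 as (Q, S), as in the deadlock
// example, but std::lock acquires both without deadlocking.
// Returns the number of times the shared data was updated.
int lock_both_demo(int rounds) {
    std::mutex S;
    std::mutex Q;
    int shared = 0;

    std::thread p0([&] {
        for (int i = 0; i < rounds; ++i) {
            std::lock(S, Q);  // take both locks together
            std::lock_guard<std::mutex> hold_s(S, std::adopt_lock);
            std::lock_guard<std::mutex> hold_q(Q, std::adopt_lock);
            ++shared;  // protected by both locks
        }
    });
    std::thread p1([&] {
        for (int i = 0; i < rounds; ++i) {
            std::lock(Q, S);  // opposite order is still safe
            std::lock_guard<std::mutex> hold_q(Q, std::adopt_lock);
            std::lock_guard<std::mutex> hold_s(S, std::adopt_lock);
            ++shared;
        }
    });

    p0.join();
    p1.join();
    return shared;
}
```

Another common fix is to have every thread acquire the locks in the same fixed order, for example always S before Q.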
RTOS
• Real Time Operating System (RTOS)
• Used in systems that need a fast response
time to external events on the order of
milliseconds
• This is about 10-100X faster than PCs
• The general purpose OS in a PC is optimized
for throughput and a fast graphical user
interface – but at the expense of the Real
Time response
Mbed RTOS & Threads
• Runs a 1 ms time slice to switch between
threads. This is about 10-100X faster than PCs
• Memory limits an application to around 8 threads – each
thread needs its own stack and the RTOS also
uses a fair chunk of RAM (32K). RAM is used
for variables only. Nonvolatile Flash memory
stores code and constants – there is 512K of
it, so it is typically not the issue.
MBED RTOS
• The mbed RTOS also provides some basic
synchronization primitives:
– Mutex Lock – used to lock and unlock access to
shared memory (variables) and I/O devices
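– On the mbed compiler, the keyword volatile makes every
access to a simple built in global variable data type (but not
arrays) go to memory, so other threads see the latest value.
It is not a mutex lock: an update such as count++ can still
be interrupted part way, so protect it with a Mutex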
– Signals – can be used to send signals between
threads
MBED RTOS
• Semaphores – a more advanced
synchronization primitive than a mutex. Can
count things, but also slower than a mutex.
• Thread::wait(x ms) – tells the RTOS scheduler
to not run this thread again until x ms of time
has passed. Useful to keep a thread from using
too much processor time when it does not
need it. Other threads run during the delay.
• Don’t use wait – use Thread::wait
Mbed RTOS
• Free for ARM mbed users. Many RTOSes
require a license fee. Just add the RTOS library to
the project and a new #include "rtos.h" after the
mbed.h include
• Documentation and code examples are found in
the mbed Handbook under “Real Time
Operating System”; click the “mbed RTOS” link
• Free networking libraries are also available
that use the RTOS for Internet of Things
Devices (IoT)