CSC 536 Lecture 2
Outline
Concurrency on the JVM (and between JVMs)
Working problem
Java concurrency tools (review)
Solution using traditional Java concurrency tools
Solution using Akka concurrency tools
Overview of Akka
Working problem
Compute the total size of all regular files stored,
directly or indirectly, in a directory
> java Sequential C:\Windows
Total size: 34405975972
Time taken: 47.777222426
(Figure: example directory tree rooted at Users, with entries sammy, etc, foo.txt, docs, bar.txt, ellie, xyz.txt, abc.txt)
A recursive solution
Basis step: if input is a regular file, return its size
Recursive step: if input is a directory, call function
recursively on every item in the directory, add up the
returned values and return the sum
(Depth-First Traversal)
Sequential.java
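The recursive solution above can be sketched in Java as follows. This is a minimal sketch, not the lecture's Sequential.java: the class name SequentialSize and the method name sizeOf are assumptions made for illustration, and unreadable directories are simply counted as size 0.

```java
import java.io.File;

// Hypothetical class; the lecture's own Sequential.java is not reproduced here.
class SequentialSize {
    // Returns the total size in bytes of all regular files stored under f.
    static long sizeOf(File f) {
        if (f.isFile()) {
            return f.length();               // basis step: a regular file
        }
        long total = 0;
        File[] children = f.listFiles();     // null if f is not a readable directory
        if (children != null) {
            for (File child : children) {
                total += sizeOf(child);      // recursive step (depth-first)
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long total = sizeOf(new File(args.length > 0 ? args[0] : "."));
        double seconds = (System.nanoTime() - start) / 1.0e9;
        System.out.println("Total size: " + total);
        System.out.println("Time taken: " + seconds);
    }
}
```

Run it with a directory path as the only argument; with no argument it measures the current directory.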
Threads
Threads should be used to traverse the filesystem in parallel
A thread is a “lightweight process”
A thread really lives inside a process
A thread has its own:
program counter
stack
register set
A thread shares with other threads in the process
code
global variables
Interface Runnable
Must be implemented by any class that will be executed by
a thread
Implement method run() with code the thread will run
Anonymous class example:
new Runnable() {
    public void run() {
        // code to be run by thread
    }
}
Class Thread
Encapsulates a thread of execution in a program
To execute a thread:
An instance of a Runnable class is passed as an argument when
creating the thread
The thread is started with method start()
Example:
Runnable r = new Runnable() {
    public void run() {
        // code executed by thread
    }
};
new Thread(r).start();
Issue with threads: synchronizing access to shared data
Producer-Consumer example
Setup
A shared memory buffer
Producer puts objects into the buffer
Consumer reads objects from the buffer
ProducerConsumerTest.java, UnsyncBuffer.java
Problem:
producer can over-produce, consumer can over-consume
(example of race condition)
Need to synchronize (coordinate) the processes
Synchronization
Mechanisms that ensure that concurrent threads/processes
do not render shared data inconsistent
Three most widely used synchronization mechanisms in
centralized systems are
Semaphores
Locks
Monitors
Monitors
Monitor = Set of operations + set of variables + lock
Set of variables is the monitor’s state
Variables can be accessed only by the monitor’s operations
At most one thread can be active within the monitor at a time
To execute a monitor’s operation, thread A must obtain the
monitor’s lock
If thread B holds the monitor’s lock, thread A must wait on the
monitor’s queue (wait)
Once thread A is done with the monitor’s lock, it must release it
so that other threads can obtain it (notify)
Synchronization in Java
Each Java class becomes a monitor when at least one of its
methods uses the synchronized modifier
The synchronized modifier is used to write code blocks and
methods that require a thread to obtain a lock
Synchronization is always done with respect to an object
ProducerConsumerTest.java, SyncBuffer.java
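A synchronized producer-consumer buffer might look like the sketch below. It is not the lecture's SyncBuffer.java: the names OneSlotBuffer and OneSlotBufferDemo are hypothetical, and the buffer is assumed to hold a single int. Both methods are synchronized on the buffer object, so the object acts as a monitor; wait() releases the lock while the thread sleeps.

```java
// Hypothetical single-slot buffer acting as a monitor.
class OneSlotBuffer {
    private int value;
    private boolean full = false;

    // Producer side: wait until the slot is empty, then fill it.
    public synchronized void put(int v) throws InterruptedException {
        while (full) {
            wait();            // releases the lock on this object while waiting
        }
        value = v;
        full = true;
        notifyAll();           // wake up any waiting consumer
    }

    // Consumer side: wait until the slot is full, then empty it.
    public synchronized int get() throws InterruptedException {
        while (!full) {
            wait();
        }
        full = false;
        notifyAll();           // wake up any waiting producer
        return value;
    }
}

class OneSlotBufferDemo {
    // Produces 1..n in a separate thread, consumes in the calling thread,
    // and returns the sum of the consumed values.
    static int run(int n) throws InterruptedException {
        OneSlotBuffer buffer = new OneSlotBuffer();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) {
                    buffer.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += buffer.get();
        }
        producer.join();
        return sum;
    }
}
```

Because the producer cannot overwrite an unread value and the consumer cannot read the same value twice, the consumer sees each of 1..n exactly once.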
Java memory model
Before Java 5: ill defined
a thread not seeing values written by other threads
a thread observing impossible behaviors by other threads
Java 5 and later
Monitor lock rule: a release of a lock happens before the
subsequent acquire of the same lock
Volatile variable rule: a write of a volatile variable happens
before every subsequent read of the same volatile variable
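The volatile variable rule can be illustrated with a small sketch. The class name VolatileFlagDemo and its fields are hypothetical. The write to payload comes before the volatile write to ready in program order, so a reader that observes ready == true is also guaranteed to observe the payload.

```java
// Hypothetical demo of publishing data through a volatile flag.
class VolatileFlagDemo {
    private volatile boolean ready = false;
    private int payload = 0;   // plain field, published via the volatile write

    int run() throws InterruptedException {
        int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                Thread.yield();          // spin until the volatile write becomes visible
            }
            seen[0] = payload;           // happens after the write to payload
        });
        reader.start();
        payload = 42;                    // ordinary write
        ready = true;                    // volatile write publishes payload
        reader.join();                   // join makes seen[0] visible to this thread
        return seen[0];
    }
}
```

Without the volatile modifier on ready, the reader thread would have no guarantee of ever seeing the updated flag.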
Disadvantages of synchronization
Disadvantages:
Synchronization is error-prone
Synchronization blocks threads and takes time
Improper synchronization results in deadlocks
Creating a thread is not a low-overhead operation
Too many threads slow down the system
Thread pooling
Thread pooling is a solution to the thread creation and
management problem
The main idea is to create a bunch of threads in advance and have
them wait for something to do
The same thread can be recycled for different operations
Thread pool components:
A blocking queue
A pool of threads
Blocking queue
Queue is a sequence of objects
Two basic operations:
enqueue
dequeue
Blocking Queue:
A dequeue thread must block if the queue is empty
An enqueue thread must add an object to the queue and
notify blocked threads
Blocking queue must be thread safe
Blocking Queue operations
To dequeue an object from the queue:
Wait until the lock on the queue is obtained
If queue is empty, release lock and sleep; once notified, reacquire the lock and check again
If queue is not empty, remove the first element and return it
To enqueue an object to the queue:
Wait until the lock on the queue is obtained
Add object at the end of the queue
Notify any sleeping thread
BlockingQueue.java
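The dequeue and enqueue steps above can be sketched as follows. This is not the lecture's BlockingQueue.java: the class is named SimpleBlockingQueue here (a hypothetical name) to avoid confusion with the interface java.util.concurrent.BlockingQueue, and the queue is assumed to be unbounded.

```java
import java.util.LinkedList;

// Hypothetical unbounded blocking queue built on an object's monitor lock.
class SimpleBlockingQueue<T> {
    private final LinkedList<T> items = new LinkedList<>();

    // Add an object at the end of the queue and notify sleeping threads.
    public synchronized void enqueue(T item) {
        items.addLast(item);
        notifyAll();
    }

    // Block while the queue is empty, then remove and return the first element.
    public synchronized T dequeue() throws InterruptedException {
        while (items.isEmpty()) {
            wait();               // releases the lock and sleeps until notified
        }
        return items.removeFirst();
    }
}
```

The emptiness check is a while loop rather than an if, so that a woken thread rechecks the condition after it reacquires the lock.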
Thread Pool = threads + tasks
Thread pool = group of threads + queue of Runnable tasks
Thread pool starts by creating the group of threads
Each thread loops indefinitely
In every iteration, each thread attempts to dequeue a task from
the task queue
If the task queue is empty, block on the queue
If a task is dequeued, run the task
Thread pool method execute(task)
simply adds the task to the task queue
ThreadPool.java, ThreadPoolTest.java
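A thread pool along these lines can be sketched in a few lines. This is not the lecture's ThreadPool.java: the class name SimpleThreadPool is hypothetical, the task queue is the standard java.util.concurrent.LinkedBlockingQueue rather than a hand-written one, and the worker threads are made daemon threads (an assumption) so the JVM can exit without an explicit shutdown.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical minimal thread pool: a group of threads plus a queue of Runnable tasks.
class SimpleThreadPool {
    private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();

    SimpleThreadPool(int nThreads) {
        for (int i = 0; i < nThreads; i++) {
            Thread worker = new Thread(() -> {
                while (true) {                     // each thread loops indefinitely
                    try {
                        tasks.take().run();        // blocks while the task queue is empty
                    } catch (InterruptedException e) {
                        return;                    // stop the worker when interrupted
                    }
                }
            });
            worker.setDaemon(true);                // assumption: no explicit shutdown needed
            worker.start();
        }
    }

    // Simply adds the task to the task queue.
    public void execute(Runnable task) {
        tasks.add(task);
    }
}
```

Note that in this sketch a task that throws an unchecked exception terminates its worker thread; a production pool would have to handle that case.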
Java thread pool API
Interface ExecutorService defines objects that run Runnable
tasks
Using method execute()
Class Executors defines factory methods for obtaining a
thread pool (i.e. an ExecutorService object)
newFixedThreadPool(n) creates a pool of n threads
ExecutorService service = Executors.newFixedThreadPool(10);
service.execute(new Runnable() {
    public void run() {
        // task code
    }
});
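A complete use of the API, including shutting the pool down, might look like the sketch below. The class name ExecutorDemo, the method runTasks, and the one-minute timeout are assumptions made for illustration; shutdown() and awaitTermination() are standard ExecutorService methods.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: submit nTasks trivial tasks and count how many ran.
class ExecutorDemo {
    static int runTasks(int nTasks) throws InterruptedException {
        ExecutorService service = Executors.newFixedThreadPool(10);
        AtomicInteger done = new AtomicInteger();
        for (int i = 0; i < nTasks; i++) {
            service.execute(new Runnable() {
                public void run() {
                    done.incrementAndGet();        // the task code
                }
            });
        }
        service.shutdown();                         // accept no new tasks, finish queued ones
        service.awaitTermination(1, TimeUnit.MINUTES);
        return done.get();
    }
}
```

Waiting with awaitTermination() only works here because all tasks are submitted before shutdown(); it does not help when tasks create further tasks, as in the working problem.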
Back to working problem
Compute the total size of all regular files stored,
directly or indirectly, in a directory
Modern Java Concurrent solution
Use Runnable objects
Create Runnable object for every (sub)directory
Use thread pool
Keeps the number of threads manageable
Keep overhead of thread creation low
Reuse threads
Avoid sharing state
Variable totalSize only
Access must be synchronized
AtomicLong
Accumulator variable totalSize is incremented by all
threads
Must ensure that the incrementing operation (the critical
section) is not interrupted by a context switch
Solution 1: Use a Java lock to synchronize access to the
critical section
Solution 2: Use class AtomicLong
method addAndGet() executes as a single atomic operation
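Solution 2 can be sketched as follows. The class name AtomicTotal and the method addConcurrently are hypothetical; the point is only that concurrent calls to addAndGet() never lose an update, so no explicit lock is needed around the accumulator.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical demo: several threads increment one shared accumulator.
class AtomicTotal {
    static long addConcurrently(int nThreads, int addsPerThread) throws InterruptedException {
        AtomicLong totalSize = new AtomicLong(0);
        Thread[] threads = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            threads[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    totalSize.addAndGet(1);        // atomic read-modify-write
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();                             // wait for all threads to finish
        }
        return totalSize.get();
    }
}
```

With a plain long and `totalSize += 1` in place of the AtomicLong, the returned value could be smaller than nThreads * addsPerThread because increments from different threads can overwrite each other.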
Modern Java Concurrent solution
Concurrent1.java
Does not work
Concurrent1 problem
The main thread must wait until all (sub)directories have
been processed
No way to know when that happens
Need to:
1. keep track of pending tasks, i.e. (directory processing) task
creation and termination
2. Block the main thread until the number of pending tasks is 0
Modern Java Concurrent solution
Require synchronization variables
To terminate the application
Concurrent2.java
CountDownLatch
Synchronization tool that allows one or more threads to
wait until a set of operations being performed in other
threads completes.
initialized with a given count
method await() blocks until count reaches 0
method countDown() decrements count by 1
After count reaches 0, any subsequent invocations of await
return immediately.
A CountDownLatch initialized with a count of 1 serves as a
simple on/off gate: all threads invoking await() wait at the gate
until it is opened by a thread invoking countDown().
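Putting the pieces together, a concurrent solution to the working problem could be sketched as below. This is not the lecture's Concurrent2.java: the class name ConcurrentSize, the pool size of 8, and the use of an AtomicLong counter of pending tasks together with a CountDownLatch of count 1 are assumptions made for illustration.

```java
import java.io.File;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch; an instance is meant to be used for a single sizeOf() call.
class ConcurrentSize {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    private final AtomicLong totalSize = new AtomicLong();
    private final AtomicLong pending = new AtomicLong();        // tasks created but not finished
    private final CountDownLatch done = new CountDownLatch(1);  // gate opened when pending hits 0

    private void submit(File dir) {
        pending.incrementAndGet();               // count the task before it is queued
        pool.execute(() -> explore(dir));
    }

    // One task per (sub)directory: add up its regular files, spawn tasks for subdirectories.
    private void explore(File dir) {
        try {
            long size = 0;
            File[] children = dir.listFiles();   // null if dir cannot be read
            if (children != null) {
                for (File child : children) {
                    if (child.isFile()) {
                        size += child.length();
                    } else if (child.isDirectory()) {
                        submit(child);
                    }
                }
            }
            totalSize.addAndGet(size);
        } finally {
            if (pending.decrementAndGet() == 0) {
                done.countDown();                // last task finished: open the gate
            }
        }
    }

    long sizeOf(File root) throws InterruptedException {
        if (root.isFile()) {
            return root.length();
        }
        submit(root);
        done.await();                            // block the calling thread until all tasks finish
        pool.shutdown();
        return totalSize.get();
    }
}
```

A task increments pending for each subdirectory before it decrements its own count, so the counter can reach 0 only after every directory has been processed.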
An Akka/Scala concurrent solution
Use Akka Actors
Task of processing a directory is given to a worker actor by a
master actor
Worker actor processes directory
computes the total size of all the regular files and sends it to master
sends to master the (path)name of every sub-directory
Master actor
Initiates the process
sends tasks to worker actors
collects the total size
keeps track of pending tasks
ConcurrentAkka.scala
Akka
Actor-based concurrency framework
Provides solutions for non-blocking concurrency
Written in Scala, but also has Java API
Each actor has a state that is invisible to other actors
Each actor has a message queue
Actors receive and handle messages
sequentially, therefore no synchronization issues
Actors should rarely block
Actors are lightweight and asynchronous
about 650 bytes per actor
can have millions of actors running on a few threads on a single
machine
Why use Akka in DSII?
Distributed computing
Actors do not share state and interact through messages
Actor locations (local vs remote) are transparent
Akka developed for distributed applications from ground up
Group membership
Akka Cluster provides a fault-tolerant membership service
Uses gossip protocols and automatic failure-detectors
Fault tolerance
Akka implements a “let-it-crash” model
Uses supervisor hierarchies that self-heal
Reliable communication
Akka includes an implementation of reactive streams
Actors
State
Supposed to be invisible to other actors
Behavior
The actions to be taken in reaction to a message
Mailbox
Actors process messages from mailbox sequentially
Children
Actors can create other actors
A hierarchy of actors
Supervisor strategy
An actor is supervised by its parent
Actors
import akka.actor.{Actor, ActorSystem, Props}
class First extends Actor {
  def receive = {
    case "hello" => println("Hello world!")
    case msg: String => println("Got " + msg + " from " + sender)
    case _ => println("Unknown message")
  }
}
object Server extends App {
  val system = ActorSystem("FirstExample")
  val first = system.actorOf(Props[First], name = "first")
  println("The path associated with first is " + first.path)
  first ! "hello"
  first ! "Goodbye"
  first ! 4
}
First.scala
Using sbt
Simple Build Tool (http://www.scala-sbt.org/)
Easy to set up
Sample build.sbt configuration file
lazy val root = (project in file(".")).
  settings(
    name := "First Example",
    version := "1.0",
    scalaVersion := "2.12.1",
    scalacOptions in ThisBuild ++= Seq("-unchecked", "-deprecation"),
    resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/",
    libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.4.17"
  )
Abstract Class Actor
Extend Actor class and implement method receive
Method receive should have case statements that
define the messages the actor handles
implement the logic of how messages are handled
use Scala pattern matching
class First extends Actor {
  def receive = {
    case "hello" => println("Hello world!")
    case msg: String => println("Got " + msg)
    case _ => println("Unknown message")
  }
}
Class ActorSystem
Actors form hierarchies, i.e. a system
Class ActorSystem encapsulates a hierarchy of actors
Class ActorSystem provides methods for
creating actors
looking up actors.
At least the first actor in the system is created using it
Class ActorContext
Class ActorContext also provides methods for
creating actors
looking up actors.
Each actor has its own instance of ActorContext that allows
it to create (child) actors and lookup references to actors
Obtaining actor references
Creating actors
ActorSystem.actorOf()
ActorContext.actorOf()
Both methods return ActorRef reference to new actor
Looking up existing actor by concrete path
ActorSystem.actorSelection()
ActorContext.actorSelection()
Both methods return an ActorSelection reference to the existing actor
ActorRef or ActorSelection references can be used to send a
message to the actor
Class ActorRef
Immutable and serializable handle to an actor
actor could be in the same ActorSystem, a different one, or
even another, remote JVM
obtained from ActorSystem (or indirectly from ActorContext)
ActorRefs can be shared among actors by message passing
you can serialize it, send it over the wire and use it on a remote
host and it will still be representing the same Actor on the
original node, across the network.
In fact, every message carries the ActorRef of the sender
Conversely, sending messages to the actor is an ActorRef's only purpose
Actor System
Class Props
Props is an Actor configuration object
recipe for creating an actor including associated deployment
info
Hides the instantiation of the actor so a direct reference to the
actor instance is unavailable
Used when creating new actors through
ActorSystem.actorOf
ActorContext.actorOf
Sending messages
Messages are sent to an Actor through one of
method tell or simply !
means “fire-and-forget”, i.e. send a message asynchronously and
return immediately.
method ask or simply ?
sends a message asynchronously and returns a Future representing a
possible reply
Message ordering is guaranteed on a per-sender basis
Tell is the preferred way of sending messages.
No blocking waiting for a message
Best concurrency and scalability characteristics
Message ordering
For a given pair of actors, messages sent from the first to
the second will be received in the order they were sent
Causality between messages is not guaranteed!
Actor A sends message M1 to actor C
Actor A then sends message M2 to actor B
Actor B forwards message M2 to actor C
Actor C may receive M1 and M2 in any order
Also, message delivery is “at-most-once”,
i.e. no guaranteed delivery
Message ordering
Akka also guarantees
The actor send rule
The send of the message to an actor happens before the receive of that
message by the same actor.
The actor subsequent processing rule
processing of one message happens before processing of the next message
by the same actor.
Both rules only apply for the same actor instance and are
not valid if different actors are used
Messages and immutability
Messages can be any kind of object but have to be
immutable.
Scala can’t enforce immutability (yet) so this has to be by
convention.
Basic types like String, Int, Boolean are always immutable.
Apart from these the recommended approach is to use
Scala case classes which are immutable (if you don’t
explicitly expose the state) and work great with pattern
matching at the receiver side
Other good messages types are scala.Tuple2, scala.List,
scala.Map which are all immutable and great for pattern
matching
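Since Akka also has a Java API, the same convention applies there. A minimal sketch of an immutable message class in Java is shown below; the class name StartMessage is hypothetical and mirrors the role a Scala case class would play. The class is final, its only field is final, and there are no setters.

```java
// Hypothetical immutable message: state is fixed at construction time.
final class StartMessage {
    private final String secondPath;

    StartMessage(String secondPath) {
        this.secondPath = secondPath;
    }

    String getSecondPath() {
        return secondPath;      // String is itself immutable, so exposing it is safe
    }
}
```

Unlike a Scala case class, this Java class does not get equals, hashCode, or toString for free; they would have to be written by hand if needed.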
Actor API
Scala trait (think partially implemented Java Interface) that
defines one abstract method: receive()
Offers useful references:
self: reference to the ActorRef of actor
sender: reference to sender Actor of the last received message
typically used for replying to messages
context: reference to ActorContext of actor that includes
references to
factory methods to create child actors (actorOf)
system that the actor belongs to
parent supervisor
supervised children
Ping Pong examples
Second.scala
Third.scala
Scala pattern matching
Scala has a built-in general pattern matching mechanism
It allows matching on any sort of data with a first-match policy
object MatchTest1 extends App {
  def matchTest(x: Int): String = x match {
    case 1 => "one"
    case 2 => "two"
    case _ => "many"
  }
  println(matchTest(3))
  println(matchTest(2))
  println(matchTest(1))
}
Scala pattern matching
object MatchTest2 extends App {
  def matchTest(x: Any): Any = x match {
    case 1 => "one"
    case "two" => 2
    case y: Int => "scala.Int: " + y
  }
  println(matchTest(1))
  println(matchTest("two"))
  println(matchTest(3))
  println(matchTest("four")) // throws scala.MatchError: no case matches "four"
}
Scala case classes
Case classes are regular classes with special conveniences
automatically have factory methods with the name of the class
all constructor parameters become immutable public fields of
the class
have natural implementations of toString, hashCode, and equals
are serializable by default
provide a decomposition mechanism via pattern matching
case class Start(secondPath : String)
case object PING
case object PONG
Scala pattern matching
case class Start(secondPath : String)
case object PING
case object PONG
object MatchTest3 extends App {
  def matchTest(x: Any): Any = x match {
    case Start(secondPath) => "got " + secondPath
    case PING => "got ping"
    case PONG => "got pong"
  }
  println(matchTest(Start("path")))
  println(matchTest(PING))
}
Scala pattern matching
object MatchTest4 extends App {
  def length [X] (xs:List[X]): Int = xs match {
    case Nil => 0
    case y :: ys => 1 + length(ys)
  }
  println(length(List()))
  println(length(List(1,2)))
  println(length(List("one", "two", "three")))
}
Scala pattern matching
sealed trait Op
case object OpAdd extends Op
case object OpSub extends Op
case object OpMul extends Op
case object OpDiv extends Op
sealed trait Exp
case class ExpNum (n:Double) extends Exp
case class ExpOp (e1:Exp, op:Op, e2:Exp) extends Exp
object MatchTest5 extends App {
  def evaluate (e:Exp) : Double = e match {
    case ExpNum (v) => v
    case ExpOp (e1, op, e2) =>
      val n1:Double = evaluate (e1)
      val n2:Double = evaluate (e2)
      op match {
        case OpAdd => n1 + n2
        case OpSub => n1 - n2
        case OpMul => n1 * n2
        case OpDiv => n1 / n2
      }
  }
}
Defining Akka message classes
Use Scala case classes
case class Start(secondPath : String)
case object PING
case object PONG
class PingPong extends Actor {
  def receive = {
    case PING => ...
    case PONG => ...
    case Start(secondPath) => ...
  }
}
An Akka/Scala concurrent solution, in more detail
ConcurrentAkka.scala
class RoundRobinPool
Creating a new worker actor for every task (processing a
directory) is not efficient.
tasks are very small so Actor creation overhead is relatively
large
Instead, create a pool of worker actors (routees) managed
by a router actor of type RoundRobinPool
the router is the parent of the routees
a message (task) sent by some actor A to the router is
forwarded to a routee chosen in a round-robin fashion
The routee sees actor A as the sender of the message
context.actorOf(RoundRobinPool(50).props(Props[FileProcessor]), name = "workerRouter")