CSC 536 Lecture 2

Outline
  Concurrency on the JVM (and between JVMs)
    Working problem
    Java concurrency tools (review)
    Solution using traditional Java concurrency tools
    Solution using Akka concurrency tools
  Overview of Akka

Working problem
  Compute the total size of all regular files stored, directly or indirectly, in a directory
    > java Sequential C:\Windows
    Total size: 34405975972
    Time taken: 47.777222426
  Example directory tree (figure): Users, sammy, etc, foo.txt, docs, bar.txt, ellie, xyz.txt, abc.txt

A recursive solution
  Basis step: if input is a regular file, return its size
  Recursive step: if input is a directory, call function recursively on every item in the directory, add up the returned values and return the sum
  (Depth-First Traversal)
  Sequential.java

Threads
  Should be using threads to traverse filesystem in parallel
  A thread is a “lightweight process”
  A thread really lives inside a process
  A thread has its own:
    program counter
    stack
    register set
  A thread shares with other threads in the process:
    code
    global variables

Interface Runnable
  Must be implemented by any class that will be executed by a thread
  Implement method run() with code the thread will run
  Anonymous class example:
    new Runnable() {
        public void run() {
            // code to be run by thread
        }
    }

Class Thread
  Encapsulates a thread of execution in a program
  To execute a thread:
    An instance of a Runnable class is passed as an argument when creating the thread
    The thread is started with method start()
  Example:
    Runnable r = new Runnable() {
        public void run() {
            // code executed by thread
        }
    };
    new Thread(r).start();
  Issue with threads: synchronizing access to shared data

Producer-Consumer example
  Setup
    A shared memory buffer
    Producer puts
objects into the buffer
    Consumer reads objects from the buffer
  ProducerConsumerTest.java, UnsyncBuffer.java
  Problem: producer can over-produce, consumer can over-consume (example of a race condition)
  Need to synchronize (coordinate) the processes

Synchronization
  Mechanisms that ensure that concurrent threads/processes do not render shared data inconsistent
  Three most widely used synchronization mechanisms in centralized systems are
    Semaphores
    Locks
    Monitors

Monitors
  Monitor = set of operations + set of variables + lock
  Set of variables is the monitor’s state
  Variables can be accessed only by the monitor’s operations
  At most one thread can be active within the monitor at a time
  To execute a monitor’s operation, thread A must obtain the monitor’s lock
  If thread B holds the monitor’s lock, thread A must wait on the monitor’s queue (wait)
  Once thread A is done with the monitor’s lock, it must release it so that other threads can obtain it (notify)

Synchronization in Java
  Each Java class becomes a monitor when at least one of its methods uses the synchronized modifier
  The synchronized modifier is used to write code blocks and methods that require a thread to obtain a lock
  Synchronization is always done with respect to an object
  ProducerConsumerTest.java, SyncBuffer.java

Java Memory model (before Java 5)
  Before Java 5: ill defined
    a thread not seeing values written by other threads
    a thread observing impossible behaviors by other threads
  Java 5 and later
    Monitor lock rule: a release of a lock happens before the subsequent acquire of the same lock
    Volatile variable rule: a write of a volatile variable happens before every subsequent read of the same volatile variable

Disadvantages of synchronization
  Disadvantages:
    Synchronization is error-prone
    Synchronization blocks threads and takes time
    Improper
synchronization results in deadlocks
    Creating a thread is not a low-overhead operation
    Too many threads slow down the system

Thread pooling
  Thread pooling is a solution to the thread creation and management problem
  The main idea is to create a bunch of threads in advance and have them wait for something to do
  The same thread can be recycled for different operations
  Thread pool components:
    A blocking queue
    A pool of threads

Blocking queue
  Queue is a sequence of objects
  Two basic operations:
    enqueue
    dequeue
  Blocking Queue:
    A dequeue thread must block if the queue is empty
    An enqueue thread must add an object to the queue and notify blocked threads
  Blocking queue must be thread safe

Blocking Queue: dequeue and enqueue
  To dequeue an object from the queue:
    Wait until the lock on the queue is obtained
    If queue is empty, release lock and sleep
    If queue is not empty, remove the first element and return it
  To enqueue an object to the queue:
    Wait until the lock on the queue is obtained
    Add object at the end of the queue
    Notify any sleeping thread
  BlockingQueue.java

Thread Pool = threads + tasks
  Thread pool = group of threads + queue of Runnable tasks
  Thread pool starts by creating the group of threads
  Each thread loops indefinitely
    In every iteration, each thread attempts to dequeue a task from the task queue
    If the task queue is empty, block on the queue
    If a task is dequeued, run the task
  Thread pool method execute(task) simply adds the task to the task queue
  ThreadPool.java, ThreadPoolTest.java

Java thread pool API
  Interface ExecutorService defines objects that run Runnable tasks
    Using method execute()
  Class Executors defines factory methods for obtaining a thread pool (i.e.
an ExecutorService object)
    newFixedThreadPool(n) creates a pool of n threads
    ExecutorService service = Executors.newFixedThreadPool(10);
    service.execute(new Runnable() {
        public void run() {
            // task code
        }
    });

Back to working problem
  Compute the total size of all regular files stored, directly or indirectly, in a directory
  Example directory tree (figure): Users, sammy, etc, foo.txt, docs, bar.txt, ellie, xyz.txt, abc.txt

Modern Java Concurrent solution
  Use Runnable objects
    Create Runnable object for every (sub)directory
  Use thread pool
    Keeps the number of threads manageable
    Keep overhead of thread creation low
    Reuse threads
  Avoid sharing state
    Variable totalSize only
    Access must be synchronized

AtomicLong
  Accumulator variable totalSize is incremented by all threads
  Must ensure that the incrementing operation (the critical section) is not interrupted by a context switch
  Solution 1: Use a Java lock to synchronize access to the critical section
  Solution 2: Use class AtomicLong
    method addAndGet() executes as a single atomic operation

Concurrent1 problem
  Concurrent1.java does not work
  The main thread must wait until all (sub)directories have been processed
  No way to know when that happens
  Need to:
    1. Keep track of pending tasks, i.e. (directory processing) task creation and termination
    2.
Block the main thread until the number of pending tasks is 0

Modern Java Concurrent solution
  Require synchronization variables
    To terminate the application
  Concurrent2.java

CountDownLatch
  Synchronization tool that allows one or more threads to wait until a set of operations being performed in other threads completes
    initialized with a given count
    method await() blocks until count reaches 0
    method countDown() decrements count by 1
  After count reaches 0, any subsequent invocations of await() return immediately
  A CountDownLatch initialized with a count of 1 serves as a simple on/off gate: all threads invoking await() wait at the gate until it is opened by a thread invoking countDown()

An Akka/Scala concurrent solution
  Use Akka Actors
  Task of processing a directory is given to a worker actor by a master actor
  Worker actor
    processes directory
    computes the total size of all the regular files and sends it to master
    sends to master the (path)name of every sub-directory
  Master actor
    initiates the process
    sends tasks to worker actors
    collects the total size
    keeps track of pending tasks
  ConcurrentAkka.java

Akka
  Actor-based concurrency framework
  Provides solutions for non-blocking concurrency
  Written in Scala, but also has Java API
  Each actor has a state that is invisible to other actors
  Each actor has a message queue
  Actors receive and handle messages sequentially, therefore no synchronization issues
  Actors should rarely block
  Actors are lightweight and asynchronous
    650 bytes per actor
    can have millions of actors running on a few threads on a single machine

Why use Akka in DSII?
  Distributed computing
    Actors do not share state and interact through messages
    Actor locations (local vs remote) are transparent
    Akka developed for distributed applications from ground up
  Group membership
    Akka Cluster provides a fault-tolerant membership service
    Uses gossip protocols and automatic failure-detectors
  Fault tolerance
    Akka implements “let-it-crash” semantics model
    Uses supervisor hierarchies that self-heal
  Reliable communication
    Akka includes an implementation of reactive streams

Actors
  State
    Supposed to be invisible to other actors
  Behavior
    The actions to be taken in reaction to a message
  Mailbox
    Actors process messages from mailbox sequentially
  Children
    Actors can create other actors
    A hierarchy of actors
  Supervisor strategy
    An actor is supervised by its parent

Actors
    import akka.actor.{Actor, ActorSystem, Props}

    class First extends Actor {
      // Cases are tried in order; the first matching one handles the message
      def receive = {
        case "hello" => println("Hello world!")
        case msg: String => println("Got " + msg + " from " + sender)
        case _ => println("Unknown message")
      }
    }

    object Server extends App {
      val system = ActorSystem("FirstExample")
      val first = system.actorOf(Props[First], name = "first")
      println("The path associated with first is " + first.path)
      first ! "hello"
      first ! "Goodbye"
      first ! 4
    }
  First.scala

Using sbt
  Simple Build Tool (http://www.scala-sbt.org/)
  Easy to set up
  Sample build.sbt configuration file
    lazy val root = (project in file(".")).
      settings (
        name := "First Example",
        version := "1.0",
        scalaVersion := "2.12.1",
        scalacOptions in ThisBuild ++= Seq("-unchecked", "-deprecation"),
        resolvers += "Typesafe Repository" at "http://repo.typesafe.com/typesafe/releases/",
        libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.4.17"
      )

Abstract Class Actor
  Extend Actor class and implement method receive
  Method receive should have case statements that
    define the messages the actor handles
    implement the logic of how messages are handled
    use Scala pattern matching
    class First extends Actor {
      def receive = {
        case "hello" => println("Hello world!")
        case msg: String => println("Got " + msg)
        case _ => println("Unknown message")
      }
    }

Class ActorSystem
  Actors form hierarchies, i.e. a system
  Class ActorSystem encapsulates a hierarchy of actors
  Class ActorSystem provides methods for
    creating actors
    looking up actors
  At least the first actor in the system is created using it

Class ActorContext
  Class ActorContext also provides methods for
    creating actors
    looking up actors
  Each actor has its own instance of ActorContext that allows it to create (child) actors and look up references to actors

Obtaining actor references
  Creating actors
    ActorSystem.actorOf()
    ActorContext.actorOf()
    Both methods return an ActorRef reference to the new actor
  Looking up existing actor by concrete path
    ActorSystem.actorSelection()
    ActorContext.actorSelection()
    Both methods return an ActorSelection reference to the existing actor
  ActorRef or ActorSelection references can be used to send a message to the actor

Class ActorRef
  Immutable and serializable handle to an actor
    actor could be in the same ActorSystem, a different one, or even another, remote JVM
    obtained from ActorSystem (or indirectly from ActorContext)
  ActorRefs can be shared among actors by message passing
    you can serialize it, send it over the wire and use it on a remote host and it will still be representing the same Actor on the original node, across the network.
    In fact, every message carries the ActorRef of the sender
    Conversely, message passing is their only purpose

Actor System

Class Props
  Props is an Actor configuration object
    recipe for creating an actor including associated deployment info
    Hides the instantiation of the actor so reference to it is unavailable
  Used when creating new actors through
    ActorSystem.actorOf
    ActorContext.actorOf

Sending messages
  Messages are sent to an Actor through one of
    method tell or simply !
      means “fire-and-forget”, i.e. send a message asynchronously and return immediately
    method ask or simply ?
      sends a message asynchronously and returns a Future representing a possible reply
  Message ordering is guaranteed on a per-sender basis
  Tell is the preferred way of sending messages
    No blocking waiting for a message
    Best concurrency and scalability characteristics

Message ordering
  For a given pair of actors, messages sent from the first to the second will be received in the order they were sent
  Causality between messages is not guaranteed!
    Actor A sends message M1 to actor C
    Actor A then sends message M2 to actor B
    Actor B forwards message M2 to actor C
    Actor C may receive M1 and M2 in any order
  Also, message delivery is “at-most-once delivery”, i.e. no guaranteed delivery

Message ordering
  Akka also guarantees
    The actor send rule
      The send of the message to an actor happens before the receive of that message by the same actor
    The actor subsequent processing rule
      Processing of one message happens before processing of the next message by the same actor
  Both rules only apply for the same actor instance and are not valid if different actors are used

Messages and immutability
  Messages can be any kind of object but have to be immutable
  Scala can’t enforce immutability (yet) so this has to be by convention
  Primitives like String, Int, Boolean are always immutable.
  Apart from these the recommended approach is to use Scala case classes which are immutable (if you don’t explicitly expose the state) and work great with pattern matching at the receiver side
  Other good message types are scala.Tuple2, scala.List, scala.Map which are all immutable and great for pattern matching

Actor API
  Scala trait (think partially implemented Java Interface) that defines one abstract method: receive()
  Offers useful references:
    self: reference to the ActorRef of actor
    sender: reference to sender Actor of the last received message
      typically used for replying to messages
    context: reference to ActorContext of actor that includes references to
      factory methods to create child actors (actorOf)
      system that the actor belongs to
      parent supervisor
      supervised children

Ping Pong examples
  Second.scala
  Third.scala

Scala pattern matching
  Scala has a built-in general pattern matching mechanism
  It allows matching on any sort of data with a first-match policy
    object MatchTest1 extends App {
      def matchTest(x: Int): String = x match {
        case 1 => "one"
        case 2 => "two"
        case _ => "many"
      }
      println(matchTest(3))
      println(matchTest(2))
      println(matchTest(1))
    }

Scala pattern matching
    object MatchTest2 extends App {
      def matchTest(x: Any): Any = x match {
        case 1 => "one"
        case "two" => 2
        case y: Int => "scala.Int: " + y
      }
      println(matchTest(1))
      println(matchTest("two"))
      println(matchTest(3))
      println(matchTest("four"))  // throws scala.MatchError: no case matches "four"
    }

Scala case classes
  Case classes are regular classes with special conveniences
    automatically have factory methods with the name of the class
    all constructor parameters become immutable public fields of the class
    have natural implementations of toString, hashCode, and equals
    are serializable by default
    provide a decomposition mechanism via pattern matching
    case class Start(secondPath : String)
    case object PING
    case object PONG

Scala pattern matching
    case class Start(secondPath : String)
    case object PING
    case object PONG

    object MatchTest3 extends App {
      def matchTest(x: Any): Any = x match {
        case Start(secondPath) => "got " + secondPath
        case PING => "got ping"
        case PONG => "got pong"
      }
      println(matchTest(Start("path")))
      println(matchTest(PING))
    }

Scala pattern matching
    object MatchTest4 extends App {
      def length [X] (xs:List[X]): Int = xs match {
        case Nil => 0
        case y :: ys => 1 + length(ys)
      }
      println(length(List()))
      println(length(List(1,2)))
      println(length(List("one", "two", "three")))
    }

Scala pattern matching
    sealed trait Op
    case object OpAdd extends Op
    case object OpSub extends Op
    case object OpMul extends Op
    case object OpDiv extends Op

    sealed trait Exp
    case class ExpNum (n:Double) extends Exp
    case class ExpOp (e1:Exp, op:Op, e2:Exp) extends Exp

    object MatchTest5 extends App {
      def evaluate (e:Exp) : Double = e match {
        case ExpNum (v) => v
        case ExpOp (e1, op, e2) =>
          val n1:Double = evaluate (e1)
          val n2:Double = evaluate (e2)
          op match {
            case OpAdd => n1 + n2
            case OpSub => n1 - n2
            case OpMul => n1 * n2
            case OpDiv => n1 / n2
          }
      }
    }

Defining Akka message classes
  Use Scala case classes
    case class Start(secondPath : String)
    case object PING
    case object PONG

    class PingPong extends Actor {
      def receive = {
        case PING => ...
        case PONG => ...
        case Start(secondPath) => ...
      }
    }

An Akka/Scala concurrent solution, in more detail
  Use Akka Actors
  Task of processing a directory is given to a worker actor by a master actor
  Worker actor
    processes directory
    computes the total size of all the regular files and sends it to master
    sends to master the (path)name of every sub-directory
  Master actor
    initiates the process
    sends tasks to worker actors
    collects the total size
    keeps track of pending tasks
  ConcurrentAkka.scala

class RoundRobinPool
  Creating a new worker actor for every task (processing a directory) is not efficient
    tasks are very small so Actor creation overhead is relatively large
  Instead, create a pool of worker actors (routees) managed by a router actor of type RoundRobinPool
    the router is the parent of the routees
    a message (task) sent by some actor A to the router is forwarded to a routee chosen in a round-robin fashion
    The routee sees actor A as the sender of the message
    context.actorOf(RoundRobinPool(50).props(Props[FileProcessor]), name = "workerRouter")
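
Round-robin selection, sketched in plain Java
  The following is a minimal sketch of the round-robin forwarding idea described above, written in plain Java without Akka. It is not how Akka implements RoundRobinPool. The names RoundRobinSketch, Router and Routee are hypothetical, and the routees here handle messages synchronously, whereas real actors process their mailboxes asynchronously.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of round-robin routing; NOT Akka's RoundRobinPool.
public class RoundRobinSketch {

    // Stand-in for a routee: it simply records the messages handed to it.
    static class Routee {
        final List<String> received = new ArrayList<>();

        void handle(String message) {
            received.add(message);
        }
    }

    // Stand-in for the router: owns a fixed pool of routees.
    static class Router {
        private final List<Routee> routees = new ArrayList<>();
        private final AtomicInteger next = new AtomicInteger(0);

        Router(int size) {
            for (int i = 0; i < size; i++) {
                routees.add(new Routee());
            }
        }

        // Forward the message to the next routee in cyclic order.
        // floorMod keeps the index non-negative even if the counter overflows.
        void route(String message) {
            int index = Math.floorMod(next.getAndIncrement(), routees.size());
            routees.get(index).handle(message);
        }

        Routee routee(int i) {
            return routees.get(i);
        }
    }

    public static void main(String[] args) {
        Router router = new Router(3);
        for (int i = 0; i < 7; i++) {
            router.route("dir" + i);
        }
        for (int i = 0; i < 3; i++) {
            System.out.println("routee " + i + ": " + router.routee(i).received);
        }
    }
}
```

  With 3 routees and 7 messages, routee 0 receives dir0, dir3 and dir6, routee 1 receives dir1 and dir4, and routee 2 receives dir2 and dir5. A fixed pool avoids paying the creation cost once per task, which is the motivation given above for using a router.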