Semi-Explicit Parallel Programming in Haskell
Satnam Singh
Microsoft Research Cambridge
Leeds 2009

```csharp
public class ArraySummer
{
    private double[] a; // Encapsulated array

    // Constructor requiring an initial value for array
    public ArraySummer(double[] values)
    {
        a = values;
    }

    // Method to compute the sum of the segment [fromIndex, toIndex) of the array
    public void SumArray(int fromIndex, int toIndex, out double arraySum)
    {
        // Local accumulator: a shared field would be raced on when two
        // threads call SumArray on the same object at the same time
        double sum = 0;
        for (int i = fromIndex; i < toIndex; i++)
            sum = sum + a[i];
        arraySum = sum;
    }
}
```

[Figure: thread 1 creates thread 2 (ThreadCreate, thread.Start) and later waits for it to finish (thread.Join)]

```csharp
using System;
using System.Diagnostics;

class Program
{
    static void Main(string[] args)
    {
        const int testSize = 100000000;
        double[] testValues = new double[testSize];
        for (int i = 0; i < testSize; i++)
            testValues[i] = (double)i / testSize; // cast avoids integer division
        ArraySummer summer = new ArraySummer(testValues);
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();
        double testSum;
        summer.SumArray(0, testSize, out testSum);
        stopWatch.Stop();
        Console.WriteLine("Sum duration (milliseconds) = " + stopWatch.ElapsedMilliseconds);
        Console.WriteLine("Sum value = " + testSum);
        Console.ReadKey();
    }
}
```

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class Program
{
    static void Main(string[] args)
    {
        const int testSize = 100000000;
        double[] testValues = new double[testSize];
        for (int i = 0; i < testSize; i++)
            testValues[i] = (double)i / testSize; // cast avoids integer division
        ArraySummer summer = new ArraySummer(testValues);
        Stopwatch stopWatch = new Stopwatch();
        stopWatch.Start();
        double testSumA = 0;
        double testSumB;
        // A second thread sums the first half [0, testSize/2) ...
        Thread sumThread = new Thread(delegate()
            { summer.SumArray(0, testSize / 2, out testSumA); });
        sumThread.Start();
        // ... while this thread sums the second half [testSize/2, testSize)
        summer.SumArray(testSize / 2, testSize, out testSumB);
        sumThread.Join(); // wait for the first half before reading testSumA
        stopWatch.Stop();
        Console.WriteLine("Sum duration (milliseconds) = " + stopWatch.ElapsedMilliseconds);
        Console.WriteLine("Sum value = " + (testSumA + testSumB));
        Console.ReadKey();
    }
}
```

The Accidental Semi-colon
A; B;                  runs A, then B
createThread (A); B;   runs A in a new thread, in parallel with B

Execution Model
• A “thunk” for “fib 10” holds a pointer to the implementation, values for its free variables, and a storage slot for the result.

```haskell
fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)
```

wombat and numbat
```haskell
import Data.Char (ord)

-- pure function
wombat :: Int -> Int
wombat n = 42*n

-- side-effecting function
numbat :: Int -> IO Int
numbat n = do c <- getChar
              return (n + ord c)
```

Computation inside a ‘monad’: IO (), pronounced “IO unit”
```haskell
import Data.Char (chr, ord)

numbat :: IO ()
numbat = do c <- getChar
            putChar (chr (1 + ord c))
```

We can infer the type of expressions such as f (g + h), z!!2 or mapM f [a, b, ..., g], and the type tells us about effects:
• [Int] -> Bool: pure function, deterministic
• IO String: stateful operation, may be non-deterministic

Functional Programming to the Rescue?
• Why not evaluate every sub-expression of our pure functional programs in parallel?
  – execute each sub-expression in its own thread?
• The 80s dream does not work:
  – granularity
  – data-dependency

Infix Operators
• Prefix application: mod a b, e.g. mod 7 3 = 1
• Infix with backquotes: a `mod` b, e.g. 7 `mod` 3 = 1

x `par` y
• x is sparked for speculative evaluation
• a spark can potentially be instantiated on a thread running in parallel with the parent thread
• x `par` y = y
• typically x is used inside y, e.g. blurRows `par` (mix blurCols blurRows)

x `par` (y + x) on one processor:
[Figure: x is sparked; y is evaluated first; x is evaluated second by the parent thread, so the spark for x fizzles]

x `par` (y + x) on two processors P1 and P2:
[Figure: x is sparked on P1; y is evaluated on P1 while x is taken up for evaluation on P2]

par is Not Enough
• pseq :: a -> b -> b
• pseq is strict in its first argument but not in its second argument
• Related function:
  – seq :: a -> b -> b
  – Strict in both arguments
  – Compiler may transform seq x y to seq y x
  – No good for controlling order of evaluation for parallel programs

Parallel fib with threshold (Don Stewart)
```haskell
import Control.Monad (forM_)
import Control.Parallel (par, pseq)
import Text.Printf (printf)

cutoff :: Int
cutoff = 35 -- Threshold for parallel evaluation

-- Sequential fib
fib' :: Int -> Integer
fib' 0 = 0
fib' 1 = 1
fib' n = fib' (n-1) + fib' (n-2)

-- Parallel fib with thresholding
fib :: Int -> Integer
fib n
  | n < cutoff = fib' n
  | otherwise  = r `par` (l `pseq` l + r)
  where
    l = fib (n-1)
    r = fib (n-2)

-- Main program
main :: IO ()
main = forM_ [0..45] $ \i ->
         printf "n=%d => %d\n" i (fib i)
```

[Figure: Parallel fib performance: speedup of parfib over 1 core, running on 1 to 8 cores (2× Intel quad core); x-axis: number of cores, y-axis: speedup over 1 core]

Parallel quicksort (wrong)
```haskell
import Control.Parallel (par)

quicksortN :: (Ord a) => [a] -> [a]
quicksortN []     = []
quicksortN [x]    = [x]
quicksortN (x:xs) = losort `par`
                    hisort `par`
                    losort ++ (x:hisort)
  where
    losort = quicksortN [y | y <- xs, y < x]
    hisort = quicksortN [y | y <- xs, y >= x]
```

What went wrong?
• The spark for losort only evaluates it to its first cons cell, whose head and tail are still unevaluated thunks, so almost no sorting happens in parallel.

forceList
```haskell
forceList :: [a] -> ()
forceList []     = ()
forceList (x:xs) = x `seq` forceList xs
```

Parallel quicksort (right)
```haskell
quicksortF :: (Ord a) => [a] -> [a]
quicksortF []     = []
quicksortF [x]    = [x]
quicksortF (x:xs) = (forceList losort) `par`
                    (forceList hisort) `par`
                    losort ++ (x:hisort)
  where
    losort = quicksortF [y | y <- xs, y < x]
    hisort = quicksortF [y | y <- xs, y >= x]
```

```haskell
import Control.Parallel (par, pseq)
import Data.Array (Array, bounds, (!))
import Data.List (foldl')

-- Assumed helper (not defined on the slide): sums indices lo..hi inclusive
seqSum :: Int -> Int -> Array Int Double -> Double
seqSum lo hi arr = foldl' (+) 0 [arr ! i | i <- [lo .. hi]]

parSumArray :: Array Int Double -> Double
parSumArray matrix = lhs `par` (rhs `pseq` lhs + rhs)
  where nrValues = snd (bounds matrix) + 1 -- assumes indices start at 0
        lhs = seqSum 0 (nrValues `div` 2) matrix
        rhs = seqSum (nrValues `div` 2 + 1) (nrValues-1) matrix
```

Strategies
• Haskell provides a collection of evaluation strategies for controlling the evaluation order of various data types.
• Users have to indicate how their own types are evaluated to a normal form.
• Algorithm + Strategy = Parallelism, P. W. Trinder, K. Hammond, H.-W. Loidl and S. L. Peyton Jones.
• http://www.macs.hw.ac.uk/~dsg/gph/papers/html/Strategies/strategies.html

Explicitly Creating Threads
• forkIO :: IO () -> IO ThreadId
• Creates a lightweight Haskell thread, not an operating system thread.

Inter-thread Communication
• putMVar :: MVar a -> a -> IO ()
• takeMVar :: MVar a -> IO a

MVars
[Figure: an MVar mv is either empty or full; one thread runs putMVar mv 52, filling it, and another runs v <- takeMVar mv, emptying it and receiving 52]
• takeMVar blocks while the MVar is empty; putMVar blocks while it is full.
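The forkIO and MVar operations above are enough to redo the two-thread C# array sum from the start of the talk. The following is a minimal sketch, not code from the original slides: it sums a Haskell list rather than an array, and the names strictSum and twoThreadSum are hypothetical. The forked thread puts its partial sum into an MVar, and the parent's takeMVar plays the role of thread.Join. For the two threads to actually run on two cores, compile with -threaded and run with +RTS -N2.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Data.List (foldl')

-- Hypothetical helper: strict sequential sum of a list.
strictSum :: [Double] -> Double
strictSum = foldl' (+) 0

-- Hypothetical two-thread sum: the forked thread sums the first half and
-- the parent sums the second half; takeMVar waits for the child ("join").
twoThreadSum :: [Double] -> IO Double
twoThreadSum xs = do
  let (firstHalf, secondHalf) = splitAt (length xs `div` 2) xs
  resultMVar <- newEmptyMVar
  -- ($!) forces the partial sum before putMVar, so the child does the work
  -- instead of leaving an unevaluated thunk in the MVar.
  _ <- forkIO (putMVar resultMVar $! strictSum firstHalf)
  sumB <- evaluate (strictSum secondHalf)  -- parent's half, computed meanwhile
  sumA <- takeMVar resultMVar              -- blocks until the child is done
  return (sumA + sumB)

main :: IO ()
main = twoThreadSum [1 .. 100] >>= print  -- prints 5050.0
```

Forcing the partial sum with ($!) matters: without it the MVar would hold an unevaluated thunk and the parent would end up doing the child's work when it adds the two halves, which is the same issue the “fib fixed” slide addresses with pseq.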
Rendezvous
```haskell
import Control.Concurrent

threadA :: MVar Int -> MVar Float -> IO ()
threadA valueToSendMVar valueReceivedMVar
  = do -- some work
       -- now perform rendezvous by sending 72
       putMVar valueToSendMVar 72 -- send value
       v <- takeMVar valueReceivedMVar
       putStrLn (show v)

threadB :: MVar Int -> MVar Float -> IO ()
threadB valueToReceiveMVar valueToSendMVar
  = do -- some work
       -- now perform rendezvous by waiting on value
       z <- takeMVar valueToReceiveMVar
       putMVar valueToSendMVar (1.2 * fromIntegral z)
       -- continue with other work

main :: IO ()
main
  = do aMVar <- newEmptyMVar
       bMVar <- newEmptyMVar
       forkIO (threadA aMVar bMVar)
       forkIO (threadB aMVar bMVar)
       threadDelay 1000 -- BAD! main exits after 1 ms whether or not the threads have finished
```

fib again
```haskell
fib :: Int -> Int
-- As before

fibThread :: Int -> MVar Int -> IO ()
fibThread n resultMVar
  = putMVar resultMVar (fib n)

sumEuler :: Int -> Int
-- As before
```
• putMVar stores the unevaluated thunk fib n, so the forked thread does almost no work; fib n is computed by whichever thread later demands the value.

fib fixed
```haskell
fibThread :: Int -> MVar Int -> IO ()
fibThread n resultMVar
  = do pseq f (return ()) -- force f in this thread before sending it
       putMVar resultMVar f
  where
  f = fib n
```

```
$ time fibForkIO +RTS -N1
real    0m40.473s
user    0m0.000s
sys     0m0.031s

$ time fibForkIO +RTS -N2
real    0m38.580s
user    0m0.000s
sys     0m0.015s
```

STM in Haskell
```haskell
data STM a
instance Monad STM
-- Monads support "do" notation and sequencing

-- Exceptions
throw :: Exception -> STM a
catch :: STM a -> (Exception -> STM a) -> STM a

-- Running STM computations
atomically :: STM a -> IO a
retry      :: STM a
orElse     :: STM a -> STM a -> STM a

-- Transactional variables
data TVar a
newTVar   :: a -> STM (TVar a)
readTVar  :: TVar a -> STM a
writeTVar :: TVar a -> a -> STM ()
```

Transactional Memory
[Figure: an item is taken from queue Q1 or queue Q2 and put into R]
```
void GetEither() {
  atomic {
    do { i = Q1.Get(); }
    orelse { i = Q2.Get(); }
    R.Put( i );
  }
}
```
• do {...this...} orelse {...that...} tries to run “this”
• If “this” retries, it runs “that” instead
• If both retry, the do-block retries. GetEither() will thereby wait for there to be an item in either queue

ThreadScope
• The GHC runtime can generate eventlogs.
• Instrument:
  – thread creation, start/stop, migration
  – GCs
• ThreadScope graphical viewer
• Q: how to mine / understand the information?

Lots Unsaid
• xperf / VTune correlation
• Verification
• Debugging
• Parallel garbage collection

Summary
• Three ways of writing parallel and concurrent programs in Haskell:
  – `par` and `pseq` (semi-explicit parallelism)
  – MVars (explicit concurrency)
  – STM (explicit concurrency with transactions)
• Implicit concurrency
• Pure functional programming has pros and cons for parallel programming.
• Can mainstream languages take advantage of the same techniques?
• How can visualization help with performance tuning?
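The third approach in the summary, STM, appears in the talk only as the GetEither pseudocode, but it can also be written against the Haskell STM interface listed earlier (atomically, retry, orElse and TVars). The following is one possible sketch under stated assumptions: each queue is modelled as a TVar holding a list, and getQ, putQ and getEither are hypothetical names. The functions are imported from GHC.Conc in base; Control.Concurrent.STM from the stm package exports the same operations and is the more usual import.

```haskell
import GHC.Conc (STM, TVar, atomically, newTVarIO, orElse, readTVar,
                 readTVarIO, retry, writeTVar)

-- Hypothetical unbounded queue: a TVar holding a list, front of queue first.
type Queue a = TVar [a]

-- Remove the front item, or retry (block) while the queue is empty.
getQ :: Queue a -> STM a
getQ q = do
  items <- readTVar q
  case items of
    []       -> retry
    (x:rest) -> writeTVar q rest >> return x

-- Append an item at the back of the queue.
putQ :: Queue a -> a -> STM ()
putQ q x = do
  items <- readTVar q
  writeTVar q (items ++ [x])

-- Take an item from q1, or from q2 if q1 is empty, and put it into r,
-- all in one transaction. If both queues are empty the whole transaction
-- retries, so the calling thread blocks until either queue gets an item.
getEither :: Queue a -> Queue a -> Queue a -> IO ()
getEither q1 q2 r = atomically $ do
  i <- getQ q1 `orElse` getQ q2
  putQ r i

main :: IO ()
main = do
  q1 <- newTVarIO []
  q2 <- newTVarIO [10, 20 :: Int]
  r  <- newTVarIO []
  getEither q1 q2 r          -- q1 is empty, so the item comes from q2
  readTVarIO r >>= print     -- prints [10]
```

Because both alternatives of orElse run inside the same atomically block, the move from a source queue to r is all-or-nothing, and a thread calling getEither when both queues are empty waits until another thread adds an item to either queue, matching the behaviour described for GetEither().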