Multicore, parallelism, and multithreading
... A parallelizing compiler tries to split up a loop so
that its iterations can be executed on separate
processors.
Identify dependences between references: independent operations can run in parallel.
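As a minimal sketch of this idea, the loop below has no cross-iteration dependences (each iteration reads and writes only its own element), so its iterations can be farmed out to separate worker processes. The function names and data are illustrative, not from the lecture.

```python
# Sketch: a loop with independent iterations split across processes.
from concurrent.futures import ProcessPoolExecutor

def kernel(x):
    # Each iteration touches only its own element: no dependence
    # on any other iteration, so all can run in parallel.
    return x * x

def parallel_loop(data):
    # Equivalent to: for i in range(n): out[i] = kernel(data[i])
    with ProcessPoolExecutor() as pool:
        return list(pool.map(kernel, data))

if __name__ == "__main__":
    print(parallel_loop(list(range(8))))  # [0, 1, 4, 9, 16, 25, 36, 49]
```

A loop like `a[i] = a[i-1] + 1`, by contrast, carries a dependence between iterations and could not be split this way.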
Nonlinear Data Structures
... • A tree is just one example of a nonlinear data structure. Two other
examples are multidimensional arrays and graphs. In the next few
lessons, we will examine these data structures to see how they are
represented using the computer's linear memory. Remember that in
the last lesson we saw that we co ...
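As a small preview of how a nonlinear structure maps onto linear memory, a two-dimensional array is commonly flattened row by row (row-major order). This is a minimal sketch; the helper name `index` and the 3x4 shape are illustrative.

```python
# Sketch: a 2-D array stored in linear memory using row-major order.
def index(row, col, ncols):
    # Element (row, col) sits row*ncols + col cells past the start.
    return row * ncols + col

nrows, ncols = 3, 4
# Value 10*r + c encodes the coordinates, so fetches are easy to check.
flat = [10 * r + c for r in range(nrows) for c in range(ncols)]

print(flat[index(2, 1, ncols)])  # 21, i.e. row 2, column 1
```

Column-major order (used by Fortran, for example) flattens the same array column by column instead, with offset `col*nrows + row`.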
Monica Borra
... Efficiency: DataMPI speeds up varied Big Data workloads, improving job
execution time by 31%-41%.
Fault Tolerance: DataMPI supports fault tolerance. Evaluations show that
DataMPI-FT attains a 21% improvement over Hadoop.
... Scalable Memory and Communication Fabric
• Vision: Scalable memory and communication fabric that provides
performance, scalability, power efficiency, and flexibility
• Specific research topics:
o Flexible memory hierarchy
o Adaptable designs for
Intro to MIMD Architectures
... + Communication between processors is efficient
- Synchronized access to shared data in memory is
needed. Synchronizing constructs (semaphores,
conditional critical regions, monitors) result in
nondeterministic behaviour, which can lead to
programming errors that are difficult to discover
- Lack of scalability ...
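The synchronization point above can be sketched with a lock guarding shared data. Without the lock, the read-modify-write on `counter` can interleave across threads and lose updates, which is exactly the kind of nondeterministic, hard-to-reproduce error mentioned. The variable and thread counts are illustrative.

```python
# Sketch: synchronized access to shared data via a critical region.
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        with lock:           # critical region: one thread at a time
            counter += 1     # read-modify-write on shared data

threads = [threading.Thread(target=add_many, args=(10_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000 with the lock held; without it, often less
```

Semaphores and monitors serve the same purpose; the common cost is that every access to the shared datum must pass through the synchronizing construct.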
... – Implication: About a hundred cores in five
• BUT: Software can only make use of one!
Stream processing is a computer programming paradigm, equivalent to data-flow programming and reactive programming, that allows some applications to more easily exploit a limited form of parallel processing. Such applications can use multiple computational units, such as the FPUs on a GPU or field-programmable gate arrays (FPGAs), without explicitly managing allocation, synchronization, or communication among those units.

The stream processing paradigm simplifies parallel software and hardware by restricting the parallel computation that can be performed. Given a set of data (a stream), a series of operations (kernel functions) is applied to each element in the stream. Uniform streaming, where one kernel function is applied to all elements in the stream, is typical. Kernel functions are usually pipelined, and local on-chip memory is reused to minimize external memory bandwidth. Since the kernel and stream abstractions expose data dependencies, compiler tools can fully automate and optimize on-chip management tasks. Stream processing hardware can use scoreboarding, for example, to launch DMAs at runtime, when dependencies become known. Eliminating manual DMA management reduces software complexity, and eliminating hardware caches reduces the die area not dedicated to computational units such as ALUs.

During the 1980s, stream processing was explored within dataflow programming. An example is the language SISAL (Streams and Iteration in a Single Assignment Language).
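The uniform-streaming model above can be sketched in a few lines: kernel functions are applied element by element and chained like pipeline stages, here with Python generators standing in for pipelined hardware stages. The kernel names (`scale`, `clamp`) are illustrative assumptions, not the API of any particular stream-processing framework.

```python
# Sketch of uniform streaming: each kernel is applied to every
# element of the stream, and kernels are chained like a pipeline.
def scale(stream, k):
    for x in stream:        # stage 1: one kernel, all elements
        yield x * k

def clamp(stream, hi):
    for x in stream:        # stage 2: consumes stage 1's output
        yield min(x, hi)

stream = range(6)
result = list(clamp(scale(stream, 3), hi=10))
print(result)  # [0, 3, 6, 9, 10, 10]
```

Because each stage only reads the element handed to it, the data dependencies are fully exposed, which is what lets a compiler (or, here, the generator machinery) schedule the stages without any manual buffer management.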