Download pptx

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Document related concepts

Human performance modeling wikipedia, lookup

Reactive programming wikipedia, lookup

Falcon (programming language) wikipedia, lookup

Structured programming wikipedia, lookup

C Sharp syntax wikipedia, lookup

Program optimization wikipedia, lookup

Coding theory wikipedia, lookup

Parallel computing wikipedia, lookup

Monitor (synchronization) wikipedia, lookup

Java ConcurrentMap wikipedia, lookup

Smith–Waterman algorithm wikipedia, lookup

Operational transformation wikipedia, lookup

C Sharp (programming language) wikipedia, lookup

Thread (computing) wikipedia, lookup

Stream processing wikipedia, lookup

General-purpose computing on graphics processing units wikipedia, lookup

Transcript
The Fast Evaluation of Hidden
Markov Models on GPU
Presented by Ting-Yu Mu & Wooyoung Lee
Introduction

Hidden Markov Model (HMM):
◦ A statistical method (Probability based)
◦ Used in a wide range of applications:
 Speech recognition
 Computer vision
 Medical image analysis

One of the problems need to be solved:
◦ Evaluate the probability of an observation
sequence on a given HMM (Evaluation)
◦ The solution of above is the key to choose the
best matched models among the HMMs
Introduction – Example
Application: Speech Recognition
 Goal: Recognize words one by one
 Input:

◦ The speech signal of a given word →
 Represented as a time sequence of coded spectral
vectors

Output:
◦ The observation sequence →
 Represented as an index indicator in the spectral
codebook
Introduction – Example

The tasks:
◦ Design individual HMMs for each word of
vocabulary
◦ Perform unknown word recognition:
 Using the solution of evaluation problem to score
each HMM based on the observation sequence of
the word
 The model scores the highest is selected as the
result

The accuracy:
◦ Based on the correctness of the chosen result
Computational Load

Computational load of previous example
consists of two parts:
◦ Estimate the parameters of HMMs to build the
models, and the load varies upon each HMM
 Executed only one time
◦ Evaluate the probability of an observation
sequence on each HMM
 Executed many times on recognition process

The performance is depend on the
complexity of the evaluation algorithm
Efficient Algorithm

The lower order of complexity:
◦ Forward-backward algorithm
 Consists of two passes:
 Forward probability
 Backward probability
 Used extensively
 Computational intensive
◦ One way to increase the performance:
 Design the parallel algorithm
 Utilizing the present day’s multi-core systems
General Purpose GPU

Why choose Graphic Processing Unit
◦ Rapid increases in the performance
 Supports floating-points operations
 Fast computational power/memory bandwidth
GPU is specialized for
compute-intensive and
highly parallel
computation
More transistors are
devoted to data
processing rather that
data caching
CUDA Programming Model

The GPU is seen as a compute device to
execute part of the application that:
◦ Has to be executed multiple times
◦ Can be isolated as a function
◦ Works independently on different data
Such a function can be compiled to run on
the device. The resulting program is called
a Kernel
 The batch of threads that executes a
kernel is organized as a grid of blocks

CUDA Programming Model

Thread Block:
◦ Contains the batch of threads that can be
cooperate together:
 Fast shared memory
 Synchronizable
 Thread ID
◦ The block can be one-, two-, or threedimensional arrays
CUDA Programming Model

Grid of Thread Block:
◦ Contains the limited number of threads in a
block
◦ Allows larger numbers of thread to execute
the same kernel with one invocation
◦ Blocks identifiable through block ID
◦ Leads to a reduction in thread cooperation
◦ Blocks can be one- or two-dimensional arrays
CUDA Programming Model
CUDA Memory Model
Parallel Algorithm on GPU

The tasks of computing the evaluation
probability is split into pieces and
delivered to several threads
◦ A thread block evaluates a Markov model
◦ Calculating the dimension of the grid:
 Obtained by dividing the number of states N by the
block size
◦ Forward probability is computed by a thread
within a thread block
◦ Needs to synchronize the threads due to:
 Shared data
CUDAfy.NET
What is CUDAfy.Net?
Made by Hybrid DSP Systems in
Netherlands
 a set of libraries and tools that permit
general purpose programming of NVIDIA
CUDA GPUs from the Microsoft .NET
framework.
 combining flexibility, performance and
ease of use
 First release: March 17, 2011

Cudafy.NET SDK

Cudafy .NET Library
◦ Cudafy Translator (Convert .NET code to
CUDA C)
◦ Cudafy Library (CUDA support for .NET)
◦ Cudafy Host (Host GPU wrapper)
◦ Cudafy Math (FFT + BLAS)

The translator converts .NET code into
CUDA code. Based on ILSPY (Open
Source .NET assembly browser and
decompiler)
Cudafy Translator
GENERAL CUDAFY PROCESS

Two main components to the Cudafy SDK:
◦ Translation from .NET to CUDA C and compiling using NVIDIA
compiler (this results in a Cudafy module xml file)
◦ Loading Cudafy modules and communicating with GPU from host

It is not necessary for the target machine to perform the first step
above.
◦ 1. Add reference to Cudafy.NET.dll from your .NET project
◦ 2. Add the Cudafy, Cudafy.Host and Cudafy.Translator namespaces
to source files (using in C#)
◦ 3. Add a parameter of GThread type to GPU functions and use it to
access thread, block and grid information as well as specialist
synchronization and local shared memory features.
◦ 4. Place a Cudafy attribute on the functions.
◦ 5. In your host code before using the GPU functions call
Cudafy.Translator.Cudafy( ). This returns a Cudafy Module instance.
◦ 6. Load the module into a GPGPU instance. The GPGPU type allows
you to interact seamlessly with the GPU from your .NET code.
Development Requirement
NVIDIA CUDA Toolkit 4.1
 Visual Studio 2010

◦ Microsoft VC++ Compiler (used by NVIDIA
CUDA Compiler)
Windows( XP SP3, VISTA, 7 32bit/64bit)
 NVIDIA GPUs
 NVIDIA Graphic Driver

GPU vs PLINQ vs LINQ
GPU vs PLINQ vs LINQ
Reference
ILSpy : http://wiki.sharpdevelop.net/ilspy.ashx
Cudafy.NET : http://cudafy.codeplex.com/
Using Cudafy for GPGPU Programming in .NET :
http://www.codeproject.com/Articles/202792/Usi
ng-Cudafy-for-GPGPU-Programming-in-NET
 Base64 Encoding on a GPU :
http://www.codeproject.com/Articles/276993/Bas
e64-Encoding-on-a-GPU
 High Performance Queries: GPU vs LINQ vs
PLINQ :
http://www.codeproject.com/Articles/289551/Hig
h-Performance-Queries-GPU-vs-PLINQ-vs-LINQ


