GPU Programming
Yanci Zhang
Game Programming Practice
Outline
 Parallel computing
 GPU overview
 OpenGL shading language overview
 Vertex / Geometry / Fragment shader
 Using GLSL in OpenGL
 Application: Per-pixel shading
Why Parallel Computing?
 CPU performance increased by 50% per year from 1986 to 2002
 Programmers could simply wait for the next generation of CPUs to obtain increased performance
 Since 2002, single-processor performance improvement has slowed to 20% per year
 The road to rapidly increasing performance lay in the direction of parallelism
 Put multiple processors on a single circuit rather than developing ever-faster monolithic processors
What is GPU ?
 GPU: Graphics Processing Unit
 Developed rapidly from primitive drawing devices into major computing resources
 Extremely powerful and flexible processor
 Tremendous memory bandwidth and computational power
 High-level languages have emerged
 Capable of general-purpose computation beyond graphics applications
Motivation
 In many respects the GPU is more powerful than the CPU
 Computational power: FLOPS (floating-point operations per second)
 Parallelism
 Bandwidth
 Performance growth rate
Floating Point Calculation
 FLOPS: a common benchmark measurement for rating the speed of floating-point units (FPUs)
 CPU
 Intel Core i7-980X Extreme Edition (six-core): 107.55 GFLOPS
 GPU
 NVIDIA GeForce GTX 480: 2.02 TFLOPS
 Modern GPUs support high-precision 32-bit floating point throughout the pipeline
 No double-precision format on older GPUs or in GLSL before version 4.00; newer GPUs such as the GTX 480 support it, but at a much lower rate than single precision
Parallelism
 Parallelism: performing multiple operations simultaneously
 CPU
 Exploits only limited parallelism
 Dual-core, quad-core
 GPU
 GeForce GTX 480: 480 CUDA cores (the full GF100 chip has 512)
Bandwidth
 Peak performance of computer systems is often far in
excess of actual application performance
 The bandwidth between key components ultimately
dictates system performance
 CPU
 64-bit DDR3-2133: about 17 GB/s per channel (about 34 GB/s dual-channel)
 GPU
 GeForce GTX 480: 384-bit bus, 177.4 GB/s
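These bandwidth figures are theoretical peaks: bus width in bytes times transfers per second times the number of channels. A small C sketch of that arithmetic follows; the transfer rates (2133 MT/s for DDR3-2133 and an effective 3696 MT/s for the GTX 480's GDDR5 memory) are assumptions added here, not values from the slide, and real applications rarely reach these peaks.

```c
#include <stdio.h>

/* Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes):
   (bus width in bytes) x (transfers per second) x (number of channels). */
double peak_bandwidth_gbs(int bus_width_bits, double mega_transfers_per_s, int channels)
{
    double bytes_per_transfer = bus_width_bits / 8.0;
    return bytes_per_transfer * mega_transfers_per_s * 1e6 * channels / 1e9;
}

/* Assumed transfer rates (not from the slide): 2133 MT/s for DDR3-2133,
   3696 MT/s effective for the GTX 480's GDDR5 memory. */
void print_bandwidth_examples(void)
{
    printf("DDR3-2133, one 64-bit channel: %.1f GB/s\n", peak_bandwidth_gbs(64, 2133.0, 1));
    printf("DDR3-2133, two channels:       %.1f GB/s\n", peak_bandwidth_gbs(64, 2133.0, 2));
    printf("GTX 480, 384-bit bus:          %.1f GB/s\n", peak_bandwidth_gbs(384, 3696.0, 1));
}
```

Under these assumptions the three lines print 17.1, 34.1 and 177.4 GB/s, so the 17 GB/s figure corresponds to a single DDR3-2133 channel.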
Getting Faster and Faster
 CPU
 Annual growth ~ 1.5x -> decade growth ~60x
 Moore’s law
 GPU
 Annual growth ~2.0x -> decade growth > 1000x
 Faster than Moore’s law
 Multi-billion dollar video game market is a pressure cooker that
drives innovation
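The decade figures above are compound annual growth: a factor r per year for 10 years gives r to the 10th power. A minimal C sketch of that arithmetic (the function name is illustrative):

```c
/* Growth over `years` years at a constant per-year factor: annual_factor^years. */
double compound_growth(double annual_factor, int years)
{
    double total = 1.0;
    for (int i = 0; i < years; ++i)
        total *= annual_factor; /* one year of growth */
    return total;
}
```

compound_growth(1.5, 10) is about 57.7 (the "~60x" for CPUs) and compound_growth(2.0, 10) is exactly 1024 (the "> 1000x" for GPUs).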
Keys to High-Perf. Computing
 Efficient computation
 Maximize the hardware devoted to computation
 Allow parallelism
 Task parallelism
 Data parallelism
 Instruction parallelism
 Ensure each computation unit operates at maximum efficiency
Keys to High-Perf. Computing
 Efficient communication
 Simply providing large amounts of computation is not sufficient
 Processing elements (PEs) often spend most of their time waiting for data
 Minimize off-chip communication
Stream Programming Model
 A programming model allowing high efficiency in
computation and communication
 Two basic components
 Stream
 All data is represented as a stream
 An ordered set of data of the same data type
 Kernels: operations on streams
 Applications are constructed by chaining multiple
kernels together
Kernel
 Operates on entire streams of elements and produces
new streams
 Within a kernel, computations on one stream element
are never dependent on computations on another
element
 Input elements and intermediate computed data are stored
locally
 Fits perfectly onto data-parallel hardware
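To make streams and kernels concrete, here is a minimal CPU-side C sketch, not GPU code: arrays stand in for streams and plain functions stand in for kernels. The point type and both kernels are hypothetical examples. Each loop iteration reads only its own input element, which is the independence property that lets stream hardware process elements in parallel, and the application is built by chaining two kernels.

```c
#include <stddef.h>

/* A stream element: every element of a stream has the same type. */
typedef struct { float x, y; } Point2;

/* Kernel 1: scale each point. out[i] depends only on in[i], so the
   iterations are independent and could all run in parallel. */
void kernel_scale(const Point2 *in, Point2 *out, size_t n, float s)
{
    for (size_t i = 0; i < n; ++i) {
        out[i].x = in[i].x * s;
        out[i].y = in[i].y * s;
    }
}

/* Kernel 2: squared distance of each point from the origin. */
void kernel_length2(const Point2 *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i].x * in[i].x + in[i].y * in[i].y;
}

/* The application chains the kernels: the output stream of one kernel is the
   input stream of the next. `tmp` is the intermediate stream, which stream
   hardware would try to keep on-chip. */
void run_pipeline(const Point2 *in, Point2 *tmp, float *out, size_t n, float s)
{
    kernel_scale(in, tmp, n, s);
    kernel_length2(tmp, out, n);
}
```

For example, the input stream {(1, 0), (0, 2), (3, 4)} scaled by 2 produces the output stream {4, 16, 100}.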
Efficient Computation (1)
 Transistors can be divided into three categories by use:
 Control: direct the computation
 Datapath: perform computation
 Storage: store data
Efficient Computation (2)
 Only simple control flow in kernel execution
 Devote most of the transistors to datapath hardware rather than control hardware
 Streams expose parallelism in the application
 Allows the hardware implementation to be specialized for that parallelism
Efficient Communication
 On-chip communication is efficient; off-chip communication is expensive
 Intermediate results between kernels are kept on-chip
to minimize off-chip communication
 High degree of latency tolerance
Instruction-Stream-Based (CPU)
 Prescribes both the operation to be executed and the
required data
 Only a limited prefetch of the input data can occur
 Jumps are expected in the instruction stream
 The L2 cache consumes a large share of the CPU's transistors
Data-Stream-Based (GPU)
 Separates two tasks:
 Configuring PEs
 Controlling data-flow to and from PEs
 Data elements can be assembled from memory before
processing
 Uses only small caches and devotes the majority of
transistors to computation
Mapping Pipeline to Stream Model
 The stream formulation of the graphics pipeline
 All data as streams
 All computation as kernels
 Both user-programmable and nonprogrammable stages can be
expressed as kernels
Fixed vs. Programmable
 Fixed
 Very fast
 Cannot modify the pipeline; can only turn some functions on or off
 Hard to implement advanced techniques on the GPU
 Programmable
 Allows programmers to write shaders to change the pipeline
Basic Programmable Graphics Hardware
 Three programmable kernels in the pipeline
 Vertex shader
 Geometry shader
 Pixel (fragment) shader
 Load shaders through the graphics API
 The corresponding fixed-function stages are replaced by shaders
OpenGL 4.3 Pipelines
(Figure: OpenGL 4.3 pipelines, showing the graphics rendering pipeline and the GPGPU programming pipeline)
Vertex Processor
 MIMD: Multiple Instruction stream, Multiple Data
stream
 A number of processors that function asynchronously and
independently
Vertex Shader: Basic Function
 Operates on a single input vertex and produces a single output vertex
 Replaces the fixed-function transformation & lighting unit
 Now you have to do everything yourself
 Transformation
 Lighting
 Texture coordinate generation
 At a minimum, a vertex shader must output the vertex position in homogeneous clip space
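In the common case, producing the clip-space position is one 4x4 matrix times the input vertex, as in gl_ModelViewProjectionMatrix * gl_Vertex. The C sketch below shows that product, assuming OpenGL's column-major storage m[col * 4 + row]; the type and function names are illustrative, not OpenGL API.

```c
/* A homogeneous 4-component vector, like GLSL vec4. */
typedef struct { float x, y, z, w; } Vec4;

/* clip = m * v for a 4x4 matrix stored column-major (m[col * 4 + row]),
   the layout OpenGL uses; this mirrors gl_ModelViewProjectionMatrix * gl_Vertex. */
Vec4 mat4_mul_vec4(const float m[16], Vec4 v)
{
    Vec4 r;
    r.x = m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w;
    r.y = m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w;
    r.z = m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w;
    r.w = m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w;
    return r;
}
```

With column-major storage, a translation lives in elements 12 to 14, so translating (1, 1, 1, 1) by (1, 2, 3) yields (2, 3, 4, 1).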
Vertex Shader: Advanced Function
 What else can we do?
 Displacement mapping
 Object deformation
 Vertex blending
Vertex Shader: Limitations
 We cannot
 Add or delete any vertices
 Change the primitive type
 Change the order of the vertices that form the primitives
 Know the primitive type or the neighboring vertices
Fragment Processor
 SIMD: Single Instruction, Multiple Data
 Achieves data level parallelism
 “get this pixel, get the next one” -> “get lots of pixels”
Fragment Shader: Basic Function
 Invoked once for each fragment covered by the
primitive
 Computes the final pixel color and depth
 Can output up to 8 four-component values (32 bits per component) for the current pixel location, e.g. to multiple render targets
Fragment Shader: Advanced Function
 Enables rich shading techniques
 Per-pixel lighting, bump mapping, normal mapping
 Fluid simulation
…
Fragment Shader: Limitations
 Dynamic branching is less efficient than in the vertex processor
 Cannot change the screen coordinates of a fragment
 No arbitrary memory write
Geometry Shader
 New for 2007
 Executed after vertex shaders
 Input: a whole primitive, possibly with adjacency information
 Invoked once for every primitive
 Output: multiple vertices forming a single selected
topology (tristrip, linestrip, pointlist)
 Output may be fed to rasterizer and/or to a vertex
buffer in memory
Geometry Shader: Applications
 Point Sprite Expansion
 Single Pass Render-to-Cubemap
 Dynamic Particle Systems
 Fur/Fin Generation
 Shadow Volume Generation
Programmable GPUs: Applications
 Graphics applications
 Per-pixel lighting
 Ray tracing
 Deformation
 GPGPU
 Computer vision
 Physically-based simulation
 Image processing
 Database queries
GPGPU
 General-purpose Computation on GPUs
 Uses the GPU for computations beyond the specific graphics computations it was designed for
 Goal: make the inexpensive power of the GPU available to
developers as a sort of computational coprocessor
 Example applications range from in-game physics simulation to
conventional computational science
Shading Language
 Production rendering
 Geared towards maximum image quality
 Example: RenderMan
 Real-time rendering
 GLSL: OpenGL shading language
 HLSL: DirectX High-level shading language
 Cg: C for Graphics, NVIDIA
OpenGL Shading Language
 High level shading language based on C
 Not a hardware-specific language
 Cross platform compatibility on multiple OS
 Each hardware vendor includes a GLSL compiler in its driver
Before Using GLSL
 Check whether your GPU supports GLSL
 GLSL has been part of core OpenGL since version 2.0
 If OpenGL 2.0 is not available, use the OpenGL extensions instead
Extensions Required
 GL_ARB_shader_objects
 Adds API calls that are necessary to manage shader objects and
program objects
 GL_ARB_fragment_shader
 Adds functionality to define fragment shader objects
 GL_ARB_vertex_shader
 Adds functionality to define vertex shader objects
GLEW 1/2
 GLEW: The OpenGL Extension Wrangler Library
(http://glew.sourceforge.net/)
 Initialize GLEW
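#include <stdio.h>   /* fprintf */
#include <GL/glew.h> /* include glew.h before other OpenGL headers */
#include <GL/glut.h>
...
glutInit(&argc, argv);
glutCreateWindow("GLEW Test"); /* creates the OpenGL context that glewInit() requires */
GLenum err = glewInit();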
if (GLEW_OK != err)
{
/* Problem: glewInit failed, something is seriously wrong. */
fprintf(stderr, "Error: %s\n", glewGetErrorString(err));
...
}
GLEW 2/2
 Check extensions
if (GLEW_ARB_vertex_shader)
{
/* It is safe to use the GL_ARB_vertex_shader extension here. */
}
 Check core OpenGL functionality
if (GLEW_VERSION_2_0)
{
/* Yay! OpenGL 2.0 is supported! */
}
Data Types
 Scalar
 bool, int, float
 Vector
 Supports 2D, 3D, 4D vectors: vec{2,3,4}, ivec{2,3,4}, bvec{2,3,4}
 Matrix
 Square matrix: mat2, mat3, mat4
 Non-square matrix (GLSL 1.20 and later): mat2x3, mat2x4, mat3x2, mat3x4, mat4x2, mat4x3
 Texture
 sampler1D, sampler2D, sampler3D
 samplerCube
 sampler1DShadow, sampler2DShadow
Variables 1/3
 Pretty much the same as in C
float a, b;                      // two float variables (the comments are like in C)
int c = 2;                       // initialize a variable when declaring it
vec3 g = vec3(1.0, 2.0, 3.0);    // declare and initialize a vector
 Flexible when initializing variables using other variables
vec2 a = vec2(1.0, 2.0);
vec2 b = vec2(3.0, 4.0);
vec4 c = vec4(a, b);             // c = vec4(1.0, 2.0, 3.0, 4.0)
Variables 2/3
 Flexible when accessing a vector
 {x, y, z, w}: accessing vectors that represent points or normals
 {r, g, b, a}: accessing vectors that represent colors
 {s, t, p, q}: accessing vectors that represent texture coordinates
Variables 3/3
 Accessing components beyond those declared for the
vector type is an error
vec4 a = vec4(1.0, 2.0, 3.0, 4.0);
float posX = a.x;     // posX = 1.0
float posY = a[1];    // posY = 2.0
float depth = a.w;    // depth = 4.0
vec3 b = a.xxy;       // b = vec3(1.0, 1.0, 2.0)
vec3 c = a.bra;       // c = vec3(3.0, 1.0, 4.0)
vec2 t = vec2(1.0, 2.0);
float tt = t.z;       // incorrect! a vec2 has no z component
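The selection rules above can be mimicked in C to make them explicit: each letter names a component index, the three name sets are aliases for the same components, and selecting a component beyond the vector's size is rejected. This is an illustrative emulation, not how a GLSL compiler works; swizzle() and its -1 error return are hypothetical. It also rejects mixing name sets in one selection (such as .xg), which GLSL disallows.

```c
#include <string.h>

/* Map a swizzle letter to its component index (0-3) and report which
   name set it came from; returns -1 for an unknown letter.
   {x,y,z,w}, {r,g,b,a} and {s,t,p,q} are aliases for the same components. */
static int swizzle_index(char c, int *set)
{
    static const char *const sets[3] = { "xyzw", "rgba", "stpq" };
    for (int s = 0; s < 3; ++s)
        for (int i = 0; i < 4; ++i)
            if (sets[s][i] == c) {
                *set = s;
                return i;
            }
    return -1;
}

/* Emulates the GLSL selection v.<mask> on a vector with `size` (2-4) components.
   Writes the selected components to out and returns how many there are, or -1 if
   the mask is illegal: empty or longer than 4, an unknown letter, a component
   beyond the vector's size (e.g. .z on a vec2), or letters from mixed name sets. */
int swizzle(const float *v, int size, const char *mask, float out[4])
{
    int n = (int)strlen(mask);
    int first_set = -1;
    if (n < 1 || n > 4)
        return -1;
    for (int i = 0; i < n; ++i) {
        int set = -1;
        int idx = swizzle_index(mask[i], &set);
        if (idx < 0 || idx >= size)
            return -1;
        if (first_set < 0)
            first_set = set;
        else if (set != first_set)
            return -1;
        out[i] = v[idx];
    }
    return n;
}
```

Applied to {1.0, 2.0, 3.0, 4.0}, the mask "bra" yields 3.0, 1.0, 4.0, matching the a.bra line, while "z" on a two-component vector is rejected like t.z.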
Vector and Matrix Operations
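 Operations are component-wise, except vector-matrix and matrix-matrix multiplication, which follow linear-algebra rules
vec3 u, v, w;
float f;
mat3 a1, a2, a3;
u = v + f;      // same as: u.x = v.x + f; u.y = v.y + f; u.z = v.z + f;
u = v + w;      // same as: u.x = v.x + w.x; u.y = v.y + w.y; u.z = v.z + w.z;
u = v * a1;     // same as: u.x = dot(v, a1[0]); u.y = dot(v, a1[1]); u.z = dot(v, a1[2]);
a1 = a2 * a3;   // linear-algebra matrix product, not component-wise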
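Because GLSL matrices are column-major (a1[0] is the first column), v * a1 (row vector on the left) and a1 * v (column vector on the right) are different products, transposes of each other. The C sketch below mirrors both, plus the scalar case v + f; Vec3, Mat3 and the function names are illustrative, not GLSL or OpenGL API.

```c
/* GLSL-like types: a Mat3 is stored as three columns, like GLSL mat3. */
typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 col[3]; } Mat3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* GLSL `v * m`: v acts as a row vector, so each result component is the
   dot product of v with one column (u.x = dot(v, a1[0]) ...). */
Vec3 vec3_mul_mat3(Vec3 v, Mat3 m)
{
    Vec3 r = { dot3(v, m.col[0]), dot3(v, m.col[1]), dot3(v, m.col[2]) };
    return r;
}

/* GLSL `m * v`: v acts as a column vector; the result is a linear
   combination of the columns weighted by v's components. */
Vec3 mat3_mul_vec3(Mat3 m, Vec3 v)
{
    Vec3 r;
    r.x = m.col[0].x * v.x + m.col[1].x * v.y + m.col[2].x * v.z;
    r.y = m.col[0].y * v.x + m.col[1].y * v.y + m.col[2].y * v.z;
    r.z = m.col[0].z * v.x + m.col[1].z * v.y + m.col[2].z * v.z;
    return r;
}

/* GLSL `v + f` with a scalar f: f is added to every component. */
Vec3 vec3_add_scalar(Vec3 v, float f)
{
    Vec3 r = { v.x + f, v.y + f, v.z + f };
    return r;
}
```

With columns (1, 2, 3), (4, 5, 6), (7, 8, 9) and v = (1, 0, 0), v * m gives (1, 4, 7) while m * v gives (1, 2, 3).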
Control Flow Statements
 selection (if-else)
 iteration (for, while, and do-while)
 jumps (discard, return, break, and continue)
 discard is only allowed within fragment shaders
 discard causes the fragment to be discarded and no updates to
any buffers will occur
if (depth > 0.5)
discard;
Function Definition
 The function main() is used as the entry point to a
shader executable
returnType functionName (type0 arg0, type1 arg1, ..., typen argn)
{
// do some computation
return returnValue;
}
Important Built-in Variables 1/2
 gl_Position (vec4)
 Output of vertex shader
 Homogeneous vertex position
 Must write a value into this variable
 gl_FragCoord (vec4)
 Holds the window relative coordinates x, y, z, and 1/w values
for the fragment
 Read-only variable in fragment shader
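The x, y, z and 1/w values in gl_FragCoord come from the fixed-function stages after the vertex shader: perspective division turns clip coordinates into normalized device coordinates, and the viewport transform maps those to window coordinates. Below is a C sketch of that mapping for a single point, assuming the default depth range [0, 1]; clip_to_window() is a hypothetical helper, not an OpenGL function. For a rasterized fragment, x and y are additionally taken at pixel centers (0.5, 1.5, ...), which this sketch does not model.

```c
/* A homogeneous 4-component vector, like GLSL vec4. */
typedef struct { float x, y, z, w; } Vec4;

/* Hypothetical helper: map a clip-space position to window coordinates,
   assuming a viewport at (vx, vy) of size width x height and the default
   depth range [0, 1]. The result has the layout of gl_FragCoord. */
Vec4 clip_to_window(Vec4 clip, float vx, float vy, float width, float height)
{
    float inv_w = 1.0f / clip.w;          /* perspective division */
    float ndc_x = clip.x * inv_w;         /* normalized device coordinates in [-1, 1] */
    float ndc_y = clip.y * inv_w;
    float ndc_z = clip.z * inv_w;
    Vec4 win;
    win.x = vx + (ndc_x + 1.0f) * 0.5f * width;   /* viewport transform */
    win.y = vy + (ndc_y + 1.0f) * 0.5f * height;
    win.z = (ndc_z + 1.0f) * 0.5f;                /* depth range [0, 1] */
    win.w = inv_w;                                /* gl_FragCoord.w holds 1/w */
    return win;
}
```

With a 640 x 480 viewport, a point at the center of clip space lands at window x = 320, so discarding fragments with gl_FragCoord.x > 320 removes the right half of such a window.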
Important Built-in Variables 2/2
 gl_FragColor (vec4)
 Output of fragment shader
 Writing to gl_FragColor specifies the fragment color
 gl_FragDepth (float)
 Output of fragment shader
 Default value: gl_FragCoord.z
 If you write to gl_FragDepth, it is your responsibility to always write it
Built-in Functions 1/2
 Angle and trigonometry functions
 sin, cos, asin, acos …
 Exponential functions
 pow, exp, sqrt …
 Common functions
 abs, clamp, smoothstep …
 Geometric functions
 length, dot, cross …
Built-in Functions 2/2
 Matrix functions
 outerProduct, transpose …
 Vector relational functions
 lessThan, equal …
 Texture lookup functions
 texture2D, texture2DLod…
 Fragment processing functions
 Noise functions
Important Built-in Functions
 ftransform()
 For vertex shaders only
 Produces exactly the same result as would be produced by
OpenGL’s fixed functionality transform
gl_Position = ftransform();
 reflect(vec3 I, vec3 N)
 Computes reflection vector by incident vector I and normal
vector N
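The reflection formula is short enough to spell out. Below is a C sketch using the formula the GLSL specification gives for reflect(), R = I - 2 * dot(N, I) * N; reflect3() and Vec3 are illustrative names, and N is assumed to be normalized (otherwise the result is not a true reflection).

```c
typedef struct { float x, y, z; } Vec3;

static float dot3(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

/* R = I - 2 * dot(N, I) * N, the formula GLSL's reflect() uses.
   N must be unit length. */
Vec3 reflect3(Vec3 I, Vec3 N)
{
    float k = 2.0f * dot3(N, I);
    Vec3 r = { I.x - k * N.x, I.y - k * N.y, I.z - k * N.z };
    return r;
}
```

A ray travelling down and to the right, I = (1, -1, 0), hitting a floor with normal N = (0, 1, 0) reflects to (1, 1, 0): the component along the normal flips sign, the tangential component is unchanged.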
First Example
 Vertex shader
void main()
{
gl_Position = ftransform();
}
 Fragment shader
void main()
{
gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);
}
Have Fun with Fragment Shaders
void main()
{
vec4 t = vec4(1.0, 0.6, 0.3, 0.0);
gl_FragColor = t.xxxx;
//flexible vector accessing
}
void main()
{
gl_FragColor = vec4(gl_FragCoord.zzz, 1.0); //let’s view the depth map
}
void main()
{
if (gl_FragCoord.x > 320.0) discard; // try discard (320.0, not 320: GLSL 1.10 has no implicit int-to-float conversion)
gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);
}
More Built-in Variables
 Vertex shader built-in attributes
 gl_Vertex, gl_Normal, gl_Color, gl_MultiTexCoord0 to gl_MultiTexCoord7 …
 Vertex shader built-in output variables
 gl_FrontColor, gl_TexCoord[] …
 Fragment shader built-in input variables
 gl_Color, gl_TexCoord[] …
 Built-in uniform state
 gl_ModelViewMatrix, gl_ProjectionMatrix …
Example: Using Built-in Matrices
void main()
{
gl_Position = ftransform();
}
void main()
{
gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
}
void main()
{
gl_Position = gl_ModelViewMatrix * gl_Vertex;
gl_Position = gl_ProjectionMatrix * gl_Position;
}
Example: Using Colors
 Vertex shader
void main()
{
gl_Position = ftransform();
gl_FrontColor = gl_Color;
}
 Fragment shader
void main()
{
gl_FragColor = gl_Color;
}
Example: Using Texture Coordinates
 Vertex shader
void main()
{
gl_Position = ftransform();
gl_TexCoord[0] = vec4(gl_MultiTexCoord0.xy, 1.0, 0.0);
}
 Fragment shader
void main()
{
gl_FragColor = gl_TexCoord[0];
}
gl_NormalMatrix
 Important for per-vertex and per-pixel lighting
 Transpose of the inverse of the upper leftmost 3x3 of
gl_ModelViewMatrix
 Converts normal vector from object space to eye space
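One way to compute the normal matrix on the CPU, for example to check a shader's result, is the cofactor formula inverse(A)^T = cofactor(A) / det(A). The C sketch below uses row-major indexing a[row][col] for readability (OpenGL itself stores matrices column-major); normal_matrix() is a hypothetical helper. For a pure rotation the result equals the input; for a non-uniform scale it differs, which is why normals cannot simply be multiplied by the model-view matrix.

```c
/* Hypothetical helper: n = transpose(inverse(a)) for a 3x3 matrix, computed as
   cofactor(a) / det(a). Indexing is a[row][col]. Returns 0 if a is singular. */
int normal_matrix(const float a[3][3], float n[3][3])
{
    float c[3][3]; /* cofactor matrix: c[i][j] = (-1)^(i+j) * minor(i, j) */
    c[0][0] =  a[1][1] * a[2][2] - a[1][2] * a[2][1];
    c[0][1] = -(a[1][0] * a[2][2] - a[1][2] * a[2][0]);
    c[0][2] =  a[1][0] * a[2][1] - a[1][1] * a[2][0];
    c[1][0] = -(a[0][1] * a[2][2] - a[0][2] * a[2][1]);
    c[1][1] =  a[0][0] * a[2][2] - a[0][2] * a[2][0];
    c[1][2] = -(a[0][0] * a[2][1] - a[0][1] * a[2][0]);
    c[2][0] =  a[0][1] * a[1][2] - a[0][2] * a[1][1];
    c[2][1] = -(a[0][0] * a[1][2] - a[0][2] * a[1][0]);
    c[2][2] =  a[0][0] * a[1][1] - a[0][1] * a[1][0];

    /* Expand the determinant along the first row. */
    float det = a[0][0] * c[0][0] + a[0][1] * c[0][1] + a[0][2] * c[0][2];
    if (det == 0.0f)
        return 0;

    /* inverse(a) = transpose(c) / det, so its transpose is c / det. */
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            n[i][j] = c[i][j] / det;
    return 1;
}
```

For a scale of 2 along x, the normal matrix scales x by 0.5 instead, which keeps transformed normals perpendicular to the stretched surface.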
View Normal Vectors
 Vertex shader
void main()
{
gl_Position = ftransform();
gl_FrontColor = vec4(gl_Normal, 1.0);
}
void main()
{
gl_Position = ftransform();
gl_FrontColor = vec4(gl_NormalMatrix * gl_Normal, 1.0);
}
 Fragment shader
void main()
{
gl_FragColor = gl_Color;
}
Communications
 Communication between OpenGL and shader
 One-way communication (from OpenGL to the shader)
 Use uniform qualifier when declaring variables
 Communication between vertex and fragment shader
 Use varying qualifier when declaring variables
Uniform
 Used to declare global variables
 Variable values are the same across the entire
primitive being processed
 Read-only
 Initialized externally either at link time or through the
API
uniform vec4 lightPosition;
uniform vec3 color = vec3(0.7, 0.7, 0.2); // value assigned at link time
OpenGL Setup
Creating Shader Object
_ShaderID = glCreateShader(GL_VERTEX_SHADER); // GL_FRAGMENT_SHADER for a fragment shader
if (_ShaderID == 0) // glCreateShader() returns 0 if it fails to create a shader object
{
printf("Failed to create shader object!\n");
exit(-1);
}
// load the shader source file into the string _pShaderSource; fileLen (a GLint) holds its length
glShaderSource(_ShaderID, 1, (const GLchar **)&_pShaderSource, &fileLen);
CheckGLError(__FILE__, __LINE__);
glCompileShader(_ShaderID);
GLint ShaderStatus; // receives GL_TRUE or GL_FALSE
glGetShaderiv(_ShaderID, GL_COMPILE_STATUS, &ShaderStatus);
if (ShaderStatus == GL_FALSE)
{
GLchar log[1024];
glGetShaderInfoLog(_ShaderID, sizeof(log), NULL, log); // compiler error messages
printf("Failed to compile the shader %s:\n%s\n", vFileName, log);
exit(-1);
}
Creating Program Object
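_ProgramID = glCreateProgram();
if (_ProgramID == 0) // glCreateProgram() returns 0 on failure
{
printf("Failed to create shader program object!\n");
exit(-1);
}
glAttachShader(_ProgramID, VertexShaderID); // attach vertex shader
CheckGLError(__FILE__, __LINE__);
glAttachShader(_ProgramID, FragShaderID); // attach fragment shader
CheckGLError(__FILE__, __LINE__);
glLinkProgram(_ProgramID);
GLint ProgramStatus; // receives GL_TRUE or GL_FALSE
glGetProgramiv(_ProgramID, GL_LINK_STATUS, &ProgramStatus);
if (ProgramStatus == GL_FALSE)
{
GLchar log[1024];
glGetProgramInfoLog(_ProgramID, sizeof(log), NULL, log); // linker error messages
printf("Failed to link the program:\n%s\n", log);
exit(-1);
}
glUseProgram(_ProgramID); // install the program as part of the current rendering state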
Initialize Uniform Variables
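 Suppose a uniform variable is declared in the shader:
uniform vec3 u_Color;
 Initialize the uniform variable from OpenGL (glUniform* sets values in the program currently in use)
GLint loc = glGetUniformLocation(_ProgramID, "u_Color");
if (loc == -1) // -1: no active uniform with this name (unused uniforms may be optimized away)
{
cout << "Error: can't find uniform variable! \n";
}
glUniform3f(loc, v0, v1, v2);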
Application: Per-Pixel Shading
 Three types of light in OpenGL
 Ambient light
 Diffuse light
 Specular light
 The fixed pipeline performs per-vertex shading: lighting is computed at the vertices and the resulting colors are interpolated across the primitive
 Fast but poor quality
 Per-pixel shading becomes possible by utilizing the programmability of modern GPUs
Assignment
 Add specular light