GPU Programming
Yanci Zhang
Game Programming Practice

Outline
- Parallel computing
- GPU overview
- OpenGL shading language overview
- Vertex / Geometry / Fragment shader
- Using GLSL in OpenGL
- Application: Per-pixel shading

Why Parallel Computing?
- CPU performance increased about 50% per year from 1986 to 2002
  - Programmers could simply wait for the next generation of CPU to obtain increased performance
- Single-processor performance improvement has slowed to about 20% per year since 2002
- The road to rapidly increasing performance lies in the direction of parallelism
  - Put multiple processors on a single integrated circuit rather than developing ever-faster monolithic processors

What Is a GPU?
- GPU: Graphics Processing Unit
- Developed rapidly from primitive drawing devices into major computing resources
- Extremely powerful and flexible processor
  - Tremendous memory bandwidth and computational power
  - High-level languages have emerged
- Capable of general-purpose computation beyond graphics applications

Motivation
- In many respects the GPU is more powerful than the CPU
  - Computational power: FLOPS (floating-point operations per second)
  - Parallelism
  - Bandwidth
  - Performance growth rate

Floating Point Calculation
- FLOPS: a common benchmark measurement for rating the speed of a floating-point unit (FPU)
  - CPU: Intel Core i7 980 XE (six-core): 107.55 GFLOPS
  - GPU: NVIDIA GeForce GTX 480: 2.02 TFLOPS
- Modern GPUs support high-precision 32-bit floating point throughout the pipeline
  - Earlier GPU generations had no support for a double-precision format

Parallelism
- Parallelism: allows multiple operations to execute at the same time
- CPU
  - Does not adequately exploit parallelism
  - Dual-core, quad-core
- GPU
  - GeForce GTX 480: 480 cores

Bandwidth
- Peak performance of computer systems is often far in excess of actual application performance
- The bandwidth between key components ultimately dictates system performance
  - CPU: 64-bit DDR3-2133: about 17 GB/s per channel (about 34 GB/s dual-channel)
  - GPU: GeForce GTX 480: 384-bit, 177.4 GB/s

Getting Faster and Faster
- CPU
  - Annual growth ~1.5x -> decade growth ~60x
  - Moore’s law
- GPU
  - Annual growth ~2.0x -> decade growth >1000x
  - Faster than Moore’s law
- The multi-billion-dollar video game market is a pressure cooker that drives innovation

Keys to High-Performance Computing
- Efficient computation
  - Maximize the hardware devoted to computation
  - Allow parallelism
    - Task parallelism
    - Data parallelism
    - Instruction parallelism
  - Ensure each computation unit operates at maximum efficiency

Keys to High-Performance Computing
- Efficient communication
  - Simply providing large amounts of computation is not sufficient
  - Processing elements (PEs) often spend most of their time waiting for data
  - Minimize off-chip communication

Stream Programming Model
- A programming model that allows high efficiency in both computation and communication
- Two basic components
  - Streams
    - All data is represented as streams
    - A stream is an ordered set of data of the same data type
  - Kernels: operations on streams
- Applications are constructed by chaining multiple kernels together

Kernel
- Operates on entire streams of elements and produces new streams
- Within a kernel, computations on one stream element never depend on computations on another element
- Input elements and intermediate computed data are stored locally
- Fits perfectly onto data-parallel hardware

Efficient Computation (1)
- Use of transistors can be divided into three categories
  - Control: directs the computation
  - Datapath: performs the computation
  - Storage: stores data

Efficient Computation (2)
- Only simple control flow in kernel execution
  - Devote most transistors to datapath hardware rather than control hardware
- Streams expose parallelism in the application
- Allows a hardware implementation to specialize hardware

Efficient Communication
- Off-chip communication is efficient
- Intermediate results between kernels are kept on-chip to minimize off-chip communication
- High degree of latency tolerance

Instruction-Stream-Based (CPU)
- Prescribes both the operation to be executed and the required data
- Only a limited prefetch of the input data can occur
- Jumps are expected in the instruction stream
- The L2 cache consumes a large share of the CPU's transistors

Data-Stream-Based (GPU)
- Separates two tasks
  - Configuring PEs
  - Controlling data flow to and from PEs
- Data elements can be assembled from memory before processing
- Uses only small caches and devotes the majority of transistors to computation

Mapping Pipeline to Stream Model
- The stream formulation of the graphics pipeline
  - All data as streams
  - All computation as kernels
- Both user-programmable and nonprogrammable stages can be expressed as kernels

Fixed vs. Programmable
- Fixed
  - Very fast
  - Cannot modify the pipeline; can only turn some functions on or off
  - Hard to implement advanced techniques on the GPU
- Programmable
  - Allows programmers to write shaders that change the pipeline

Basic Programmable Graphics Hardware
- Three programmable kernels in the pipeline
  - Vertex shader
  - Geometry shader
  - Pixel shader
- Load shaders through the graphics API
- Fixed-pipeline stages are replaced by shaders

OpenGL 4.3 Pipelines
[Figure: the OpenGL 4.3 graphics rendering pipeline and GPGPU programming pipeline]

Vertex Processor
- MIMD: Multiple Instruction stream, Multiple Data stream
- A number of processors that function asynchronously and independently

Vertex Shader: Basic Function
- Operates on a single input vertex and produces a single output vertex
- Replaces the transformation & lighting unit
- Now you have to do everything yourself
  - Transformation
  - Lighting
  - Texture coordinate generation
- At a minimum, a vertex shader must output the vertex position in homogeneous clip space

Vertex Shader: Advanced Function
- What else can we do?
  - Displacement mapping
  - Object deformation
  - Vertex blending

Vertex Shader: Limitations
- We cannot
  - Add or delete vertices
  - Change the primitive type
  - Change the order in which vertices form primitives
- No knowledge of the primitive type or of neighboring vertices

Fragment Processor
- SIMD: Single Instruction, Multiple Data
- Achieves data-level parallelism
  - “get this pixel, get the next one” -> “get lots of pixels”

Fragment Shader: Basic Function
- Invoked once for each fragment covered by the primitive
- Computes the final pixel color and depth
- Can output up to eight 32-bit, 4-component values for the current pixel location

Fragment Shader: Advanced Function
- Enables rich shading techniques
  - Per-pixel lighting, bump mapping, normal mapping
  - Fluid simulation
  - …

Fragment Shader: Limitations
- Dynamic branching is less efficient than on the vertex processor
- Cannot change the screen coordinates of a fragment
- No arbitrary memory writes

Geometry Shader
- New for 2007
- Executed after vertex shaders
- Input: a whole primitive, possibly with adjacency information
- Invoked once for every primitive
- Output: multiple vertices forming a single selected topology (tristrip, linestrip, pointlist)
- Output may be fed to the rasterizer and/or to a vertex buffer in memory

Geometry Shader: Applications
- Point sprite expansion
- Single-pass render-to-cubemap
- Dynamic particle systems
- Fur/fin generation
- Shadow volume generation

Programmable GPUs: Applications
- Graphics applications
  - Per-pixel lighting
  - Ray tracing
  - Deformation
- GPGPU
  - Computer vision
  - Physically-based simulation
  - Image processing
  - Database queries

GPGPU
- General-Purpose computation on GPUs
- Capable of performing more than the specific graphics computations
- Goal: make the inexpensive power of the GPU available to developers as a sort of computational coprocessor
- Example applications range from in-game physics simulation to conventional computational science

Shading Language
- Production rendering
  - Geared towards maximum image quality
  - Example: RenderMan
- Real-time rendering
  - GLSL: OpenGL Shading Language
  - HLSL: DirectX High-Level Shading Language
  - Cg: C for Graphics (NVIDIA)

OpenGL Shading Language
- High-level shading language based on C
- Not a hardware-specific language
- Cross-platform compatibility across multiple operating systems
- Each hardware vendor includes a GLSL compiler in its driver

Before Using GLSL
- Check whether your GPU supports GLSL
- GLSL is part of core OpenGL since version 2.0
- If OpenGL 2.0 is not available, use OpenGL extensions

Extensions Required
- GL_ARB_shader_objects
  - Adds the API calls necessary to manage shader objects and program objects
- GL_ARB_fragment_shader
  - Adds functionality to define fragment shader objects
- GL_ARB_vertex_shader
  - Adds functionality to define vertex shader objects

GLEW 1/2
- GLEW: The OpenGL Extension Wrangler Library (http://glew.sourceforge.net/)
- Initialize GLEW

    #include <stdio.h>
    #include <GL/glew.h>
    #include <GL/glut.h>
    ...
    glutInit(&argc, argv);
    glutCreateWindow("GLEW Test");
    GLenum err = glewInit();  /* needs a current OpenGL context, so call it after glutCreateWindow */
    if (GLEW_OK != err)
    {
        /* Problem: glewInit failed, something is seriously wrong. */
        fprintf(stderr, "Error: %s\n", glewGetErrorString(err));
        ...
    }

GLEW 2/2
- Check extensions

    if (GLEW_ARB_vertex_shader)
    {
        /* It is safe to use the GL_ARB_vertex_shader extension here. */
    }

- Check core OpenGL functionality

    if (GLEW_VERSION_2_0)
    {
        /* Yay! OpenGL 2.0 is supported! */
    }

Data Types
- Scalar
  - bool, int, float
- Vector
  - Supports 2D, 3D, and 4D vectors: vec{2,3,4}, ivec{2,3,4}, bvec{2,3,4}
- Matrix
  - Square matrices: mat2, mat3, mat4
  - Non-square matrices: mat2x3, mat2x4, mat3x2, mat3x4, mat4x2, mat4x3
- Texture
  - sampler1D, sampler2D, sampler3D
  - samplerCube
  - sampler1DShadow, sampler2DShadow

Variables 1/3
- Pretty much the same as in C

    float a, b;                     // two float variables (comments are like in C)
    int c = 2;                      // initialize a variable when declaring it
    vec3 g = vec3(1.0, 2.0, 3.0);   // declare and initialize a vector

- Flexible when initializing variables from other variables

    vec2 a = vec2(1.0, 2.0);
    vec2 b = vec2(3.0, 4.0);
    vec4 c = vec4(a, b);            // c = vec4(1.0, 2.0, 3.0, 4.0)

Variables 2/3
- Flexible when accessing a vector
  - {x, y, z, w}: accessing vectors that represent points or normals
  - {r, g, b, a}: accessing vectors that represent colors
  - {s, t, p, q}: accessing vectors that represent texture coordinates

Variables 3/3
- Accessing components beyond those declared for the vector type is an error

    vec4 a = vec4(1.0, 2.0, 3.0, 4.0);
    float posX = a.x;    // posX = 1.0
    float posY = a[1];   // posY = 2.0
    float depth = a.w;   // depth = 4.0
    vec3 b = a.xxy;      // b = vec3(1.0, 1.0, 2.0)
    vec3 c = a.bra;      // c = vec3(3.0, 1.0, 4.0)
    vec2 t = vec2(1.0, 2.0);
    float tt = t.z;      // incorrect! a vec2 has no z component

Vector and Matrix Operations
- Operations are component-wise, except vector-matrix and matrix-matrix multiplication, which follow linear algebra

    vec3 u, v, w;
    float f;
    mat3 a1, a2, a3;

    u = v + f;     // same as: u.x = v.x + f;   u.y = v.y + f;   u.z = v.z + f;
    u = v + w;     // same as: u.x = v.x + w.x; u.y = v.y + w.y; u.z = v.z + w.z;
    u = v * a1;    // same as: u.x = dot(v, a1[0]); u.y = dot(v, a1[1]); u.z = dot(v, a1[2]);
                   // (a1[i] is column i of the matrix)
    a1 = a2 * a3;  // linear-algebraic matrix product

Control Flow Statements
- Selection (if-else)
- Iteration (for, while, and do-while)
- Jumps (discard, return, break, and continue)
  - discard is only allowed within fragment shaders
  - discard causes the fragment to be discarded, and no updates to any buffers will occur

    if (depth > 0.5)
        discard;

Function Definition
- The function main() is used as the entry point to a shader executable

    returnType functionName(type0 arg0, type1 arg1, ..., typen argn)
    {
        // do some computation
        return returnValue;
    }

Important Built-in Variables 1/2
- gl_Position (vec4)
  - Output of the vertex shader
  - Homogeneous vertex position
  - The vertex shader must write a value into this variable
- gl_FragCoord (vec4)
  - Holds the window-relative coordinates x, y, z, and 1/w for the fragment
  - Read-only variable in the fragment shader

Important Built-in Variables 2/2
- gl_FragColor (vec4)
  - Output of the fragment shader
  - Writing to gl_FragColor specifies the fragment color
- gl_FragDepth (float)
  - Output of the fragment shader
  - Default value: gl_FragCoord.z
  - If a shader writes gl_FragDepth on any execution path, it is responsible for writing it on every path

Built-in Functions
- Angle and trigonometry functions
  - sin, cos, asin, acos …
- Exponential functions
  - pow, exp, sqrt …
- Common functions
  - abs, clamp, smoothstep …
- Geometric functions
  - length, dot, cross …

Built-in Functions
- Matrix functions
  - outerProduct, transpose …
- Vector relational functions
  - lessThan, equal …
- Texture lookup functions
  - texture2D, texture2DLod …
- Fragment processing functions
- Noise functions

Important Built-in Functions
- ftransform()
  - For vertex shaders only
  - Produces exactly the same result as OpenGL’s fixed-functionality transform
  - gl_Position = ftransform()
- reflect(vec3 I, vec3 N)
  - Computes the reflection of incident vector I about normal vector N (N should be normalized)

First Example
- Vertex shader

    void main()
    {
        gl_Position = ftransform();
    }

- Fragment shader

    void main()
    {
        gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);
    }

Having Fun with the Fragment Shader
- Three alternative fragment shaders to try

    void main()
    {
        vec4 t = vec4(1.0, 0.6, 0.3, 0.0);
        gl_FragColor = t.xxxx;   // flexible vector accessing
    }

    void main()
    {
        gl_FragColor = vec4(gl_FragCoord.zzz, 1.0);   // let's view the depth map
    }

    void main()
    {
        if (gl_FragCoord.x > 320.0)   // float literal: GLSL 1.10 has no implicit int-to-float conversion
            discard;                  // try discard
        gl_FragColor = vec4(1.0, 1.0, 1.0, 1.0);
    }

More Built-in Variables
- Vertex shader built-in attributes
  - gl_Vertex, gl_Normal, gl_Color, gl_MultiTexCoord0 …
- Vertex shader built-in output variables
  - gl_FrontColor, gl_TexCoord[] …
- Fragment shader built-in input variables
  - gl_Color, gl_TexCoord[] …
- Built-in uniform state
  - gl_ModelViewMatrix, gl_ProjectionMatrix …

Example: Using Built-in Matrices
- Three vertex shaders that compute the same clip-space position

    void main()
    {
        gl_Position = ftransform();
    }

    void main()
    {
        gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;
    }

    void main()
    {
        gl_Position = gl_ModelViewMatrix * gl_Vertex;
        gl_Position = gl_ProjectionMatrix * gl_Position;
    }

Example: Using Colors
- Vertex shader

    void main()
    {
        gl_Position = ftransform();
        gl_FrontColor = gl_Color;
    }

- Fragment shader

    void main()
    {
        gl_FragColor = gl_Color;
    }

Example: Using Texture Coordinates
- Vertex shader

    void main()
    {
        gl_Position = ftransform();
        gl_TexCoord[0] = vec4(gl_MultiTexCoord0.xy, 1.0, 0.0);
    }

- Fragment shader

    void main()
    {
        gl_FragColor = gl_TexCoord[0];
    }

gl_NormalMatrix
- Important for per-vertex and per-pixel lighting
- Transpose of the inverse of the upper-left 3x3 of gl_ModelViewMatrix
- Converts normal vectors from object space to eye space

View Normal Vectors
- Vertex shader (object-space normals as colors)

    void main()
    {
        gl_Position = ftransform();
        gl_FrontColor = vec4(gl_Normal, 1.0);
    }

- Vertex shader (eye-space normals as colors)

    void main()
    {
        gl_Position = ftransform();
        gl_FrontColor = vec4(gl_NormalMatrix * gl_Normal, 1.0);
    }

- Fragment shader

    void main()
    {
        gl_FragColor = gl_Color;
    }

Communications
- Communication between OpenGL and shaders
  - One-way communication
  - Use the uniform qualifier when declaring variables
- Communication between vertex and fragment shaders
  - Use the varying qualifier when declaring variables

Uniform
- Used to declare global variables
- Variable values are the same across the entire primitive being processed
- Read-only
- Initialized externally, either at link time or through the API

    uniform vec4 lightPosition;
    uniform vec3 color = vec3(0.7, 0.7, 0.2);   // value assigned at link time

OpenGL Setup

Creating Shader Object

    _ShaderID = glCreateShader(GL_VERTEX_SHADER);
    if (_ShaderID == 0)   // glCreateShader() returns 0 if it fails to create a shader object
    {
        printf("Failed to create shader object!\n");
        exit(-1);
    }

    // load the shader source file into the string _pShaderSource
    glShaderSource(_ShaderID, 1, (const GLchar **)&_pShaderSource, &fileLen);
    CheckGLError(__FILE__, __LINE__);

    glCompileShader(_ShaderID);
    glGetShaderiv(_ShaderID, GL_COMPILE_STATUS, &ShaderStatus);
    if (ShaderStatus == GL_FALSE)
    {
        printf("Failed to compile the shader: %s\n", vFileName);
        exit(-1);
    }

Creating Program Object

    _ProgramID = glCreateProgram();
    if (_ProgramID == 0)
    {
        printf("Failed to create shader program object!\n");
        exit(-1);
    }

    glAttachShader(_ProgramID, VertexShaderID);   // attach vertex shader
    CheckGLError(__FILE__, __LINE__);
    glAttachShader(_ProgramID, FragShaderID);     // attach fragment shader
    CheckGLError(__FILE__, __LINE__);

    glLinkProgram(_ProgramID);
    glGetProgramiv(_ProgramID, GL_LINK_STATUS, &ProgramStatus);
    if (ProgramStatus == GL_FALSE)
    {
        printf("Failed to link the program!\n");
        exit(-1);
    }

    glUseProgram(_ProgramID);

Initialize Uniform Variables
- Suppose a uniform variable is declared in the shader:

    uniform vec3 u_Color;

- Initialize the uniform variable from OpenGL

    GLint loc = glGetUniformLocation(_ProgramID, "u_Color");
    if (loc == -1)
    {
        cout << "Error: can't find uniform variable! \n";
    }
    glUniform3f(loc, v0, v1, v2);

Application: Per-Pixel Shading
- Three types of light in OpenGL
  - Ambient light
  - Diffuse light
  - Specular light
- The fixed pipeline performs vertex-based shading
  - Fast but poor quality
- Per-pixel shading is possible by utilizing the programmable ability of modern GPUs

Assignment
- Add specular light