Unit 3 - Matrix Multiplication

What are the two levels of hierarchy in CUDA’s execution model?
What built-in CUDA variable gives the index of a thread within a block?
What built-in CUDA variable gives the index of a block within a grid?
How do you specify the grid and block dimensions in a CUDA kernel call?
How many dimensions can CUDA grids and blocks have?
What is the maximum number of threads per block in CUDA?
What is the maximum value of gridDim.x in CUDA?
What are the maximum values of gridDim.y and gridDim.z in CUDA?
What are the default values of the y and z dimensions when a dim3 grid is declared with only an x value?
What happens if a CUDA grid is too large for the hardware?
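The questions above can be explored with a small program. The sketch below is a minimal example, assuming a single GPU at device 0; the kernel `record_index` and the helper `ceil_div` are hypothetical names used only for illustration. Rather than hard-coding the limits (commonly 1024 threads per block, a gridDim.x limit of 2^31 - 1, and y and z limits of 65535 on compute capability 3.0 and later), it reads them from `cudaGetDeviceProperties`.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks of size d needed to cover n items.
__host__ __device__ inline unsigned int ceil_div(unsigned int n, unsigned int d) {
    return (n + d - 1) / d;
}

// Two-level hierarchy: a grid of blocks, each block a group of threads.
// Global 1D index = block index * block size + thread index within the block.
__global__ void record_index(int *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = i;
}

int main() {
    // Query hardware limits instead of assuming them.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("no CUDA device found\n");
        return 1;
    }
    printf("maxThreadsPerBlock = %d\n", prop.maxThreadsPerBlock);
    printf("maxGridSize = (%d, %d, %d)\n",
           prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);

    const int n = 1000;
    int *d_out;
    cudaMalloc(&d_out, n * sizeof(int));

    // A dim3 built from one value is 1D: y and z default to 1.
    dim3 block(256);
    dim3 grid(ceil_div(n, block.x));
    assert(block.y == 1 && block.z == 1 && grid.y == 1 && grid.z == 1);

    // Launch syntax: kernel<<<grid dimensions, block dimensions>>>(args).
    record_index<<<grid, block>>>(d_out, n);
    printf("valid launch: %s\n", cudaGetErrorString(cudaGetLastError()));

    // A block larger than the device limit is an invalid configuration:
    // the kernel does not run, and cudaGetLastError() reports the failure.
    dim3 too_big(prop.maxThreadsPerBlock + 1);
    record_index<<<1, too_big>>>(d_out, n);
    printf("oversized launch: %s\n", cudaGetErrorString(cudaGetLastError()));

    cudaFree(d_out);
    return 0;
}
```

Launch-configuration errors such as an oversized block are reported by `cudaGetLastError()` right after the launch; errors that happen while a kernel is running surface only at a later synchronizing call such as `cudaDeviceSynchronize()`.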

What kind of problems benefit from a 2D thread grid?
How do you compute the global row index of a thread in a 2D grid of 2D blocks?
How do you compute the global column index of a thread in a 2D grid of 2D blocks?
Why do CUDA image processing kernels often launch more threads than necessary?
What does an if-condition typically check in a CUDA image processing kernel?
What happens if threads access out-of-bounds memory in CUDA?
How does CUDA handle dynamically allocated 2D arrays?
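A minimal sketch of the 2D pattern these questions describe, assuming a hypothetical `scale_image` kernel over a `height` x `width` image of floats stored as one flat, row-major allocation. CUDA C kernels receive a plain pointer, so the programmer linearizes the (row, col) index by hand (the helper `flat_index` is also hypothetical); `cudaMallocPitch` is an alternative that pads each row. Because the grid size is rounded up, some threads land outside the image, and the `if` guard discards them. Without that guard those threads would read and write past the allocation, which is undefined behavior: it may silently corrupt other data or abort the kernel with an illegal-address error, and tools such as `compute-sanitizer` can report it.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Row-major linear index of element (row, col) in a matrix with `width` columns.
__host__ __device__ inline int flat_index(int row, int col, int width) {
    return row * width + col;
}

// Hypothetical example kernel: multiply every pixel by `factor`.
__global__ void scale_image(const float *in, float *out, int width, int height, float factor) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;  // global row (y) index
    int col = blockIdx.x * blockDim.x + threadIdx.x;  // global column (x) index
    // Extra threads from the rounded-up grid fall outside the image; skip them.
    if (row < height && col < width) {
        int i = flat_index(row, col, width);
        out[i] = in[i] * factor;
    }
}

int main() {
    const int width = 1000, height = 600;
    const size_t bytes = (size_t)width * height * sizeof(float);
    std::vector<float> h_in(width * height, 1.0f), h_out(width * height);

    float *d_in, *d_out;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    // 16 x 16 = 256 threads per block; round the grid up to cover the image.
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,    // 1000 / 16 rounded up = 63
              (height + block.y - 1) / block.y);  // 600 / 16 rounded up = 38
    assert(grid.x == 63 && grid.y == 38);

    scale_image<<<grid, block>>>(d_in, d_out, width, height, 2.0f);
    cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    printf("kernel status: %s\n", cudaGetErrorString(cudaGetLastError()));
    printf("last pixel = %f\n", h_out[flat_index(height - 1, width - 1, width)]);

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

The grid here launches 63 x 38 blocks of 256 threads (612,864 threads) for 600,000 pixels, so the guard is what keeps the surplus threads from touching memory outside the image.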