Parallel Programming With CUDA Tutorial (Part-3: Matrix Multiplication)

Why This Problem?

Shared Resource:

Problem With Shared Resource:

  1. Avoid critical section altogether. This is the ideal solution.
  2. But sometimes it is not possible to avoid critical sections. In that case, we would require thread synchronization.

Sequential Matrix Multiplication

void cpuMatMul(int N,double *x, double *y, double *ans)
{
for(int i=0;i<N;i++)
{
for(int j=0;j<N;j++)
{
for(int k=0;k<N;k++)
{
ans[i*N+j]+=(x[i*N+k]*y[k*N+j]);
}
}
}
}

Parallel Matrix Multiplication

__global__
void GPUmatmul(int N, double *x, double *y, double *ans)
{
//calculates unique thread ID in the block
int t= (blockDim.x*blockDim.y)*threadIdx.z
+(threadIdx.y*blockDim.x)+(threadIdx.x);
//calculates unique block ID in the grid
int b= (gridDim.x*gridDim.y)*blockIdx.z
+(blockIdx.y*gridDim.x)+(blockIdx.x);
//block size (this is redundant though)
int T= blockDim.x*blockDim.y*blockDim.z;
//grid size (this is redundant though)
int B= gridDim.x*gridDim.y*gridDim.z;
//Each cell in the matrix is assigned to a different thread.
//Each thread do O(N*number of asssigned cell) computation.
//Assigned cells of different threads does not overlape with
//each other. And so no need for synchronization.
for (int i=b;i<N;i+=B)
{
for(int j=t;j<N;j+=T)
{
for(int k=0;k<N;k++)
{
ans[i*N+j]+=(x[i*N+k]*y[k*N+j]);
}
}
}
}
t is the thread number of a thread inside a particular block. 
b is the block number of a block inside the grid.
B is total number of block.
T is total number of threads per block.
for (int i=b;i<N;i+=B)
for(int j=t;j<N;j+=T)
Example: Let B=2 and T=2. Size of ans is 3*3.
(Thread 0 of block 0): will do computation on (0,0)(0,2)(2,0)(2,2).
(Thread 1 of block 0): will do computation on (0,1)(2,1).
(Thread 0 of block 1): will do computation on (1,0)(1,2).
(Thread 1 of block 1): will do computation on (1,1).
As we can see if N is perfectly devisible by both T and B each threads will be assigned equal number of cells.
GPUmatmul<<<dim3(16,16,16), dim3(16,8,8)>>>(N, x, y,ans);
cudaDeviceSynchronize();

Comparison

Code

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store