I am trying to multiply two matrices.
#include <stdio.h>
#include <omp.h>
#include <time.h>
#include <stdlib.h>

#define N 2048
#define FactorIntToDouble 1.1;
#define THREAD_NUM 4

double firstMatrix [N] [N] = {0.0};
double secondMatrix [N] [N] = {0.0};
double matrixMultiResult [N] [N] = {0.0};

// Sync
void matrixMulti() {
    for(int row = 0 ; row < N ; row++){
        for(int col = 0; col < N ; col++){
            double resultValue = 0;
            for(int transNumber = 0 ; transNumber < N ; transNumber++) {
                resultValue += firstMatrix [row] [transNumber] * secondMatrix [transNumber] [col] ;
            }
            matrixMultiResult [row] [col] = resultValue;
        }
    }
}

void matrixInit() {
    for(int row = 0 ; row < N ; row++ ) {
        for(int col = 0 ; col < N ; col++){
            srand(row+col);
            firstMatrix [row] [col] = ( rand() % 10 ) * FactorIntToDouble;
            secondMatrix [row] [col] = ( rand() % 10 ) * FactorIntToDouble;
        }
    }
}

// Parallel
void matrixMulti2(int start, int end) {
    printf("Op: %d - %d\n", start, end);
    for(int row = start ; row < end ; row++){
        for(int col = 0; col < N ; col++){
            double resultValue = 0;
            for(int transNumber = 0 ; transNumber < N ; transNumber++) {
                resultValue += firstMatrix [row] [transNumber] * secondMatrix [transNumber] [col] ;
            }
            matrixMultiResult [row] [col] = resultValue;
        }
    }
}

void process1(){
    clock_t t1 = clock();
    #pragma omp parallel
    {
        int thread = omp_get_thread_num();
        int thread_multi = N / 4;
        int start = (thread) * thread_multi;
        int end = 0;
        if(thread == (THREAD_NUM - 1)){
            end = (start + thread_multi);
        }else{
            end = (start + thread_multi) - 1;
        }
        matrixMulti2(start, end);
    }
    clock_t t2 = clock();
    printf("time 2: %ld\n", t2-t1);
}

int main(){
    matrixInit();
    clock_t t1 = clock();
    matrixMulti();
    clock_t t2 = clock();
    printf("time: %ld", t2-t1);
    process1();
    return 0;
}
I have both a parallel and a sync version, but the parallel version takes longer than the sync version.
Currently the sync version takes around 90 seconds and the parallel version over 100, which makes no sense to me.
My logic was to split the matrix into 4 parts along the outer row loop, one chunk of rows per thread, which I believe is reasonable; a sketch of what I mean is below.
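For clarity, a minimal sketch of the row partition I have in mind (end index exclusive, assuming exactly THREAD_NUM threads and that N is divisible by THREAD_NUM):

    int thread = omp_get_thread_num();       // 0 .. THREAD_NUM-1
    int rows_per_thread = N / THREAD_NUM;    // 2048 / 4 = 512 rows per thread
    int start = thread * rows_per_thread;    // first row of this thread's chunk
    int end   = start + rows_per_thread;     // one past the last row of the chunk
    matrixMulti2(start, end);                // computes rows [start, end)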
After I finish this part, I would like to figure out how to speed up the parallel version even more, possibly using Strassen's matrix multiplication. I just don't know where to start or how to get to this point.
I’ve already spent around 5 hours trying to figure this out.
Answer
Here it is:
// Parallel version of matrixMulti()
void matrixMulti() {
    #pragma omp parallel for collapse(2)
    for(int row = 0 ; row < N ; row++){
        for(int col = 0; col < N ; col++){
            double resultValue = 0;
            for(int transNumber = 0 ; transNumber < N ; transNumber++) {
                resultValue += firstMatrix [row] [transNumber] * secondMatrix [transNumber] [col] ;
            }
            matrixMultiResult [row] [col] = resultValue;
        }
    }
}
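By default OpenMP typically uses one thread per logical core. If you want to keep your four-thread setup, a minimal sketch of one way to request it (you can also set the OMP_NUM_THREADS environment variable before running):

    #include <omp.h>

    // Call at the start of main(), before the first parallel region.
    // Asks the runtime for THREAD_NUM threads; it may still grant fewer.
    omp_set_num_threads(THREAD_NUM);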
Update: Here is what I got on an 8-core system using gcc 10.3 with the -O3 -fopenmp flags (I show the program's output and the result of the Linux time command):
main() was changed to measure the time with omp_get_wtime(), because on Linux clock() measures processor time:
double t1 = omp_get_wtime();
matrixMulti();
double t2 = omp_get_wtime();
printf("time: %f", t2-t1);
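For reference, the build and timing commands look roughly like this (the source file name prog.c is just a placeholder):

    gcc -O3 -fopenmp prog.c -o prog
    time ./prog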
Serial program:
time: 25.895234

real    0m33.296s
user    0m33.139s
sys     0m0.152s
using: #pragma omp parallel for
time: 3.573521

real    0m11.120s
user    0m32.205s
sys     0m0.136s
using: #pragma omp parallel for collapse(2)
time: 5.466674

real    0m12.786s
user    0m49.978s
sys     0m0.248s
The results suggest that initialization of the matrices takes ca. 8 s, so it may also be worth parallelizing. Without collapse(2) the program runs faster, so do not use the collapse(2) clause.
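A minimal sketch of one way to parallelize the initialization, assuming the exact random sequence does not have to be reproduced; it reuses the globals and macros from your code, and uses rand_r() (POSIX) because rand()/srand() are not thread-safe:

    // Parallel init sketch: per-row seeds with rand_r(), which is thread-safe.
    void matrixInit2() {
        #pragma omp parallel for
        for(int row = 0 ; row < N ; row++) {
            // Hypothetical per-row seed; does not reproduce the original srand(row+col) values.
            unsigned int seed = row;
            for(int col = 0 ; col < N ; col++) {
                firstMatrix [row] [col] = ( rand_r(&seed) % 10 ) * FactorIntToDouble;
                secondMatrix [row] [col] = ( rand_r(&seed) % 10 ) * FactorIntToDouble;
            }
        }
    }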
Note that on your system you may get a different speed improvement, or even a decrease, depending on your hardware. The speed of matrix multiplication strongly depends on the speed of memory reads and writes. Shared-memory multicore systems (i.e. most PCs and laptops) may not show any speed increase upon parallelization of this program, but distributed-memory multicore systems (i.e. high-end servers) definitely show a performance increase. For more details please read e.g. this.
Update2: On a Ryzen 7 5800X I got 41.6 s vs 1.68 s, which is a bigger speedup than the number of cores. This is because more cache memory is available when all the cores are used.