The Jacobi method is a common way to solve partial differential equations in HPC settings, for example modelling pollution diffusing through a pipe with Laplace's equation. One way to parallelise the code for this solution is geometric decomposition, as used with Epiphany.
As a baseline, the serial version of the code runs in 4256258 ns. We are going to use OpenMP directives and measure how much faster the parallel version is.
Since the loops offered a chance to parallelise the code, we place the directive above the two nested loops with the collapse(2) clause and declare every variable involved as private or shared. A reduction clause was also added so that the value of bnorm is available outside the parallel region.

There is another parallel region, defined over the loops that compute rnorm, where the reduction was applied as well. The default(none) clause forces an explicit declaration of every private and shared variable. Notice that no brackets are needed after a parallel for directive, since it applies to the loop that immediately follows it.

The parallel for was also applied to the Jacobi iteration. In this case it was not necessary to use firstprivate, which would give each thread a private copy initialised with the value the variable held before entering the loop.

Because the threads running the program would otherwise race on it, there is a swapping zone that has to be executed by one thread at a time. This is achieved with the single directive, which lets one thread execute the block while the other threads wait.

The program was run with 16 threads (export OMP_NUM_THREADS=16) and different schedules were tried, such as schedule(static), schedule(static,4), schedule(dynamic) and schedule(dynamic,4), each with its respective runtime. The lowest runtime was obtained with the static schedule: 431520 ns.
With 4256258 ns / 431520 ns ≈ 9.9, the parallel version turned out to be roughly ten times faster than the serial version!