Modern CPUs normally come with multiple cores, but to take full advantage of them in your Matlab program you usually need the Parallel Computing Toolbox. Fortunately, there is another option. In this article, we will look at using OpenMP to achieve the speed-up you are looking for. OpenMP provides an API for shared-memory parallel programming in C/C++ and Fortran, and it supports multiple platforms (Windows, Linux, macOS), which makes it a perfect candidate.
Before proceeding with the post, you need to install a C compiler for MEX. Here is a list of compatible C/C++ compilers for Matlab. In my case, I am using MinGW on the Windows platform.
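Once the compiler is installed, you can confirm that MEX actually picks it up. In R2019a the following can be run from the Matlab command window:

% List/select the C compiler that MEX will use (MinGW in my case)
mex -setup C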
Step 1: Add the OpenMP pragma to the C code
The first step is to modify the C code so that OpenMP knows where it should pitch in and perform the parallel computation. For more instructions on how to use OpenMP, you can refer to the official website. In this post, I will only demonstrate the procedure with a simple example.
In the place where the parallelism is supposed to happen, add a statement similar to this:
#pragma omp parallel for private(i)
The full example code (test_openmp.c) for this post is below. The function does a large number of add operations on each element of the temp[6] array, then outputs the sum of the 6 elements. We instruct OpenMP to parallelise the outer loop over i; the inner counter j is also marked private so that each thread keeps its own copy and there is no data race.
#include "stdio.h" #include "stdlib.h" #include "string.h" #include "mex.h" #ifdef MX_API_VER #if MX_API_VER < 0x07030000 typedef int mwIndex; #endif #endif void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[] ) { int prob_estimate_flag = 0; char *filename; const char *error_msg; double result = 0; double temp[6] = {0, 0, 0, 0, 0, 0}; int i, j; #pragma omp parallel for private(i) for(i=0; i<6; i++) { temp[i] =3 * i; for(j=0; j<1E9; j++) temp[i] = temp[i]+j; } for(i=0; i<6; i++) result = result + temp[i]; plhs[0] = mxCreateDoubleScalar(result); return; } |
Step 2: Compile the C file using Mex
The exact flags depend slightly on the platform and C compiler you are using.
2.1 Running on Windows
2.1.1 Using MSVC for Mex
For MSVC on Windows, add /openmp to the compilation flags:
mex -v COMPFLAGS="$COMPFLAGS /openmp" test_openmp.c
2.1.2 Using MinGW for Mex
For MinGW on Windows, add -fopenmp to both CFLAGS and LDFLAGS:
mex -v CFLAGS="$CFLAGS -std=c99 -fopenmp" LDFLAGS="$LDFLAGS -fopenmp" test_openmp.c
2.2 Running on Linux
For GCC on Linux systems, add -fopenmp to both CFLAGS and LDFLAGS:
mex -v CFLAGS='$CFLAGS -fopenmp' LDFLAGS='$LDFLAGS -fopenmp' test_openmp.c
Of course, we can also use a Makefile to compile the C code instead of calling mex from within Matlab. In that case, -I$(MATLABDIR)/extern/include needs to be added to the gcc flags, as sketched below.
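Here is a minimal sketch of such a Makefile. It assumes a 64-bit Linux installation of R2019a under the MATLABDIR path shown; the glnxa64 library directory, the link libraries and the .mexa64 suffix are assumptions and may need adjusting for your setup.

# Minimal Makefile sketch: build test_openmp.mexa64 with gcc directly.
# MATLABDIR, the glnxa64 library path and the mexa64 suffix are assumed
# for a 64-bit Linux install; adjust them for your machine.
MATLABDIR ?= /usr/local/MATLAB/R2019a
CC         = gcc
CFLAGS     = -std=c99 -fopenmp -fPIC -DMATLAB_MEX_FILE -I$(MATLABDIR)/extern/include

test_openmp.mexa64: test_openmp.c
	$(CC) $(CFLAGS) -shared -o $@ test_openmp.c \
	    -L$(MATLABDIR)/bin/glnxa64 -lmex -lmx -fopenmp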
2.3 Set OMP_NUM_THREADS
You can set it either inside Matlab or as an environment variable before launching Matlab.
Set it in Matlab
% 8 cores
setenv OMP_NUM_THREADS 8
time svm-train -c 8 -g 0.5 -m 1000 real-sim
175.90sec

% 1 core
setenv OMP_NUM_THREADS 1
time svm-train -c 8 -g 0.5 -m 1000 real-sim
588.89sec
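Equivalently, you can use Matlab's setenv function-call syntax with this post's example MEX file; set it before the MEX function is first called so the OpenMP runtime picks it up (the thread count of 8 is just an illustration, use whatever matches your CPU):

setenv('OMP_NUM_THREADS', '8');   % let OpenMP use 8 threads
tic; result = test_openmp; toc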
Set it using an environment variable
For example, on Linux, before running Matlab:
export OMP_NUM_THREADS=8
matlab
Step 3: Test the Function
I am using MinGW-w64 (version 19.1.0) with Matlab R2019a on Windows 10.
Here is a comparison of using OpenMP versus not using OpenMP.
With OpenMP (MinGW example)
mex -v CFLAGS="$CFLAGS -std=c99 -fopenmp" LDFLAGS="$LDFLAGS -fopenmp" test_openmp.c
Here is the result of the run:
tic; result = test_openmp; toc
Elapsed time is 1.377278 seconds.
Without OpenMP (MinGW example)
mex -v CFLAGS="$CFLAGS -std=c99" test_openmp.c
Here is the result of the run:
tic; result = test_openmp; toc
Elapsed time is 6.706331 seconds.
The speed-up from OpenMP is apparent (about 5X). Enjoy!