Building Numpy with MKL

Numpy and SciPy are very powerful Python extensions for numerical and scientific computing. One can easily install their generic versions via standard Ubuntu commands:

ilya@lin1:/tmp$ sudo pip install numpy 

The problem with generic versions is that they are not optimized for a particular platform and don’t use platform-specific libraries that you might have installed. If you do have optimized mathematical libraries, in particular Intel’s MKL, using them with Numpy/SciPy can make a huge difference in performance.

To illustrate this, we can compare generic Numpy with an MKL-linked build using a very simple benchmark. The following program multiplies two dense 2000x2000 matrices and then performs an eigenvalue decomposition.

ilya@lin1:/tmp$ cat 
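# time.clock() reports CPU time on Linux; it was removed in Python 3.8,
# where time.process_time() is the replacement.
import time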
import numpy as np

A = np.random.rand(2000,2000)
B = np.random.rand(2000,2000)

print('Matrix multiplication')
time1 = time.time()
clock1 = time.clock()
C = np.dot(A, B)
clock2 = time.clock()
time2 = time.time()
print('  Elapsed time: %.02f sec.' % (time2-time1) )
print('  CPU time: %.02f sec.' % (clock2-clock1) )

print('Eigenvalue computation')
time1 = time.time()
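clock1 = time.clock()
E = np.linalg.eig(A)  # this line was lost from the listing; the call and its argument are an assumed reconstruction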
clock2 = time.clock()
time2 = time.time()
print('  Elapsed time: %.02f sec.' % (time2-time1) )
print('  CPU time: %.02f sec.' % (clock2-clock1) )

With the generic Numpy installed, I get the following run times for these two operations on a two-core Skylake system (Intel i5-6260U CPU @ 1.80GHz):

ilya@lin1:/tmp$ python
Matrix multiplication
  Elapsed time: 17.62 sec.
  CPU time: 17.62 sec.
Eigenvalue computation
  Elapsed time: 82.66 sec.
  CPU time: 82.67 sec.

Here the elapsed time is the wall-clock time, and the CPU time is the total time the CPU cores spend executing the program. CPU time can be lower than the elapsed time if the cores go idle, or it can exceed the elapsed time on a parallel system if the program uses multiple cores concurrently. Here the two measurements match, meaning that Numpy keeps one core busy throughout the computation but runs serially.
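
The difference between the two clocks can be seen in isolation with a small sketch. It assumes Python 3.3 or later, where time.perf_counter() and time.process_time() are available; the helper names measure and report are made up for this illustration and are not part of the benchmark above.

```python
import time

def measure(func):
    """Run func() and return (elapsed_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu

def report(label, func):
    """Print both measurements in the same layout as the benchmark."""
    wall, cpu = measure(func)
    print(label)
    print('  Elapsed time: %.02f sec.' % wall)
    print('  CPU time: %.02f sec.' % cpu)

# Sleeping leaves the CPU idle, so CPU time stays far below elapsed time.
report('Sleeping', lambda: time.sleep(0.2))

# A pure-Python loop keeps one core busy, so the two numbers should be close.
report('Busy loop', lambda: sum(i * i for i in range(200000)))
```

Because process_time() adds up the CPU time of all threads in the process, a multithreaded library call can push the CPU figure above the elapsed one.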

Now we can uninstall Numpy

ilya@lin1:/tmp$ sudo pip uninstall numpy 

then download the source distribution and build it locally with MKL. We untar the Numpy distribution file and create site.cfg, using site.cfg.example as a template and pointing it to the location of the MKL libraries (following the Intel Developer Zone suggestions):

ilya@lin1:~/Tools/numpy/numpy-1.10.4$ diff site.cfg site.cfg.example 
< [mkl]
< include_dirs = /opt/intel/compilers_and_libraries/linux/mkl/include
< library_dirs = /opt/intel/compilers_and_libraries/linux/mkl/lib/intel64
< mkl_libs = mkl_rt
< lapack_libs = 
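
Before starting a long build it can help to sanity-check the [mkl] section. The following is only a sketch: the function name check_mkl_section is hypothetical, and it assumes site.cfg can be read by Python's configparser and that multiple directories are separated by os.pathsep. It only checks that the listed directories exist, not that the build will succeed.

```python
import configparser
import os

def check_mkl_section(cfg_path):
    """Return a list of problems found in the [mkl] section of a site.cfg file."""
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        return ['cannot read %s' % cfg_path]
    if not parser.has_section('mkl'):
        return ['no [mkl] section in %s' % cfg_path]

    problems = []
    for key in ('include_dirs', 'library_dirs'):
        value = parser.get('mkl', key, fallback='')
        # Assumption: several directories are joined with os.pathsep (':' on Linux).
        for directory in filter(None, value.split(os.pathsep)):
            if not os.path.isdir(directory):
                problems.append('%s: %s is not a directory' % (key, directory))
    if not parser.get('mkl', 'mkl_libs', fallback='').strip():
        problems.append('mkl_libs is empty')
    return problems

if __name__ == '__main__':
    # Run from the directory that contains site.cfg.
    for problem in check_mkl_section('site.cfg') or ['[mkl] section looks plausible']:
        print(problem)
```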

Now it’s time to compile and install the optimized Numpy.

ilya@lin1:~/Tools/numpy/numpy-1.10.4$ export LD_LIBRARY_PATH=/opt/intel/compilers_and_libraries/linux/mkl/lib/intel64:${LD_LIBRARY_PATH}
ilya@lin1:~/Tools/numpy/numpy-1.10.4$ python setup.py config --compiler=intel build_clib --compiler=intel build_ext --compiler=intel build
ilya@lin1:~/Tools/numpy/numpy-1.10.4$ sudo python setup.py config --compiler=intel build_clib --compiler=intel build_ext --compiler=intel install

After the installation is complete we can rerun the same test program to see the effect of MKL.

ilya@lin1:/tmp$ python
Matrix multiplication
  Elapsed time: 1.86 sec.
  CPU time: 3.66 sec.
Eigenvalue computation
  Elapsed time: 11.46 sec.
  CPU time: 21.06 sec.

In elapsed time, MKL gives a speedup of about 9.5x for matrix multiplication and about 7x for eigenvalue computation. The CPU time is also almost twice the elapsed time, which indicates that the computations now run in parallel on both cores of the system.
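
The ratios can be recomputed from the two runs above. This is just arithmetic on the reported timings; the dictionary layout and the speedup helper are illustrative names, not part of the benchmark.

```python
# (elapsed, cpu) timings in seconds, copied from the two runs reported above.
generic = {'Matrix multiplication': (17.62, 17.62),
           'Eigenvalue computation': (82.66, 82.67)}
mkl = {'Matrix multiplication': (1.86, 3.66),
       'Eigenvalue computation': (11.46, 21.06)}

def speedup(before, after):
    """How many times faster the second measurement is than the first."""
    return before / after

for name in generic:
    elapsed_before, _ = generic[name]
    elapsed_after, cpu_after = mkl[name]
    # A CPU/elapsed ratio close to 2 means both cores were busy most of the time.
    print('%s: %.1fx faster, CPU/elapsed = %.2f'
          % (name, speedup(elapsed_before, elapsed_after), cpu_after / elapsed_after))
```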

Similarly, we can download and build SciPy.

SciPy inherits its library configuration from NumPy, so we don’t need any additional steps to point it to MKL:

ilya@lin1:~/Tools/scipy$ unzip 
ilya@lin1:~/Tools/scipy$ cd scipy-0.16.1/
ilya@lin1:~/Tools/scipy/scipy-0.16.1$ python setup.py config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem build
ilya@lin1:~/Tools/scipy/scipy-0.16.1$ sudo python setup.py config --compiler=intelem --fcompiler=intelem build_clib --compiler=intelem --fcompiler=intelem build_ext --compiler=intelem --fcompiler=intelem install
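
One way to check that both builds picked up MKL is to look at the build configuration each package reports through show_config(). The sketch below assumes Python 3.4 or later and searches for the library name mkl_rt used in site.cfg; the helper name config_mentions is hypothetical, and the exact text printed by show_config() differs between NumPy and SciPy versions, so treat the result as a hint rather than proof.

```python
import contextlib
import importlib
import io

def config_mentions(module_name, needle='mkl_rt'):
    """Return True/False depending on whether show_config() output contains
    needle, or None if the module is missing or has no show_config()."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    show_config = getattr(module, 'show_config', None)
    if show_config is None:
        return None
    buffer = io.StringIO()
    # show_config() prints its report, so capture stdout instead of parsing a return value.
    with contextlib.redirect_stdout(buffer):
        show_config()
    return needle in buffer.getvalue().lower()

for name in ('numpy', 'scipy'):
    print('%s: %s' % (name, config_mentions(name)))
```

Searching for plain "mkl" would be too loose here, since older versions also list MKL sections that were looked for but not found.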

While we are at it, we can install scikit-learn as well:

$ sudo pip install -U scikit-learn

and a few other packages

$ sudo apt-get install libfreetype6-dev
$ sudo pip install matplotlib
$ sudo pip install nltk

Note that if you have multiple Python installations (e.g. 2.7 and 3.5), you need to build and install optimized Numpy, SciPy and other packages for all of them.
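
To see which installation a given interpreter actually picks up, a small check like the following can be run once per Python version. It is a sketch that assumes Python 3.4 or later (for importlib.util.find_spec); the helper name locate_package is made up for illustration.

```python
import importlib.util
import sys

def locate_package(name):
    """Return where the running interpreter would load the package from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin

print('Interpreter: %s (Python %d.%d)' % ((sys.executable,) + sys.version_info[:2]))
for name in ('numpy', 'scipy'):
    print('  %s -> %s' % (name, locate_package(name) or 'not installed'))
```

If the printed paths do not point to the location where the MKL build was installed, that interpreter is still using a different copy of the package.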
