
Python Numba compiler (Make numerical code run super fast)

Numba is a powerful JIT (Just-In-Time) compiler used to speed up large numerical calculations in Python.
It uses the industry-standard LLVM library to compile Python functions into optimized machine code at runtime.
Numba enables certain numerical algorithms in Python to approach the speed of compiled languages like C or FORTRAN.
It is easy to use and offers several advantages, such as:

  1. Optimizing scientific code – Numba can be used along with NumPy to optimize the performance of mathematical calculations. It generates specialized code for the numerical algorithm, array data types, and memory layouts in use.
  2. Use across various platform configurations – Numba is tested and maintained across 200 platform configurations. It offers great flexibility, as the main code can be written in Python while Numba handles the specifics of compilation at runtime.
    It supports Windows, macOS, and Linux, Python 3.7-3.10, and x86 processors from Intel and AMD.
  3. Parallelization – Numba can be used to run NumPy code on multiple cores and to write parallel GPU algorithms in Python.

Python is used across a variety of disciplines such as Machine Learning, Artificial Intelligence, and Data Science, and across industries such as finance and healthcare.
Large data sets are the norm in such disciplines, and Numba can help address the slow runtime caused by the interpreted nature of Python.

 

 

Installing Numba

You can install Numba using pip: run pip install numba in your terminal.
If your system uses pip3 for Python 3, use pip3 install numba instead.
pip also installs all the dependencies that Numba requires. Alternatively, you can install it using conda, with conda install numba.
If you need to install Numba from source, clone the repo with git clone https://github.com/numba/numba.git and, from inside the cloned directory, install it with the following command:
python setup.py install
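
To confirm that the installation worked, a quick check along these lines can help. This is a minimal sketch that assumes Python 3.8 or newer (for importlib.metadata); the helper name numba_version is made up for illustration, and it returns None instead of failing when Numba is not installed.

```python
from importlib import metadata


def numba_version():
    """Return the installed Numba version string, or None if Numba is missing."""
    try:
        # Reads the package metadata without importing Numba itself
        return metadata.version("numba")
    except metadata.PackageNotFoundError:
        return None


version = numba_version()
if version is None:
    print("Numba is not installed")
else:
    print("Numba version:", version)
```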

 

Use Numba with Python

Numba performs best when it is used with NumPy arrays and to optimize constructs such as loops and functions.
Using it on simple mathematical operations will not show the library's full potential.
The most common way of using Numba with Python code is to apply Numba’s decorators to compile your Python functions.
The most common of these decorators is the @jit decorator.

Numba’s @jit decorator operates in two compilation modes: the nopython mode and the object mode.
nopython mode is selected by setting the nopython parameter of the jit decorator to True. In this mode, the entire function is compiled into machine code at run time and executed without the involvement of the Python interpreter.
In older Numba releases, if the nopython parameter is not set to True, Numba falls back to the object mode whenever it cannot compile the function in nopython mode. Since Numba 0.59, nopython mode is the default for @jit.
Object mode identifies the loops it can compile to machine code at run time, while the rest of the function is executed through the Python interpreter.
It is generally not recommended to use the object mode, as it brings little or no speed-up.
In fact, the nopython mode is so popular that there is a separate decorator called @njit, which defaults to this mode so you don’t need to specify the nopython parameter.

from numba import jit
import numpy as np

arr = np.random.random(size=(40,25))

@jit(nopython=True) #tells Numba to compile the following function
def numba_xlogx(x):
    log_x = np.zeros_like(x) #array to store log values
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x

arr_l = numba_xlogx(arr)
print(arr[:5,:5],"\n")
print(arr_l[:5,:5])

Output:

[Image: numba in nopython mode]
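
As a smaller illustration of the @njit shorthand mentioned above, here is a sketch that compiles a plain loop. The function sum_of_squares is a made-up example, and the try/except fallback is an assumption added so that the snippet still runs (uncompiled) where Numba is not installed.

```python
try:
    from numba import njit
except ImportError:
    def njit(func):
        """Fallback: return the function unchanged when Numba is unavailable."""
        return func


@njit  # shorthand for @jit(nopython=True)
def sum_of_squares(n):
    """Add up i*i for every i below n using an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


print(sum_of_squares(10))  # 285
```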

 

Recursion in Numba

Numba supports self-recursive functions, provided that at least one control-flow path returns without making a recursive call.
The example below implements the Fibonacci series using recursive calls.
The function fibonacci_rec calls itself, so it is a self-recursive function.

Because the n <= 1 branch returns without recursing, Numba can infer the return type and this code executes without a hitch.

from numba import jit
import numpy as np

@jit(nopython=True)
def fibonacci_rec(n):
    if n <= 1:
        return n
    else:
        return(fibonacci_rec(n-1) + fibonacci_rec(n-2))

num = 5
print("Fibonacci series:")
for i in range(num):
    print(fibonacci_rec(i))

Output:

[Image: numba on self-recursive code]

Running a mutual recursion of two functions, however, is a bit trickier.
The code below demonstrates mutual recursion: the function second calls the function one within its body, and vice versa.
The type inference of second depends on the type inference of one, and that of one depends on second.
This leads to a cyclic dependency: type inference for a function is suspended while it waits for the type of the function it calls, so the cycle cannot be resolved unless one of the functions offers a return path without a recursive call.
Note also that one is not decorated with @jit here, and nopython mode cannot call an undecorated Python function.
The code below therefore throws an error when run with Numba.

from numba import jit
import numpy as np
import time

@jit(nopython=True)
def second(y):
    if y > 0:
        return one(y)
    else:
        return 1

def one(y): #note: one is not decorated with @jit
    return second(y - 1)

second(4)
print('done')

Output:

[Image: numba failure on mutual-recursive code]

It is, however, possible to implement mutually recursive functions when one of the functions has a return statement that does not make a recursive call and so terminates the recursion.
This function needs to be compiled first for the program to run successfully with Numba; otherwise there will be an error.
In the code below, terminating_func contains the return statement without a recursive call, so it is called first, which makes Numba compile it first.
Although the functions are mutually recursive, this trick avoids the error.

from numba import jit
import numpy as np

@jit
def terminating_func(x):
    if x > 0:
        return other1(x)
    else:
        return 1

@jit
def other1(x):
    return other2(x)

@jit
def other2(x):
    return terminating_func(x - 1)

terminating_func(5)
print("done")

Output:

[Image: numba trick on mutual-recursive code]

 

Numba vs Python – Speed comparison

The whole purpose of using Numba is to generate a compiled version of Python code and thus gain significant improvement in speed of execution over pure Python interpreted code.
Let us do a comparison of one of the code samples used above with and without Numba’s @jit decorator in nopython mode.

Let us first run the code in pure Python and measure its time.

from numba import jit
import numpy as np

arr = np.random.random(size=(1000,1000))

def python_xlogx(x): #the function defined in Python without Numba
    log_x = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x

We have defined the function; let’s now measure its execution time.

%%timeit -r 5 -n 10
arr_l = python_xlogx(arr)

Output:

[Image: speed of pure python function]

Note that here we are using the %%timeit magic command of Jupyter notebooks.
You can place this command at the top of any code cell to measure its speed of execution.
It runs the same code several times and computes the mean and standard deviation of the execution time.
You can additionally specify the number of runs and the number of loops in each run using the -r and -n options respectively.
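
Outside a Jupyter notebook, a similar measurement can be sketched with the standard-library timeit module. This is only an approximation of what %%timeit reports, not its implementation; slow_sum is a made-up stand-in workload, and RUNS and LOOPS are hypothetical names chosen to mirror the -r and -n options.

```python
import statistics
import timeit


def slow_sum(n):
    """Stand-in workload: add up the integers below n in a Python loop."""
    total = 0
    for i in range(n):
        total += i
    return total


RUNS = 5    # plays the role of -r 5
LOOPS = 10  # plays the role of -n 10

# Each entry is the total time in seconds for LOOPS calls in one run
run_totals = timeit.repeat(lambda: slow_sum(10_000), repeat=RUNS, number=LOOPS)
per_loop = [total / LOOPS for total in run_totals]

print(f"mean: {statistics.mean(per_loop):.6f} s per loop")
print(f"std dev: {statistics.pstdev(per_loop):.6f} s per loop")
```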

Now let us apply Numba’s jit to the same function (with a different name) and measure its speed.

@jit(nopython=True) #now using Numba
def numba_xlogx(x):
    log_x = np.zeros_like(x) #array to store log values
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x

Time to call this function and measure its performance!

%%timeit -r 5 -n 10
arr_l = numba_xlogx(arr)

Output:

[Image: speed of numba python function]

As can be seen from the two outputs above, while Python takes an average of 2.96s to execute the function code, the Numba compiled code of the same function takes just about 22ms on average, thus giving us a speed-up of more than 100 times!
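
One detail worth keeping in mind when timing Numba code is that the first call to a jitted function also pays for compilation, while later calls reuse the compiled machine code. The sketch below times the two calls separately. The function triangular is a made-up example, and the fallback decorator is an assumption so the snippet runs without Numba (in which case both timings are plain Python).

```python
import time

try:
    from numba import njit
except ImportError:
    def njit(func):
        """Fallback: return the function unchanged when Numba is unavailable."""
        return func


@njit
def triangular(n):
    """Add up the integers from 0 to n inclusive."""
    total = 0
    for i in range(n + 1):
        total += i
    return total


start = time.perf_counter()
first = triangular(100_000)  # with Numba, this call includes compilation
first_call = time.perf_counter() - start

start = time.perf_counter()
second = triangular(100_000)  # reuses the already compiled function
second_call = time.perf_counter() - start

print(f"first call:  {first_call:.6f} s")
print(f"second call: {second_call:.6f} s")
```

Calling the function once before measuring keeps the one-off compilation cost out of the reported numbers.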

 

Using Numba with CUDA

Most modern computation-intensive applications rely on increasingly powerful GPUs and their large memories to parallelize computations and get results much faster.
For example, training a complex neural network that takes weeks or months on CPUs can be accelerated with GPUs to finish in just a few days or hours.

Nvidia provides a powerful toolkit and API called ‘CUDA’ for programming its GPUs.
Most modern Deep Learning frameworks, such as PyTorch and TensorFlow, make use of the CUDA toolkit and provide the option to switch any computation between CPUs and GPUs.

Numba is not far behind: it can use any available CUDA-supported GPU to further accelerate our computations.
It provides the cuda module to enable computations on the GPU.
But before using it, you need to additionally install the CUDA toolkit with pip3 install cudatoolkit or conda install cudatoolkit.

First of all, let’s find out if we have any available CUDA GPU on our machine that we can use with Numba.

from numba import cuda
print("number of gpus:", len(cuda.gpus))
print("list of gpus:", cuda.gpus.lst)

Output:

[Image: checking if cuda gpu is available]

Note that if there are no GPUs on our machine, we will get the CudaSupportError exception with the CUDA_ERROR_NO_DEVICE error.
So it’s a good idea to put such code in try/except blocks.
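
A defensive version of the GPU check could look like the sketch below. The helper name count_cuda_gpus is hypothetical, and catching the broad Exception class is a simplifying assumption; it treats a missing Numba install, a missing driver, and a missing GPU all as "zero GPUs".

```python
def count_cuda_gpus():
    """Return the number of usable CUDA GPUs, or 0 if none can be detected."""
    try:
        from numba import cuda
    except ImportError:
        return 0  # Numba itself is not installed or cannot be imported
    try:
        if not cuda.is_available():
            return 0
        return len(cuda.gpus)
    except Exception:
        # Covers CudaSupportError and other driver-level failures
        return 0


print("number of gpus:", count_cuda_gpus())
```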

Next, depending on how many GPUs we have and which one is currently free for use (i.e., not being used by other users/processes), we can select/activate a certain GPU for Numba operations using the select_device method.
We can verify our selection using the numba.cuda.gpus.current attribute.

from numba import cuda

print("GPU available:", cuda.is_available())
print("currently active gpu:", cuda.gpus.current)

#selecting device
cuda.select_device(0)
print("currently active gpu:", cuda.gpus.current)

Output:

[Image: checking and selecting active gpu]

You can also optionally describe the GPU hardware by calling the numba.cuda.detect() method.

from numba import cuda
print(cuda.detect())

Output:

[Image: describing gpu]

Now let us try to accelerate a complex operation involving a series of element-wise matrix multiplications using the powerful combination of Numba and CUDA.
We can apply the @numba.cuda.jit decorator to our function to instruct Numba to use the currently active CUDA GPU for the function.
Functions defined to run on the GPU are called kernels, and they are invoked in a special way. We define ‘number_of_blocks’ and ‘threads_per_block’ and pass them in square brackets when invoking the kernel. The number of threads running the code will be equal to the product of these two values.
Also note that kernels cannot return a value, so any value that we expect from the function should be written to a mutable data structure passed as a parameter to the kernel function.

from numba import cuda, jit
import numpy as np

a = np.random.random(size=(50,100,100)) #defining 50 2D arrays
b = np.random.random(size=(50,100,100)) #another 50 2D arrays
result = np.zeros((50,)) #array to store the result

def mutiply_python(a,b, result):
  n,h,w = a.shape
  for i in range(n):
    result[i] = 0 #computing sum of elements of product
    for j in range(h):
      for k in range(w):
        result[i] += a[i,j,k]*b[i,j,k]

@cuda.jit()
def mutiply_numba_cuda(a,b, result):
  n,h,w = a.shape
  i = cuda.grid(1) #global index of this thread
  if i < n: #each thread handles one 2D array; extra threads do nothing
    total = 0.0 #computing sum of elements of product
    for j in range(h):
      for k in range(w):
        total += a[i,j,k]*b[i,j,k]
    result[i] = total #one write per thread, so threads do not race on result

Now let’s run each of the two functions and measure their time.
Note that the code used here may not be the best candidate for GPU parallelization, so the speed-up over pure Python code may not be representative of the best gain we can achieve through CUDA.

%%timeit -n 5 -r 10
mutiply_python(a,b,result)

Output:

[Image: running pure python vs gpu]

%%timeit -n 5 -r 10
n_block, n_thread = 10,50
mutiply_numba_cuda[n_block, n_thread](a,b,result)

Output:

[Image: running cuda code vs python]
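
A common pattern for CUDA kernels is to let each thread work on one slice of the data, selected by the thread's global index. The plain-Python sketch below only emulates that indexing sequentially on the CPU; it is not GPU code, and the names emulate_kernel_launch and sum_of_products are hypothetical, chosen for illustration.

```python
def emulate_kernel_launch(kernel, n_blocks, threads_per_block, *args):
    """Call kernel once per (block, thread) pair, one after another."""
    for block in range(n_blocks):
        for thread in range(threads_per_block):
            # Same formula a 1D CUDA kernel uses for its global index:
            # blockIdx.x * blockDim.x + threadIdx.x
            global_index = block * threads_per_block + thread
            kernel(global_index, *args)


def sum_of_products(i, a, b, result):
    """Work for one emulated thread: handle row i only."""
    if i < len(a):  # threads beyond the end of the data do nothing
        result[i] = sum(x * y for x, y in zip(a[i], b[i]))


a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b = [[2.0, 2.0], [1.0, 1.0], [0.5, 0.5]]
result = [0.0] * len(a)

n_blocks, threads_per_block = 2, 2  # 2 * 2 = 4 threads for 3 rows
emulate_kernel_launch(sum_of_products, n_blocks, threads_per_block, a, b, result)
print(result)  # [6.0, 7.0, 5.5]
```

Launching slightly more threads than there are rows is harmless here because of the bounds check inside the per-thread function.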

Note that a lot of Python methods and NumPy operations are still not supported by CUDA with Numba. An exhaustive list of supported Python features can be found in the Numba CUDA documentation.

 

Numba import error: Numba needs numpy 1.21 or less

Since Numba depends extensively on NumPy, each Numba release works only with certain versions of NumPy.
The release used here works with NumPy versions up to 1.21. If you have a newer NumPy version and you try to import Numba, you will get the above error.
You can check your current NumPy version using numpy.__version__.

import numpy as np
print(f"Current NumPy version: {np.__version__}")

from numba import jit

Output:

[Image: numba import error due to incompatible numpy]

As you can see, I have NumPy version 1.23.1 installed, so I get an error when I import numba.jit.
To circumvent this error, you can downgrade NumPy using pip: pip3 install numpy==1.21.
Once this installation is successful, your Numba imports will work fine. Alternatively, upgrading Numba to a newer release that supports your NumPy version also resolves the error.
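
The version comparison itself can be sketched as follows. The helper names major_minor and numpy_within_limit are hypothetical, and the default limit of (1, 21) is simply the value quoted in the error message above; other Numba releases have different limits.

```python
import re


def major_minor(version):
    """Turn a version string such as '1.23.1' into a (major, minor) tuple."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def numpy_within_limit(numpy_version, limit=(1, 21)):
    """True if numpy_version is at or below the given (major, minor) limit."""
    return major_minor(numpy_version) <= limit


print(numpy_within_limit("1.23.1"))  # False
print(numpy_within_limit("1.21.6"))  # True
```

In practice the string passed in would be np.__version__.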
