# Python Numba compiler (make numerical code run super fast)

Numba is a powerful JIT (just-in-time) compiler used to accelerate large numerical calculations in Python.

It uses the industry-standard LLVM library to compile Python functions to optimized machine code at runtime.

Numba enables certain numerical algorithms in Python to reach the speed of compiled languages like C or FORTRAN.

It is an easy-to-use compiler that has several advantages such as:

- **Optimizing scientific code** – Numba can be used alongside NumPy to optimize the performance of mathematical calculations. For different numerical algorithms, array types, and layouts, Numba generates specially optimized code for better performance.
- **Use across various platform configurations** – Numba is tested and maintained across 200 platform configurations. It supports Windows, macOS, and Linux, Python 3.7–3.10, and Intel and AMD x86 processors. It offers great flexibility, as the main code can be written in Python while Numba handles the compilation specifics at runtime.
- **Parallelization** – Numba can run NumPy code on multiple CPU cores and can be used to write parallel GPU algorithms in Python.

Python is used across a variety of disciplines such as Machine Learning, Artificial Intelligence, Data Science, etc., and across various industries such as finance, healthcare, etc.

Using large data sets is the norm in such disciplines and Numba can help address the slow runtime speed due to the interpreted nature of Python.


## Installing Numba

You can install Numba using pip: run `pip install numba` in your terminal.

If you are using pip3 (with Python 3), use the `pip3 install numba` command.

All the dependencies required for Numba will also be installed with the pip install. You can also install it using conda, with `conda install numba`.

In case you need to install Numba from source, you can clone the repo with `git clone git://github.com/numba/numba.git` and install it with the following command:

```
python setup.py install
```

## Use Numba with Python

Numba performs best when used with NumPy arrays and when optimizing constructs such as loops and functions.

Using it on simple, one-off mathematical operations will not show the library's full potential.

The most common way of using Numba with Python code is to use Numba’s decorators to compile your Python functions.

The most common of these decorators is the `@jit` decorator.

Numba's `@jit` decorator operates in two compilation modes: `nopython` mode and `object` mode.

`nopython` mode is enabled by setting the `nopython` parameter of the `jit` decorator to `True`. In this mode, the entire function is compiled into machine code at run time and executed without the involvement of the Python interpreter.

If the `nopython` parameter is not set to `True`, the `object` mode is used by default.

This mode identifies and compiles the loops in the function at run time, while the rest of the function is executed by the Python interpreter.

It is generally not recommended to use the object mode.

In fact, the `nopython` mode is so popular that there is a separate decorator called `@njit` which defaults to this mode, so you don't need to specify the `nopython` parameter separately.

```python
from numba import jit
import numpy as np

arr = np.random.random(size=(40, 25))

@jit(nopython=True)  # tells Numba to compile the following function
def numba_xlogx(x):
    log_x = np.zeros_like(x)  # array to store log values
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x

arr_l = numba_xlogx(arr)
print(arr[:5, :5], "\n")
print(arr_l[:5, :5])
```

**Output:**

## Recursion in Numba

Numba can be used with recursive functions that call themselves (self-recursion), with an explicit type signature for the function where needed.

The example below demonstrates a Fibonacci series implementation using a recursive call.

The function `fibonacci_rec` calls itself, making it a self-recursive function. As Numba's recursion support is currently limited to self-recursion, this code will execute without a hitch.

```python
from numba import jit
import numpy as np

@jit(nopython=True)
def fibonacci_rec(n):
    if n <= 1:
        return n
    else:
        return fibonacci_rec(n - 1) + fibonacci_rec(n - 2)

num = 5
print("Fibonacci series:")
for i in range(num):
    print(fibonacci_rec(i))
```

**Output:**

Running a mutual recursion of two functions, however, is a bit tricky.

The code below demonstrates mutual recursion. The function `second` calls the function `one` within its body, and vice versa. The type inference of function `second` depends on the type inference of function `one`, and that of `one` depends on that of `second`.

Naturally, this creates a cyclic dependency that type inference cannot resolve, because inference for a function is suspended while it waits for the function type of the function it calls.

This code will therefore throw an error when run with Numba.

```python
from numba import jit
import numpy as np
import time

@jit(nopython=True)
def second(y):
    if y > 0:
        return one(y)
    else:
        return 1

def one(y):
    return second(y - 1)

second(4)
print('done')
```

**Output:**

It is, however, possible to implement mutually recursive functions if one of them has a terminating return statement that does not contain a recursive call.

That function must be compiled first for the program to execute successfully with Numba, or there will be an error.

In the code below, the function `terminating_func` has a return statement without a recursive call, so it must be compiled first by Numba to ensure the successful execution of the program. Although the functions are mutually recursive, this trick avoids the error.

```python
from numba import jit
import numpy as np

@jit
def terminating_func(x):
    if x > 0:
        return other1(x)
    else:
        return 1

@jit
def other1(x):
    return other2(x)

@jit
def other2(x):
    return terminating_func(x - 1)

terminating_func(5)
print("done")
```

**Output:**

## Numba vs Python – Speed comparison

The whole purpose of using Numba is to generate a compiled version of Python code and thus gain significant improvement in speed of execution over pure Python interpreted code.

Let us compare one of the code samples used above with and without Numba's `@jit` decorator in `nopython` mode.

Let us first run the code in pure Python and measure its time.

```python
from numba import jit
import numpy as np

arr = np.random.random(size=(1000, 1000))

def python_xlogx(x):  # the function defined in pure Python, without Numba
    log_x = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x
```

Now that we have defined the function, let's measure its execution time.

```python
%%timeit -r 5 -n 10
arr_l = python_xlogx(arr)
```

**Output:**

Note that here we are using the `%%timeit` magic command of Jupyter notebooks. You can place this command at the top of any code cell to measure its speed of execution. It runs the same code several times and computes the mean and standard deviation of the execution time. You can additionally specify the number of runs and the number of loops in each run using the `-r` and `-n` options respectively.
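Outside a notebook, the standard-library `timeit` module gives a comparable measurement; the sketch below mirrors `-r 5 -n 10` (the function name `square_sum` is illustrative):

```python
import timeit
import numpy as np

def square_sum(x):
    total = 0.0
    for v in x:
        total += v * v
    return total

arr = np.random.random(10_000)

# 5 repeats ("-r 5") of 10 loops each ("-n 10");
# each entry in `times` is the total time of one 10-loop run
times = timeit.repeat(lambda: square_sum(arr), repeat=5, number=10)
print(f"best: {min(times) / 10:.6f} s per loop")
```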

Now let us apply Numba's `jit` decorator to the same function (under a different name) and measure its speed.

```python
@jit(nopython=True)  # now using Numba
def numba_xlogx(x):
    log_x = np.zeros_like(x)  # array to store log values
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x
```

Time to call this function and measure its performance!

```python
%%timeit -r 5 -n 10
arr_l = numba_xlogx(arr)
```

**Output:**

As the two outputs above show, pure Python takes an average of 2.96 s to execute the function, while the Numba-compiled version of the same function takes only about 22 ms on average, a speed-up of more than 100 times!

## Using Numba with CUDA

Most modern computation-intensive applications rely on increasingly powerful GPUs, with their large memories, to parallelize computations and get results much faster.

For example, training a complex neural network that takes weeks or months on CPUs can be accelerated with GPUs to complete the same training in just a few days or hours.

Nvidia provides a powerful toolkit or API called ‘CUDA’ for programming on their GPUs.

Most of the modern Deep Learning frameworks, such as PyTorch and TensorFlow, make use of the CUDA toolkit and provide the option to switch any computation between CPUs and GPUs.

Numba is no exception: it can make use of any available CUDA-supported GPU to further accelerate our computations.

It provides the `cuda` module to enable computations on the GPU.

But before using it, you need to additionally install the CUDA toolkit, e.g. with `conda install cudatoolkit`.

First of all, let’s find out if we have any available CUDA GPU on our machine that we can use with Numba.

```python
from numba import cuda

print("number of gpus:", len(cuda.gpus))
print("list of gpus:", cuda.gpus.lst)
```

**Output:**

Note that if there is no GPU on the machine, we will get a `CudaSupportError` exception with the `CUDA_ERROR_NO_DEVICE` error, so it is a good idea to wrap such code in a try-except block.

Next, depending on how many GPUs we have and which one is currently free (i.e. not being used by other users or processes), we can select and activate a certain GPU for Numba operations using the `select_device` method.

We can verify our selection using the `cuda.gpus.current` attribute.

```python
from numba import cuda

print("GPU available:", cuda.is_available())
print("currently active gpu:", cuda.gpus.current)

# selecting device
cuda.select_device(0)
print("currently active gpu:", cuda.gpus.current)
```

**Output:**

You can also optionally describe the GPU hardware by calling the `numba.cuda.detect()` method.

```python
from numba import cuda

print(cuda.detect())
```

**Output:**

Now let us try to accelerate a complex operation involving a series of element-wise matrix multiplications using the powerful combination of Numba and CUDA.

We can apply the `@numba.cuda.jit` decorator to our function to instruct Numba to use the currently active CUDA GPU for the function.

Functions defined to run on the GPU are called kernels, and they are invoked in a special way: we define `number_of_blocks` and `threads_per_block` and use them to launch the kernel. The number of threads running the code equals the product of these two values.

Also note that kernels cannot return a value, so any result we expect from the function must be written into a mutable data structure passed as a parameter to the kernel function.

```python
from numba import cuda, jit
import numpy as np

a = np.random.random(size=(50, 100, 100))  # defining 50 2D arrays
b = np.random.random(size=(50, 100, 100))  # another 50 2D arrays
result = np.zeros((50,))  # array to store the result

def mutiply_python(a, b, result):
    n, h, w = a.shape
    for i in range(n):
        result[i] = 0  # computing sum of elements of product
        for j in range(h):
            for k in range(w):
                result[i] += a[i, j, k] * b[i, j, k]

@cuda.jit()
def mutiply_numba_cuda(a, b, result):
    n, h, w = a.shape
    for i in range(n):
        result[i] = 0  # computing sum of elements of product
        for j in range(h):
            for k in range(w):
                result[i] += a[i, j, k] * b[i, j, k]
```

Now let’s run each of the two functions and measure their time.

Note that the code used here may not be the best candidate for GPU parallelization, so the speed-up over pure Python code may not be representative of the best gain achievable through CUDA.

```python
%%timeit -n 5 -r 10
mutiply_python(a, b, result)
```

**Output:**

```python
%%timeit -n 5 -r 10
n_block, n_thread = 10, 50
mutiply_numba_cuda[n_block, n_thread](a, b, result)
```

**Output:**

Note that many Python methods and NumPy operations are still not supported by CUDA with Numba. An exhaustive list can be found in the Numba documentation, under "Supported Python features in CUDA Python".

## Numba import error: Numba needs numpy 1.21 or less

Since Numba depends extensively on NumPy, it can work well only with certain versions of NumPy.

At the time of writing, it requires NumPy version `1.21` or less. If you have a newer NumPy installed and try to import Numba, you will get the above error.

You can check your current NumPy version using `numpy.__version__`.

```python
import numpy as np
print(f"Current NumPy version: {np.__version__}")

from numba import jit
```

**Output:**

As you can see, I have NumPy version `1.23.1` installed, so I get an error when I import `numba.jit`.

To circumvent this error, you can downgrade NumPy using pip: `pip3 install "numpy==1.21"`.

Once this installation is successful, your Numba imports will work fine.

## Further Reading

For further reading, you can check the official docs from here:

https://numba.readthedocs.io/en/stable/index.html

Mokhtar is the founder of LikeGeeks.com. He has worked as a Linux system administrator since 2010. He is responsible for maintaining, securing, and troubleshooting Linux servers for multiple clients around the world. He loves writing shell and Python scripts to automate his work.