Accelerate Your Python Code: A Practical Guide to Numba

Numba is an open-source JIT compiler that translates a subset of Python and NumPy code into fast machine code.

It speeds up numerical computations by using the industry-standard LLVM compiler library to optimize execution speed.

In this tutorial, we will explore how to make your Python code run faster and more efficiently using Numba.



Why Use Numba?

Numba is particularly powerful for scientific and mathematical computations where speed is crucial. Here are some of the main reasons why you might want to use Numba:

  1. Performance Improvement: Numba can significantly boost the performance of Python functions, especially those using NumPy.
  2. Ease of Use: By adding a simple decorator to your Python functions, you can enjoy optimized performance without having to write complex code.
  3. Flexibility: It supports various hardware such as CPUs and GPUs, providing flexibility in computation.
  4. Integration: Numba integrates seamlessly with popular Python libraries like NumPy, making it easy to incorporate into existing projects.


Installation and Setup

You can install Numba via conda or pip. Here are the commands for both:
Using conda:

conda install numba


Solving environment: done

The code output indicates that the package has been successfully installed using the Conda package manager.
Using pip:

pip install numba


Collecting numba
  Downloading ...
  Successfully installed numba-x.x.x

This output demonstrates the successful installation of Numba using pip.
After installing Numba, you can verify the installation by importing it in a Python script.

import numba
print(numba.__version__)



The code output here displays the installed version of Numba, verifying that the installation was successful.


The @jit Decorator

The @jit decorator is one of the core features of Numba. It allows you to compile a Python function to machine code.
Here’s an example:

from numba import jit

@jit(nopython=True)
def add_numbers(a, b):
    return a + b

result = add_numbers(10, 20)
print(result)



By placing the @jit decorator before the function definition, you enable Numba to compile the function into machine code. We used nopython=True, which will be explained next.

The code takes two parameters a and b, adds them, and returns the result.

The output, 30, confirms that the function operates as expected.


nopython mode

The nopython mode is a special compilation mode in Numba that generates code that does not access the Python C API.

This mode is designed to maximize performance, and it achieves this by fully translating the decorated function into machine code, and not calling back into the Python interpreter.

from numba import jit

@jit(nopython=True) # Set "nopython" mode for best performance
def add(a, b):
    return a + b
print(add(1, 2)) # Output will be 3

When using nopython=True, if the code contains any constructs that cannot be translated into pure machine code, Numba will raise an error.

This ensures that the code is entirely free of any Python interaction, allowing it to be optimized to a greater extent.


Object Mode

Object mode is used when Numba cannot compile a function entirely in nopython mode. In this mode, Numba generates code that calls back into the Python interpreter, so the performance gains are smaller.

You can explicitly request object mode by passing forceobj=True (older Numba releases used nopython=False):

from numba import jit

@jit(forceobj=True)  # explicitly request object mode
def add_objects(x, y):
    return x + y


Object Mode vs nopython Mode

Both modes have distinct characteristics, benefits, and use cases.

nopython mode:

  • Pros: Significant performance improvement, highly optimized code.
  • Cons: Limited to a subset of Python features, can lead to compilation errors if unsupported features are used.

Object mode:

  • Pros: Can handle a wider variety of Python code.
  • Cons: Lower performance improvement compared to nopython mode.


The @njit Decorator

Numba’s @njit decorator is an alias for @jit(nopython=True).

When using the @njit decorator, Numba attempts to compile the decorated function in “no-Python” mode.

Here’s an example:

from numba import njit

@njit
def add(a, b):
    return a + b

print(add(1, 2))  # Output will be 3




Speeding up NumPy code with Numba

You can use Numba to accelerate NumPy code. Let’s take a look at an example:

import numpy as np
from numba import njit

@njit
def multiply_arrays(a, b):
    return a * b

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
result = multiply_arrays(a, b)
print(result)


[4 10 18]

In this code, you use the @njit decorator to compile a function that multiplies two NumPy arrays element-wise.


Universal Functions (UFuncs) with Numba

Universal Functions, or UFuncs, are a feature in NumPy that allows element-wise operations on arrays. These functions operate on an element-by-element basis and can be applied to arrays of varying shapes and sizes.

You can create a custom UFunc using the @vectorize decorator from Numba. Here’s an example that defines a UFunc to add two arrays element-wise:

from numba import vectorize
import numpy as np

@vectorize
def add_arrays(x, y):
    return x + y

array1 = np.array([1, 2, 3])
array2 = np.array([10, 20, 30])
result = add_arrays(array1, array2)
print(result)


[11 21 31]

The @vectorize decorator compiles the function to a UFunc, allowing it to be used with NumPy arrays, just like built-in UFuncs.

You can provide specific signatures to control the input and output types of the UFunc. For example:

@vectorize(['int64(int64, int64)'])
def add_arrays(x, y):
    return x + y

This ensures that the UFunc accepts only 64-bit integers and returns a 64-bit integer.


Compiling Functions for the CPU

Numba allows you to compile functions specifically for the CPU to enhance their performance. Here’s how you can do it:

from numba import njit

@njit(target_backend='cpu')
def multiply_numbers(a, b):
    return a * b

result = multiply_numbers(5, 6)
print(result)



By setting the target_backend parameter to ‘cpu’ in the @njit decorator, you instruct Numba to compile the function for the CPU.

The function multiplies two numbers, and the output, 30, confirms that the operation is performed correctly, with the benefit of optimized execution for the CPU.


Compiling Functions for the GPU

Numba is not limited to optimizing CPU performance; it also lets you leverage the power of the GPU (Graphics Processing Unit).

GPUs are highly effective for parallel computations and can offer significant speed-ups for certain types of mathematical calculations.

With Numba, you can use CUDA programming to compile functions that will run on your GPU. First, ensure you have a compatible NVIDIA GPU and the CUDA toolkit installed. Here's an example:

import numpy as np
from numba import cuda

@cuda.jit
def multiply_arrays(an_array, another_array, result_array):
    pos = cuda.grid(1)
    if pos < result_array.size:
        result_array[pos] = an_array[pos] * another_array[pos]

an_array = np.array([1, 2, 3, 4, 5])
another_array = np.array([10, 20, 30, 40, 50])
result_array = np.zeros(5)

# Define threads and blocks
threadsperblock = 32
blockspergrid = (result_array.size + (threadsperblock - 1)) // threadsperblock

# Execute the GPU function
multiply_arrays[blockspergrid, threadsperblock](an_array, another_array, result_array)
print(result_array)


[ 10. 40. 90. 160. 250.]

In this example, you’re using the @cuda.jit decorator to define a GPU function that multiplies two arrays element-wise.

The function is then executed with a specific number of threads and blocks suitable for GPU execution.

Working with GPUs through Numba offers an exciting way to accelerate your code even further.


Function Signatures

You can further optimize your functions by specifying the types of the input parameters. Here’s an example of how to do this:

from numba import njit, int32

@njit(int32(int32, int32))
def subtract_numbers(a, b):
    return a - b
result = subtract_numbers(20, 5)
print(result)



In the code above, the @njit decorator takes a function signature int32(int32, int32), which means that both the input parameters and the return value are of the int32 type.

Specifying the types enables the compiler to generate more optimized code.

The function subtracts two numbers, and the output is 15.


Speeding up loops with Numba

Numba can greatly accelerate loops in Python. Here’s how you can optimize a loop using Numba:

from numba import njit

@njit
def sum_of_squares(n):
    total = 0
    for i in range(n):
        total += i ** 2
    return total

result = sum_of_squares(10000)



The code above calculates the sum of the squares of the numbers from 0 to 9999. By using the @njit decorator, the loop runs significantly faster than it would in pure Python.


Supported Python features

Numba supports many Python features, but not all. Here are some of the supported features:

  1. Loops: For-loops and While-loops with break and continue statements.
  2. Conditional Statements: If-else statements.
  3. Built-in Functions: Functions like min, max, sum, etc.
  4. NumPy Functions: A wide range of NumPy functions and operations.

Keep in mind that some complex Python features might not be supported by Numba.

Always check the official Numba documentation for more information on supported features for Python and supported features for NumPy.


Caching Compiled Functions

Caching can save substantial time in subsequent runs by storing the compiled machine code and reusing it, thus avoiding the compilation overhead.

Here’s how to enable caching with Numba’s @njit decorator:

from numba import njit
import numpy as np

@njit(cache=True)
def sum_elements(arr):
    total = 0
    for item in arr:
        total += item
    return total

array = np.array([1, 2, 3, 4, 5])
result = sum_elements(array)



By setting the cache=True parameter in the @njit decorator, Numba stores the compiled version of the function on disk in the __pycache__ directory.

The next time you call this function, Numba will retrieve the compiled code from the cache instead of recompiling it, thus reducing the execution time.

However, you must ensure that the compiled function remains compatible with the specific inputs and that changes in the environment or code don’t invalidate the cached version.


Measuring Performance

You can measure the performance difference between using Numba and normal Python code by timing the execution of a function with and without Numba’s JIT compilation.

import timeit
from numba import jit
import random

def normal_factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

@jit(nopython=True)
def numba_factorial(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

n = random.randint(1000, 10000) # You can choose any large value

# Measure normal Python function
normal_time = timeit.timeit('normal_factorial(n)', globals=globals(), number=1000)

# Measure Numba function
numba_time = timeit.timeit('numba_factorial(n)', globals=globals(), number=1000)

print(f"Normal Python execution time (1000 runs): {normal_time:.6f} seconds")
print(f"Numba execution time (1000 runs): {numba_time:.6f} seconds")


Normal Python execution time (1000 runs): 16.698106 seconds
Numba execution time (1000 runs): 0.343191 seconds

The number=1000 argument tells timeit to execute each function 1000 times and report the total time across those runs.

Using Numba makes the code significantly faster.


When to Use Numba and When Not

Using Numba can significantly speed up numerical computations, but it’s not always the best choice. Here’s when you might want to consider using or avoiding Numba:

When to Use Numba

  • You have performance-critical code that involves numerical computations.
  • You want to speed up loops or array operations.
  • You’re working with large datasets that require efficient processing.

When Not to Use Numba

  • Your code relies on unsupported Python features.
  • Your code is not computationally intensive, and the overhead of compiling might outweigh the benefits.
  • You need compatibility with platforms or interpreters that Numba doesn’t support.


