# Python Numba compiler (make numerical code run super fast)

Numba is a powerful JIT (just-in-time) compiler used to accelerate large numerical calculations in Python.

It uses the industry-standard LLVM library to compile Python functions to optimized machine code at runtime.

Numba enables certain numerical algorithms in Python to reach the speed of compiled languages like C or FORTRAN.

It is an easy-to-use compiler that has several advantages such as:

- **Optimizing scientific code** – Numba can be used alongside NumPy to optimize the performance of mathematical calculations. For different numerical algorithms, array types, and layouts, Numba generates specially optimized code for better performance.
- **Use across various platform configurations** – Numba is tested and maintained across 200 platform configurations. It supports Windows, macOS, and Linux, Python 3.7–3.10, and Intel and AMD x86 processors. It offers great flexibility, as the main code can be written in Python while Numba handles the compilation specifics at runtime.
- **Parallelization** – Numba can run NumPy code on multiple CPU cores and can be used to write parallel GPU algorithms in Python.

Python is used across a variety of disciplines such as Machine Learning, Artificial Intelligence, Data Science, etc., and across various industries such as finance, healthcare, etc.

Using large data sets is the norm in such disciplines and Numba can help address the slow runtime speed due to the interpreted nature of Python.


## Installing Numba

You can install Numba using pip: run `pip install numba` in your terminal.

If you are using pip3 (with Python 3), use the `pip3 install numba` command.

All the dependencies required for Numba will also be installed with the pip install. You can also install it using conda, with `conda install numba`.

In case you need to install Numba from source, you can clone the repo with `git clone git://github.com/numba/numba.git` and install it with the following command:

```
python setup.py install
```

## Use Numba with Python

Numba performs best when used with NumPy arrays and when optimizing constructs such as loops and functions.

Using it on simple, one-off mathematical operations will not show the library's full potential.

The most common way of using Numba with Python code is to use Numba’s decorators to compile your Python functions.

The most common of these decorators is the `@jit` decorator.

Numba's `@jit` decorator operates in two compilation modes: `nopython` mode and `object` mode.

`nopython` mode is enabled by setting the `nopython` parameter of the `jit` decorator to `True`. In this mode, the entire function is compiled into machine code at run time and executed without the involvement of the Python interpreter.

If the `nopython` parameter is not set to `True`, the `object` mode is used by default.

This mode identifies and compiles the loops in the function at run time, while the rest of the function is executed by the Python interpreter.

It is generally not recommended to use the object mode.

In fact, the `nopython` mode is so popular that there is a separate decorator called `@njit` which defaults to this mode, so you don't need to specify the `nopython` parameter separately.

```python
from numba import jit
import numpy as np

arr = np.random.random(size=(40, 25))

@jit(nopython=True)  # tells Numba to compile the following function
def numba_xlogx(x):
    log_x = np.zeros_like(x)  # array to store log values
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x

arr_l = numba_xlogx(arr)
print(arr[:5, :5], "\n")
print(arr_l[:5, :5])
```

**Output:**

## Recursion in Numba

Numba can be used with recursive functions that call themselves (self-recursion), with an explicit type signature for the function where needed.

The example below demonstrates a Fibonacci series implementation using a recursive call.

The function `fibonacci_rec` calls itself, making it a self-recursive function. As Numba's recursion support is currently limited to self-recursion, this code will execute without a hitch.

```python
from numba import jit
import numpy as np

@jit(nopython=True)
def fibonacci_rec(n):
    if n <= 1:
        return n
    else:
        return fibonacci_rec(n - 1) + fibonacci_rec(n - 2)

num = 5
print("Fibonacci series:")
for i in range(num):
    print(fibonacci_rec(i))
```

**Output:**

Running a mutual recursion of two functions, however, is a bit tricky.

The code below demonstrates mutual recursion. The function `second` calls the function `one` within its body, and vice versa. The type inference of function `second` depends on the type inference of function `one`, and that of `one` depends on that of `second`.

Naturally, this creates a cyclic dependency that type inference cannot resolve, because inference for a function is suspended while it waits for the function type of the function it calls.

This code will therefore throw an error when run with Numba.

```python
from numba import jit
import numpy as np
import time

@jit(nopython=True)
def second(y):
    if y > 0:
        return one(y)
    else:
        return 1

def one(y):
    return second(y - 1)

second(4)
print('done')
```

**Output:**

It is, however, possible to implement mutually recursive functions if one of them has a terminating return statement that does not contain a recursive call.

That function must be compiled first for the program to execute successfully with Numba, or there will be an error.

In the code below, the function `terminating_func` has a return statement without a recursive call, so it must be compiled first by Numba to ensure the successful execution of the program. Although the functions are mutually recursive, this trick avoids the error.

```python
from numba import jit
import numpy as np

@jit
def terminating_func(x):
    if x > 0:
        return other1(x)
    else:
        return 1

@jit
def other1(x):
    return other2(x)

@jit
def other2(x):
    return terminating_func(x - 1)

terminating_func(5)
print("done")
```

**Output:**

## Numba vs Python – Speed comparison

The whole purpose of using Numba is to generate a compiled version of Python code and thus gain significant improvement in speed of execution over pure Python interpreted code.

Let us compare one of the code samples used above with and without Numba's `@jit` decorator in `nopython` mode.

Let us first run the code in pure Python and measure its time.

```python
from numba import jit
import numpy as np

arr = np.random.random(size=(1000, 1000))

def python_xlogx(x):  # the function defined in pure Python, without Numba
    log_x = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x
```

Now that we have defined the function, let's measure its execution time.

```python
%%timeit -r 5 -n 10
arr_l = python_xlogx(arr)
```

**Output:**

Note that here we are using the `%%timeit` magic command of Jupyter notebooks. You can place this command at the top of any code cell to measure its speed of execution. It runs the same code several times and computes the mean and standard deviation of the execution time. You can additionally specify the number of runs and the number of loops in each run using the `-r` and `-n` options respectively.
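Outside a notebook, the standard-library `timeit` module gives a comparable measurement; the sketch below mirrors `-r 5 -n 10` (the function name `square_sum` is illustrative):

```python
import timeit
import numpy as np

def square_sum(x):
    total = 0.0
    for v in x:
        total += v * v
    return total

arr = np.random.random(10_000)

# 5 repeats ("-r 5") of 10 loops each ("-n 10");
# each entry in `times` is the total time of one 10-loop run
times = timeit.repeat(lambda: square_sum(arr), repeat=5, number=10)
print(f"best: {min(times) / 10:.6f} s per loop")
```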

Now let us apply Numba's `jit` decorator to the same function (under a different name) and measure its speed.

```python
@jit(nopython=True)  # now using Numba
def numba_xlogx(x):
    log_x = np.zeros_like(x)  # array to store log values
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            log_x[i][j] = np.log(x[i][j])
    return x * log_x
```

Time to call this function and measure its performance!

```python
%%timeit -r 5 -n 10
arr_l = numba_xlogx(arr)
```

**Output:**

As the two outputs above show, pure Python takes an average of 2.96 s to execute the function, while the Numba-compiled version of the same function takes only about 22 ms on average, a speed-up of more than 100 times!

## Using Numba with CUDA

Most modern computation-intensive applications rely on increasingly powerful GPUs, with their large memories, to parallelize computations and get results much faster.

For example, training a complex neural network that takes weeks or months on CPUs can be accelerated with GPUs to complete the same training in just a few days or hours.

Nvidia provides a powerful toolkit or API called ‘CUDA’ for programming on their GPUs.

Most of the modern Deep Learning frameworks, such as PyTorch and TensorFlow, make use of the CUDA toolkit and provide the option to switch any computation between CPUs and GPUs.

Numba is no exception: it can make use of any available CUDA-supported GPU to further accelerate our computations.

It provides the `cuda` module to enable computations on the GPU.

But before using it, you need to additionally install the CUDA toolkit, e.g. with `conda install cudatoolkit`.

First of all, let’s find out if we have any available CUDA GPU on our machine that we can use with Numba.

```python
from numba import cuda

print("number of gpus:", len(cuda.gpus))
print("list of gpus:", cuda.gpus.lst)
```

**Output:**

Note that if there is no GPU on the machine, we will get a `CudaSupportError` exception with the `CUDA_ERROR_NO_DEVICE` error, so it is a good idea to wrap such code in a try-except block.

Next, depending on how many GPUs we have and which one is currently free (i.e. not being used by other users or processes), we can select and activate a certain GPU for Numba operations using the `select_device` method.

We can verify our selection using the `cuda.gpus.current` attribute.

```python
from numba import cuda

print("GPU available:", cuda.is_available())
print("currently active gpu:", cuda.gpus.current)

# selecting device
cuda.select_device(0)
print("currently active gpu:", cuda.gpus.current)
```

**Output:**

You can also optionally describe the GPU hardware by calling the `numba.cuda.detect()` method.

```python
from numba import cuda

print(cuda.detect())
```

**Output:**

Now let us try to accelerate a complex operation involving a series of element-wise matrix multiplications using the powerful combination of Numba and CUDA.

We can apply the `@numba.cuda.jit` decorator to our function to instruct Numba to use the currently active CUDA GPU for the function.

Functions defined to run on the GPU are called kernels, and they are invoked in a special way: we define `number_of_blocks` and `threads_per_block` and use them to launch the kernel. The number of threads running the code equals the product of these two values.

Also note that kernels cannot return a value, so any result we expect from the function must be written into a mutable data structure passed as a parameter to the kernel function.

```python
from numba import cuda, jit
import numpy as np

a = np.random.random(size=(50, 100, 100))  # defining 50 2D arrays
b = np.random.random(size=(50, 100, 100))  # another 50 2D arrays
result = np.zeros((50,))  # array to store the result

def mutiply_python(a, b, result):
    n, h, w = a.shape
    for i in range(n):
        result[i] = 0  # computing sum of elements of product
        for j in range(h):
            for k in range(w):
                result[i] += a[i, j, k] * b[i, j, k]

@cuda.jit()
def mutiply_numba_cuda(a, b, result):
    n, h, w = a.shape
    for i in range(n):
        result[i] = 0  # computing sum of elements of product
        for j in range(h):
            for k in range(w):
                result[i] += a[i, j, k] * b[i, j, k]
```

Now let’s run each of the two functions and measure their time.

Note that the code used here may not be the best candidate for GPU parallelization, so the speed-up over pure Python code may not be representative of the best gain achievable through CUDA.

```python
%%timeit -n 5 -r 10
mutiply_python(a, b, result)
```

**Output:**

```python
%%timeit -n 5 -r 10
n_block, n_thread = 10, 50
mutiply_numba_cuda[n_block, n_thread](a, b, result)
```

**Output:**

Note that many Python methods and NumPy operations are still not supported by CUDA with Numba. An exhaustive list can be found in the Numba documentation, under "Supported Python features in CUDA Python".

## Numba import error: Numba needs numpy 1.21 or less

Since Numba depends extensively on NumPy, it can work well only with certain versions of NumPy.

At the time of writing, it requires NumPy version `1.21` or less. If you have a newer NumPy installed and try to import Numba, you will get the above error.

You can check your current NumPy version using `numpy.__version__`.

```python
import numpy as np
print(f"Current NumPy version: {np.__version__}")

from numba import jit
```

**Output:**

As you can see, I have NumPy version `1.23.1` installed, so I get an error when I import `numba.jit`.

To circumvent this error, you can downgrade NumPy using pip: `pip3 install "numpy==1.21"`.

Once this installation is successful, your Numba imports will work fine.

## Further Reading

For further reading, you can check the official docs from here:

https://numba.readthedocs.io/en/stable/index.html

Mokhtar is the founder of LikeGeeks.com. He has worked as a Linux system administrator since 2010. He is responsible for maintaining, securing, and troubleshooting Linux servers for multiple clients around the world. He loves writing shell and Python scripts to automate his work.