NumPy Sorting Secrets: Tips and Tricks
Many of Python’s popular libraries use NumPy under the hood as a fundamental pillar of their infrastructure. Beyond slicing, dicing, and manipulating arrays, the NumPy library offers various functions that allow you to sort elements in an array.
Sorting an array is useful in many applications of computer science.
It lets you organize data in ordered form, look up elements quickly, and store data in a space-efficient manner.
Once you’ve installed the package, import it by running the following command:
import numpy
NumPy Sort Algorithms
The numpy.sort() function lets you sort an array using one of several sorting algorithms, which you select with the ‘kind’ parameter.
The default is ‘quicksort’, which NumPy implements as an introsort. The other accepted values are ‘mergesort’, ‘heapsort’, and ‘stable’.
If you set the kind parameter to ‘stable’, the function automatically chooses the best stable sorting algorithm for the array’s data type.
In current NumPy releases, ‘mergesort’ and ‘stable’ are both mapped to timsort or radix sort under the hood, depending on the data type.
The sorting algorithms can be characterized by their average running speed, space complexity, and worst-case performance.
Moreover, a stable sorting algorithm keeps items with equal keys in the same relative order they had before sorting. Here is a summary of the properties of NumPy’s sorting algorithms.
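|Kind of Algorithm|Average Speed|Worst Case|Worst Space|Stable|
|quicksort|1|O(n^2)|0|no|
|heapsort|3|O(n*log(n))|0|no|
|mergesort|2|O(n*log(n))|~n/2|yes|
|timsort|2|O(n*log(n))|~n/2|yes|
Average Speed is a relative ranking, with 1 being the fastest.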
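To make the idea of stability concrete, here is a minimal sketch. The keys and labels are made-up illustration data, and numpy.argsort() is used instead of numpy.sort() only so that the original positions of equal keys are visible.

```python
import numpy

# Hypothetical data: each key is paired with a label so we can track
# where equal keys end up after sorting.
keys = numpy.array([2, 1, 2, 1, 1])
labels = numpy.array(["a", "b", "c", "d", "e"])

# With kind='stable', equal keys keep their original relative order.
stable_idx = numpy.argsort(keys, kind="stable")
stable_labels = labels[stable_idx]

print(stable_idx)     # [1 3 4 0 2]
print(stable_labels)  # ['b' 'd' 'e' 'a' 'c']
```

The three items with key 1 come out as b, d, e, which is the order they had in the input. An unstable algorithm is free to return them in any order.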
It is worth noting that numpy.sort() always returns a sorted copy of the array and leaves the input unchanged. Internally, the sorting algorithms make temporary copies of the data when sorting along any axis other than the last.
Consequently, sorting along the last axis is faster and requires less space than sorting along other axes.
Let’s create an array of numbers and sort it using our choice of algorithm. The numpy.sort() function takes in an argument to set the ‘kind’ parameter to our choice of algorithm.
a = [1, 2, 8, 9, 6, 1, 3, 6]
numpy.sort(a, kind='quicksort')
Sort in Ascending Order
By default, NumPy sorts arrays in ascending order. You can simply pass your array to the numpy.sort() function that takes an array-like object as an argument.
The function returns a sorted copy of the array rather than sorting it in-place. In-place sorting is only available as a method of ndarray objects, so you first need to create one using the numpy.array() function.
First, let’s construct an ndarray object.
a = numpy.array([1,2,1,3])
To sort an array in-place, we can use the sort method from the ndarray class:
a.sort(axis=-1, kind=None, order=None)
Sort by making a copy of the array
By using the numpy.sort() function, you can sort any array-like object without needing to create an ndarray object yourself. The function returns a new, sorted ndarray with the same shape as the input.
a = [1, 2, 1, 3]
numpy.sort(a)
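The difference between the two approaches can be checked with a short sketch like the one below. The variable names are chosen for illustration only.

```python
import numpy

original = numpy.array([1, 2, 8, 9, 6, 1, 3, 6])

# numpy.sort() builds a new array; `original` is not modified.
sorted_copy = numpy.sort(original)
untouched = original.copy()

# ndarray.sort() reorders `original` itself and returns None.
result = original.sort()

print(sorted_copy)  # [1 1 2 3 6 6 8 9]
print(untouched)    # [1 2 8 9 6 1 3 6]
print(result)       # None
print(original)     # [1 1 2 3 6 6 8 9]
```

Because the in-place method returns None, writing something like a = a.sort() would discard the array, which is a common mistake.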
Sort in Descending Order
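NumPy’s sort functions have no option for descending order, but you can combine a sort with the slice syntax array[::-1], which reverses an array.
To sort an ndarray in descending order in-place, call sort() on the reversed view. Because a[::-1] is a view of the same data, sorting the view in ascending order leaves the original array in descending order.
a = numpy.array([1, 2, 1, 3])
a[::-1].sort()
print(a)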
Sort by making a copy of the array
Alternatively, you can use numpy.sort(array)[::-1] to sort a copy in ascending order and then reverse it, which gives the values from largest to smallest.
a = [1, 2, 1, 3]
print(numpy.sort(a)[::-1])
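Two other common ways to get a descending result are sketched below. These are illustrations rather than the only options: the negation trick assumes a signed numeric dtype, and the 2D array is made-up sample data.

```python
import numpy

a = numpy.array([1, 2, 1, 3])

# Option 1: sort ascending, then reverse the result.
desc_reversed = numpy.sort(a)[::-1]

# Option 2: negate, sort ascending, negate back.
# Only suitable for signed numeric dtypes (not unsigned ints or strings).
desc_negated = -numpy.sort(-a)

# For a 2D array, reverse along the same axis that was sorted.
m = numpy.array([[10, 11, 13, 22],
                 [23, 7, 20, 14]])
desc_rows = numpy.flip(numpy.sort(m, axis=1), axis=1)

print(desc_reversed)  # [3 2 1 1]
print(desc_negated)   # [3 2 1 1]
print(desc_rows)
# [[22 13 11 10]
#  [23 20 14  7]]
```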
Sort 2D Array
In the previous examples, our array was a 1D object. The numpy.sort() function takes an optional parameter ‘axis’ that specifies the axis along which to sort the array.
This is used when working with multidimensional arrays. It takes an integer as an argument. If no argument is passed, it uses the default value of -1.
This returns an array that is sorted along the last axis. Alternatively, you can specify the axis along which to sort by setting this parameter to the corresponding integer value.
Before specifying the axis, you need to understand how NumPy axes work.
In NumPy, arrays are analogous to matrices in math. They consist of axes that are similar to the axes in a Cartesian coordinate system.
In a 2D NumPy array, the axes can be pictured as a 2-dimensional Cartesian coordinate system that has an x-axis and a y-axis.
The x-axis is the row axis, which is represented as 0. It runs downwards, across the rows. The y-axis is the column axis, which is represented as 1. It runs horizontally, across the columns.
Setting axis=0 therefore sorts the values within each column (moving down the rows), while setting axis=1 sorts the values within each row (moving across the columns).
Let’s begin by creating a 2D NumPy array:
a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])
numpy.sort(a, axis=1, kind=None, order=None)
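To see what each axis value does, the sketch below sorts the same array along both axes and prints the results side by side. The variable names are illustrative.

```python
import numpy

a = numpy.array([[10, 11, 13, 22],
                 [23, 7, 20, 14],
                 [31, 11, 33, 17]])

# axis=1: the values inside each row are put in order.
rows_sorted = numpy.sort(a, axis=1)
# axis=0: the values inside each column are put in order.
cols_sorted = numpy.sort(a, axis=0)

print(rows_sorted)
# [[10 11 13 22]
#  [ 7 14 20 23]
#  [11 17 31 33]]
print(cols_sorted)
# [[10  7 13 14]
#  [23 11 20 17]
#  [31 11 33 22]]
```

Note that each row (or column) is sorted independently, so values that started in the same row or column do not stay together.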
Sort 3D Array
Sorting a 3D array is quite similar to sorting a 2D array. We worked with a 2D array in the previous example. If we create a 3D array, we will have 3 axes.
In that case, the x-axis is represented as 0, the y-axis is represented as 1, and the z-axis is represented as 2.
Let’s create a 3D NumPy array.
a = numpy.array([[[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]],
                 [[12, 11, 13, 23], [23, 7, 12, 14], [31, 34, 33, 17]],
                 [[10, 6, 13, 22], [34, 7, 20, 14], [31, 34, 33, 7]]])
Next, we can set axis=2 to sort along the third axis.
numpy.sort(a, axis=2, kind=None, order=None)
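The effect of each axis is easier to follow on a smaller array. The sketch below uses a made-up 2x2x3 array (not the one above) and also shows axis=None, which flattens the array before sorting.

```python
import numpy

# Hypothetical small 3D array with shape (2, 2, 3).
small = numpy.array([[[3, 1, 2], [9, 7, 8]],
                     [[6, 5, 4], [0, 2, 1]]])

# axis=2 sorts the innermost rows; the shape is unchanged.
along_last = numpy.sort(small, axis=2)
# axis=0 compares elements at the same position across the two blocks.
along_first = numpy.sort(small, axis=0)
# axis=None flattens the array and sorts all values together.
flattened = numpy.sort(small, axis=None)

print(along_last.tolist())   # [[[1, 2, 3], [7, 8, 9]], [[4, 5, 6], [0, 1, 2]]]
print(along_first.tolist())  # [[[3, 1, 2], [0, 2, 1]], [[6, 5, 4], [9, 7, 8]]]
print(flattened.tolist())    # [0, 1, 1, 2, 2, 3, 4, 5, 6, 7, 8, 9]
```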
Sort by Column
There are various ways to sort a NumPy array by column. You can set the ‘axis’ parameter or the ‘order’ parameter in the numpy.sort() function.
In the above examples, we learned how the ‘axis’ parameter sorts every row or every column of an array independently. To sort an array by one particular column, or field, we can use the ‘order’ parameter.
Sort Using Order
You can sort a NumPy array based on a field or a sequence of fields, provided that you define it with fields in the array’s dtype.
This is especially useful when working with columns in a spreadsheet where you wish to sort the table using the field of a specific column.
The numpy.sort() function lets you do this easily. It allows you to pass the field name as a string in the ‘order’ parameter.
numpy.sort(a, axis=-1, kind=None, order=None)
Let’s create an array with fields defined as ‘name’, ‘age’, and ‘score’.
dtype = [('name', 'S10'), ('age', int), ('score', float)]
values = [('Alice', 18, 78), ('Bob', 19, 80), ('James', 17, 81)]
a = numpy.array(values, dtype=dtype)
You can then specify which field to sort by passing it as a string to the ‘order’ parameter.
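As a minimal sketch, the example below sorts the records by the ‘age’ field. The choice of ‘age’ is an assumption made for illustration; any field name defined in the dtype works the same way.

```python
import numpy

dtype = [('name', 'S10'), ('age', int), ('score', float)]
values = [('Alice', 18, 78), ('Bob', 19, 80), ('James', 17, 81)]
a = numpy.array(values, dtype=dtype)

# Sort whole records by the 'age' field (chosen here for illustration).
by_age = numpy.sort(a, order='age')

print(by_age['age'])   # [17 18 19]
print(by_age['name'])  # [b'James' b'Alice' b'Bob']
```

Each record moves as a unit, so the name and score stay attached to the right age. The names print as bytes because the field uses the ‘S10’ type.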
Sort by Multiple Columns
If you wish to sort the array by more than one field, you can define the sort order by using multiple fields as the ‘order’ parameter.
You can specify which fields to compare by passing them as a list to the ‘order’ parameter. The first field is compared first, the second is used when the first is tied, and so on. It is not necessary to specify all fields, because NumPy still uses the unspecified fields to break ties, in the order in which they come up in the dtype.
numpy.sort(a, order=['score', 'name'])
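Sorting on several fields only makes a visible difference when the first field has ties. The sketch below uses made-up records with repeated scores, and a ‘U10’ (Unicode) name field instead of ‘S10’ so that the names print as plain strings.

```python
import numpy

# Hypothetical records; three of them share the same score.
dtype = [('name', 'U10'), ('age', int), ('score', float)]
values = [('Alice', 18, 80.0), ('Bob', 19, 80.0),
          ('James', 17, 78.0), ('Aaron', 20, 80.0)]
people = numpy.array(values, dtype=dtype)

# Sort by score first; records with equal scores are ordered by name.
ranked = numpy.sort(people, order=['score', 'name'])

print(ranked['name'])   # ['James' 'Aaron' 'Alice' 'Bob']
print(ranked['score'])  # [78. 80. 80. 80.]
```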
Sort by Row
Just as setting axis=1 sorts a 2D NumPy array along its column axis, setting the axis parameter to 0 sorts it along its row axis, so the values in each column are ordered from top to bottom. Using the same example as above, we can sort the 2D array along the rows as:
a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])
numpy.sort(a, axis=0, kind=None, order=None)
The above method sorts every column of the array independently. If you instead want to rearrange the array according to the values in one specific row, you will need to index that row.
The numpy.argsort() function comes in handy in such cases. It performs an indirect sort along the specified axis and returns an array of indices in sorted order.
Note that the function doesn’t return the sorted array. Rather, it returns an array of the same shape that contains the indices that would sort the input.
You can then use the returned indices to index the original array and change the positioning of its columns.
Using the same array as above:
a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])
Let’s sort it by the 3rd row, i.e. the row at index position 2.
indices = numpy.argsort(a[2])
We can pass the result to our array to reorder its columns based on the 3rd row.
sorted_a = a[:, indices]
print(sorted_a)
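The same indirect-sort idea can reorder whole rows according to the values in a single column, which keeps each row intact. The sketch below sorts by the 2nd column (index 1); kind='stable' is used here so that rows with equal values keep their original order.

```python
import numpy

a = numpy.array([[10, 11, 13, 22],
                 [23, 7, 20, 14],
                 [31, 11, 33, 17]])

# Indices that would sort the 2nd column (values 11, 7, 11).
row_order = numpy.argsort(a[:, 1], kind="stable")

# Index the rows with those indices to reorder entire rows.
rows_by_column = a[row_order]

print(row_order)  # [1 0 2]
print(rows_by_column)
# [[23  7 20 14]
#  [10 11 13 22]
#  [31 11 33 17]]
```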
Sort by Column till Specified Row or from Specific Row
You can sort an array up to a specified row or from a specific row rather than sorting the whole array. This is easy to do with the slicing operator [:].
For instance, consider the following array.
a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17], [17, 12, 33, 16]])
If you only wish to sort the first 2 rows of the array, you can pass a sliced array to numpy.sort() function.
index = 2
numpy.sort(a[:index])
This returns a sorted copy of the slice; the original array is left unchanged.
Similarly, if you wish to sort only the 2nd and 3rd rows of the array, you can do it as follows:
numpy.sort(a[1:3])
Now, if you want to sort a column of the array using only a range of rows, you can use the same slicing operator [:] to slice the column.
Using the same array as above, if we wish to sort the first 3 rows of the 2nd column, we can slice the array as:
a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17], [17, 12, 33, 16]])
sort_array = a[0:3, 1]
numpy.sort(sort_array)
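If the goal is to update the original array rather than get a sorted copy, one possible approach is to sort the slice in-place. This sketch relies on the fact that basic slicing returns a view of the same data.

```python
import numpy

a = numpy.array([[10, 11, 13, 22],
                 [23, 7, 20, 14],
                 [31, 11, 33, 17],
                 [17, 12, 33, 16]])

# numpy.sort() returns a sorted copy and leaves `a` untouched.
col_copy = numpy.sort(a[0:3, 1])

# Basic slicing returns a view, so sorting the view in place
# rewrites rows 0-2 of column 1 inside `a` itself.
a[0:3, 1].sort()

print(col_copy)  # [ 7 11 11]
print(a[:, 1])   # [ 7 11 11 12]
```

Only the sliced part of the column changes; the last row and the other columns keep their original values.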
Sort by Datetime
If you’re working with data that has an element of time, you may want to sort it based upon the date or time.
Python’s built-in datetime module makes it easy to work with time data. You can then sort the data using numpy.sort().
Firstly, let’s import the datetime module.
import datetime
Next, we can create a NumPy array that stores datetime objects.
a = numpy.array([datetime.datetime(2021, 1, 1, 12, 0),
                 datetime.datetime(2021, 9, 1, 12, 0),
                 datetime.datetime(2021, 5, 1, 12, 0)])
To sort the array, we can pass it to numpy.sort().
numpy.sort(a)
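NumPy also has its own datetime64 data type, which sorts the same way. The sketch below is an alternative to the datetime objects used above, with the same three timestamps written as ISO strings.

```python
import numpy

# The same three timestamps, stored with NumPy's native datetime64 dtype
# at minute resolution.
dates = numpy.array(["2021-01-01T12:00", "2021-09-01T12:00", "2021-05-01T12:00"],
                    dtype="datetime64[m]")

sorted_dates = numpy.sort(dates)

print(sorted_dates[0])   # earliest timestamp (January)
print(sorted_dates[-1])  # latest timestamp (September)
```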
Sort with Lambda
In Python, you can create an anonymous function using the ‘lambda’ keyword. Such functions are useful when you only need to use them temporarily in your code.
The numpy.sort() function does not accept a key function the way Python’s built-in sorted() does, but you can use a lambda function to prepare the data before sorting it.
Consider a case where we want to retrieve even elements from an array. Furthermore, we want to sort the resulting even array.
We can use a lambda function with filter() to first select the even values and then pass the result to numpy.sort().
Let’s begin by creating an array.
a = [2, 3, 6, 4, 2, 8, 9, 5, 2, 0, 1, 9]
even = list(filter(lambda x: x % 2 == 0, a))
numpy.sort(even)
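When the data is already an ndarray, a boolean mask can do the same filtering without a lambda. This is offered as an alternative sketch, not a replacement for the approach above.

```python
import numpy

a = numpy.array([2, 3, 6, 4, 2, 8, 9, 5, 2, 0, 1, 9])

# a % 2 == 0 builds a boolean array; indexing with it keeps the even values.
even = a[a % 2 == 0]
sorted_even = numpy.sort(even)

print(even)         # [2 6 4 2 8 2 0]
print(sorted_even)  # [0 2 2 2 4 6 8]
```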
Sort with NaN Values
By default, NumPy sorts an array so that NaN values are pushed to the end. NaN values also cause trouble when you want to retrieve the index of the minimum or the maximum element in the array.
For instance, take a look at the following code snippet:
a = numpy.array([35, 55, 33, 17])
If we want to retrieve the index of the smallest element in the array, we can use the numpy.argmin() function. However, if the array contains NaN values, the numpy.argmin() function returns the index of the first NaN value instead.
a = numpy.array([35, numpy.nan, 33, 17])
numpy.argmin(a)
Similarly, when you want to retrieve the index of the largest element, numpy.argmax() also returns the index of the NaN value.
When dealing with NaN values in an array, we should use numpy.nanargmin() and numpy.nanargmax() instead. These functions return the indices of the minimum and maximum values in the specified axis, while ignoring all NaN values.
Here, the functions will return the correct index of the minimum and maximum values in the above array.
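The difference between the regular and NaN-aware functions can be seen in a short sketch using the array from above.

```python
import numpy

a = numpy.array([35, numpy.nan, 33, 17])

# The regular functions report the position of the NaN value.
plain_min = numpy.argmin(a)
plain_max = numpy.argmax(a)

# The NaN-aware versions skip NaN and report the real extremes.
nan_min = numpy.nanargmin(a)
nan_max = numpy.nanargmax(a)

print(plain_min, plain_max)  # 1 1
print(nan_min, nan_max)      # 3 0
print(numpy.sort(a))         # [17. 33. 35. nan]
```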
Sort NumPy Array Containing Floats
NumPy handles the float data type seamlessly, and sorting a float array does not require any extra work. You can pass a float array the same way as you pass any other array.
a = numpy.array([[10.3, 11.42, 10.002, 22.2],
                 [7.08, 7.089, 10.20, 12.2],
                 [7.4, 8.09, 3.6, 17]])
numpy.sort(a)
NumPy’s wide range of sorting functions makes it easy to sort arrays for any task. Whether you’re working with a 1-D array or a multidimensional array, NumPy sorts it for you efficiently and with concise code.
Here, we have discussed just a few capabilities of NumPy’s sort functions.
Mokhtar is the founder of LikeGeeks.com. He has worked as a Linux system administrator since 2010. He is responsible for maintaining, securing, and troubleshooting Linux servers for multiple clients around the world. He loves writing shell and Python scripts to automate his work.