Sort NumPy arrays in Python

Many of Python’s popular libraries use NumPy under the hood as a fundamental pillar of their infrastructure. Beyond slicing, dicing, and manipulating arrays, the NumPy library offers various functions that allow you to sort elements in an array.

Sorting an array is useful in many applications of computer science.

It lets you organize data in ordered form, look up elements quickly, and store data in a space-efficient manner.

Once you’ve installed the package, import it by running the following command:

import numpy

 

 

NumPy Sort Algorithms

The numpy.sort() function allows you to sort an array using various sorting algorithms. You can specify the kind of algorithm to use by setting the ‘kind’ parameter.

By default, the function uses ‘quicksort’. The other accepted values are ‘mergesort’, ‘heapsort’, and ‘stable’. Note that in current NumPy versions ‘quicksort’ is actually implemented as introsort.

If you set the kind parameter to ‘stable’, the function automatically chooses the best stable sorting algorithm based upon the array data type.

In general, ‘mergesort’ and ‘stable’ are both mapped to timsort or radix sort under the hood, depending on the data type.

The sorting algorithms can be characterized by their average running speed, space complexity, and worst-case performance.

Moreover, a stable sorting algorithm keeps the items in their relative order, even when they have the same keys. Here is a summary of the properties of NumPy’s sorting algorithms.

Kind of Algorithm   Average Speed   Worst Case     Worst Space   Stable
quicksort           1               O(n^2)         0             no
mergesort           2               O(n*log(n))    ~n/2          yes
timsort             2               O(n*log(n))    ~n/2          yes
heapsort            3               O(n*log(n))    0             no
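To make the idea of stability concrete, here is a minimal sketch. The `scores` and `names` arrays are made-up illustration data, not part of the original examples. It uses numpy.argsort() with kind='stable' so that the ordering of equal keys can be observed.

```python
import numpy

# Hypothetical data: duplicate keys, with names that record the original order.
scores = numpy.array([2, 1, 2, 1, 3])
names = numpy.array(["a", "b", "c", "d", "e"])

# argsort returns the indices that would sort the keys.
# kind="stable" guarantees that equal keys keep their original relative order.
order = numpy.argsort(scores, kind="stable")

print(scores[order])  # [1 1 2 2 3]
print(names[order])   # ['b' 'd' 'a' 'c' 'e']
```

Among the two 1s, 'b' still comes before 'd', and among the two 2s, 'a' still comes before 'c'. An unstable algorithm would be free to swap them.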

It is worth noting that numpy.sort() always returns a sorted copy of the array and leaves the original unchanged. Internally, the sorting algorithms also make temporary copies of the data when sorting along any axis other than the last one.

As a result, sorting along the last axis is faster and requires less space than sorting along any other axis.

Let’s create a list of numbers and sort it with the algorithm of our choice by passing its name to the ‘kind’ parameter of numpy.sort().

a = [1,2,8,9,6,1,3,6]

numpy.sort(a, kind='quicksort')

 

Sort in Ascending Order

By default, NumPy sorts arrays in ascending order. You can simply pass your array, or any array-like object such as a list, to the numpy.sort() function.

The function returns a sorted copy of the array rather than sorting it in-place. If you want to sort in-place, you need an ndarray object, which you can create using the numpy.array() function, and then call its sort() method.

Sort in-place

First, let’s construct an ndarray object.

a = numpy.array([1,2,1,3])

To sort an array in-place, we can use the sort method from the ndarray class:

a.sort(axis=-1, kind=None, order=None)

Sort by making a copy of the array

By using the numpy.sort() function, you can sort any array-like object without needing to create an ndarray object first. It returns a sorted ndarray with the same shape as the input and leaves the input itself untouched.

a = [1,2,1,3]

numpy.sort(a)
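The difference between the two approaches can be checked directly. The sketch below uses hypothetical variable names (`data`, `result`, `arr`, `ret`) and simply shows that numpy.sort() hands back a new ndarray, while ndarray.sort() modifies the array and returns None.

```python
import numpy

data = [1, 2, 1, 3]              # a plain Python list
result = numpy.sort(data)        # returns a new, sorted ndarray

print(type(result).__name__)     # ndarray
print(result)                    # [1 1 2 3]
print(data)                      # [1, 2, 1, 3]  (the list is unchanged)

arr = numpy.array(data)
ret = arr.sort()                 # sorts arr in place and returns None
print(ret)                       # None
print(arr)                       # [1 1 2 3]
```

Because the in-place method returns None, writing something like `arr = arr.sort()` would discard the array.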

 

Sort in Descending Order

The numpy.sort() function has no option for descending order, but you can still use it for that purpose. The slicing syntax array[::-1] lets you reverse an array, so you can combine a regular ascending sort with a reversal.

Sort in-place

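To sort an ndarray in-place in descending order, call the sort() method on the reversed view a[::-1]. The view shares its data with the original array, so sorting the view in ascending order leaves the original array in descending order.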

a = numpy.array([1,2,1,3])

a[::-1].sort()

print(a)
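The following sketch walks through why this idiom works. It assumes nothing beyond NumPy itself; the name `rev` is just an illustrative label for the reversed view.

```python
import numpy

a = numpy.array([1, 2, 1, 3])

# a[::-1] is a view: it walks the same memory backwards without copying.
rev = a[::-1]
print(numpy.shares_memory(a, rev))  # True

# Sorting the view in ascending order rearranges the shared data.
rev.sort()
print(rev)  # [1 1 2 3]  (ascending when read through the view)
print(a)    # [3 2 1 1]  (descending when read through the original)
```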

Sort by making a copy of the array

Alternatively, you can use numpy.sort(array)[::-1] to get a sorted copy of the array and read it in reverse, from the largest to the smallest value.

a = [1,2,1,3]

print(numpy.sort(a)[::-1])

 

Sort 2D Array

In the previous examples, our arrays were 1D objects. The numpy.sort() function takes an optional parameter ‘axis’ that specifies the axis along which to sort the array.

This is used when working with multidimensional arrays. It takes an integer as an argument (or None, which flattens the array before sorting). If no argument is passed, it uses the default value of -1.

The default returns an array that is sorted along the last axis. Alternatively, you can specify the axis along which to sort by setting this parameter to the corresponding integer value.

Before specifying the axis, you need to understand how NumPy axes work.

NumPy Axes

In NumPy, arrays are analogous to matrices in math. They consist of axes that are similar to the axes in a Cartesian coordinate system.

In a 2D NumPy array, the axes can be pictured as a 2-dimensional Cartesian coordinate system that has an x-axis and a y-axis.

The x-axis is the row axis, which is represented as 0. It runs downwards across the rows. The y-axis is the column axis, which is represented as 1. It runs horizontally across the columns.

To sort a 2D NumPy array along the row axis or the column axis, you can set the axis parameter to 0 or 1, respectively. With axis=0, the values in each column are sorted from top to bottom. With axis=1, the values in each row are sorted from left to right.

Let’s begin by creating a 2D NumPy array:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])

numpy.sort(a, axis=1, kind=None, order=None)
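To see what each axis value does, the sketch below sorts the same 2D array three ways. The names `by_axis1`, `by_axis0`, and `flat` are only illustrative.

```python
import numpy

a = numpy.array([[10, 11, 13, 22],
                 [23, 7, 20, 14],
                 [31, 11, 33, 17]])

# axis=1 (the last axis of a 2D array): each row is sorted left to right.
by_axis1 = numpy.sort(a, axis=1)
print(by_axis1)
# [[10 11 13 22]
#  [ 7 14 20 23]
#  [11 17 31 33]]

# axis=0: each column is sorted top to bottom.
by_axis0 = numpy.sort(a, axis=0)
print(by_axis0)
# [[10  7 13 14]
#  [23 11 20 17]
#  [31 11 33 22]]

# axis=None: the array is flattened before sorting.
flat = numpy.sort(a, axis=None)
print(flat)
# [ 7 10 11 11 13 14 17 20 22 23 31 33]
```

Note that in both the axis=0 and axis=1 cases, the rows or columns are sorted independently of each other, so values that started in the same row or column do not stay together.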

 

Sort 3D Array

Sorting a 3D array is quite similar to sorting a 2D array. We worked with a 2D array in the previous example. If we create a 3D array, we will have 3 axes.

In that case, the x-axis is represented as 0, the y-axis is represented as 1, and the z-axis is represented as 2.

Let’s create a 3D NumPy array.

a = numpy.array([[[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]], [[12, 11, 13, 23], [23, 7, 12, 14], [31, 34, 33, 17]], [[10, 6, 13, 22], [34, 7, 20, 14], [31, 34, 33, 7]]])

Next, we can set axis=2 to sort along the third axis.

numpy.sort(a, axis=2, kind=None, order=None)
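A smaller 3D array makes the effect of each axis easier to follow by eye. The array below is made up for illustration and is not the one from the example above.

```python
import numpy

# Hypothetical 2 x 2 x 3 array, small enough to check by hand.
small = numpy.array([[[3, 1, 2], [9, 7, 8]],
                     [[6, 5, 4], [0, 2, 1]]])

# axis=2 sorts within each innermost row.
along_last = numpy.sort(small, axis=2)
print(along_last.tolist())
# [[[1, 2, 3], [7, 8, 9]], [[4, 5, 6], [0, 1, 2]]]

# For a 3D array, axis=2 and the default axis=-1 refer to the same axis.
print(numpy.array_equal(along_last, numpy.sort(small)))  # True

# axis=0 compares elements at the same position across the two outer blocks.
along_first = numpy.sort(small, axis=0)
print(along_first.tolist())
# [[[3, 1, 2], [0, 2, 1]], [[6, 5, 4], [9, 7, 8]]]
```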

 

Sort by Column

There are various ways to sort a NumPy array by a column. You can set the ‘axis’ parameter or the ‘order’ parameter in the numpy.sort() function.

In the above examples, we sorted an array along the column axis by setting the ‘axis’ parameter to 1. To sort by the values of one particular column, or field, we can use the ‘order’ parameter.

Sort Using Order

You can sort a NumPy array based on a field or a sequence of fields, provided that you define it with fields in the array’s dtype.

This is especially useful when working with columns in a spreadsheet where you wish to sort the table using the field of a specific column.

The numpy.sort() function lets you do this easily. It allows you to pass the field name as a string in the ‘order’ parameter.

numpy.sort(a, axis=-1, kind=None, order=None)

Let’s create an array with fields defined as ‘name’, ‘age’, and ‘score’.

dtype = [('name', 'S10'), ('age', int), ('score', float)]

values =  [('Alice', 18, 78), ('Bob', 19, 80), ('James', 17, 81)]

a = numpy.array(values, dtype=dtype)

You can then specify which field to sort by passing it as a string to the ‘order’ parameter.

numpy.sort(a, order='score')
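Since the sample records are already in ascending order of score, sorting by a different field shows the effect more clearly. The sketch below is a variation on the example above: it assumes a Unicode 'U10' dtype for the name field instead of 'S10' so that the names print as ordinary strings, and it sorts by 'age'.

```python
import numpy

# Assumption: 'U10' (Unicode) is used here instead of 'S10' (bytes) for readable output.
dtype = [("name", "U10"), ("age", int), ("score", float)]
values = [("Alice", 18, 78), ("Bob", 19, 80), ("James", 17, 81)]
people = numpy.array(values, dtype=dtype)

# Whole records are reordered by the 'age' field.
by_age = numpy.sort(people, order="age")

print(by_age["name"])   # ['James' 'Alice' 'Bob']
print(by_age["age"])    # [17 18 19]
```

Each record moves as a unit, so a name always stays attached to its own age and score.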

 

Sort by Multiple Columns

If you wish to sort the array by more than one field, you can define the sort order by using multiple fields as the ‘order’ parameter.

You can specify which fields to compare by passing them as a list to the ‘order’ parameter. The first field is compared first, and later fields are only used to break ties. It is not necessary to specify all fields, as NumPy uses the unspecified fields to break any remaining ties, in the order in which they come up in the dtype.

numpy.sort(a, order=['score', 'name'])
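The secondary field only matters when the primary field has ties, which the sample data above does not have. The following sketch uses made-up records in which two people share a score, so the tie-breaking can be observed. The 'U10' dtype is again an assumption made for readable output.

```python
import numpy

dtype = [("name", "U10"), ("age", int), ("score", float)]
# Hypothetical records: Cara and Bob share the same score.
values = [("Cara", 19, 80.0), ("Bob", 21, 80.0),
          ("Alice", 18, 78.0), ("Dan", 20, 81.0)]
people = numpy.array(values, dtype=dtype)

# Ties on 'score' are broken alphabetically by 'name'.
by_score_name = numpy.sort(people, order=["score", "name"])
print(by_score_name["name"])  # ['Alice' 'Bob' 'Cara' 'Dan']

# Ties on 'score' are broken by 'age' instead.
by_score_age = numpy.sort(people, order=["score", "age"])
print(by_score_age["name"])   # ['Alice' 'Cara' 'Bob' 'Dan']
```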

 

Sort by Row

Just as you sort a 2D NumPy array along the column axis (by setting axis=1), you can set the axis parameter to 0 to sort the array along the row axis. Using the same example as above, we can sort the 2D array along the rows as:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17]])

numpy.sort(a, axis=0, kind=None, order=None)

The above method sorts every column of the array independently. If you want to reorder the whole array based on the values in one specific row, you will need to index that row.

The numpy.argsort() function comes in handy in such cases. It performs an indirect sort along the specified axis and returns an array of indices in sorted order.

Note that the function doesn’t return the sorted array. Rather, it returns an array of the same shape as its input that contains the indices in sorted order.

You can then use the returned indices to index the original array and rearrange its columns.

Using the same array as above:

a = numpy.array([[10, 11, 13, 22],  [23, 7, 20, 14],  [31, 11, 33, 17]])

Let’s sort it by the 3rd row, i.e. the row at index position 2.

indices = numpy.argsort(a[2])

We can pass the result to our array to retrieve an array whose columns are ordered by the values in the 3rd row.

sorted_a = a[:, indices]

print(sorted_a)
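The same indirect-sort idea works in the other direction: the indices from one column can be used to reorder whole rows. The sketch below shows both cases side by side. The variable names are illustrative, and kind='stable' is chosen here so that rows with equal keys keep their original order.

```python
import numpy

a = numpy.array([[10, 11, 13, 22],
                 [23, 7, 20, 14],
                 [31, 11, 33, 17]])

# Reorder the columns so that the row at index 2 is ascending.
col_order = numpy.argsort(a[2])
print(col_order)          # [1 3 0 2]
by_row = a[:, col_order]
print(by_row)
# [[11 22 10 13]
#  [ 7 14 23 20]
#  [11 17 31 33]]

# Reorder the rows so that the column at index 1 is ascending.
row_order = numpy.argsort(a[:, 1], kind="stable")
print(row_order)          # [1 0 2]
by_col = a[row_order]
print(by_col)
# [[23  7 20 14]
#  [10 11 13 22]
#  [31 11 33 17]]
```

Unlike numpy.sort() with an axis argument, this keeps every row (or column) intact and only changes its position.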

 

Sort by Column till Specified Row or from Specific Row

You can sort an array up to a specified row or from a specific row rather than sorting the whole array. This is easy to do by slicing with the [] operator.

For instance, consider the following array.

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17], [17, 12, 33, 16]])

If you only wish to sort the first 2 rows of the array, you can pass a sliced array to numpy.sort() function.

index = 2
numpy.sort(a[:index])

This returns a sorted copy of the slice, in which each of the selected rows is sorted along the last axis. The original array is left unchanged.

Similarly, if you wish to sort only the 2nd and 3rd rows of the array, you can do it as follows:

numpy.sort(a[1:3])

Now, if you want to sort a column of the array using only a range of rows, you can use the same [] operator to slice the column.

Using the same array as above, if we wish to sort the first 3 rows of the 2nd column, we can slice the array as:

a = numpy.array([[10, 11, 13, 22], [23, 7, 20, 14], [31, 11, 33, 17], [17, 12, 33, 16]])

sort_array = a[0:3, 1]

numpy.sort(sort_array)
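Because numpy.sort() returns a copy, the original array still holds the unsorted column afterwards. If the goal is to update the array itself, one option (a sketch, not the only way) is to assign the sorted values back into the same slice.

```python
import numpy

a = numpy.array([[10, 11, 13, 22],
                 [23, 7, 20, 14],
                 [31, 11, 33, 17],
                 [17, 12, 33, 16]])

# Sort the first 3 rows of the 2nd column (column index 1).
sorted_part = numpy.sort(a[0:3, 1])
print(sorted_part)   # [ 7 11 11]
print(a[:, 1])       # [11  7 11 12]  (the original column is untouched)

# Write the sorted values back into the same slice to update the array.
a[0:3, 1] = sorted_part
print(a[:, 1])       # [ 7 11 11 12]
```

Only the selected column changes, so the other values in those rows no longer line up with their original column-2 neighbors.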

 

Sort by Datetime

If you’re working with data that has an element of time, you may want to sort it based upon the date or time.

Python’s built-in datetime module makes it easy to work with date and time data. You can then sort the data using numpy.sort().

Firstly, let’s import the datetime module.

import datetime

Next, we can create a NumPy array that stores datetime objects.

a = numpy.array([datetime.datetime(2021, 1, 1, 12, 0), datetime.datetime(2021, 9, 1, 12, 0), datetime.datetime(2021, 5, 1, 12, 0)])

To sort the array, we can pass it to numpy.sort().

numpy.sort(a)
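An array of datetime.datetime objects is stored with the generic object dtype. As an alternative sketch, NumPy also has a native datetime64 dtype, which sorts the same way. The minute-level unit 'datetime64[m]' below is an assumption chosen to match the example timestamps.

```python
import numpy

# Same three timestamps as above, stored as NumPy's native datetime64 type.
dates = numpy.array(
    ["2021-01-01T12:00", "2021-09-01T12:00", "2021-05-01T12:00"],
    dtype="datetime64[m]",
)

sorted_dates = numpy.sort(dates)
print(sorted_dates)

# argsort works too, e.g. to reorder another array by date.
order = numpy.argsort(dates)
print(order)  # [0 2 1]
```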

 

Sort with Lambda

In Python, you can create an anonymous function using the ‘lambda’ keyword. Such functions are useful when you only need to use them temporarily in your code.

Lambda functions work well alongside NumPy. For example, you can pass one to Python’s built-in filter() function, which applies it to each element of a sequence and keeps the elements for which it returns True.

Consider a case where we want to retrieve even elements from an array. Furthermore, we want to sort the resulting even array.

We can use a lambda function with filter() to keep only the even values and then pass the result to numpy.sort().

Let’s begin by creating a list.

a = [2,3,6,4,2,8,9,5,2,0,1,9]

even = list(filter(lambda x: x%2==0, a))

numpy.sort(even)
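When the data is already in an ndarray, the same filtering can be done without a lambda by using a boolean mask. This is a sketch of that alternative technique rather than a replacement for the approach above.

```python
import numpy

a = numpy.array([2, 3, 6, 4, 2, 8, 9, 5, 2, 0, 1, 9])

# a % 2 == 0 builds a boolean array; indexing with it keeps the even values.
even = a[a % 2 == 0]
print(even)               # [2 6 4 2 8 2 0]

sorted_even = numpy.sort(even)
print(sorted_even)        # [0 2 2 2 4 6 8]
```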

 

Sort with NaN Values

By default, NumPy sorts the array in such a way that NaN values are pushed to the end. NaN values also create ambiguity when you want to retrieve the index of the minimum or the maximum element in the array.

For instance, take a look at the following code snippet:

a = numpy.array([35, 55, 33, 17])

If we want to retrieve the index of the smallest element in the array, we can use the numpy.argmin() function. But if the array contains NaN values, numpy.argmin() returns the index of the first NaN value instead of the index of the smallest element.

a = numpy.array([35, numpy.nan, 33, 17])

numpy.argmin(a)

Similarly, when you want to retrieve the index of the largest element, numpy.argmax() also returns the index of the NaN value instead.

numpy.argmax(a)

When dealing with NaN values in an array, we should use numpy.nanargmin() and numpy.nanargmax() instead. These functions return the indices of the minimum and maximum values along the specified axis, while ignoring all NaN values.

Here, the functions will return the correct index of the minimum and maximum values in the above array.

numpy.nanargmin(a)
numpy.nanargmax(a)
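Putting the pieces of this section together, the sketch below runs each call on the same array so the results can be compared. The variable names are illustrative.

```python
import numpy

a = numpy.array([35, numpy.nan, 33, 17])

# Sorting places NaN at the end of the result.
sorted_a = numpy.sort(a)
print(numpy.isnan(sorted_a[-1]))   # True
print(sorted_a[:3])                # [17. 33. 35.]

# argmin/argmax both point at the NaN (index 1).
print(numpy.argmin(a))     # 1
print(numpy.argmax(a))     # 1

# The nan-aware versions ignore NaN and find the real extremes.
print(numpy.nanargmin(a))  # 3  (value 17)
print(numpy.nanargmax(a))  # 0  (value 35)
```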

 

Sort NumPy Array Containing Floats

NumPy handles the float data type seamlessly, and sorting a float array does not require any extra work. You can pass a float array the same way as you pass any other array.

a = numpy.array([[10.3, 11.42, 10.002, 22.2], [7.08, 7.089, 10.20, 12.2], [7.4, 8.09, 3.6, 17]])

numpy.sort(a)

 

Conclusion

NumPy’s wide range of sorting functions makes it easy to sort arrays for any task. Whether you’re working with a 1-D array or a multidimensional array, NumPy sorts it for you efficiently and with concise code.

Here, we have discussed just a few capabilities of NumPy’s sort functions. To explore other possibilities, you can check out NumPy’s official documentation.
