Sorting A Numpy Array Based On Data From Another Array

July 09, 2024

I have two arrays, data and result. result contains the same elements as data, but with an extra column and in unsorted order. I want to rearrange the result array so that it ...

Solution 1:

The numpy_indexed package (disclaimer: I am its author) can be used to solve this kind of problem efficiently and elegantly:

import numpy_indexed as npi
result[npi.indices(result[:, :-1], data)]

npi.indices is essentially a vectorized equivalent of list.index; so for each element (row) in data, we get where that same row is located in result, minus the last column.

Note that this solution works for any number of columns and is fully vectorized (i.e., no Python loops anywhere).
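To make the list.index analogy concrete, here is a plain-Python sketch of the same lookup. It is not how numpy_indexed is implemented; the function name row_indices and the sample arrays are made up for illustration, and it assumes every row of data occurs exactly once in result (minus the last column).

```python
import numpy as np

def row_indices(haystack, needles):
    """For each row of `needles`, return the position of the equal row in `haystack`.

    Loop-based illustration only; raises KeyError if a row is missing.
    """
    # Map each haystack row (as a hashable tuple) to its position.
    position = {tuple(row): i for i, row in enumerate(haystack.tolist())}
    return np.array([position[tuple(row)] for row in needles.tolist()])

data = np.array([[1, 0], [0, 2], [1, 2]])
result = np.array([[1, 2, 30], [1, 0, 10], [0, 2, 20]])

idx = row_indices(result[:, :-1], data)
out = result[idx]  # rows of result, reordered to follow data

print(idx)  # [1 2 0]
```

Indexing result with idx is the same final step as in the one-liner above; only the way the indices are found differs.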

Solution 2:

Approach #1

Here's an approach that treats each row as an indexing tuple, converts the rows of data and result into their linear index equivalents, and then finds the matching indices between the two. These indices give the new row order; indexing result with them produces the desired output. It assumes the entries are non-negative integers, since np.ravel_multi_index requires that. The implementation would look like this -

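import numpy as np

# Slice out from result everything except the last column
r = result[:,:-1]

# Get linear indices equivalent of each row from r and data
ID1 = np.ravel_multi_index(r.T,r.max(0)+1)
ID2 = np.ravel_multi_index(data.T,r.max(0)+1)

# Search for ID1 in ID2 and use those indices to index into result.
# The comparison has one row per data ID, so the column indices returned
# by np.where list, for each row of data, the matching row of result.
out = result[np.where(ID2[:,None] == ID1)[1]]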
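As a small illustration of the "linear index equivalent" idea, the sketch below shows what np.ravel_multi_index produces for a few made-up two-column rows. The arrays are assumptions chosen for the example, not the data from the question.

```python
import numpy as np

r = np.array([[1, 0], [0, 2], [1, 2]])
data = np.array([[0, 2], [1, 2], [1, 0]])

# One dimension size per column: largest value in that column, plus one.
dims = r.max(0) + 1          # array([2, 3])

# Each row (a, b) becomes the single integer a*3 + b.
ID1 = np.ravel_multi_index(r.T, dims)
ID2 = np.ravel_multi_index(data.T, dims)

print(ID1)  # [3 2 5]
print(ID2)  # [2 5 3]
```

Equal rows map to equal integers and distinct rows to distinct integers, so comparing rows reduces to comparing one number per row.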

Approach #2

If all the rows from data are guaranteed to be in result, you can use another approach based on just argsort, like so -

# Slice out from result everything except the last column
r = result[:,:-1]

# Get linear indices equivalent of each row from r and data
ID1 = np.ravel_multi_index(r.T,r.max(0)+1)
ID2 = np.ravel_multi_index(data.T,r.max(0)+1)

sortidx_ID1 = ID1.argsort()
sortidx_ID2 = ID2.argsort()

# sortidx_ID2.argsort() is the rank of each data row, which selects the
# result row holding the same rank
out = result[sortidx_ID1[sortidx_ID2.argsort()]]
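The pairing behind the argsort approach can also be written as a scatter assignment: the row of result with the k-th smallest ID is placed at the position of the row of data with the k-th smallest ID. Below is a minimal sketch under the same assumptions (non-negative integers, unique rows, every row of data present in result). The helper name reorder_like and the sample arrays are hypothetical.

```python
import numpy as np

def reorder_like(data, result):
    """Reorder rows of `result` so that result[:, :-1] matches `data` row for row.

    Hypothetical helper; assumes non-negative integer entries and that the
    rows of `data` are unique and all present in `result`.
    """
    r = result[:, :-1]
    dims = r.max(0) + 1
    ID1 = np.ravel_multi_index(r.T, dims)
    ID2 = np.ravel_multi_index(data.T, dims)

    out = np.empty_like(result)
    # k-th smallest ID of result goes to where the k-th smallest ID of data sits.
    out[ID2.argsort()] = result[ID1.argsort()]
    return out

data = np.array([[0, 2], [1, 2], [1, 0]])
result = np.array([[1, 0, 10], [0, 2, 20], [1, 2, 30]])
out = reorder_like(data, result)

print(out[:, -1])  # [20 30 10]
```

The sample deliberately uses a row order that is not its own inverse, which is the case where the direction of the index mapping matters.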

Sample run for a bit more generic case -

In [37]: data
Out[37]: 
array([[ 3,  2,  1,  5],
       [ 4,  9,  2,  4],
       [ 7,  3,  9, 11],
       [ 5,  9,  4,  4]])

In [38]: result
Out[38]: 
array([[ 7,  3,  9, 11, 55],
       [ 4,  9,  2,  4,  8],
       [ 3,  2,  1,  5,  7],
       [ 5,  9,  4,  4, 88]])

In [39]: r = result[:,:-1]
    ...: ID1 = np.ravel_multi_index(r.T,r.max(0)+1)
    ...: ID2 = np.ravel_multi_index(data.T,r.max(0)+1)
    ...: 

In [40]: result[np.where(ID2[:,None] == ID1)[1]] # Approach 1
Out[40]: 
array([[ 3,  2,  1,  5,  7],
       [ 4,  9,  2,  4,  8],
       [ 7,  3,  9, 11, 55],
       [ 5,  9,  4,  4, 88]])

In [41]: sortidx_ID1 = ID1.argsort()  # Approach 2
    ...: sortidx_ID2 = ID2.argsort()
    ...: 

In [42]: result[sortidx_ID1[sortidx_ID2.argsort()]]
Out[42]: 
array([[ 3,  2,  1,  5,  7],
       [ 4,  9,  2,  4,  8],
       [ 7,  3,  9, 11, 55],
       [ 5,  9,  4,  4, 88]])

Solution 3:

Just to try to clarify what you are doing. With an index list [2,1,0,3] I can reorder the rows of result thus:

In [37]: result[[2,1,0,3],:]
Out[37]: 
array([[0, 1, 0, 0, 1],
       [1, 0, 0, 0, 0],
       [0, 1, 1, 0, 1],
       [0, 1, 0, 1, 0]])

In [38]: result[[2,1,0,3],:4]==data
Out[38]: 
array([[ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True],
       [ True,  True,  True,  True]], dtype=bool)

I don't see how argsort or sort is going to help come up with this indexing order.

With np.lexsort I can order the rows of both arrays the same:

In [54]: data[np.lexsort(data.T),:]
Out[54]: 
array([[1, 0, 0, 0],
       [0, 1, 0, 0],
       [0, 1, 1, 0],
       [0, 1, 0, 1]])

In [55]: result[np.lexsort(result[:,:-1].T),:]
Out[55]: 
array([[1, 0, 0, 0, 0],
       [0, 1, 0, 0, 1],
       [0, 1, 1, 0, 1],
       [0, 1, 0, 1, 0]])

I found by trial and error that I needed to use the transpose. We need to check the docs of lexsort to understand why.
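For what it's worth, the lexsort documentation describes its argument as a sequence of keys, with the last key being the primary one. When a 2D array is passed, each row is one key, so data.T turns every column of data into a key. A short sketch using the data array from this answer:

```python
import numpy as np

data = np.array([[0, 1, 0, 0],
                 [1, 0, 0, 0],
                 [0, 1, 1, 0],
                 [0, 1, 0, 1]])

# Passing data.T makes each column of data one key; the last column is primary.
i = np.lexsort(data.T)

# Same thing with the keys spelled out, least significant first.
i_explicit = np.lexsort((data[:, 0], data[:, 1], data[:, 2], data[:, 3]))

print(i)           # [1 0 2 3]
print(i_explicit)  # [1 0 2 3]
```

Without the transpose, lexsort would treat the rows of data as keys and return an ordering of the columns instead.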

A little more trial and error produces:

In [66]: i=np.lexsort(data.T)
In [67]: j=np.lexsort(result[:,:-1].T)
In [68]: j[i]
Out[68]: array([2, 1, 0, 3], dtype=int64)

In [69]: result[j[i],:]
Out[69]: 
array([[0, 1, 0, 0, 1],
       [1, 0, 0, 0, 0],
       [0, 1, 1, 0, 1],
       [0, 1, 0, 1, 0]])

This is a tentative solution. It still needs to be tested on other samples and explained.
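One way to test it on another sample is sketched below, with made-up arrays whose row order is not its own inverse. Here j[i] alone no longer reproduces data; what seems to be needed is the inverse of i, which np.argsort(i) provides. In the sample above, i is [1, 0, 2, 3], which is its own inverse, so both expressions agree there.

```python
import numpy as np

data = np.array([[0, 2], [1, 2], [1, 0]])
result = np.array([[1, 0, 10], [0, 2, 20], [1, 2, 30]])

i = np.lexsort(data.T)              # sort order of data
j = np.lexsort(result[:, :-1].T)    # sort order of result

# np.argsort(i) inverts the permutation i: it gives the rank of each data row.
order = j[np.argsort(i)]
out = result[order]

print(order)  # [1 2 0]
print(j[i])   # [2 0 1], which does not line result up with data
```

The explanation: j lists the rows of result in sorted order, and the rank of each data row says which of those sorted positions it needs.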
