Sorting A Numpy Array Based On Data From Another Array
Solution 1:
The numpy_indexed package (disclaimer: I am its author) can be used to efficiently and elegantly solve these kind of problems:
import numpy_indexed as npi
result[npi.indices(result[:, :-1], data)]
npi.indices is essentially a vectorized equivalent of list.index; so for each element (row) in data, we get where that same row is located in result, minus the last column.
Note that this solution works for any number of columns, and is fully vectorized (ie, no python loops anywhere).
Solution 2:
Approach #1
Here's an approach considering each row as an indexing tuple and then finding the matching indices between data
and result
corresponding to those linear index equivalents. These indices would represent the new order of rows, which when indexed into result would give us the desired output. The implementation would look like this -
# Slice outfromresult everything except the lastcolumn
r =result[:,:-1]
# Get linear indices equivalent ofeachrowfrom r and data
ID1 = np.ravel_multi_index(r.T,r.max(0)+1)
ID2 = np.ravel_multi_index(data.T,r.max(0)+1)
# Searchfor ID2 in ID1 and use those indices index intoresultout=result[np.where(ID1[:,None] == ID2)[1]]
Approach #2
If all the rows from data
are guaranteed to be in result
, you can use another approach based on just argsort
, like so -
# Slice out from result everything except the last column r = result[:,:-1]
# Get linear indices equivalent of each row from r and dataID1 = np.ravel_multi_index(r.T,r.max(0)+1)
ID2 = np.ravel_multi_index(data.T,r.max(0)+1)
sortidx_ID1 = ID1.argsort()
sortidx_ID2 = ID2.argsort()
out = result[sortidx_ID1[sortidx_ID2]]
Sample run for a bit more generic case -
In [37]: data
Out[37]:
array([[ 3, 2, 1, 5],
[ 4, 9, 2, 4],
[ 7, 3, 9, 11],
[ 5, 9, 4, 4]])
In [38]: result
Out[38]:
array([[ 7, 3, 9, 11, 55],
[ 4, 9, 2, 4, 8],
[ 3, 2, 1, 5, 7],
[ 5, 9, 4, 4, 88]])
In [39]: r = result[:,:-1]
...: ID1 = np.ravel_multi_index(r.T,r.max(0)+1)
...: ID2 = np.ravel_multi_index(data.T,r.max(0)+1)
...:
In [40]: result[np.where(ID1[:,None] == ID2)[1]] # Approach 1
Out[40]:
array([[ 3, 2, 1, 5, 7],
[ 4, 9, 2, 4, 8],
[ 7, 3, 9, 11, 55],
[ 5, 9, 4, 4, 88]])
In [41]: sortidx_ID1 = ID1.argsort() # Approach 2
...: sortidx_ID2 = ID2.argsort()
...:
In [42]: result[sortidx_ID1[sortidx_ID2]]
Out[42]:
array([[ 3, 2, 1, 5, 7],
[ 4, 9, 2, 4, 8],
[ 7, 3, 9, 11, 55],
[ 5, 9, 4, 4, 88]])
Solution 3:
Just to try to clarify what you are doing. With an index list [2,1,0,3]
I can reorder the rows of result
thus:
In [37]: result[[2,1,0,3],:]
Out[37]:
array([[0, 1, 0, 0, 1],
[1, 0, 0, 0, 0],
[0, 1, 1, 0, 1],
[0, 1, 0, 1, 0]])
In [38]: result[[2,1,0,3],:4]==data
Out[38]:
array([[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True],
[ True, True, True, True]], dtype=bool)
I don't see how argsort
or sort
is going to help come up with this indexing order.
With np.lexsort
I can order the rows of both arrays the same:
In [54]: data[np.lexsort(data.T),:]
Out[54]:
array([[1, 0, 0, 0],
[0, 1, 0, 0],
[0, 1, 1, 0],
[0, 1, 0, 1]])
In [55]: result[np.lexsort(result[:,:-1].T),:]
Out[55]:
array([[1, 0, 0, 0, 0],
[0, 1, 0, 0, 1],
[0, 1, 1, 0, 1],
[0, 1, 0, 1, 0]])
I found by trial and error that I needed to use the transpose. We need to check the docs of lexsort
to understand why.
A little more trial and error produces:
In [66]: i=np.lexsort(data.T)
In [67]: j=np.lexsort(result[:,:-1].T)
In [68]: j[i]
Out[68]: array([2, 1, 0, 3], dtype=int64)
In [69]: result[j[i],:]
Out[69]:
array([[0, 1, 0, 0, 1],
[1, 0, 0, 0, 0],
[0, 1, 1, 0, 1],
[0, 1, 0, 1, 0]])
This is a tentative solution. It needs to be tested on other samples. And needs to be explained.
Post a Comment for "Sorting A Numpy Array Based On Data From Another Array"