
Checking In Between Values With Numpy Python

I am trying to convert the code down below to the Numpy version. The vanilla python code checks the previous and current values of Formating and checks to see if any of the Numbers

Solution 1:

The answer from a previous question:

In [173]: Numbers = np.array([3, 4, 5, 7, 8, 10,20])
     ...: Formating = np.array([0, 2 , 5, 12, 15, 22])
     ...: x = np.sort(Numbers);
     ...: l = np.searchsorted(x, Formating, side='left')
     ...: 
In [174]: l
Out[174]: array([0, 0, 2, 6, 6, 7])
In [175]: for i in range(len(l)-1):
     ...:     if l[i] >= l[i+1]:
     ...:         print('Numbers between %d,%d = _0_' % (Formating[i], Formating[i+1]))
     ...:     else:
     ...:         print('Numbers between %d,%d = %s' % (Formating[i], Formating[i+1], ','.join(map(str, list(x[l[i]:l[i+1]])))))
     ...: 
Numbers between 0,2 = _0_
Numbers between 2,5 = 3,4
Numbers between 5,12 = 5,7,8,10
Numbers between 12,15 = _0_
Numbers between 15,22 = 20
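The same searchsorted idea can be wrapped in a small function that returns the groups instead of printing them. This is only a sketch: the name `numbers_between` and its parameters are made up for illustration, and it assumes the edges are already sorted in ascending order.

```python
import numpy as np

def numbers_between(numbers, edges):
    """Group `numbers` into the half-open intervals [edges[i], edges[i+1])."""
    x = np.sort(np.asarray(numbers))
    edges = np.asarray(edges)  # assumed to be sorted ascending
    # Position in the sorted data where each edge would be inserted.
    cuts = np.searchsorted(x, edges, side='left')
    # Consecutive cut positions delimit the values inside each interval.
    return [x[lo:hi] for lo, hi in zip(cuts[:-1], cuts[1:])]

groups = numbers_between([3, 4, 5, 7, 8, 10, 20], [0, 2, 5, 12, 15, 22])
for g in groups:
    print(g.tolist())
```

An empty slice simply means no value fell in that interval, so the `l[i] >= l[i+1]` test is only needed if you want a special message for that case.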

Something that works fine with lists - in fact faster with lists than arrays:

In [182]: for i in range(len(Formating)-1):
     ...:     print([x for x in Numbers if (Formating[i]<=x<Formating[i+1])])
     ...: 
[]
[3, 4]
[5, 7, 8, 10]
[]
[20]

A version with iteration on Formating, but not Numbers. Rather similar to the version using searchsorted. I'm not sure which will be faster:

In [177]: for i in range(len(Formating)-1):
     ...:     idx = (Formating[i]<=Numbers)&(Numbers<Formating[i+1])
     ...:     print(Numbers[idx])
     ...: 
[]
[3 4]
[ 5  7  8 10]
[]
[20]

We could get the idx mask for all values of Formating at once:

In [183]: mask=(Formating[:-1,None]<=Numbers)&(Numbers<Formating[1:,None])
In [184]: mask
Out[184]: 
array([[False, False, False, False, False, False, False],
       [ True,  True, False, False, False, False, False],
       [False, False,  True,  True,  True,  True, False],
       [False, False, False, False, False, False, False],
       [False, False, False, False, False, False,  True]])
In [185]: N=Numbers[:,None].repeat(5,1).T   # 5 = len(Formating)-1
In [186]: N
Out[186]: 
array([[ 3,  4,  5,  7,  8, 10, 20],
       [ 3,  4,  5,  7,  8, 10, 20],
       [ 3,  4,  5,  7,  8, 10, 20],
       [ 3,  4,  5,  7,  8, 10, 20],
       [ 3,  4,  5,  7,  8, 10, 20]])
In [187]: np.ma.masked_array(N,~mask)
Out[187]: 
masked_array(
  data=[[--, --, --, --, --, --, --],
        [3, 4, --, --, --, --, --],
        [--, --, 5, 7, 8, 10, --],
        [--, --, --, --, --, --, --],
        [--, --, --, --, --, --, 20]],
  mask=[[ True,  True,  True,  True,  True,  True,  True],
        [False, False,  True,  True,  True,  True,  True],
        [ True,  True, False, False, False, False,  True],
        [ True,  True,  True,  True,  True,  True,  True],
        [ True,  True,  True,  True,  True,  True, False]],
  fill_value=999999)

Your lists are apparent in the masked array, but displaying them as separate lists still requires iteration:

In [188]: for row in mask:
     ...:     print(Numbers[row])
     ...: 
[]
[3 4]
[ 5  7  8 10]
[]
[20]
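For reference, here is the broadcasting step written out with the intermediate shapes, as a self-contained sketch. The names `lower`, `upper` and `groups` are mine; the rest follows the session above.

```python
import numpy as np

Numbers = np.array([3, 4, 5, 7, 8, 10, 20])
Formating = np.array([0, 2, 5, 12, 15, 22])

lower = Formating[:-1, None]   # shape (5, 1): left edge of each interval
upper = Formating[1:, None]    # shape (5, 1): right edge of each interval
# (5, 1) compared against (7,) broadcasts to (5, 7): one row per interval.
mask = (lower <= Numbers) & (Numbers < upper)

# Boolean indexing with each row pulls out the values in that interval.
groups = [Numbers[row].tolist() for row in mask]
print(mask.shape)
print(groups)
```

The mask needs `(len(Formating)-1) * len(Numbers)` booleans, so memory use grows with the product of the two lengths.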

I'll let you time test these alternatives with this or more realistic data. I suspect a pure list version is fastest for small problems, but I'm not sure how the others will scale.
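One possible way to set up that comparison is with the standard `timeit` module. The function names below are placeholders of mine, and the numbers you get will depend on your machine and data size, so no timings are shown here.

```python
import timeit
import numpy as np

numbers = [3, 4, 5, 7, 8, 10, 20]
edges = [0, 2, 5, 12, 15, 22]
numbers_arr = np.array(numbers)
edges_arr = np.array(edges)

def with_lists():
    # Pure Python: test every number against every interval.
    return [[x for x in numbers if lo <= x < hi]
            for lo, hi in zip(edges[:-1], edges[1:])]

def with_searchsorted():
    # Sort once, then slice between the insertion points of the edges.
    x = np.sort(numbers_arr)
    cuts = np.searchsorted(x, edges_arr, side='left')
    return [x[a:b] for a, b in zip(cuts[:-1], cuts[1:])]

def with_mask():
    # Build the full 2-D boolean mask by broadcasting, then index row by row.
    mask = (edges_arr[:-1, None] <= numbers_arr) & (numbers_arr < edges_arr[1:, None])
    return [numbers_arr[row] for row in mask]

for fn in (with_lists, with_searchsorted, with_mask):
    seconds = timeit.timeit(fn, number=1000)
    print(f'{fn.__name__}: {seconds:.4f} s for 1000 runs')
```

To see how the approaches scale, replace `numbers` and `edges` with larger arrays that look like your real data.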

edit

Follow-up questions asked about sums. np.ma.sum, or the masked array's own sum method, sums the unmasked values, effectively filling the masked values with 0.

In [253]: np.ma.masked_array(N,~mask).sum(axis=1)
Out[253]: 
masked_array(data=[--, 7, 30, --, 20],
             mask=[ True, False, False,  True, False],
       fill_value=999999)

In [256]: np.ma.masked_array(N,~mask).filled(0)
Out[256]: 
array([[ 0,  0,  0,  0,  0,  0,  0],
       [ 3,  4,  0,  0,  0,  0,  0],
       [ 0,  0,  5,  7,  8, 10,  0],
       [ 0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 20]])

Actually we don't need to use the masked array mechanism to get here (though it can be nice visually):

In [258]: N*mask
Out[258]: 
array([[ 0,  0,  0,  0,  0,  0,  0],
       [ 3,  4,  0,  0,  0,  0,  0],
       [ 0,  0,  5,  7,  8, 10,  0],
       [ 0,  0,  0,  0,  0,  0,  0],
       [ 0,  0,  0,  0,  0,  0, 20]])
In [259]: (N*mask).sum(axis=1)
Out[259]: array([ 0,  7, 30,  0, 20])
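If only the sums are needed, the 2-D arrays can be avoided altogether by combining the earlier searchsorted result with a cumulative sum. This is an alternative sketch rather than part of the session above; `cuts`, `csum` and `sums` are names I picked.

```python
import numpy as np

Numbers = np.array([3, 4, 5, 7, 8, 10, 20])
Formating = np.array([0, 2, 5, 12, 15, 22])

x = np.sort(Numbers)
cuts = np.searchsorted(x, Formating, side='left')
# Running total with a leading 0, so csum[k] is the sum of the first k values.
csum = np.concatenate(([0], np.cumsum(x)))
# Sum of x[cuts[i]:cuts[i+1]] is the difference of two running totals.
sums = csum[cuts[1:]] - csum[cuts[:-1]]
print(sums)
```

For this data it gives the same values as `(N*mask).sum(axis=1)`, using memory proportional to `len(Numbers)` instead of a full mask.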
