Skip to content Skip to sidebar Skip to footer

Vectorizing Euclidean Distance Computation - NumPy

my question regards the vectorization of my code. I have one array that holds 3D-coordinates and one array that holds the information of edges that connect the coordinates: In [8]:

Solution 1:

Approach #1

We could slice out the first and second columns altogether for indexing into coords instead of iterating for each element along them and perform the euclidean distance computations that involves element-wise squaring and summing along each row and then getting the element-wise square-root. Finally, we need to sum all those values for one scalar as shown in the original code.

Thus, one vectorized implementation would be -

np.sqrt(((coords[edges[:, 0]] - coords[edges[:, 1]])**2).sum(1)).sum()

There's a built-in in NumPy to do those distance computing operations as np.linalg.norm. In terms of performance, I would think it would be comparable to what we have just listed earlier. For the sake of completeness, the implementation would be -

np.linalg.norm(coords[edges[:, 0]] - coords[edges[:, 1]],axis=1).sum()

Approach #2

Tweaking the earlier approach, we could use np.einsum that in one step would perform both squaring and summing along each row and as such would be a bit more efficient.

The implementation would look something like this -

s = coords[edges[:, 0]] - coords[edges[:, 1]]
out = np.sqrt(np.einsum('ij,ij->i',s,s)).sum()

Runtime test

Function definitions -

def networkLength(coords, edges): # Original code from question
   distancesNetwork = np.array([])    
   for i in range(edges.shape[0]):
        distancesNetwork = np.append(distancesNetwork, \
        distance.euclidean(coords[edges[i, 0]], coords[edges[i, 1]]))
   return sum(distancesNetwork)

def vectorized_app1(coords, edges):
    return np.sqrt(((coords[edges[:, 0]] - coords[edges[:, 1]])**2).sum(1)).sum()

def vectorized_app2(coords, edges):
    s = coords[edges[:, 0]] - coords[edges[:, 1]]
    return np.sqrt(np.einsum('ij,ij->i',s,s)).sum()

Verification and Timings -

In [114]: # Setup bigger inputs
     ...: coords = np.random.rand(100,3)
     ...: edges = np.random.randint(0,100,(10000,2))

# Verify results across all approaches
In [115]: networkLength(coords, edges)
Out[115]: 6607.8829431403547

In [116]: vectorized_app1(coords, edges)
Out[116]: 6607.8829431403337

In [117]: vectorized_app2(coords, edges)
Out[117]: 6607.8829431403337

In [118]: %timeit networkLength(coords, edges)
     ...: %timeit vectorized_app1(coords, edges)
     ...: %timeit vectorized_app2(coords, edges)
     ...: 
1 loops, best of 3: 519 ms per loop
1000 loops, best of 3: 822 µs per loop
1000 loops, best of 3: 668 µs per loop

Post a Comment for "Vectorizing Euclidean Distance Computation - NumPy"