
How To Find The Closest Match Based On 2 Keys From One Dataframe To Another?

I have 2 dataframes I'm working with. One has a bunch of locations and coordinates (longitude, latitude). The other is a weather data set with data from weather stations all over t

Solution 1:

Let's say you have a distance function dist that you want to minimize:

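import numpy as np

def dist(lat1, long1, lat2, long2):
    # Sum of absolute differences, so opposite-signed offsets cannot cancel.
    return np.abs(lat1 - lat2) + np.abs(long1 - long2)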
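
The function above measures distance in raw degrees, which is only a rough stand-in for distance on the ground, since a degree of longitude shrinks as you move away from the equator. If that matters for your data, a great-circle distance could be swapped in. Here is a minimal sketch of the haversine formula, assuming a spherical Earth with a mean radius of 6371 km; the name haversine_km is illustrative and not part of the original answer:

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius; the Earth is treated as a sphere

def haversine_km(lat1, long1, lat2, long2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(long2 - long1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error before taking asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

It takes the same four arguments in the same order as dist and works on scalar values, so it should be usable in the row-wise apply calls wherever dist appears.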

For a given location, start by computing its distance to every station:

lat = 39.463744
long = -76.119411
weather.apply(
    lambda row: dist(lat, long, row['Latitude'], row['Longitude']), 
    axis=1)

This returns a Series of distances, one per weather station. Using idxmin you can find the closest station name:

distances = weather.apply(
    lambda row: dist(lat, long, row['Latitude'], row['Longitude']), 
    axis=1)
weather.loc[distances.idxmin(), 'StationName']

Let's put all this in a function:

def find_station(lat, long):
    distances = weather.apply(
        lambda row: dist(lat, long, row['Latitude'], row['Longitude']), 
        axis=1)
    return weather.loc[distances.idxmin(), 'StationName']

You can now get all the nearest stations by applying it to the locations dataframe:

locations.apply(
    lambda row: find_station(row['Latitude'], row['Longitude']), 
    axis=1)

Output:

0         WALTHAM
1         WALTHAM
2    PORTST.LUCIE
3         WALTHAM
4    PORTST.LUCIE
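
The nested apply calls run Python code once for every location and station pair, which may get slow on large tables. The same search can be sketched with NumPy broadcasting. This assumes the column names used above (Latitude, Longitude, StationName) and a sum-of-absolute-differences distance; the function name nearest_stations and the sample coordinates are made up for illustration:

```python
import numpy as np
import pandas as pd

def nearest_stations(locations, weather):
    """Return, for each row of locations, the name of the closest station."""
    # (n_locations, 1) minus (n_stations,) broadcasts to (n_locations, n_stations).
    dlat = locations['Latitude'].to_numpy()[:, None] - weather['Latitude'].to_numpy()
    dlon = locations['Longitude'].to_numpy()[:, None] - weather['Longitude'].to_numpy()
    distances = np.abs(dlat) + np.abs(dlon)
    # Position of the smallest distance in each row, i.e. per location.
    nearest = distances.argmin(axis=1)
    names = weather['StationName'].to_numpy()[nearest]
    return pd.Series(names, index=locations.index)

# Hypothetical sample data, with approximate coordinates for illustration only.
weather = pd.DataFrame({
    'StationName': ['WALTHAM', 'PORTST.LUCIE'],
    'Latitude': [42.38, 27.27],
    'Longitude': [-71.24, -80.35],
})
locations = pd.DataFrame({
    'Latitude': [39.46, 28.00],
    'Longitude': [-76.12, -81.00],
})
result = nearest_stations(locations, weather)
```

Note that this builds a full locations-by-stations matrix in memory, so for very large inputs it may be worth processing the locations in chunks.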

Solution 2:

I appreciate that this is a bit messy, but I used something similar to match genetic data between tables. It relies on a weather station lying within 5 degrees of each location in both latitude and longitude, so it finds a station inside that window rather than the strictly closest one. The window size can be changed if need be.

# Positional columns: location (1 = latitude, 2 = longitude);
# weather (1 = station name, 2 = latitude, 3 = longitude).
window = 5  # half-width of the search box, in degrees
for r in range(location.shape[0]):
    lat = location.iloc[r, 1]
    lon = location.iloc[r, 2]
    for w in range(weather.shape[0]):
        if (abs(weather.iloc[w, 2] - lat) <= window) and (abs(weather.iloc[w, 3] - lon) <= window):
            # Write to row r only, then stop at the first station inside the box.
            location.loc[location.index[r], 'Station_Name'] = weather.iloc[w, 1]
            break
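
The window check can also be written with boolean masks instead of position-based loops. This is a rough sketch, assuming named columns Latitude, Longitude and StationName as in Solution 1 rather than the column positions used here. The name station_in_window and its window parameter are hypothetical, and unlike the loop it picks the closest station inside the window:

```python
import pandas as pd

def station_in_window(lat, lon, weather, window=5.0):
    """Closest station within +/- window degrees of (lat, lon), or None."""
    # Series.between is inclusive on both ends by default.
    in_box = (weather['Latitude'].between(lat - window, lat + window)
              & weather['Longitude'].between(lon - window, lon + window))
    candidates = weather[in_box]
    if candidates.empty:
        return None  # no station falls inside the search box
    # Rank the remaining candidates by summed absolute difference in degrees.
    d = (candidates['Latitude'] - lat).abs() + (candidates['Longitude'] - lon).abs()
    return candidates.loc[d.idxmin(), 'StationName']

# Hypothetical sample data, with approximate coordinates for illustration only.
sample_weather = pd.DataFrame({
    'StationName': ['WALTHAM', 'PORTST.LUCIE'],
    'Latitude': [42.38, 27.27],
    'Longitude': [-71.24, -80.35],
})
match = station_in_window(39.46, -76.12, sample_weather)
```

Returning None when the box is empty makes the unmatched locations easy to spot afterwards, for example to retry them with a larger window.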
