How To Convert A Pandas Dataframe Into A Numpy Array With The Column Names
This must use vectorized methods, nothing iterative. I would like to create a numpy array from a pandas dataframe. My code: import pandas as pd _df = pd.DataFrame({'itme': ['book',
Solution 1:
- To do a quick search for a val by its "item" and "color", use one of the following options:
  - Use pandas Boolean indexing.
  - Convert the dataframe into a numpy.recarray using pandas.DataFrame.to_records, and also use Boolean indexing.
- .item is a method for both pandas and numpy, so don't use 'item' as a column name. It has been changed to '_item'.
- As an FYI, numpy is a pandas dependency, and much of pandas vectorized functionality directly corresponds to numpy.
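The name collision described in the list above can be seen in a minimal sketch like the following. The two-row frame demo and all variable names are illustrative assumptions, not part of the original answer. The point is only that attribute access on a recarray, or on a row Series, resolves to the built-in item method rather than to a field with that name, while key-based access still works.

```python
import pandas as pd

# Hypothetical two-row frame that deliberately uses 'item' as a column name
demo = pd.DataFrame({'item': ['book', 'car'], 'val': [-22.7, -57.19]})
rec = demo.to_records(index=False)

# Attribute access finds the ndarray.item method, not the 'item' field
attr_is_method = callable(rec.item)

# Key-based access still reaches the field
items_by_key = list(rec['item'])

# The same happens on a row Series: Series.item shadows the 'item' label
row = demo.iloc[0]
row_attr_is_method = callable(row.item)
row_value_by_key = row['item']
```

Renaming the column, as the answer does with '_item', avoids the problem and keeps the shorter attribute syntax usable.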
import pandas as pd
import numpy as np
# test data
df = pd.DataFrame({'_item': ['book', 'book' , 'car', 'car', 'bike', 'bike'], 'color': ['green', 'blue' , 'red', 'green' , 'blue', 'red'], 'val' : [-22.7, -109.6, -57.19, -11.2, -25.6, -33.61]})
# Use pandas Boolean indexing to select the matching row
selected = df[(df._item == 'book') & (df.color == 'blue')]
# print(selected)
  _item color    val
1  book  blue -109.6
# Alternatively, create a recarray
v = df.to_records(index=False)
# display(v)
rec.array([('book', 'green', -22.7 ), ('book', 'blue', -109.6 ),
('car', 'red', -57.19), ('car', 'green', -11.2 ),
('bike', 'blue', -25.6 ), ('bike', 'red', -33.61)],
dtype=[('_item', 'O'), ('color', 'O'), ('val', '<f8')])
# search the recarray
selected = v[(v._item == 'book') & (v.color == 'blue')]
# print(selected)
[('book', 'blue', -109.6)]
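As a small follow-up sketch, assuming the same test data as above, the column names survive the conversion as the field names of the record dtype, and a matching record can be reduced to a plain Python float. The names mask and value are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'_item': ['book', 'book', 'car', 'car', 'bike', 'bike'],
                   'color': ['green', 'blue', 'red', 'green', 'blue', 'red'],
                   'val': [-22.7, -109.6, -57.19, -11.2, -25.6, -33.61]})
v = df.to_records(index=False)

# Column names are stored as the field names of the record dtype
names = v.dtype.names

# Boolean mask built from two fields, then reduced to a plain Python float
mask = (v['_item'] == 'book') & (v['color'] == 'blue')
value = float(v['val'][mask][0])
```

Key-based access such as v['color'] is used here instead of v.color because it also works when a field name clashes with an existing array attribute.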
Update in response to OP edit
- You must first reshape the dataframe using pandas.DataFrame.pivot, and then use the previously mentioned methods.
dfp = df.pivot(index='_item', columns='color', values='val')
# display(dfp)
color   blue  green    red
_item
bike   -25.6    NaN -33.61
book  -109.6  -22.7    NaN
car      NaN  -11.2 -57.19
# create a numpy recarray
v = dfp.to_records(index=True)
# display(v)
rec.array([('bike', -25.6, nan, -33.61),
('book', -109.6, -22.7, nan),
('car', nan, -11.2, -57.19)],
dtype=[('_item', 'O'), ('blue', '<f8'), ('green', '<f8'), ('red', '<f8')])
# select data
selected = v.blue[(v._item == 'book')]
# print(selected)
array([-109.6])
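If a plain 2-D array is preferred over a recarray, one possible variation (an assumption on my part, not something the original answer shows) is to convert the pivoted frame with to_numpy and translate labels into integer positions with Index.get_loc. The sketch below rebuilds the same test data so that it runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'_item': ['book', 'book', 'car', 'car', 'bike', 'bike'],
                   'color': ['green', 'blue', 'red', 'green', 'blue', 'red'],
                   'val': [-22.7, -109.6, -57.19, -11.2, -25.6, -33.61]})
dfp = df.pivot(index='_item', columns='color', values='val')

# Plain float array; row and column labels stay on the pivoted frame
arr = dfp.to_numpy()

# Translate labels into integer positions, then index the array directly
row = dfp.index.get_loc('book')
col = dfp.columns.get_loc('blue')
value = arr[row, col]

# Combinations missing from the data show up as NaN, e.g. ('bike', 'green')
missing = np.isnan(arr[dfp.index.get_loc('bike'), dfp.columns.get_loc('green')])
```

This keeps the numeric data in a homogeneous array, at the cost of carrying the labels separately in dfp.index and dfp.columns.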