Pandas Dataframe Count Unique List
If a DataFrame column holds ints, floats or strings, we can get its unique values with columnName.unique(). But what if each value in the column is a list, e.g. [1, 2, 3]? How could I
Solution 1:
I think you can convert the values to tuples, and then unique() works nicely:
import pandas as pd

df = pd.DataFrame({'col':[[1,1,2],[2,1,3,3],[1,1,2],[1,1,2]]})
print (df)
            col
0     [1, 1, 2]
1  [2, 1, 3, 3]
2     [1, 1, 2]
3     [1, 1, 2]
print (df['col'].apply(tuple).unique())
[(1, 1, 2) (2, 1, 3, 3)]
L = [list(x) for x in df['col'].apply(tuple).unique()]
print (L)
[[1, 1, 2], [2, 1, 3, 3]]
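Since the title asks about counting, the same tuple conversion can probably be reused for that as well. A minimal sketch, assuming the goal is either the number of distinct lists or how often each one occurs (the names as_tuples, n_unique and counts are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col': [[1, 1, 2], [2, 1, 3, 3], [1, 1, 2], [1, 1, 2]]})

# Lists are unhashable, so convert each one to a tuple first
as_tuples = df['col'].apply(tuple)

# Number of distinct lists in the column
n_unique = as_tuples.nunique()

# Occurrences of each distinct list, most frequent first
counts = as_tuples.value_counts()

print(n_unique)
print(counts)
```

Here nunique() gives 2, and value_counts() reports three occurrences of (1, 1, 2) and one of (2, 1, 3, 3).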
Solution 2:
You cannot apply unique() to a non-hashable type such as list; you need to convert the values to a hashable type first.
A better solution, available in recent versions of pandas, is to use duplicated(), which avoids iterating over the values to convert them back to lists.
df[~df.col.apply(tuple).duplicated()]
That returns the rows holding the unique values, with the values still stored as lists.
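For reference, here is a self-contained sketch of that approach on the sample frame from Solution 1 (the intermediate names is_dup and unique_rows are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'col': [[1, 1, 2], [2, 1, 3, 3], [1, 1, 2], [1, 1, 2]]})

# True for every row whose list already appeared earlier in the column
is_dup = df['col'].apply(tuple).duplicated()

# Keep only the first occurrence of each list; 'col' still holds lists
unique_rows = df[~is_dup]

print(unique_rows['col'].tolist())
# [[1, 1, 2], [2, 1, 3, 3]]
```

The tuples are only used to build the boolean mask, so the original list objects in the column are left untouched.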