
Pandas Dataframe Count Unique List

If a column in a DataFrame holds ints, floats or strings, we can get its unique values with columnName.unique(). But what if the column holds lists, e.g. [1, 2, 3]? How could I

Solution 1:

I think you can convert the values to tuples, and then unique() works nicely:

import pandas as pd

df = pd.DataFrame({'col': [[1, 1, 2], [2, 1, 3, 3], [1, 1, 2], [1, 1, 2]]})
print(df)
            col
0     [1, 1, 2]
1  [2, 1, 3, 3]
2     [1, 1, 2]
3     [1, 1, 2]

# tuples are hashable, so unique() can compare them
print(df['col'].apply(tuple).unique())

[(1, 1, 2) (2, 1, 3, 3)]

# convert the unique tuples back to lists
L = [list(x) for x in df['col'].apply(tuple).unique()]
print(L)

[[1, 1, 2], [2, 1, 3, 3]]
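
Since the title asks about counting, the same tuple conversion can probably be reused to count the distinct lists. This is a minimal sketch, not part of the original answer; the names as_tuples, n_unique and counts are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'col': [[1, 1, 2], [2, 1, 3, 3], [1, 1, 2], [1, 1, 2]]})

# Lists are unhashable, so convert each one to a tuple first.
as_tuples = df['col'].apply(tuple)

# Number of distinct lists in the column.
n_unique = as_tuples.nunique()

# How many times each distinct list occurs (most frequent first).
counts = as_tuples.value_counts()

print(n_unique)
print(counts)
```

Here nunique() gives the number of different lists, while value_counts() gives the number of occurrences of each one.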

Solution 2:

You cannot apply unique() on a non-hashable type such as list. You need to convert to a hashable type to do that.

A better solution, using the latest version of pandas, is to use duplicated(); that way you avoid iterating over the values to convert them back to lists.

df[~df.col.apply(tuple).duplicated()]

That returns the rows holding the first occurrence of each unique value, with the values still stored as lists.
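
For completeness, here is a small self-contained sketch of the duplicated() approach on the sample data from Solution 1. The variable names mask, unique_rows and unique_lists are assumptions added for illustration:

```python
import pandas as pd

df = pd.DataFrame({'col': [[1, 1, 2], [2, 1, 3, 3], [1, 1, 2], [1, 1, 2]]})

# True for every row whose list has already appeared earlier in the column.
mask = df['col'].apply(tuple).duplicated()

# Keep only the first occurrence of each list; the column still holds lists.
unique_rows = df[~mask]

# Plain Python list of the unique lists, if that is what is needed.
unique_lists = unique_rows['col'].tolist()

print(unique_rows)
print(len(unique_rows))
```

The tuples are only used to build the boolean mask, so the original list objects are never converted back and forth.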
