Pandas DataFrame: Removing the First Row of Every Number

December 15, 2022

So, basically, I have a DataFrame whose first column looks like this: #1 #2 #2 #3 #3 #3 #3 #4 #4 #5. As you can see, the first column consists of randomly repeated numbers, and I want to remove the first row of every number.

Solution 1:

Let's assume you have a DataFrame named df with two columns.

Setup

import numpy as np
import pandas as pd

col1 = """#1
#2
#2
#3
#3
#3
#3
#4
#4
#5""".splitlines()

df = pd.DataFrame(dict(col1=col1, col2=3.14))

df

  col1  col2
0   #1  3.14
1   #2  3.14
2   #2  3.14
3   #3  3.14
4   #3  3.14
5   #3  3.14
6   #3  3.14
7   #4  3.14
8   #4  3.14
9   #5  3.14

Solution
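We can use NumPy's unique function with return_index=True. Besides the sorted unique values, it returns the position of the first occurrence of each unique value. Because those are positions rather than index labels, we pass them through df.index to get the labels, drop those rows, and finally remove the leading # with str[1:].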

_, i = np.unique(df.col1.values, return_index=True)
df.drop(df.index[i]).assign(col1=lambda d: d.col1.str[1:])

  col1  col2
2    2  3.14
4    3  3.14
5    3  3.14
6    3  3.14
8    4  3.14
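To see exactly what np.unique returns here, and why the answer goes through df.index before calling drop, here is a small sketch. It assumes a non-default index (labels 10 to 19, chosen only for illustration) so that positions and labels differ.

```python
import numpy as np
import pandas as pd

# Same values as the Setup, but with a hypothetical non-default index (10..19)
# so that 0-based positions and index labels are no longer the same numbers.
col1 = ["#1", "#2", "#2", "#3", "#3", "#3", "#3", "#4", "#4", "#5"]
df = pd.DataFrame({"col1": col1, "col2": 3.14}, index=range(10, 20))

# return_index=True also returns the position of the first occurrence
# of each (sorted) unique value.
values, first_pos = np.unique(df["col1"].to_numpy(), return_index=True)
print(values.tolist())     # ['#1', '#2', '#3', '#4', '#5']
print(first_pos.tolist())  # [0, 1, 3, 7, 9]

# DataFrame.drop works on labels, so map the positions to labels first.
labels = df.index[first_pos]
print(list(labels))        # [10, 11, 13, 17, 19]

result = df.drop(labels)
print(result)
```

Passing first_pos straight to drop would only work by coincidence on a default RangeIndex; converting through df.index keeps the approach correct for any index.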

Solution 2:

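Use duplicated with boolean indexing. By default (keep='first') it marks every occurrence after the first as True, so the first row of each number is filtered out. Then remove the # either by position with str[1:] or with str.strip('#'):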

print (df)
    a
0  #1
1  #2
2  #2
3  #3
4  #3
5  #3
6  #3
7  #4
8  #4
9  #5

s = df.loc[df['a'].duplicated(), 'a'].str[1:]
print(s)
2    2
4    3
5    3
6    3
8    4
Name: a, dtype: object

Or:

s = df.loc[df['a'].duplicated(), 'a'].str.strip('#')
print(s)
2    2
4    3
5    3
6    3
8    4
Name: a, dtype: object

Detail:

print (df['a'].duplicated())
0    False
1    False
2     True
3    False
4     True
5     True
6     True
7    False
8     True
9    False
Name: a, dtype: bool
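The mask above can be cross-checked against a different technique, groupby(...).cumcount(), which numbers each row within its group starting at 0. This is a swapped-in alternative, not part of the original answer; it is shown only as a sketch confirming that duplicated() flags exactly the rows whose group counter is above 0.

```python
import pandas as pd

df = pd.DataFrame({"a": ["#1", "#2", "#2", "#3", "#3", "#3", "#3", "#4", "#4", "#5"]})

# Default keep="first": every occurrence except the first is True.
mask = df["a"].duplicated()

# Alternative view: number rows within each group (0, 1, 2, ...);
# a counter of 0 means "first occurrence", i.e. the row that gets dropped.
mask_cumcount = df.groupby("a").cumcount() > 0

print(mask.tolist() == mask_cumcount.tolist())  # True

# Inverting the mask selects the first occurrences instead.
print(df.loc[~mask, "a"].tolist())  # ['#1', '#2', '#3', '#4', '#5']
```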

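EDIT: to keep the result as a DataFrame (all columns, not only a), filter the rows first and then strip the #. Adding .copy() makes the filtered frame an explicit independent copy, which avoids a possible SettingWithCopyWarning on the following assignment:

df = df[df['a'].duplicated()].copy()
df['a'] = df['a'].str.strip('#')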
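A self-contained sketch of the same DataFrame-preserving idea follows. It adds a hypothetical second column b (not in the original question) only to show that other columns survive the filter. Instead of assigning into the filtered frame, it uses assign, as Solution 1 does, which returns a new DataFrame and so sidesteps chained assignment altogether.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["#1", "#2", "#2", "#3", "#3", "#3", "#3", "#4", "#4", "#5"],
    "b": list(range(10)),  # hypothetical extra column, to show it is preserved
})

# Keep every row except the first occurrence of each value in "a",
# then strip the leading '#' in a new DataFrame returned by assign().
out = df[df["a"].duplicated()].assign(a=lambda d: d["a"].str.strip("#"))
print(out)
```

Note that str.strip('#') removes # characters from both ends of each string, while str[1:] always drops exactly the first character; for values like #3 both give the same result.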
