Pandas DataFrame: Removing the First Row of Each Number
Basically, I have a data frame whose first column looks like this: #1 #2 #2 #3 #3 #3 #3 #4 #4 #5 As you can see, the first column consists of numbers that each repeat an arbitrary number of times, and I want to remove the first row of each number.
Solution 1:
Let's assume you have a two-column dataframe named df.
Setup
import numpy as np
import pandas as pd

col1 = """#1
#2
#2
#3
#3
#3
#3
#4
#4
#5""".splitlines()
df = pd.DataFrame(dict(col1=col1, col2=3.14))
df
col1 col2
0 #1 3.14
1 #2 3.14
2 #2 3.14
3 #3 3.14
4 #3 3.14
5 #3 3.14
6 #3 3.14
7 #4 3.14
8 #4 3.14
9 #5 3.14
Solution
We can use NumPy's unique function with return_index set to True. What that does is return the position of the first instance of each unique value. We then use those positions to identify the index values and drop them.
_, i = np.unique(df.col1.values, return_index=True)
df.drop(df.index[i]).assign(col1=lambda d: d.col1.str[1:])
col1 col2
2 2 3.14
4 3 3.14
5 3 3.14
6 3 3.14
8 4 3.14
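To make the role of return_index concrete, here is a minimal sketch that rebuilds the sample data from the setup above and prints what np.unique hands back before anything is dropped. The names values, first_pos and remaining are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same sample data as in the setup above
col1 = ["#1", "#2", "#2", "#3", "#3", "#3", "#3", "#4", "#4", "#5"]
df = pd.DataFrame({"col1": col1, "col2": 3.14})

# return_index=True additionally returns, for each unique value,
# the position of its first occurrence in the original array
values, first_pos = np.unique(df["col1"].to_numpy(), return_index=True)
print(values.tolist())     # ['#1', '#2', '#3', '#4', '#5']
print(first_pos.tolist())  # [0, 1, 3, 7, 9]

# Translate positions into index labels and drop those rows
remaining = df.drop(df.index[first_pos])
print(remaining.index.tolist())  # [2, 4, 5, 6, 8]
```

Note that values that occur only once (#1 and #5 here) disappear entirely, because their only row is also their first row.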
Solution 2:
Use duplicated with boolean indexing, then remove the # either by position with str[1:] or with str.strip:
print (df)
a
0 #1
1 #2
2 #2
3 #3
4 #3
5 #3
6 #3
7 #4
8 #4
9 #5
# assign to a new name so df stays intact for the examples that follow
s = df.loc[df['a'].duplicated(), 'a'].str[1:]
print (s)
2 2
4 3
5 3
6 3
8 4
Name: a, dtype: object
Or:
# same selection, but strip the '#' character instead of slicing by position
s = df.loc[df['a'].duplicated(), 'a'].str.strip('#')
print (s)
2 2
4 3
5 3
6 3
8 4
Name: a, dtype: object
Detail:
print (df['a'].duplicated())
0 False
1 False
2 True
3 False
4 True
5 True
6 True
7 False
8 True
9 False
Name: a, dtype: bool
EDIT:
# .copy() avoids a SettingWithCopyWarning when the column is reassigned
df = df[df['a'].duplicated()].copy()
df['a'] = df['a'].str.strip('#')
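For convenience, the duplicated approach can be wrapped up as a self-contained sketch. The helper name drop_first_occurrences is hypothetical and not from the original answer, and the sketch assumes the column holds strings with a leading #.

```python
import pandas as pd

def drop_first_occurrences(frame, column):
    """Hypothetical helper: drop the first row of each distinct value in `column`."""
    # duplicated() defaults to keep='first', so the first occurrence of
    # each value is False and every later repeat is True
    mask = frame[column].duplicated()
    out = frame[mask].copy()  # copy so the caller's frame is not modified
    out[column] = out[column].str.lstrip("#")  # remove the leading '#'
    return out

df = pd.DataFrame({"a": ["#1", "#2", "#2", "#3", "#3", "#3", "#3", "#4", "#4", "#5"]})
result = drop_first_occurrences(df, "a")
print(result)
```

Like np.unique, duplicated looks at the whole column rather than at consecutive runs, so a number that reappears later in the column after other values would not have its second run's first row removed.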