
How To Save Words In A Csv File Tokenized From Articles With Sentence Id Number?

I am trying to extract all the words from articles stored in a CSV file and write each sentence's id number and the words it contains to a new CSV file. What I have tried so far: import pandas as pd

Solution 1:

You just need to iterate through the words of each sentence and write a new line for each one.

The result is going to be a bit unpredictable, since commas also show up as "words" in the token list. You might want to consider another delimiter, or strip the commas from your word list.

EDIT: This seems like a slightly cleaner way to do it.

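import pandas as pd
from nltk.tokenize import sent_tokenize, word_tokenize

df = pd.read_csv(r"D:\data.csv", nrows=10)

stcNum = 1
# newline='' stops Python from translating the explicit '\r\n' a second time
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
  for article in df['articles']:
    for stc in sent_tokenize(article):
      # word_tokenize splits the sentence into words and punctuation
      for i, word in enumerate(word_tokenize(stc)):
        prntLine = ','
        if i == 0:
          # only the first word of a sentence carries the sentence id
          prntLine = str(stcNum) + prntLine
        prntLine = prntLine + word + '\r\n'
        f.write(prntLine)
      stcNum += 1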

output.csv:

1,The
,ultimate
,productivity
,hack
,is
,saying
,no
,.
2,Not
,doing
,something
,will
,always
,be
,faster
,than
,doing
,it
,.
3,This
,statement
,reminds
,me
,of
,the
,old
,computer
,programming
,saying
,,     # <<< Most CSV parsers will see this as 3 empty columns
,“
,Remember
,that
,there
,is
,no
,code
,faster
,than
,no
,code
,.
,”
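
One way to avoid the bare-comma row flagged in the output above is to let Python's csv module do the quoting instead of building each line by hand. The following is only a minimal sketch: the helper name write_sentence_words and the sample token lists are made up for illustration, and it assumes the sentences have already been split into words (for example with word_tokenize).

```python
import csv
import io

def write_sentence_words(sentences, out):
    """Write one (sentence_id, word) row per token.

    `sentences` is a list of token lists (hypothetical input, e.g. one
    word_tokenize result per sentence). `out` is a text file-like object.
    """
    # The default dialect quotes any field that contains the delimiter,
    # so a "," token is written as a quoted field.
    writer = csv.writer(out)
    for stc_num, words in enumerate(sentences, start=1):
        for i, word in enumerate(words):
            # Same layout as above: the id appears only on the first word.
            writer.writerow([stc_num if i == 0 else "", word])

# Hypothetical pre-tokenized input, including a comma token.
sample = [["The", "ultimate", "hack", "."], ["saying", ",", "no"]]
buf = io.StringIO()
write_sentence_words(sample, buf)

# Reading the text back shows the comma token survives as a word.
rows = list(csv.reader(io.StringIO(buf.getvalue())))
```

When writing to a real file, open it with newline='' so the csv module controls the line endings. If you would rather switch delimiters than rely on quoting, csv.writer also accepts a delimiter argument such as delimiter='\t'.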
