Comparing Two Text Files and Finding the Related Words in Python
I have two text files named search.txt and log.txt, which contain data like the following.
search.txt
19:00:15 , mouse , FALSE
19:00:15 , branded luggage bags and trolley , TRUE
Solution 1:
While it is still unclear what you mean by a "partial query", the code below can handle it: you only need to define what counts as a partial query in the function filter_out_common_queries. E.g. if you are looking for an exact match of the query in search.txt, you could replace # add your logic here by return [' '.join(querylist), ].
import datetime as dt
from collections import defaultdict

def filter_out_common_queries(querylist):
    # add your logic here
    return querylist

# Map each logged query to the timestamps at which it occurred.
queries_time = defaultdict(list)  # personally, I'd use 'set' as the default factory
with open('log.txt') as f:
    for line in f:
        fields = [x.strip() for x in line.split(',')]
        timestamp = dt.datetime.strptime(fields[0], "%H:%M:%S")
        queries_time[fields[1]].append(timestamp)

with open('search.txt') as inputf, open('search_output.txt', 'w') as outputf:
    for line in inputf:
        fields = [x.strip() for x in line.split(',')]
        timestamp = dt.datetime.strptime(fields[0], "%H:%M:%S")
        # "adidas watches for men" -> "adidas" "watches" "for" "men".
        # "for" is a very generic keyword; you would do well to filter these out.
        queries = filter_out_common_queries(fields[1].split())
        results = []
        for q in queries:
            poss_timestamps = queries_time[q]
            for ts in poss_timestamps:
                # Keep log entries from the 15 seconds up to the search time.
                if timestamp - dt.timedelta(seconds=15) <= ts <= timestamp:
                    results.append(q)
        outputf.write(line.strip() + " - {}\n".format(results))
Output based on your input data:
19:00:15 , mouse , FALSE - []
19:00:15 , branded luggage bags and trolley , TRUE - []
19:00:15 , Leather shoes for men , FALSE - []
19:00:15 , printers , TRUE - []
19:00:16 , adidas watches for men , TRUE - ['adidas', 'adidas', 'adidas', 'adidas', 'adidas', 'adidas']
19:00:16 , Mobile Charger Stand/Holder black , FALSE - ['black']
19:00:16 , watches for men , TRUE - []
Note that a match for 'black' in "Mobile Charger Stand/Holder black" was found. That's because the code above looks up each word of the query separately.
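Because every word is looked up on its own, generic words such as "for" can also produce matches. Below is a minimal sketch of a filter_out_common_queries that drops such words. The name STOP_WORDS and its contents are assumptions made for illustration; they are not part of the original answer.

```python
# Hypothetical stop-word set; extend it to suit your own data.
STOP_WORDS = {"for", "and", "the", "of", "in"}


def filter_out_common_queries(querylist):
    """Return the words of the query that are not generic stop words."""
    return [word for word in querylist if word.lower() not in STOP_WORDS]


words = filter_out_common_queries("adidas watches for men".split())
print(words)  # ['adidas', 'watches', 'men']
```

This version can be dropped into the script above in place of the placeholder function; the rest of the code stays the same.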
Edit: To implement your comment, you would redefine filter_out_common_queries like so:
def filter_out_common_queries(querylist):
    basequery = ' '.join(querylist)
    querylist = []
    # Collect every prefix of the full query, from 2 characters upwards.
    for n in range(2, len(basequery) + 1):
        querylist.append(basequery[:n])
    return querylist
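To see what this redefinition produces, here is a short sketch that calls it on a sample query. The function returns every prefix of the joined query, from two characters up to the full string, so a log entry matches only if it is equal to one of those prefixes. The sample query is chosen for illustration.

```python
def filter_out_common_queries(querylist):
    """Return every prefix (length >= 2) of the space-joined query."""
    basequery = ' '.join(querylist)
    return [basequery[:n] for n in range(2, len(basequery) + 1)]


prefixes = filter_out_common_queries("adidas watches".split())
print(prefixes[:3])          # ['ad', 'adi', 'adid']
print(prefixes[-1])          # adidas watches
print('adidas' in prefixes)  # True
```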
Solution 2:
- Read the log.txt file and count all keywords in it by using the split() method and the collections module. Target the second field of each line of the log file.
- Now we have all keywords with their counts.
- Read the search.txt file line by line.
- Get the target text from each line, i.e. the second field, by splitting on ",".
- Use filter and lambda to find the logged keywords that occur in the text selected in the previous step.
- Get the count value from our dictionary and use string formatting and the join method to create the new line as required.
- Write the created line into the new file.
Code:
from collections import Counter

p1 = "/home/infogrid/Desktop/search.txt"
p2 = "/home/infogrid/Desktop/log.txt"
p3 = "/home/infogrid/Desktop/search_output.txt"

# Count how often each keyword (second field) occurs in the log file.
cnt = Counter()
with open(p2) as fp:
    for i in fp:
        cnt[i.split(",")[1].strip()] += 1

search_keys = cnt.keys()
# Files are opened in text mode so that split(",") works on str.
with open(p1) as fp:
    with open(p3, "w") as fp3:
        for i in fp:
            i = i.strip()
            tmp = i.split(",")[1].strip()
            # Keep the log keywords that occur as substrings of the search text.
            tmp1 = filter(lambda x: x in tmp, search_keys)
            fp3.write("%s - [%s]\n" %
                      (i, ",".join([",".join([j] * cnt[j]) for j in tmp1])))
Output:
19:00:15 , mouse , FALSE - []
19:00:15 , branded luggage bags and trolley , TRUE - []
19:00:15 , Leather shoes for men , FALSE - []
19:00:15 , printers , TRUE - []
19:00:16 , adidas watches for men , TRUE - [adidas,adidas,adidas,adidas,adidas]
19:00:16 , Mobile Charger Stand/Holder black , FALSE - []
19:00:16 , watches for men , TRUE - []
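The counting and substring steps can also be tried without any files. The sketch below uses small in-memory lists in place of log.txt and search.txt. The log lines and their layout are hypothetical sample data, and a list comprehension is used in place of filter and lambda.

```python
from collections import Counter

# Hypothetical sample data standing in for log.txt and search.txt;
# the layout of the log lines is an assumption.
log_lines = [
    "19:00:11 , adidas",
    "19:00:12 , adidas",
    "19:00:13 , black",
]
search_lines = [
    "19:00:15 , mouse , FALSE",
    "19:00:16 , adidas watches for men , TRUE",
]

# Count how often each keyword (second field) occurs in the log.
cnt = Counter(line.split(",")[1].strip() for line in log_lines)

output = []
for line in search_lines:
    text = line.split(",")[1].strip()
    # Keep the log keywords that occur as a substring of the search text.
    found = [key for key in cnt if key in text]
    # Repeat each keyword as many times as it was logged.
    repeated = ",".join(",".join([key] * cnt[key]) for key in found)
    output.append("%s - [%s]" % (line, repeated))

print("\n".join(output))
```

Unlike Solution 1, this approach does not look at the timestamps, so every occurrence of a keyword in the log is counted regardless of when it was logged.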
Note: Try it yourself first.