
Comparing Two Text Files And Finding The Related Word In Python

I have two text files named search.txt and log.txt, which contain data like the following.

search.txt:

19:00:15 , mouse , FALSE
19:00:15 , branded luggage bags and trolley , TRUE

Solution 1:

It is still unclear what you mean by a "partial query", but the code below can handle it: define what counts as a partial query inside the function filter_out_common_queries. For example, if you are looking for an exact match of the query in search.txt, you could replace # add your logic here with return [' '.join(querylist)].

import datetime as dt
from collections import defaultdict

def filter_out_common_queries(querylist):
    # add your logic here
    return querylist

queries_time = defaultdict(list)  # personally, I'd use 'set' as the default factory
with open('log.txt') as f:
    for line in f:
        fields = [ x.strip() for x in line.split(',') ]
        timestamp = dt.datetime.strptime(fields[0], "%H:%M:%S")
        queries_time[fields[1]].append(timestamp)  

with open('search.txt') as inputf, open('search_output.txt', 'w') as outputf:
    for line in inputf:
        fields = [ x.strip() for x in line.split(',') ]
        timestamp = dt.datetime.strptime(fields[0], "%H:%M:%S")
        # "adidas watches for men" -> "adidas", "watches", "for", "men".
        # "for" is a very generic keyword; you would do well to filter such words out.
        queries = filter_out_common_queries(fields[1].split())
        results = []
        for q in queries:
            poss_timestamps = queries_time[q]
            for ts in poss_timestamps:
                if timestamp - dt.timedelta(seconds=15) <= ts <= timestamp:
                    results.append(q)
        outputf.write(line.strip() + " - {}\n".format(results))

Output based on your input data:

19:00:15  , mouse , FALSE - []
19:00:15  , branded luggage bags and trolley , TRUE - []
19:00:15  , Leather shoes for men , FALSE - []
19:00:15  , printers , TRUE - []
19:00:16  , adidas watches for men , TRUE - ['adidas', 'adidas', 'adidas', 'adidas', 'adidas', 'adidas']
19:00:16  , Mobile Charger Stand/Holder black , FALSE - ['black']
19:00:16  , watches for men , TRUE - []

Note that a match for 'black' in "Mobile Charger Stand/Holder black" was found. That's because the code above looks up each word of the query separately.
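To drop generic words such as "for" before the lookup, filter_out_common_queries could be redefined along the following lines. This is only a sketch: the STOP_WORDS set is a hypothetical, hand-picked list that is not part of the original answer, and you would adapt it to your own data.

```python
# Hypothetical stop-word list; not from the original answer, adjust as needed.
STOP_WORDS = {"for", "and", "the", "of", "in"}

def filter_out_common_queries(querylist):
    # Keep only the words that are not generic (case-insensitive comparison).
    return [word for word in querylist if word.lower() not in STOP_WORDS]

print(filter_out_common_queries("adidas watches for men".split()))
```

With this definition, a query such as "adidas watches for men" is looked up as "adidas", "watches" and "men" only.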

Edit: To implement your comment, you would redefine filter_out_common_queries like so:

def filter_out_common_queries(querylist):
    basequery = ' '.join(querylist)
    querylist = []
    for n in range(2, len(basequery) + 1):
        querylist.append(basequery[:n])
    return querylist
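To see what this version produces, here is a small self-contained check on a short query. The sample query is made up for illustration, and the function uses the same logic as the one above, written as a list comprehension.

```python
def filter_out_common_queries(querylist):
    # Build every prefix (length 2 and up) of the full query string.
    basequery = ' '.join(querylist)
    return [basequery[:n] for n in range(2, len(basequery) + 1)]

prefixes = filter_out_common_queries("adidas watches".split())
print(prefixes[:5])   # shortest prefixes come first
print(prefixes[-1])   # the full query is the last entry
```

Because each prefix is then used as a key into queries_time, a log entry matches only if it is exactly equal to one of these prefixes.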

Solution 2:

  1. Read log.txt and count every keyword in it, using the split() method and the collections module. The keyword is the second comma-separated field of each line.
  2. We now have every keyword together with its count.
  3. Read search.txt line by line.
  4. Get the target text from each line, i.e. the second field obtained by splitting on ",".
  5. Use filter and a lambda to find the log keywords that occur in the target text from step 4.
  6. Look up each matching keyword's count in the counter, then use string formatting and the join method to build the output line in the required format.
  7. Write the resulting line to the new file.

Code:

from collections import Counter

p1 = "/home/infogrid/Desktop/search.txt"
p2 = "/home/infogrid/Desktop/log.txt"
p3 = "/home/infogrid/Desktop/search_output.txt"

# Count how often each keyword (second comma-separated field) occurs in the log.
cnt = Counter()
with open(p2, "r") as fp:
    for line in fp:
        cnt[line.split(",")[1].strip()] += 1
search_keys = cnt.keys()

with open(p1, "r") as fp, open(p3, "w") as fp3:
    for line in fp:
        line = line.strip()
        tmp = line.split(",")[1].strip()
        # Keep the log keywords that occur as a substring of the search query.
        tmp1 = filter(lambda x: x in tmp, search_keys)
        # Repeat each matching keyword as many times as it appears in the log.
        fp3.write("%s - [%s]\n" %
                  (line, ",".join([",".join([j] * cnt[j]) for j in tmp1])))

Output:

19:00:15  , mouse , FALSE - []
19:00:15  , branded luggage bags and trolley , TRUE - []
19:00:15  , Leather shoes for men , FALSE - []
19:00:15  , printers , TRUE - []
19:00:16  , adidas watches for men , TRUE - [adidas,adidas,adidas,adidas,adidas]
19:00:16  , Mobile Charger Stand/Holder black , FALSE - []
19:00:16  , watches for men , TRUE - []

Note: Try it yourself first.
