Improving Performance Of A Function In Python
Solution 1:
Your idtiles appear to be in a certain order. That is, the sample data suggests that once you traverse a certain idtile and hit the next one, there is no chance that a line with that idtile will show up again. If this is the case, you may break the for loop once you finish dealing with the idtile you want and hit a different one. Off the top of my head:
loopkiller = False
for line in open(name, mode="r"):
    element = line.split()
    if (int(element[0]), int(element[1])) == idtile:
        lst.append(element[2:])
        dy, dx = int(element[0]), int(element[1])
        loopkiller = True
    elif loopkiller:
        break
This way, once you are done with a certain 'idtile', you stop; whereas in your example, you keep on reading until the end of the file.
If your idtiles appear in a random order, maybe you could try writing an ordered version of your file first.
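A minimal sketch of such a pre-sorting step, assuming the whole file fits in memory and the first two whitespace-separated fields of each line are the tile indices (the function name is made up for illustration):

def write_sorted_copy(name, sorted_name):
    # Sort all lines by (tilex, tiley) so each idtile forms one contiguous block.
    with open(name) as src:
        lines = src.readlines()
    lines.sort(key=lambda line: (int(line.split()[0]), int(line.split()[1])))
    with open(sorted_name, "w") as dst:
        dst.writelines(lines)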
Also, evaluating the digits of your idtiles separately may help you traverse the file faster. Supposing your idtile is a two-tuple of a one-digit and a three-digit integer, perhaps something along the lines of:
for line in open(name, mode="r"):
    element = line.split()
    if int(element[0][0]) == idtile[0]:
        if element[1][0] == str(idtile[1])[0]:
            if element[1][1] == str(idtile[1])[1]:
                if element[1][2] == str(idtile[1])[2]:
                    dy, dx = int(element[0]), int(element[1])
                else:
                    go_forward(walk)    # pseudocode: skip ahead a little
            else:
                go_forward(run)         # pseudocode: skip ahead further
        else:
            go_forward(sprint)          # pseudocode: skip ahead even further
    else:
        go_forward(warp)                # pseudocode: skip ahead the most
Solution 2:
I would suggest comparing the time used by your full reading procedure against the time for just reading lines and doing nothing with them. If those times are close, the only thing you can really do is change your approach (splitting your files etc.), because what you can probably optimize is data processing time, not file reading time.
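A rough sketch of that comparison with the standard time module (here file_filter stands in for the full procedure from the question; the file name and tile id are placeholders):

import time

def time_raw_read(name):
    # Baseline: read every line and do nothing with it.
    start = time.perf_counter()
    with open(name) as f:
        for line in f:
            pass
    return time.perf_counter() - start

start = time.perf_counter()
file_filter("points.txt", (0, 273))   # hypothetical call to the full procedure
full_time = time.perf_counter() - start
print(time_raw_read("points.txt"), full_time)

If the two numbers are close, file reading dominates and restructuring the data (as in the other answers) is the only real win.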
I also see two things in your code that are worth fixing:
with open(name) as f:
    for line in f:
        pass  # here goes the loop body

Use with to explicitly close your file. Your solution should work in CPython, but that depends on the implementation and may not always be as effective.

You perform the transformation of a string to int twice. It is a relatively slow operation. Remove the second one by reusing the result.
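A sketch of the loop with both fixes applied, reusing the function and variable names that appear in the other answers (file_filter, lst, dy, dx); the exact body of the original function is assumed:

def file_filter(name, idtile):
    lst = []
    dy, dx = None, None
    with open(name) as f:              # the file is closed even if an exception occurs
        for line in f:
            element = line.split()
            tile = (int(element[0]), int(element[1]))   # convert once ...
            if tile == idtile:
                lst.append(element[2:])
                dy, dx = tile                            # ... and reuse the result
    return lst, dy, dx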
P.S. It looks like an array of depth or height values for a set of points on the Earth's surface, and the surface is split into tiles. :-)
Solution 3:
I suggest you change your code so that you read the big file once and write (temporary) files for each tile id. Something like:
def create_files(name, idtiles=None):
    files = {}
    for line in open(name):
        elements = line.split()
        idtile = (int(elements[0]), int(elements[1]))
        if idtiles is not None and idtile not in idtiles:
            continue
        if idtile not in files:
            files[idtile] = open("tempfile_{}_{}".format(elements[0], elements[1]), "w")
        files[idtile].write(line)
    for f in files.values():
        f.close()
    return files
create_files() will return a {(tilex, tiley): fileobject} dictionary.
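A quick usage sketch (the input file name and the tile id are placeholders):

tile_files = create_files("points.txt")
# The returned file objects were opened for writing and are closed at this point,
# so reopen one by name to read a single tile's lines back in.
with open(tile_files[(0, 273)].name) as f:
    tile_lines = f.readlines()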
A variant that closes the files after writing each line, to work around the "Too many open files" error. This variant returns a {(tilex, tiley): filename} dictionary and will probably be a bit slower.
def create_files(name, idtiles=None):
    files = {}
    for line in open(name):
        elements = line.split()
        idtile = (int(elements[0]), int(elements[1]))
        if idtiles is not None and idtile not in idtiles:
            continue
        filename = "tempfile_{}_{}".format(elements[0], elements[1])
        files[idtile] = filename
        with open(filename, "a") as f:
            f.write(line)
    return files
Solution 4:
My solution is to split the large text file into many small binary files, one for each idtile. To read the text file faster, you can use pandas:
import pandas as pd
import numpy as np

n = 400000  # read n rows as one block
for df in pd.read_table(large_text_file, sep=" ", comment=",", header=None, chunksize=n):
    for key, g in df.groupby([0, 1]):
        fn = "%d_%d.tmp" % key
        with open(fn, "ab") as f:
            data = g.loc[:, 2:5].values  # the four value columns (labels 2 through 5)
            data.tofile(f)
Then you can get the content of one binary file by:
np.fromfile("0_273.tmp").reshape(-1, 4)
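Note that np.fromfile assumes float64 by default; if pandas parsed the value columns as integers, the matching dtype has to be passed explicitly, for example (a guess at the dtype, only needed in that case):

np.fromfile("0_273.tmp", dtype=np.int64).reshape(-1, 4)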
Solution 5:
You can avoid doing the split() and int() on every line by doing a string comparison instead:
def file_filter(name, idtile):
    lst = []
    id_str = "%d %d " % idtile
    with open(name) as f:
        for line in f:
            if line.startswith(id_str):
                element = line.split()  # only split the lines that match the tile
                lst.append(element[2:])
                dy, dx = int(element[0]), int(element[1])
    return lst, dy, dx
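A quick usage sketch (the file name and tile id are placeholders):

lst, dy, dx = file_filter("points.txt", (0, 273))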