Split File After X Lines At Blank Line
Solution 1:
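from itertools import groupby

# myfile is the path of the file to split
with open(myfile, 'r') as f:
    # bool() makes every run of non-blank lines share one key (True)
    chunks = [[x.strip() for x in v] for k, v in
              groupby(f, lambda x: bool(x.strip())) if k]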
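To see what the grouping does without touching the file system, here is a minimal sketch that runs the same idea over an in-memory string. The sample text and the helper name split_on_blank_lines are made up for illustration. The key function is wrapped in bool() so that every run of consecutive non-blank lines forms one group.

```python
from io import StringIO
from itertools import groupby


def split_on_blank_lines(lines):
    """Group consecutive non-blank lines into chunks (hypothetical helper)."""
    # The key is True for non-blank lines and False for blank ones, so
    # groupby yields alternating runs; only the non-blank runs are kept.
    return [
        [line.strip() for line in group]
        for is_text, group in groupby(lines, key=lambda line: bool(line.strip()))
        if is_text
    ]


sample = StringIO("a\nb\n\nc\n\n\nd\ne\n")  # illustrative data only
chunks = split_on_blank_lines(sample)
print(chunks)  # [['a', 'b'], ['c'], ['d', 'e']]
```

Note that this splits at every blank line (or run of blank lines). It does not take a minimum number of lines per chunk into account.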
Solution 2:
If you want to write each chunk to its own file (chunk1.txt ... chunkN.txt), you could do it like this:
def chunk_file(name, lines_per_chunk, chunks_per_file):

    def write_chunk(chunk_no, chunk):
        with open("chunk{}.txt".format(chunk_no), "w") as outfile:
            outfile.write("".join(chunk))

    count, chunk_no, chunk_count, chunk = 1, 1, 0, []
    with open(name, "r") as f:
        for row in f:
            # A blank line ends a chunk only after the minimum size is reached
            if count > lines_per_chunk and row == "\n":
                chunk_count += 1
                count = 1
                chunk.append("\n")
                if chunk_count == chunks_per_file:
                    write_chunk(chunk_no, chunk)
                    chunk = []
                    chunk_count = 0
                    chunk_no += 1
            else:
                count += 1
                chunk.append(row)
    # Write whatever is left after the last blank line
    if chunk:
        write_chunk(chunk_no, chunk)

chunk_file("test.txt", 3, 1)
You have to specify how many lines belong to a chunk; a blank line is expected only after that many lines.
Say you want to chunk this file:
Some Data belonging to chunk 1

Some Data belonging to chunk 1
Some Data belonging to chunk 1
Some Data belonging to chunk 1
Some Data belonging to chunk 1
Some Data belonging to chunk 1

More Data, belonging to chunk 2
More Data, belonging to chunk 2
More Data, belonging to chunk 2
The first chunk differs strongly in line count from the second chunk (7 lines vs. 3 lines).
The output for this example would be chunk1.txt:
Some Data belonging to chunk 1

Some Data belonging to chunk 1
Some Data belonging to chunk 1
Some Data belonging to chunk 1
Some Data belonging to chunk 1
Some Data belonging to chunk 1
And chunk2.txt:
More Data, belonging to chunk 2
More Data, belonging to chunk 2
More Data, belonging to chunk 2
This approach treats lines_per_chunk as a minimum chunk size, so it works even if the chunks have different line counts. A blank line only ends a chunk once the minimum chunk size has been reached. In the example above, the blank line on line 2 is no problem, since the minimum chunk size has not been reached yet. If a blank line occurred on line 4 and the chunk's data continued afterwards, there would be a problem, because the given criteria (line counts and blank lines) alone could not identify the chunks.
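The minimum-size rule can be tried out in memory before writing any files. The following is only a sketch under stated assumptions: iter_chunks is a hypothetical name, it yields each chunk as a list of lines instead of writing chunkN.txt, it covers only the one-chunk-per-file case, and unlike chunk_file it drops the separating blank line and also accepts whitespace-only lines as blank.

```python
def iter_chunks(lines, min_lines):
    """Yield chunks (lists of lines) split at blank lines.

    A blank line only ends a chunk once the chunk already holds at
    least min_lines lines; earlier blank lines stay inside the chunk.
    """
    chunk = []
    for line in lines:
        if len(chunk) >= min_lines and line.strip() == "":
            # Minimum size reached: the blank line acts as a separator
            # and is not stored (an assumption of this sketch).
            yield chunk
            chunk = []
        else:
            chunk.append(line)
    if chunk:
        # Remaining lines after the last separator form the final chunk.
        yield chunk


# Illustrative data: a blank line on line 2, the real separator on line 8.
text = "a1\n\na2\na3\na4\na5\na6\n\nb1\nb2\nb3\n"
chunks = list(iter_chunks(text.splitlines(keepends=True), min_lines=3))
print([len(c) for c in chunks])  # [7, 3]
```

With min_lines=3, an input such as "a1\na2\na3\n\na4\n" would be split after the third line even if a4 was meant to belong to the same chunk, which is the problematic case described above.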