
How To Split A List Into Sublists Based On A Separator, Similar To Str.split()?

Given a list like [a, SEP, b, c, SEP, SEP, d], how do I split it into a list of sublists, [[a], [b, c], [], [d]]? Effectively, I need an equivalent of str.split() for lists. I can

Solution 1:

A simple generator will work for all of the cases in your question:

def split(sequence, sep):
    chunk = []
    for val in sequence:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    yield chunk
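
As a quick check, here is a minimal sketch that runs the generator on two of the inputs from this thread and on a generator input. The function is repeated only so the snippet runs on its own; the numeric example is an illustrative assumption, not something from the question.

```python
def split(sequence, sep):
    """Same generator as above, repeated so this snippet is self-contained."""
    chunk = []
    for val in sequence:
        if val == sep:
            yield chunk
            chunk = []
        else:
            chunk.append(val)
    yield chunk


# The generator is lazy, so wrap it in list() to see all chunks at once.
print(list(split(["a", "SEP", "b", "c", "SEP", "SEP", "d"], "SEP")))
# [['a'], ['b', 'c'], [], ['d']]
print(list(split(["a", "SEP", "SEP", "SEP"], "SEP")))
# [['a'], [], [], []]

# Any iterable works, including another generator (made-up numeric input).
print(list(split((n for n in [1, 0, 2, 3, 0]), 0)))
# [[1], [2, 3], []]
```

Because it only iterates once and never calls len() or indexes, this version also accepts inputs that cannot be rewound.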

Solution 2:

My first ever Python program :)

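from pprint import pprint

my_array = ["a", "SEP", "SEP", "SEP"]
my_temp = []
my_final = []
for item in my_array:
    if item != "SEP":
        my_temp.append(item)
    else:
        # A separator closes the current sublist.
        my_final.append(my_temp)
        my_temp = []
# Keep whatever follows the last separator (possibly an empty list).
my_final.append(my_temp)
pprint(my_final)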

Solution 3:

I am not sure if there's an easy itertools.groupby solution, but here is an iterative approach that should work:

def mySplit(iterable, sep):
    output = []
    sepcount = 0
    current_output = []
    for i, elem in enumerate(iterable):
        if elem != sep:
            sepcount = 0
            current_output.append(elem)
            if (i==(len(iterable)-1)):
                output.append(current_output)
        else:
            if current_output: 
                output.append(current_output)
                current_output = []

            sepcount += 1
            if (i==0) or (sepcount > 1):
                output.append([])
            if (i==(len(iterable)-1)):
                output.append([])

    return output

Testing on your examples:

testLists = [
    ['a', 'SEP', 'b', 'c', 'SEP', 'SEP', 'd'],
    ["a", "SEP", "SEP", "SEP"],
    ["SEP"],
    ["a", "b", "c"]
]

for tl in testLists:
    print(mySplit(tl, sep="SEP"))
#[['a'], ['b', 'c'], [], ['d']]
#[['a'], [], [], []]
#[[], []]
#[['a', 'b', 'c']]

This is analogous to the result you would get if the examples were actually strings and you used str.split(sep):

for tl in testLists:
    print("".join(tl).split("SEP"))
#['a', 'bc', '', 'd']
#['a', '', '', '']
#['', '']
#['abc']

By the way, if the elements in your lists were always guaranteed to be single-character strings that never spell out the separator when joined, you could simply do:

for tl in testLists:
    print([list(x) for x in "".join(tl).split("SEP")])
#[['a'], ['b', 'c'], [], ['d']]
#[['a'], [], [], []]
#[[], []]
#[['a', 'b', 'c']]

But the mySplit() function is more general.
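
For what it's worth, the itertools.groupby route mentioned at the top of this answer can be sketched roughly as follows. This is only one possible way to do it, not a tested drop-in replacement; split_groupby is a name made up for this example. The idea is that a run of n consecutive separators encloses n - 1 empty sublists, with one extra empty sublist when the input starts or ends on a separator.

```python
from itertools import groupby


def split_groupby(items, sep):
    """Sketch: split items on sep with str.split()-like semantics."""
    chunks = []
    prev_was_sep = True  # treat the start of the input like a separator boundary
    for is_sep, group in groupby(items, key=lambda item: item == sep):
        if is_sep:
            run_length = sum(1 for _ in group)
            if prev_was_sep:
                # Only possible for a leading run: nothing came before it.
                chunks.append([])
            # n separators in a row enclose n - 1 empty chunks.
            chunks.extend([] for _ in range(run_length - 1))
            prev_was_sep = True
        else:
            chunks.append(list(group))
            prev_was_sep = False
    if prev_was_sep:
        # Input was empty or ended on a separator: add the trailing empty chunk.
        chunks.append([])
    return chunks


print(split_groupby(["a", "SEP", "b", "c", "SEP", "SEP", "d"], "SEP"))
# [['a'], ['b', 'c'], [], ['d']]
```

It is arguably no simpler than the plain loop, which may be why an "easy" groupby solution is hard to find.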

Solution 4:

For list or tuple objects you can use the following:

def split(seq, sep):
    start, stop = 0, -1
    while start < len(seq):
        try:
            stop = seq.index(sep, start)
        except ValueError:
            yield seq[start:]
            break
        yield seq[start:stop]
        start = stop + 1
    else:
        if stop == len(seq) - 1:
            yield []

It won't work with a generator, since it relies on len(), index() and slicing, but it's fast.

Solution 5:

You can use itertools.takewhile:

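import itertools as it

def split(seq, sep):
    last = [sep]  # last item read; starts as sep so an empty input yields [[]]
    def track(iterable):
        for item in iterable:
            last[0] = item  # remember it so a trailing separator is detected
            yield item
    seq = track(seq)
    while True:
        try:
            peek = next(seq)
        except StopIteration:
            break
        yield list(it.takewhile(lambda item: item != sep, it.chain((peek,), seq)))
    if last[0] == sep:
        yield []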

The it.chain part is to find out when the seq is exhausted. Note that with this approach it's easy to yield generators instead of lists if desired.
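
To illustrate that last remark, here is a hedged sketch of a variant that yields lazy iterators instead of lists. split_lazy is a made-up name, and the small track helper exists only to remember the last item read so that a trailing separator still produces a final empty chunk. The important caveat, and an assumption of this sketch, is that each chunk must be fully consumed before the next one is requested, much like the groups produced by itertools.groupby.

```python
import itertools as it


def split_lazy(seq, sep):
    """Sketch: like split(), but each chunk is a lazy iterator."""
    last = [sep]  # last item read; starts as sep so empty input yields one chunk

    def track(iterable):
        for item in iterable:
            last[0] = item
            yield item

    seq = track(seq)
    while True:
        try:
            peek = next(seq)
        except StopIteration:
            break
        # Put the peeked item back in front, then read up to the next separator.
        yield it.takewhile(lambda item: item != sep, it.chain((peek,), seq))
    if last[0] == sep:
        yield iter(())  # trailing empty chunk


# Consume each chunk before advancing to the next one.
print([list(chunk) for chunk in split_lazy(["a", "SEP", "b", "c", "SEP"], "SEP")])
# [['a'], ['b', 'c'], []]
```

If a chunk is skipped or only partly read, the following chunk starts wherever the underlying iterator happens to be, so results are only meaningful under in-order, full consumption.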
