
Convert Python String With Newlines And Tabs To Dictionary

I'm a bit stuck with this particular problem I'm having. I have a working solution, but I don't think it's very Pythonic. I have a raw text output like this: Key 1 Value 1 K

Solution 1:

Batteries are included - defaultdict automatically creates an empty list for each new key, and we use str's isspace method to check for indentation (a regular expression would also work):

from collections import defaultdict

data = """
Key 1   
  Value 1 
Key 2   
  Value 2 
Key 3   
  Value 3a  
  Value 3b
  Value 3c 
Key 4   
  Value 4a  
  Value 4b
"""

result = defaultdict(list)
current_key = None
for line in data.splitlines():
    if not line:
        continue  # Filter out blank lines

    # If the line is not indented then it is a key
    # Save it and move on
    if not line[0].isspace():
        current_key = line.strip()
        continue

    # Otherwise, add the value
    # (minus leading and trailing whitespace)
    # to our results
    result[current_key].append(line.strip())

# result is now a defaultdict
defaultdict(<class 'list'>,
    {'Key 1': ['Value 1'],
     'Key 2': ['Value 2'], 
     'Key 3': ['Value 3a', 'Value 3b', 'Value 3c'],
     'Key 4': ['Value 4a', 'Value 4b']})
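
The regular-expression alternative mentioned above could look roughly like the sketch below. The sample string, the pattern name INDENTED, and the helper parse_indented are made up for illustration and are not part of the original answer; the sketch also assumes the first non-blank line is always a key.

```python
import re
from collections import defaultdict

# Hypothetical sample input in the same shape as the answer's data
sample = "Key 1\n  Value 1\nKey 2\n  Value 2a\n  Value 2b\n"

# A line that starts with whitespace followed by text is treated as a value
INDENTED = re.compile(r"^\s+\S")

def parse_indented(text):
    """Group indented lines under the most recent unindented line."""
    result = defaultdict(list)
    current_key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank or whitespace-only lines
        if INDENTED.match(line):
            result[current_key].append(line.strip())
        else:
            current_key = line.strip()
    return dict(result)  # plain dict, so printing hides the defaultdict wrapper

parsed = parse_indented(sample)
print(parsed)
```

Converting to a plain dict at the end is optional; it only matters if later code should get a KeyError for missing keys instead of a silently created empty list.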

Solution 2:

itertools.groupby is useful here. You can group adjacent lines by their indent, then insert adjacent indented lines to a dict in one go using extend:

import itertools

my_str = """Key 1\n\tValue 1\nKey 2\n\tValue 2\nKey 3\n\tValue 3a \n\tValue 3b \n\tValue 3c\nKey 4\n\tValue 4a \n\tValue 4b"""

def get_indent(line):
    # Number of leading whitespace characters
    return len(line) - len(line.lstrip())

res = {}
for indent, tokens in itertools.groupby(my_str.splitlines(), get_indent):
    if indent == 0:
        cur_key = list(tokens)[0]
        res[cur_key] = []
    else:
        res[cur_key].extend( token.strip() for token in tokens )

print(res)
{'Key 3': ['Value 3a', 'Value 3b', 'Value 3c'],
 'Key 4': ['Value 4a', 'Value 4b'],
 'Key 2': ['Value 2'],
 'Key 1': ['Value 1']}
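
To see what groupby actually hands back when lines are grouped by indent, here is a small sketch on a shortened, made-up input (the sample string and the groups list are illustrative only):

```python
import itertools

# Hypothetical shortened input, tab-indented like the answer's string
sample = "Key 1\n\tValue 1\nKey 2\n\tValue 2a\n\tValue 2b"

def get_indent(line):
    # Number of leading whitespace characters
    return len(line) - len(line.lstrip())

# groupby only merges *adjacent* items that share the same key value
groups = [
    (indent, list(lines))
    for indent, lines in itertools.groupby(sample.splitlines(), key=get_indent)
]
for indent, lines in groups:
    print(indent, lines)
```

Because only adjacent lines are merged, two keys in a row with no values between them would end up in the same indent-0 group, and code that takes just the first item of that group would drop the second key.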

Solution 3:

I find that whenever one starts chaining together a bunch of operations in a single line (as in your "result.setdefault..." line), you muddy up what may be a very simple problem.

# Named text rather than str, so the built-in str type is not shadowed
text = "Key 1\n\tValue 1\nKey 2\n\tValue 2\nKey 3\n\tValue 3a \n\tValue 3b \n\tValue 3c\nKey 4\n\tValue 4a \n\tValue 4b"

output = text.replace('\n\t', ',').replace('\n',';')
result = {}
for group in output.split(';'):
    values = group.split(',')
    key = values[0]
    result[key] = []
    for v in values[1:]:
        result[key].append(v)
print(result)

Yields:

{'Key 1': ['Value 1'], 'Key 2': ['Value 2'], 'Key 3': ['Value 3a ', 'Value 3b ', 'Value 3c'], 'Key 4': ['Value 4a ', 'Value 4b']}
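
Note that the output above keeps the trailing spaces on values such as 'Value 3a '. If those are unwanted, a variant along the following lines strips them. This is only a sketch on a made-up input, and like the original it assumes that no key or value contains ',' or ';'.

```python
# Hypothetical shortened input with trailing spaces on some values
text = "Key 1\n\tValue 1\nKey 2\n\tValue 2a \n\tValue 2b "

# Same trick as above: value separators become ',', key separators become ';'
flattened = text.replace('\n\t', ',').replace('\n', ';')

result = {}
for group in flattened.split(';'):
    key, *values = group.split(',')  # first item is the key, the rest are values
    result[key.strip()] = [v.strip() for v in values]

print(result)
```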

Solution 4:

Obviously you cannot remove the \n and \t from your raw text output; however, you may be able to add extra characters to it so that this

Key 1
  Value 1
Key 2
  Value 2
Key 3
  Value 3a
  Value 3b

will look like this (note the enclosing braces and the quoted values, which valid JSON requires)

{
"Key 1":[
  "Value 1"
],
"Key 2":[
  "Value 2"
],
"Key 3":[
  "Value 3a",
  "Value 3b"
]
}

Then you can use the json parser in the following way

import json

my_dict = json.loads(my_str)
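
If the producer of the text cannot be changed, the extra characters could also be added in Python before parsing. The sketch below is one possible way to do that, not part of the original answer: to_json_text and the sample input are hypothetical, and it assumes the first non-blank line is a key.

```python
import json

# Hypothetical raw input in the indented key/value layout
raw = "Key 1\n  Value 1\nKey 2\n  Value 2a\n  Value 2b"

def to_json_text(text):
    """Wrap keys and values in the punctuation JSON needs."""
    entries = []  # list of (quoted key, list of quoted values)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            # Indented line: a value for the most recent key
            entries[-1][1].append(json.dumps(line.strip()))
        else:
            # json.dumps adds the quotes and escapes special characters
            entries.append((json.dumps(line.strip()), []))
    body = ", ".join(f"{key}: [{', '.join(values)}]" for key, values in entries)
    return "{" + body + "}"

json_text = to_json_text(raw)
my_dict = json.loads(json_text)
print(my_dict)
```

By the time the text has been walked line by line like this, the dictionary could just as well be built directly, so the JSON round trip mainly makes sense when the JSON text itself is also needed.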
