Skip to content Skip to sidebar Skip to footer

Replace String Between Tags If String Begins With "1"

I have a huge XML file (about 100MB) and each line contains something along the lines of 10005991. So for example: textextextextext10005991

Solution 1:

You can use the re module

>>>text = 'textextextextext<tag>10005991</tag>textextextextext'>>>re.sub(r'<tag>1(\d+)</tag>','<tag>YYYYY</tag>',text)
'textextextextext<tag>YYYYY</tag>textextextextext'

re.sub will replace the matched text with the second argument.

Quote from the doc

Return the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string by the replacement repl. If the pattern isn’t found, string is returned unchanged.

Usage may be like:

withopen("file") as f:
    for i in f:
       withopen("output") as f2:
           f2.write(re.sub(r'<tag>1(\d+)</tag>','<tag>YYYYY</tag>',i))

Solution 2:

You can use regex but as you have a multi-line string you need to use re.DOTALL flag , and in your pattern you can use positive look-around for match string between tags:

>>>print re.sub(r'(?<=<tag>)1\d+(?=</?tag>)',r'YYYYYY',s,re.DOTALL,re.MULTILINE)
textextextextext<tag>YYYYYY<tag>textextextextext
textextextextext<tag>20005992</tag>textextextextext
textextextextext<tag>YYYYYY</tag>textextextextext
textextextextext<tag>20005994</tag>textextextextext

re.DOTALL

Make the '.' special character match any character at all, including a newline; without this flag, '.' will match anything except a newline.

Also as @Bhargav Rao have did in his answer you can use grouping instead look-around :

>>>print re.sub(r'<tag>(1\d+)</?tag>',r'<tag>YYYYYY</?tag>',s,re.DOTALL,re.MULTILINE)
textextextextext<tag>YYYYYY</?tag>textextextextext
textextextextext<tag>20005992</tag>textextextextext
textextextextext<tag>YYYYYY</?tag>textextextextext
textextextextext<tag>20005994</tag>textextextextext

Solution 3:

I think your best bet is to use ElementTree

The main idea: 1) Parse the file 2) Find the elements value 3) Test your condition 4) Replace value if condition met

Here is a good place to start parsing : How do I parse XML in Python?

Post a Comment for "Replace String Between Tags If String Begins With "1""