
Answer by terdon for How to modify a file if file contain numbers that begin with '+1'

Yes, you should generally avoid using regular expressions to parse structured data. But this is a pretty simple case if you are 100% sure that all occurrences of + followed by 11 digits are valid targets. You can tell sed to only remove the + if it is followed by 11 digits (I assume you meant 11, not 10, since you have 11 in your data):

sed -E 's/\+([0-9]{11}[^0-9]*)\b/\1/' file.xml 

The -E enables extended regular expressions, which give a simpler syntax and the ability to use {N} to mean "match N times". So here, we are matching a + (this needs to be escaped as \+ since otherwise it means "match 1 or more") that is followed by exactly 11 digits, then 0 or more non-digits up to a word boundary (\b). Note that \b is a GNU sed extension, so this assumes GNU sed.

The entire match except the + is captured in parentheses, so \1, the replacement, is everything except the +.
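To see this on data, here is a minimal sketch that feeds the command a hypothetical two-line sample on stdin (the sample is an assumption, not the real file.xml from the question). It assumes GNU sed, since \b is not portable:

```shell
# Hypothetical sample input; the real file.xml is not shown in this answer.
sample='<address>+12345678901</address>
<note>call +123 later</note>'

# Same substitution as above, reading from stdin instead of file.xml.
result=$(printf '%s\n' "$sample" | sed -E 's/\+([0-9]{11}[^0-9]*)\b/\1/')

# The first line loses its +; the second is untouched because
# + is followed by only 3 digits there.
printf '%s\n' "$result"
```

Only the + that is followed by 11 digits is removed; the shorter +123 is left alone.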


A slightly safer approach, since all of your target numbers seem to be in address tags, would be:

sed -E 's|<address>\+([0-9]{11})</address>|<address>\1</address>|' file.xml
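As a quick illustration of why this is safer, the sketch below runs the tag-anchored substitution on a hypothetical sample where the same number also appears inside a <phone> tag (both the sample and the <phone> tag are assumptions for the example). The / in the closing tag is left unescaped here, since | is the delimiter:

```shell
# Hypothetical sample: the same number inside and outside <address> tags.
sample='<address>+12345678901</address>
<phone>+12345678901</phone>'

# Only a + directly inside <address>...</address>, followed by
# exactly 11 digits, is removed.
result=$(printf '%s\n' "$sample" | sed -E 's|<address>\+([0-9]{11})</address>|<address>\1</address>|')

printf '%s\n' "$result"
```

The <phone> line keeps its +, because the pattern only matches when the number is wrapped in <address> tags.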

Or even, if your problem can be restated as "remove all + from lines where the first non-space string is <address>", you could do:

sed -E '/^[[:space:]]*<address>/ s/\+//g' file.xml
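To try the line-selection idea on data, here is a small sketch with a hypothetical two-line sample (the sample and the <name> tag are assumptions). The selector is written as ^[[:space:]]*<address> so that it matches only lines whose first non-space string is <address>, and the g flag removes every + on those lines:

```shell
# Hypothetical sample: an indented <address> line and an unrelated line with a +.
sample='  <address>+12345678901</address>
<name>A+B</name>'

# Select lines that start (after optional whitespace) with <address>,
# then delete every + on those lines only.
result=$(printf '%s\n' "$sample" | sed -E '/^[[:space:]]*<address>/ s/\+//g')

printf '%s\n' "$result"
```

The + in the <name> line survives, since that line is never selected.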
