When using regular expressions, the * special operator is very useful. An expression followed by * matches a sequence of 0 or more matches of the expression. When the whole expression is followed by a *, we get some interesting behaviour that people may not expect. We will explore such behaviour in this post.
Method of Testing
We will be using the sed command, which is a stream editor. We will echo a string and pipe it to sed, where we will perform a substitution of the matches with the letter x:
[ahmed@amayem ~]$ echo aaa | sed 's/a/x/'
xaa
s/a/x/ indicates a substitution. The a is the regex pattern, and the x is what we want to substitute into the string in place of the matched part. For more on sed, check ShellTree 1: Analyzing a one Line Command Implementation.
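As a quick check of how the substitution behaves, the sketch below runs the same command with and without the g flag (a standard sed flag that is not used elsewhere in this post). Without it, sed replaces only the first match on the line, which is what lets us see exactly where the regex matched first. The variable names are only for illustration.

```shell
# Without the g flag, sed replaces only the first match on the line.
first_only=$(echo aaa | sed 's/a/x/')
echo "$first_only"    # xaa

# With the g flag, every match on the line is replaced.
all_matches=$(echo aaa | sed 's/a/x/g')
echo "$all_matches"   # xxx
```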
Let’s begin with a simple example:
[ahmed@amayem ~]$ echo aaa | sed 's/a*/x/'
x
As expected, the regex a* matched the whole string aaa, because a regex matches the longest possible substring from the point where the match starts. However, what happens when we put a different letter at the beginning?
[ahmed@amayem ~]$ echo Aaaa | sed 's/a*/x/'
xAaaa
It didn’t match the aaa; instead, it only matched the null string at the beginning of the string.
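One way to make the empty match visible is to wrap whatever was matched in brackets instead of replacing it. The sketch below relies on &, which in a sed replacement stands for the matched text. The variable names are made up for illustration.

```shell
# & in the replacement stands for the text the regex matched.
# Here a* matches the whole run of a characters.
whole=$(echo aaa | sed 's/a*/[&]/')
echo "$whole"    # [aaa]

# Here a* matches the empty string before the A, so the brackets are empty.
empty=$(echo Aaaa | sed 's/a*/[&]/')
echo "$empty"    # []Aaaa
```

The empty brackets in front of the A show that nothing was consumed: the match has zero length and sits at the very start of the string.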
We find the following excerpt from the man re_format page:
In the event that an RE could match more than one substring of a given string, the RE matches the one starting earliest in the string. If the RE could match more than one substring starting at that point, it matches the longest. Subexpressions also match the longest possible substrings, subject to the constraint that the whole match be as long as possible, with subexpressions starting earlier in the RE taking priority over ones starting later. Note that higher-level subexpressions thus take priority over their lower-level component subexpressions.
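The key is that the matching gives precedence to the earliest match, not the longest one; the longest-match rule only chooses among matches that start at the same position. Since * matches the expression before it zero or more times, the null string at the very beginning of the string is a valid match, and because it starts earlier than aaa, it wins.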
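If the intent is to match an actual run of a characters, one option is to require at least one a by writing aa*. This is a common idiom rather than something taken from the man page excerpt, and the variable names below are only for illustration. With aa* the empty string is no longer a valid match, so the earliest match starts at the first a, and the longest match from there is aaa.

```shell
# a* can match the empty string, so the earliest match is at position 0.
with_star=$(echo Aaaa | sed 's/a*/x/')
echo "$with_star"       # xAaaa

# aa* needs at least one a, so the earliest match starts at the first a
# and then extends as far as possible.
at_least_one=$(echo Aaaa | sed 's/aa*/x/')
echo "$at_least_one"    # Ax
```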
man re_format page (FreeBSD version)