This post is part of a series on the difference between pattern matching notation and extended regular expressions.
In the previous post we talked about the differences between pattern matching and regular expressions in a POSIX-compatible system. In this post we will test the claims of the previous post in practice to help our understanding.
Setup
Creating the Test Directory and Files to test Pathname Expansion
Let’s make a test directory where we can play a bit:
[ahmed@amayem ~]$ mkdir test && cd test
[ahmed@amayem test]$
We will be experimenting using the ls command, so we should make some files that we can test on. Here are Four ways to quickly create files.
[ahmed@amayem test]$ touch checktest test testcheck
[ahmed@amayem test]$ ls
checktest test testcheck
We created three files: checktest, test and testcheck.
Creating the temporary parameter to test Parameter Expansion
In order to test parameter expansion pattern matching we need a parameter to test on. Let’s make a parameter called TMP that we can manipulate. It should be the same as the filenames:
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
checktest test testcheck
Method of Testing
- Pathname Pattern Matching:
ls pattern
- Parameter Pattern Matching:
${parameter#pattern}
${parameter##pattern}
${parameter%pattern}
${parameter%%pattern}
${parameter/pattern/string}
- Regex:
echo ${parameter} | egrep -o 'pattern'
Notes About Egrep
Why echo ${parameter} instead of ls
I could use the output of ls as input to egrep, but then the input of egrep would be several lines, each containing one file name. In order to keep the testing uniform I want to give egrep a single line containing all the filenames, as done in our testing for parameter pattern matching. The following example illustrates the need:
[ahmed@amayem test]$ ls | egrep -on '.*'
1:checktest
2:test
3:testcheck
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP} | egrep -on '.*'
1:checktest test testcheck
So by using the parameter ${TMP} in testing both parameter pattern matching and regex I am keeping my testing uniform.
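The single line comes from the shell rather than from ls: TMP=$(ls) stores the names separated by newlines, and the unquoted ${TMP} goes through word splitting before echo sees it. Here is a minimal sketch of that, assuming the default IFS and using a hard-coded value as a stand-in for $(ls):

```shell
# Stand-in for TMP=$(ls): three names separated by newlines (assumed sample data).
TMP='checktest
test
testcheck'

# Unquoted: the shell splits on whitespace and echo joins the words with spaces.
unquoted=$(echo ${TMP})

# Quoted: the newlines survive, so a following grep would see three lines.
quoted_lines=$(echo "${TMP}" | wc -l | tr -d ' ')

echo "$unquoted"       # checktest test testcheck
echo "$quoted_lines"   # 3
```

So if you ever quote the parameter, expect the per-line behaviour of the ls example to come back.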
Why egrep
egrep is basically grep, but it matches extended regular expressions instead of basic regular expressions.
The Need for Quotes
The reason we need the quotes around the regex is to prevent any expansions (like brace expansion) before egrep receives it as an argument. For more on quotations in bash check bash quotations, backslash escaped characters and whitespace explained with examples.
The -o Option
The -o option displays only the matching parts of the lines. However, it will output every match regardless of how many matches there are in a line. That means it doesn’t stop matching after it finds a first match, which is sometimes the case depending on the function you are using. The following example should clear it up:
[ahmed@amayem test]$ echo "test" | egrep -o 't'
t
t
Two t’s were printed, indicating two matches.
*
Pathname Expansion Pattern Matching
The * matches any string and doesn’t relate to any characters before it:
[ahmed@amayem test]$ ls *
checktest test testcheck
[ahmed@amayem test]$ ls t*
test testcheck
If the * had acted like it does in regex then nothing would have matched, because in regex t* matches a t zero or more times and none of the filenames consists only of t’s. Instead it matched only the filenames beginning with t, and ignored checktest even though the pattern would have matched test in checktest. As mentioned in the previous post, this is because of pathname expansion’s particular application: it only matches full filenames and not partial ones.
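To check the claim above that t* read as a regex would match none of the full file names, here is a small sketch. It assumes mktemp -d is available for a scratch directory, uses grep -E (the same thing as egrep), and adds -x so that the regex has to match the whole line, the way pathname expansion has to match the whole file name:

```shell
# Scratch directory with the same three file names as above.
dir=$(mktemp -d)
touch "$dir/checktest" "$dir/test" "$dir/testcheck"

# Pathname expansion: t* must match an entire file name.
glob_hits=$(cd "$dir" && echo t*)

# ERE with -x: count names that consist only of zero or more t's.
# grep exits non-zero when the count is 0, hence the "|| true".
regex_hits=$(cd "$dir" && ls | grep -E -x -c 't*' || true)

rm -r "$dir"
echo "$glob_hits"    # test testcheck
echo "$regex_hits"   # 0
```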
Parameter Expansion Pattern Matching
The * matches any string and doesn’t relate to any characters before it.
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#*}
checktest test testcheck
[ahmed@amayem test]$ echo ${TMP##*}
[ahmed@amayem test]$
Notice that with one # nothing was deleted; that’s because the smallest string that * matches is the null string at the beginning of checktest. With ## the whole string is matched and deleted. Let’s try it with a t before the *:
[ahmed@amayem test]$ echo ${TMP#t*}
checktest test testcheck
[ahmed@amayem test]$ echo ${TMP##t*}
checktest test testcheck
Nothing was deleted because the string does not start with a t, and the # insists the match be at the beginning of a string. Let’s try to match it with a c:
[ahmed@amayem test]$ echo ${TMP#c*}
hecktest test testcheck
[ahmed@amayem test]$ echo ${TMP##c*}
[ahmed@amayem test]$
The first one matched the shortest string, which is just the c because the * matched the null string. The second one matched the longest possible string and so deleted the whole string.
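The shortest versus longest distinction is easier to see when the pattern has something to stop at. A minimal sketch, where the path value is a made-up sample and not part of the test directory:

```shell
# Hypothetical sample value, not taken from the post.
path='usr/local/lib/libfoo.so'

# One #: remove the shortest prefix matching */  (just "usr/").
shortest=${path#*/}

# Two ##: remove the longest prefix matching */  ("usr/local/lib/").
longest=${path##*/}

echo "$shortest"   # local/lib/libfoo.so
echo "$longest"    # libfoo.so
```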
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%*}
checktest test testcheck
[ahmed@amayem test]$ echo ${TMP%%*}
[ahmed@amayem test]$
Same thing as before except we are matching from the end of the string only. In the first one the * matched the null string at the end, and in the second one the * matched the whole string.
[ahmed@amayem test]$ echo ${TMP%c*}
checktest test testche
[ahmed@amayem test]$ echo ${TMP%%c*}
[ahmed@amayem test]$
This is interesting. In the case of % the shortest match was removed: it found the first c searching backwards from the end of the string, and the * matched the rest of the string after it. With %% the longest match was removed: it found the c furthest from the end, which is the very first character, and the * matched the rest of the string. Let’s switch the position of the *:
[ahmed@amayem test]$ echo ${TMP%*c}
checktest test testcheck
[ahmed@amayem test]$ echo ${TMP%%*c}
checktest test testcheck
Nothing was matched. That is to be expected because the string does not end in c. Let’s switch it to k:
[ahmed@amayem test]$ echo ${TMP%*k}
checktest test testchec
[ahmed@amayem test]$ echo ${TMP%%*k}
[ahmed@amayem test]$
In the first case the * matched the null string before the last k and in the second case the * matched the whole string except for the final k.
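The same shortest versus longest behaviour at the end of a string, sketched on a made-up file name (the value is an assumption for illustration only):

```shell
# Hypothetical sample value, not taken from the post.
file='backup.tar.gz'

# One %: remove the shortest suffix matching .*  (".gz").
short=${file%.*}

# Two %%: remove the longest suffix matching .*  (".tar.gz").
long=${file%%.*}

echo "$short"   # backup.tar
echo "$long"    # backup
```

Note that the . here is a literal dot, since this is pattern matching and not regex.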
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/*/x}
x
The whole string was replaced. Let’s see if we can replace a part in the middle of the string:
[ahmed@amayem test]$ echo ${TMP/t*/x}
checkx
[ahmed@amayem test]$ echo ${TMP/t*t/x}
checkxcheck
As is becoming clearer, this substitution format is not limited to matching patterns only at the beginning or end of a string, as the # and % formats mentioned earlier are.
Extended Regular Expression
An atom followed by * matches a sequence of 0 or more matches of the atom.
[ahmed@amayem test]$ echo ${TMP} | egrep -o 't*'
t
t
t
t
t
t
It matched every t, but only the t’s and not what came after them, because the pattern is asking to match t zero or more times. If we use * by itself then we should expect to get nothing back:
[ahmed@amayem test]$ echo ${TMP} | egrep -o '*'
[ahmed@amayem test]$
In order to match the whole string we have to use the other special operator . to indicate any character, then the * after it:
[ahmed@amayem test]$ echo ${TMP} | egrep -o '.*'
checktest test testcheck
Note About GNU grep Versions
It turns out that there are some differences between versions of GNU grep that will affect the output. The above works with GNU grep 2.6.3, but it won’t work with GNU grep 2.5.1, which will give the following:
[ahmed@amayem test]$ echo ${TMP} | grep -o 't*'
[ahmed@amayem test]$
?
Pathname Expansion Pattern Matching
The ? matches any single character.
[ahmed@amayem test]$ ls ?
ls: ?: No such file or directory
[ahmed@amayem test]$ ls t?
ls: t?: No such file or directory
We must remember that when it comes to pattern matching in pathname expansion, the goal is to use the operators to expand the pattern into an existing filename. Hence, even if t? does match part of the filenames, nothing will show because the full filename needs to be matched. Let’s match full filenames:
[ahmed@amayem test]$ ls tes?
test
[ahmed@amayem test]$ ls ?????????
checktest testcheck
Notice that in the second call, the file named test was not matched. Let’s see if ? can match the null string:
[ahmed@amayem test]$ ls ?test?
ls: cannot access ?test?: No such file or directory
Nope, it must match a character.
Parameter Expansion Pattern Matching
The ? matches any single character.
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#?}
hecktest test testcheck
[ahmed@amayem test]$ echo ${TMP##?}
hecktest test testcheck
[ahmed@amayem test]$
The first c was deleted from checktest. Since ? only matches one character there is no difference between # and ##.
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%?}
checktest test testchec
[ahmed@amayem test]$ echo ${TMP%%?}
checktest test testchec
The last k was deleted. Since ? only matches one character there is no difference between % and %%.
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/?/x}
xhecktest test testcheck
[ahmed@amayem test]$ echo ${TMP/?????/x}
xtest test testcheck
Notice that the whole match was substituted for 1 x. The match could also be in the middle of the string:
[ahmed@amayem test]$ echo ${TMP/t??t/x}
checkx test testcheck
Extended Regular Expression
An atom followed by ? matches a sequence of 0 or 1 matches of the atom.
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test?'
test
test
test
Notice that, unlike in the pattern matching examples, only the test part was matched and not the character after the last t. The ? is useful when you want to say that a part of a pattern is optional:
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'testy?'
test
test
test
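To see the two meanings of ? side by side, here is a small sketch that checks a word against the shell pattern test? (through case, which uses the same pattern matching notation) and against the ERE test? forced to match the whole line with grep -x. The compare function and the sample words are my own additions, and grep -E is the same thing as egrep:

```shell
# compare WORD: report whether WORD matches test? as a pattern and as an ERE.
compare() {
  case "$1" in
    test?) glob=match ;;   # pattern: "test" followed by exactly one character
    *)     glob=no ;;
  esac
  # ERE: "tes" followed by an optional t, anchored to the whole line by -x
  if printf '%s\n' "$1" | grep -E -x -q 'test?'; then
    regex=match
  else
    regex=no
  fi
  echo "$1 glob=$glob regex=$regex"
}

compare tes     # tes glob=no regex=match
compare test    # test glob=no regex=match
compare tests   # tests glob=match regex=no
```

The same five characters give exactly opposite answers for these three words.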
+
To test this we will have to modify our file names:
[ahmed@amayem test]$ mv test test+
[ahmed@amayem test]$ mv checktest check+test
[ahmed@amayem test]$ mv testcheck testcheck+
[ahmed@amayem test]$ ls
check+test test+ testcheck+
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check+test test+ testcheck+
Pathname Expansion Pattern Matching
The + matches itself.
[ahmed@amayem test]$ ls test+
test+
[ahmed@amayem test]$ ls test+ check+*
check+test test+
Parameter Expansion Pattern Matching
The + matches itself.
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#check+}
test test+ testcheck+
[ahmed@amayem test]$ echo ${TMP##check+}
test test+ testcheck+
[ahmed@amayem test]$ echo ${TMP##test+}
check+test test+ testcheck+
The last one didn’t match because it is not in the beginning.
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%check+}
check+test test+ test
[ahmed@amayem test]$ echo ${TMP%%check+}
check+test test+ test
[ahmed@amayem test]$ echo ${TMP%test+}
check+test test+ testcheck+
The last one didn’t match because it is not in the end.
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/check+/x}
xtest test+ testcheck+
[ahmed@amayem test]$ echo ${TMP/test+/x}
check+test x testcheck+
Extended Regular Expression
An atom followed by + matches a sequence of 1 or more matches of the atom.
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test+'
test
test
test
Notice that test+ was not matched. To test the + more we should modify our test string to contain some characters that are repeated as follows:
[ahmed@amayem test]$ TMP=$(echo ${TMP} | sed 's/+/tt/g')
[ahmed@amayem test]$ echo ${TMP}
checktttest testtt testchecktt
That’s more like it. Let’s try it again now:
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test+'
test
testtt
test
[ahmed@amayem test]$ echo ${TMP} | egrep -o 't+est'
tttest
test
test
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'tt+'
ttt
ttt
tt
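One way to convince yourself of the "1 or more" reading is that t+ should behave exactly like tt*, that is, one t followed by zero or more t’s. A quick sketch of that check, with the string hard-coded to the value of TMP shown above and grep -E standing in for egrep:

```shell
TMP='checktttest testtt testchecktt'

# "one or more t" written with + ...
plus=$(echo "$TMP" | grep -E -o 't+')

# ... and written as one t followed by zero or more t's.
star=$(echo "$TMP" | grep -E -o 'tt*')

if [ "$plus" = "$star" ]; then
  echo "same matches"
fi
```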
.
To test this we will have to modify our file names:
[ahmed@amayem test]$ mv test+ test.
[ahmed@amayem test]$ mv check+test check.test
[ahmed@amayem test]$ mv testcheck+ testcheck.
[ahmed@amayem test]$ ls
check.test test. testcheck.
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check.test test. testcheck.
Pathname Expansion Pattern Matching
The . matches itself.
[ahmed@amayem test]$ ls test.
test.
Parameter Expansion Pattern Matching
The . matches itself.
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#check.}
test test. testcheck.
[ahmed@amayem test]$ echo ${TMP##check.}
test test. testcheck.
[ahmed@amayem test]$ echo ${TMP##test.}
check.test test. testcheck.
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%check.}
check.test test. test
[ahmed@amayem test]$ echo ${TMP%%check.}
check.test test. test
[ahmed@amayem test]$ echo ${TMP%test.}
check.test test. testcheck.
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/check./x}
xtest test. testcheck.
[ahmed@amayem test]$ echo ${TMP/test./x}
check.test x testcheck.
Extended Regular Expression
The . matches any single character. It is equivalent to the ? in pattern matching.
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test.'
test
test.
testc
Notice that the first test actually has a space after it, so the . matched the space.
If we want to match the literal . then we will have to use a backslash:
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test\.'
test.
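Besides the backslash, a bracket expression containing only the dot also makes it literal, because . has no special meaning inside brackets. A small sketch comparing the two spellings, with the string hard-coded to the value of TMP in this section and grep -E standing in for egrep:

```shell
TMP='check.test test. testcheck.'

# Literal dot written with a backslash.
escaped=$(echo "$TMP" | grep -E -o 'test\.')

# Literal dot written as a one-character bracket expression.
bracket=$(echo "$TMP" | grep -E -o 'test[.]')

echo "$escaped"   # test.
echo "$bracket"   # test.
```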
^
To test this we will have to modify our file names:
[ahmed@amayem test]$ mv test. test^
[ahmed@amayem test]$ mv check.test check^test
[ahmed@amayem test]$ mv testcheck. testcheck^
[ahmed@amayem test]$ ls
check^test test^ testcheck^
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check^test test^ testcheck^
[ahmed@amayem test]$
Pathname Expansion Pattern Matching
The ^ matches itself.
[ahmed@amayem test]$ ls test^
test^
Parameter Expansion Pattern Matching
The ^ matches itself.
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#check^}
test test^ testcheck^
[ahmed@amayem test]$ echo ${TMP##check^}
test test^ testcheck^
[ahmed@amayem test]$ echo ${TMP##test^}
check^test test^ testcheck^
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%check^}
check^test test^ test
[ahmed@amayem test]$ echo ${TMP%%check^}
check^test test^ test
[ahmed@amayem test]$ echo ${TMP%test^}
check^test test^ testcheck^
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/check^/x}
xtest test^ testcheck^
[ahmed@amayem test]$ echo ${TMP/test^/x}
check^test x testcheck^
Extended Regular Expression
The ^ matches the null string at the beginning of a line.
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test^'
[ahmed@amayem test]$
It didn’t match anything, because obviously the null string at the beginning of a line is not present at the end of a word. To test this one properly, let’s put the ^ at the beginning of the pattern:
[ahmed@amayem test]$ echo ${TMP} | egrep -o '^check'
check
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check'
check
check
When I added the ^ at the beginning it matched only the first check, otherwise it matched both checks.
$
To test this we will have to modify our file names:
[ahmed@amayem test]$ mv test^ test$
[ahmed@amayem test]$ mv check^test check$test
[ahmed@amayem test]$ mv testcheck^ testcheck$
[ahmed@amayem test]$ ls
check test$ testcheck$
It looks like we lost check$test. This is because $test is treated as a variable and is expanded to the empty string. To overcome this issue we need to use quotes.
[ahmed@amayem test]$ mv check 'check$test'
[ahmed@amayem test]$ ls
check$test test$ testcheck$
Great we have our files now.
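The same thing can be seen without renaming any files. A minimal sketch, assuming the variable test is unset as it was in my session; the variable names unquoted, quoted and escaped are just for illustration:

```shell
unset test                # make sure $test is unset, as in the session above

unquoted=check$test       # $test expands to nothing, leaving "check"
quoted='check$test'       # single quotes keep the $ literal
escaped=check\$test       # a backslash before the $ does the same

echo "$unquoted"   # check
echo "$quoted"     # check$test
echo "$escaped"    # check$test
```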
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check$test test$ testcheck$
Pathname Expansion Pattern Matching
The $ matches itself.
[ahmed@amayem test]$ ls test$
test$
Parameter Expansion Pattern Matching
The $ matches itself.
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#check$}
test test$ testcheck$
[ahmed@amayem test]$ echo ${TMP##check$}
test test$ testcheck$
[ahmed@amayem test]$ echo ${TMP##test$}
check$test test$ testcheck$
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%check$}
check$test test$ test
[ahmed@amayem test]$ echo ${TMP%%check$}
check$test test$ test
[ahmed@amayem test]$ echo ${TMP%test$}
check$test test$ testcheck$
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/check$/x}
xtest test$ testcheck$
[ahmed@amayem test]$ echo ${TMP/test$/x}
check$test x testcheck$
Extended Regular Expression
The $ matches the null string at the end of a line.
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test$'
[ahmed@amayem test]$
We got nothing because the end of $TMP is check$. Let’s try using check:
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check$'
[ahmed@amayem test]$
Still nothing. This is because check$ matches check followed by the null string at the end of a line, not the literal check$. To match the $ literally we need to use the backslash:
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check\$'
check$
check$
Now to get the one at the end of the string we add the $ at the end:
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check\$$'
check$
|
To test this we will have to modify our file names:
[ahmed@amayem test]$ mv test$ test|
> ^C
[ahmed@amayem test]$
Upon seeing the |, the shell understands it as a pipe; hence, when we pressed enter we saw >, which indicates the shell is waiting for the rest of the command. We cancel that command using ctrl+c. We will have to use quotes when using |.
[ahmed@amayem test]$ mv test$ 'test|'
[ahmed@amayem test]$ mv 'check$test' 'check|test'
[ahmed@amayem test]$ mv testcheck$ 'testcheck|'
[ahmed@amayem test]$ ls
check|test testcheck| test|
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check|test testcheck| test|
Pathname Expansion Pattern Matching
The | matches itself.
[ahmed@amayem test]$ ls "test|"
test|
Parameter Expansion Pattern Matching
The | matches itself.
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#check|}
test testcheck| test|
[ahmed@amayem test]$ echo ${TMP##check|}
test testcheck| test|
[ahmed@amayem test]$ echo ${TMP##test|}
check|test testcheck| test|
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%check|}
check|test testcheck| test|
[ahmed@amayem test]$ echo ${TMP%%check|}
check|test testcheck| test|
[ahmed@amayem test]$ echo ${TMP%test|}
check|test testcheck|
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/check|/x}
xtest testcheck| test|
[ahmed@amayem test]$ echo ${TMP/test|/x}
check|test testcheck| x
Extended Regular Expression
The | indicates a branch (a way of saying ‘or’ between patterns).
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test|check'
check
test
test
test
check
The regex is saying match a test or a check.
Note About GNU grep Versions
It turns out that there are some differences between versions of GNU grep that will affect the output. The following works with GNU grep 2.6.3,
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test|'
test
test
test
but it won’t work with GNU grep 2.5.1, which will give the following:
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test|'
[ahmed@amayem test]$
\
To test this we will have to modify our file names:
[ahmed@amayem test]$ mv 'test|' test\*
[ahmed@amayem test]$ mv 'check|test' check\*test
[ahmed@amayem test]$ mv 'testcheck|' testcheck\*
[ahmed@amayem test]$ ls
check*test test* testcheck*
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check*test test* testcheck* testcheck*
That last line doesn’t seem right. Why is it printing out testcheck* twice? The reason is that the shell is actually performing pathname expansion on test*:
[ahmed@amayem test]$ echo test*
test* testcheck*
Hence we see an extra testcheck*. Anyway, this should not affect our tests.
Pathname Expansion Pattern Matching
The \ escapes the special characters used by the system.
[ahmed@amayem test]$ ls test\*
test*
What happens if we put a \ before a regular character:
[ahmed@amayem test]$ ls test
test
It has no effect, and it is as if the \ doesn’t exist there.
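Both effects of the backslash can also be checked without touching any files, using case, which follows the same pattern matching notation. This is only a sketch; the variable names are mine and the strings are hard-coded samples:

```shell
# An escaped * only matches a literal asterisk ...
case 'test*' in
  test\*) star_literal=yes ;;
  *)      star_literal=no ;;
esac

# ... so it no longer matches an arbitrary string.
case 'testcheck' in
  test\*) star_wild=yes ;;
  *)      star_wild=no ;;
esac

# A backslash before a regular character changes nothing.
case 'test' in
  te\st) plain=yes ;;
  *)     plain=no ;;
esac

echo "$star_literal $star_wild $plain"   # yes no yes
```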
Parameter Expansion Pattern Matching
The \ escapes the special characters used by the system.
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#check\*}
test test* testcheck* testcheck*
[ahmed@amayem test]$ echo ${TMP##check\*}
test test* testcheck* testcheck*
[ahmed@amayem test]$ echo ${TMP##test\*}
check*test test* testcheck* testcheck*
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%check\*}
check*test test* testcheck* test
[ahmed@amayem test]$ echo ${TMP%%check\*}
check*test test* testcheck* test
[ahmed@amayem test]$ echo ${TMP%test\*}
check*test test* testcheck* testcheck*
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/check\*/x}
xtest test* testcheck* testcheck*
[ahmed@amayem test]$ echo ${TMP/test\*/x}
check*test x testcheck*
Extended Regular Expression
The \ escapes the special characters used by the system.
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check*'
check
check
check
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check\*'
check*
check*
check*
In the first instance the * was not escaped and so its special meaning, which is the k appearing zero or more times, applied. In the second instance it was escaped and so it matched the * literally.
What happens if we put a \ before a regular character:
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check'
check
check
check
It has no effect, and it is as if the \ doesn’t exist there.
egrep bug
Even though egrep is supposed to be matching according to extended regular expressions (that’s what the e in egrep stands for), we get the following error:
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'chec\1'
egrep: Invalid back reference
Back-referencing is not a part of the extended regular expressions as mentioned in the POSIX standards.
{ bound }
Countering brace expansion
When testing this special operator out we need to watch out for brace expansion. Brace expansion is explained in more detail in the man bash pages:
Brace expansion is a mechanism by which arbitrary strings may be generated. This mechanism is similar to pathname expansion, but the filenames generated need not exist. Patterns to be brace expanded take the form of an optional preamble, followed by either a series of comma-separated strings or a sequence expression between a pair of braces, followed by an optional postscript. The preamble is prefixed to each string contained within the braces, and the postscript is then appended to each resulting string, expanding left to right.
The following is an example:
[ahmed@amayem test]$ echo test{0,2}
test0 test2
If we want to print it out literally we have to quote it:
[ahmed@amayem test]$ echo "test{0,2}"
test{0,2}
[ahmed@amayem test]$ echo test\{0,2\}
test{0,2}
Let’s change the file names to help in testing:
[ahmed@amayem test]$ mv test\* "test{0,2}"
[ahmed@amayem test]$ mv check\*test "check{0,2}test"
[ahmed@amayem test]$ mv testcheck\* "testcheck{0,2}"
[ahmed@amayem test]$ ls
check{0,2}test testcheck{0,2} test{0,2}
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check{0,2}test testcheck{0,2} test{0,2}
Pathname Expansion Pattern Matching
The {bound} matches itself:
[ahmed@amayem test]$ ls "test{0,2}"
test{0,2}
Parameter Expansion Pattern Matching
The {bound} matches itself:
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#check{0,2}}
}test testcheck{0,2} test{0,2}}
Note that the last } was not considered part of the pattern. It was instead considered the closing brace of the ${parameter#word} expansion. Therefore there is an extra } printed at the end. To overcome this issue we will have to use quotes:
[ahmed@amayem test]$ echo ${TMP#"check{0,2}"}
test testcheck{0,2} test{0,2}
[ahmed@amayem test]$ echo ${TMP##"check{0,2}"}
test testcheck{0,2} test{0,2}
[ahmed@amayem test]$ echo ${TMP##"test{0,2}"}
check{0,2}test testcheck{0,2} test{0,2}
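Another way to keep the braces out of the way is to put the pattern in its own variable and quote that variable inside the expansion, since a quoted pattern is matched literally. A minimal sketch; the pat and pat_end variables are hypothetical helpers and the string is hard-coded to the value of TMP above:

```shell
TMP='check{0,2}test testcheck{0,2} test{0,2}'

pat='check{0,2}'        # literal text to strip from the front
pat_end=' test{0,2}'    # literal text (with the space) to strip from the end

front=${TMP#"$pat"}
back=${TMP%"$pat_end"}

echo "$front"   # test testcheck{0,2} test{0,2}
echo "$back"    # check{0,2}test testcheck{0,2}
```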
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%"test{0,2}"}
check{0,2}test testcheck{0,2}
[ahmed@amayem test]$ echo ${TMP%%"test{0,2}"}
check{0,2}test testcheck{0,2}
[ahmed@amayem test]$ echo ${TMP%"check{0,2}"}
check{0,2}test testcheck{0,2} test{0,2}
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/"check{0,2}"/x}
xtest testcheck{0,2} test{0,2}
[ahmed@amayem test]$ echo ${TMP/"test{0,2}"/x}
check{0,2}test testcheck{0,2} x
Extended Regular Expression
An atom followed by a bound containing two integers i and j matches a sequence of i through j (inclusive) matches of the atom. Let’s add some more files that I can test:
[ahmed@amayem test]$ touch tes testt testtt
[ahmed@amayem test]$ ls
check{0,2}test tes testcheck{0,2} testt testtt test{0,2}
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check{0,2}test tes testcheck{0,2} testt testtt test{0,2}
Now let’s test:
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test{0,2}'
test
tes
test
testt
testt
test
It was able to pick up tes, all the tests, and testt as well. It didn’t, however, pick up testtt because that would be three ts.
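With -o the testtt name still shows up as a testt match, which can hide the upper limit a little. Forcing the regex to cover the whole line with -x makes the limit obvious. A small sketch, feeding the four names one per line and using grep -E in place of egrep:

```shell
# Keep only the names that are entirely "tes" followed by 0 to 2 t's.
matched=$(printf '%s\n' tes test testt testtt | grep -E -x 'test{0,2}')

echo "$matched"
# tes
# test
# testt
```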
[]
To test this special operator let’s make a brand new set of files.
We first begin by deleting our current files in our test directory. Do not do this if you are not in the test directory and want to keep your files.
[ahmed@amayem test]$ ls
check{0,2}test tes testcheck{0,2} testt testtt test{0,2}
[ahmed@amayem test]$ rm *
[ahmed@amayem test]$ ls
[ahmed@amayem test]$
We used pathname expansion, as we saw earlier, to remove all the files with just the *.
Next let’s make some relevant files:
[ahmed@amayem test]$ touch testA test1 test.
[ahmed@amayem test]$ ls
test. test1 testA
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
test. test1 testA
General
[characters] matches any one of the enclosed characters.
Pathname Expansion Pattern Matching
[ahmed@amayem test]$ ls tes[t][1A]
test1 testA
We used it twice in a row. Notice that we can put one character in the square brackets but it would be the same as having it without square brackets:
[ahmed@amayem test]$ ls tes[A]
ls: tes[A]: No such file or directory
[ahmed@amayem test]$ ls tesA
ls: tesA: No such file or directory
Parameter Expansion Pattern Matching
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#test[1A.]}
test1 testA
[ahmed@amayem test]$ echo ${TMP##test[1A.]}
test1 testA
[ahmed@amayem test]$ echo ${TMP##test[1A]}
test. test1 testA
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%test[1A.]}
test. test1
[ahmed@amayem test]$ echo ${TMP%%test[1A.]}
test. test1
[ahmed@amayem test]$ echo ${TMP%test[1.]}
test. test1 testA
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/test[.1A]/x}
x test1 testA
[ahmed@amayem test]$ echo ${TMP/test[1A]/x}
test. x testA
Extended Regular Expression
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[.1A]'
test.
test1
testA
Range
A pair of characters separated by a hyphen denotes a range expression; any character that sorts between those two characters, inclusive, using the current locale’s collating sequence and character set, is matched.
Let’s add another file for testing purposes:
[ahmed@amayem test]$ touch test2
[ahmed@amayem test]$ ls
test. test1 test2 testA
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
test. test1 test2 testA
Pathname Expansion Pattern Matching
[ahmed@amayem test]$ ls test[0-2]
test1 test2
[ahmed@amayem test]$ ls test[A-Z]
testA
Parameter Expansion Pattern Matching
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#test[+-0]}
test1 test2 testA
[ahmed@amayem test]$ echo ${TMP##test[!-A]}
test1 test2 testA
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%test[+-0]}
test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP%test[!-Z]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%%test[!-Z]}
test. test1 test2
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/test[!-Z]/x}
x test1 test2 testA
[ahmed@amayem test]$ echo ${TMP/test[0-Z]/x}
test. x test2 testA
Extended Regular Expression
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[!-Z]'
test.
test1
test2
testA
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[0-9]'
test1
test2
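One thing to watch out for with ranges that start at !: in shell patterns a ! right after the [ negates the set, so a pattern like [!-Z] is presumably read by the shell as "any character except - and Z" rather than as a range from ! to Z. The parameter expansion examples above give the same output either way, because . and A satisfy both readings. In an ERE the ! is an ordinary character, so [!-Z] really is a range. A small sketch that tells the two readings apart by testing a single dash, assuming the C locale for the range:

```shell
# Shell pattern: [!-Z] is a negated set, so a dash does NOT match.
case '-' in
  [!-Z]) glob=match ;;
  *)     glob=no ;;
esac

# ERE: [!-Z] is the range from ! to Z, which contains the dash in the C locale.
if printf '%s\n' '-' | LC_ALL=C grep -E -q '^[!-Z]$'; then
  regex=match
else
  regex=no
fi

echo "glob=$glob regex=$regex"   # glob=no regex=match
```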
Character Classes
According to the man bash page:
Within [ and ], character classes can be specified using the syntax [:class:], where class is one of the following classes defined in the POSIX standard: alnum alpha ascii blank cntrl digit graph lower print punct space upper word xdigit
A character class matches any character belonging to that class.
The names of the character classes are pretty straightforward. Let’s test them out.
Pathname Expansion Pattern Matching
[ahmed@amayem test]$ ls test[[:alnum:]]
test1 test2 testA
Parameter Expansion Pattern Matching
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#test[[:alnum:]]}
test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP#test[[:punct:]]}
test1 test2 testA
[ahmed@amayem test]$ echo ${TMP##test[[:graph:]]}
test1 test2 testA
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%test[[:alnum:]]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%%test[[:alnum:]]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%test[[:ascii:]]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%test[[:digit:]]}
test. test1 test2 testA
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/test[[:alnum:]]/x}
test. x test2 testA
[ahmed@amayem test]$ echo ${TMP/test[[:ascii:]]/x}
x test1 test2 testA
Extended Regular Expression
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[[:alnum:]]'
test1
test2
testA
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[[:punct:]]'
test.
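Character classes work in case patterns as well, which makes for a quick way to see which class a single character falls into. A minimal sketch; the classify function is a made-up helper and only checks three of the classes listed above:

```shell
# classify CHAR: print the first of digit/alpha/punct that CHAR belongs to.
classify() {
  case "$1" in
    [[:digit:]]) echo digit ;;
    [[:alpha:]]) echo alpha ;;
    [[:punct:]]) echo punct ;;
    *)           echo other ;;
  esac
}

classify 1   # digit
classify A   # alpha
classify .   # punct
```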
Negation
According to the man bash page for pattern matching:
If the first character following the `[` is a `!` or a `^` then any character not enclosed is matched.
It is the same for extended regular expressions except that the ! does not work.
Pathname Expansion Pattern Matching
[ahmed@amayem test]$ ls test[^[:punct:][:alpha:]]
test1 test2
[ahmed@amayem test]$ ls test[![:punct:][:alpha:]]
test1 test2
Parameter Expansion Pattern Matching
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#test[^A-Z]}
test1 test2 testA
[ahmed@amayem test]$ echo ${TMP##test[^A-Z]}
test1 test2 testA
[ahmed@amayem test]$ echo ${TMP##test[^+-Z]}
test. test1 test2 testA
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%test[^A-Z]}
test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP%test[^0-9]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%test[^0]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%%test[^0]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%test[^A0.]}
test. test1 test2 testA
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/test[![:ascii:]]/x}
test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP/test[![:digit:]]/x}
x test1 test2 testA
[ahmed@amayem test]$ echo ${TMP/test[![:digit:].]/x}
test. test1 test2 x
Extended Regular Expression
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[^[:punct:]]'
test1
test2
testA
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[^[:punct:][:alpha:]]'
test1
test2
Let’s make sure that the ! does not work:
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[![:punct:][:alpha:]]'
test.
testA
It’s not working, just as expected.
The dash -
A - may be matched by including it as the first or last character in the set.
To test this we need to add a file with a dash:
[ahmed@amayem test]$ touch test-
[ahmed@amayem test]$ ls
test- test. test1 test2 testA
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
test- test. test1 test2 testA
Pathname Expansion Pattern Matching
[ahmed@amayem test]$ ls test[-At]
test- testA
[ahmed@amayem test]$ ls test[A-t]
testA
[ahmed@amayem test]$ ls test[At-]
test- testA
Notice that when the dash is in the middle it is considered to be a range expression.
Parameter Expansion Pattern Matching
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#test[-0Z]}
test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP#test[0-Z]}
test- test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP#test[0Z-]}
test. test1 test2 testA
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%test[-0Z]}
test- test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP%test[0-Z]}
test- test. test1 test2
[ahmed@amayem test]$ echo ${TMP%test[0Z-]}
test- test. test1 test2 testA
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/test[-0Z]/x}
x test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP/test[0-Z]/x}
test- test. x test2 testA
[ahmed@amayem test]$ echo ${TMP/test[0Z-]/x}
x test. test1 test2 testA
Extended Regular Expression
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[-0Z]'
test-
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[0-Z]'
test1
test2
testA
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[0Z-]'
test-
The Closing Bracket ]
A ] may be matched by including it as the first character in the set.
Let’s add a file that contains ] for the sake of testing:
[ahmed@amayem test]$ touch test]
[ahmed@amayem test]$ ls
test- test. test1 test2 testA test]
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
test- test. test1 test2 testA test]
Pathname Expansion Pattern Matching
[ahmed@amayem test]$ ls test[]1A]
test1 testA test]
[ahmed@amayem test]$ ls test[1]A]
ls: test[1]A]: No such file or directory
[ahmed@amayem test]$ ls test[1A]]
ls: test[1A]]: No such file or directory
Notice that if ] is not the first character inside the square brackets then the brackets are considered to have been closed.
Parameter Expansion Pattern Matching
${parameter#word} ${parameter##word}
[ahmed@amayem test]$ echo ${TMP#test[]-1A]}
test- test. test1 test2 testA test]
[ahmed@amayem test]$ echo ${TMP#test[]1A-]}
test. test1 test2 testA test]
[ahmed@amayem test]$ echo ${TMP##test[]1A-]}
test. test1 test2 testA test]
Notice that the dash and the ] cannot both be at the beginning of the set at the same time, hence the dash goes at the end.
${parameter%word} ${parameter%%word}
[ahmed@amayem test]$ echo ${TMP%test[]-1A]}
test- test. test1 test2 testA test]
How come it’s not deleting test]? That is probably because the range ]-1 doesn’t make sense, since the ] comes after the 1. Once we put the dash at the end it should work:
[ahmed@amayem test]$ echo ${TMP%test[]1A-]}
test- test. test1 test2 testA
${parameter/pattern/string}
[ahmed@amayem test]$ echo ${TMP/test[]A]/x}
test- test. test1 test2 x test]
[ahmed@amayem test]$ echo ${TMP/test[]Z]/x}
test- test. test1 test2 testA x
Extended Regular Expression
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[]-]'
test-
test]
References
- IEEE Std 1003.1, 2013 Edition
- man bash page
- man re_format page (FreeBSD version)