Practical Explorations of the Differences Between Pattern Matching Notation Used in Pathname and Parameter Expansion and Extended Regular Expressions

This post is part of a series on the difference between pattern matching notation and extended regular expressions.

In the previous post we talked about the differences between pattern matching and regular expressions in a POSIX compatible system. In this post we will do some practical testing of the previous post to help our understanding.

Setup

Creating the Test Directory and Files to test Pathname Expansion

Let’s make a test directory where we can play a bit:

[ahmed@amayem ~]$ mkdir test && cd test
[ahmed@amayem test]$

We will be experimenting with the ls command, so we need some files to test on. See Four ways to quickly create files for other options; here we use touch:

[ahmed@amayem test]$ touch checktest test testcheck
[ahmed@amayem test]$ ls
checktest   test        testcheck

We created three files called checktest, test and testcheck.

Creating the temporary parameter to test Parameter Expansion

In order to test parameter expansion pattern matching we need a parameter to test on. Let’s make a parameter called TMP that we can manipulate. It should be the same as the filenames:

[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
checktest test testcheck

Method of Testing

  1. Pathname Pattern Matching: ls pattern
  2. Parameter Pattern Matching:
    1. ${parameter#pattern}
    2. ${parameter##pattern}
    3. ${parameter%pattern}
    4. ${parameter%%pattern}
    5. ${parameter/pattern/string}
  3. Regex: echo ${parameter} | egrep -o 'pattern'

Notes About Egrep

Why echo ${parameter} instead of ls

I could use the output of ls as the input to egrep, but then egrep would receive several lines, each containing one file name. To keep the testing uniform I want to give egrep a single line containing all the filenames, as in our tests of parameter pattern matching. The following example illustrates the need:

[ahmed@amayem test]$ ls | egrep -on '.*'
1:checktest
2:test
3:testcheck
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP} | egrep -on '.*'
1:checktest test testcheck

So by using the parameter ${TMP} to test both parameter pattern matching and regex, I am keeping my testing uniform.
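As a rough illustration of why the unquoted ${TMP} ends up on a single line, the sketch below counts the lines echo produces with and without quotes. The scratch directory from mktemp and the names one_line and multi_line are assumptions made for this example and are not part of the setup above.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so no real files are touched (assumption).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch checktest test testcheck

TMP=$(ls)

# Unquoted: word splitting breaks TMP at the newlines and echo joins the
# pieces with single spaces, giving one line.
one_line=$(echo ${TMP})

# Quoted: the newlines produced by ls are preserved.
multi_line=$(echo "${TMP}")

one_count=$(printf '%s\n' "$one_line" | wc -l | tr -d ' ')
multi_count=$(printf '%s\n' "$multi_line" | wc -l | tr -d ' ')
echo "unquoted: $one_count line(s), quoted: $multi_count line(s)"

cd / && rm -rf "$workdir"
```

The unquoted form is what the rest of this post relies on, since it hands egrep one line instead of three.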

Why egrep

egrep is basically grep but it matches extended regular expressions instead of basic regular expressions.

The Need for Quotes

The reason we need the quotes around the regex is to prevent any expansions (like brace expansion) before egrep receives it as an argument. For more on quoting in bash, check Bash quotations, backslash escaped characters and whitespace explained with examples.

The -o Option

The -o option displays only the matching parts of the lines. However, it will output every match regardless of how many matches there are in a line. That means it doesn’t stop matching after it finds a first match, which is sometimes the case depending on the function you are using. The following example should clear it up:

[ahmed@amayem test]$ echo "test" | egrep -o 't'
t
t

Two t’s were printed, indicating two matches.
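Because -o prints every match on its own line, counting the output lines counts the matches. The following is a minimal sketch of that idea; it uses grep -E, which POSIX specifies as the extended regular expression form of grep, and the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# -o prints each match on its own line, so counting lines counts matches.
match_count=$(printf '%s\n' "test" | grep -Eo 't' | wc -l | tr -d ' ')

# -c counts matching lines instead, however many matches each line holds.
line_count=$(printf '%s\n' "test" | grep -Ec 't')

echo "matches: $match_count, matching lines: $line_count"
```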

*

Pathname Expansion Pattern Matching

The * matches any string and doesn’t relate to any characters before it:

[ahmed@amayem test]$ ls *
checktest   test        testcheck

[ahmed@amayem test]$ ls t*
test        testcheck

If the * had acted as it does in a regex, neither file would have been listed: t* would mean a t zero or more times, which cannot cover a whole name such as test. Instead the pattern matched only the filenames beginning with t and ignored checktest, even though checktest contains test. As mentioned in the previous post, this is because of pathname expansion’s particular application: it only matches full filenames and not partial ones.
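The same contrast can be sketched without touching any files. The example below uses a case statement for the pattern matching notation and grep -E for the regex; the variable names are assumptions for illustration only.

```shell
#!/usr/bin/env bash
name=checktest

# Pattern matching notation: t* has to cover the whole name.
case $name in
  t*) glob_result=match ;;
  *)  glob_result=nomatch ;;
esac

# ERE, unanchored: t* can match the empty string, so any line is selected.
if printf '%s\n' "$name" | grep -Eq 't*'; then
  ere_result=match
else
  ere_result=nomatch
fi

# ERE written to behave like the glob t*: anchor both ends, use .* for "any string".
if printf '%s\n' "$name" | grep -Eq '^t.*$'; then
  anchored_result=match
else
  anchored_result=nomatch
fi

echo "glob t*: $glob_result, ERE t*: $ere_result, ERE ^t.*\$: $anchored_result"
```

Only the anchored regex agrees with the glob, which is one way to see that pathname expansion behaves as if every pattern were anchored at both ends.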

Parameter Expansion Pattern Matching

The * matches any string and doesn’t relate to any characters before it.

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#*}
checktest test testcheck
[ahmed@amayem test]$ echo ${TMP##*}

[ahmed@amayem test]$ 

Notice that with one # nothing was deleted. That’s because the smallest string that * matches is the null string at the beginning of checktest. With ## the whole string is matched and deleted. Let’s try it with a t before the *:

[ahmed@amayem test]$ echo ${TMP#t*}
checktest test testcheck
[ahmed@amayem test]$ echo ${TMP##t*}
checktest test testcheck

Nothing was deleted because the string does not start with a t, and the # insists the match be at the beginning of a string. Let’s try to match it with a c:

[ahmed@amayem test]$ echo ${TMP#c*}
hecktest test testcheck
[ahmed@amayem test]$ echo ${TMP##c*}

[ahmed@amayem test]$ 

The first one matched the shortest string, which is just the c because the * matched the null string. The second one matched the longest possible string and so deleted the whole string.

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%*}
checktest test testcheck
[ahmed@amayem test]$ echo ${TMP%%*}

[ahmed@amayem test]$ 

Same thing as before except we are matching from the end of the string only. In the first one the * matched the null string at the end, and in the second one the * matched the whole string.

[ahmed@amayem test]$ echo ${TMP%c*}
checktest test testche
[ahmed@amayem test]$ echo ${TMP%%c*}

[ahmed@amayem test]$

This is interesting. With % the shell looked for the shortest suffix matching c*, which starts at the last c in the string, and the * matched everything after it. With %% it looked for the longest such suffix, which starts at the very first c, so the * matched the rest of the string and everything was deleted. Let’s switch the *:

[ahmed@amayem test]$ echo ${TMP%*c}
checktest test testcheck
[ahmed@amayem test]$ echo ${TMP%%*c}
checktest test testcheck

Nothing was matched. That is to be expected because the string does not end in c. Let’s switch it to k:

[ahmed@amayem test]$ echo ${TMP%*k}
checktest test testchec
[ahmed@amayem test]$ echo ${TMP%%*k}

[ahmed@amayem test]$ 

In the first case the * matched the null string before the last k and in the second case the * matched the whole string except for the final k.
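A common practical use of the shortest and longest forms is taking a path apart. The sketch below assumes a made-up path and made-up variable names; only the four expansion operators come from the tests above.

```shell
#!/usr/bin/env bash
path=/home/ahmed/archive.tar.gz   # hypothetical path, for illustration only

file=${path##*/}    # longest prefix matching */   -> archive.tar.gz
dir=${path%/*}      # shortest suffix matching /*  -> /home/ahmed
stem=${file%%.*}    # longest suffix matching .*   -> archive
ext=${file##*.}     # longest prefix matching *.   -> gz
exts=${file#*.}     # shortest prefix matching *.  -> tar.gz

printf '%s\n' "$dir" "$file" "$stem" "$exts" "$ext"
```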

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/*/x}
x

The whole string was replaced. Let’s see if we can replace a part in the middle of the string:

[ahmed@amayem test]$ echo ${TMP/t*/x}
checkx
[ahmed@amayem test]$ echo ${TMP/t*t/x}
checkxcheck

As is becoming clearer, this substitution format is not limited to matching patterns only at the beginning or end of a string, as the # and % formats mentioned earlier are.
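Bash also offers a few variants of this substitution that are not tested in this post: a doubled slash replaces every match, and a leading # or % on the pattern anchors it to the start or end of the value. The sketch below is bash-specific and assigns TMP by hand instead of reading it from ls.

```shell
#!/usr/bin/env bash
TMP="checktest test testcheck"

first=${TMP/t/x}          # replace only the first t
all=${TMP//t/x}           # replace every t
at_start=${TMP/#check/x}  # replace check only at the start of the value
at_end=${TMP/%check/x}    # replace check only at the end of the value

printf '%s\n' "$first" "$all" "$at_start" "$at_end"
```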

Extended Regular Expression

An atom followed by * matches a sequence of 0 or more matches of the atom

[ahmed@amayem test]$ echo ${TMP} | egrep -o 't*'
t
t
t
t
t
t

It matched every t, but only the t‘s and not what came after it because the pattern is asking to match t zero or more times. If we use * by itself then we should expect to get nothing back:

[ahmed@amayem test]$ echo ${TMP} | egrep -o '*'
[ahmed@amayem test]$ 

In order to match the whole string we have to use the other special operator . to indicate any character, then the * after it:

[ahmed@amayem test]$ echo ${TMP} | egrep -o '.*'
checktest test testcheck

Note About GNU grep Versions

It turns out that there are some differences between versions of GNU grep that will affect the output. The above works with GNU grep 2.6.3, but it won’t work with GNU grep 2.5.1, which will give the following:

[ahmed@amayem test]$ echo ${TMP} | grep -o 't*'
[ahmed@amayem test]$

?

Pathname Expansion Pattern Matching

The ? matches any single character.

[ahmed@amayem test]$ ls ?
ls: ?: No such file or directory
[ahmed@amayem test]$ ls t?
ls: t?: No such file or directory

We must remember that when it comes to pattern matching in pathname expansion, the goal is to use the operators to expand the pattern into an existing filename or parameter. Hence, even if t does match part of the filenames, nothing will show because the full filename needs to be matched. Let’s match full filenames:

[ahmed@amayem test]$ ls tes?
test
[ahmed@amayem test]$ ls ?????????
checktest  testcheck

Notice that in the second call, the file named test was not matched. Let’s see if ? can match the null string:

[ahmed@amayem test]$ ls ?test?
ls: cannot access ?test?: No such file or directory

Nope, it must match a character.

Parameter Expansion Pattern Matching

The ? matches any single character.

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#?}
hecktest test testcheck
[ahmed@amayem test]$ echo ${TMP##?}
hecktest test testcheck
[ahmed@amayem test]$ 

The first c was deleted from checktest. Since ? only matches one character there is no difference between # and ##.

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%?}
checktest test testchec
[ahmed@amayem test]$ echo ${TMP%%?}
checktest test testchec

The last k was deleted. Since ? only matches one character there is no difference between % and %%.

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/?/x}
xhecktest test testcheck
[ahmed@amayem test]$ echo ${TMP/?????/x}
xtest test testcheck

Notice that the whole match was substituted for 1 x. The match could also be in the middle of the string:

[ahmed@amayem test]$ echo ${TMP/t??t/x}
checkx test testcheck

Extended Regular Expression

An atom followed by ? matches a sequence of 0 or 1 matches of the atom.

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test?'
test
test
test

Notice that, unlike in the pattern matching examples, the ? did not match the character after test; each match stops at the last t. The ? is useful when you want to say that a part of a pattern is optional:

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'testy?'
test
test
test
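To put the two meanings of ? side by side, the following sketch runs the same three words through a case pattern and an anchored regex. The word list and variable names are assumptions chosen for the example.

```shell
#!/usr/bin/env bash
glob_matches=""
ere_matches=""

for word in tes test testy; do
  # Pattern matching notation: test? is "test" followed by exactly one character.
  case $word in
    test?) glob_matches="$glob_matches $word" ;;
  esac

  # ERE: ^test?$ is "tes" followed by an optional t.
  if printf '%s\n' "$word" | grep -Eq '^test?$'; then
    ere_matches="$ere_matches $word"
  fi
done

echo "glob test? matched:$glob_matches"
echo "ERE ^test?\$ matched:$ere_matches"
```

The two lists do not overlap at all: the glob accepts only testy, while the regex accepts tes and test.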

+

To test this we will have to modify our file names:

[ahmed@amayem test]$ mv test test+
[ahmed@amayem test]$ mv checktest check+test
[ahmed@amayem test]$ mv testcheck testcheck+
[ahmed@amayem test]$ ls
check+test  test+       testcheck+
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check+test test+ testcheck+

Pathname Expansion Pattern Matching

The + matches itself.

[ahmed@amayem test]$ ls test+
test+
[ahmed@amayem test]$ ls test+ check+*
check+test  test+

Parameter Expansion Pattern Matching

The + matches itself.

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#check+}
test test+ testcheck+
[ahmed@amayem test]$ echo ${TMP##check+}
test test+ testcheck+
[ahmed@amayem test]$ echo ${TMP##test+}
check+test test+ testcheck+

The last one didn’t match because it is not in the beginning.

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%check+}
check+test test+ test
[ahmed@amayem test]$ echo ${TMP%%check+}
check+test test+ test
[ahmed@amayem test]$ echo ${TMP%%test+}
check+test test+ testcheck+

The last one didn’t match because it is not in the end.

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/check+/x}
xtest test+ testcheck+
[ahmed@amayem test]$ echo ${TMP/test+/x}
check+test x testcheck+

Extended Regular Expression

An atom followed by + matches a sequence of 1 or more matches of the atom.

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test+'
test
test
test

Notice that test+ was not matched. To test the + more we should modify our test string to contain some characters that are repeated as follows:

[ahmed@amayem test]$ TMP=$(echo ${TMP} | sed 's/+/tt/g')
[ahmed@amayem test]$ echo ${TMP}
checktttest testtt testchecktt

That’s more like it. Let’s try it again now:

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test+'
test
testtt
test
[ahmed@amayem test]$ echo ${TMP} | egrep -o 't+est'
tttest
test
test
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'tt+'
ttt
ttt
tt
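The flip side is that a literal + has to be escaped in a regex but not in a glob. The sketch below checks one name three ways; the variable names are made up for the example.

```shell
#!/usr/bin/env bash
name='test+'

# Pattern matching notation: + is an ordinary character.
case $name in
  test+) glob_result=match ;;
  *)     glob_result=nomatch ;;
esac

# ERE: test+ means "tes" then one or more t, so the literal + breaks the match.
if printf '%s\n' "$name" | grep -Eq '^test+$'; then
  plain_result=match
else
  plain_result=nomatch
fi

# ERE with the + escaped: now it stands for itself.
if printf '%s\n' "$name" | grep -Eq '^test\+$'; then
  escaped_result=match
else
  escaped_result=nomatch
fi

echo "glob: $glob_result, ERE test+: $plain_result, ERE test\\+: $escaped_result"
```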

.

To test this we will have to modify our file names:

[ahmed@amayem test]$ mv test+ test.
[ahmed@amayem test]$ mv check+test check.test
[ahmed@amayem test]$ mv testcheck+ testcheck.
[ahmed@amayem test]$ ls
check.test  test.       testcheck.
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check.test test. testcheck.

Pathname Expansion Pattern Matching

The . matches itself.

[ahmed@amayem test]$ ls test.
test.

Parameter Expansion Pattern Matching

The . matches itself.

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#check.}
test test. testcheck.
[ahmed@amayem test]$ echo ${TMP##check.}
test test. testcheck.
[ahmed@amayem test]$ echo ${TMP##test.}
check.test test. testcheck.

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%check.}
check.test test. test
[ahmed@amayem test]$ echo ${TMP%%check.}
check.test test. test
[ahmed@amayem test]$ echo ${TMP%%test.}
check.test test. testcheck.

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/check./x}
xtest test. testcheck.
[ahmed@amayem test]$ echo ${TMP/test./x}
check.test x testcheck. 

Extended Regular Expression

The . matches any single character. It is equivalent to the ? in pattern matching.

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test.'
test 
test.
testc

Notice that the first test actually has a space after it, so the . matched the space.

If we want to match the literal . then we will have to use a backslash:

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test\.'
test.

^

To test this we will have to modify our file names:

[ahmed@amayem test]$ mv test. test^
[ahmed@amayem test]$ mv check.test check^test
[ahmed@amayem test]$ mv testcheck. testcheck^
[ahmed@amayem test]$ ls
check^test  test^       testcheck^
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check^test test^ testcheck^
[ahmed@amayem test]$ 

Pathname Expansion Pattern Matching

The ^ matches itself.

[ahmed@amayem test]$ ls test^
test^

Parameter Expansion Pattern Matching

The ^ matches itself.

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#check^}
test test^ testcheck^
[ahmed@amayem test]$ echo ${TMP##check^}
test test^ testcheck^
[ahmed@amayem test]$ echo ${TMP##test^}
check^test test^ testcheck^

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%check^}
check^test test^ test
[ahmed@amayem test]$ echo ${TMP%%check^}
check^test test^ test
[ahmed@amayem test]$ echo ${TMP%%test^}
check^test test^ testcheck^

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/check^/x}
xtest test^ testcheck^
[ahmed@amayem test]$ echo ${TMP/test^/x}
check^test x testcheck^

Extended Regular Expression

The ^ matches the null string at the beginning of a line.

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test^'
[ahmed@amayem test]$

It didn’t match anything, because obviously the null string at the beginning of a line is not present at the end of a word. Let’s put the ^ at the beginning of the pattern instead:

[ahmed@amayem test]$ echo ${TMP} | egrep -o '^check'
check
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check'
check
check

When I added the ^ at the beginning it matched only the first check, otherwise it matched both checks.

$

To test this we will have to modify our file names:

[ahmed@amayem test]$ mv test^ test$
[ahmed@amayem test]$ mv check^test check$test
[ahmed@amayem test]$ mv testcheck^ testcheck$
[ahmed@amayem test]$ ls
check       test$       testcheck$

It looks like we lost check$test. This is because $test is considered a variable and is expanded to the empty string. To overcome this issue we need to use quotes.

[ahmed@amayem test]$ mv check 'check$test'
[ahmed@amayem test]$ ls
check$test  test$       testcheck$  

Great we have our files now.
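The lost file name comes down to when the shell expands $test. A small sketch of the three cases, assuming no variable called test is set (the unset line enforces that assumption) and using made-up variable names:

```shell
#!/usr/bin/env bash
unset test   # assumption: no variable named test exists, as in the session above

unquoted=$(echo check$test)   # $test expands to nothing, leaving check
single=$(echo 'check$test')   # single quotes keep the $ literal
escaped=$(echo check\$test)   # a backslash protects just the one character

printf '%s\n' "$unquoted" "$single" "$escaped"
```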

[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check$test test$ testcheck$

Pathname Expansion Pattern Matching

The $ matches itself.

[ahmed@amayem test]$ ls test$
test$

Parameter Expansion Pattern Matching

The $ matches itself.

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#check$}
test test$ testcheck$
[ahmed@amayem test]$ echo ${TMP##check$}
test test$ testcheck$
[ahmed@amayem test]$ echo ${TMP##test$}
check$test test$ testcheck$

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%check$}
check$test test$ test
[ahmed@amayem test]$ echo ${TMP%%check$}
check$test test$ test
[ahmed@amayem test]$ echo ${TMP%%test$}
check$test test$ testcheck$

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/check$/x}
xtest test$ testcheck$
[ahmed@amayem test]$ echo ${TMP/test$/x}
check$test x testcheck$

Extended Regular Expression

The $ matches the null string at the end of a line.

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test$'
[ahmed@amayem test]$

We got nothing because the end of $TMP is check$. Let’s try using check:

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check$'
[ahmed@amayem test]$ 

Still nothing. This is because check$ matches check followed by the null string at the end of a string, not the literal check$. To match the $ literally we need to use the backslash:

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check\$'
check$
check$

Now to get the one at the end of the string we add the $ at the end:

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check\$$'
check$ 

|

To test this we will have to modify our file names:

[ahmed@amayem test]$ mv test$ test|
> ^C
[ahmed@amayem test]$ 

Upon seeing the |, the shell understands it as a pipe. Hence, when we pressed enter we saw >, which indicates that the shell is waiting for the rest of the command. We cancel that command using ctrl+c. We will have to use quotes when using |.

[ahmed@amayem test]$ mv test$ 'test|'
[ahmed@amayem test]$ mv 'check$test' 'check|test'
[ahmed@amayem test]$ mv testcheck$ 'testcheck|'
[ahmed@amayem test]$ ls
check|test  testcheck|  test|
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check|test testcheck| test|

Pathname Expansion Pattern Matching

The | matches itself.

[ahmed@amayem test]$ ls "test|"
test|

Parameter Expansion Pattern Matching

The | matches itself.

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#check|}
test testcheck| test|
[ahmed@amayem test]$ echo ${TMP##check|}
test testcheck| test|
[ahmed@amayem test]$ echo ${TMP##test|}
check|test testcheck| test|

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%check|}
check|test testcheck| test|
[ahmed@amayem test]$ echo ${TMP%%check|}
check|test testcheck| test|
[ahmed@amayem test]$ echo ${TMP%%test|}
check|test testcheck|

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/check|/x}
xtest testcheck| test|
[ahmed@amayem test]$ echo ${TMP/test|/x}
check|test testcheck| x

Extended Regular Expression

The | indicates a branch (a way of saying ‘or’ between patterns).

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test|check'
check
test
test
test
check

The regex is saying: match test or check.
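Pattern matching notation has no alternation operator of its own, but the shell's case statement lets you list several patterns separated by |, which gives a similar effect. The sketch below compares that with the regex branch; the word list and variable names are assumptions for illustration.

```shell
#!/usr/bin/env bash
classified=""

for word in test check other; do
  # In a case statement the | separates whole patterns; it is part of the
  # case syntax, not of the pattern notation itself.
  case $word in
    test|check) classified="$classified $word:known" ;;
    *)          classified="$classified $word:unknown" ;;
  esac
done

# ERE: the | is an operator inside the expression, grouped here with ( ).
ere_hits=$(printf '%s\n' test check other | grep -Ec '^(test|check)$')

echo "case:$classified"
echo "ERE matching lines: $ere_hits"
```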

Note About GNU grep Versions

It turns out that there are some differences between versions of GNU grep that will affect the output. The following works with GNU grep 2.6.3,

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test|'
test
test
test

but it won’t work with GNU grep 2.5.1, which will give the following:

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test|'
[ahmed@amayem test]$

\

To test this we will have to modify our file names:

[ahmed@amayem test]$ mv 'test|' test\*
[ahmed@amayem test]$ mv 'check|test' check\*test
[ahmed@amayem test]$ mv 'testcheck|' testcheck\*
[ahmed@amayem test]$ ls
check*test  test*  testcheck*
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check*test test* testcheck* testcheck*

That last line doesn’t seem right. Why is it printing out testcheck* twice? The reason is that the shell is actually performing pathname expansion on test*:

[ahmed@amayem test]$ echo test*
test* testcheck*

Hence we see an extra testcheck*. In any case, this should not affect our tests.
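The doubled name can be reproduced, and avoided, in a scratch directory. In the sketch below the mktemp directory, the LC_ALL=C setting (used only to pin the sort order) and the variable names are all assumptions made for the example.

```shell
#!/usr/bin/env bash
export LC_ALL=C          # fixed sort order so the result is predictable (assumption)
workdir=$(mktemp -d)     # throwaway directory, not the test directory above
cd "$workdir" || exit 1
touch 'check*test' 'test*' 'testcheck*'

TMP=$(ls)

# Unquoted: after word splitting each word is treated as a pattern, and
# test* matches both test* and testcheck*.
unquoted=$(echo ${TMP})

# Quoted: no pathname expansion takes place. The newlines are turned into
# spaces afterwards only to make the two results easy to compare.
quoted=$(echo "${TMP}" | tr '\n' ' ')
quoted=${quoted% }

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

cd / && rm -rf "$workdir"
```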

Pathname Expansion Pattern Matching

The \ escapes the special characters used by the system.

[ahmed@amayem test]$ ls test\*
test*

What happens if we put a \ before a regular character:

[ahmed@amayem test]$ ls \test
test

It has no effect, and it is as if the \ doesn’t exist there.

Parameter Expansion Pattern Matching

The \ escapes the special characters used by the system.

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#check\*}
test test* testcheck* testcheck*
[ahmed@amayem test]$ echo ${TMP##check\*}
test test* testcheck* testcheck*
[ahmed@amayem test]$ echo ${TMP##test\*}
check*test test* testcheck* testcheck*

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%check\*}
check*test test* testcheck* test
[ahmed@amayem test]$ echo ${TMP%%check\*}
check*test test* testcheck* test
[ahmed@amayem test]$ echo ${TMP%%test\*}
check*test test* testcheck* testcheck*

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/check\*/x}
xtest test* testcheck* testcheck*
[ahmed@amayem test]$ echo ${TMP/test\*/x}
check*test x testcheck*

Extended Regular Expression

The \ escapes the special characters used by the system.

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check*'
check
check
check
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'check\*'
check*
check*
check*

In the first instance the * was not escaped, and so its special meaning applied: the k appearing zero or more times. In the second instance it was escaped, and so it matched the * literally.

What happens if we put a \ before a regular character:

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'chec\k'
check
check
check

It has no effect, and it is as if the \ doesn’t exist there.

egrep bug

Even though egrep is supposed to match according to extended regular expressions (that’s what the e in egrep stands for), we get the following error:

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'chec\1'
egrep: Invalid back reference

Back-referencing is not part of extended regular expressions as defined in the POSIX standard, so treating \1 as a back-reference goes beyond what the standard specifies.

{ bound }

Countering brace expansion

When testing this special operator out we need to watch out for brace expansion. Brace expansion is explained in more detail in the man bash pages:

Brace expansion is a mechanism by which arbitrary strings may be generated. This mechanism is similar to pathname expansion, but the filenames generated need not exist. Patterns to be brace expanded take the form of an optional preamble, followed by either a series of comma-separated strings or a sequence expression between a pair of braces, followed by an optional postscript. The preamble is prefixed to each string contained within the braces, and the postscript is then appended to each resulting string, expanding left to right.

The following is an example:

[ahmed@amayem test]$ echo test{0,2}
test0   test2

If we want to print it out literally we have to quote it or escape the braces:

[ahmed@amayem test]$ echo "test{0,2}"
test{0,2}
[ahmed@amayem test]$ echo test\{0,2\}
test{0,2}
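One quick way to see whether brace expansion happened is to count the resulting words. The sketch below is bash-specific (brace expansion is not part of POSIX sh) and uses set -- to load the words into the positional parameters; the variable names are made up for the example.

```shell
#!/usr/bin/env bash
# Unquoted: brace expansion generates two words, whether or not such files exist.
set -- test{0,2}
expanded_count=$#
expanded_words="$1 $2"

# Quoted: the braces are kept and a single word is passed on.
set -- "test{0,2}"
quoted_count=$#
quoted_word=$1

echo "unquoted: $expanded_count words ($expanded_words)"
echo "quoted:   $quoted_count word ($quoted_word)"
```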

Let’s change the file names to help in testing:

[ahmed@amayem test]$ mv test\* "test{0,2}"
[ahmed@amayem test]$ mv check\*test "check{0,2}test"
[ahmed@amayem test]$ mv testcheck\* "testcheck{0,2}"
[ahmed@amayem test]$ ls
check{0,2}test  testcheck{0,2}  test{0,2}
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check{0,2}test testcheck{0,2} test{0,2}

Pathname Expansion Pattern Matching

The {bound} matches itself:

[ahmed@amayem test]$ ls "test{0,2}"
test{0,2}   

Parameter Expansion Pattern Matching

The {bound} matches itself:

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#check{0,2}}
}test testcheck{0,2} test{0,2}}

Note that the last } was not considered part of the pattern. It was instead considered the closing brace of the ${parameter#word} formula. Therefore there is an extra } printed at the end. To overcome this issue we will have to use quotes:

[ahmed@amayem test]$ echo ${TMP#"check{0,2}"}
test testcheck{0,2} test{0,2}
[ahmed@amayem test]$ echo ${TMP##"check{0,2}"}
test testcheck{0,2} test{0,2}
[ahmed@amayem test]$ echo ${TMP##"test{0,2}"}
check{0,2}test testcheck{0,2} test{0,2}

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%"test{0,2}"}
check{0,2}test testcheck{0,2}
[ahmed@amayem test]$ echo ${TMP%%"test{0,2}"}
check{0,2}test testcheck{0,2}
[ahmed@amayem test]$ echo ${TMP%%"check{0,2}"}
check{0,2}test testcheck{0,2} test{0,2}

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/"check{0,2}"/x}
xtest testcheck{0,2} test{0,2}
[ahmed@amayem test]$ echo ${TMP/"test{0,2}"/x}
check{0,2}test testcheck{0,2} x

Extended Regular Expression

An atom followed by a bound containing two integers i and j matches a sequence of i through j (inclusive) matches of the atom. Let’s add some more files that I can test:

[ahmed@amayem test]$ touch tes testt testtt
[ahmed@amayem test]$ ls
check{0,2}test  tes     testcheck{0,2}  testt       testtt      test{0,2}
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
check{0,2}test tes testcheck{0,2} testt testtt test{0,2}

Now let’s test:

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test{0,2}'
test
tes
test
testt
testt
test

It was able to pick up tes, all the tests, and testt as well. It didn’t, however, pick up testtt in full, because that would be three t’s; only its first five characters, testt, were matched.
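Anchoring the expression makes the effect of the bound easier to see, because a word is then accepted or rejected as a whole. The following is a small sketch under that assumption, with a made-up word list and variable name.

```shell
#!/usr/bin/env bash
matched=""

for word in tes test testt testtt; do
  # ^test{0,2}$ is "tes" followed by zero, one or two t characters and nothing else.
  if printf '%s\n' "$word" | grep -Eq '^test{0,2}$'; then
    matched="$matched $word"
  fi
done

echo "matched:$matched"
```

With the anchors in place testtt is rejected outright instead of contributing a partial match.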

[]

To test this special operator let’s make a brand new set of files.

We first begin by deleting our current files in our test directory. Do not do this if you are not in the test directory and want to keep your files.

[ahmed@amayem test]$ ls
check{0,2}test  tes     testcheck{0,2}  testt       testtt      test{0,2}
[ahmed@amayem test]$ rm *
[ahmed@amayem test]$ ls
[ahmed@amayem test]$ 

We used pathname expansion as we saw earlier to remove all file names with just the *.

Next let’s make some relevant files:

[ahmed@amayem test]$ touch testA test1 test.
[ahmed@amayem test]$ ls
test.   test1   testA
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
test. test1 testA

General

[characters] matches any one of the enclosed characters.

Pathname Expansion Pattern Matching

[ahmed@amayem test]$ ls tes[t][1A]
test1   testA

We used it twice in a row. Notice that we can put one character in the square brackets but it would be the same as having it without square brackets:

[ahmed@amayem test]$ ls tes[A]
ls: tes[A]: No such file or directory
[ahmed@amayem test]$ ls tesA
ls: tesA: No such file or directory 

Parameter Expansion Pattern Matching

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#test[1A.]}
test1 testA
[ahmed@amayem test]$ echo ${TMP##test[1A.]}
test1 testA
[ahmed@amayem test]$ echo ${TMP##test[1A]}
test. test1 testA

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%test[1A.]}
test. test1
[ahmed@amayem test]$ echo ${TMP%%test[1A.]}
test. test1
[ahmed@amayem test]$ echo ${TMP%%test[1.]}
test. test1 testA

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/test[.1A]/x}
x test1 testA
[ahmed@amayem test]$ echo ${TMP/test[1A]/x}
test. x testA

Extended Regular Expression

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[.1A]'
test.
test1
testA

Range

A pair of characters separated by a hyphen denotes a range expression; any character that sorts between those two characters, inclusive, using the current locale’s collating sequence and character set, is matched.

Let’s add another file for testing purposes:

[ahmed@amayem test]$ touch test2
[ahmed@amayem test]$ ls
test.   test1   test2   testA
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
test. test1 test2 testA

Pathname Expansion Pattern Matching

[ahmed@amayem test]$ ls test[0-2]
test1   test2
[ahmed@amayem test]$ ls test[A-Z]
testA

Parameter Expansion Pattern Matching

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#test[+-0]}
test1 test2 testA
[ahmed@amayem test]$ echo ${TMP##test[!-A]}
test1 test2 testA

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%test[+-0]}
test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP%test[!-Z]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%%test[!-Z]}
test. test1 test2

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/test[!-Z]/x}
x test1 test2 testA
[ahmed@amayem test]$ echo ${TMP/test[0-Z]/x}
test. x test2 testA

Extended Regular Expression

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[!-Z]'
test.
test1
test2
testA
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[0-9]'
test1
test2

Character Classes

According to the man bash page:

    Within [ and ], character classes can be specified using the syntax [:class:], where class is one of the following classes defined in the POSIX standard: alnum alpha ascii blank cntrl digit graph lower print punct space upper word xdigit
    A character class matches any character belonging to that class.

The names of the character classes are pretty straightforward. Let’s test them out.

Pathname Expansion Pattern Matching

[ahmed@amayem test]$ ls test[[:alnum:]]
test1   test2   testA

Parameter Expansion Pattern Matching

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#test[[:alnum:]]}
test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP#test[[:punct:]]}
test1 test2 testA
[ahmed@amayem test]$ echo ${TMP##test[[:graph:]]}
test1 test2 testA

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%test[[:alnum:]]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%%test[[:alnum:]]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%test[[:ascii:]]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%test[[:digit:]]}
test. test1 test2 testA

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/test[[:alnum:]]/x}
test. x test2 testA
[ahmed@amayem test]$ echo ${TMP/test[[:ascii:]]/x}
x test1 test2 testA

Extended Regular Expression

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[[:alnum:]]'
test1
test2
testA
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[[:punct:]]'
test.
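Character classes are handy in a case statement when a script needs to sort names by their last character. The sketch below is one hypothetical way to do that; the name list and the variable names are assumptions made for the example.

```shell
#!/usr/bin/env bash
report=""

for name in test. test1 testA; do
  # The first pattern that matches wins, so each name gets exactly one label.
  case $name in
    test[[:digit:]]) kind=digit ;;
    test[[:alpha:]]) kind=alpha ;;
    test[[:punct:]]) kind=punct ;;
    *)               kind=other ;;
  esac
  report="$report $name=$kind"
done

echo "classified:$report"
```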

Negation

According to the man bash page for pattern matching:

If the first character following the `[` is a `!` or a `^` then any character not enclosed is matched.

It is the same for extended regular expressions except that the ! does not work.

Pathname Expansion Pattern Matching

[ahmed@amayem test]$ ls test[^[:punct:][:alpha:]]
test1   test2   
[ahmed@amayem test]$ ls test[![:punct:][:alpha:]]
test1   test2

Parameter Expansion Pattern Matching

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#test[^A-Z]}
test1 test2 testA
[ahmed@amayem test]$ echo ${TMP##test[^A-Z]}
test1 test2 testA
[ahmed@amayem test]$ echo ${TMP##test[^+-Z]}
test. test1 test2 testA

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%test[^A-Z]}
test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP%test[^0-9]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%test[^0]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%%test[^0]}
test. test1 test2
[ahmed@amayem test]$ echo ${TMP%test[^A0.]}
test. test1 test2 testA

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/test[![:ascii:]]/x}
test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP/test[![:digit:]]/x}
x test1 test2 testA
[ahmed@amayem test]$ echo ${TMP/test[![:digit:].]/x}
test. test1 test2 x

Extended Regular Expression

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[^[:punct:]]'
test1
test2
testA
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[^[:punct:][:alpha:]]'
test1
test2

Let’s make sure that the ! does not work:

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[![:punct:][:alpha:]]'
test.
testA

It’s not working, just as expected.
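The difference can be checked on a single name. In the sketch below the glob uses [!0-9], while the regex is tried with both ! and ^; the variable names are made up for the example.

```shell
#!/usr/bin/env bash
name=testA

# Pattern matching notation: [!0-9] is "any character that is not a digit".
case $name in
  test[!0-9]) glob_result=match ;;
  *)          glob_result=nomatch ;;
esac

# ERE: inside brackets the ! is literal, so [!0-9] is "a ! or a digit".
if printf '%s\n' "$name" | grep -Eq '^test[!0-9]$'; then
  ere_bang=match
else
  ere_bang=nomatch
fi

# ERE: ^ is the negation character, so [^0-9] is "not a digit".
if printf '%s\n' "$name" | grep -Eq '^test[^0-9]$'; then
  ere_caret=match
else
  ere_caret=nomatch
fi

echo "glob [!0-9]: $glob_result, ERE [!0-9]: $ere_bang, ERE [^0-9]: $ere_caret"
```

For scripts that must run under shells other than bash, [!...] is the safer spelling in patterns, since POSIX leaves a leading ^ in a shell bracket expression unspecified.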

The dash -

A - may be matched by including it as the first or last character in the set.

To test this we need to add a file with a dash:

[ahmed@amayem test]$ touch test-
[ahmed@amayem test]$ ls
test-   test.   test1   test2   testA
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
test- test. test1 test2 testA

Pathname Expansion Pattern Matching

[ahmed@amayem test]$ ls test[-At]
test-   testA
[ahmed@amayem test]$ ls test[A-t]
testA
[ahmed@amayem test]$ ls test[At-]
test-   testA

Notice that when the dash is in the middle it is considered to be a range expression.

Parameter Expansion Pattern Matching

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#test[-0Z]}
test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP#test[0-Z]}
test- test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP#test[0Z-]}
test. test1 test2 testA

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%test[-0Z]}
test- test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP%test[0-Z]}
test- test. test1 test2
[ahmed@amayem test]$ echo ${TMP%test[0Z-]}
test- test. test1 test2 testA

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/test[-0Z]/x}
x test. test1 test2 testA
[ahmed@amayem test]$ echo ${TMP/test[0-Z]/x}
test- test. x test2 testA
[ahmed@amayem test]$ echo ${TMP/test[0Z-]/x}
x test. test1 test2 testA

Extended Regular Expression

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[-0Z]'
test-
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[0-Z]'
test1
test2
testA
[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[0Z-]'
test-

The Closing Bracket ]

A ] may be matched by including it as the first character in the set.

Let’s add a file that contains ] for the sake of testing:

[ahmed@amayem test]$ touch test]
[ahmed@amayem test]$ ls
test-   test.   test1   test2   testA   test]
[ahmed@amayem test]$ TMP=$(ls)
[ahmed@amayem test]$ echo ${TMP}
test- test. test1 test2 testA test]

Pathname Expansion Pattern Matching

[ahmed@amayem test]$ ls test[]1A]
test1   testA   test]
[ahmed@amayem test]$ ls test[1]A]
ls: test[1]A]: No such file or directory
[ahmed@amayem test]$ ls test[1A]]
ls: test[1A]]: No such file or directory

Notice that if ] is not the first character inside the square brackets then the brackets are considered to have been closed.

Parameter Expansion Pattern Matching

${parameter#word} ${parameter##word}

[ahmed@amayem test]$ echo ${TMP#test[]-1A]}
test- test. test1 test2 testA test]
[ahmed@amayem test]$ echo ${TMP#test[]1A-]}
test. test1 test2 testA test]
[ahmed@amayem test]$ echo ${TMP##test[]1A-]}
test. test1 test2 testA test]

Notice that the dash and the ] cannot both be the first character of the set, so the ] goes first and the dash goes at the end.

${parameter%word} ${parameter%%word}

[ahmed@amayem test]$ echo ${TMP%test[]-1A]}
test- test. test1 test2 testA test]

How come it’s not deleting test]? That is probably because ]-1 is read as a range, and that range doesn’t make sense because the ] comes after the 1. Once we put the dash at the end it should work:

[ahmed@amayem test]$ echo ${TMP%test[]1A-]}
test- test. test1 test2 testA

${parameter/pattern/string}

[ahmed@amayem test]$ echo ${TMP/test[]A]/x}
test- test. test1 test2 x test]
[ahmed@amayem test]$ echo ${TMP/test[]Z]/x}
test- test. test1 test2 testA x

Extended Regular Expression

[ahmed@amayem test]$ echo ${TMP} | egrep -o 'test[]-]'
test-
test]
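The set []-] follows both placement rules at once: the ] comes first and the dash comes last. The sketch below checks it against three names in a case pattern and in an anchored regex; the name list and variable names are assumptions for the example.

```shell
#!/usr/bin/env bash
matched=""

for name in 'test]' test- testA; do
  # []-] is a set of exactly two characters: ] (listed first) and - (listed last).
  case $name in
    test[]-]) matched="$matched $name" ;;
  esac
done

# The same set inside an anchored ERE; -c counts the matching lines.
ere_hits=$(printf '%s\n' 'test]' test- testA | grep -Ec '^test[]-]$')

echo "case matched:$matched"
echo "ERE matching lines: $ere_hits"
```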

Next Steps

  1. A Table of Practical Matching Differences Between Pattern Matching Notation Used in Pathname and Parameter Expansion and Extended Regular Expressions

References

  1. IEEE Std 1003.1, 2013 Edition
  2. man bash page
  3. man re_format page (FreeBSD version)

Ahmed Amayem has written 90 articles

A Web Application Developer Entrepreneur.