Bash Arrays 1: Intro, Declaration, Assignments, Dereferencing (accessing elements) and special subscripts

While attempting to write a recursive shell script that outputs a graphical tree display of a directory structure, I ran into some trouble with arrays, so I decided to delve deeper into them to understand them better.

Setup

Let’s make a shell script. In your favourite editor type

#!/bin/bash

Save it somewhere as arrays.sh. Now we need to make it executable as follows:

[ahmed@amayem ~]$ chmod +x ./arrays.sh 
[ahmed@amayem ~]$ ./arrays.sh 
[ahmed@amayem ~]$

Looks good so far.

The Man Page

I will post the Arrays section of the man page for bash here as a reference. If you don’t feel like reading it, I will be covering it practically in the next section:

[ahmed@amayem ~]$ man bash

Let’s go to the Arrays section:

Arrays
   Bash provides one-dimensional array variables. Any variable may be used as an array; the declare builtin will explicitly declare an array. There is no maximum limit on the size of an array, nor any requirement that members be indexed or assigned contiguously. Arrays are indexed using integers and are zero-based.

   An array is created automatically if any variable is assigned to using the syntax name[subscript]=value. The subscript is treated as an arithmetic expression that must evaluate to a number greater than or equal to zero. To explicitly declare an array, use declare -a name (see SHELL BUILTIN COMMANDS below). declare -a name[subscript] is also accepted; the subscript is ignored. Attributes may be specified for an array variable using the declare and readonly builtins. Each attribute applies to all members of an array.

   Arrays are assigned to using compound assignments of the form name=(value1 ... valuen), where each value is of the form [subscript]=string. Only string is required. If the optional brackets and subscript are supplied, that index is assigned to; otherwise the index of the element assigned is the last index assigned to by the statement plus one. Indexing starts at zero. This syntax is also accepted by the declare builtin. Individual array elements may be assigned to using the name[subscript]=value syntax introduced above.

   Any element of an array may be referenced using ${name[subscript]}. The braces are required to avoid conflicts with pathname expansion. If subscript is @ or *, the word expands to all members of name. These subscripts differ only when the word appears within double quotes. If the word is double-quoted, ${name[*]} expands to a single word with the value of each array member separated by the first character of the IFS special variable, and ${name[@]} expands each element of name to a separate word. When there are no array members, ${name[@]} expands to nothing. If the double-quoted expansion occurs within a word, the expansion of the first parameter is joined with the beginning part of the original word, and the expansion of the last parameter is joined with the last part of the original word. This is analogous to the expansion of the special parameters * and @ (see Special Parameters above). ${#name[subscript]} expands to the length of ${name[subscript]}. If subscript is * or @, the expansion is the number of elements in the array. Referencing an array variable without a subscript is equivalent to referencing element zero.

   The unset builtin is used to destroy arrays. unset name[subscript] destroys the array element at index subscript. Care must be taken to avoid unwanted side effects caused by filename generation. unset name, where name is an array, or unset name[subscript], where subscript is * or @, removes the entire array.

  The declare, local, and readonly builtins each accept a -a option to specify an array. The read builtin accepts a -a option to assign a list of words read from the standard input to an array. The set and declare builtins display array values in a way that allows them to be reused as assignments.

Declaration and Assignment

First let’s declare some arrays. The man page gives the following way:

An array is created automatically if any variable is assigned to using the syntax name[subscript]=value. The subscript is treated as an arithmetic expression that must evaluate to a number greater than or equal to zero.

as well as:

To explicitly declare an array, use declare -a name (see SHELL BUILTIN COMMANDS below). declare -a name[subscript] is also accepted; the subscript is ignored.

and:

Arrays are assigned to using compound assignments of the form name=(value1 ... valuen), where each value is of the form [subscript]=string.

Under typeset we also find:

 typeset [-afFirtx] [-p] [name[=value] ...]
  -a     Each name is an array variable (see Arrays above).

So we have found four ways. Let’s get practical:

#!/bin/bash

#Declarations
array1[5]=five

declare -a array2

array3=(zero 1 two 3 four)

typeset -a array4

When we run the script we get no errors, which is good. But we should really test these arrays, so let’s find out what’s in them:
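One quick way to peek inside, before going element by element, is declare -p, which prints a variable in the reusable assignment form the man page mentions. The following is only a sketch: the variable names dump1 and dump3 are made up for illustration, and the exact quoting of the printed form differs between bash versions, so no output is shown here.

```bash
#!/bin/bash
# Same four declarations as in the script above
array1[5]=five
declare -a array2
array3=(zero 1 two 3 four)
typeset -a array4

# declare -p prints a variable together with its attributes and contents.
# dump1 and dump3 are hypothetical helper variables, not part of the article's script.
dump1=$(declare -p array1)   # should show one element, at index 5
dump3=$(declare -p array3)   # should show five elements, at indices 0 to 4

echo "$dump1"
echo "$dump3"
```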

Dereferencing

“Dereferencing” is just a fancy word which means, in this technical context, finding the value at a certain reference or index in the array. The man page tells us:

Any element of an array may be referenced using ${name[subscript]}. The braces are required to avoid conflicts with pathname expansion.

Let’s get practical. Add the following lines to our script:

#Dereferencing
echo ${array1[5]}
echo ${array1[0]}
echo ${array3[0]}

Running the script produces the following:

[ahmed@amayem ~]$ ./arrays.sh 
five

zero

I wonder what will happen if I make five a variable name and assign it a value. What would be stored in array1[5]? Let’s give it a try, and add this at the beginning of the script:

five=5

When I run it, however, I still get five as output. This is because the shell does not expand a variable on the right-hand side of an assignment unless there is a $ before its name. So we would have to change the assignment as follows:

array1[5]=five

becomes

array1[5]=$five
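To see both assignments side by side, here is a small sketch. It also touches on something the man page quote above implies: because the subscript is treated as an arithmetic expression, a variable name inside the square brackets is evaluated even without a $. The names literal and expanded are just illustrative.

```bash
#!/bin/bash
five=5

array1[5]=five          # no $, so the literal string "five" is stored
literal=${array1[5]}

array1[5]=$five         # with $, the value of the variable is stored
expanded=${array1[5]}

# The subscript is an arithmetic expression, so five+1 evaluates to 6
array1[five+1]=six

echo "$literal" "$expanded" "${array1[6]}"   # prints: five 5 six
```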

There are also some interesting subscripts to note about arrays:

Special subscripts

The subscript is what goes into the square brackets when you are dereferencing.

* and @

If you would like to see all the elements of an array you can use * or @ as mentioned in the man pages:

If subscript is @ or *, the word expands to all members of name.

Let’s try them out. Replace the dereferencing part of the script with the following:

#Dereferencing
echo ${array3[*]}
echo ${array3[@]}

This will output:

[ahmed@amayem ~]$ ./arrays.sh 
zero 1 two 3 four
zero 1 two 3 four

So what’s the difference between them?

These subscripts differ only when the word appears within double quotes. If the word is double-quoted, ${name[*]} expands to a single word with the value of each array member separated by the first character of the IFS special variable, and ${name[@]} expands each element of name to a separate word. When there are no array members, ${name[@]} expands to nothing. If the double-quoted expansion occurs within a word, the expansion of the first parameter is joined with the beginning part of the original word, and the expansion of the last parameter is joined with the last part of the original word. This is analogous to the expansion of the special parameters * and @ (see Special Parameters above).

The man pages keep mentioning word, which becomes confusing. Luckily they define it at the beginning of the page:

word  A sequence of characters considered as a single unit by the shell. Also known as a token.

Things are becoming clearer now. The difference will become clear if we use those subscripts somewhere where we can see the difference between the number of words used. Let’s use printf to test:

#Dereferencing
printf "%s-" "${array3[*]}"
echo
printf "%s-" "${array3[@]}"
echo

printf "%s-" ${array3[*]}
echo
printf "%s-" ${array3[@]}
echo

Now when we test we get:

[ahmed@amayem ~]$ ./arrays.sh 
zero 1 two 3 four-
zero-1-two-3-four-
zero-1-two-3-four-
zero-1-two-3-four-

We notice that the first one is different. As the man page said, the whole array was treated as one word and so it was printed as a single argument, whereas the other lines treated each element as a separate word. The expansion of the first line would have been as follows:

printf "%s-" "zero 1 two 3 four"

whereas the other lines were expanded to the following:

printf "%s-" zero 1 two 3 four
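The difference matters most when elements contain spaces. The sketch below uses a made-up files array and a hypothetical helper function count_args that simply reports how many words it received. It also shows the IFS behaviour from the man page quote: "${name[*]}" joins the elements with the first character of IFS.

```bash
#!/bin/bash
# count_args is a hypothetical helper: it prints the number of arguments it got
count_args() { echo $#; }

# Illustrative data: the first element contains a space
files=("my file.txt" "notes.md")

star_quoted=$(count_args "${files[*]}")   # one word: the whole array joined
at_quoted=$(count_args "${files[@]}")     # one word per element
at_unquoted=$(count_args ${files[@]})     # unquoted: split again on whitespace

# Set IFS inside the command substitution only, so the rest of the script is unaffected
joined=$(IFS=,; echo "${files[*]}")

echo "$star_quoted $at_quoted $at_unquoted"   # prints: 1 2 3
echo "$joined"                                # prints: my file.txt,notes.md
```

Only the quoted @ form keeps "my file.txt" together as one element, which is why it is usually the one to reach for when passing an array to a command.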

#

This one is straightforward: it gives you the length of either a single element:

${#name[subscript]} expands to the length of ${name[subscript]}.

or the whole array:

If subscript is * or @, the expansion is the number of elements in the array.

Practically:

echo ${#array3[0]} #outputs 4, because the length of `zero` is four
echo ${#array3[@]} #outputs 5
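Note that the count is the number of elements that are actually set, not the highest index plus one. A minimal sketch with a sparse array (the names sparse, count, len and len0 are illustrative):

```bash
#!/bin/bash
sparse[5]=five          # only index 5 is set

count=${#sparse[@]}     # number of set elements
len=${#sparse[5]}       # length of the string at index 5
len0=${#sparse}         # no subscript means element zero, which is unset here

echo "$count $len $len0"   # prints: 1 4 0
```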

Array indices

You may have noticed that we can assign values to any index we want, as long as it is zero or greater. That means the indices that are set do not necessarily run from zero to the length of the array minus one.

array1[5]=$five     #Has only one index: 5

declare -a array2   #Has no indices

array3=("zero" "1" "two" "3" "four") #Has 5 indices: 0 - 4

You may be in a position where you want to find the indices of an array. Luckily bash has that functionality:

${!name[*]}
          If name is an array variable, expands to the list of array indices (keys) assigned in name. If name is not an array, expands to 0 if name is set and null otherwise. When @ is used and the expansion appears within double quotes, each key expands to a separate word.

Let’s give it a try:

echo ${!array1[*]}
echo ${!array3[@]}

Gives us this:

[ahmed@amayem ~]$ ./arrays.sh 
5
0 1 2 3 4
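A common use for this expansion is looping over an array whose indices are not contiguous. The following sketch assumes a made-up sparse array and collects index=value pairs in a second, hypothetical array called lines:

```bash
#!/bin/bash
# Illustrative sparse array: only indices 2, 5 and 9 are set
sparse=([2]=two [5]=five [9]=nine)

lines=()
# "${!sparse[@]}" expands to each set index as a separate word
for i in "${!sparse[@]}"; do
    lines+=("$i=${sparse[$i]}")
done

printf '%s\n' "${lines[@]}"
# prints:
# 2=two
# 5=five
# 9=nine
```

A counting loop from 0 to ${#sparse[@]} minus one would visit indices 0, 1 and 2 instead, and miss two of the three elements.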

Are the indices in an array?

Let’s test it:

array3indices=${!array3[@]}
echo ${#array3indices[*]}
echo ${array3indices}

Gives:

[ahmed@amayem ~]$ ./arrays.sh 
1
0 1 2 3 4

So it’s not returned as an array but as a string.

Putting array indices into an array

If you want the indices in an array, simply wrap the ${!name[@]} expansion in parentheses, like so:

array3indices=("${!array3[@]}") #quoted, so each index becomes its own element
echo ${#array3indices[*]}       #this gives us 5

Note on printf

You may be wondering why printf printed more arguments than were specified in the format string. In our example above we had a format string of "%s-", but we had several arguments when we expanded the array. The man page explains why:

The format is reused as necessary to consume all of the arguments. If the format requires more arguments than are supplied, the extra format specifications behave as if a zero value or null string, as appropriate, had been supplied. The return value is zero on success, non-zero on failure.
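Both halves of that rule can be seen in a short sketch. The variable names pairs and short are illustrative, and the arguments are arbitrary sample values:

```bash
#!/bin/bash
# Two specifiers, four arguments: the format is reused, consuming two at a time
pairs=$(printf '%s=%s\n' a 1 b 2)

# Two specifiers, one argument: the missing %d behaves as if zero had been supplied
short=$(printf '%s-%d\n' only)

echo "$pairs"
# prints:
# a=1
# b=2
echo "$short"   # prints: only-0
```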

References

The bash man pages.