ShellTree 4: Building a New Recursive Script

This post is part of an educational series on building a shell script to graphically display the structure of a directory.

Previously

  1. We broke down Dem Pilafian’s one line command to display a tree of a directory
  2. We broke down Dem Pilafian’s script that uses the one line command
  3. We modified the one line to show files, as well as directories

Goals

  1. Display a tree of a directory
  2. Show, or have the option to show which files are directories.
  3. Display the tree using unicode characters like: , and

Pre-requisites

  1. Understanding Bash Arrays
  2. Understanding how to make recursive functions

Getting the code

Run the following command:

git clone https://github.com/amayem/shell-tree.git

In any step past step-0 you can get the code by issuing the following command but changing the step number to the appropriate one:

git checkout -f step-1

Step-0 cloning the repo

[ahmed@amayem ~]$ git clone git@github.com:amayem/shell-tree.git
Cloning into 'shell-tree'...
Warning: Permanently added the RSA host key for IP address '192.30.252.130' to the list of known hosts.
remote: Counting objects: 4, done.
remote: Compressing objects: 100% (4/4), done.
remote: Total 4 (delta 0), reused 0 (delta 0)
Receiving objects: 100% (4/4), done.
[ahmed@amayem ~]$ ahmedamayem$ ls
shell-tree
[ahmed@amayem ~]$ ahmedamayem$ cd shell-tree/
[ahmed@amayem shell-tree]$ ahmedamayem$ ls -A
.git        LICENSE     README.md

We will use this as our testing directory for our tree command.

Step-1 Making the script file

In your favourite editor let’s start the script:

#!/bin/sh

Save it as tree.sh. Now we need to allow it to be execuatable:

[ahmed@amayem shell-tree]$ chmod +x tree.sh 
[ahmed@amayem shell-tree]$ ./tree.sh 

Good its working but doing nothing.

Step-2 Capture the output of ls in an array

currentDir=($(ls))

for item in ${currentDir[*]}
do
    printf "%s\n" $item
done    

Let’s test it out:

[ahmed@amayem shell-tree]$ ./tree.sh 
LICENSE
README.md
tree.sh

Step-3 Adding the unicode graphics

Step-3.1 Adding the stem:

for item in ${currentDir[*]}
do
    printf "├─%s\n" $item
done

Testing:

[ahmed@amayem shell-tree]$ ./tree.sh 
├─LICENSE
├─README.md
├─tree.sh   

We want the last line to not have a continuing stem:

Setp3.2 Cutting off the leftover final stem

Step-3.2.1 Less efficient

This can be done several ways, first let’s do it in a less efficient way:

currentDir=($(ls))
typeset -i lastIndex index
lastIndex=$((${#currentDir[*]} - 1))

for index in ${!currentDir[*]}
do
    if [ "$index" -lt "$lastIndex" ]; then
        printf "├─%s\n" ${currentDir[$index]}
    else
        printf "└─%s\n" ${currentDir[$index]}
    fi
done

Notice that we are checking if we are at the last index everytime. This can be avoided, which would make the script more efficient

Step-3.2.2 More efficient

currentDir=($(ls))
typeset -i lastIndex index
lastIndex=$((${#currentDir[*]} - 1))

for ((index=0; index<lastIndex; index++))
do
        printf "├─%s\n" ${currentDir[$index]}
done

printf "└─%s\n" ${currentDir[$lastIndex]}

The output for both ways is:

[ahmed@amayem shell-tree]$ ./tree.sh 
├─LICENSE
├─README.md
└─tree.sh

Looking much better.

Step-4 Turning our code into a reusable function

We like what we have but I want to be able to list the directories recursively, such that I can run the same bit of code again and again for sub-directories. For more on bash recursion check Bash Recursion Examples and Experiments with Local Variables Let’s turn our code into a function:

listdir()
{
    currentDir=($(ls $1))
    typeset -i lastIndex index
    lastIndex=$((${#currentDir[*]} - 1))

    for ((index=0; index<lastIndex; index++))
    do
            printf "├─%s\n" ${currentDir[$index]}
    done

    printf "└─%s\n" ${currentDir[$lastIndex]}
}

listdir $PWD

Step-5 Checking if each element is a directory and recursing when it is

listdir()
{
    currentDir=($(ls $1))
    typeset -i lastIndex index
    lastIndex=$((${#currentDir[*]} - 1))

    for ((index=0; index<lastIndex; index++))
    do
            printf "├─%s\n" ${currentDir[$index]}
            if [ -d ${currentDir[$index]} ]; then
                listdir ${currentDir[$index]}
            fi  
    done

    printf "└─%s\n" ${currentDir[$lastIndex]}
}

listdir $PWD

Since I don’t have any directories in my current directory let’s move to the .git directory:

[ahmed@amayem shell-tree]$ cd .git
[ahmed@amayem .git]$ ./../tree.sh
├─HEAD
├─config
├─description
├─hooks
├─applypatch-msg.sample
├─commit-msg.sample
├─post-update.sample
├─pre-applypatch.sample
├─pre-commit.sample
├─pre-push.sample
├─pre-rebase.sample
├─prepare-commit-msg.sample
└─update.sample
├─pre-commit.sample
├─pre-push.sample
├─pre-rebase.sample
├─prepare-commit-msg.sample
├─update.sample
└─

It’s outputting the sub-directory contents too, but the output is all muddled up. Notice how pre-commit.sample and the items under it are repeating. This happened because we didn’t make our variables inside listdir() local. Let’s explain this in a chart. The following is how it is supposed to be:

Call level      Contents of currentDir
1               HEAD config …
2               applypatch-msg.sample commit-msg.sample …
1               HEAD config

The following is what happened:

Call level      Contents of currentDir
1               HEAD config …
2               applypatch-msg.sample commit-msg.sample …
1               applypatch-msg.sample commit-msg.sample …

Basically the array currentDir of the first call was overwritten. So when we went back to the first call, it continued listing where it left off the first time, which was at the fourth element. For more on the use of local in recursive function check Bash Recursion Examples and Experiments with Local Variables

Step-6 Localizing the function variables

listdir()
{
    local currentDir=($(ls $1))
    local currentDir=($(ls $1))
    local -i lastIndex=$((${#currentDir[*]} - 1)) index

    for ((index=0; index<lastIndex; index++))
    do
            printf "├─%s\n" ${currentDir[$index]}
            if [ -d ${currentDir[$index]} ]; then
                listdir ${currentDir[$index]}
            fi  
    done

    printf "└─%s\n" ${currentDir[$lastIndex]}
}

listdir $PWD

Here is the output:

[ahmed@amayem .git]$ ./../tree.sh
├─HEAD
├─config
├─description
├─hooks
├─applypatch-msg.sample
├─commit-msg.sample
├─post-update.sample
├─pre-applypatch.sample
├─pre-commit.sample
├─pre-push.sample
├─pre-rebase.sample
├─prepare-commit-msg.sample
└─update.sample
├─index
├─info
└─exclude
├─logs
├─HEAD
└─refs
├─objects
├─info
└─exclude
└─pack
├─packed-refs
└─refs

We need to indent a few spaces every level down.

Step-7 Indenting every level down

To do this we need to keep track of how far down we are relative to the first function call. We can achieve this by passing another paramter to our listdir() function that gives it that information. We also need to know how many spaces to indent at each level, so I made a global variable called indentSize. Everytime we call a new listdir() function we pass it $(($indents + $indentSize)):

listdir()
{
    local currentDir=($(ls $1))
    local -i lastIndex=$((${#currentDir[*]} - 1)) index indents=$2

    for ((index=0; index<lastIndex; index++))
    do
            printf "%.${indents}s├─%s\n" " " ${currentDir[$index]}
            if [ -d ${currentDir[$index]} ]; then
                listdir ${currentDir[$index]} $(($indents + $indentSize))
            fi  
    done

    printf "%.${indents}s└─%s\n" " " ${currentDir[$lastIndex]}
}

listdir $PWD 0

Notice this line: printf "%.${indents}s├─%s\n" " " ${currentDir[$index]}. This is how we are printing the number of spaces before the item.

[ahmed@amayem .git]$ ./../tree.sh
 ├─HEAD
 ├─config
 ├─description
 ├─hooks
   ├─applypatch-msg.sample
   ├─commit-msg.sample
   ├─post-update.sample
   ├─pre-applypatch.sample
   ├─pre-commit.sample
   ├─pre-push.sample
   ├─pre-rebase.sample
   ├─prepare-commit-msg.sample
   └─update.sample
 ├─index
 ├─info
   └─exclude
 ├─logs
   ├─HEAD
   └─refs
 ├─objects
   ├─info
      └─exclude
   └─pack
 ├─packed-refs
 └─refs

It looks like it is coming around. Next we need to figure out how to continue the stems of directories to connect to the next item in the directory contents.

Step-8 Continuing the Stem

We just need to add a at the beginning of the spaces. Replace the two printf statements with the following two respectively:

printf "%.${indents}s├─%s\n" "│ " ${currentDir[$index]}

and

printf "%.${indents}s└─%s\n" "│ " ${currentDir[$lastIndex]}

The output is as follows:

[ahmed@amayem .git]$ ./../tree.sh
├─HEAD
├─config
├─description
├─hooks
│ ├─applypatch-msg.sample
│ ├─commit-msg.sample
│ ├─post-update.sample
│ ├─pre-applypatch.sample
│ ├─pre-commit.sample
│ ├─pre-push.sample
│ ├─pre-rebase.sample
│ ├─prepare-commit-msg.sample
│ └─update.sample
├─index
├─info
│ └─exclude
├─logs
│ ├─HEAD
│ └─refs
├─objects
│ ├─info
│ └─exclude
│ └─pack
├─packed-refs
└─refs

It looks good, except for this part:

├─objects
│ ├─info
│ └─exclude
│ └─pack

What happened here. Let’s take a look at objects:

[ahmed@amayem .git]$ ls -l objects
total 0
drwxr-xr-x  2 ahmedamayem  staff   68 11 May 13:46 info
drwxr-xr-x  4 ahmedamayem  staff  136 11 May 13:46 pack

So they are both directories. In my debugging I found the following: ./../tree.sh: line 23: currentDir: bad array subscript

That means that we are giving a wrong index. It turns out that the index was -1. This is solved with a simple if statement.

Step-9 Checking Directories and Passing Full Paths

We will correct the above mistake and prevent further problems by passing the whole absolute directory:

#!/bin/sh

declare -i indentSize=5

listdir()
{
    local currentPath=$1
    local currentDir=($(ls $1))
    local -i lastIndex=$((${#currentDir[*]} - 1)) index indents=$2

    for ((index=0; index<lastIndex; index++))
    do
            printf "%.${indents}s├─%s\n" "│ " ${currentDir[$index]}
            if [ -d "$currentPath/${currentDir[$index]}" ]; then
                listdir "$currentPath/${currentDir[$index]}" $(($indents + $indentSize))
            fi  
    done

    if [ $lastIndex -ge 0 ]; then
        printf "%.${indents}s└─%s\n" "│ " ${currentDir[$lastIndex]}
    fi
}

listdir $PWD 0

The output:

[ahmed@amayem .git]$ ./../tree.sh
├─HEAD
├─config
├─description
├─hooks
│ ├─applypatch-msg.sample
│ ├─commit-msg.sample
│ ├─post-update.sample
│ ├─pre-applypatch.sample
│ ├─pre-commit.sample
│ ├─pre-push.sample
│ ├─pre-rebase.sample
│ ├─prepare-commit-msg.sample
│ └─update.sample
├─index
├─info
│ └─exclude
├─logs
│ ├─HEAD
│ └─refs
├─objects
│ ├─info
│ └─pack
├─packed-refs
└─refs

Things look pretty good, but there is a case we haven’t covered: a directory at the end of the list. refs is a directory at the end but it is not showing up properly.

Step-10 Listing a Directory at the End of the List

The solution is to simply add the if statement we had in the body to the tail. Replace the last if statement with the following:

if [ $lastIndex -ge 0 ]; then
    printf "%.${indents}s└─%s\n" "│ " ${currentDir[$lastIndex]}
    if [ -d "$currentPath/${currentDir[$index]}" ]; then
        listdir "$currentPath/${currentDir[$index]}" $(($indents + $indentSize))
    fi
fi

which produces:

├─HEAD
├─config
├─description
├─hooks
│ ├─applypatch-msg.sample
│ ├─commit-msg.sample
│ ├─post-update.sample
│ ├─pre-applypatch.sample
│ ├─pre-commit.sample
│ ├─pre-push.sample
│ ├─pre-rebase.sample
│ ├─prepare-commit-msg.sample
│ └─update.sample
├─index
├─info
│ └─exclude
├─logs
│ ├─HEAD
│ └─refs
│ ├─heads
│ └─master
│ └─remotes
│ └─origin
│ └─HEAD
├─objects
│ ├─info
│ └─pack
│ ├─pack-5d8c6d23ff13eded7a9d401ff91dae0f7fd6d00d.idx
│ └─pack-5d8c6d23ff13eded7a9d401ff91dae0f7fd6d00d.pack
├─packed-refs
└─refs
│ ├─heads
│ └─master
│ ├─remotes
│ └─origin
│ └─HEAD
│ └─tags

The last directory is being displayed now but it doesn’t seem to be indenting properly. In fact anything past the second level doesn’t seem to be indenting properly.

Step-11 Indenting Subdirectories Properly

After some tinkering around I found out the following from the man page of printf:

Precision:
         An optional period, `.', followed by an optional digit string giving a precision which specifies the number of digits to appear after the decimal point, for e and f formats, or the maximum number of characters to be printed from a string; if the digit string is missing, the precision is treated as zero;

So the number of the precision is how much of a string is to be printed, not how many times a character is printed. Let’s add some spaces to our printf statements.

printf "%.${indents}s├─%s\n" "│                                        " ${currentDir[$index]}

We get the following output:

├─HEAD
├─config
├─description
├─hooks
│  ├─applypatch-msg.sample
│  ├─commit-msg.sample
│  ├─post-update.sample
│  ├─pre-applypatch.sample
│  ├─pre-commit.sample
│  ├─pre-push.sample
│  ├─pre-rebase.sample
│  ├─prepare-commit-msg.sample
│  └─update.sample
├─index
├─info
│  └─exclude
├─logs
│  ├─HEAD
│  └─refs
│       ├─heads
│            └─master
│       └─remotes
│            └─origin
│                 └─HEAD
├─objects
│  ├─info
│  └─pack
│       ├─pack-5d8c6d23ff13eded7a9d401ff91dae0f7fd6d00d.idx
│       └─pack-5d8c6d23ff13eded7a9d401ff91dae0f7fd6d00d.pack
├─packed-refs
└─refs
│  ├─heads
│       └─master
│  ├─remotes
│       └─origin
│            └─HEAD
│  └─tags

The sub-directories are indenting properly, but I am anticipating a problem. Look at this part:

└─refs
│  ├─heads
│       └─master
│  ├─remotes
│       └─origin
│            └─HEAD

The problem here is in how to figure out when I need to print more at a sub-level, and when I shouldn’t print one. For example there shouldn’t be any more under refs because it is the last element in the list, but there should be one between heads and remotes. The rule is simple, if we are the last directory of a list then don’t print a at that level, otherwise we should print one.

Step-12 Redesigning the Indentation Settings by Passing an Array

My first inclination is to pass a new paramater to listdir() that tells it whether to print a bar or not. However, once we get into deeper levels I will have to start passing one parameter per level, which would be bad design. Instead I will make an array, where the index indicates how deep we are and the element at that index indicates whether a bar is required or not. We also need to tell the function the level it is at so that it knows how far in the array it should look:

indent()
{
    for ((n=0; n<indentSize; n++))
    do
        printf " "
    done
}

declare -i indentSize=1
declare -a levelFlags=(1)

listdir()
{
    local currentPath=$1
    local -a currentDir=($(ls $1)) levelFlags=(${levelFlags[@]})
    local -i lastIndex=$((${#currentDir[*]} - 1)) index level=$2 nextLevel=$2+1

    for ((index=0; index<lastIndex; index++))
    do
        for ((j=0; j<level; j++))
        do
            if [ "${levelFlags[$j]}" == "1" ]; then
                printf "│"
                indent
            else
                printf " " 
                indent
            fi
        done
        printf "├─%s\n" ${currentDir[$index]}
        if [ -d "$currentPath/${currentDir[$index]}" ]; then
            levelFlags[$level]=1
            listdir "$currentPath/${currentDir[$index]}" $nextLevel
        fi  
    done

    if [ $lastIndex -ge 0 ]; then
        for ((j=0; j<level; j++))
        do
            if [ "${levelFlags[$j]}" == "1" ]; then
                printf "│"
                indent
            else
                printf " " 
                indent
            fi
        done
        printf "└─%s\n" ${currentDir[$lastIndex]}
        if [ -d "$currentPath/${currentDir[$index]}" ]; then
            levelFlags[$level]=0
            listdir "$currentPath/${currentDir[$index]}" $nextLevel
        fi
    fi
}
listdir $PWD 0

The way we did the indentations previously was not very optimal. The long string of spaces may not be long enough if the directories are really deep. We had to change the way we output the indentations using the flags by using the indent function. Let’s see what we get:

[ahmed@amayem .git]$ ./../tree.sh 
├─HEAD
├─config
├─description
├─hooks
│ ├─applypatch-msg.sample
│ ├─commit-msg.sample
│ ├─post-update.sample
│ ├─pre-applypatch.sample
│ ├─pre-commit.sample
│ ├─pre-push.sample
│ ├─pre-rebase.sample
│ ├─prepare-commit-msg.sample
│ └─update.sample
├─index
├─info
│ └─exclude
├─logs
│ ├─HEAD
│ └─refs
│   ├─heads
│   │ └─master
│   └─remotes
│     └─origin
│       └─HEAD
├─objects
│ ├─info
│ └─pack
│   ├─pack-5d8c6d23ff13eded7a9d401ff91dae0f7fd6d00d.idx
│   └─pack-5d8c6d23ff13eded7a9d401ff91dae0f7fd6d00d.pack
├─packed-refs
└─refs
  ├─heads
  │ └─master
  ├─remotes
  │ └─origin
  │   └─HEAD
  └─tags

This looks much better. But there is more that we can do.

Next Steps

  1. Optimize the code by removing the array and adding strings
  2. Allow for flags to be used that are compatible with ls
  3. Deal with bad input
  4. Make it a user friendly bash script

References

  1. Understanding Bash Arrays
  2. Understanding how to make recursive function
  3. nik‘s answer to this stackoverflow question about piping output into a shell array.
  4. unicode-table.com for the unicode characters.
  5. LinuxJournal for the tips on shell arrays.
  6. dreamsyssoft for the tips on shell if else statements
  7. tzot‘s answer to this stackoverflow question. Got the typeset from him.
  8. tldp for the info on arrays and local variables
  9. Cyberciti for the info related to the for loop.

Ahmed Amayem has written 90 articles

A Web Application Developer Entrepreneur.