ShellTree 1: Analyzing a One-Line Command Implementation

This post is part of an educational series on building a shell script to graphically display the structure of a directory.

While you are working on the Linux command line, you sometimes want to see a directory’s structure laid out in your terminal. I personally don’t know of any native command that does that, but luckily we can make one. Let’s discover how:

Setup

First let’s make a directory structure that we want to display:

[ahmed@amayem ~]$ mkdir test; cd test
[ahmed@amayem test]$ git init
Initialized empty Git repository in /home/ahmed/test/.git/

If you don’t want to make a git repository, feel free to manually create some folders and add a few test files to them. The purpose is just to have something to test our command on.

Dem Pilafian’s method

I found this method by Dem Pilafian on centerkey. Let’s give it a try:

[ahmed@amayem test]$ ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
.

Hmm, it’s telling me that nothing is in the repo. I think I know why: the git folder is hidden because its name starts with a dot, .git.

[ahmed@amayem test]$ ls -A
.git
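To see the effect of the dot in isolation before changing the full command, here is a minimal sketch on a throwaway tree. The directory names .hidden and visible are made up for illustration, and the sketch assumes GNU ls, which prints a header ending in a colon for each directory it recurses into. The -A flag (long form --almost-all) also lists entries whose names start with a dot, except . and .. themselves.

```shell
# Hypothetical throwaway tree: one hidden and one visible subdirectory.
demo=$(mktemp -d)
mkdir -p "$demo/.hidden/inner" "$demo/visible"

# Without -A, ls skips .hidden, so only . and ./visible get headers.
echo "ls -R:"
(cd "$demo" && ls -R | grep ":$")

# With -A, the dot-directory and its subdirectory are recursed into too.
echo "ls -RA:"
(cd "$demo" && ls -RA | grep ":$")

rm -rf "$demo"
```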

Let’s add the -A flag to our first command:

[ahmed@amayem test]$ ls -RA | grep ":$" | sed -e 's/:$//' -e 's/[^-][^/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
   .
   |-.git
   |---branches
   |---hooks
   |---info
   |---objects
   |-----info
   |-----pack
   |---refs
   |-----heads
   |-----tags

Great, it worked. To avoid the -A option, I can just enter the directory:

[ahmed@amayem test]$ cd .git/
[ahmed@amayem .git]$ ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
   .
   |-branches
   |-hooks
   |-info
   |-objects
   |---info
   |---pack
   |-refs
   |---heads
   |---tags
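If you plan to reuse the one-liner, it can be wrapped in a shell function. This is only a sketch: the name shelltree and the optional directory argument are assumptions for illustration, not part of Dem Pilafian’s original command. It runs cd inside a subshell so your own working directory is left alone, and, like the original, it skips hidden directories because it uses plain ls -R.

```shell
# Hypothetical wrapper around the one-liner; "shelltree" is a made-up name.
# Usage: shelltree [directory]   (defaults to the current directory)
shelltree() {
    (
        # The subshell keeps this cd from affecting the caller's shell.
        cd "${1:-.}" || exit 1
        ls -R | grep ":$" |
            sed -e 's/:$//' -e 's/[^-][^/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
    )
}

# Try it on a small throwaway tree.
demo=$(mktemp -d)
mkdir -p "$demo/objects/info" "$demo/objects/pack" "$demo/refs/heads" "$demo/refs/tags"
shelltree "$demo"
rm -rf "$demo"
```

With GNU ls and sed, this should print the same shape as the objects and refs part of the .git tree above.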

Let’s break it down to understand what is going on:

Breakdown

ls -R

ls is our usual command for listing what is in the current directory, or, if you give it a directory as an argument, what is in that directory. Let’s check the man page to see what the -R flag means:

-R, --recursive
    list subdirectories recursively

So let’s see what we get with that:

[ahmed@amayem .git]$ ls -R
.:
branches  config  description  HEAD  hooks  info  objects  refs

./branches:

./hooks:
applypatch-msg.sample  post-update.sample     pre-commit.sample          pre-rebase.sample
commit-msg.sample      pre-applypatch.sample  prepare-commit-msg.sample  update.sample

./info:
exclude

./objects:
info  pack

./objects/info:

./objects/pack:

./refs:
heads  tags

./refs/heads:

./refs/tags:

The first line is .:. The dot stands for the current directory, and the colon indicates that the lines after it show the contents of that directory. After the contents comes an empty line marking the end of that directory’s listing, then the next directory is shown, ./branches:, and so on until all directories are listed. We are piping that output into grep ":$". Let’s see what that means:

grep ":$"

grep “prints lines matching a pattern”, as we learn from grep’s man page. So it takes the output above and prints only the lines that match the pattern ":$". What does the $ mean? The man page sheds some light:

Anchoring
   The caret ^ and the dollar sign $ are meta-characters that respectively match the empty string at the beginning and end of a line.

So the pattern ":$" matches lines ending with a colon, and in the ls -R output those are the directory headers. In other words, we are printing out the directory paths. Let’s give it a try:

[ahmed@amayem .git]$ ls -R | grep ":$"
.:
./branches:
./hooks:
./info:
./objects:
./objects/info:
./objects/pack:
./refs:
./refs/heads:
./refs/tags:

Looks like we have succeeded.
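The filter relies on ls ending every directory header with a colon. A minimal sketch on a made-up tree (assuming GNU ls) shows that a regular file whose own name ends in a colon, the hypothetical notes: below, slips through as well, so ":$" is a convenient heuristic rather than an exact directory test.

```shell
# Made-up tree: "notes:" is a regular file whose name ends in a colon.
demo=$(mktemp -d)
mkdir -p "$demo/docs/old"
touch "$demo/docs/readme" "$demo/docs/notes:"

# Prints the three directory headers, plus the file name "notes:".
(cd "$demo" && ls -R | grep ":$")

rm -rf "$demo"
```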

sed

sed is a “stream editor for filtering and transforming text”, as mentioned in sed’s man page. So it edits the output we pipe into it. The man page gives us more details:

DESCRIPTION
    Sed is a stream editor. A stream editor is used to perform basic text transformations on an input stream (a file or input from a pipeline). While in some ways similar to an editor which permits scripted edits (such as ed), sed works by making only one pass over the input(s), and is consequently more efficient. But it is sed’s ability to filter text in a pipeline which particularly distinguishes it from other types of editors.

Our sed arguments above were as follows:

sed -e 's/:$//' -e 's/[^-][^/]*\//--/g' -e 's/^/   /' -e 's/-/|/'

I see four -e options. The man page tells us the following:

-e script, --expression=script

    add the script to the commands to be executed

So there are four scripts that we are executing. Let’s go through them.
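The scripts run in the order given, on every input line, and each later script sees the result of the earlier ones. A tiny sketch with made-up input shows why the order matters; sed also accepts the same commands as a single script separated by a semicolon.

```shell
# The first script turns "a" into "b"; the second then turns that "b" into "c".
echo "a" | sed -e 's/a/b/' -e 's/b/c/'    # prints: c

# Swapped: "a" contains no "b" yet, so only the second script changes it.
echo "a" | sed -e 's/b/c/' -e 's/a/b/'    # prints: b

# The same two commands written as one script.
echo "a" | sed 's/a/b/; s/b/c/'           # prints: c
```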

-e 's/:$//'

I recognize the :$ part from earlier as the pattern that matches the colon at the end of each directory path. For the rest, we have to check the man page again:

s/regexp/replacement/
    Attempt to match regexp against the pattern space.  If successful, replace that portion  matched  with  replacement.   The  replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding  matching  sub-expressions  in  the regexp.

So we are essentially deleting the trailing colon by replacing it with nothing. Let’s see what happens:

[ahmed@amayem .git]$ ls -R | grep ":$" | sed -e 's/:$//'
.
./branches
./hooks
./info
./objects
./objects/info
./objects/pack
./refs
./refs/heads
./refs/tags

Looks like we were correct.
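The $ anchor is what restricts the substitution to the colon at the end of the line. A minimal sketch using a hypothetical header, ./my:dir:, for a directory whose name itself contains a colon, shows the difference it makes:

```shell
# With the anchor, only the trailing colon is removed.
echo "./my:dir:" | sed -e 's/:$//'    # prints: ./my:dir

# Without it, sed removes the first colon it finds instead.
echo "./my:dir:" | sed -e 's/://'     # prints: ./mydir:
```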

-e 's/[^-][^/]*\//--/g'

We may be unsure what this new pattern, [^-][^/]*\/, means. Let’s check grep’s man page again to figure it out.

Character Classes and Bracket Expressions
    A bracket expression is a list of characters enclosed by [ and ]. It matches any single character in that list; if the first character of the list is the caret ^ then it matches any character not in the list. For example, the regular expression [0123456789] matches any single digit.

So [^-] matches any single character except a -, and [^/] matches any single character except a /. The backslash in \/ escapes the slash; otherwise sed would think it marked the end of the pattern. The * is explained here:

Repetition
    A regular expression may be followed by one of several repetition operators:
    ?      The preceding item is optional and matched at most once.
    *      The preceding item will be matched zero or more times.
    +      The preceding item will be matched one or more times.
    {n}    The preceding item is matched exactly n times.
    {n,}   The preceding item is matched n or more times.
    {,m}   The preceding item is matched at most m times.
    {n,m}  The preceding item is matched at least n times, but not more than m times.

So we are looking for a string that starts with any character other than a dash, continues with zero or more characters that are not slashes, and ends with a slash. Each such string is replaced with --. What’s that g doing after the replacement?

g G    Copy/append hold space to pattern space.

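That entry actually describes the separate g command, which copies sed’s hold space into the pattern space; it is not the g flag at the end of an s command. Let’s see what the flag does by running the command without and with it: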

[ahmed@amayem .git]$ ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^/]*\//--/'
.
--branches
--hooks
--info
--objects
--objects/info
--objects/pack
--refs
--refs/heads
--refs/tags
[ahmed@amayem .git]$ ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^/]*\//--/g'
.
--branches
--hooks
--info
--objects
----info
----pack
--refs
----heads
----tags

So the g flag tells sed to replace every match on the line, not just the first one. Pretty cool.
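To see exactly which pieces of a path the pattern [^-][^/]*\/ matches, this sketch temporarily swaps the -- replacement for [&]. As the man page excerpt on the s command says, & stands for the matched text, so each match shows up in brackets; the brackets are purely for illustration.

```shell
# Without g: only the first match, "./", is replaced.
echo "./refs/heads" | sed -e 's/[^-][^/]*\//[&]/'     # prints: [./]refs/heads

# With g: every match is replaced; "heads" has no trailing slash, so it stays.
echo "./refs/heads" | sed -e 's/[^-][^/]*\//[&]/g'    # prints: [./][refs/]heads

# In the real command each match becomes "--", so each level adds two dashes.
echo "./refs/heads" | sed -e 's/[^-][^/]*\//--/g'     # prints: ----heads
```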

-e 's/^/   /'

This time the caret, ^, is not inside square brackets, so it acts as an anchor to the beginning of the line (check the anchoring excerpt mentioned earlier). We are adding three spaces at the beginning of each line:

[ahmed@amayem .git]$ ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^/]*\//--/g' -e 's/^/   /'
   .
   --branches
   --hooks
   --info
   --objects
   ----info
   ----pack
   --refs
   ----heads
   ----tags
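Because ^ matches the empty string at the start of the line, s/^/   / inserts the three spaces without replacing any characters. Inside square brackets, as in the earlier [^-], the same caret means “not”. A minimal sketch contrasting the two on the made-up input --refs:

```shell
# Outside brackets, ^ anchors to the start of the line: nothing is consumed.
printf '%s\n' '--refs' | sed -e 's/^/   /'      # prints "   --refs"

# Inside brackets, ^ negates: [^-] is any one character except a dash.
printf '%s\n' '--refs' | sed -e 's/[^-]/x/g'    # prints "--xxxx"
```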

-e 's/-/|/'

This one replaces the - with a |:

[ahmed@amayem .git]$ ls -R | grep ":$" | sed -e 's/:$//' -e 's/[^-][^/]*\//--/g' -e 's/^/   /' -e 's/-/|/'
   .
   |-branches
   |-hooks
   |-info
   |-objects
   |---info
   |---pack
   |-refs
   |---heads
   |---tags

Notice that because there was no g flag, sed only replaced the first dash of each line.
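A quick way to check this is to run the substitution on a single line with and without g. In this minimal sketch the g version turns every dash into a pipe, which would ruin the tree’s look:

```shell
# Without g, only the first dash on the line becomes a pipe.
printf '%s\n' '   ----info' | sed -e 's/-/|/'     # prints "   |---info"

# With g, every dash is replaced.
printf '%s\n' '   ----info' | sed -e 's/-/|/g'    # prints "   ||||info"
```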

Next steps

References

Dem Pilafian on centerkey