This is a document that covers some issues regarding shell script programming. Note that this page is still under construction. The intention is that it should be possible to use it as a WWW text for "advanced" shell programming, but right now I am just collecting stuff.
Note! I use Bourne shell or derivatives thereof, like Bash. Therefore the scripts contained herein are written for Bourne shell (usually found under /bin/sh), unless said otherwise.
Also note that this is work in progress. Hence some of the descriptions might be bad, some might be confusing and yet some may be missing. If that is the case, please send me a note about it.
Modern computers are a little more complex than that. There you have a complete environment where you can execute your programs and even have such astonishing things as interactive programs (hear-hear). It is no longer enough to be able to load your program and just print the result. You also need support to reformat the results, process them in other manners (maybe printing a nice diagram) and store them in a database. It would of course be possible to write specially designed programs that format the output of your programs according to your wishes, but the number of specialized programs would quickly increase, leaving your computer loaded with "might come in handy" programs.
A better approach would be to have a small set of processing programs together with a program made to "glue the parts together." On a UNIX system such a program is called the shell (in contrast with the core that contains time-sharing code, file access code and other system oriented code). The shell is used to issue commands, start processes, control jobs, redirect input and output, and other mundane things that you do on a modern computer. Not only that, the shell is a pretty complete programming language.
In this paper we will introduce concepts and methods that a good shell-programmer can use to get the most out of his/her UNIX system. We will start from the beginning, but a basic familiarity with programming and/or the basic principles of computers will be assumed. This is not a paper for the complete novice, although you are welcome to read the paper.
A simple command is a sequence of non-blank words separated by blanks. The first word specifies the name of the command to be executed. Except as specified below, the remaining words are passed as arguments to the invoked command. The command name is passed as argument 0 (see execve(2V)).
To execute a simple command you simply type it to the shell. For example, to execute the simple command ls you do the following.
$ ls
Mail doc lib public_html News emacs man bin outgoing incoming tmp
In this example, $ is the prompt, i.e. the text
the computer prints to tell you that it is ready to accept
commands. ls is the command you type to the computer and
the remaining lines are the result of the ls command.
We will in the sequel print the text that the computer prints in
'Courier' and the text that you type to the computer in
'boldface Courier'.
If you for some reason lack a prompt, you will not be able to give commands to the shell. There are many possible reasons why you may not have a prompt.
There are many more simple commands that are useful. Some examples
are: man, cat, echo, and
rm.
The man command is one of the more useful. It is used to
get a manual of a command, i.e. a description of what the
command does, and possible variations of its use.
To get information on how to use a command, you just type
man followed by the command you want information on,
e.g. to get information on rm you type
$ man rm
and the result will be a manual of how the command rm
works and what it does.
Exercise. Try the above command. What does the
rm command do? How should you use it? Give at least 2
examples of use.
Builtin simple commands.
There is a special set of commands that are built into the shell. Which
commands these are is entirely up to the type of shell that you are
using. The more common builtin commands are: echo, cd,
pwd, . (or alternatively
source), trap, return,
hash, eval, and kill.
HOME variable,
which contains the path to your home directory (where you end up when
you log in). If you type cd without any arguments, this
is the directory where you will end up.
To print the value of the variable HOME, you can write
the following:
$ echo $HOME
/users/matkin
PATH variable
When you type a non-builtin command to the shell, the shell searches
for a program to execute.
The programs are simply executable files somewhere in the file system;
they are executable either because they are compiled programs (written
in C, C++, Pascal, Ada, or some other language) or because they are
scripts that may be executed.
Since we don't want to go through all files in the file system (on my
account alone, I have approximately 3700 files), we have a
path of directories where the program may be stored.
This path is given to the shell as a colon separated list of
directories stored in the environment variable PATH.
To change the directories where your shell should look, just alter the
value of PATH.
Example: If your path contains
`/usr/bin:/bin:/usr/local/bin' you may extend the path
with /home/matkin/bin (which could be the place where you
put your own scripts) by writing
$ PATH=/home/matkin/bin:$PATH
The effect of this only remains as long as you are logged in. If you
log out, your changes will be undone, since each new shell starts
with a fresh set of variables. To set the
path each time you log in, you have to add a
line to the startup file.
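To check what the shell will actually search, it can help to print the path one directory per line. The following is a minimal sketch; print_path is a hypothetical helper name introduced here, and the path is the example value used above.

```sh
# print_path is a hypothetical helper: it prints each directory of a
# colon-separated path on a line of its own.
print_path() {
    echo "$1" | tr ':' '\n'
}

print_path "/home/matkin/bin:/usr/bin:/bin:/usr/local/bin"
```

For Bourne-style shells the startup file is usually ~/.profile, but this is an assumption about your system: which file is read depends on the shell and on how you log in.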
Note! It is very common to put . in your
path, either at the beginning or at the end. This is highly
insecure and you should never do that. There are
some common traps that can be used to
crack an account which has `.' in its path.
HOME
cd without any arguments.
PS1
$ '. This is what the computer prints
whenever it is ready to process another command.
PS2
> '. The shell prints this prompt whenever
you have typed an incomplete line, e.g., if you are missing a
closing quote. For example:
$ echo 'hello
> world'
hello
world
SHELL
SHELL variable is usually set to
/bin/sh.
IFS
$ ls $HOME
The shell will then see that $HOME is a variable (also
called parameter) and replace it with its value. In my case,
$HOME will be replaced with
/home/matkin. The line that the shell sees is therefore
$ ls /home/matkin
Observe that no more variable expansion takes place after this initial variable expansion. This means that if $HOME happened
to have the value '$PS1', the line above would try to
find a file with the name '$PS1'.
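The single round of expansion can be observed directly. In the sketch below the variable names inner and outer are made up for the illustration; eval is used only to show what a second round of expansion would produce.

```sh
inner='expanded twice'
outer='$inner'          # single quotes: the text $inner is stored literally

echo "$outer"           # one round of expansion: prints $inner
eval "echo $outer"      # eval makes the shell read the line again: prints expanded twice
```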
Here a lot of text is missing. I'll fill it in as I go along.
foo.C,
bar.C.gz, etc. and want to rename them to
foo.cc, bar.cc.gz, etc. This line will do
the trick.
\ls *.C* | sed 's/\(.*\)\.C\(.*\)/mv & \1.cc\2/' | sh
The backslash before the ls command is to prevent it from
being expanded, in case it is an alias and you are using a
shell that has aliases (such as Bash).
We want to prevent the shell from doing this expansion since
ls might come out as ls -F (which would
behave strangely) or ls -l, which is really bad.
An alternative is to install the rename script, which is written in Perl.
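Before piping generated commands to sh it is worth looking at them first. The sketch below prints the mv commands for a list of names without running them; plan_rename is a hypothetical helper name, and the file names are made up.

```sh
# plan_rename prints (but does not run) one mv command for every
# argument that contains ".C"; other names are skipped.
plan_rename() {
    for f in "$@"; do
        new=$(echo "$f" | sed 's/\(.*\)\.C\(.*\)/\1.cc\2/')
        if [ "$f" != "$new" ]; then
            echo "mv $f $new"
        fi
    done
}

plan_rename foo.C bar.C.gz notes.txt
```

If the printed commands look right, the same output can be piped to sh, as in the one-liner.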
ypmatch matkin passwd | cut -d: -f5 | cut -d, -f1
grep "^matkin:" /etc/passwd | cut -d: -f5 | cut -d, -f1
Which version you use depends on what type of system you have. If you only want the first name (or only the surname) you can pipe the output through
cut -d' ' -f1 (or alternatively cut -d' '
-f2, if the second word is the surname).
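The effect of each cut can be tried on a single line without touching the real password database. The entry below is entirely made up (user jdoe, a comma-separated fifth field), so treat it as an assumption about what such a line looks like on your system.

```sh
# A made-up passwd entry; field 5 holds the full name followed by
# comma-separated extras.
entry='jdoe:x:1001:100:Jane Doe,Room 12,555-0100:/home/jdoe:/bin/sh'

full=$(echo "$entry" | cut -d: -f5 | cut -d, -f1)
first=$(echo "$full" | cut -d' ' -f1)
last=$(echo "$full" | cut -d' ' -f2)

echo "$full"        # Jane Doe
echo "$first"       # Jane
echo "$last"        # Doe
```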
kill `ps xww | grep "sleep" | cut -c1-5` 2>/dev/null
ps xww | grep "sleep" | cut -c1-5 | xargs kill 2>/dev/null
This will kill any process that has the word "sleep" in the calling command. If your
kill does not handle multiple pids you can
either use the one-liner
ps xww | grep "sleep" | cut -c1-5 | xargs -i kill {} 2>/dev/null
or use a for-loop:
for x in `ps xww | grep "sleep" | cut -c1-5`
do
    kill $x 2>/dev/null
done
But then it is not a one-liner any more.
Note!
Be very careful about what you use as
expression to grep. You might get more processes than
you wanted killed.
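How much a loose pattern catches can be seen on canned input. In the sketch below sample_ps is a hypothetical stand-in for ps xww, and the three processes as well as the column layout (command name in the fifth field) are assumptions; real ps output differs between systems.

```sh
# sample_ps stands in for "ps xww"; the three processes are made up.
sample_ps() {
    cat <<'EOF'
  101 ?  S  0:00 sleep 600
  102 ?  S  0:00 grep sleep
  103 ?  S  0:00 vi sleepless.txt
EOF
}

# The plain pattern matches all three lines, including grep and vi.
sample_ps | grep "sleep" | cut -c1-5

# Comparing the command column exactly selects only the real sleep.
sample_ps | awk '$5 == "sleep" { print $1 }'
```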
.c removed). If I
want to remove all the test programs in one go I type the
following line to my shell
for x in *; do [ -x $x -a -f $x.c ] && echo $x; done | xargs rm -f
This will remove all executable files with a corresponding
.c-file but keep all other executable files.
Note!
As always when you use a complex command, or a command with
wildcards in, together with rm, insert an
echo in front of the rm to make sure
that the command does not do anything weird.
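One way to follow this advice is to run only the selecting part of the command in a scratch directory first. The sketch below assumes that mktemp -d is available; list_test_programs is a hypothetical name and the files are made up.

```sh
# list_test_programs prints every executable file in the current
# directory that has a matching .c file; it removes nothing.
list_test_programs() {
    for x in *; do
        if [ -x "$x" ] && [ -f "$x.c" ]; then
            echo "$x"
        fi
    done
}

# Try it in a scratch directory with made-up files.
scratch=$(mktemp -d)
(
    cd "$scratch" || exit 1
    touch prog1 prog1.c tool notes.txt
    chmod +x prog1 tool
    list_test_programs      # prints prog1 only
)
rm -rf "$scratch"
```

Only when the list looks right is it time to pipe it to xargs rm -f.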
*gif*.
( IFS=: ; for D in $PATH; do for F in $D/*gif*; do [ -x $F ] && echo $F; done; done )
mail-list and want to verify them using the SMTP daemon
at kay.docs.uu.se (this is the computer I use), the
following line will do the job.
( sed 's/.*/VRFY <&>/' mail-list ; echo QUIT ) | socket -c kay smtp
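The generated SMTP commands can be inspected without contacting any daemon by leaving out the socket part. In this sketch vrfy_commands is a hypothetical helper name, the addresses are made up, and mktemp is assumed to be available.

```sh
# vrfy_commands prints one VRFY line per address in the file given as
# its argument, followed by QUIT.
vrfy_commands() {
    ( sed 's/.*/VRFY <&>/' "$1" ; echo QUIT )
}

# A scratch file with made-up addresses stands in for mail-list.
list=$(mktemp)
printf '%s\n' 'alice@example.org' 'bob@example.org' > "$list"
vrfy_commands "$list"
rm -f "$list"
```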
shsh. One such example can be seen above, when we rewrite a line containing a file
name into a command involving the file name.
readread command.
Assume that, for some reason, you have a variable FOO
containing "foo is a bar". The following code can be used to put the
first word of the variable into the variable first, and
the rest into the variable rest:
echo $FOO | { read first rest ; echo "$first and $rest"; }
See below for a better example of when
this trick is useful.
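The braces matter because in many shells each part of a pipeline runs in a subshell, so variables set by read are gone once the pipeline ends. A sketch of an alternative that keeps the variables in the current shell is to feed read from a here-document instead of a pipe:

```sh
FOO="foo is a bar"

# The here-document feeds $FOO to read without a pipe, so read runs in
# the current shell and the variables survive.
read first rest <<EOF
$FOO
EOF

echo "$first and $rest"     # prints: foo and is a bar
```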
IFS
IFS to split a line. It is worth noting that
IFS is only regarded when the shell splits the result of an
unquoted expansion into words (for example the word list of the
for control structure) and when using the read
command. As an example, here
is a script that works almost the same way as the which
command:
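#!/bin/sh
# Print the first executable named $1 found in a directory of $PATH.
IFS=:
for p in $PATH
do
    if [ -x "$p/$1" ]
    then
        echo "$p/$1"
        exit 0    # exit, since return only works inside a function
    fi
done
echo "No $1 in your path" 1>&2
exit 1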
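The splitting itself can be tried in isolation. In the sketch below split_colon is a hypothetical helper name and the list is made up; note that the middle element contains a blank, which survives because only the colon is a separator while IFS is changed.

```sh
# split_colon prints each colon-separated element of its argument in
# brackets, and restores IFS afterwards.
split_colon() {
    old_ifs=$IFS            # remember the current separators
    IFS=:
    for word in $1; do      # unquoted, so it is split at each colon
        echo "[$word]"
    done
    IFS=$old_ifs            # restore them for the rest of the script
}

split_colon 'usr/bin:my programs:bin'
```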
; after the
assignment). This will result in the variable being set for that
command only, but keeping the old value (if it had one) after the
command has been executed. As an example, the command
MANPATH=/usr/man:/usr/local/man man test
will only look for the manual for test in the directories
/usr/man and /usr/local/man. This is regardless
of what value MANPATH had before the call.
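The same behaviour can be checked with any variable. The sketch below uses a made-up variable GREETING and a child sh as the command, so that the value seen by the command and the value kept by the calling shell can be compared.

```sh
GREETING=outer
export GREETING

# The assignment in front of the command applies to that command only.
GREETING=inner sh -c 'echo "child sees $GREETING"'

echo "parent still has $GREETING"
```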
read command to set variables to the supplied
values. This is most useful when you have a string that you need to
split at something else than whitespace.
Assume that you want to split an e-mail address into name and
domain address. We will also supply an extra Subject
line, to show that the IFS doesn't affect the second use
of read. We write the following script into the file
email:
#!/bin/sh
IFS=@ read name address
echo "A mail to $name at $address"
read subject
echo "Subject: $subject"
The script is fed its input through standard input. Calling this script in the following manner (the
> and
$ are prompts)
$ email <<EOT
> matkin@docs.uu.se
> Something strange @ my place
> EOT
to a shell, will produce the output
A mail to matkin at docs.uu.se
Subject: Something strange @ my place
As an example, assume that you want to go through all C files of a directory and, if they are readable to you, convert the filenames to contain uppercase letters only (this example may be a little contrived). We write two scripts that do this, but they do it in slightly different ways.
The first script calls tr inside the for-loop:
#!/bin/sh
for x in *.c
do
    [ -r $x ] && echo $x | tr 'a-z' 'A-Z'
done
and the second script calls
tr outside the loop:
#!/bin/sh
for x in *.c
do
    [ -r $x ] && echo $x
done | tr 'a-z' 'A-Z'
On this computer (a SPARCstation 10), the first script takes approximately 6.2 seconds to process 33 C files while the second takes approximately 0.7 seconds.
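The difference comes from the number of processes: the first form starts one tr per file, the second starts a single tr for the whole loop. The sketch below shows that both forms print the same thing; the function names and the file names are made up, and no timing is claimed.

```sh
names='alpha.c beta.c gamma.c'      # made-up file names

# One tr process is started per name.
tr_inside() {
    for x in $names; do
        echo "$x" | tr 'a-z' 'A-Z'
    done
}

# A single tr process handles the output of the whole loop.
tr_outside() {
    for x in $names; do
        echo "$x"
    done | tr 'a-z' 'A-Z'
}

tr_inside
tr_outside
```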
$ rm #foo.c#
$ rm -foo
Following the above example you could try to type
$ rm '-foo'
but this will not work either.
How do you remove the file?
References: at(1), mail(1), sh(1)
Hint: Look up how the <<
redirection works in sh(1).
measure that will take as argument a file name
and write a line:
<file> <number> <average length>
on standard output. Unfortunately this is not what you want.
The compiler is mounted on a computer that just went down so you can't recompile the program to print the data you want. You are very tired and don't want to wait around for the computer to restart itself. Each run takes a very long time and you don't want to spend your time watching it. Write a script that runs the programs, times them and reformats the output to:
<number> <user time + system time>
and sorts it on increasing <number>. Also show what command you type before you log out and leave for a good night's rest.
References:
batch(1), time(1), calc(1), sh(1), sort(1), test(1), echo(1)
Hints: This is not an easy exercise.
which
does almost this work, but unfortunately it only prints the first
instance of the program it finds. What we want is a generalization of
the command described
above.
Let us instead try the following approach. Let us call the program
whereare (in contrast with the program
whereis, which searches for a program). As always when
writing programs, we first write down the specification. In this case
in the form of a manual.
SYNOPSIS
    whereare pattern . . .
DESCRIPTION
    whereare takes a list of file name patterns and looks for executable files in the path that match the file name patterns.
EXAMPLES
    The following code will look for ispell or spell.
    $ whereare ispell spell
    The following command will look for a file matching the pattern foo:*.txt.
    $ whereare 'foo:*.txt'
ENVIRONMENT
    PATH - The path whereare uses to search for executable programs.
Not too complicated. Well, the script appears as follows
#!/bin/sh
for P in "$@"; do
    IFS=:
    for D in $PATH; do
        for F in $D/$P; do
            [ -x "$F" ] && echo "$F"
        done
    done
done
The outer loop will go through the supplied list of file name patterns
and the inner loop will, for each pattern, go through the path to see
what matches the file name pattern.
The inner-inner loop is used to expand the file name pattern. Remember
that if the pattern matches one or more files, this will result in a
list of files that matched. We then have to go through each of the files
to see which of them are executable.
A short example on how to use the program. Note that we have to surround the file patterns with single quotes ' to prevent the pattern from being expanded by the shell before it is sent to the script.
[ 17:42:56 ] @ Owein $ ./whereare '*mail*'
/export/matkin/bin/mailserver
/export/matkin/bin/mailto
/export/matkin/bin/mailto-hebrew
/export/matkin/bin/metamail
/export/matkin/bin/patch-metamail
/export/matkin/bin/splitmail
/usr/ucb/mail
/usr/sup/misc/bin/ml.mail
/usr/openwin/bin/mailp
/usr/openwin/bin/mailprint
/usr/openwin/bin/mailtool
/usr/bin/mail
/usr/bin/mailcompat
/usr/bin/mailq
/usr/bin/mailstats
/usr/bin/mailx
/usr/bin/rmail
It seems to work quite ok.
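The search can also be tried against a throwaway path instead of the real one. The following is a sketch, not the script itself: whereare_fn is a hypothetical function variant that restores IFS before expanding the pattern (so that a pattern containing a colon is not split) and skips directories; the scratch directories and file names are made up, and mktemp -d is assumed to be available.

```sh
# whereare_fn: for each pattern, print the executable files in the
# directories of $PATH whose names match the pattern.
whereare_fn() {
    for P in "$@"; do
        old_ifs=$IFS
        IFS=:
        for D in $PATH; do
            IFS=$old_ifs        # back to normal before expanding the pattern
            for F in "$D"/$P; do
                if [ -f "$F" ] && [ -x "$F" ]; then
                    echo "$F"
                fi
            done
        done
        IFS=$old_ifs
    done
}

# Build a small made-up path and search it in a subshell, so that the
# PATH of the calling shell is left alone.
demo=$(mktemp -d)
mkdir "$demo/a" "$demo/b"
: > "$demo/a/mailto"
: > "$demo/b/rmail"
: > "$demo/b/mailnotes"         # matches the pattern but is not executable
chmod +x "$demo/a/mailto" "$demo/b/rmail"
( PATH="$demo/a:$demo/b"; whereare_fn '*mail*' )
rm -rf "$demo"
```

The two executable files are printed with their full path; mailnotes is left out because it lacks execute permission.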
${foo:?well...} into ${foo?well...} if
you use the "old style" BSD shell, etc.)
Some of them are just hacks that I wrote to do something that I needed
done, and some others are serious scripts intended for
distribution.
These scripts can be used as they are, serve as inspiration, or as
examples of ways to do things. I give no guarantee that they
are correct or even that they will work.
/bin/sh that will send mail
to users having more garbage than a predefined limit.
The script assumes that there is an executable
file named test_sort in the same directory and
that it accepts a number as its first (and only) argument. The
number represents the size of the array to be sorted.
The script makes $COUNT measurements starting at
$START and increasing by $INC each
step. It then emits calc code to store the user time for each
execution in a matrix m and stores the supplied
number in the matrix t. Afterwards it generates
code for a matrix A representing the function
n + n * log(n) + 1 (i.e. without the constants
a, b, and c which we want to
compute). It then generates code to multiply A and
m with the transpose of A, thereby
projecting A and m to a 3-dimensional
space, and solves the resulting linear equation.
You don't really have to know any linear algebra to use the script, just replace the command and the functions accordingly to make a least-squares approximation to other linear functions.
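For readers who do want the linear algebra, the computation described above corresponds to the standard normal equations of a least-squares fit. This is a sketch in notation introduced here, not taken from the script: n_i is the i-th array size (the entries of t), m_i the measured user time, and k the number of measurements ($COUNT). The base of the logarithm is not stated in the description.

```latex
\[
  m_i \approx a\,n_i + b\,n_i \log n_i + c, \qquad i = 1, \dots, k
\]
\[
  A = \begin{pmatrix}
        n_1 & n_1 \log n_1 & 1 \\
        \vdots & \vdots & \vdots \\
        n_k & n_k \log n_k & 1
      \end{pmatrix},
  \qquad
  A^{T} A \begin{pmatrix} a \\ b \\ c \end{pmatrix} = A^{T} m
\]
```

Multiplying by the transpose turns the k equations into a 3-by-3 system, whose solution gives the constants a, b, and c.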