
Compiling UNIX Software

Dennis Clark

Introduction

Wowsers!! You just found a beaut new program on the Net... but what's this? It's not compiled!!! ARRGGHHH!!!

So what's wrong with these UNIX32.1 people?? Is distributing software that's already compiled against their religion or something? Do they think we're all C gurus, or even have the inclination to become one???

Well, there are a few reasons why freeware and shareware UNIX software is distributed in source-code (uncompiled) form:

  1. A pre-compiled program will only work on the platform it was compiled for. Since UNIX is available on dozens of different platforms and in many flavours, authors would need to provide a separate distribution of the same program for each one. With source code, however, a single distribution can work on all supported platforms, which also saves a lot of space on FTP sites.

  2. With a source distribution, you can decide what features you want compiled into the program, and which ones to leave out. You can also make other decisions that can only be made at compile time, such as where to look for special data files.

  3. With the source code available, any programmer can examine the code and fix bugs or make extensions to the original program, and in doing so help save the world32.2!

Note that nowadays you can sometimes find pre-compiled distributions of programs. While these make installation simpler, they may be unusable if you do not have permission to install them in the required directories (this usually requires root access).

But you don't need to be a C wizard to compile UNIX software32.3!! All it takes is some basic knowledge of UNIX: if you know your way around a shell (such as the C shell) and a text editor (such as vi), you know enough to compile source-code distributions. Most of the difficult work is usually done for you by utilities such as `configure' and `make'. Distributions also come with instructions which explain step by step what you need to do.

Note that this chapter assumes you have already downloaded the distribution and unpacked it. If you don't know how to find and download a source-code distribution, read the chapter on FTP. If you don't know how to unpack a distribution, read the File Compression chapter.

Rules of Thumb

There are a couple of rules of thumb I follow when compiling a UNIX distribution32.4, and you would be wise to follow them as well...

Rule 1: READ THE INSTRUCTIONS!!!

There is NO substitute for reading the documentation that comes with the distribution... not even this tutorial! Even once you've become accustomed to the usual procedure, some distributions differ from it in subtle ways that can only be discovered by reading the README or INSTALL files32.5.

So what's the point of this tutorial then32.6? Well, much of it is there to help you understand what is going on when the instructions ask you to do certain things, such as running make install. The rest is to help you make informed decisions when the instructions offer you choices, and to let you in on some secrets (ooh-aah!) about things such as fine-tuning your installation.

Rule 2: Be careful

It is better to comment something out rather than to change or delete it.

Much of the compile-time configuration of programs is done by editing certain files. These files will be filled with default settings, some of which you may have to change.

Rather than changing or deleting the default value for a setting, it is better to comment it out, and add a replacement setting next to it. That way if you find later on that the choice you made with that setting was wrong, it is much easier to revert it to its original value.

Being Prepared

Before you can start compiling, you need to know some basics about the system you are compiling the program for. You also need to make some decisions, such as which C compiler to use, and which directories to install the program and its files in.

Know your UNIX

Many people32.7 speak of UNIX as if it were a single, clearly defined operating system. In fact it isn't. These days the term UNIX32.8 collectively refers to a number of different operating systems which share some common features and a degree of interoperability32.9.

Briefly, the story goes32.10 that once there was Only One UNIX. Then there were two. These two were called System V (from AT&T) and BSD (from the University of California, Berkeley). These then spawned other versions, and so on... and now we have a whole lot of UN*Xes which vary in different ways. Most, however, still show their origins in either System V or BSD.32.11

Now, for a program to compile correctly, it usually has to know which of these variants you are using. It is also important to know if the one you are using is based on System V or BSD, as sometimes that is all that the program needs to know. Here is a list of UN*Xes that you are likely to come across:

SunOS (version 4.1.x)
This was for many years the native operating system for Sun's computers. Sun has now ditched it in favour of a brand-new one called Solaris (see below). We at ProgSoc still use it on our machines: both ftoomsh and orgo run SunOS 4.1.3. SunOS is based on BSD UNIX.

Solaris (version 2.x)
Sun's new operating system, based on (you guessed it) System V. This means that doing things such as compiling is quite different on Solaris than it is on SunOS32.12. The Faculty of IT runs Solaris on its UNIX computers.

FreeBSD
This is a free operating system for 32-bit Intel PCs. As the name suggests, it is based on the source code for BSD UNIX.

Linux
Linux is a free operating system for 32-bit Intel PCs. Unlike most UNIX variants, it is not based on System V or BSD code, but was written from scratch. As a consequence, if a program does not explicitly state support for Linux, it usually means that you need to obtain a "patch" to modify the program for Linux. See also the chapter on Linux. Most of the ProgSoc machines run Linux.

Choosing a C compiler

Just as not all UN*Xes are the same, neither are all C compilers. And just as UNIX comes in 2 main flavours, so too does the C language. ``Traditional'' or ``K&R'' C was the original form of C that was developed along with the original UNIX, so most of the older UNIX variants still use it as their ``standard'' C. There is a newer, improved and standardised version of C called ANSI C, which is the standard C for newer UN*Xes such as Solaris.
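The most visible difference between the two flavours is in how functions are declared and defined. The sketch below shows the K&R form inside a comment and the ANSI form as real code; add is a made-up example function. A compiler that only understands K&R C will reject the ANSI form, and newer C standards have in turn dropped the K&R form.

```c
/* Traditional (K&R) C: declarations carry no parameter types, and
   the parameter types of a definition are listed between the
   parameter names and the body:

       int add();              declaration: no parameter information

       int add(a, b)
       int a, b;
       {
           return a + b;
       }

   ANSI C adds prototypes, so the compiler can check every call: */
int add(int a, int b);          /* prototype */

int add(int a, int b)           /* ANSI-style definition */
{
    return a + b;
}
```

With the prototype in scope, a call such as add(2) is reported as an error at compile time; with the K&R declaration it would compile silently and misbehave at run time.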

You will need to choose one of the following 3 C compilers to compile your program (although you probably won't have all 3 available on your system).

cc
This is the generic name for the K&R C compiler that comes with UN*Xes that standardise on K&R C. It's sufficient for most programs, but the code it produces is usually not as good as gcc's (especially when using optimisation, see later).

gcc
The GNU C compiler, from the Free Software Foundation. gcc makes a good choice of compiler because it tends to behave more sanely than most C compilers, thanks to the many programmers in the Internet community who continually work on improving it. It normally compiles its own version of C (which is a superset of ANSI C), but can compile K&R C if it is given the `-traditional' option.

acc
This is the generic name for the ANSI C compiler that comes with some operating systems that have switched to ANSI C32.13.

Choosing a home for your program

A program will usually need to put files in a number of different directories for different purposes. Usually the directories used by any one program are all subdirectories of a single root directory. The typical root directory for add-on programs is /usr/local, so such a program may have files in the following directories:

    /usr/local/bin
    /usr/local/lib
    /usr/local/man
    ... etc.
If you don't have root access, you probably won't be able to install files under /usr/local. In this case, a good second choice is your own HOME directory.

Here are the directories just about every program uses to install files in:

bin
Directory for executable files, both binary programs and scripts. Remember to include this directory in your $PATH to be able to use the programs in it (see the chapter on shells for details).

lib
Libraries and other (mostly static) data files used by programs

man
On-line manual pages, readable with the man command. You need to include the directory in your $MANPATH to make use of it.

Here are some other directories which you may also have under your root installation directory, and which may prove useful:

etc
Files and utilities for administering other programs.

info
Compiled texinfo files, readable with the info command. This is a form of hypertext on-line help which offers an alternative to man pages (you may still want to install both). Include the directory in your $INFOPATH to use it.

scripts
If the number of programs you install becomes large, you may wish to separate the scripts from the binary executable programs. This also helps with porting and updating, as only the binary executables need to be changed, not the scripts.

Autoconfiguration

Some programs need detailed information about your system in order to compile correctly. If you had to supply all that information yourself you could be in a LOT of trouble!!! Fortunately some distributions can find out much of this information for themselves, and if you're lucky, yours is one of them! Don't expect it to do all the work for you... you will still need to go over the header files and Makefiles it generates.

configure

Configure scripts are shell scripts included in some source distributions. Their job is to run tests on your system to find out which UNIX features it supports and which it doesn't. The script then uses this information to generate a Makefile (and sometimes a header file as well).

Running the configure script is easy. Simply go to the top directory of the distribution and type:

% ./configure
Note the `./' in front of the configure command. This is to ensure that you run the script in the current directory, and not some other configure command that may be in some other directory in your $PATH.
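Why does the `./' matter? Given a bare command name, the shell tries each directory listed in $PATH in turn and runs the first match it finds. The C sketch below imitates that search; find_in_path is a hypothetical helper written for illustration, and real shells add refinements of their own (such as the hash table that csh and tcsh keep, described later in this chapter).

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: look for an executable called `name` in the
   colon-separated directory list `path`, trying each directory in
   order as a shell does with $PATH.  On success the full path is
   written to `out` and 0 is returned; -1 means no match was found.
   (A real shell also checks that the match is a regular file.) */
int find_in_path(const char *name, const char *path, char *out, size_t outlen)
{
    const char *dir = path;

    for (;;) {
        const char *colon = strchr(dir, ':');
        size_t dirlen = colon ? (size_t)(colon - dir) : strlen(dir);
        int n;

        /* An empty entry in $PATH stands for the current directory. */
        if (dirlen == 0)
            n = snprintf(out, outlen, "./%s", name);
        else
            n = snprintf(out, outlen, "%.*s/%s", (int)dirlen, dir, name);

        /* The first directory holding an executable of that name wins;
           entries too long to fit in `out` are skipped. */
        if (n >= 0 && (size_t)n < outlen && access(out, X_OK) == 0)
            return 0;

        if (colon == NULL)
            break;              /* that was the last directory */
        dir = colon + 1;
    }
    return -1;
}
```

With $PATH set to /usr/local/bin:/usr/bin, typing a plain configure makes the shell look for /usr/local/bin/configure and then /usr/bin/configure, never in the distribution directory unless it happens to be listed. A name containing a slash, such as ./configure, is not looked up in $PATH at all.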

Some configure scripts accept certain options: the README files will tell you if this is the case, and which options are accepted.

Configure scripts tell you what tests they are performing as they run. Some scripts ask you to confirm the facts they discover or the choices they make. If you find that you don't understand what they're asking, just accept the default answer and usually it will work.
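Much of what configure discovers ends up in the C code as #defines. Configure scripts generated by GNU autoconf, for instance, conventionally write #define HAVE_STRDUP 1 into config.h when the system has a strdup() function, and leave a commented-out /* #undef HAVE_STRDUP */ line when it doesn't. The sketch below shows how a program might act on such a result; copy_string is a hypothetical function, and the config.h line is an assumed example inlined at the top so the sketch compiles on its own.

```c
/* ---- config.h as configure might generate it (assumed, inlined) ---- */
/* #undef HAVE_STRDUP */        /* configure did not find strdup() here */

#include <stdlib.h>
#include <string.h>

/* Return a freshly allocated copy of s, or NULL if memory runs out. */
char *copy_string(const char *s)
{
#ifdef HAVE_STRDUP
    return strdup(s);           /* use the system's version when present */
#else
    size_t n = strlen(s) + 1;   /* +1 for the terminating '\0' */
    char *p = malloc(n);

    if (p != NULL)
        memcpy(p, s, n);
    return p;
#endif
}
```

Either way the rest of the program just calls copy_string(), which is why the choice can be left to configure rather than to you.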

xmkmf

Programs which use the X Window System, in addition to needing to know which UNIX features are available, also need to know certain facts about the X installation. These include such things as the locations of X header and library files, programs, fonts, and the font file format used.

Rather than having each X program need its own configure script to search for this information, the X Window System takes a different approach. X installations come with a program called xmkmf, which stands for `X-MaKe-MakeFile'. This program knows the answers to all the typical questions asked by X program distributions. It reads a Makefile template (an Imakefile) which contains slots where these answers need to be filled in, and from it generates a Makefile. Neat, eh?

Running xmkmf is even easier than running configure (remember to be in the top distribution directory when running it):

% xmkmf
Unlike configure, xmkmf needs no options and asks no questions...what could be easier?

Configuration Header Files

C source files vary the code that gets compiled (sometimes called `conditional compilation') with things called #defines. These #defines can be passed to the compiler as options on the command line, but sometimes there are just too many of them!!! In these cases, they are placed in a C header file, typically under a name such as config.h.

To continue configuring the program, you may need to edit such a header file and modify the #defines.

Hold on!!! Didn't I say you didn't have to know C to do any of this?? OK, so I lied :-). But before you start having a heart attack, take a look at a couple of examples.

Here are a couple of #defines from a typical config.h:

	#define X11_GRAPHICS		/* X11 interface */
	#define LOGFILE	"logfile"	/* for debugging purposes */
The first line tells the compiler to compile the code for X11 support, and the second line tells the compiler that the name of the file for logging is "logfile".

Now let's say that we want to disable X11 support, and change the name of the log file.

To disable the first line, we comment it out with a C comment. C comments begin with the characters /* and end with the characters */. Since there is already a */ at the end of the line, we just put a /* at the front.

For the second line, we simply change the word in quotes following the define name (LOGFILE). Our 2 lines now look like this:

	/* #define X11_GRAPHICS		/* X11 interface */
	#define LOGFILE	"blah"		/* for debugging purposes */
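If you want to comment out a number of consecutive #defines, you need to comment out each line individually. Putting a /* before the first line and a */ after the last won't work, because C comments don't nest: the comment ends at the first */ the compiler sees, which is the one at the end of the first line.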

Also, sometimes a #define won't have a comment after it, in which case you'll have to add the */ at the end yourself. For example, to disable the following line:

	#define MSDOS
change it to this:
	/* #define MSDOS */
There! That wasn't so hard now, was it?
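On the other side of config.h, the program's source files test those #defines. Here is a sketch of a source file using the two settings from the config.h example after our edits (the config.h lines are inlined so it compiles on its own, and the function names are made up for illustration): with X11_GRAPHICS commented out, the X11 branch is never compiled, and LOGFILE simply expands to "blah".

```c
/* ---- config.h after our edits (inlined so the sketch compiles alone) ---- */
/* #define X11_GRAPHICS */          /* X11 interface */
#define LOGFILE "blah"              /* for debugging purposes */

/* ---- a source file that uses config.h ---- */

/* Returns 1 if X11 support was compiled in, 0 if it was left out. */
int x11_support_compiled_in(void)
{
#ifdef X11_GRAPHICS
    return 1;   /* only compiled when X11_GRAPHICS is defined */
#else
    return 0;   /* compiled instead when the #define is commented out */
#endif
}

/* Returns the name of the debugging log file chosen in config.h. */
const char *log_file_name(void)
{
    return LOGFILE;
}
```

Un-commenting the X11_GRAPHICS line and recompiling would switch x11_support_compiled_in() to the other branch; nothing else in the source needs to change.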

One final point: you may cause problems if you make changes to a C header file after you have compiled some or all of the source code. To avoid this, delete all the object files created from the previous compilation32.14, and restart the compilation from scratch.

make and Makefiles

Here we come to the ``big magic'' of compiling and installing source distributions: the Makefile.

So what is make?

One of UNIX's basic philosophies is of having lots of small programs, each of which does a simple task very well, and providing ways of using them together to achieve the desired result.

You will already have seen this in the shell, when using pipes to send the output of one program to another, as in this example:

	% expand file.txt | fold -78 | pr -l64 | lpr
A similar situation arises when compiling programs. There are a number of programs in UNIX to assist in the compilation process: the compiler, the linker, and the assembler are some commonly used tools32.15.

To ``glue'' these programs together in the right way so that they can build a program, another program is needed which can determine what files need to be compiled or generated, how to generate them, and in which order to generate them.

This is where make steps in. Make is a program which builds targets (usually a file such as a program) by following rules from a Makefile. A Makefile defines which files depend on which other files, and what commands to run to generate them. Make compares file modification times to determine which files need to be regenerated, so it can avoid repeating operations unnecessarily.

Here is a sample Makefile rule, for the curious amongst you:

widget:	tic.o tac.o toe.o
	cc -o widget tic.o tac.o toe.o
This rule states that the target `widget' depends on the files tic.o, tac.o, and toe.o, and that to build `widget' from these files, the command cc -o widget tic.o tac.o toe.o must be executed. Note that the command line must begin with a tab character: make does not accept spaces there.

Make rules can get much more complex than this example, with things such as special macros and implicit rules. But this needn't worry you, as you don't need to write or modify any rules to configure a Makefile!

To build a target, all you have to do is run make with the target name32.16:

% make widget
Make tells you what it is doing by echoing each command it runs to the screen. It also tells you if it has nothing to do (``target is up to date''), or if a command it ran failed (which normally halts the make process).
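The up-to-date check behind that ``target is up to date'' message can be sketched in C with the POSIX stat() call. This is a simplification: needs_rebuild is a hypothetical name, and it only compares the modification times of one target and its listed prerequisites, ignoring everything else make does (such as first rebuilding the prerequisites themselves, or applying implicit rules).

```c
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper mirroring make's basic rule: a target must be
   rebuilt if it does not exist, or if any prerequisite was modified
   more recently than it.  Returns 1 (rebuild), 0 (up to date), or -1
   if a prerequisite is missing (make would stop with an error). */
int needs_rebuild(const char *target, const char *const deps[], size_t ndeps)
{
    struct stat t, d;
    size_t i;
    int target_exists = (stat(target, &t) == 0);
    int newer = 0;

    for (i = 0; i < ndeps; i++) {
        if (stat(deps[i], &d) != 0)
            return -1;                      /* prerequisite not found */
        if (target_exists && d.st_mtime > t.st_mtime)
            newer = 1;                      /* changed since last build */
    }
    return (!target_exists || newer) ? 1 : 0;
}
```

For the widget rule above, the call would be needs_rebuild("widget", deps, 3) with deps holding "tic.o", "tac.o" and "toe.o": editing and recompiling any one of them makes widget out of date again.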

Editing Makefiles

Like shell scripts, Makefiles use variables, and it is these variables which allow them to be customised.

In fact, variable assignments in Makefiles look a lot like those in Bourne shell scripts:

	VARIABLE = value
Sometimes assignments can get very long. To split them across multiple lines, backslashes (\) can be used as line-continuation characters:
	FIRSTTWENTY = one two three four five six seven eight \
	    nine ten eleven twelve thirteen fourteen fifteen \
	    sixteen seventeen eighteen nineteen twenty
Makefiles also share the shell's method for including comments: everything past the first # on a line is considered a comment, so to comment out a variable assignment, simply put a # in front of it:
	# VARIABLE = value
If the assignment takes up multiple lines, remember to comment out the other lines too.

Common make variables

These are some configurable make variables which commonly appear in Makefiles:

DESTDIR
prefix
Remember when I told you earlier to choose a root directory for the files for your program? Well this is where you specify it. The exact name of the variable can vary depending on where it came from (the person who wrote it, or the configure script which generated it), but you should recognise it once you see it.

BINDIR
MANDIR
These variables specify the individual directories for installing files. Note that these should default to the relevant subdirectory under the install root directory, for example:
	    BINDIR = $(prefix)/bin
	    MANDIR = $(prefix)/man
[Note how the prefix make variable is evaluated by writing it as $(prefix).]

CC
The name of the C compiler you wish to use: cc, gcc or acc.

CFLAGS
The options to be passed to the C compiler for compiling. These can include debugging or optimisation options: see the next section for more details.

CPPFLAGS
DEFINES
CPPFLAGS defines the options to be passed to the C compiler for preprocessing. These include any #defines that are to be passed via the command line. Sometimes the #defines are separated into a DEFINES variable as in the following:
	    DEFINES = -DX11_GRAPHICS -DLOGFILE=\"blah\"
	    CPPFLAGS = -I./include $(DEFINES)
	
(Note the backslashes in front of each of the double-quotes. They stop the shell which runs the commands on make's behalf from removing the double-quotes, so that LOGFILE reaches the compiler as the C string "blah" rather than the bare word blah. Read the shell chapter for more details.)

LDFLAGS
LIBS
LDFLAGS defines the options to be passed to the C compiler for linking. These include any special libraries which must be linked to the program. Sometimes these libraries are separated into a LIBS variable as in the following:
	   LIBS = -lX11 -lXaw -lm
	   LDFLAGS = -L/usr/local/X11/lib $(LIBS)
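A -D option in DEFINES or CPPFLAGS has the same effect as a #define line at the top of each source file. A common idiom, shown here as a sketch (LOGFILE follows the earlier config.h example, and log_file_name is a hypothetical name), is for the source to supply a default that a -D on the command line can override:

```c
/* Use the log file name given on the command line, e.g.
       cc -DLOGFILE=\"blah\" -c widget.c
   and fall back to a built-in default when no -D was supplied. */
#ifndef LOGFILE
#define LOGFILE "logfile"
#endif

const char *log_file_name(void)
{
    return LOGFILE;
}
```

Compiled with -DLOGFILE=\"blah\" in DEFINES, log_file_name() returns "blah"; compiled without it, the #ifndef block supplies "logfile".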
	

Compiling the program

If you have a Makefile, the program is usually the default target in it. This means that to compile it, all you need do is type ``make''.

This should compile the program but not install it anywhere.

Your distribution may come with some support programs and utilities along with the main program: if you have a Makefile, you can usually compile all of them by running:

	% make all
With some very simple programs, there may be only one C file to compile and no Makefile. In this case, you'll need to run the compiler manually. Assuming the program name is `widget' and the C file is widget.c, you should be able to compile it by running:
	% cc -o widget widget.c
You may however wish to include extra options for optimisation and/or debugging.

Optimisation

You can tell the compiler to produce optimised code by passing it the -O option. If you are using make, add it to the CFLAGS variable:

	CFLAGS = -O
If you are running the compiler manually, add it to the command line:
	% cc -O -o widget widget.c
Optimised code executes faster than unoptimised code, and is usually noticeably smaller. Unfortunately, optimisation algorithms in C compilers involve heavy wizardry32.17 and are thus the most likely place for bugs to arise. Some programs are more vulnerable to buggy optimisation than others: the README files will tell you if your program is sensitive to the -O option.

Note that the quality of the C compiler also affects how safe it is to use the -O option: gcc is generally more trustworthy than cc (or even acc), but this depends on which version of gcc you are using32.18.

If you are feeling really confident, you can try using the -O2 option which does even MORE optimisations to the generated code.

Debugging

Another option that can be employed in the same way as -O is the -g option. This tells the compiler to include information in the compiled code which can be used by debuggers for tracing through the execution of the program. Unless you are a C guru, you are unlikely to make use of this yourself. On the other hand, if the Makefile includes the -g option by default, there is little harm in leaving it in unless you are short of disk space (it doesn't slow the program's execution down any).

As an aside, most compilers cannot use the -g and -O options together. Gcc is an exception to this rule; however, strange things have been known to happen with optimised programs in debuggers, so caveat programmer.

C Warnings

Sometimes in the middle of a compilation you may get warning lines which look something like the following (the actual warning may vary):

	widget.c: line 56: warning: pointer assigned to integer without cast
If you get this, don't panic!! This is usually a sign of sloppiness on the programmer's part, rather than something going wrong in the compilation. Let it keep going, and if the compilation finishes then the warning was a false alarm...hopefully32.19 ;)
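To give a feel for the kind of sloppiness behind such warnings, here is a hypothetical case. Writing int pos = strchr(s, c); stores the pointer returned by strchr() in an int, which most compilers will warn about. The tidier version below keeps the result as a pointer and turns it into a position by pointer subtraction. This is only an illustration: the warnings in your own compilations will come from different code, and index_of is a made-up name.

```c
#include <string.h>

/* Return the position of the first c in s, or -1 if c does not occur.
   A sloppy version might write
       int pos = strchr(s, c);      -- pointer stored in an int: warning
   This version keeps the pointer and subtracts instead. */
long index_of(const char *s, char c)
{
    const char *p = strchr(s, c);   /* pointer to the match, or NULL */

    if (p == NULL)
        return -1;
    return (long)(p - s);           /* pointer difference = index */
}
```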

C Errors

C errors are a different story: they're hard to ignore even if you wanted to, because they stop the compilation from continuing!

You are venturing into treacherous territory here: it can get difficult to proceed without some good knowledge of C (to decode the sometimes very cryptic error messages the compiler gives you).

  1. Try re-reading the instructions. You may have skipped or misread a step, or simply have made a mistake. Otherwise look for a FAQ or troubleshooting guide in the documentation, and see if you can find a mention of your specific problem.

  2. Try to find somebody who knows more C than you do: maybe they can help decipher the error messages and fix the problem for you.

  3. If you don't know any C gurus personally, there's a whole dump-truck full of them right under your very nose: the ProgSoc mailing list!! Remember that the harder the problem, the better we ProgSoc code-hackers like it. On the other hand, if it becomes obvious that you simply haven't read the instructions, expect some taunts and shouts of "RTFM!" at the very least.

Testing the program

Now, assuming that make successfully compiled your program(s), you will want to test it to make sure it works before continuing with the installation.

Some Makefiles contain a test suite to verify that the program works correctly (the README docs will tell you if this is so). Run these tests with:

	% make test
This will run tests on the program similar to the way configure runs tests on your system...except that failure of a test means there's something wrong with the program. You will be told at the very end if all the tests passed or not (in case you missed seeing the rest of the output).

If it DOESN'T have a test suite, you'll have to run the program yourself. Make sure you know how to operate the program (read the documentation otherwise), and run:

	% ./program [arguments..]
In other words, run it as you would normally, except put ./ in front of the program name as you did with configure, so you ensure you are running your newly-compiled version of the program and not one that may already be installed and in your $PATH.

Unfortunately some programs may not work until they are installed: they will complain ``cannot find <file>'', where <file> is in one of the installation directories you specified earlier, and is copied there as part of the installation process. In this case, you'll have to install the program before you can test it.

Installing the program and related files

If you have a Makefile, installing the program and its related files is as easy as running:

        % make install
This will copy your compiled program(s) and files to the right directories with the right permissions.

If you are installing the program in your own system (i.e. you have root access and the freedom to use it as you wish), this step is the only one you need to perform as root. All other steps can be done in your ordinary user account32.20.

Now is the time to try out your newly-installed program under normal operating conditions. However, some shells such as csh or tcsh32.21 build a hash table to remember where commands are in their $PATH without having to search all the directories. If you add a new program to one of these directories in the middle of your shell session, the shell won't find it. To tell csh or tcsh to rebuild this hash table so it can find the new program, run the rehash built-in command.

You should also test to see if man can find and display your program's manpage.

Cleaning up after yourself

Now that you've finished installing, you may want to do something about the source files and associated junk you created whilst compiling your program.

If you have a Makefile, you can delete the larger, more annoying files such as object (.o) files and core dumps but leave the source files intact with:

	% make clean
You may need to use make clean if you want to recompile after editing config.h, or parts of Makefile which affect the way the program is compiled. Otherwise you may have part of your program compiled for one configuration, and part compiled for another.

make clean might not delete some of the other files created, such as files produced by yacc or configure. There is often another make target for a ``cleaner clean'', under a name such as distclean or spotless, which will delete all these files as well, and (hopefully!) leave you with only the files you started with when you unpacked the distribution.

It is more likely however that you won't want anything to do with the source code after installing the program --- in this case, you may as well delete the distribution directory and everything in it.

Fine-tuning your installation

There are a couple of things you may want to do some time after installing. They don't significantly affect your installations, but can prove quite useful.

Reducing your program's size with strip

If you're getting very short of disk space, you can reduce the size of your programs by stripping out their symbol tables. This is done with the `strip' command. strip usually reduces your program's size by 20-50%. Here's an example of its use:

ftoomsh% ls -sl expect
640 -rwxr-xr-x  1 dbugger    647168 Feb 17 18:39 expect
ftoomsh% strip expect
ftoomsh% ls -sl expect
 360 -rwxr-xr-x  1 dbugger    360448 Feb 17 18:40 expect
Note that some Makefiles strip your program for you when installing---in these cases, strip will obviously have no effect. Also, strip does its job by removing unnecessary stuff from files, and debugging information is the first to go. Debugging a stripped program is not pretty.
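To put the example's figures in the terms used above: expect shrank from 647168 to 360448 bytes, a saving of 286720 bytes. A trivial C helper (hypothetical, for illustration) makes the calculation explicit:

```c
/* Percentage of the original size removed, e.g. by strip. */
double percent_saved(long before, long after)
{
    return 100.0 * (double)(before - after) / (double)before;
}
```

percent_saved(647168, 360448) comes out at about 44.3, comfortably inside the 20-50% range quoted above.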

Man page searching and catman

Ever needed a program in UNIX to do a specific task, but didn't know which program could do it? Or you knew of a program which could do it, but had forgotten the program's name? With a plethora of cryptically-named programs, this problem happens often in UNIX.

One way to find such programs is by searching the man whatis database. This database lists the title lines of man pages, where a short description of the program32.22 is given.

For example, say we want a program which will split long lines in a file into separate lines. To do this, we could search for ``lines'' in the whatis database like this:

ftoomsh% man -k lines
comm (1)                - display lines in common, and lines not in
				common, between two sorted lists
error (1)               - categorize compiler error messages, insert
				at responsible source file lines
fold (1)                - fold long lines for display on an output
				device of a given width
head (1)                - display first few lines of specified files
look (1)                - find words in the system dictionary or lines
				in a sorted list
paste (1V)              - join corresponding lines of several files, or
				subsequent lines of one file
random (6)              - select lines randomly from a file
sort (1V)               - sort and collate lines
textedit_filters, align_equals, capitalize, insert_brackets, remove_brackets,
shift_lines (1)         - filters provided with textedit(1)
unifdef (1)             - resolve and remove ifdef'ed lines from cpp input
uniq (1)                - remove or report adjacent duplicate lines
wc (1)                  - display a count of lines, words and characters
/usr/local/lib/perl5/man/whatis: No such file or directory
/home/dbugger/man/whatis: No such file or directory
/usr/local/man/whatis: No such file or directory
ftoomsh%
Looking at the results, fold seems to do what we want. However, notice the ``No such file or directory'' messages at the end. These mean that the whatis database has not been built for those man directories. You can build or rebuild the whatis database for your man directories with the `catman' command, like this32.23:
% /usr/etc/catman -w -M /home/dbugger/man
man -k will now be able to search man pages in the /home/dbugger/man directory in subsequent searches.
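Conceptually, man -k just scans the one-line entries in each whatis database for the keyword you give it. The sketch below shows that idea for a single entry; whatis_matches is a hypothetical name, and the case-insensitive plain-substring match is an assumption, since real implementations differ (some treat the keyword as a regular expression).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if keyword appears anywhere in the
   whatis entry `line`, ignoring case; return 0 otherwise. */
int whatis_matches(const char *line, const char *keyword)
{
    size_t i, j;

    for (i = 0; line[i] != '\0'; i++) {
        /* Try to match the whole keyword starting at line[i]. */
        for (j = 0; keyword[j] != '\0'; j++) {
            if (line[i + j] == '\0' ||
                tolower((unsigned char)line[i + j]) !=
                    tolower((unsigned char)keyword[j]))
                break;
        }
        if (keyword[j] == '\0')
            return 1;   /* reached the end of the keyword: a match */
    }
    return 0;
}
```

Running this over every line of every whatis file in your $MANPATH, and printing the lines that match, gives roughly the listing shown in the man -k example above.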

Help Save The World!

Well now that you've gone to all that trouble of installing that beaut new program, you're not going to keep it all to yourself now surely? Tell your friends! Tell the world!!! Let everyone make use of it! Otherwise we end up with half a dozen people with the same program installed in their accounts --- quite a waste really.
