This entry was originally posted in slightly different form to Server Fault

If you’re coming from a Windows world, you’re used to using tools like zip or
rar, which compress collections of files. In the typical Unix tradition of
doing one thing and doing it well, you tend to have two different
utilities: a compression tool and an archive tool. People then use these two
tools together to get the same functionality that zip or rar provide.

There are numerous compression formats; the common ones used on Linux
these days are gzip (which uses the same DEFLATE compression as the zlib
library) and the newer bzip2, which compresses better. Unfortunately bzip2 uses
more CPU and memory to provide its higher rates of compression. You can use
these tools to compress any file and, by convention, files compressed with them
are given the extensions .gz and .bz2 respectively. Use gzip and bzip2 to
compress, and gunzip and bunzip2 to decompress.
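As a quick illustration of that round trip, here is a minimal sketch that compresses a throwaway file with gzip and restores it with gunzip. The file name notes.txt and the scratch directory are made up for the example; bzip2 and bunzip2 can be swapped in the same way and would produce notes.txt.bz2 instead.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a small sample file (hypothetical name).
printf 'hello world\n' > notes.txt

# gzip replaces notes.txt with notes.txt.gz.
gzip notes.txt
ls

# gunzip restores the original file and removes the .gz.
gunzip notes.txt.gz
cat notes.txt
```

Note that both tools replace the input file rather than leaving a copy beside it.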

There are also several different types of archive formats available, including
cpio, ar and tar, but people tend to only use tar. These allow you to take a
number of files and pack them into a single file. They can also include path
and permission information. You can create and unpack a tar file using the tar
command. You might hear these operations referred to as “tarring” and
“untarring”. (The name of the command comes from a shortening of Tape ARchive.
Tar was an improvement on the ar format in that you could use it to span
multiple physical tapes for backups.)

# tar -cf archive.tar list of files to include

This will create (-c) an archive in a file (-f) called archive.tar. (.tar
is the conventional extension for tar archives.) You should now have a single
file that contains five files (“list”, “of”, “files”, “to” and “include”). If
you give tar a directory, it will recurse into that directory and store
everything inside it.
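To see the directory behaviour for yourself, the following sketch builds a tiny tree in a scratch directory and archives it. The directory and file names (project, readme.txt, docs/guide.txt) are invented for the example.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a small directory tree (names are made up for this example).
mkdir -p project/docs
printf 'top\n' > project/readme.txt
printf 'nested\n' > project/docs/guide.txt

# Archive the whole directory; tar descends into project/ by itself.
tar -cf archive.tar project

# List what was stored; both files appear with their paths.
tar -tf archive.tar
```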

# tar -xf archive.tar
# tar -xf archive.tar list of files

This will extract (-x) the previously created archive.tar. You can extract just
the files you want from the archive by listing them on the end of the command
line. In our example, the second line would extract “list”, “of” and “files”,
but not “to” and “include”. You can also use

# tar -tf archive.tar

to get a list of the contents before you extract them.
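A minimal sketch of selective extraction, assuming a scratch directory and the five example file names used above. It unpacks into a separate, hypothetical "restored" directory so it is obvious which members came out of the archive.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create the five example files and archive them.
for name in list of files to include; do
    printf '%s\n' "$name" > "$name"
done
tar -cf archive.tar list of files to include

# Extract only three members into an empty directory.
mkdir restored
cd restored
tar -xf ../archive.tar list of files

# Only the three requested files are here; "to" and "include" are not.
ls
```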

So now you can combine these two tools to replicate the functionality of zip:

# tar -cf archive.tar directory
# gzip archive.tar

You’ll now have an archive.tar.gz file. You can extract it using:

# gunzip archive.tar.gz
# tar -xf archive.tar

We can use pipes to save us having an intermediate archive.tar:

# tar -cf - directory | gzip > archive.tar.gz
# gunzip < archive.tar.gz | tar -xf -

You can use - with the -f option to specify stdin or stdout (tar knows which one based on context).
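Here is the piped version as a self-contained sketch you can run in a scratch directory. The names "directory", "file.txt" and "restored" are placeholders; the point is that no intermediate archive.tar ever lands on disk.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir directory
printf 'some data\n' > directory/file.txt

# Archive to stdout and compress on the fly; no archive.tar is written.
tar -cf - directory | gzip > archive.tar.gz

# Decompress to stdout and unpack from stdin inside a separate directory.
mkdir restored
gunzip < archive.tar.gz | (cd restored && tar -xf -)

cat restored/directory/file.txt
```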

We can do slightly better because, in an apparent breaking of the
“one job well” idea, tar can compress its output and decompress
its input when given the -z argument. (I say apparent because it still uses the
gzip and gunzip command line tools behind the scenes.)

# tar -czf archive.tar.gz directory
# tar -xzf archive.tar.gz
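One way to convince yourself that -z produces nothing special is to create an archive with it and then read it back with the separate tools. This is only a sketch in a scratch directory with made-up file names.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

mkdir directory
printf 'some data\n' > directory/file.txt

# Let tar handle the compression itself.
tar -czf archive.tar.gz directory

# The result is an ordinary gzip stream: gzip can verify its integrity...
gzip -t archive.tar.gz

# ...and the two-step pipeline can still list its contents.
gunzip < archive.tar.gz | tar -tf -
```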

To use bzip2 instead of gzip, use bzip2, bunzip2 and -j instead of gzip, gunzip
and -z respectively (tar -cjf archive.tar.bz2 directory). Some versions of tar
can detect a bzip2 archive when you use -z and do the right thing, but it is
probably worth getting into the habit of being explicit.

This entry was originally posted in slightly different form to Server Fault

There are several ways to run Tomcat applications. You can either run
Tomcat directly on port 80, or you can put a web server in front of Tomcat and
proxy connections to it. I would highly recommend using Apache as a
front end. The main reason for this suggestion is that Apache is more
flexible than Tomcat. Apache has many modules providing features that you
would have to code yourself in Tomcat. For example, while Tomcat can do gzip
compression, it’s a single switch: enabled or disabled. Sadly, you cannot
send compressed CSS or JavaScript to Internet Explorer 6, so compression
has to be turned off for those files for that browser. This is easy to
do in Apache, but impossible in Tomcat. Things like caching
are also easier to do in Apache.
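As a rough sketch of the Apache side, mod_deflate can compress several content types while falling back to HTML-only compression for one browser. The list of content types and the user-agent pattern below are assumptions for illustration, not a tested production configuration.

```apache
# Compress common text-based responses (needs mod_deflate loaded).
AddOutputFilterByType DEFLATE text/html text/css application/javascript

# For user agents that identify as IE 6, only compress text/html,
# so CSS and JavaScript are sent uncompressed (needs mod_setenvif).
BrowserMatch "MSIE 6\." gzip-only-text/html
```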

Having decided to use Apache to front Tomcat, you need to decide how
to connect them. There are several choices: mod_proxy (more accurately,
mod_proxy_http in Apache 2.2, but I’ll refer to this as mod_proxy), mod_jk and
mod_jk2. mod_jk2 is not under active development and should not be used. This
leaves us with mod_proxy or mod_jk.

Both methods forward requests from Apache to Tomcat. mod_proxy uses the HTTP
that we all know and love. mod_jk uses a binary protocol, AJP. The main
advantages of mod_jk are:

  • AJP is a binary protocol, so it is slightly quicker for both ends to deal
    with and has slightly less overhead than HTTP, but the difference is minimal.
  • AJP includes information like the original host name, the remote host and
    the SSL connection. This means that ServletRequest.isSecure() works as
    expected, that you know who is connecting to you, and that you can do some
    sort of virtual hosting in your code.

A slight disadvantage is that AJP is based on fixed sized chunks, and can break
with long headers, particularly request URLs with long lists of parameters, but
you should rarely be in a position of having 8K of URL parameters. (It would
suggest you were doing it wrong. 🙂 )

It used to be the case that mod_jk provided basic load balancing
between two tomcats, which mod_proxy couldn’t do, but with the new
mod_proxy_balancer in Apache 2.2, this is no longer a reason to choose between them.
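For reference, a minimal mod_proxy_balancer setup looks something like the following. The pool name, host names and the /app path are placeholders; a real deployment would probably also want session stickiness and health settings, which are left out here.

```apache
# Define a pool of two Tomcat instances (host names are placeholders).
<Proxy balancer://tomcats>
    BalancerMember http://tomcat1.example.com:8080
    BalancerMember http://tomcat2.example.com:8080
</Proxy>

# Send requests for /app to the pool.
ProxyPass        /app balancer://tomcats/app
ProxyPassReverse /app balancer://tomcats/app
```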

The position is slightly complicated by the existence of mod_proxy_ajp. Of the
two AJP options, mod_jk is the more mature, but mod_proxy_ajp works in the same
framework as the other mod_proxy modules. I have not yet used mod_proxy_ajp,
but would consider doing so in the future, as mod_proxy_ajp is part of
Apache and mod_jk involves additional configuration outside of Apache.
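Because mod_proxy_ajp shares the mod_proxy framework, switching between HTTP and AJP is mostly a matter of changing the URL scheme. The sketch below assumes Tomcat is on the same machine with its default connector ports (8080 for HTTP, 8009 for AJP) and an application mounted at /app; adjust all three to match your setup.

```apache
# Option 1: plain HTTP to Tomcat's HTTP connector (mod_proxy_http).
ProxyPass        /app http://localhost:8080/app
ProxyPassReverse /app http://localhost:8080/app

# Option 2: AJP to Tomcat's AJP connector (mod_proxy_ajp).
# Use one option or the other for a given path, not both.
# ProxyPass        /app ajp://localhost:8009/app
# ProxyPassReverse /app ajp://localhost:8009/app
```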

Given a choice, I would prefer an AJP-based connector, mostly due to my second
stated advantage, more than the performance aspect. Of course, if your
application vendor doesn’t support anything other than mod_proxy_http, that
does tie your hands somewhat.

You could use an alternative web server like lighttpd, which does have
an AJP module. Sadly, my preferred lightweight HTTP server, nginx, does
not support AJP and is unlikely ever to do so, due to the design of its
proxying system.