I just had the following conversation with my linux desktop:

Me: “Hi, I’d like to use my new printer please.”

Computer: “Do you mean this HP Laserjet CP1515n on the network?”

Me: “Erm, yes I do.”

Computer: “Good. You’ve got a test page printing as we speak.
Anything else I can help you with?”

Sadly I don’t have any alternative modern operating systems to
compare it to, but having done similar things with linux over the last
12 years, I’m impressed with how far we’ve come. Thank you to everyone
who made this possible.

This entry was originally posted in slightly different form to Server Fault

If you’re coming from a Windows world, you’re used to using tools like zip or
rar, which compress collections of files. In the typical Unix tradition of
doing one thing and doing it well, you tend to have two different
utilities: a compression tool and an archive format. People then use these two
tools together to give the same functionality that zip or rar provide.

There are numerous different compression formats; the common ones used on Linux
these days are gzip (sometimes known as zlib) and the newer, higher-performing
bzip2. Unfortunately, bzip2 uses more CPU and memory to provide the higher rates
of compression. You can use these tools to compress any file; by convention,
files compressed in these formats have the extensions .gz and .bz2 respectively.
You use gzip and bzip2 to compress, and gunzip and bunzip2 to decompress, these formats.
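
For example, to compress and then restore a single file (notes.txt here is just a stand-in for any file of yours):

# gzip notes.txt
# gunzip notes.txt.gz
# bzip2 notes.txt
# bunzip2 notes.txt.bz2

gzip replaces notes.txt with notes.txt.gz and gunzip reverses that; the bzip2 pair works the same way with .bz2.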

There are also several different types of archive formats available, including
cpio, ar and tar, but people tend to only use tar. These allow you to take a
number of files and pack them into a single file. They can also include path
and permission information. You can create and unpack a tar file using the tar
command. You might hear these operations referred to as “tarring” and
“untarring”. (The name of the command comes from a shortening of Tape ARchive.
Tar was an improvement on the ar format in that you could use it to span
multiple physical tapes for backups).

# tar -cf archive.tar list of files to include

This will create (-c) an archive into a file (-f) called archive.tar. (.tar
is the conventional extension for tar archives.) You should now have a single
file that contains five files (“list”, “of”, “files”, “to” and “include”). If
you give tar a directory, it will recurse into that directory and store
everything inside it.

# tar -xf archive.tar
# tar -xf archive.tar list of files

This will extract (-x) the previously created archive.tar. You can extract just
the files you want from the archive by listing them at the end of the command
line. In our example, the second line would extract “list”, “of” and “files”, but
not “to” and “include”. You can also use

# tar -tf archive.tar

to get a list of the contents before you extract them.

So now you can combine these two tools to replicate the functionality of zip:

# tar -cf archive.tar directory
# gzip archive.tar

You’ll now have an archive.tar.gz file. You can extract it using:

# gunzip archive.tar.gz
# tar -xf archive.tar

We can use pipes to avoid creating an intermediate archive.tar:

# tar -cf - directory | gzip > archive.tar.gz
# gunzip < archive.tar.gz | tar -xf -

You can use “-” with the -f option to specify stdin or stdout (tar knows which one based on context).

We can do slightly better because, in an apparent breaking of the
“one job well” idea, tar has the ability to compress its output and decompress
its input using the -z argument (I say apparent, because it still uses the
gzip and gunzip commands behind the scenes):

# tar -czf archive.tar.gz directory
# tar -xzf archive.tar.gz

To use bzip2 instead of gzip, use bzip2, bunzip2 and -j instead of gzip, gunzip
and -z respectively (tar -cjf archive.tar.bz2). Some versions of tar can detect
a bzip2 archive when you use -z and do the right thing, but it is probably
worth getting into the habit of being explicit.
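
For example, the bzip2 equivalents of the commands above are:

# tar -cjf archive.tar.bz2 directory
# tar -xjf archive.tar.bz2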


While git is a completely distributed revision control system, sometimes
the lack of a central canonical repository can be annoying. For example,
you might want to publish your repository publicly, so other
people can fork your code, or you might want all your developers to push
into (or have code pulled into) a central “golden” tree that you then
use for automated building and continuous integration. This entry should
explain how to get this all working on Ubuntu 9.04 (Jaunty).

Gitosis is a very useful git repository manager, which adds support for things
like per-user access control, plus gitweb and git-daemon management. While it’s
possible to set all these things up by hand, gitosis does everything for
you. It is nicely configured via git itself; to make configuration changes,
you push the config file changes into the gitosis repository on the server.

Gitosis is available in Jaunty, but unfortunately the version in Jaunty has
a bug which means it doesn’t work out of the box.
Fortunately, there is a newer version in jaunty-proposed that
fixes the main problem. This does mean that you need to add the
following to your sources.list:

deb http://gb.archive.ubuntu.com/ubuntu/ jaunty-proposed universe

Run apt-get update && apt-get install gitosis. You should
install 0.2+20080825-2ubuntu0.1 or later. There is another
small bug in the current version too, as a result of git removing the
git-$command scripts from /usr/bin. Edit
/usr/share/python-support/gitosis/gitosis/templates/admin/hooks/post-update
and replace

git-update-server-info

with

git update-server-info
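
If you’d rather script that edit than open an editor, a sed one-liner along these lines should do it, assuming the hook contains that command on a line by itself (same path as above):

sudo sed -i 's/^git-update-server-info$/git update-server-info/' /usr/share/python-support/gitosis/gitosis/templates/admin/hooks/post-update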

With these changes in place, we can now set up our gitosis
repository. On the server you are going to use to host your central
repositories, run:

sudo -H -u gitosis gitosis-init < id_rsa.pub

The id_rsa.pub file is a public ssh key. As I mentioned,
gitosis is managed over git, so you need an initial user who can clone and
then push changes back into the gitosis repo; make sure this key
belongs to a keypair available to the user you’re going
to use to configure gitosis.
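
If you don’t already have a keypair, you can generate one and copy the public half over first; a minimal sketch (the default key locations and the server name are assumptions):

ssh-keygen -t rsa
scp ~/.ssh/id_rsa.pub gitserver.example.com: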

Now, on your local computer, you can clone the gitosis-admin repo
using:

git clone gitosis@gitserver.example.com:gitosis-admin.git

If you look inside the gitosis-admin directory, you should
find a file called gitosis.conf and a directory called
keydir. The directory is where you can add ssh public keys for
your users. The file is the configuration file for gitosis.

[gitosis]
loglevel = INFO

[group gitosis-admin]
writable = gitosis-admin
members = david@david

[group developers]
members = david@david
writable = publicproject privateproject

[group contributors]
members = george@wilber
writable = publicproject

[repo publicproject]
daemon = yes
gitweb = yes

[repo privateproject]
daemon = no
gitweb = no

This sets up two repositories, called publicproject and
privateproject. It makes the public project available via the
git protocol and in gitweb, if you have that installed. We also create
two groups, developers and contributors. David has access to both
projects, but George can only write to publicproject. David
can also modify the gitosis configuration. The users are the names of
ssh keys (the last part of the line in id_dsa.pub or id_rsa.pub).
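
Adding a new user is therefore just a matter of dropping their public key into keydir, named to match the entry in gitosis.conf; for example, for the george@wilber user above (the source path of the key is hypothetical):

cp /tmp/george-key.pub keydir/george@wilber.pub
git add keydir/george@wilber.pub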

Once you’ve changed this file, you can run git add gitosis.conf
to add it to the commit, git commit -m "update gitosis configuration"
to commit it to your local repository, and
finally git push to push your commits back up into the central
repository. Gitosis should now update the configuration on the server to
match the config file.
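
Put together, that’s:

git add gitosis.conf
git commit -m "update gitosis configuration"
git push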

One last thing to do is to enable git-daemon, so people can
anonymously clone your projects. Create /etc/event.d/git-daemon with the
following contents:

start on startup
stop on shutdown

exec /usr/bin/git daemon \
   --user=gitosis --group=gitosis \
   --user-path=public-git \
   --verbose \
   --syslog \
   --reuseaddr \
   --base-path=/srv/gitosis/repositories/
respawn

You can now start this using start git-daemon.

So now, you need to start using your repository. You can either start
with an existing project or an empty directory. Start by running
git init and then git add $file to add each of the
files you want in your project, and finally git commit to
commit them to your local repository. The final task is to add a remote
repository and push your code into it.
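
For a brand new project, the first part of that might look like this (the project name matches the config above, and README stands in for whatever files you actually have); the remote setup follows below:

mkdir privateproject && cd privateproject
git init
git add README
git commit -m "Initial commit"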

git remote add origin gitosis@gitserver.example.com:privateproject.git
git push origin master:refs/heads/master

In future, you should be able to do git push to push your
changes back into the central repository. You can also clone a project
using git or ssh, providing you have access, using the following
commands. The first is for read-write access over ssh and the second
uses the git protocol for read-only access. The git protocol uses TCP
port 9418, so make sure that’s available externally, if you want the
world to be able to clone your repos.

git clone gitosis@gitserver.example.com:publicproject.git
git clone git://gitserver.example.com/publicproject.git

Setting up GitWeb is left as an exercise for the reader (and for myself,
as I have yet to attempt to set it up).

Sometimes you need to review exactly what an svn up
will do. Fortunately, you can do a couple of things to find out. The
first is to use svn status -u to find out which files have
changed:

/etc/puppet/modules# svn stat -u
       *     1338   exim4_mailserver/files/exim_db.pl
       *     1338   dbplc/files/sort-dbfs.pl
       *     1338   dbplc/files
       *     1338   dbplc/manifests/portal.pp
M            1386   tomcat/files/server.xml
Status against revision:   1386

Here we can see that four entries have changed in the repository since our
working copy’s revision of 1338. We can also see that tomcat/files/server.xml is up to
date against the repository, but has local modifications.

This is all well and good, but how do we know what the changes are?
Well, svn diff is our friend here. By comparing the checkout
against the repository, we can see what will be updated.

/etc/puppet/modules# svn diff dbplc/manifests/portal.pp -rBASE:HEAD
Index: dbplc/manifests/portal.pp
===================================================================
--- dbplc/manifests/portal.pp (working copy)
+++ dbplc/manifests/portal.pp (revision 1386)
@@ -22,6 +22,7 @@
       owner => "tomcat55",
       group => "adm",
       mode => 644,
+      require => Package["tomcat5.5"],
    }

    apache::config { "portal":

When you’re happy, you can run svn up as normal. I’ve just
used this process to sanity check our Puppet config before updating, as
it hasn’t been updated for a few days.

Recently, I’ve seen a lot of suggestions along the lines of:

find . -name foo -exec ls -l {} \;

This is incredibly inefficient, because ls will be executed once for each
and every matching file. Fortunately, we can do better by
using xargs. xargs takes a list of lines from stdin
and uses them to build up a command line. It takes special care not to
exceed the maximum command-line length, splitting the input up into
multiple commands if needed. So, with that knowledge, we can
replace our original command with:

find . -name foo | xargs ls -l

There is one slight problem with this command: it isn’t space-safe.
xargs splits arguments on whitespace, so “file name” will be
incorrectly passed to the command as “file” and “name”. Fortunately,
xargs has an option to delimit parameters with the null
character, and as luck would have it, find has a suitable
option to produce output in this format. This means our command is
now:

find . -name foo -print0 | xargs -0 ls -l

mlocate can do something similar:

locate foo -0 | xargs -0 ls -l

GNU find has one more trick up its sleeve. It has a modified version
of -exec that will do the same thing as xargs, so we could have
written our original command as:

find . -name foo -exec ls -l {} +

Every process is sacred; every process is great. If a process is wasted, God gets quite
irate. Please make sure you try to use one of the latter forms and not the
first form, and make a happy deity. 🙂

When you want to copy files from one machine to another, you might
think about using scp to copy them. You might think about using
rsync. If, however, you’re
trying to copy a large amount of data between two machines, here’s a
better, quicker way to do it using netcat.

On the receiving machine, run:

# cd /dest/dir && nc -l -p 12345 | tar -xf -

On the sending machine you can now run:

# cd /src/dir && tar -cf - . | nc -q 0 remote-server 12345

You should find that everything works nicely, and a lot quicker. If
bandwidth is more constrained than CPU, then you can add “z” or “j” to
the tar options on both ends (“tar -czf -”, “tar -xzf -”, etc.) to compress the
data before it goes over the network. If you’re on gigabit, I wouldn’t bother with the
compression. If the transfer dies, you’ll have to start from the beginning, but
then you might find you can get away with using rsync if you’ve copied
enough. It’s also
worth pointing out that the receiving netcat will die as soon as the
connection closes, so you’ll need to restart it if you want to copy the
data again using this method.
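
For example, with gzip compression on both ends (same hypothetical host and port as before), the receiving and sending commands become:

# cd /dest/dir && nc -l -p 12345 | tar -xzf -
# cd /src/dir && tar -czf - . | nc -q 0 remote-server 12345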

It’s worth pointing out that this does not have the security that scp or
rsync-over-ssh has, so make sure you trust the end points and everything
in between if you don’t want anyone else to see the data.

Why not use scp? Because it’s incredibly slow in comparison. God knows
what scp is doing, but it doesn’t copy data at wire speed. It isn’t the
encryption and decryption, because that would just use CPU, and when I’ve done
it, it hasn’t been CPU bound. I can only assume that the scp process
has a lot of handshaking and ssh protocol overhead.

Why not rsync? Rsync doesn’t really buy you that much on the first copy.
It’s only on subsequent runs that rsync really shines. However, rsync
requires the source to send a complete file list to the destination
before it starts copying any data. If you’ve got a filesystem with a
large number of files, that’s an awfully large overhead, especially as
the destination host has to hold it in memory.

I recently had a failed drive in my RAID1 array. I’ve just installed
the replacement drive and thought I’d share the method.

Let’s look at the current situation:

root@ace:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda3[1]
      483403776 blocks [2/1] [_U]

md0 : active raid1 sda1[1]
      96256 blocks [2/1] [_U]

unused devices: <none>

So we can see we have two mirrored arrays, each with one drive missing.

Let’s check that the second drive has been recognised:

root@ace:~# dmesg | grep sd
[   21.465395] Driver 'sd' needs updating - please use bus_type methods
[   21.465486] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[   21.465496] sd 2:0:0:0: [sda] Write Protect is off
[   21.465498] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   21.465512] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.465562] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[   21.465571] sd 2:0:0:0: [sda] Write Protect is off
[   21.465573] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   21.465587] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.465590]  sda: sda1 sda2 sda3
[   21.487248] sd 2:0:0:0: [sda] Attached SCSI disk
[   21.487303] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[   21.487314] sd 2:0:1:0: [sdb] Write Protect is off
[   21.487317] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[   21.487331] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.487371] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[   21.487381] sd 2:0:1:0: [sdb] Write Protect is off
[   21.487382] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[   21.487403] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.487407]  sdb: unknown partition table
[   21.502763] sd 2:0:1:0: [sdb] Attached SCSI disk
[   21.506690] sd 2:0:0:0: Attached scsi generic sg0 type 0
[   21.506711] sd 2:0:1:0: Attached scsi generic sg1 type 0
[   21.793835] md: bind<sda1>
[   21.858027] md: bind<sda3>

So, sda has three partitions, sda1, sda2 and sda3, and sdb has no partition
table. Let’s give it the same partition table as sda. The easiest way to do this is using
sfdisk:

root@ace:~# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an MSDOS signature
 /dev/sdb: unrecognised partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    192779     192717  fd  Linux RAID autodetect
/dev/sdb2        192780   9960299    9767520  82  Linux swap / Solaris
/dev/sdb3       9960300 976768064  966807765  fd  Linux RAID autodetect
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

If we check dmesg now, we can see that it’s worked:

root@ace:~# dmesg | grep sd
...
[  224.246102] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[  224.246322] sd 2:0:1:0: [sdb] Write Protect is off
[  224.246325] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[  224.246547] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  224.246686]  sdb: unknown partition table
[  227.326278] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[  227.326504] sd 2:0:1:0: [sdb] Write Protect is off
[  227.326507] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[  227.326703] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  227.326708]  sdb: sdb1 sdb2 sdb3

So, now we have identical partition tables. The next thing to do is to add the new partitions to the array:

root@ace:~# mdadm /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
root@ace:~# mdadm /dev/md1 --add /dev/sdb3
mdadm: added /dev/sdb3

Everything looks good. Let’s check dmesg:

[  323.941542] md: bind<sdb1>
[  324.038183] RAID1 conf printout:
[  324.038189]  --- wd:1 rd:2
[  324.038192]  disk 0, wo:1, o:1, dev:sdb1
[  324.038195]  disk 1, wo:0, o:1, dev:sda1
[  324.038300] md: recovery of RAID array md0
[  324.038303] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  324.038305] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  324.038310] md: using 128k window, over a total of 96256 blocks.
[  325.417219] md: md0: recovery done.
[  325.453629] RAID1 conf printout:
[  325.453632]  --- wd:2 rd:2
[  325.453634]  disk 0, wo:0, o:1, dev:sdb1
[  325.453636]  disk 1, wo:0, o:1, dev:sda1
[  347.970105] md: bind<sdb3>
[  348.004566] RAID1 conf printout:
[  348.004571]  --- wd:1 rd:2
[  348.004573]  disk 0, wo:1, o:1, dev:sdb3
[  348.004574]  disk 1, wo:0, o:1, dev:sda3
[  348.004657] md: recovery of RAID array md1
[  348.004659] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  348.004660] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  348.004664] md: using 128k window, over a total of 483403776 blocks.

Everything still looks good. Let’s sit back and watch it rebuild using the wonderfully useful watch command:

root@ace:~# watch -n 1 cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb3[2] sda3[1]
      483403776 blocks [2/1] [_U]
      [=====>...............]  recovery = 26.0% (126080960/483403776) finish=96.2min speed=61846K/sec

md0 : active raid1 sdb1[0] sda1[1]
      96256 blocks [2/2] [UU]

unused devices: <none>
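
If you want more detail on a single array than /proc/mdstat gives, mdadm can report it directly:

root@ace:~# mdadm --detail /dev/md1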

The Ubuntu and Debian installers will let you create RAID1 arrays
with fewer drives than you will eventually have, so you can use this technique
if you plan to add an additional drive after you’ve installed the
system. Just tell the installer the eventual number of drives, but only select the
available partitions during RAID setup. I used this method recently, when a new machine
didn’t have enough SATA power cables and I had to wait for an adaptor to
be delivered.
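
If you’re building the array by hand rather than in the installer, mdadm can do the same thing by using the keyword missing in place of the absent device; a sketch with hypothetical device names:

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 missing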

(Why did no one tell me about watch until recently? I wonder
how many more incredibly useful programs I’ve not discovered, even after 10
years of using Linux.)