<?xml version="1.0" encoding="utf-8"?>
<!-- name="generator" content="pyblosxom/1.4.3 01/10/2008" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91" xmlns:dc="http://purl.org/dc/elements/1.1/">
<channel>
<title>JD   </title>
<link>http://www.davidpashley.com/blog</link>
<description>The rants of the unimportant</description>
<language>en</language>
<item>
  <title>Multiple Crimes</title>
  <link>http://www.davidpashley.com/blog/databases/mysql/multiple-crimes.html</link>
  <pubDate>Sat, 29 Jan 2011 13:35 GMT</pubDate>
  <dc:date>2011-01-29T13:35:20Z</dc:date>
  <description><![CDATA[
<pre>
mysql> select "a" = "A";
+-----------+
| "a" = "A" |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)
</pre>
<p>WTF? (via <a
href="https://doc.nuxeo.com/pages/viewpage.action?pageId=3343486">Nuxeo</a>)</p>
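<p>For what it's worth, the likely explanation is that MySQL's default
collations are case-insensitive, so string comparisons ignore case unless
you ask otherwise. A minimal sketch of forcing a byte-by-byte comparison
with the <tt>BINARY</tt> operator, which should return 0 rather than 1:</p>
<pre>
mysql> select binary "a" = "A";
</pre>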


      <div><a href="http://www.davidpashley.com/blog/databases/mysql/multiple-crimes" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Letter to my MP regarding the Digital Economy Bill</title>
  <link>http://www.davidpashley.com/blog/politics/digital-economy-bill-letter.html</link>
  <pubDate>Wed, 17 Mar 2010 12:35 GMT</pubDate>
  <dc:date>2010-03-17T12:35:53Z</dc:date>
  <description><![CDATA[
<p>I have just sent the following email to my MP, David Lepper MP, outlining my concerns about the Digital Economy Bill. I urge you to write to your MP with a similar letter.</p>
<a href="http://wiki.openrightsgroup.org/wiki/How_To_Talk_To_Your_MP_Notes">Open Rights Group's guide to writing to your MP</a>
<pre>
From: David Pashley &lt;david@davidpashley.com&gt;
To: David Lepper
Cc: 
Bcc: 
Subject: Digital Economy Bill
Reply-To: 

Dear Mr Lepper, 

I'm writing to you to express my concern at the Digital Economy Bill
which is currently working its way through the House of Commons. I
believe that the bill as it stands will have a negative effect on
the digital economy that the UK and in particular Brighton have
worked so hard to foster. 

Section 4-17 deals with disconnecting people reported as infringing
copyright. As it stands, this section will result in the possibility
that my internet connection could be disconnected as a result of the
actions of my flatmate. My freelance web development business is
inherently linked to my access of the Internet. I currently allow my
landlady to share my internet access with her holiday flat above me.
I will have to stop this arrangement for fear of a tourist's actions
jeopardising my business. 

This section will also result in the many pubs and cafes, much
favoured by Brighton's freelancers, removing their free wifi. I
have often used my local pub's wifi when I needed a change of
scenery. I know a great many freelancers use Cafe Delice in the
North Laine as a place to meet other freelancers and discuss
projects while drinking coffee and working.

Section 18 deals with ISPs being required to prevent access to sites
hosting copyrighted material. The ISPs can insist on a court
injunction forcing them to prevent access. Unfortunately, a great
many ISPs will not want to deal with the costs of any court
proceedings and will just block the site in question. A similar law
in the United States, the Digital Millennium Copyright Act (DMCA)
has been abused time and time again by spurious copyright claims to
silence critics or embarrassments.  A recent case is Microsoft
shutting down the entire Cryptome.org website because they were
embarrassed by a document they had hosted.  There are many more
examples of abuse at http://www.chillingeffects.org/

A concern is that there's no requirement for the accuser to prove
infringement has occurred, nor is there a valid defense that a user
has done everything possible to prevent infringement. 

There are several ways to reduce copyright infringement of music and
movies without introducing new legislation. The promotion of legal
services like iTunes and Spotify, easier access to legal media, like
Digital Rights Management free music. Many of the record labels and
movie studios are failing to promote competing legal services which
many people would use if they were aware of them. A fairer
alternative to disconnection is a fine through the courts. 

You can find further information on the effects of the Digital
Economy Bill at http://www.openrightsgroup.org/ and
http://news.bbc.co.uk/1/hi/technology/8544935.stm

The bill has currently passed the House of Lords and its first
reading in the Commons. There is a danger that without MPs demanding
to scrutinise this bill, this damaging piece of legislation will be
rushed through Parliament before the general election.

I ask you to demand your right to debate this bill and to amend the
bill to remove sections 4-18. I would also appreciate a response to
this email. If you would like to discuss the issues I've raised
further, I can be contacted on 01273 xxxxxx or 07966 xxx xxx or via
email at this address.

Thank you for your time.

-- 
David Pashley
david@davidpashley.com
</pre>

      <div><a href="http://www.davidpashley.com/blog/politics/digital-economy-bill-letter" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Mod_fastcgi and external PHP</title>
  <link>http://www.davidpashley.com/blog/systems-administration/apache/mod_fastcgi.html</link>
  <pubDate>Sun, 07 Mar 2010 23:02 GMT</pubDate>
  <dc:date>2010-03-07T23:02:49Z</dc:date>
  <description><![CDATA[
<p>Has anyone managed to get a standard version of mod_fastcgi to work
correctly with <tt>FastCGIExternalServer</tt>? There seems to be a
complete lack of documentation on how to get this to work. I have
managed to get it working by removing some code which appears to
completely break <tt>AddHandler</tt>. However, people on the FastCGI
list told me I was wrong for making it work. So, if anyone has managed
to get it to work, please show me some working config. </p>

      <div><a href="http://www.davidpashley.com/blog/systems-administration/apache/mod_fastcgi" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Reducing Coupling between modules</title>
  <link>http://www.davidpashley.com/blog/systems-administration/puppet/reducing-coupling.html</link>
  <pubDate>Thu, 25 Feb 2010 09:30 GMT</pubDate>
  <dc:date>2010-02-25T09:30:18Z</dc:date>
  <description><![CDATA[
<p>In the past, several of my <a
href="http://reductivelabs.com/products/puppet/">Puppet</a> modules have
been tightly coupled. A perfect example is <a href="http://httpd.apache.org">Apache</a> and <a href="http://munin.projects.linpro.no/">Munin</a>. When I
install Apache, I want munin graphs set up. As a result my apache class
has the following snippet in it:</p>
<pre>
munin::plugin { "apache_accesses": }
munin::plugin { "apache_processes": }
munin::plugin { "apache_volume": }
</pre>
<p>This should make sure that these three plugins are installed and that
munin-node is restarted to pick them up. The define was implemented like
this:</p>
<pre>
define munin::plugin (
      $enable = true,
      $plugin_name = false,
      ) {

   include munin::node

   file { "/etc/munin/plugins/$name":
      ensure => $enable ? {
         true => $plugin_name ? {
            false => "/usr/share/munin/plugins/$name",
            default => "/usr/share/munin/plugins/$plugin_name"
         },
         default => absent
      },
      links => manage,
      require => Package["munin-node"],
      notify => Service["munin-node"],
   }
}
</pre>
<p>(Note: this is a slight simplification of the define). As you can
see, the define includes <tt>munin::node</tt>, as it needs the definition of the
munin-node service and package. As a result of this, installing Apache
drags in munin-node on your server too. It would be much nicer if the
apache class only installed the munin plugins if you also install munin
on the server.</p>

<p>It turns out that this is possible, using <a
href="http://reductivelabs.com/trac/puppet/wiki/VirtualResources">virtual
resources</a>. Virtual resources allow you to define resources in one
place, but not make them happen unless you realise them. Using this, we
can make the file resource in the <tt>munin::plugin</tt> virtual and realise it
in our <tt>munin::node</tt> class. Our new <tt>munin::plugin</tt> looks like:</p>
<pre>
define munin::plugin (
      $enable = true,
      $plugin_name = false,
      ) {

   <b># removed "include munin::node"</b>

   <b># Added @ in front of the resource to declare it as virtual</b>
   <b>@</b>file { "/etc/munin/plugins/$name":
      ensure => $enable ? {
         true => $plugin_name ? {
            false => "/usr/share/munin/plugins/$name",
            default => "/usr/share/munin/plugins/$plugin_name"
         },
         default => absent
      },
      links => manage,
      require => Package["munin-node"],
      notify => Service["munin-node"],
      <b>tag => munin-plugin,</b>
   }
}
</pre>
<p>We add the following line to our <tt>munin::node</tt> class:</p>
<pre>
File&lt;| tag == munin-plugin |&gt;
</pre>
<p>The odd syntax in the <tt>munin::node</tt> class realises all the
virtual resources that match the filter, in this case, any that are
tagged <tt>munin-plugin</tt>. We've had to define this tag ourselves, as
the auto-generated tags don't seem to work. You'll also notice that
we've removed the <tt>munin::node</tt> include from the
<tt>munin::plugin</tt> define, which means that we no longer install
munin-node just by using the plugin define. I've used a similar
technique for logcheck, so additional rules are not installed unless
I've installed logcheck. I'm sure there are several other places where I
can use it to reduce such tight coupling between classes.</p>
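
<p>To show where the collector sits, here is a rough sketch of what the
<tt>munin::node</tt> class might look like. The package and service
definitions are assumptions based on the <tt>require</tt> and
<tt>notify</tt> lines in the define above; my real class has more in it.</p>
<pre>
class munin::node {

   # Assumption: package and service are both called munin-node,
   # matching the references in munin::plugin
   package { "munin-node":
      ensure => installed,
   }

   service { "munin-node":
      ensure => running,
      require => Package["munin-node"],
   }

   # Realise every virtual file resource tagged munin-plugin
   File&lt;| tag == munin-plugin |&gt;
}
</pre>
<p>A node that includes <tt>apache</tt> but not <tt>munin::node</tt>
never evaluates the collector, so the plugin symlinks stay virtual and
nothing munin-related is installed.</p>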

      <div><a href="http://www.davidpashley.com/blog/systems-administration/puppet/reducing-coupling" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Maven and Grails 1.2 snapshot</title>
  <link>http://www.davidpashley.com/blog/programming/java/grails-1.2-maven.html</link>
  <pubDate>Tue, 22 Dec 2009 23:01 GMT</pubDate>
  <dc:date>2009-12-22T23:01:35Z</dc:date>
  <description><![CDATA[
<p>Because I couldn't find the information anywhere else, if you want to
use maven with Grails 1.2 snapshot, use:
</p>
<pre>
mvn org.apache.maven.plugins:maven-archetype-plugin:2.0-alpha-4:generate \
   -DarchetypeGroupId=org.grails \
   -DarchetypeArtifactId=grails-maven-archetype \
   -DarchetypeVersion=1.2-SNAPSHOT \
   -DgroupId=uk.org.catnip \
   -DartifactId=armstrong \
   -DarchetypeRepository=http://snapshots.maven.codehaus.org/maven2
</pre>

      <div><a href="http://www.davidpashley.com/blog/programming/java/grails-1.2-maven" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Conversations regarding printers</title>
  <link>http://www.davidpashley.com/blog/linux/printer-conversation.html</link>
  <pubDate>Tue, 08 Dec 2009 15:42 GMT</pubDate>
  <dc:date>2009-12-08T15:42:03Z</dc:date>
  <description><![CDATA[
<p>I just had the following conversation with my linux desktop:</p>
<blockquote>
<p>Me: "Hi, I'd like to use my new printer please."</p>
<p>Computer: "Do you mean this HP Laserjet CP1515n on the network?"</p>
<p>Me: "Erm, yes I do."</p>
<p>Computer: "Good. You've got a test page printing as we speak.
Anything else I can help you with?"</p>
</blockquote>
<p>Sadly I don't have any alternative modern operating systems to
compare it to, but having done similar things with linux over the last
12 years, I'm impressed with how far we've come. Thank you to everyone
who made this possible.</p>

      <div><a href="http://www.davidpashley.com/blog/linux/printer-conversation" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Tarballs explained</title>
  <link>http://www.davidpashley.com/blog/linux/tarballs.html</link>
  <pubDate>Tue, 13 Oct 2009 09:57 GMT</pubDate>
  <dc:date>2009-10-13T09:57:56Z</dc:date>
  <description><![CDATA[
<p><small>This entry was originally posted in slightly different form to <a href="http://serverfault.com/questions/73914/gzip-not-decompressing-as-a-directory/73953#73953">Server Fault</a></small></p> 
<p>If you're coming from a Windows world, you're used to using tools like zip or
rar, which compress collections of files. In the typical Unix tradition of
doing one thing and doing one thing well, you tend to have two different
utilities: a compression tool and an archive format. People then use these two
tools together to give the same functionality that zip or rar provide.</p>

<p>There are numerous different compression formats; the common ones used on Linux
these days are gzip (sometimes known as zlib) and the newer, higher performing
bzip2. Unfortunately bzip2 uses more CPU and memory to provide the higher rates
of compression. You can use these tools to compress any file and, by convention,
files compressed with these formats end in .gz and .bz2 respectively. You can use gzip
and bzip2 to compress and gunzip and bunzip2 to decompress these formats.</p>

<p>There are also several different types of archive formats available, including
cpio, ar and tar, but people tend to only use tar. These allow you to take a
number of files and pack them into a single file. They can also include path
and permission information. You can create and unpack a tar file using the tar
command. You might hear these operations referred to as "tarring" and
"untarring". (The name of the command comes from a shortening of Tape ARchive.
Tar was an improvement on the ar format in that you could use it to span
multiple physical tapes for backups). </p>

     <pre># tar -cf archive.tar list of files to include</pre>

<p>This will create (-c) an archive in a file (-f) called archive.tar (.tar
is the conventional extension for tar archives). You should now have a single
file that contains five files ("list", "of", "files", "to" and "include"). If
you give tar a directory, it will recurse into that directory and store
everything inside it.</p>

<pre># tar -xf archive.tar
# tar -xf archive.tar list of files</pre>

<p>This will extract (-x) the previously created archive.tar. You can extract just
the files you want from the archive by listing them on the end of the command
line. In our example, the second line would extract "list", "of" and "files", but
not "to" and "include". You can also use</p>

      <pre># tar -tf archive.tar</pre>

<p>to get a list of the contents before you extract them.</p>

<p>So now you can combine these two tools to replicate the functionality of zip:</p>

      <pre># tar -cf archive.tar directory
# gzip archive.tar</pre>

<p>You'll now have an archive.tar.gz file. You can extract it using:</p>

      <pre># gunzip archive.tar.gz
# tar -xf archive.tar</pre>

<p>We can use pipes to save us having an intermediate archive.tar:</p>

      <pre># tar -cf - directory | gzip &gt; archive.tar.gz
# gunzip &lt; archive.tar.gz | tar -xf -</pre>

<p>You can use - with the -f option to specify stdin or stdout (tar knows which one based on context).</p>

<p>We can do slightly better, because, in a slight apparent breaking of the
"one job well" idea, tar has the ability to compress its output and decompress
its input by using the -z argument (I say apparent, because it still uses the
gzip and gunzip command-line tools behind the scenes).</p>

     <pre># tar -czf archive.tar.gz directory
# tar -xzf archive.tar.gz</pre>

<p>To use bzip2 instead of gzip, use bzip2, bunzip2 and -j instead of gzip, gunzip
and -z respectively (tar -cjf archive.tar.bz2 directory). Some versions of tar can detect
a bzip2 archive when you use -z and do the right thing, but it is probably
worth getting in the habit of being explicit.</p>
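
<p>Spelled out, the bzip2 equivalents of the earlier commands would look
something like this (same placeholder names as before, first with pipes
and then with tar's -j option):</p>

      <pre># tar -cf - directory | bzip2 &gt; archive.tar.bz2
# bunzip2 &lt; archive.tar.bz2 | tar -xf -
# tar -cjf archive.tar.bz2 directory
# tar -tjf archive.tar.bz2
# tar -xjf archive.tar.bz2</pre>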

<p>More info:</p>

<ul>

 <li><a href="http://en.wikipedia.org/wiki/Tar_(file_format)">http://en.wikipedia.org/wiki/Tar_(file_format)</a></li>
 <li><a href="http://en.wikipedia.org/wiki/Bzip2">http://en.wikipedia.org/wiki/Bzip2</a></li>
 <li><a href="http://en.wikipedia.org/wiki/Gzip">http://en.wikipedia.org/wiki/Gzip</a></li>
</ul>

      <div><a href="http://www.davidpashley.com/blog/linux/tarballs" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>mod_proxy or mod_jk</title>
  <link>http://www.davidpashley.com/blog/systems-administration/tomcat/mod_proxy-or-mod_jk.html</link>
  <pubDate>Sun, 11 Oct 2009 10:53 GMT</pubDate>
  <dc:date>2009-10-11T10:53:33Z</dc:date>
  <description><![CDATA[
<p><small>This entry was originally posted in slightly different form to <a href="http://serverfault.com/questions/73314/modjk-or-modproxy/73361#73361">Server Fault</a></small></p>
<p>There are several ways to run Tomcat applications. You can either run
tomcat directly on port 80, or you can put a webserver in front of tomcat and
proxy connections to it. I would highly recommend using Apache as a
front end. The main reason for this suggestion is that Apache is more
flexible than tomcat. Apache has many modules that would require you to
code support yourself in Tomcat. For example, while Tomcat can do gzip
compression, it's a single switch; enabled or disabled. Sadly, Internet
Explorer 6 mishandles compressed CSS and javascript, so you need to leave
those uncompressed for that browser. This is easy to support in Apache,
but impossible to do in Tomcat. Things like caching
are also easier to do in Apache.</p>
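
<p>As a rough sketch of the Apache side, assuming mod_deflate and
mod_setenvif are loaded (the MIME type list and the user-agent pattern
are illustrative choices, not a tested production config):</p>
<pre>
# Compress the common text formats
AddOutputFilterByType DEFLATE text/html text/plain text/css application/x-javascript

# Only send compressed HTML to Internet Explorer 6; CSS and
# javascript go out uncompressed for that browser
BrowserMatch "MSIE 6\." gzip-only-text/html
</pre>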

<p>Having decided to use Apache to front Tomcat, you need to decide how
to connect them. There are several choices: mod_proxy (more accurately, mod_proxy_http in
Apache 2.2, but I'll refer to this as mod_proxy), mod_jk and mod_jk2.
Mod_jk2 is not under active development and should not be used. This
leaves us with mod_proxy or mod_jk.</p>

<p>
Both methods forward requests from apache to tomcat. mod_proxy uses the HTTP
that we all know and love. mod_jk uses a binary protocol, AJP. The main
advantages of mod_jk are:
</p>

<ul>
<li>
AJP is a binary protocol, so is slightly quicker for both ends to deal with and
uses slightly less overhead compared to HTTP, but this is minimal.  
</li>
<li>
AJP
includes information like original host name, the remote host and the SSL
connection. This means that ServletRequest.isSecure() works as expected, and
that you know who is connecting to you and allows you to do some sort of
virtualhosting in your code.
</li>
</ul>

<p>A slight disadvantage is that AJP is based on
fixed sized chunks, and can break with long headers, particularly request URLs
with a long list of parameters, but you should rarely be in a position of having
8K of URL parameters. (It would suggest you were doing it wrong. :) )
</p>

<p>It used to be the case that mod_jk provided basic load balancing
between two tomcats, which mod_proxy couldn't do, but with the new
mod_proxy_balancer in Apache 2.2, this is no longer a reason to choose between them.</p>

<p>
The position is slightly complicated by the existence of mod_proxy_ajp. Between
them, mod_jk is the more mature of the two, but mod_proxy_ajp works in the same
framework as the other mod_proxy modules. I have not yet used mod_proxy_ajp,
but would consider doing so in the future, as mod_proxy_ajp is part of
Apache and mod_jk involves additional configuration outside of Apache.
</p>

<p>
Given a choice, I would prefer an AJP-based connector, mostly due to my second
stated advantage, more than the performance aspect. Of course, if your
application vendor doesn't support anything other than mod_proxy_http, that
does tie your hands somewhat.</p>
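
<p>For comparison, a minimal sketch of the two mod_proxy flavours in
Apache 2.2. The <tt>/app</tt> path and <tt>localhost</tt> are
placeholders, and the ports are Tomcat's default HTTP (8080) and AJP
(8009) connectors; you would use one mapping or the other, not both.</p>
<pre>
# mod_proxy_http: forward requests to Tomcat's HTTP connector
ProxyPass        /app http://localhost:8080/app
ProxyPassReverse /app http://localhost:8080/app

# mod_proxy_ajp: the same mapping over AJP instead
ProxyPass        /app ajp://localhost:8009/app
</pre>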

<p>You could use an alternative webserver like lighttpd, which does have
an AJP module. Sadly, my preferred lightweight HTTP server, nginx, does
not support AJP and is unlikely ever to do so, due to the design of its
proxying system.</p>

      <div><a href="http://www.davidpashley.com/blog/systems-administration/tomcat/mod_proxy-or-mod_jk" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Blog Copyright</title>
  <link>http://www.davidpashley.com/blog/meta/copyright.html</link>
  <pubDate>Sat, 10 Oct 2009 18:57 GMT</pubDate>
  <dc:date>2009-10-10T18:57:05Z</dc:date>
  <description><![CDATA[
<p>To make things explicitly clear, my blog is copyrighted and licensed
as "All rights reserved". It even says that at the footer of every page.
That means you may not redistribute any content without my permission.
Yes, this means you, Ross Beazley. I may allow aggregation sites to
redistribute my content, but the only sites where I have given explicit
permission are Planet Debian and Planet BNM.  I am unlikely to be upset
if your aggregation site links back to the original entry and does not
carry advertising, and will probably give you permission. If both these
conditions are not met, you do not have permission and will not be
granted permission. </p>

      <div><a href="http://www.davidpashley.com/blog/meta/copyright" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Name-Based HTTPS</title>
  <link>http://www.davidpashley.com/blog/computing/name-based-https.html</link>
  <pubDate>Sat, 10 Oct 2009 07:17 GMT</pubDate>
  <dc:date>2009-10-10T07:17:21Z</dc:date>
  <description><![CDATA[

<p><small>This entry was originally posted in slightly different form to <a href="http://serverfault.com/questions/73162/can-i-run-two-different-secure-sites-using-the-port-443-on-the-same-server/73206#73206">Server Fault</a></small></p>
<p>There are two methods of using virtual hosting with HTTP: Name based and IP based.
IP based is the simplest as each virtual host is served from a different
IP address configured on the server, but this requires an IP address for
every host, and we're meant to be running out. The better solution is to
use the Host: header introduced in HTTP 1.1, which allows the server to
serve the right host to the client from a single IP address.</p>

<p>HTTPS throws a spanner in the works, as the server does not know
which certificate to present to the client during the SSL connection set
up, because the client can't send the Host: header until the connection
is set up. As a result, if you want to host more than one HTTPS site,
you need to use IP-based virtual hosting.
</p>

<p>
However, you can run multiple SSL sites from a single IP address using a couple of
methods, each with their own drawbacks.
</p>
<p>
The first method is to have an SSL certificate that covers both sites. The idea
here is to have a single SSL certificate that covers all the domains you want
to host from a single IP address. You can either do this using a wildcard
certificate that covers both domains or use Subject Alternative Name.
</p>
<p>
Wildcard certificates would be something like *.example.com, which would cover
www.example.com, mail.example.com and support.example.com. There are a number
of problems with wildcard certificates. Firstly, every hostname needs to have a
common domain, e.g. with *.example.com you can have www.example.com, but not
www.example.org. Secondly, you can't reliably have more than one subdomain,
i.e. you can have www.example.com, but not www.eu.example.com. This might work
in earlier versions of Firefox (&lt;= 3.0), but it doesn't work in 3.5 or any
version of Internet Explorer. Thirdly, wildcard certificates are significantly
more expensive than normal certificates if you want them signed by a root CA.
</p>
<p>
Subject Alternative Name is a method of using an extension to X509 certificates
that lists alternative hostnames that are valid for that certificate. It
involves adding a "subjectAltName" field to the certificate that lists each
additional host you want covered by the certificate. This should work in most
browsers; certainly every modern mainstream browser. The downside of this
method is that you have to list every domain on the server that will use SSL.
You may not want this information publicly available. You probably don't want
unrelated domains to be listed on the same certificate. It may also be
difficult to add additional domains at a later date to your certificate.

</p>
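<p>
As an illustration, the extension can be requested with a fragment like
the following in the OpenSSL configuration used to generate the
certificate request. This is a sketch to merge into an existing
openssl.cnf, not a complete file, and the hostnames are placeholders.
</p>
<pre>
[ req ]
req_extensions = v3_req

[ v3_req ]
# One DNS: entry per hostname the certificate should be valid for
subjectAltName = DNS:www.example.com, DNS:www.example.org
</pre>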
<p>
The second approach is to use something called SNI (Server Name Indication)
which is an extension in TLS that solves the chicken and egg problem of not
knowing which certificate to send to the client because the client hasn't sent
the Host: header yet. As part of the TLS negotiation, the client sends the
required hostname as one of the options. The only downside to this is client
and server support. The support in browsers tends to be better than in servers.
Firefox has supported it since 2.0. Internet Explorer supports it from 7
onwards, but only on Vista or later. Chrome only supports it on Vista or later
too. Opera 8 and Safari 8.2.1 have support. Other browsers may not support it.

</p>
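<p>
One way to check how a server responds to SNI is openssl's
<tt>s_client</tt>, which sends the hostname given with
<tt>-servername</tt> during the TLS handshake. A sketch, where
www.example.com stands in for one of your own virtual hosts:
</p>
<pre>
openssl s_client -connect www.example.com:443 -servername www.example.com
</pre>
<p>
If the certificate shown in the output changes as you change the
<tt>-servername</tt> value, the server is picking certificates by name.
</p>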
<p>
The biggest problem preventing adoption is the server support. Until very
recently neither of the two main webservers supported it. Apache gained SNI
support as of 2.2.12, which was released July 2009. As of writing, IIS does not
support SNI in any version. nginx, lighttpd and Cherokee all support SNI.

</p>
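<p>
On an Apache new enough to have SNI, name-based HTTPS hosts look much
like their HTTP counterparts. A minimal sketch, in which the hostnames
and certificate paths are placeholders rather than anything from a real
server:
</p>
<pre>
Listen 443
NameVirtualHost *:443

&lt;VirtualHost *:443&gt;
   ServerName www.example.com
   SSLEngine on
   SSLCertificateFile    /etc/ssl/certs/www.example.com.crt
   SSLCertificateKeyFile /etc/ssl/private/www.example.com.key
&lt;/VirtualHost&gt;

&lt;VirtualHost *:443&gt;
   ServerName www.example.org
   SSLEngine on
   SSLCertificateFile    /etc/ssl/certs/www.example.org.crt
   SSLCertificateKeyFile /etc/ssl/private/www.example.org.key
&lt;/VirtualHost&gt;
</pre>
<p>
Clients without SNI support will be given the certificate from the first
virtual host, so they will see a hostname mismatch warning on the second
site.
</p>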
<p>
Going forward, SNI is the best method for solving the name-based virtual
hosting of HTTPS, but support might be patchy for a year or two yet. If you
must do HTTPS virtual hosting without problems in the near future, IP based
virtual hosting is the only option.
</p>

      <div><a href="http://www.davidpashley.com/blog/computing/name-based-https" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Setting up gitosis on Jaunty</title>
  <link>http://www.davidpashley.com/blog/linux/gitosis.html</link>
  <pubDate>Tue, 01 Sep 2009 16:38 GMT</pubDate>
  <dc:date>2009-09-01T16:38:18Z</dc:date>
  <description><![CDATA[
<p>
While git is a completely distributed revision control system, sometimes
the lack of a central canonical repository can be annoying. For example,
you might want to publish your repository publicly, so other
people can fork your code, or you might want all your developers to push
into (or have code pulled into) a central "golden" tree, that you then
use for automated building and continuous integration. This entry should
explain how to get this all working on Ubuntu 9.04 (Jaunty).</p>

<p>
Gitosis is a very useful git repository manager, which adds support like
ACLs in pre-commits and gitweb and git-daemon management.  While it's
possible to set all these things up by hand, gitosis does everything for
you. It is nicely configured via git; to make configuration changes,
you push the config file changes into the gitosis repository on the server. 
</p>

<p>Gitosis is available in Jaunty, but unfortunately there is a <a
href="https://bugs.launchpad.net/ubuntu/+source/gitosis/+bug/368895">bug</a>
in the version in Jaunty, which means it doesn't work out of the box.
Fortunately there is a fixed version in <tt>jaunty-proposed</tt> that
fixes the main problem. This does mean that you need to add the
following to your <tt>sources.list</tt>:</p>

<pre>deb http://gb.archive.ubuntu.com/ubuntu/ jaunty-proposed universe</pre>

<p>Run <tt>apt-get update &amp;&amp; apt-get install gitosis</tt>. You should
install <tt>0.2+20080825-2ubuntu0.1</tt> or later. There is another
small bug in the current version too, as a result of git removing the
<tt>git-$command</tt> scripts out of <tt>/usr/bin</tt>. Edit
<tt>/usr/share/python-support/gitosis/gitosis/templates/admin/hooks/post-update</tt>
and replace</p>

<pre>git-update-server-info</pre>

<p>with</p>

<pre>git update-server-info</pre>

<p>With these changes in place, we can now set up our gitosis
repository. On the server you are going to use to host your central
repositories, run:</p>

<pre>sudo -H -u gitosis gitosis-init &lt; id_rsa.pub</pre>

<p>The <tt>id_rsa.pub</tt> file is a public ssh key. As I mentioned,
gitosis is managed over git, so you need an initial user to clone and
then push changes back into the gitosis repo, so make sure this key
belongs to a keypair that is available to the user you're going to
configure gitosis from.</p>

<p>Now, on your local computer, you can clone the gitosis-admin repo
using:</p>

<pre>git clone gitosis@gitserver.example.com:gitosis-admin.git</pre>

<p>If you look inside the <tt>gitosis-admin</tt> directory, you should
find a file called <tt>gitosis.conf</tt> and a directory called
<tt>keydir</tt>. The directory is where you can add ssh public keys for
your users. The file is the configuration file for gitosis.</p>

<pre>
[gitosis]
loglevel = INFO

[group gitosis-admin]
writable = gitosis-admin
members = david@david

[group developers]
members = david@david
writable = publicproject privateproject

[group contributors]
members = george@wilber
writable = publicproject

[repo publicproject]
daemon = yes
gitweb = yes

[repo privateproject]
daemon = no
gitweb = no
</pre>
<p>This sets up two repositories, called publicproject and
privateproject. It enables the public project to be available via the
git protocol and in gitweb if you have that installed. We also create
two groups, developers and contributors. David has access to both
projects, but George only has access to change the publicproject. David
can also modify the gitosis configuration. The users are the names of
ssh keys (the last part of the line in id_dsa.pub or id_rsa.pub).</p>

<p>Once you've changed this file, you can run <tt>git add
gitosis.conf</tt> to add it to the commit, <tt>git commit -m "update
gitosis configuration"</tt> to commit it to your local repository, and
finally <tt>git push</tt> to push your commits back up into the central
repository. Gitosis should now update the configuration on the server to
match the config file.</p>
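
<p>Adding a new user works the same way. As a sketch, assuming George
has sent you his public key as <tt>george.pub</tt> (a made-up filename),
you copy it into <tt>keydir</tt> under the name used in the
<tt>members</tt> line, with a <tt>.pub</tt> extension, and push:</p>

<pre>cp george.pub keydir/george@wilber.pub
git add keydir/george@wilber.pub
git commit -m "add key for george@wilber"
git push</pre>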

<p>One last thing to do is to enable git-daemon, so people can
anonymously clone your projects. Create /etc/event.d/git-daemon with the
following contents:</p>
<pre>
start on startup
stop on shutdown

exec /usr/bin/git daemon \
   --user=gitosis --group=gitosis \
   --user-path=public-git \
   --verbose \
   --syslog \
   --reuseaddr \
   --base-path=/srv/gitosis/repositories/ 
respawn
</pre>
<p>You can now start this using <tt>start git-daemon</tt></p>

<p>So now, you need to start using your repository. You can either start
with an existing project or an empty directory. Start by running
<tt>git init</tt> and then <tt>git add $file</tt> to add each of the
files you want in your project, and finally <tt>git commit</tt> to
commit them to your local repository. The final task is to add a remote
repository and push your code into it.</p>
<pre>git remote add origin gitosis@gitserver.example.com:privateproject.git
git push origin master:refs/heads/master</pre>

<p>In future, you should be able to do <tt>git push</tt> to push your
changes back into the central repository. You can also clone a project
using git or ssh, providing you have access, using the following
commands. The first is for read-write access over ssh and the second
uses the git protocol for read-only access. The git protocol uses TCP
port 9418, so make sure that's available externally, if you want the
world to be able to clone your repos.</p>
<pre>git clone gitosis@gitserver.example.com:publicproject.git
git clone git://gitserver.example.com/publicproject.git</pre>
<p>Setting up GitWeb is left as an exercise for the reader (and myself
because I am yet to attempt to set that up).</p>

      <div><a href="http://www.davidpashley.com/blog/linux/gitosis" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Puppetmaster with nginx and Mongrel on Ubuntu</title>
  <link>http://www.davidpashley.com/blog/systems-administration/puppet/nginx-mongrel.html</link>
  <pubDate>Tue, 18 Aug 2009 21:36 GMT</pubDate>
  <dc:date>2009-08-18T21:36:36Z</dc:date>
  <description><![CDATA[
<p>If you've not heard of <a
href="http://reductivelabs.com/trac/puppet/">Puppet</a>, it is a
configuration management tool. You write descriptions of how you want
your systems to look and it checks the current setup and works out what
it needs to do to move your system so it matches your description. 
The idea is to write how it should look, not how to change the system.</p>

<p>Puppet uses a client (puppetd) that talks to the central server
(puppetmaster) over HTTPS. The default puppetmaster HTTP server is
<a href="http://www.webrick.org/">WEBrick</a>, which is a
lightweight Ruby HTTP server. While it's simple and allows Puppetmaster
to work straight out of the box, due to its pure Ruby structure and Ruby's
green thread architecture, it doesn't scale beyond a simple puppet
setup. After a while, every medium to large Puppet installation needs to
move to the other HTTP server that puppet supports: <a href="http://mongrel.rubyforge.org/">Mongrel</a>. Mongrel is
a faster HTTP library, but supports far fewer features. In particular
it doesn't support SSL, which is important with Puppet, as Puppet relies
heavily on client certificate verification for authentication. As a
result, we need to put another webserver in front that can handle the
SSL aspect. A nice side effect of having to proxy to Puppetmaster is
that we can run several puppetmaster processes and improve on the green
threads problem that Ruby has. In this blog post, I'm going to describe
setting up <a href="http://wiki.nginx.org/Main">nginx</a> and mongrel.</p>
<p>The first thing to do is to install the <tt>mongrel</tt> and
<tt>nginx</tt> packages. </p>

<pre>apt-get install mongrel nginx</pre>
<p>We need to run nginx on port 8140 and proxy to
our mongrel servers on different ports, so let's move puppetmaster off
8140 and configure it to use mongrel while we're at it. Edit
<tt>/etc/default/puppetmaster</tt> and set the following variables:</p>
<pre>SERVERTYPE=mongrel
PUPPETMASTERS=4
PORT=18140
DAEMON_OPTS="--ssl_client_header=HTTP_X_SSL_SUBJECT"</pre>
<p>This tells the init.d script to use the mongrel server type and to
run four of them. The init.d script is clever enough to start up the
right number of processes and will set them up to use a sequence of
ports for each one, starting at 18140 for the first process, up to 18143
for the last one. The <tt>DAEMON_OPTS</tt> option tells Puppetmaster how
we're going to pass the SSL certificate information from nginx so it can
grant or refuse permission.</p>
<p>Now to set up nginx. Put the following in
<tt>/etc/nginx/conf.d/puppetmaster.conf</tt>:</p>
<pre>
ssl                     on;
ssl_certificate /var/lib/puppet/ssl/certs/puppetmaster.example.com.pem;
ssl_certificate_key /var/lib/puppet/ssl/private_keys/puppetmaster.example.com.pem;
ssl_client_certificate  /var/lib/puppet/ssl/certs/ca.pem;
ssl_ciphers             SSLv2:-LOW:-EXPORT:RC4+RSA;
ssl_session_cache       shared:SSL:8m;
ssl_session_timeout     5m;

upstream puppet-production {
   server 127.0.0.1:18140;
   server 127.0.0.1:18141;
   server 127.0.0.1:18142;
   server 127.0.0.1:18143;
}</pre>
<p>In this file we tell nginx where to find the server certificates for
your puppetmaster, so your clients can authenticate your server. We also
tell nginx the CA certificate to authenticate clients with and set up
some SSL details required for Puppet. Finally we create a group of
remote servers for our pack of mongrel puppetmasters, so we can refer to
them later. If you added more or fewer servers earlier, don't forget to
add or remove them here. You also need to replace
puppetmaster.example.com with your FQDN. If at a later stage, you find
you need even more performance, you can easily move some of your
puppetmaster processes to a separate box and update the upstream list to
refer to servers on the remote server.</p>
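<p>If you change <tt>PUPPETMASTERS</tt> often, the upstream block can be
generated rather than edited by hand. The following is only a sketch: it
assumes the init.d script hands out consecutive ports starting at
<tt>PORT</tt>, as described above, and the variable names simply mirror
<tt>/etc/default/puppetmaster</tt>. It prints the block to stdout rather
than writing any file.</p>

```shell
# Values mirror /etc/default/puppetmaster above; adjust them to match yours.
PUPPETMASTERS=4
PORT=18140

# Build one "server" line per puppetmaster process, on consecutive ports.
upstream="upstream puppet-production {"
i=0
while [ "$i" -lt "$PUPPETMASTERS" ]; do
    upstream="$upstream
   server 127.0.0.1:$((PORT + i));"
    i=$((i + 1))
done
upstream="$upstream
}"

echo "$upstream"
```

<p>With the values above, this prints four <tt>server</tt> lines covering
ports 18140 to 18143, matching the hand-written block.</p>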

<p>Finally, we need to set up a couple of HTTP servers. Create
<tt>/etc/nginx/sites-enabled/puppetmaster</tt> with the following
contents:</p>
<pre>server {
    listen                  8140;
    ssl_verify_client       on;
    root                    /var/empty;
    access_log              /var/log/nginx/access-8140.log;

    # Variables
    # $ssl_cipher returns the cipher used for the established SSL connection
    # $ssl_client_serial returns the serial number of the client certificate
    # $ssl_client_s_dn returns the subject DN of the client certificate
    # $ssl_client_i_dn returns the issuer DN of the client certificate
    # $ssl_protocol returns the protocol of the established SSL connection

    location / {
        proxy_pass          http://puppet-production;
        proxy_redirect      off;
        proxy_set_header    Host             $host;
        proxy_set_header    X-Real-IP        $remote_addr;
        proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_set_header    X-Client-Verify  SUCCESS;
        proxy_set_header    X-SSL-Subject    $ssl_client_s_dn;
        proxy_set_header    X-SSL-Issuer     $ssl_client_i_dn;
        proxy_read_timeout  65;
    }
}

server {
    listen                  8141;
    ssl_verify_client       off;
    root                    /var/empty;
    access_log              /var/log/nginx/access-8141.log;

    location / {
        proxy_pass  http://puppet-production;
        proxy_redirect     off;
        proxy_set_header   Host             $host;
        proxy_set_header   X-Real-IP        $remote_addr;
        proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_set_header   X-Client-Verify  FAILURE;
        proxy_set_header   X-SSL-Subject    $ssl_client_s_dn;
        proxy_set_header   X-SSL-Issuer     $ssl_client_i_dn;
        proxy_read_timeout  65;
    }
}
</pre>
<p>This creates two servers on ports 8140 and 8141 which both proxy all
requests to our group of mongrel servers, adding suitable headers to
pass on the SSL information. The only difference between them is the
X-Client-Verify header. This shows the one problem with using nginx with
puppet. Because the client verification success or failure is not
available as a variable before nginx 0.8.7, we can't have a single port
for both the usual client connection and the initial unauthenticated
connection where the client requests a certificate to be signed. As a
result, with this setup, you are required to run puppet with <tt>--ca-port
8141</tt> the first time you run puppet until the certificate has been
signed with puppetca.</p>

<p>Fortunately, with nginx 0.8.7 or later, you can use the
simpler setup shown below. This replaces both files shown above with a single
server, although you still need to keep the <tt>upstream puppet-production</tt>
block somewhere for <tt>proxy_pass</tt> to refer to. Unfortunately, 0.8.7 is not available in any
version of Ubuntu yet, not even Karmic.</p>
<pre>
server {
  listen 8140;

  ssl                     on;
  ssl_session_timeout     5m;
  ssl_certificate         /var/lib/puppet/ssl/certs/puppetmaster.pem;
  ssl_certificate_key     /var/lib/puppet/ssl/private_keys/puppetmaster.pem;
  ssl_client_certificate  /var/lib/puppet/ssl/ca/ca_crt.pem;

  # choose any ciphers
  ssl_ciphers             SSLv2:-LOW:-EXPORT:RC4+RSA;

  # allow both authenticated clients and clients without certs
  ssl_verify_client       optional;

  # honour the Puppet CRL
  ssl_crl /var/lib/puppet/ssl/ca/ca_crl.pem;

  root                    /var/tmp;

  location / {
    proxy_pass              http://puppet-production;
    proxy_redirect         off;
    proxy_set_header    Host             $host;
    proxy_set_header    X-Real-IP        $remote_addr;
    proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
    proxy_set_header    X-Client-Verify  $ssl_client_verify;
    proxy_set_header    X-SSL-Subject    $ssl_client_s_dn;
    proxy_set_header    X-SSL-Issuer     $ssl_client_i_dn;
    proxy_read_timeout  65;
  }
}
</pre>
<p>If you are running another webserver on the server, you may want to
delete <tt>/etc/nginx/sites-enabled/default</tt> which attempts to
create a server listening on port 80, which will conflict with your
existing HTTP server.</p>
<p>If you follow these instructions, you should find yourself with a
better performing puppetmaster and significantly fewer "connection reset
by peer" and other related error messages.</p>

      <div><a href="http://www.davidpashley.com/blog/systems-administration/puppet/nginx-mongrel" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Reviewing svn up changes</title>
  <link>http://www.davidpashley.com/blog/linux/svn-up.html</link>
  <pubDate>Thu, 06 Aug 2009 07:48 GMT</pubDate>
  <dc:date>2009-08-06T07:48:00Z</dc:date>
  <description><![CDATA[
<p>Sometimes you need to review what exactly doing an <tt>svn up</tt>
will do. Fortunately, you can do a couple of things to find out. The
first is to use <tt>svn status -u</tt> to find out what files have
changed:</p>
<pre>
/etc/puppet/modules# svn stat -u
       *     1338   exim4_mailserver/files/exim_db.pl
       *     1338   dbplc/files/sort-dbfs.pl
       *     1338   dbplc/files
       *     1338   dbplc/manifests/portal.pp
M            1386   tomcat/files/server.xml
Status against revision:   1386
</pre>
<p>Here we can see that four items (three files and a directory) have
changed in the repository since their current revision of 1338. We can
also see that tomcat/files/server.xml is up to
date against the repository, but has local modifications.</p>
<p>This is all well and good, but how do we know what the changes are?
Well, <tt>svn diff</tt> is our friend here. By comparing the checkout
against the repository, we can see what will be updated. </p>
<pre>
/etc/puppet/modules# svn diff dbplc/manifests/portal.pp -rBASE:HEAD
Index: dbplc/manifests/portal.pp
===================================================================
--- dbplc/manifests/portal.pp (working copy)
+++ dbplc/manifests/portal.pp (revision 1386)
@@ -22,6 +22,7 @@
       owner => "tomcat55",
       group => "adm",
       mode => 644,
+      require => Package["tomcat5.5"],
    }
 
    apache::config { "portal":
</pre>
<p>When you're happy, you can run <tt>svn up</tt> as normal. I've just
used this process to sanity check our <a
href="http://reductivelabs.com/products/puppet/">Puppet</a> config before updating, as
it hasn't been updated for a few days.</p>

      <div><a href="http://www.davidpashley.com/blog/linux/svn-up" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Efficient Finds</title>
  <link>http://www.davidpashley.com/blog/linux/efficient-find.html</link>
  <pubDate>Fri, 24 Jul 2009 22:37 GMT</pubDate>
  <dc:date>2009-07-24T22:37:54Z</dc:date>
  <description><![CDATA[
<p>Recently, I've seen a lot of suggestions along the lines of:</p>
<pre>find . -name foo -exec ls -l {} \;</pre>
<p>This is incredibly inefficient, because for each and every matching
file, <tt>ls</tt> will be executed. Fortunately, we can do better by
using <tt>xargs</tt>. <tt>xargs</tt> takes a list of lines from stdin
and uses them to build up a command line. It takes special care to not
exceed the maximum command line length by splitting the input up into
multiple commands if it is needed. So, with that knowledge, we can
replace our original command with:</p>
<pre>find . -name foo | xargs ls -l</pre>
<p>There is one slight problem with this command; it isn't space-safe.
<tt>xargs</tt> splits arguments on whitespace, so "file name" will be
incorrectly passed to the command as "file" "name". Fortunately,
<tt>xargs</tt> has an option to delimit parameters by the null
character, and as our luck would have it, <tt>find</tt> has a suitable
option to produce output in this format. This means our command is
now:</p>
<pre>find . -name foo -print0 | xargs -0 ls -l</pre>
<p><tt>mlocate</tt> can do something similar:</p>
<pre>locate foo -0 | xargs -0 ls -l</pre>
<p>GNU find has one more trick up its sleeve. It has a modified version
of <tt>-exec</tt> that will do the same thing as <tt>xargs</tt>, so we could have
written our original command as:</p>
<pre>find . -name foo -exec ls -l {} +</pre>
<p>Every process is sacred; every process is great. If a process is wasted, God gets quite
irate. Please make sure you try to use one of the latter forms and not the
first form, and make a happy deity. :)</p>
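<p>If you want to see the difference for yourself, here is a small
throwaway sketch. The scratch directory and the file names are made up
for the demonstration, and <tt>echo</tt> stands in for <tt>ls -l</tt> so
that each line of output corresponds to one argument or one invocation.
It assumes your temporary directory path contains no whitespace.</p>

```shell
# Hypothetical demo: the scratch directory and file names are made up.
demo=$(mktemp -d)
mkdir "$demo/a dir"
touch "$demo/foo" "$demo/a dir/foo"

# Whitespace-splitting xargs breaks "a dir/foo" into two bogus arguments,
# so two files become three arguments.
unsafe=$(find "$demo" -name foo | xargs -n 1 echo | wc -l | tr -d ' ')

# NUL-delimited output keeps every path intact: two files, two arguments.
safe=$(find "$demo" -name foo -print0 | xargs -0 -n 1 echo | wc -l | tr -d ' ')

# The "+" form of -exec passes both matches to a single echo invocation.
batched=$(find "$demo" -name foo -exec echo {} + | wc -l | tr -d ' ')

echo "unsafe=$unsafe safe=$safe batched=$batched"
rm -rf "$demo"
```

<p>On a system with GNU or BSD find and xargs this should print
<tt>unsafe=3 safe=2 batched=1</tt>: the unsafe form sees one argument too
many, and the <tt>+</tt> form handles both files in one process.</p>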

      <div><a href="http://www.davidpashley.com/blog/linux/efficient-find" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Copying files with netcat</title>
  <link>http://www.davidpashley.com/blog/linux/copying-files-with-netcat.html</link>
  <pubDate>Mon, 15 Jun 2009 12:51 GMT</pubDate>
  <dc:date>2009-06-15T12:51:51Z</dc:date>
  <description><![CDATA[
<p>When you want to copy files from one machine to another, you might
think about using <tt>scp</tt> to copy them. You might think about using
<tt>rsync</tt>. If, however, you're
trying to copy a large amount of data between two machines, a
better, quicker way to do it is to use netcat.</p>

<p>On the receiving machine, run:</p>

<pre># cd /dest/dir &amp;&amp; nc -l -p 12345 | tar -xf -</pre>

<p>On the sending machine you can now run:</p>

<pre># cd /src/dir &amp;&amp; tar -cf - . | nc -q 0 remote-server 12345</pre>

<p>You should find that everything works nicely, and a lot quicker. If
bandwidth is more constrained than CPU, then you can add "z" or "j" to
the tar options ("<tt>tar -xzf -</tt>" etc) to compress the data before it sends
it over the network. If you're on gigabit, I wouldn't bother with the
compression. If it dies, you'll have to start from the beginning, but
then you might find you can get away with using rsync if you've copied
enough.  It's also
worth pointing out that the receiving netcat will die as soon as the
connection closes, so you'll need to restart it if you want to copy the
data again using this method.
</p>
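<p>To check the tar options before pointing them at a real network, you
can try the same pipeline locally. This is only a sketch: the plain pipe
stands in for the two <tt>nc</tt> processes, and the directories and file
are hypothetical scratch data. The sending side creates the archive with
<tt>-c</tt> and the receiving side extracts it with <tt>-x</tt>; both
sides need the same compression flag.</p>

```shell
# Hypothetical scratch directories; the pipe stands in for nc on both ends.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir "$src/sub"
echo "hello" > "$src/sub/file.txt"

# Left side (sender): create (-c) a gzip-compressed (-z) archive on stdout.
# Right side (receiver): extract (-x) the compressed stream from stdin.
tar -czf - -C "$src" . | tar -xzf - -C "$dest"

copied=$(cat "$dest/sub/file.txt")
echo "copied: $copied"
rm -rf "$src" "$dest"
```

<p>Swap <tt>z</tt> for <tt>j</tt> on both sides to use bzip2 instead, or
drop the flag entirely on a fast link.</p>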

<p>It's worth pointing out that this does not have the security that scp or
rsync-over-ssh has, so make sure you trust the end points and everything
in between if you don't want anyone else to see the data.</p>

<p>Why not use scp? Because it's incredibly slow in comparison. God knows
what scp is doing, but it doesn't copy data at wire speed. It ain't the
encryption and decryption, because that'd just use CPU and when I've done
it, it hasn't been CPU bound. I can only assume that the scp process
has a lot of handshaking and ssh protocol overhead.</p>

<p>Why not rsync? Rsync doesn't really buy you that much on the first copy.
It's only the subsequent runs where rsync really shines. However, rsync
requires the source to send a complete file list to the destination
before it starts copying any data. If you've got a filesystem with a
large number of files, that's an awfully large overhead, especially as
the destination host has to hold it in memory.</p>


      <div><a href="http://www.davidpashley.com/blog/linux/copying-files-with-netcat" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Table sizes in PostgreSQL</title>
  <link>http://www.davidpashley.com/blog/databases/postgresql/table_sizes.html</link>
  <pubDate>Wed, 10 Jun 2009 09:42 GMT</pubDate>
  <dc:date>2009-06-10T09:42:45Z</dc:date>
  <description><![CDATA[
<p>Ever wanted to find out how much disk space each table was taking in a
database? Here's how:</p>
<pre>
database=# SELECT 
   tablename, 
   pg_size_pretty(pg_relation_size(tablename)) AS table_size, 
   pg_size_pretty(pg_total_relation_size(tablename)) AS total_table_size 
FROM 
   pg_tables 
WHERE 
   schemaname = 'public';
 tablename  | table_size | total_table_size 
------------+------------+------------------
 deferrals  | 205 MB     | 486 MB
 errors     | 58 MB      | 137 MB
 deliveries | 2646 MB    | 10096 MB
 queue      | 7464 kB    | 22 MB
 unknown    | 797 MB     | 2644 MB
 messages   | 1933 MB    | 6100 MB
 rejects    | 25 GB      | 75 GB
(7 rows)
</pre>
<p>Table size is the size of the current data.
Total table size includes indexes and data that is too large to fit in
the main table store (things like large BLOB fields). You can find more
information in the <a
href="http://www.postgresql.org/docs/8.3/static/functions-admin.html#FUNCTIONS-ADMIN-DBSIZE">PostgreSQL
manual</a>.</p>
<p><strong>Edit:</strong> changed to use <tt>pg_size_pretty()</tt>, which I thought existed, but couldn't find in the docs. <a href="http://www.sommitrealweird.co.uk/">Brett Parker</a> reminded me it did exist after all and I wasn't just imagining it.</p>

      <div><a href="http://www.davidpashley.com/blog/databases/postgresql/table_sizes" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Check maven dependencies</title>
  <link>http://www.davidpashley.com/blog/programming/java/maven-dependencies.html</link>
  <pubDate>Fri, 10 Apr 2009 09:14 GMT</pubDate>
  <dc:date>2009-04-10T09:14:33Z</dc:date>
  <description><![CDATA[
<p>One really nice feature of Maven is its dependency resolution. The
dependency plugin also has an <tt>analyze</tt> goal that can
detect a number of problems with your dependencies. It can detect
libraries you use but haven't declared in your POM, which only work
because they are pulled in as transitive dependencies. This can cause
build problems when you remove
the library that was dragging in the undeclared dependency. It can also
work out which dependencies you have declared but are no longer
using.</p>

<pre>
mojo-jojo david% mvn dependency:analyze
[INFO] Scanning for projects...
...
[INFO] [dependency:analyze]
[WARNING] Used undeclared dependencies found:
[WARNING]    commons-collections:commons-collections:jar:3.2:compile
[WARNING]    commons-validator:commons-validator:jar:1.3.1:compile
[WARNING]    org.apache.myfaces.core:myfaces-api:jar:1.2.6:compile
[WARNING] Unused declared dependencies found:
[WARNING]    javax.faces:jsf-api:jar:1.2_02:compile
...
</pre>

      <div><a href="http://www.davidpashley.com/blog/programming/java/maven-dependencies" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>How not to configure your DNS</title>
  <link>http://www.davidpashley.com/blog/computing/bad-ptr.html</link>
  <pubDate>Fri, 10 Apr 2009 09:04 GMT</pubDate>
  <dc:date>2009-04-10T09:04:17Z</dc:date>
  <description><![CDATA[
<p>How not to configure your DNS</p>
<pre>david% dig -x 190.208.19.230

; &lt;&lt;>> DiG 9.4.2-P2 &lt;&lt;>> -x 190.208.19.230
;; global options:  printcmd
;; Got answer:
;; ->>HEADER&lt;&lt;- opcode: QUERY, status: NOERROR, id: 35398
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;230.19.208.190.in-addr.arpa.   IN      PTR

;; ANSWER SECTION:
230.19.208.190.in-addr.arpa. 3600 IN    PTR     190.208.19.230.

;; Query time: 253 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Fri Apr 10 10:00:21 2009
;; MSG SIZE  rcvd: 73
</pre>
<p>Whoops</p>

      <div><a href="http://www.davidpashley.com/blog/computing/bad-ptr" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Fast Servlet development with Maven and Jetty</title>
  <link>http://www.davidpashley.com/blog/programming/java/maven-jetty.html</link>
  <pubDate>Sat, 14 Mar 2009 09:35 GMT</pubDate>
  <dc:date>2009-03-14T09:35:27Z</dc:date>
  <description><![CDATA[
<p>One of the biggest problems with developing servlets under a
container like <a href="http://tomcat.apache.org/">Tomcat</a> is the amount of time taken to build your code,
deploy it to the container and restart it to pick up any changes. <a href="http://maven.apache.org/">Maven</a>
and the <a href="http://www.mortbay.org/jetty/">Jetty</a> <a href="http://docs.codehaus.org/display/JETTY/Maven+Jetty+Plugin">plugin</a> allow you to cut down on this cycle considerably.
The first step is to allow you to start your application in maven by
running: </p>
<pre>mvn jetty:run</pre>
<p>We do this by configuring the jetty plugin inside our
<tt>pom.xml</tt>:</p>
<pre>
&lt;plugin>
   &lt;groupId>org.mortbay.jetty&lt;/groupId>
   &lt;artifactId>maven-jetty-plugin&lt;/artifactId>
   &lt;version>6.1.10&lt;/version>
&lt;/plugin>
</pre>
<p>Now when you run <tt>mvn jetty:run</tt> your application will start
up. But we can improve on this. The Jetty plugin can be configured to
scan your project every so often and rebuild it and reload it if
anything changes. We do this by changing our pom.xml to read:</p>
<pre>
&lt;plugin>
   &lt;groupId>org.mortbay.jetty&lt;/groupId>
   &lt;artifactId>maven-jetty-plugin&lt;/artifactId>
   &lt;version>6.1.10&lt;/version>
   <strong>&lt;configuration>
      &lt;scanIntervalSeconds>10&lt;/scanIntervalSeconds>
   &lt;/configuration></strong>
&lt;/plugin>
</pre>
<p>Now when you save a file in your IDE, by the time you've switched to
your web browser, Jetty is already running your updated code. Your
development cycle is almost up to the same speed as Perl or PHP.</p>

<p>You can find more information at the <a href="http://docs.codehaus.org/display/JETTY/Maven+Jetty+Plugin">plugin page.</a></p>

      <div><a href="http://www.davidpashley.com/blog/programming/java/maven-jetty" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>MySQL silently truncating your data: Update</title>
  <link>http://www.davidpashley.com/blog/databases/mysql/silently-truncated-warnings.html</link>
  <pubDate>Sun, 15 Feb 2009 14:17 GMT</pubDate>
  <dc:date>2009-02-15T14:17:39Z</dc:date>
  <description><![CDATA[
<p>After my entry yesterday about MySQL truncating data, several people
have pointed out that MySQL 4.1 or later gives you a warning. Yes, this is true. You
can even see it in the example I gave:</p>
<pre>Query OK, 1 row affected, 1 warning (0.00 sec)</pre>
<p>I omitted mentioning this, but perhaps should have said something
about it. The reason I didn't mention it was because I didn't feel that a
warning really helped anyone. Developers have enough problems
remembering to check for errors, let alone remembering to check in case
there was a warning as well. Plus, they'd then have to work out if the
warning was something serious or something they could ignore. There's
also the question of how well the language bindings present this
information. Take for example, PHP. The mysqli extension gained support
for checking for warnings in PHP5 and gives the <a
href="http://www.php.net/manual/en/mysqli.warning-count.php">following
code</a> as an
example of getting warnings:</p>

<pre>mysqli_query($link, $query);

if (mysqli_warning_count($link)) {
   if ($result = mysqli_query($link, "SHOW WARNINGS")) {
      $row = mysqli_fetch_row($result);
      printf("%s (%d): %s\n", $row[0], $row[1], $row[2]);
      mysqli_free_result($result);
   }
}</pre>

<p>Hardly concise code. As of 5.1.0, there is also <a
href="http://www.php.net/manual/en/mysqli.get-warnings.php">mysqli_get_warnings()</a>,
but it is undocumented beyond noting its existence. The MySQL extension
does not support getting warning information. The PDO wrapper doesn't
provide any way to get this information. </p>

<p>In Perl, <tt>DBD::mysql</tt> has a <tt>mysql_warning_count()</tt>
function, but presumably you would have to call <tt>"SHOW WARNINGS"</tt>
like in the PHP example. It seems Python's MySQLdb module will raise an
exception on warnings in certain cases, mostly when using the Cursor
object.</p>
<p>In Java, you can set the <tt>jdbcCompliantTruncation</tt> connection
parameter to make the driver throw <tt>java.sql.DataTruncation</tt>
exceptions, as per the JDBC spec, which makes you wonder why this isn't
set by default. Unfortunately this setting is usually outside the
programmer's control. There is also the
<tt>java.sql.Statement.getWarnings()</tt>, but once again, you need to
check this after every statement. Not sure if ORM tools like Hibernate
check this or not.</p>

<p>So, yes, MySQL does give you a warning, but in practice it is useless.</p>

      <div><a href="http://www.davidpashley.com/blog/databases/mysql/silently-truncated-warnings" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>MySQL silently truncating your data</title>
  <link>http://www.davidpashley.com/blog/databases/mysql/silently-truncated.html</link>
  <pubDate>Sun, 15 Feb 2009 01:50 GMT</pubDate>
  <dc:date>2009-02-15T01:50:52Z</dc:date>
  <description><![CDATA[
<p>MySQL in its standard configuration has this wonderful "feature" of
truncating your data if it can't fit in the field.</p>
<pre>
mysql> create table foo (bar varchar(4));
Query OK, 0 rows affected (0.00 sec)

mysql> insert into foo (bar) values ("12345");
Query OK, 1 row affected, 1 warning (0.00 sec)
</pre>

<p>In comparison, PostgreSQL does:</p>
<pre>
psql=> create table foo (bar varchar(4));
CREATE TABLE
psql=> insert into foo (bar) values ('12345');
ERROR:  value too long for type character varying(4)
</pre>
<p>You can make MySQL do the right thing by setting the <a
href="http://dev.mysql.com/doc/refman/5.0/en/server-sql-mode.html">SQL
Mode</a> option to
include <tt>STRICT_TRANS_TABLES</tt> or <tt>STRICT_ALL_TABLES</tt>. The difference is that the
former will only enable it for transactional data storage engines. As much as
I'm loath to say it, I don't recommend using <tt>STRICT_ALL_TABLES</tt>, as an error
while updating a non-transactional table will result in a partial
update, which is probably worse than a truncated field. Setting the mode
to <tt>TRADITIONAL</tt> includes both these and a couple of related options
(<tt>NO_ZERO_IN_DATE</tt>, <tt>NO_ZERO_DATE</tt>,
<tt>ERROR_FOR_DIVISION_BY_ZERO</tt>). You can set the
mode using:</p>
<ul>
<li>
   <p>
   On the command line:
   </p>
   <pre>--sql-mode="TRADITIONAL"</pre>
   </li>
<li>
   <p>
   In <tt>/etc/mysql/my.cnf</tt>:
   </p>
   <pre>sql-mode="TRADITIONAL"</pre>
   </li>
<li>
   <p>
   At runtime:
   </p>
   <pre>
SET GLOBAL sql_mode="TRADITIONAL";
SET SESSION sql_mode="TRADITIONAL";</pre>
   </li>
</ul>
<p>Just say no to databases that happily throw away your data.</p>

      <div><a href="http://www.davidpashley.com/blog/databases/mysql/silently-truncated" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Subversion and &quot;(502 Bad Gateway) in response to COPY request&quot; errors</title>
  <link>http://www.davidpashley.com/blog/computing/svn-bad-gateway.html</link>
  <pubDate>Sun, 25 Jan 2009 16:11 GMT</pubDate>
  <dc:date>2009-01-25T16:11:13Z</dc:date>
  <description><![CDATA[

<p>Was attempting to merge a branch in one of my projects and upon
committing the merge, I kept getting this error:</p>
<pre>
mojo-jojo david% svn commit -m "merge in the maven branch"
Sending        trunk
Sending        trunk/.classpath
Sending        trunk/.project
Adding         trunk/.settings
svn: Commit failed (details follow):
svn: Server sent unexpected return value (502 Bad Gateway) in response
to COPY request for '/svn/eddie/!svn/bc/314/branches/maven/.settings'
</pre>
<p>A quick search found several other people having the same problem.
Seems it only happens for https repositories using mod_dav_svn.
The solution is to make sure that your virtual host in apache has
explicit SSL config options, even if you are using an SSL config from a
default virtual host. For example, I added the following to my
subversion vhost, which was just copied from my default vhost:</p>
<pre>
SSLEngine on
SSLCertificateFile /etc/apache2/ssl/catnip.org.uk.crt
SSLCertificateKeyFile /etc/apache2/ssl/catnip.org.uk.key
</pre>



      <div><a href="http://www.davidpashley.com/blog/computing/svn-bad-gateway" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>DBI boilerplate code</title>
  <link>http://www.davidpashley.com/blog/programming/perl/DBI-boilerplate.html</link>
  <pubDate>Wed, 21 Jan 2009 16:42 GMT</pubDate>
  <dc:date>2009-01-21T16:42:47Z</dc:date>
  <description><![CDATA[
<p>I keep writing code to talk to databases in perl and I'm forever
forgetting the correct runes for talking to databases, so I thought I'd
stick it here for easy reference.</p>
<pre>
use DBI;

my $db_driver = "Pg"; # Pg or mysql (or others)
my $db_name = "database";
my $db_host = "localhost";
my $db_user = "username";
my $db_pass = "password";


my $dbh = DBI->connect("dbi:$db_driver:dbname=$db_name;host=$db_host", 
   $db_user, $db_pass) or die $DBI::errstr;
</pre>
<p>It's probably handy to give an example of a common database read
operation:</p>
<pre>
my $sth = $dbh->prepare( "SELECT * FROM table WHERE id  = ?") 
      or die $dbh->errstr;

$sth->execute($id) or die $dbh->errstr;

while (my $hashref = $sth->fetchrow_hashref) {
   print $hashref->{id};
}
</pre>

      <div><a href="http://www.davidpashley.com/blog/programming/perl/DBI-boilerplate" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Vim Syntax Highlighting for Puppet</title>
  <link>http://www.davidpashley.com/blog/systems-administration/puppet/vim-highlighting.html</link>
  <pubDate>Wed, 26 Nov 2008 11:09 GMT</pubDate>
  <dc:date>2008-11-26T11:09:34Z</dc:date>
  <description><![CDATA[
<p>I've just set up syntax highlighting for <a
href="http://reductivelabs.com/trac/puppet">Puppet</a> manifest files,
and thought I'd share the simple steps. The first thing to do is
download the syntax file from <a
href="http://www.reductivelabs.com/downloads/puppet/puppet.vim">http://www.reductivelabs.com/downloads/puppet/puppet.vim</a>
and save this to <tt>~/.vim/syntax/puppet.vim</tt>. Now when the
filetype is set to "puppet", vim will use this syntax file.</p>
<p>That's useful, but it would be even nicer if we could make vim know
that files ending in <tt>.pp</tt> were puppet files. Turns out this is
very easy to do. You need to create a file to detect the correct
filetype when you open a file. You need to put the following lines in
<tt>~/.vim/ftdetect/puppet.vim</tt>:</p>
<pre>au BufRead,BufNewFile *.pp   setfiletype puppet</pre>
<p>Now when you load a file ending in .pp, you should get nice syntax
highlighting. You can also make vim use special settings for the puppet
filetype by creating a vim script file in one of
<tt>~/.vim/ftplugin/puppet.vim</tt>, <tt>~/.vim/ftplugin/puppet_*.vim</tt> and/or
<tt>~/.vim/ftplugin/puppet/*.vim</tt>. Vim has a lot of flexible hooks
to enable file type specific configuration; hopefully it should be
fairly easy to modify these examples for other file formats.</p>

      <div><a href="http://www.davidpashley.com/blog/systems-administration/puppet/vim-highlighting" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Setting up Ubuntu PXE booting</title>
  <link>http://www.davidpashley.com/blog/debian/pxeboot.html</link>
  <pubDate>Fri, 24 Oct 2008 06:25 GMT</pubDate>
  <dc:date>2008-10-24T06:25:16Z</dc:date>
  <description><![CDATA[
<p>I've recently had to set up a new machine, but didn't have an install
cdrom available, so I decided to use the easiest method for installing
Ubuntu: PXE booting. Here's how I did it. PXE involves setting up two
simple technologies, DHCP and TFTP. We start by setting up TFTP.</p>
<p>TFTP is <a
href="http://en.wikipedia.org/wiki/Trivial_File_Transfer_Protocol">Trivial
File Transfer Protocol</a>, a cut down version of FTP. There are a
number of TFTP servers in Debian and Ubuntu, but not all of them support
the extensions that the pxelinux bootloader used by debian-installer
needs. Experience has shown that tftpd-hpa works correctly, so we'll want
to install that.</p>
<pre>ace root% <b>apt-get install tftpd-hpa</b></pre>
<p>Note: If this installs an inetd at the same time, you may need to
restart the inetd so it enables the tftpd service.</p>
<p>The tftpd will serve files out of <tt>/var/lib/tftpboot</tt>, so we
need to add some files for it to serve. You can use this script to fetch
various netboot installers from Ubuntu's servers.</p>
<pre>
#!/bin/bash

set -u
set -e

cd /var/lib/tftpboot

for dist in dapper feisty gutsy hardy intrepid; do
    mkdir -p $dist
    for arch in amd64 i386; do
        mkdir -p $dist/$arch/
        (cd $dist/$arch/ &amp;&amp; ncftpget -RT \
           ftp://archive.ubuntu.com/ubuntu/dists/$dist/main/installer-$arch/current/images/netboot/)
    done
done
</pre>
<p><small><a href="http://www.davidpashley.com/blogfiles/ubuntu-tftp-update.sh">Download ubuntu-tftp-update.sh</a></small></p>
<p>Now we need to alter our dhcpd configuration. (You are using DHCP,
aren't you?) All we need to add is a group declaration to your subnet
declaration, adding a <tt>next-server</tt> and a <tt>filename</tt>
parameter. You can then add a host declaration for any machine you want
to netboot into the installer.</p>
<pre>
group { # intrepid amd64
     next-server 10.0.0.1;
     filename "intrepid/amd64/pxelinux.0";
     host foobar { hardware ethernet 00:22:15:45:cc:fa; fixed-address foobar.example.com; }
}
</pre>
<p>You'll need to restart the dhcp server so it picks up the new
setting. The <tt>next-server</tt> parameter is the name or IP address of your
tftp server. <tt>filename</tt> is the path to the bootloader. Obviously,
you can use this to pick which version of the installer you want to
run. If you do a lot of installations, it might be worth configuring
every installer you're likely to use and then moving hosts in and out of
the appropriate group as and when you need to install them.</p>
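<p>If you end up maintaining one group per installer, the stanzas can be
generated rather than typed. The following is only a sketch:
<tt>render_group</tt> is a hypothetical helper, not part of dhcpd, and it
simply reproduces the layout of the group declaration shown above.</p>

```python
# Hypothetical helper: renders a dhcpd "group" stanza like the one above.
def render_group(tftp_server, dist, arch, hosts):
    """hosts maps a host name to a (MAC address, fixed address) pair."""
    lines = ["group { # %s %s" % (dist, arch)]
    lines.append("     next-server %s;" % tftp_server)
    lines.append('     filename "%s/%s/pxelinux.0";' % (dist, arch))
    for name, (mac, address) in sorted(hosts.items()):
        lines.append(
            "     host %s { hardware ethernet %s; fixed-address %s; }"
            % (name, mac, address)
        )
    lines.append("}")
    return "\n".join(lines)


stanza = render_group(
    "10.0.0.1", "intrepid", "amd64",
    {"foobar": ("00:22:15:45:cc:fa", "foobar.example.com")},
)
print(stanza)
```

<p>Moving a machine to a different installer is then just a matter of
passing it in a different group's <tt>hosts</tt> mapping and restarting
dhcpd.</p>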

<p>All that's left to do now is to set the computer to boot from the
network, boot it, and enjoy a medialess installation.</p>


      <div><a href="http://www.davidpashley.com/blog/debian/pxeboot" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Slaves *and* Caching</title>
  <link>http://www.davidpashley.com/blog/databases/slaves-and-caching.html</link>
  <pubDate>Thu, 02 Oct 2008 17:06 GMT</pubDate>
  <dc:date>2008-10-02T17:06:38Z</dc:date>
  <description><![CDATA[
<p>Dear Lazyweb,</p>
<p>We have a web application that has quite a large database and
reasonable usage. Back in the dim and distant past, we scaled the
application by the age-old method of using several read-only slave
databases to prevent reads on the master swamping writes. This worked
well for several years, and then we introduced memcached into the mix to
improve performance by reducing the number of reads from the database.
This improved our database capacity even further.</p>

<p>Now the question has
arisen about reducing or even removing the code to read from the slaves.
I'm trying to come up with some compelling reasons to keep the
application reading from the slaves. The pros and cons I currently have
for removing the code are:</p>
<dl>
<dt>Pros</dt>
<dd>
<ul>
<li>Reduces code complexity</li>
<li>Removes consistency problems caused by replication lag. This is less of a
problem than it used to be, now that we have fixed an issue with our
replication</li>
</ul>
</dd>
<dt>Cons</dt>
<dd>
<ul>
<li>Reduces our existing capacity</li>
<li>Cache flushes would cause huge spikes on our master server until the
cache filled up again</li>
<li>Caches wouldn't help queries with unique criteria</li>
</ul>
</dd>
</dl>
<p>I would appreciate any additional reasons, pros or cons.
We already have an existing non-live slave for backups and slow
queries by developers. We would retain a slave for redundancy in the
case of master failure. I'm only looking for issues that would affect
the application.</p>

      <div><a href="http://www.davidpashley.com/blog/databases/slaves-and-caching" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Asymmetric Routing and Flow Sessions in JUNOS ES</title>
  <link>http://www.davidpashley.com/blog/networks/juniper/async-routing-flows.html</link>
  <pubDate>Wed, 20 Aug 2008 14:16 GMT</pubDate>
  <dc:date>2008-08-20T14:16:20Z</dc:date>
  <description><![CDATA[
<p>We've recently installed a couple of Juniper J-Series routers that
have the new JUNOS with Enhanced Services installed on them. During the
transition from our existing Linux routers, we started moving internal
subnets to the new routers, but when we moved the first subnet, we
discovered a problem with hosts that had addresses on two different
subnets. Connections would either connect for a minute and work and then
get a connection reset, or packets would come in to the server and leave
again, but then get swallowed in the ether. </p>
<p>I spent quite a bit of time this week reading about the security
features of the new routers, and finally came up with a solution. The
first clue was that I was getting connection reset from something on the
network, but carrying out packet sniffing on our existing routers and
the end points showed that they weren't generating it. I eventually
found the <tt>tcp-rst</tt> option, which generates a reset packet for
any non-SYN packet that doesn't match an existing flow session. JUNOS ES
does stateful packet inspection by creating a session when it sees an
initial SYN packet and then does filtering and routing based on that
flow session so it doesn't have to do it for every packet. When I turned
off the <tt>tcp-rst</tt> option on the trust zone, my connection that
worked for a minute worked again only for a minute, but this time, it
just hung, rather than dying with a connection reset. This cemented the
idea that the Juniper routers were the cause. </p>
<p>It turned out that the problem was asymmetric
routing. A packet was coming in to 10.0.0.1/24, but the server
was also on 10.0.1.1/24 and the default route nexthop was 10.0.1.2.
The resulting behaviour depended on which subnet we moved.
If we moved 10.0.0.0/24 to the new routers, they would only see the
incoming side of the conversation. If we moved 10.0.1.0/24, they would
only see the outgoing side of the conversation. If we then think how
this would work with the session-based routing and firewalling, in the
first case, the router would see the initial SYN packet, but would never
see the returning SYN-ACK packet and, after an initial timeout, would decide
the flow had never been established and destroy the session info, causing
further incoming packets to be dropped. In the second case, it would
never see the SYN packet, only the SYN-ACK. This packet wouldn't belong
to an existing session, so would be blocked. The solution is to turn off
the SYN check, using:</p>
<pre>
[edit]
user@host# <b>set security flow tcp-session no-syn-check</b>
</pre>
<p>After committing that, sessions work correctly, even without the
router seeing both sides of the connection: </p>
<pre>
user@host> <b>show security flow session destination-prefix 10.0.0.1</b>    
Session ID: 201341, Policy name: default-permit/4, Timeout: 1798
  In: 192.168.0.1/61136 --> 10.0.0.1/22;tcp, If: ge-0/0/0.7
  Out: 10.0.0.1/22 --> 192.168.0.1/61136;tcp, If: ge-0/0/1.7

1 sessions displayed
</pre>
<p>Sadly, I couldn't find much information on the <tt>no-syn-check</tt>
option, so hopefully people will find this explanation useful.</p>
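<p>The second case above can be illustrated with a toy model. This is a
sketch under heavy assumptions: <tt>FlowRouter</tt> is a made-up class, not
anything from JUNOS ES, and it ignores timeouts, policies and zones. It
only shows why a router that never saw the SYN refuses the SYN-ACK while
the SYN check is on.</p>

```python
# Toy model of a stateful flow table; not how JUNOS ES is implemented.
class FlowRouter:
    def __init__(self, syn_check=True):
        self.syn_check = syn_check
        self.sessions = set()

    def forward(self, src, dst, flags):
        """Return True if the packet is forwarded, False if dropped."""
        key = frozenset((src, dst))   # one session covers both directions
        if key in self.sessions:
            return True
        # With the SYN check on, only an initial SYN may create a session.
        if self.syn_check and flags != "SYN":
            return False
        self.sessions.add(key)
        return True


client, server = "192.168.0.1:61136", "10.0.0.1:22"

# The router only ever sees the server's replies, never the client's SYN.
strict = FlowRouter(syn_check=True)
strict_result = strict.forward(server, client, "SYN-ACK")

relaxed = FlowRouter(syn_check=False)
relaxed_result = relaxed.forward(server, client, "SYN-ACK")

print(strict_result, relaxed_result)
```

<p>With the check enabled the SYN-ACK is dropped; with it disabled the
first packet seen creates the session, which is the behaviour that
<tt>no-syn-check</tt> gave us.</p>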

      <div><a href="http://www.davidpashley.com/blog/networks/juniper/async-routing-flows" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Rebuilding a RAID array</title>
  <link>http://www.davidpashley.com/blog/linux/rebuilding-raid.html</link>
  <pubDate>Sat, 12 Jul 2008 17:54 GMT</pubDate>
  <dc:date>2008-07-12T17:54:43Z</dc:date>
  <description><![CDATA[
<p>I recently had a failed drive in my RAID1 array. I've just installed
the replacement drive and thought I'd share the method.</p>

<p>Let's look at the current situation:</p>
<pre>
root@ace:~# <b>cat /proc/mdstat</b> 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid1 sda3[1]
      483403776 blocks [2/1] [_U]
      
md0 : active raid1 sda1[1]
      96256 blocks [2/1] [_U]
      
unused devices: &lt;none>
</pre>
<p>So we can see we have two mirrored arrays with one drive missing in both.</p>
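<p>If you would rather have a script spot this than read the output by
eye, the <tt>[2/1]</tt> counters are easy to parse. A minimal sketch,
assuming the mdstat layout shown above; <tt>degraded_arrays</tt> is a
hypothetical helper and the sample text is copied from my output rather
than read from <tt>/proc/mdstat</tt>.</p>

```python
import re

# Sample text copied from the /proc/mdstat output above.
MDSTAT = """\
md1 : active raid1 sda3[1]
      483403776 blocks [2/1] [_U]

md0 : active raid1 sda1[1]
      96256 blocks [2/1] [_U]
"""


def degraded_arrays(mdstat_text):
    """Return {array: (expected members, active members)} for degraded arrays."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"(md\d+) :", line)
        if header:
            current = header.group(1)
            continue
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if counts and current:
            expected, active = int(counts.group(1)), int(counts.group(2))
            if active != expected:
                degraded[current] = (expected, active)
    return degraded


result = degraded_arrays(MDSTAT)
print(result)
```

<p>On a live system you would pass in the contents of
<tt>/proc/mdstat</tt> instead of the sample string.</p>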
<p>Let's check that the kernel has recognised the second drive:</p>
<pre>
root@ace:~# <b>dmesg | grep sd</b>
[   21.465395] Driver 'sd' needs updating - please use bus_type methods
[   21.465486] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[   21.465496] sd 2:0:0:0: [sda] Write Protect is off
[   21.465498] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   21.465512] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.465562] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[   21.465571] sd 2:0:0:0: [sda] Write Protect is off
[   21.465573] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   21.465587] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.465590]  sda: sda1 sda2 sda3
[   21.487248] sd 2:0:0:0: [sda] Attached SCSI disk
[   21.487303] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[   21.487314] sd 2:0:1:0: [sdb] Write Protect is off
[   21.487317] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[   21.487331] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.487371] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[   21.487381] sd 2:0:1:0: [sdb] Write Protect is off
[   21.487382] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[   21.487403] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.487407]  sdb: unknown partition table
[   21.502763] sd 2:0:1:0: [sdb] Attached SCSI disk
[   21.506690] sd 2:0:0:0: Attached scsi generic sg0 type 0
[   21.506711] sd 2:0:1:0: Attached scsi generic sg1 type 0
[   21.793835] md: bind&lt;sda1>
[   21.858027] md: bind&lt;sda3>
</pre>
<p>So, sda has three partitions, sda1, sda2 and sda3, and sdb has no partition
table. Let's give it the same one as sda. The easiest way to do this is using
<tt>sfdisk</tt>:</p>
<pre>
root@ace:~# <b>sfdisk -d /dev/sda | sfdisk /dev/sdb</b>
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an MSDOS signature
 /dev/sdb: unrecognised partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    192779     192717  fd  Linux RAID autodetect
/dev/sdb2        192780   9960299    9767520  82  Linux swap / Solaris
/dev/sdb3       9960300 976768064  966807765  fd  Linux RAID autodetect
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
</pre>
<p>If we look at <tt>dmesg</tt> now to check it's worked, we'll see:</p>
<pre>
root@ace:~# <b>dmesg | grep sd</b>
...
[  224.246102] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[  224.246322] sd 2:0:1:0: [sdb] Write Protect is off
[  224.246325] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[  224.246547] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  224.246686]  sdb: unknown partition table
[  227.326278] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[  227.326504] sd 2:0:1:0: [sdb] Write Protect is off
[  227.326507] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[  227.326703] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  227.326708]  sdb: sdb1 sdb2 sdb3
</pre>
<p>So, now we have identical partition tables. The next thing to do is to add the new partitions to the array:</p>
<pre>
root@ace:~# <b>mdadm /dev/md0 --add /dev/sdb1</b>
mdadm: added /dev/sdb1
root@ace:~# <b>mdadm /dev/md1 --add /dev/sdb3</b>
mdadm: added /dev/sdb3
</pre>
<p>Everything looks good. Let's check <tt>dmesg</tt>:</p>
<pre>
[  323.941542] md: bind&lt;sdb1>
[  324.038183] RAID1 conf printout:
[  324.038189]  --- wd:1 rd:2
[  324.038192]  disk 0, wo:1, o:1, dev:sdb1
[  324.038195]  disk 1, wo:0, o:1, dev:sda1
[  324.038300] md: recovery of RAID array md0
[  324.038303] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  324.038305] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  324.038310] md: using 128k window, over a total of 96256 blocks.
[  325.417219] md: md0: recovery done.
[  325.453629] RAID1 conf printout:
[  325.453632]  --- wd:2 rd:2
[  325.453634]  disk 0, wo:0, o:1, dev:sdb1
[  325.453636]  disk 1, wo:0, o:1, dev:sda1
[  347.970105] md: bind&lt;sdb3>
[  348.004566] RAID1 conf printout:
[  348.004571]  --- wd:1 rd:2
[  348.004573]  disk 0, wo:1, o:1, dev:sdb3
[  348.004574]  disk 1, wo:0, o:1, dev:sda3
[  348.004657] md: recovery of RAID array md1
[  348.004659] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  348.004660] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  348.004664] md: using 128k window, over a total of 483403776 blocks.
</pre>
<p>Everything still looks good. Let's sit back and watch it rebuild using the wonderfully useful <tt>watch</tt> command:</p>
<pre>
root@ace:~# <b>watch -n 1 cat /proc/mdstat</b>
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md1 : active raid1 sdb3[2] sda3[1]
      483403776 blocks [2/1] [_U]
      [=====>...............]  recovery = 26.0% (126080960/483403776) finish=96.2min speed=61846K/sec
      
md0 : active raid1 sdb1[0] sda1[1]
      96256 blocks [2/2] [UU]
      
unused devices: &lt;none>
</pre>
<p>The Ubuntu and Debian installers will allow you to create RAID1 arrays
with fewer drives than you actually have, so you can use this technique
if you plan to add an additional drive after you've installed the
system. Just tell it the eventual number of drives, but only select the
available partitions during RAID setup.  I used this method recently when a new machine
didn't have enough SATA power cables and I had to wait for an adaptor to
be delivered.</p>
<p><small>(Why did no one tell me about <tt>watch</tt> until recently? I wonder
how many more incredibly useful programs I've not discovered even after 10
years of using Linux.)</small></p>

      <div><a href="http://www.davidpashley.com/blog/linux/rebuilding-raid" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>index sambaSID sub</title>
  <link>http://www.davidpashley.com/blog/debian/substr-sambaSID-disallowed.html</link>
  <pubDate>Tue, 10 Jun 2008 08:13 GMT</pubDate>
  <dc:date>2008-06-10T08:13:46Z</dc:date>
  <description><![CDATA[
<p>If you get the following error:</p>
<pre>/etc/ldap/slapd.conf: line 127: substr index of attribute "sambaSID" disallowed</pre>
<p>when you run slapindex, then you haven't updated your
<tt>samba.schema</tt> to the version from Samba 3.0.23. Dapper and Edgy
had 3.0.22, so if you've recently upgraded to Hardy, you will see this
problem. The file should have an MD5 of
<tt>0e23b3ad05cd2b38a302fe61c921f300</tt>. I'm hoping this resolves
problems I have with samba not picking up group membership changes. I'll
update if it does.</p>
<p><strong>Update:</strong> Having installed the new schema and run <tt>slapindex</tt>, <tt>net rpc info</tt> shows I have twelve groups when previously it showed zero. This may not solve my group membership problems, but it can't be a step backwards.</p>

      <div><a href="http://www.davidpashley.com/blog/debian/substr-sambaSID-disallowed" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Compiled Regexes in Spamassassin 3.2</title>
  <link>http://www.davidpashley.com/blog/debian/sa-compile.html</link>
  <pubDate>Mon, 09 Jun 2008 09:39 GMT</pubDate>
  <dc:date>2008-06-09T09:39:06Z</dc:date>
  <description><![CDATA[
<p><a href="http://spamassassin.apache.org/">SpamAssassin 3.2</a>, which is available in Gutsy and Lenny, comes with a new feature to increase performance by
compiling its regular expressions using re2c. It's very quick to enable.
First, you need to install the required packages:</p>
<pre>apt-get install re2c libc6-dev gcc make</pre>
<p>Next, edit <tt>/etc/spamassassin/v320.pre</tt> and uncomment the line
that says:</p>
<pre>loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody</pre>
<p>Next, pre-compile the regular expressions using <tt>sa-compile</tt>:</p>
<pre>
femme:/etc/logcheck# <strong>sa-compile</strong>
[18741] info: generic: base extraction starting. this can take a while...
[18741] info: generic: extracting from rules of type body_0
100% [===========================] 3293.83 rules/sec 00m00s DONE
100% [===========================] 650.12 bases/sec 00m01s DONE
[18741] info: body_0: 647 base strings extracted in 2 seconds
<em>[snip compiler output]</em>
make install
Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
Installing /var/lib/spamassassin/compiled/3.002004/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.so
Installing /tmp/.spamassassin18741hDrlUQtmp/ignored/man/man3/Mail::SpamAssassin::CompiledRegexps::body_0.3pm
Writing /var/lib/spamassassin/compiled/3.002004/auto/Mail/SpamAssassin/CompiledRegexps/body_0/.packlist
Appending installation info to /var/lib/spamassassin/compiled/3.002004/perllocal.pod
cp /tmp/.spamassassin18741hDrlUQtmp/bases_body_0.pl /var/lib/spamassassin/compiled/3.002004/bases_body_0.pl
cd /
rm -rf /tmp/.spamassassin18741hDrlUQtmp
</pre>
<p>Finally, restart spamassassin, and you should find it runs faster.
You will need to run sa-compile every time you update your rules, or
they won't take effect.</p>
<p>If you get the following warning:</p>
<pre>Can't locate Mail/SpamAssassin/CompiledRegexps/body_0.pm in @INC</pre>
<p>you forgot to run <tt>sa-compile</tt>; run it and the warning should go
away.</p>

      <div><a href="http://www.davidpashley.com/blog/debian/sa-compile" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Apache 2.2 auth_ldap config</title>
  <link>http://www.davidpashley.com/blog/debian/apache22-auth-ldap.html</link>
  <pubDate>Thu, 22 May 2008 13:53 GMT</pubDate>
  <dc:date>2008-05-22T13:53:04Z</dc:date>
  <description><![CDATA[
<p>Apache 2.2 changed the way you configure LDAP authentication.
mod_auth_ldap was replaced with mod_authnz_ldap, so don't forget to
enable the new module and disable the old one. Because I'll always
forget, here's the new style config.</p>
<pre>
AuthType basic
AuthName "admin"
<b>AuthBasicProvider ldap</b>
AuthLDAPUrl ldap://ldap.example.com:389/ou=people,dc=example,dc=com?uid?sub
AuthLDAPGroupAttributeIsDN off
Require <b>ldap-group</b> cn=systems,ou=groups,dc=example,dc=com
AuthLDAPGroupAttribute memberUid
</pre>
<p>The sections in bold are the sections I had to change from the 2.0
config.</p>

      <div><a href="http://www.davidpashley.com/blog/debian/apache22-auth-ldap" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Not like not like not like</title>
  <link>http://www.davidpashley.com/blog/databases/mysql/not-like-redux.html</link>
  <pubDate>Thu, 01 May 2008 07:41 GMT</pubDate>
  <dc:date>2008-05-01T07:41:54Z</dc:date>
  <description><![CDATA[
<p>I should have mentioned that my <a
href="http://www.davidpashley.com/blog/databases/mysql/not-like">previous
blog posting</a> was using MySQL
4.0 (4.0.23_Debian-3ubuntu2-log). It seems that in 5.0.2 they changed the
precedence of the <tt>NOT</tt> operator to be lower than that of <tt>LIKE</tt>.
From the <a
href="http://dev.mysql.com/doc/refman/5.0/en/operator-precedence.html">manual</a>:</p>
<blockquote><p>The precedence shown for NOT  is as of MySQL 5.0.2. For
earlier versions, or from 5.0.2 on if the HIGH_NOT_PRECEDENCE SQL mode
is enabled, the precedence of NOT is the same as that of the !
operator.</p></blockquote>
<p>Using 5.0 (5.0.22-Debian_0ubuntu6.06.8-log), and a slightly smaller
dataset, I get:</p>
<pre>
mysql&gt; select count(*) from Table where blobid is null or not blobid like '%-%';
+----------+
| count(*) |
+----------+
|   199057 | 
+----------+
1 row in set (3.26 sec)

mysql&gt; select count(*) from Table where blobid is null or blobid not like '%-%';
+----------+
| count(*) |
+----------+
|   199057 | 
+----------+
1 row in set (0.96 sec)
</pre>
<p><a href="http://jkingdon2000.blogspot.com/">Jim Kingdon</a>
experimented with other databases and was unable to reproduce this
problem. My test with PostgreSQL 8.3:</p>
<pre>
quux=&gt; create table foo (blobid varchar(255));
CREATE TABLE
quux=&gt; insert into foo (blobid) values 
   ('5cd1237469cc4b52ca094e215156c582-9ef460ac4134c600a4d2382c4b0acee7'), 
   (NULL), 
   ('d20cb4037f8f9ab1de5de264660f005c-2c34209dcfb39251cf7c16bb6754bbd2'), 
   ('845a8d06719d8bad521455a8dd47745c-095d9a0831433c92cd269e14e717b3a9'),
   ('9580ed23f34dd68d35da82f7b2a293d6-bf39df7509d977a1de767340536ebe80'), 
   ('06c9521472cdac02a2d4b2a18f8bec0f-0a8a28d3b63df54860055f1d1de92969'), 
   ('ed3cd0dd9b55f76db7544eeb64f3cfa0-80a6a3eb6d73c0a58f88b7c332866d5c'),
   (NULL),
   ('b339f6545651fbfa49fa500b7845c4ce-6defb5ffc188b8f72f1aa10bbd5c6bec'),
   ('642075963d6f69bb11c35a110dd07c2c8db54ac2d2accae7fa4a22db1d6caae9');
INSERT 0 10
quux=&gt; select count(*) from foo 
   where blobid is null or blobid not like '%-%';
 count 
-------
     3
(1 row)

quux=&gt; select count(*) from foo 
   where blobid is null or not blobid like '%-%';
 count 
-------
     3
(1 row)

quux=&gt; select not blobid from foo limit 10;
ERROR:  argument of NOT must be type boolean, not type character varying
</pre>
<p>This appears to have been the case since at least <a href="http://www.postgresql.org/docs/7.4/interactive/sql-syntax.html#SQL-PRECEDENCE-TABLE">7.4</a>.</p>
<p>Problems like this are going to make the transition from MySQL 4.0 to 5.x all the more fun when we get around to doing it.</p>

      <div><a href="http://www.davidpashley.com/blog/databases/mysql/not-like-redux" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Not like not like not like</title>
  <link>http://www.davidpashley.com/blog/databases/mysql/not-like.html</link>
  <pubDate>Wed, 30 Apr 2008 15:04 GMT</pubDate>
  <dc:date>2008-04-30T15:04:01Z</dc:date>
  <description><![CDATA[
<p>Dear lazyweb,</p>
<p>I'm possibly being stupid, but can someone explain the differences
between these two queries?</p>
<pre>
mysql&gt; select count(*) from Table 
   where blobid is null or <b>not blobid like</b> '%-%';
+----------+
| count(*) |
+----------+
| 15262487 |
+----------+
1 row in set (25 min 4.18 sec)

mysql&gt; select count(*) from Table 
   where blobid is null or <b>blobid not like</b> '%-%';
+----------+
| count(*) |
+----------+
| 20044216 |
+----------+
1 row in set (24 min 54.06 sec)
</pre>
<p>For reference:</p>
<pre>
mysql&gt; select count(*) from Table where blobid is null;
+----------+
| count(*) |
+----------+
| 15262127 |
+----------+
1 row in set (24 min 7.15 sec)
</pre>
<p><strong>Update:</strong> It turns out that the former was evaluating <tt>(not blobid) like '%-%'</tt>, which doesn't do anything sensible:</p>
<pre>
mysql&gt; select not blobid from Table limit 10;
+------------+
| not blobid |
+------------+
|          0 |
|       NULL |
|          1 |
|          0 |
|          0 |
|          0 |
|          1 |
|       NULL |
|          1 |
|          0 |
+------------+
10 rows in set (0.02 sec)

mysql&gt; select blobid from Table limit 10;
+-------------------------------------------------------------------+
| blobid                                                            |
+-------------------------------------------------------------------+
| 5cd1237469cc4b52ca094e215156c582-9ef460ac4134c600a4d2382c4b0acee7 |
| NULL                                                              |
| d20cb4037f8f9ab1de5de264660f005c-2c34209dcfb39251cf7c16bb6754bbd2 |
| 845a8d06719d8bad521455a8dd47745c-095d9a0831433c92cd269e14e717b3a9 |
| 9580ed23f34dd68d35da82f7b2a293d6-bf39df7509d977a1de767340536ebe80 |
| 06c9521472cdac02a2d4b2a18f8bec0f-0a8a28d3b63df54860055f1d1de92969 |
| ed3cd0dd9b55f76db7544eeb64f3cfa0-80a6a3eb6d73c0a58f88b7c332866d5c |
| NULL                                                              |
| b339f6545651fbfa49fa500b7845c4ce-6defb5ffc188b8f72f1aa10bbd5c6bec |
| 642075963d6f69bb11c35a110dd07c2c-8db54ac2d2accae7fa4a22db1d6caae9 |
+-------------------------------------------------------------------+
10 rows in set (0.00 sec)
</pre>
<p>The <a
href="http://dev.mysql.com/doc/refman/4.1/en/logical-operators.html#operator_not">documentation</a>
says <quote>Logical NOT. Evaluates to 1 if the operand is 0, to 0 if the
operand is non-zero, and NOT NULL returns NULL.</quote> but doesn't
describe the behaviour of <tt>NOT 'string'</tt>. It would appear that a
string starting with a number returns 0 and a string starting with a
letter returns 1. Either way, neither result contains a hyphen.</p>
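<p>To convince myself, here is a rough model of the two readings of the
query. It is a sketch, not MySQL's real casting rules: I assume
<tt>NOT</tt> on a string only looks at the leading digits, and the
eleventh row (<tt>abc123</tt>) is a made-up value without a hyphen, added
so the two counts can differ.</p>

```python
import re

# The ten sample rows shown above; None stands in for SQL NULL.
ROWS = [
    "5cd1237469cc4b52ca094e215156c582-9ef460ac4134c600a4d2382c4b0acee7",
    None,
    "d20cb4037f8f9ab1de5de264660f005c-2c34209dcfb39251cf7c16bb6754bbd2",
    "845a8d06719d8bad521455a8dd47745c-095d9a0831433c92cd269e14e717b3a9",
    "9580ed23f34dd68d35da82f7b2a293d6-bf39df7509d977a1de767340536ebe80",
    "06c9521472cdac02a2d4b2a18f8bec0f-0a8a28d3b63df54860055f1d1de92969",
    "ed3cd0dd9b55f76db7544eeb64f3cfa0-80a6a3eb6d73c0a58f88b7c332866d5c",
    None,
    "b339f6545651fbfa49fa500b7845c4ce-6defb5ffc188b8f72f1aa10bbd5c6bec",
    "642075963d6f69bb11c35a110dd07c2c-8db54ac2d2accae7fa4a22db1d6caae9",
]
HYPOTHETICAL = ["abc123"]  # invented row with no hyphen


def mysql_not(value):
    """Assumed model of NOT on a string: cast the leading digits to a number."""
    if value is None:
        return None
    digits = re.match(r"\d*", value).group(0)
    number = int(digits) if digits else 0
    return 1 if number == 0 else 0


def like_hyphen(value):
    """value LIKE '%-%', returning None for NULL."""
    if value is None:
        return None
    return "-" in str(value)


not_column = [mysql_not(row) for row in ROWS]
table = ROWS + HYPOTHETICAL

# MySQL 4.0 reading: blobid IS NULL OR (NOT blobid) LIKE '%-%'
old_count = sum(1 for row in table
                if row is None or like_hyphen(mysql_not(row)))

# Intended reading: blobid IS NULL OR NOT (blobid LIKE '%-%')
new_count = sum(1 for row in table
                if row is None or not like_hyphen(row))

print(not_column, old_count, new_count)
```

<p>Under these assumptions the first reading only ever counts the NULL
rows, while the second also counts the row without a hyphen.</p>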

      <div><a href="http://www.davidpashley.com/blog/databases/mysql/not-like" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>User Administration under PostgreSQL 8.3</title>
  <link>http://www.davidpashley.com/blog/databases/postgresql/user-admin-8.3.html</link>
  <pubDate>Mon, 28 Apr 2008 21:54 GMT</pubDate>
  <dc:date>2008-04-28T21:54:50Z</dc:date>
  <description><![CDATA[
<p>A while ago I published an article on <a
href="http://www.davidpashley.com/articles/postgresql-user-administration.html">PostgreSQL
user administration</a>. Typically, things have changed since then, so
I thought I'd detail a couple of the differences.</p>
<p>The major difference is that you now have roles rather than users and
you use the <tt>CREATE ROLE</tt> command to create them instead of
<tt>CREATE USER</tt>, although the latter command still works. The
command line options for the <tt>createuser</tt> command have changed as
a result too. Previously, superuser and the ability to create new users were
the same thing. Now you can give a role permission to create new roles
without giving it superuser powers. The options are now -s for
superuser and -S for not superuser, -d to allow the role to create
databases and -D to disallow database creation, and -r to allow the new
role to create other roles and -R to prevent it. For a standard user
you probably want something like:</p>
<pre>createuser -S -D -R -P user</pre>
<p>The <tt>-P</tt> makes <tt>createuser</tt> ask you for a password for
the new role.</p>
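<p>For reference, the flags correspond to attributes of <tt>CREATE
ROLE</tt>. The sketch below spells out that mapping;
<tt>create_role_sql</tt> and the role name <tt>alice</tt> are hypothetical,
the password clause is left out because <tt>-P</tt> prompts for it, and the
statement is only roughly what <tt>createuser</tt> would issue.</p>

```python
# Hypothetical helper mapping createuser flags to CREATE ROLE attributes.
FLAG_ATTRIBUTES = {
    "-s": "SUPERUSER", "-S": "NOSUPERUSER",
    "-d": "CREATEDB", "-D": "NOCREATEDB",
    "-r": "CREATEROLE", "-R": "NOCREATEROLE",
}


def create_role_sql(role, flags):
    attributes = [FLAG_ATTRIBUTES[flag] for flag in flags]
    # createuser makes roles that can log in, hence LOGIN.
    attributes.append("LOGIN")
    return 'CREATE ROLE "%s" %s;' % (role, " ".join(attributes))


sql = create_role_sql("alice", ["-S", "-D", "-R"])
print(sql)
```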
<p>You can find out more information about the new role system in
PostgreSQL in the <a
href="http://www.postgresql.org/docs/8.3/interactive/user-manag.html">user
management</a> and <a
href="http://www.postgresql.org/docs/8.3/interactive/sql-createrole.html"><tt>CREATE
ROLE</tt> reference</a> sections of the manual.</p> 

      <div><a href="http://www.davidpashley.com/blog/databases/postgresql/user-admin-8.3" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Upgrading to latest Pyblosxom</title>
  <link>http://www.davidpashley.com/blog/meta/upgrade-1.4.3.html</link>
  <pubDate>Sat, 26 Apr 2008 18:33 GMT</pubDate>
  <dc:date>2008-04-26T18:33:49Z</dc:date>
  <description><![CDATA[
<p>I'm currently upgrading my blog to <a
href="http://pyblosxom.sourceforge.net/">PyBlosxom 1.4.3</a>. I apologise for
any broken links or entry flooding.</p>
<p><strong>Update:</strong> I've finished playing now. I've upgraded to
1.4.3 and I don't think I've broken anything yet.</p>

<p>I've also taken the opportunity to add a couple of plugins to add
tagging to entries and added the obligatory tag cloud to the side bar
rather than the list of months.  I'm going to make some changes to the
comment plugin later to add OpenID support. I'd be interested to know of
any other PyBlosxom plugins you find useful.</p>

<p>I did manage to make a mistake by using vim to edit entries to
add some tags rather than my wrapper script, which keeps timestamps the same.
This is where I'm glad I have a database table with the metadata from
all my entries to hand. A quick <tt>touch -d '2006-06-07 19:02:57+01'
foo.txt</tt> later and everything was fixed. Hopefully not too many
people got bitten by the few entries that had new dates for a few
minutes. Please let me know if you notice anything broken.</p>


      <div><a href="http://www.davidpashley.com/blog/meta/upgrade-1.4.3" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Violating Perl Module Namespaces</title>
  <link>http://www.davidpashley.com/blog/programming/perl/violating-namespaces.html</link>
  <pubDate>Thu, 24 Apr 2008 13:04 GMT</pubDate>
  <dc:date>2008-04-24T13:04:00Z</dc:date>
  <description><![CDATA[
<p>Perl doesn't enforce access to modules' namespaces. This would
usually be considered a bad thing, but sometimes it allows us to work
around problems in modules without changing their code. Here's a perfect
example:</p>

<p>I've been writing a script to talk to an XML-RPC endpoint using
<a href="http://search.cpan.org/~kmacleod/Frontier-RPC-0.07b4/lib/Frontier/Client.pm">Frontier::Client</a>, but for
one of the requests, the script throws the following error:</p>
<pre>wanted a data type, got `ex:i8'</pre>
<p>Turning on debugging showed the response type was indeed ex:i8, which
isn't one of the types that Frontier::Client supports.</p>
<pre>
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;methodResponse xmlns:ex="http://ws.apache.org/xmlrpc/namespaces/extensions"&gt;
  &lt;params&gt;
    &lt;param&gt;
      &lt;value&gt;
        &lt;ex:i8&gt;161&lt;/ex:i8&gt;
      &lt;/value&gt;
    &lt;/param&gt;
  &lt;/params&gt;
&lt;/methodResponse&gt;
</pre>
<p>Searching through the code shows Frontier::Client is a wrapper around
Frontier::RPC2 and the error message happens at the following
section:</p>
<pre>
   } elsif ($scalars{$tag}) {
       $expat-&gt;{'rpc_text'} = "";
       push @{ $expat-&gt;{'rpc_state'} }, 'cdata';
   } else {
       Frontier::RPC2::die($expat, "wanted a data type, got \`$tag'\n");
   }
</pre>
<p>So we can see that it's looking up the tag in a hash called
<tt>%scalars</tt> to see if the type is a scalar type, and otherwise throws
the error we saw. Looking at the top of the module, we can see this hash:</p>
<pre>
%scalars = (
    'base64' =&gt; 1,
    'boolean' =&gt; 1,
    'dateTime.iso8601' =&gt; 1,
    'double' =&gt; 1,
    'int' =&gt; 1,
    'i4' =&gt; 1,
    'string' =&gt; 1,
);
</pre>
<p>So, if we could add <tt>ex:i8</tt> to this hash, we could fix the
problem. We could fix the module, but that would require every user of
the script to patch their copy of the module. The alternative is to
inject something into that hash across module boundaries, which we can
do by just referring to the hash by its complete name, including the
package name. We can use:</p>
<pre>$Frontier::RPC2::scalars{'ex:i8'} = 1;</pre>
<p>Now when we run the script, everything works. It's not nice and it's
dependent on Frontier::RPC2 not changing, but it allows us to get on
with our script.</p>

      <div><a href="http://www.davidpashley.com/blog/programming/perl/violating-namespaces" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Photography In Public Areas Early Day Motion</title>
  <link>http://www.davidpashley.com/blog/politics/photography-in-public-areas-edm.html</link>
  <pubDate>Thu, 24 Apr 2008 09:49 GMT</pubDate>
  <dc:date>2008-04-24T09:49:50Z</dc:date>
  <description><![CDATA[
<p>I just emailed my <a href="http://www.epolitix.com/EN/MPWebsites/David+Lepper">MP</a> the following letter:</p>
<blockquote>
<p>Dear David Lepper,</p>

<p>I would just like to thank you for signing Austin Mitchell's Early Day
Motion 1155 <a href="http://edmi.parliament.uk/EDMi/EDMDetails.aspx?EDMID=35375&amp;SESSION=891">Photography In Public Areas</a>. I have been increasingly
concerned with reports of police action against innocent photographers,
including most recently a man assaulted by several security guards in
Stoke (<a href="http://www.flickr.com/photos/happyaslarry/2420960125/">http://www.flickr.com/photos/happyaslarry/2420960125/</a>). I'm sure
you appreciate Brighton's reputation as an artistic city and your
support for this motion shows your continued support for the
photography community in Brighton.</p>

<p>Yours sincerely,
David Pashley</p>
</blockquote>
<p>If your MP hasn't signed this EDM, I recommend you contact them to urge them
to sign it, and if they have, contact them again to thank them.</p>

      <div><a href="http://www.davidpashley.com/blog/politics/photography-in-public-areas-edm" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Using In-memory tarballs with Archive::Tar</title>
  <link>http://www.davidpashley.com/blog/programming/perl/in-memory-archive-tar.html</link>
  <pubDate>Thu, 17 Apr 2008 12:38 GMT</pubDate>
  <dc:date>2008-04-17T12:38:36Z</dc:date>
  <description><![CDATA[
<p><a
href="http://search.cpan.org/~kane/Archive-Tar-1.38/lib/Archive/Tar.pm">Archive::Tar</a>
is a useful library for working with tar archives from Perl.
Unfortunately, one thing it doesn't allow is using data from memory as
the archive. From the TODO section:</p>
<blockquote><p>Allow archives to be passed in as string</p>

    <p>Currently, we only allow opened filehandles or filenames, but not
    strings. The internals would need some reworking to facilitate
    stringified archives.</p>
    </blockquote>
<p>Fortunately, it does allow you to use a filehandle. I've <a
href="http://www.davidpashley.com/articles/perl-io-objects.html">previously
mentioned</a> how useful the IO::Handle subsystem in Perl is, and we
should be able to use it in this case. The module we want is
<a
href="http://search.cpan.org/~gaas/IO-String-1.08/String.pm">IO::String</a>, which provides an IO::Handle-style interface over a Perl scalar. We can use it like this:</p>
<pre>my $tar = Archive::Tar->new(IO::String->new($data));</pre>
<p>Unfortunately when we run this now we get:</p>
<pre>Cannot read compressed format in tar-mode at Foo.pm line 41
No data could be read from file at Foo.pm line 41</pre>
<p>It turns out that this is because Archive::Tar uses IO::Zlib
internally if the file is compressed, but this doesn't provide the
ability to uncompress from a filehandle. The answer is to uncompress the
data before passing it to Archive::Tar, and the easiest way to do this is
to use the <a
href="http://search.cpan.org/~pmqs/IO-Compress-Zlib-2.008/lib/IO/Uncompress/Gunzip.pm">IO::Uncompress::Gunzip</a> module, so we can rewrite our code
to:</p>
<pre>my $tar = Archive::Tar->new(IO::Uncompress::Gunzip->new(IO::String->new($data)));</pre>
<p>Now when you run the script, Archive::Tar has an uncompressed tar
stream. Yet another situation where IO::Handle comes to the rescue.</p>
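<p>For completeness, here is a self-contained sketch of the whole round
trip. It builds a small gzipped tarball in memory first so there is
something to read; the file name <tt>hello.txt</tt> and its contents are
made up for the example. It also assumes you are happy to hand
IO::Uncompress::Gunzip a reference to the scalar directly, which it
accepts as input, instead of wrapping the data in IO::String.</p>
<pre>use strict;
use warnings;
use Archive::Tar;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Build a tiny tarball in memory (hello.txt is a made-up example file).
my $out = Archive::Tar->new;
$out->add_data('hello.txt', "Hello, world\n");
my $plain = $out->write;   # with no arguments, write() returns the archive as a string

# Compress it so that $data looks like the contents of a .tar.gz file.
my $data;
gzip(\$plain => \$data) or die "gzip failed: $GzipError";

# Uncompress on the fly and hand the resulting handle to Archive::Tar.
my $fh = IO::Uncompress::Gunzip->new(\$data)
    or die "gunzip failed: $GunzipError";
my $tar = Archive::Tar->new($fh)
    or die "Archive::Tar could not read the stream";

# List the members and print the content of the one we added.
print "$_\n" for $tar->list_files;
print $tar->get_content('hello.txt');</pre>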

      <div><a href="http://www.davidpashley.com/blog/programming/perl/in-memory-archive-tar" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>Boilerplate code for a perl class</title>
  <link>http://www.davidpashley.com/blog/programming/perl/class-boilerplate.html</link>
  <pubDate>Thu, 17 Apr 2008 10:10 GMT</pubDate>
  <dc:date>2008-04-17T10:10:28Z</dc:date>
  <description><![CDATA[
<p>Because I always forget when I need to create a new class in
perl:</p>
<pre>package Foo::Bar;

use strict;
use warnings;

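sub new {
   my $this = shift;
   # Allow both Foo::Bar->new and $object->new
   my $class = ref($this) || $this;
   my $self = {};
   bless $self, $class;
   # Pass any constructor arguments on to initialize
   $self->initialize(@_);
   return $self;
}

# Set up object state here; subclasses can override this
sub initialize {
   my $self = shift;
}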

1;</pre>
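<p>One possible way to fill this in, as a sketch rather than a
recommendation: have <tt>initialize</tt> store whatever named arguments
were passed to <tt>new</tt>. The <tt>Foo::Baz</tt> class and its
<tt>name</tt> field are invented for illustration.</p>
<pre>package Foo::Baz;

use strict;
use warnings;

sub new {
   my $this = shift;
   my $class = ref($this) || $this;
   my $self = {};
   bless $self, $class;
   $self->initialize(@_);
   return $self;
}

# Hypothetical initialize: copy named arguments into the object,
# falling back to a default when no name is given.
sub initialize {
   my $self = shift;
   my %args = @_;
   $self->{name} = defined $args{name} ? $args{name} : 'anonymous';
}

# Simple read-only accessor.
sub name {
   my $self = shift;
   return $self->{name};
}

package main;

my $obj = Foo::Baz->new(name => 'example');
print $obj->name, "\n";

# Calling new on an instance works too, thanks to ref($this) || $this.
my $other = $obj->new;
print ref($other), " ", $other->name, "\n";</pre>
<p>This prints <tt>example</tt> followed by <tt>Foo::Baz anonymous</tt>.</p>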
<p>If you have any useful additions I'd love to know.</p>

      <div><a href="http://www.davidpashley.com/blog/programming/perl/class-boilerplate" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

<item>
  <title>InnoDB being silently disabled</title>
  <link>http://www.davidpashley.com/blog/databases/mysql/innodb-disabled.html</link>
  <pubDate>Thu, 03 Apr 2008 12:26 GMT</pubDate>
  <dc:date>2008-04-03T12:26:36Z</dc:date>
  <description><![CDATA[
<p>Regular viewers will know that I don't think favourably of MySQL.
Here is yet another reason. Let's create an InnoDB table:</p>
<pre>
mysql&gt; CREATE TABLE `User_` (
mysql&gt; ...
mysql&gt; ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Query OK, 0 rows affected, 1 warning (0.04 sec) </pre>
<p>One warning, but we're running this as part of an import, so we'll
fail to spot it. Even if we did, we wouldn't be able to get it back
out of MySQL afterwards, because <tt>SHOW WARNINGS</tt> only reports on
the most recent statement. So let's look at the table we just created:
</p>
<pre>
mysql&gt; show create table User_\G
*************************** 1. row ***************************
       Table: User_
Create Table: CREATE TABLE `User_` (
...
) ENGINE=MyISAM DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
</pre>
<p>Eh? What's going on? We asked for InnoDB, but have got a MyISAM
table. Let's look at the engines available.</p>
<pre>
mysql> show engines;
+------------+----------+----------------------------------------------------------------+
| Engine     | Support  | Comment                                                        |
+------------+----------+----------------------------------------------------------------+
| MyISAM     | DEFAULT  | Default engine as of MySQL 3.23 with great performance         | 
| MEMORY     | YES      | Hash based, stored in memory, useful for temporary tables      | 
| InnoDB     | DISABLED | Supports transactions, row-level locking, and foreign keys     | 
| BerkeleyDB | NO       | Supports transactions and page-level locking                   | 
| BLACKHOLE  | NO       | /dev/null storage engine (anything you write to it disappears) | 
| EXAMPLE    | NO       | Example storage engine                                         | 
| ARCHIVE    | YES      | Archive storage engine                                         | 
| CSV        | YES      | CSV storage engine                                             | 
| ndbcluster | DISABLED | Clustered, fault-tolerant, memory-based tables                 | 
| FEDERATED  | YES      | Federated MySQL storage engine                                 | 
| MRG_MYISAM | YES      | Collection of identical MyISAM tables                          | 
| ISAM       | NO       | Obsolete storage engine                                        | 
+------------+----------+----------------------------------------------------------------+
12 rows in set (0.00 sec)</pre>
<p>Oh, so InnoDB has been disabled. We can fix that easily by removing
<tt>skip-innodb</tt> from <tt>my.cnf</tt>.</p>
<pre>root@cmsdb01:/var/log# grep skip-innodb /etc/mysql/my.cnf
root@cmsdb01:/var/log#</pre>
<p>But hang on a second, that's not in the config file. What's going on?
It turns out that InnoDB is disabled because the
<tt>innodb_log_file_size</tt> setting doesn't match the size of the log files on disk.</p>
<pre>
root@cmsdb01:/var/log# grep innodb_log_file_size /etc/mysql/my.cnf
innodb_log_file_size            = 512M
root@cmsdb01:/var/log# ls -lh /var/lib/mysql/ib_logfile*
-rw-rw---- 1 mysql mysql 5.0M 2006-12-19 18:39 /var/lib/mysql/ib_logfile0
-rw-rw---- 1 mysql mysql 5.0M 2006-12-19 18:39 /var/lib/mysql/ib_logfile1
</pre>
<p>Rumour has it that you can just stop MySQL, delete these log files
and start MySQL again. I have yet to try this as the server in question is
in production use. The alternative is to change the
<tt>innodb_log_file_size</tt> setting to match the size of the existing files.</p>
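<p>To spot this mismatch before it bites, you could compare the
configured size with the files on disk. The following is a rough sketch,
not a tested tool: the <tt>size_to_bytes</tt> helper is hypothetical, it
only understands plain byte counts and the K, M and G suffixes, and the
value and paths are simply the ones from the listing above.</p>
<pre>use strict;
use warnings;

# Hypothetical helper: convert a my.cnf size such as "512M" into bytes.
# Only bare numbers and the K, M and G suffixes are handled.
sub size_to_bytes {
    my ($size) = @_;
    my %multiplier = (K => 1024, M => 1024**2, G => 1024**3);
    if ($size =~ /^(\d+)([KMG])?$/i) {
        return $1 * (defined $2 ? $multiplier{uc $2} : 1);
    }
    die "Unrecognised size: $size\n";
}

my $configured = size_to_bytes('512M');   # value from my.cnf above

# Paths as shown in the listing above; adjust for your data directory.
for my $file (glob '/var/lib/mysql/ib_logfile*') {
    my $actual = (-s $file) || 0;
    print "$file: $actual bytes on disk, $configured configured\n"
        if $actual != $configured;
}</pre>
<p>On a machine without those files, or without permission to read the
directory, the loop simply prints nothing.</p>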

<p>So in summary, the problems with MySQL are:</p>
<ul>
<li>Not logging warnings anywhere useful.</li>
<li>Converting engine types with a warning rather than throwing an
error. This can be fixed by setting <tt>sql_mode</tt> to include
<tt>NO_ENGINE_SUBSTITUTION</tt>.</li>
<li>Starting up and disabling InnoDB when there is a problem rather than
failing to start, giving a false impression that everything is
working.</li>
</ul>
<p>MySQL has not impressed me this week.</p>

      <div><a href="http://www.davidpashley.com/blog/databases/mysql/innodb-disabled" title="Permalink">Read Comments</a> </div>
]]></description>
</item>

</channel>
</rss>

