David Pashley.com

Asymmetric Routing and Flow Sessions in JUNOS ES

August 20, 2008

We’ve recently installed a couple of Juniper J-Series routers that
have the new JUNOS with Enhanced Services installed on them. During the
transition from our existing Linux routers, we started moving internal
subnets to the new routers, but when we moved the first subnet, we
discovered a problem with hosts that had addresses on two different
subnets. Connections would either connect for a minute and work and then
get a connection reset, or packets would come in to the server and leave
again, but then get swallowed in the ether.

I spent quite a bit of time this week reading about the security
features of the new routers, and finally came up with a solution. The
first clue was that I was getting connection reset from something on the
network, but carrying out packet sniffing on our existing routers and
the end points showed that they weren’t generating it. I eventually
found the tcp-rst option, which generates a reset packet for
any non-SYN packet that doesn’t match an existing flow session. JUNOS ES
does stateful packet inspection by creating a session when it sees an
initial SYN packet and then does filtering and routing based on that
flow session so it doesn’t have to do it for every packet. When I turned
off the tcp-rst option on the trust zone, my connection that
worked for a minute worked again only for a minute, but this time, it
just hung, rather than dying with a connection reset. This cemented the
idea that the Juniper routers were the cause.

It turned out that the problem was that there was asynchronous
routing going on. A packet was coming in to 10.0.0.1/24, but the server
was also on 10.0.1.1/24 and the default route nexthop was 10.0.1.2.
Depending on which subnet we moved depended on the resulting behavour.
If we moved 10.0.0.0/24 to the new routers, they would only see the
incoming side of the conversation. If we moved 10.0.1.0/24, they would
only see the outgoing side of the conversation. If we then think how
this would work with the session-based routing and firewalling, in the
first case, the router would see the initial SYN packet, but would never
see the returning SYN-ACK packet, and after an initial timeout, decide
the flow never established and destroy the session info, resulting in
further incoming packets to be dropped. In the second case, it would
never see the SYN packet, only the SYN-ACK. This packet wouldn’t belong
to an existing session, so would be blocked. The solution is to turn off
the SYN check, using:

[edit]
user@host# set security flow tcp-session no-syn-check

After commiting that, sessions work correctly, even without the
router seeing both sides of the connection:

user@host> show security flow session destination-prefix 10.0.0.1
Session ID: 201341, Policy name: default-permit/4, Timeout: 1798
  In: 192.168.0.1/61136 --> 10.0.0.1/22;tcp, If: ge-0/0/0.7
  Out: 10.0.0.1/22 --> 192.168.0.1/61136;tcp, If: ge-0/0/1.7

1 sessions displayed

Sadly, I couldn’t find much information on the no-syn-check
option, so hopefully people will find this explaination useful.

Rebuilding a RAID array

July 12, 2008

I recently had a failed drive in my RAID1 array. I’ve just installed
the replacement drive and thought I’d share the method.

Let’s look at the current situation:

root@ace:~# cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda3[1]
      483403776 blocks [2/1] [_U]

md0 : active raid1 sda1[1]
      96256 blocks [2/1] [_U]

unused devices: <none>

So we can see we have two mirrored arrays with one drive missing in both.

Let’s see that we’ve recognised the second drive:

root@ace:~# dmesg | grep sd
[   21.465395] Driver 'sd' needs updating - please use bus_type methods
[   21.465486] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[   21.465496] sd 2:0:0:0: [sda] Write Protect is off
[   21.465498] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   21.465512] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.465562] sd 2:0:0:0: [sda] 976773168 512-byte hardware sectors (500108 MB)
[   21.465571] sd 2:0:0:0: [sda] Write Protect is off
[   21.465573] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[   21.465587] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.465590]  sda: sda1 sda2 sda3
[   21.487248] sd 2:0:0:0: [sda] Attached SCSI disk
[   21.487303] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[   21.487314] sd 2:0:1:0: [sdb] Write Protect is off
[   21.487317] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[   21.487331] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.487371] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[   21.487381] sd 2:0:1:0: [sdb] Write Protect is off
[   21.487382] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[   21.487403] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[   21.487407]  sdb: unknown partition table
[   21.502763] sd 2:0:1:0: [sdb] Attached SCSI disk
[   21.506690] sd 2:0:0:0: Attached scsi generic sg0 type 0
[   21.506711] sd 2:0:1:0: Attached scsi generic sg1 type 0
[   21.793835] md: bind<sda1>
[   21.858027] md: bind<sda3>

So, sda has three partitions, sda1, sda2 and sda3, and sdb has no partition
table. Let’s give it one the same as sda. The easiest way to do this is using
sfdisk:

root@ace:~# sfdisk -d /dev/sda | sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an MSDOS signature
 /dev/sdb: unrecognised partition table type
Old situation:
No partitions found
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1   *        63    192779     192717  fd  Linux RAID autodetect
/dev/sdb2        192780   9960299    9767520  82  Linux swap / Solaris
/dev/sdb3       9960300 976768064  966807765  fd  Linux RAID autodetect
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)

If we check dmesg now to check it’s worked, we’ll see:

root@ace:~# dmesg | grep sd
...
[  224.246102] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[  224.246322] sd 2:0:1:0: [sdb] Write Protect is off
[  224.246325] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[  224.246547] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  224.246686]  sdb: unknown partition table
[  227.326278] sd 2:0:1:0: [sdb] 976773168 512-byte hardware sectors (500108 MB)
[  227.326504] sd 2:0:1:0: [sdb] Write Protect is off
[  227.326507] sd 2:0:1:0: [sdb] Mode Sense: 00 3a 00 00
[  227.326703] sd 2:0:1:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[  227.326708]  sdb: sdb1 sdb2 sdb3

So, now we have identical partition tables. The next thing to do is to add the new partitions to the array:

root@ace:~# mdadm /dev/md0 --add /dev/sdb1
mdadm: added /dev/sdb1
root@ace:~# mdadm /dev/md1 --add /dev/sdb3
mdadm: added /dev/sdb3

Everything looks good. Let’s check dmesg:

[  323.941542] md: bind<sdb1>
[  324.038183] RAID1 conf printout:
[  324.038189]  --- wd:1 rd:2
[  324.038192]  disk 0, wo:1, o:1, dev:sdb1
[  324.038195]  disk 1, wo:0, o:1, dev:sda1
[  324.038300] md: recovery of RAID array md0
[  324.038303] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  324.038305] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  324.038310] md: using 128k window, over a total of 96256 blocks.
[  325.417219] md: md0: recovery done.
[  325.453629] RAID1 conf printout:
[  325.453632]  --- wd:2 rd:2
[  325.453634]  disk 0, wo:0, o:1, dev:sdb1
[  325.453636]  disk 1, wo:0, o:1, dev:sda1
[  347.970105] md: bind<sdb3>
[  348.004566] RAID1 conf printout:
[  348.004571]  --- wd:1 rd:2
[  348.004573]  disk 0, wo:1, o:1, dev:sdb3
[  348.004574]  disk 1, wo:0, o:1, dev:sda3
[  348.004657] md: recovery of RAID array md1
[  348.004659] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
[  348.004660] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
[  348.004664] md: using 128k window, over a total of 483403776 blocks.

Everything still looks good. Let’s sit back and watch it rebuild using the wonderfully useful watch command:

root@ace:~# watch -n 1 cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb3[2] sda3[1]
      483403776 blocks [2/1] [_U]
      [=====>...............]  recovery = 26.0% (126080960/483403776) finish=96.2min speed=61846K/sec

md0 : active raid1 sdb1[0] sda1[1]
      96256 blocks [2/2] [UU]

unused devices: <none>

The Ubuntu and Debian installers will allow you create RAID1 arrays
with less drives than you actually have, so you can use this technique
if you plan to add an additional drive after you’ve installed the
system. Just tell it the eventual number of drives, but only select the
available partitions during RAID setup. I used this method when a new machine recent
didn’t have enough SATA power cables and had to wait for an adaptor to
be delivered.

(Why did no one tell me about watch until recently. I wonder
how many more incredibly useful programs I’ve not discovered even after 10
years of using Linux)

index sambaSID sub

June 10, 2008

If you get the following error:

/etc/ldap/slapd.conf: line 127: substr index of attribute "sambaSID" disallowed

when you run slapindex, then you haven’t updated your
samba.schema to the version from Samba 3.0.23. Dapper and Edgy
had 3.0.22, so if you’ve recently upgraded to Hardy, you will see this
problem. The file should have an MD5 of
0e23b3ad05cd2b38a302fe61c921f300. I’m hoping this resolves
problems I have with samba not picking up group membership changes. I’ll
update if it does.

Update: Having installed the new schema and run slapindex, net rpc info shows I have twelve groups when previously it showed zero. This may not solve my group membership problems, but it can’t be a step backwards.

Compiled Regexes in Spamassassin 3.2

June 9, 2008

Spamassassin 3.2, which is available in Gutsy and Lenny, comes with a new feature to increase performance by
compiling its regular expressions using re2c. It’s very quick to enable.
First, you need to install the required packages:

apt-get install re2c libc6-dev gcc make

Next, edit /etc/spamassassin/v320.pre and uncomment the line
that says:

loadplugin Mail::SpamAssassin::Plugin::Rule2XSBody

Next pre-compile the regular expressions using sa-compile:

femme:/etc/logcheck# sa-compile
[18741] info: generic: base extraction starting. this can take a while...
[18741] info: generic: extracting from rules of type body_0
100% [===========================] 3293.83 rules/sec 00m00s DONE
100% [===========================] 650.12 bases/sec 00m01s DONE
[18741] info: body_0: 647 base strings extracted in 2 seconds
[snip compiler output]
make install
Files found in blib/arch: installing files in blib/lib into architecture dependent library tree
Installing /var/lib/spamassassin/compiled/3.002004/auto/Mail/SpamAssassin/CompiledRegexps/body_0/body_0.so
Installing /tmp/.spamassassin18741hDrlUQtmp/ignored/man/man3/Mail::SpamAssassin::CompiledRegexps::body_0.3pm
Writing /var/lib/spamassassin/compiled/3.002004/auto/Mail/SpamAssassin/CompiledRegexps/body_0/.packlist
Appending installation info to /var/lib/spamassassin/compiled/3.002004/perllocal.pod
cp /tmp/.spamassassin18741hDrlUQtmp/bases_body_0.pl /var/lib/spamassassin/compiled/3.002004/bases_body_0.pl
cd /
rm -rf /tmp/.spamassassin18741hDrlUQtmp

Finally, restart spamassassin, and you should find it runs faster.
You will need to run sa-compile every time you update your rules, or
they won’t take effect.

If you get the following warning:

Can't locate Mail/SpamAssassin/CompiledRegexps/body_0.pm in @INC

you forgot to run sa-compile; re-run it and the error should go
away.

Apache 2.2 auth_ldap config

May 22, 2008

Apache 2.2 changed the way you configure LDAP authentication.
mod_auth_ldap was replaced with mod_authnz_ldap, so don’t forget to
enable the new module and disable the old one. Because I’ll always
forget, here’s the new style config.

AuthType basic
AuthName "admin"
AuthBasicProvider ldap
AuthLDAPUrl ldap://ldap.example.com:389/ou=people,dc=example,dc=com?uid?sub
AuthLDAPGroupAttributeIsDN off
Require ldap-group cn=systems,ou=groups,dc=example,dc=com
AuthLDAPGroupAttribute memberUid

The sections in bold are the sections I had to change from the 2.0
config.

Not like not like not like

May 1, 2008

I should have mentioned that my previous
blog posting was using MySQL
4.0(4.0.23_Debian-3ubuntu2-log). It seems that in 5.0.2 they changed the
precedence of the NOT operator to be lower than LIKE.
From the manual:

The precedence shown for NOT is as of MySQL 5.0.2. For
earlier versions, or from 5.0.2 on if the HIGH_NOT_PRECEDENCE SQL mode
is enabled, the precedence of NOT is the same as that of the !
operator.

Using 5.0 (5.0.22-Debian_0ubuntu6.06.8-log), and a slightly smaller
dataset, I get:

mysql> select count(*) from Table where blobid is null or not blobid like '%-%';
+----------+
| count(*) |
+----------+
|   199057 |
+----------+
1 row in set (3.26 sec)

mysql> select count(*) from Table where blobid is null or blobid not like '%-%';
+----------+
| count(*) |
+----------+
|   199057 |
+----------+
1 row in set (0.96 sec)

Jim Kingdon
experimented with other databases and was unable to reproduce this
problem. My test with PostgreSQL 8.3:

quux=> create table foo (blobid varchar(255));
CREATE TABLE
quux=> insert into foo (blobid) values
   ('5cd1237469cc4b52ca094e215156c582-9ef460ac4134c600a4d2382c4b0acee7'),
   (NULL),
   ('d20cb4037f8f9ab1de5de264660f005c-2c34209dcfb39251cf7c16bb6754bbd2'),
   ('845a8d06719d8bad521455a8dd47745c-095d9a0831433c92cd269e14e717b3a9'),
   ('9580ed23f34dd68d35da82f7b2a293d6-bf39df7509d977a1de767340536ebe80'),
   ('06c9521472cdac02a2d4b2a18f8bec0f-0a8a28d3b63df54860055f1d1de92969'),
   ('ed3cd0dd9b55f76db7544eeb64f3cfa0-80a6a3eb6d73c0a58f88b7c332866d5c'),
   (NULL),
   ('b339f6545651fbfa49fa500b7845c4ce-6defb5ffc188b8f72f1aa10bbd5c6bec'),
   ('642075963d6f69bb11c35a110dd07c2c8db54ac2d2accae7fa4a22db1d6caae9');
INSERT 0 10
quux=> select count(*) from foo
   where blobid is null or blobid not like '%-%';
 count
-------
     3
(1 row)

quux=> select count(*) from foo
   where blobid is null or not blobid like '%-%';
 count
-------
     3
(1 row)

quux=> select not blobid from foo limit 10;
ERROR:  argument of NOT must be type boolean, not type character varying

This appears to have been the case since at least 7.4

Problems like this is going to make the transition from MySQL 4.0 to 5.x all the more fun when we get around to doing it.

Not like not like not like

April 30, 2008

Dear lazyweb,

I’m possibly being stupid, but can someone explain the differences
between these two queries?

mysql> select count(*) from Table
   where blobid is null or not blobid like '%-%';
+----------+
| count(*) |
+----------+
| 15262487 |
+----------+
1 row in set (25 min 4.18 sec)

mysql> select count(*) from Table
   where blobid is null or blobid not like '%-%';
+----------+
| count(*) |
+----------+
| 20044216 |
+----------+
1 row in set (24 min 54.06 sec)

For reference:

mysql> select count(*) from Table where blobid is null;
+----------+
| count(*) |
+----------+
| 15262127 |
+----------+
1 row in set (24 min 7.15 sec)

Update: It turns out that the former was doing (not blobid) like '%-%' which turns out to not do anything sensible:

mysql> select not blobid from Table limit 10;
+------------+
| not blobid |
+------------+
|          0 |
|       NULL |
|          1 |
|          0 |
|          0 |
|          0 |
|          1 |
|       NULL |
|          1 |
|          0 |
+------------+
10 rows in set (0.02 sec)

mysql> select blobid from Table limit 10;
+-------------------------------------------------------------------+
| blobid                                                            |
+-------------------------------------------------------------------+
| 5cd1237469cc4b52ca094e215156c582-9ef460ac4134c600a4d2382c4b0acee7 |
| NULL                                                              |
| d20cb4037f8f9ab1de5de264660f005c-2c34209dcfb39251cf7c16bb6754bbd2 |
| 845a8d06719d8bad521455a8dd47745c-095d9a0831433c92cd269e14e717b3a9 |
| 9580ed23f34dd68d35da82f7b2a293d6-bf39df7509d977a1de767340536ebe80 |
| 06c9521472cdac02a2d4b2a18f8bec0f-0a8a28d3b63df54860055f1d1de92969 |
| ed3cd0dd9b55f76db7544eeb64f3cfa0-80a6a3eb6d73c0a58f88b7c332866d5c |
| NULL                                                              |
| b339f6545651fbfa49fa500b7845c4ce-6defb5ffc188b8f72f1aa10bbd5c6bec |
| 642075963d6f69bb11c35a110dd07c2c-8db54ac2d2accae7fa4a22db1d6caae9 |
+-------------------------------------------------------------------+
10 rows in set (0.00 sec)

The documentation
says Logical NOT. Evaluates to 1 if the operand is 0, to 0 if the
operand is non-zero, and NOT NULL returns NULL. but doesn’t
describe the behaviour of NOT 'string'. It would appear that a
string starting with a number returns 0 and a string starting with a
letter returns 1. Either way, neither has a hyphen in.

User Administration under PostgreSQL 8.3

April 28, 2008

A while ago I published an article on PostgreSQL
user administration. Typically, things have changed since I wrote
that article. I thought I’d detail a couple of the differences since
I wrote that guide.

The major difference is that you now have roles rather than users and
you use the CREATE ROLE command to create them instead of
CREATE USER, although the latter command still works. The
command line options for the createuser command have changed as
a result too. Before superuser and the ability to create new users were
the same thing. Now you can give a role permissions to create new roles
without giving them superuser powers. The options are now -s for
superuser and -S for not superuser, -d to allow them to create
databases and -D to disallow database creation and -r to allow the new
role to create other roles and -R to prevent them. for a standard user
you probably want somethig like:

createuser -S -D -R -P user

The -P makes createuser ask you for a password for
the new role.

You can find out more information about the new role system in
PostgreSQL in the user
management and CREATE ROLE reference sections of the manual.

Upgrading to latest Pyblosxom

April 26, 2008

I’m currently upgrading my blog to PyBlosxom 1.4.3. I apologise for
any broken links or entry flooding.

Update: I’ve finished playing now. I’ve upgraded to
1.4.3 and I don’t think I’ve broken anything yet.

I’ve also taken the opportunity to add a couple of plugins to add
tagging to entries and added the obligatory tag cloud to the side bar
rather than the list of months. I’m going to make some changes to the
comment plugin later to add OpenID support. I’d be interested to know of
any other pyBlosxom plugins you find useful.

I did manage to make a mistake by using vim to edit entries to
add some tags rather than my wrapper script to keep timestamps the same.
This is where I’m glad I have a database table with the metadata from
all my entries to hand. A quick touch foo.txt -d 2006-06-07 19:02:57+01 later and everything was fixed. Hopefully not too many
people got bitten by the few entries that had new dates for a few
minutes. Please let me know if you notice anything broken.

Violating Perl Module Namespaces

April 24, 2008

Perl doesn’t enforce access to modules’ namespaces. This would
usually be considered a bad thing, but sometimes it allows us to work
around problems in modules without changing their code. Here’s a perfect
example:

I’ve been writing a script to talk to an XML-RPC endpoint, using
Frontier::Client but for
one of the requests, the script throws the following error:

wanted a data type, got `ex:i8'

Turning on debugging showed the response type was indeed ex:i8, which
isn’t one of the types that Frontier::Client supports.

<?xml version="1.0" encoding="UTF-8"?>
<methodResponse xmlns:ex="http://ws.apache.org/xmlrpc/namespaces/extensions">
  <params>
    <param>
      <value>
        <ex:i8>161</ex:i8>
      </value>
    </param>
  </params>
</methodResponse>

Searching through the code shows Frontier::Client is a wrapper around
Frontier::RPC2 and the error message happens at the following
section:

   } elsif ($scalars{$tag}) {
       $expat->{'rpc_text'} = "";
       push @{ $expat->{'rpc_state'} }, 'cdata';
   } else {
       Frontier::RPC2::die($expat, "wanted a data type, got `$tag'n");
   }

So we can see that it’s looking up the tag into a hash called
%scalars to see if the type is a scalar type, otherwise throws
the error we saw. Looking at the top, we can see this hash:

%scalars = (
    'base64' => 1,
    'boolean' => 1,
    'dateTime.iso8601' => 1,
    'double' => 1,
    'int' => 1,
    'i4' => 1,
    'string' => 1,
);

So, if we could add ex:i8 to this scalar, we could fix the
problem. We could fix the module, but that would require every user of
the script to patch their copy of the module. The alternative is to
inject something into that hash across module boundaries, which we can
do by just refering to the hash by it’s complete name including the
package name. We can use:

$Frontier::RPC2::scalars{'ex:i8'} = 1;

Now when we run the script, everything works. It’s not nice and it’s
dependent on Frontier::RPC2 not changing. but it allows us to get on
with our script.