David Pashley.com

NullPointerExceptions in Xerces-J

January 18, 2016

Xerces is an XML library for several languages, and is a very common library in Java.

I recently came across a problem with code intermittently throwing a NullPointerException inside the library:

[sourcecode lang="text"]java.lang.NullPointerException
at org.apache.xerces.dom.ParentNode.nodeListItem(Unknown Source)
at org.apache.xerces.dom.ParentNode.item(Unknown Source)
at com.example.xml.Element.getChildren(Element.java:377)
at com.example.xml.Element.newChildElementHelper(Element.java:229)
at com.example.xml.Element.newChildElement(Element.java:180)
…
[/sourcecode]

You may also find the NullPointerException in ParentNode.nodeListGetLength() and other locations in ParentNode.

Debugging this was not helped by the fact that the xercesImpl.jar is stripped of line numbers, so I couldn’t find the exact issue. After some searching, it appeared that the issue was down to the fact that Xerces is not thread-safe. ParentNode caches iterations through the NodeList of children to speed up performance and stores them in the Node’s Document object. In multi-threaded applications, this can lead to race conditions and NullPointerExceptions. And because it’s a threading issue, the problem is intermittent and hard to track down.

The solution is to synchronise your code on the DOM, and this means the Document object, everywhere you access the nodes. I’m not certain exactly which methods need to be protected, but I believe it needs to be at least any function that will iterate a NodeList. I would start by protecting every access and testing performance, and removing some if needed.

[sourcecode lang="java"]/**
 * Returns the concatenation of all the text in all child nodes
 * of the current element.
 */
public String getText() {
    StringBuilder result = new StringBuilder();

    // Lock on the owning Document; every other DOM access must take the same lock.
    synchronized (m_element.getOwnerDocument()) {
        NodeList nl = m_element.getChildNodes();
        for (int i = 0; i < nl.getLength(); i++) {
            Node n = nl.item(i);

            if (n != null && n.getNodeType() == org.w3c.dom.Node.TEXT_NODE) {
                result.append(((CharacterData) n).getData());
            }
        }
    }

    return result.toString();
}
[/sourcecode]

Notice the “synchronized (m_element.getOwnerDocument()) {}” block around the section that deals with the DOM. The NPE would normally be thrown on the nl.getLength() or nl.item() calls.

Since putting in the synchronized blocks, we’ve gone from 78 NPEs between 2:30am and 3:00am to zero in the last 12 hours, so I think it’s safe to say this has drastically reduced the problem.
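If you end up wrapping a lot of methods this way, the locking can be pulled into one place. The following is only a sketch of that idea, not code from the application above: the class DomLock and its methods lockFor, withLock and textOf are hypothetical names, and it assumes that every thread touching the DOM goes through the same helper. It uses only the standard JAXP and org.w3c.dom APIs.

```java
import java.util.function.Supplier;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper that runs DOM access while holding the owning Document's monitor. */
final class DomLock {

    private DomLock() {}

    /** Returns the object to lock on: the owner Document, or the node itself if it is a Document. */
    static Object lockFor(Node node) {
        Document owner = node.getOwnerDocument(); // null when node is itself a Document
        return owner != null ? owner : node;
    }

    /** Runs the action while holding the lock for the given node's document. */
    static <R> R withLock(Node node, Supplier<R> action) {
        synchronized (lockFor(node)) {
            return action.get();
        }
    }

    /** Concatenates the text of all direct child text nodes, touching the NodeList only inside the lock. */
    static String textOf(Element element) {
        return withLock(element, () -> {
            StringBuilder result = new StringBuilder();
            NodeList children = element.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node n = children.item(i);
                if (n != null && n.getNodeType() == Node.TEXT_NODE) {
                    result.append(n.getNodeValue());
                }
            }
            return result.toString();
        });
    }

    /** Creates an empty Document with whichever JAXP implementation is on the classpath. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser", e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        root.appendChild(doc.createTextNode("Hello, "));
        root.appendChild(doc.createElement("b"));
        root.appendChild(doc.createTextNode("world"));
        System.out.println(textOf(root)); // prints "Hello, world"
    }
}
```

The lock only helps if writers take it too, so code that adds or removes nodes would need to go through withLock as well.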

Working with development servers

April 23, 2014

I can’t believe that this is not a solved problem by now, but my Google-fu is failing me. I’m looking for a decent, working extension for Chrome that can redirect a list of hosts to a different server while setting the Host: header to the right address. Everything I’ve found so far assumes that you’re running the servers on different URLs. I’m using the same URL on different servers and don’t want to mess around with /etc/hosts.

Please tell me something exists to do this?

Bad Password Policies

April 16, 2014

After the whole Heartbleed fiasco, I’ve decided to continue my march towards improving my online security. I’d already begun using LastPass to store my passwords and generate random passwords for each site, but I hadn’t finished: some sites still shared the same password, and some had less than ideal strength passwords. So I spent some time today improving my password position. Here are some of the bad examples of password policy I’ve discovered today.

First up we have Live.com. A maximum of 16 characters from the Microsoft auth service. Seems to accept any character though.

This excellent example is from creditexpert.co.uk, one of the credit agencies here in the UK. They not only restrict to 20 characters, they restrict you to @, ., _ or |. So much for teaching people how to protect themselves online.

Here’s Tesco.com after attempting to change my password to “QvHn#9#kDD%cdPAQ4&b&ACb4x%48#b”. If you can figure out how this violates their rules, I’d love to know. And before you ask, I tried without numbers and that still failed so it can’t be the “three and only three” thing. The only other idea might be that they meant “i.e.” rather than “e.g.”, but I didn’t test that.

Edit: Here is a response from Tesco on Twitter:

Here’s a poor choice from ft.com, refusing to accept non-alphanumeric characters. On the plus side they did allow the full 30 characters in the password.

The finest example of a poor security policy is a company who will remain nameless due to their utter lack of security. Not only did they not use HTTPS, they accepted a 30 character password and silently truncated it to 20 characters. The reason I know this is because when I logged out and tried to log in again and then used the “forgot my password” option, they emailed me the password in plain text.

I have also been setting up two-factor authentication where possible. Most sites use the Google Authenticator application on your mobile to give you a 6 digit code to type in alongside your password. I highly recommend you set it up too. There’s a useful list of sites that implement 2FA and links to their documentation at http://twofactorauth.org/.

I realise that my choice of LastPass requires me to trust them, but I think the advantages outweigh the disadvantages of having many sites using the same passwords and/or low strength passwords. I know various people cleverer than me have looked into their system and failed to find any obvious flaws.

Remember people, when you implement a password policy, allow the following things:

Any length of password. You don’t have to worry about length in your database, because when you hash the password, it will be a fixed length. You are hashing your passwords, aren’t you?
Any character. The more possible characters that can be in your passwords, the harder it will be to brute force, as you are increasing the number of permutations a hacker needs to try.
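To illustrate the fixed-length point, here is a minimal sketch using PBKDF2 from the standard JDK. The choice of PBKDF2, the iteration count, the salt size and the class name PasswordHashDemo are all assumptions made for the example rather than a recommendation for any particular site. Whatever the length or character set of the input, the stored hash is the same size.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.spec.InvalidKeySpecException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

/** Hypothetical demo: the hash length does not depend on the password length. */
final class PasswordHashDemo {

    // Assumed parameters for illustration only; tune them for your own hardware.
    static final int ITERATIONS = 100_000;
    static final int KEY_LENGTH_BITS = 256;

    private PasswordHashDemo() {}

    /** Generates a random 16-byte salt; store it alongside the hash. */
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    /** Derives a KEY_LENGTH_BITS-bit hash from a password of any length. */
    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_LENGTH_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec)
                    .getEncoded();
        } catch (NoSuchAlgorithmException | InvalidKeySpecException e) {
            throw new IllegalStateException("PBKDF2 is not available", e);
        } finally {
            spec.clearPassword(); // wipe the copy of the password held by the spec
        }
    }

    public static void main(String[] args) {
        byte[] salt = newSalt();
        // Both print 32: a 256-bit hash regardless of input length.
        System.out.println(hash("short".toCharArray(), salt).length);
        System.out.println(hash("QvHn#9#kDD%cdPAQ4&b&ACb4x%48#b".toCharArray(), salt).length);
    }
}
```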

If you are going to place restrictions, please make sure the documentation matches the implementation, provide a client-side implementation to match and provide quick feedback to the user, and make sure you explicitly say what is wrong with the password, rather than referring back to the incorrect documentation.
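As a rough sketch of what explicit feedback could look like, the validator below returns one message per broken rule instead of a generic failure. The rules themselves (a minimum length and not containing the username), the limit of 12 and the class name PasswordPolicy are hypothetical and chosen only for illustration. The point is that the messages are generated from the same code that enforces the rules, so they cannot drift apart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical policy check that reports exactly which rules a password breaks. */
final class PasswordPolicy {

    // Assumed limit for illustration; note there is deliberately no maximum length.
    static final int MIN_LENGTH = 12;

    private PasswordPolicy() {}

    /** Returns one human-readable message per violated rule; an empty list means the password is accepted. */
    static List<String> violations(String username, String password) {
        List<String> problems = new ArrayList<>();

        // Count code points so characters outside the BMP are not counted twice.
        int length = password.codePointCount(0, password.length());
        if (length < MIN_LENGTH) {
            problems.add("Too short: " + length + " characters, minimum is " + MIN_LENGTH + ".");
        }

        if (!username.isEmpty()
                && password.toLowerCase(Locale.ROOT).contains(username.toLowerCase(Locale.ROOT))) {
            problems.add("Must not contain your username.");
        }

        return problems;
    }

    public static void main(String[] args) {
        // Prints the single "Too short" message for a 3-character password.
        violations("david", "abc").forEach(System.out::println);
    }
}
```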

There are also many JS password strength meters available to show how secure the inputted passwords are. They are possibly a better way of providing feedback about security than having arbitrary policies that actually harm your security. As someone said to me on Twitter, it’s not like “password is too strong” was ever a bad thing.
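The effect of restricting characters or length can be estimated with some simple arithmetic. The sketch below assumes the password is chosen uniformly at random, as a generator such as LastPass would do; passwords picked by humans are far more predictable and real strength meters account for that. The class name KeyspaceEstimate and the alphabet sizes (62 for case-sensitive alphanumerics, 94 for printable ASCII without the space) are assumptions for the example.

```java
/** Hypothetical estimate of the search space for a uniformly random password. */
final class KeyspaceEstimate {

    private KeyspaceEstimate() {}

    /** Bits of entropy: log2(alphabetSize ^ length) = length * log2(alphabetSize). */
    static double bits(int alphabetSize, int length) {
        return length * (Math.log(alphabetSize) / Math.log(2));
    }

    public static void main(String[] args) {
        // 20 characters, alphanumeric only versus all printable ASCII symbols.
        System.out.printf("62 symbols, 20 chars: %.1f bits%n", bits(62, 20));
        System.out.printf("94 symbols, 20 chars: %.1f bits%n", bits(94, 20));
        // Allowing the full 30 characters matters even more than the alphabet.
        System.out.printf("94 symbols, 30 chars: %.1f bits%n", bits(94, 30));
    }
}
```

For 20 characters that works out to roughly 119 bits for alphanumerics against roughly 131 bits for the full printable set, and every extra bit doubles the work for a brute-force attack.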

A New Chapter

September 23, 2013

It’s been a while since I posted anything to my personal site, but I figured I should update with the news that I’m leaving Brighton (and the UK) after nearly nine years living by the seaside. I’ll be sad to leave this city, which is the greatest place to live in the country, but I have the opportunity to go explore the world and I’d be crazy to let it pass me by.

So what am I doing? In ten short days, I plan to sell all my possessions bar those I need to live and work day to day and will be moving to Spain for three months. I’m renting a flat in Madrid, where I’ll continue to work for my software development business and set about improving both my Spanish and my fitness.

If you want to follow my adventures, or read about the reasons for my change, then check out the Experimental Nomad website.

Multiple Crimes

January 29, 2011

mysql> select "a" = "A";
+-----------+
| "a" = "A" |
+-----------+
|         1 |
+-----------+
1 row in set (0.00 sec)

WTF? (via Nuxeo)

Letter to my MP regarding the Digital Economy Bill

March 17, 2010

I have just sent the following email to my MP, David Lepper MP, outlining my concerns about the Digital Economy Bill. I urge you to write to your MP with a similar letter.

Open Rights Group’s guide to writing to your MP

From: David Pashley <david@davidpashley.com>
To: David Lepper
Cc:
Bcc:
Subject: Digital Economy Bill
Reply-To:

Dear Mr Lepper,

I'm writing to you to express my concern at the Digital Economy Bill
which is currently working its way through the House of Commons. I
believe that the bill as it stands will have a negative effect on
the digital economy that the UK and in particular Brighton have
worked so hard to foster.

Section 4-17 deals with disconnecting people reported as infringing
copyright. As it stands, this section will result in the possibility
that my internet connection could be disconnected as a result of the
actions of my flatmate. My freelance web development business is
inherently linked to my access of the Internet. I currently allow my
landlady to share my internet access with her holiday flat above me.
I will have to stop this arrangement for fear of a tourist's actions
jeopardising my business.

This section will also result in many pubs and cafes, much
favoured by Brighton's freelancers, removing their free wifi. I
have often used my local pub's wifi when I needed a change of
scenery. I know a great many freelancers use Cafe Delice in the
North Laine as a place to meet other freelancers and discuss
projects while drinking coffee and working.

Section 18 deals with ISPs being required to prevent access to sites
hosting copyrighted material. The ISPs can insist on a court
injunction forcing them to prevent access. Unfortunately, a great
many ISPs will not want to deal with the costs of any court
proceedings and will just block the site in question. A similar law
in the United States, the Digital Millennium Copyright Act (DMCA)
has been abused time and time again by spurious copyright claims to
silence critics or embarrassments.  A recent case is Microsoft
shutting down the entire Cryptome.org website because they were
embarrassed by a document they had hosted.  There are many more
examples of abuse at http://www.chillingeffects.org/

A concern is that there's no requirement for the accuser to prove
infringement has occurred, nor is there a valid defense that a user
has done everything possible to prevent infringement.

There are several ways to reduce copyright infringement of music and
movies without introducing new legislation. These include promoting
legal services like iTunes and Spotify, and easier access to legal
media, like Digital Rights Management free music. Many of the record
labels and
movie studios are failing to promote competing legal services which
many people would use if they were aware of them. A fairer
alternative to disconnection is a fine through the courts.

You can find further information on the effects of the Digital
Economy Bill at http://www.openrightsgroup.org/ and
http://news.bbc.co.uk/1/hi/technology/8544935.stm

The bill has currently passed the House of Lords and its first
reading in the Commons. There is a danger that without MPs demanding
to scrutinise this bill, this damaging piece of legislation will be
rushed through Parliament before the general election.

I ask you to demand your right to debate this bill and to amend the
bill to remove sections 4-18. I would also appreciate a response to
this email. If you would like to discuss the issues I've raised
further, I can be contacted on 01273 xxxxxx or 07966 xxx xxx or via
email at this address.

Thank you for your time.

--
David Pashley
david@davidpashley.com

Mod_fastcgi and external PHP

March 7, 2010

Has anyone managed to get a standard version of mod_fastcgi to work
correctly with FastCgiExternalServer? There seems to be a
complete lack of documentation on how to get this to work. I have
managed to get it working by removing some code which appears to
completely break AddHandler. However, people on the FastCGI
list told me I was wrong for making it work. So, if anyone has managed
to get it to work, please show me some working config.

Reducing Coupling between modules

February 25, 2010

In the past, several of my Puppet modules have
been tightly coupled. A perfect example is Apache and Munin. When I
install Apache, I want munin graphs set up. As a result my apache class
has the following snippet in it:

munin::plugin { "apache_accesses": }
munin::plugin { "apache_processes": }
munin::plugin { "apache_volume": }

This should make sure that these three plugins are installed and that
munin-node is restarted to pick them up. The define was implemented like
this:

define munin::plugin (
      $enable = true,
      $plugin_name = false,
      ) {

   include munin::node

   file { "/etc/munin/plugins/$name":
      ensure => $enable ? {
         true => $plugin_name ? {
            false => "/usr/share/munin/plugins/$name",
            default => "/usr/share/munin/plugins/$plugin_name"
         },
         default => absent
      },
      links => manage,
      require => Package["munin-node"],
      notify => Service["munin-node"],
   }
}

(Note: this is a slight simplification of the define). As you can
see, the define includes munin::node, as it needs the definition of the
munin-node service and package. As a result of this, installing Apache
drags in munin-node on your server too. It would be much nicer if the
apache class only installed the munin plugins if you also install munin
on the server.

It turns out that this is possible, using virtual
resources. Virtual resources allow you to define resources in one
place, but not make them happen unless you realise them. Using this, we
can make the file resource in the munin::plugin virtual and realise it
in our munin::node class. Our new munin::plugin looks like:

define munin::plugin (
      $enable = true,
      $plugin_name = false,
      ) {

   # removed "include munin::node"

   # Added @ in front of the resource to declare it as virtual
   @file { "/etc/munin/plugins/$name":
      ensure => $enable ? {
         true => $plugin_name ? {
            false => "/usr/share/munin/plugins/$name",
            default => "/usr/share/munin/plugins/$plugin_name"
         },
         default => absent
      },
      links => manage,
      require => Package["munin-node"],
      notify => Service["munin-node"],
      tag => munin-plugin,
   }
}

We add the following line to our munin::node class:

File<| tag == munin-plugin |>

The odd syntax in the munin::node class realises all the
virtual resources that match the filter, in this case, any that are
tagged munin-plugin. We’ve had to define this tag ourselves, as
the auto-generated tags don’t seem to work. You’ll also notice that
we’ve removed the munin::node include from the
munin::plugin define, which means that we no longer install
munin-node just by using the plugin define. I’ve used a similar
technique for logcheck, so additional rules are not installed unless
I’ve installed logcheck. I’m sure there are several other places where I
can use it to reduce such tight coupling between classes.

Maven and Grails 1.2 snapshot

December 22, 2009

Because I couldn’t find the information anywhere else, if you want to
use maven with Grails 1.2 snapshot, use:

mvn org.apache.maven.plugins:maven-archetype-plugin:2.0-alpha-4:generate \
    -DarchetypeGroupId=org.grails \
    -DarchetypeArtifactId=grails-maven-archetype \
    -DarchetypeVersion=1.2-SNAPSHOT \
    -DgroupId=uk.org.catnip \
    -DartifactId=armstrong \
    -DarchetypeRepository=http://snapshots.maven.codehaus.org/maven2

Conversations regarding printers

December 8, 2009

I just had the following conversation with my Linux desktop:

Me: “Hi, I’d like to use my new printer please.”

Computer: “Do you mean this HP Laserjet CP1515n on the network?”

Me: “Erm, yes I do.”

Computer: “Good. You’ve got a test page printing as we speak.
Anything else I can help you with?”

Sadly I don’t have any alternative modern operating systems to
compare it to, but having done similar things with Linux over the last
12 years, I’m impressed with how far we’ve come. Thank you to everyone
who made this possible.