IPv6 validation - more caveats
Last week I was taking a nice hot bath while reading the Regular Expression Cookbook by Jan Goyvaerts and Steven Levithan. Really, there is no better way to relax.
But then chapter 7.17 made me jump out of the tub, rush to my computer, and - while still wet - start typing the regular expression printed on page 387. The chapter was called 'Matching IPv6 Addresses'.
Having blogged about IPv6 validation just a couple of months ago, concluding that most IPv6 validation routines 'out there' get it wrong on some (or many) counts, I naturally wanted to know whether the expression offered in this book was any better (and frankly, Jan and Steven are both experts whom I admire greatly), especially since I had been made aware that my own validation routine was incorrect as well. Fortunately for me, Jan and Steven didn't get it 100% correct either.
Here's the expression (using PHP):
In fact, their expression failed on exactly the same cases as my routine, plus two more. First, it allows a leading 0 in the IPv4 part of a mixed IPv6 address for numbers between 10 and 99 (for example '010'), which the ABNF of RFC 3986 does not allow. Second, it fails to reject an address of the form ':10.0.0.1', which has only one leading colon instead of the two that mark the compressed form.
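The rule that forbids those leading zeros is dec-octet in the RFC 3986 ABNF. As a minimal sketch, here it is translated into a PCRE pattern and applied to a dotted quad like the one at the end of a mixed address; the names DEC_OCTET and is_rfc3986_ipv4() are hypothetical and only serve the illustration.

```php
<?php
// dec-octet from the RFC 3986 ABNF, written as a PCRE pattern:
//   DIGIT / %x31-39 DIGIT / "1" 2DIGIT / "2" %x30-34 DIGIT / "25" %x30-35
// No alternative allows a multi-digit number to start with 0.
const DEC_OCTET = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])';

// Hypothetical helper: checks a dotted-quad IPv4 address (as used in the
// ls32 part of a mixed IPv6 address) against four dec-octets.
function is_rfc3986_ipv4(string $s): bool
{
    $o = DEC_OCTET;
    // \A and \z anchor the pattern to the whole string
    return preg_match("/\\A$o\\.$o\\.$o\\.$o\\z/", $s) === 1;
}

var_dump(is_rfc3986_ipv4('10.0.0.1'));   // bool(true)
var_dump(is_rfc3986_ipv4('010.0.0.1'));  // bool(false): leading zero
var_dump(is_rfc3986_ipv4('10.0.0.099')); // bool(false): leading zero
```

A pattern that simply uses something like [0-9]{1,3} or [01]?[0-9]?[0-9] for each number is where '010' sneaks in.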
More interesting are the valid addresses they failed to recognise, which I had overlooked as well. These are the cases 'WCP' also pointed out on my previous blogpost: addresses of the form '::0:a:b:c:d:e:f' and 'a:b:c:d:e:f:0::'. Normally an IPv6 address that uses compression for "one or more groups of 16 bits of zeros" cannot contain more than 7 colons in total; only when the first or the last group (and only that group) is compressed does the address contain 8 colons. Both my approach and the one from the Regexp Cookbook allowed for at most 7 colons (including the double colon).
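The colon arithmetic becomes clearer when the '::' is expanded back into the zero groups it stands for. The sketch below does that for pure-hex addresses only (no embedded IPv4 part); expand_ipv6_groups() is a hypothetical name, and it is meant to illustrate the group counting rather than to serve as a complete validator.

```php
<?php
// Hypothetical helper: expands a pure-hex IPv6 address (no embedded IPv4
// part) into its eight 16-bit groups, or returns null if that is impossible.
function expand_ipv6_groups(string $address): ?array
{
    $halves = explode('::', $address);
    if (count($halves) > 2) {
        return null; // more than one "::" is never allowed
    }
    if (count($halves) === 1) {
        $groups = explode(':', $address); // no compression: all 8 groups needed
    } else {
        $head = $halves[0] === '' ? [] : explode(':', $halves[0]);
        $tail = $halves[1] === '' ? [] : explode(':', $halves[1]);
        $missing = 8 - count($head) - count($tail);
        if ($missing < 1) {
            return null; // "::" must replace at least one group
        }
        $groups = array_merge($head, array_fill(0, $missing, '0'), $tail);
    }
    if (count($groups) !== 8) {
        return null;
    }
    foreach ($groups as $group) {
        if (preg_match('/\A[0-9A-Fa-f]{1,4}\z/', $group) !== 1) {
            return null; // each group must be 1 to 4 hex digits
        }
    }
    return $groups;
}

echo implode(':', expand_ipv6_groups('::0:a:b:c:d:e:f')), "\n"; // 0:0:a:b:c:d:e:f
echo substr_count('::0:a:b:c:d:e:f', ':'), "\n";                 // 8
```

Here the '::' replaces exactly one group: seven explicit groups need six separating colons, and the double colon at the start adds two more, for eight in total.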
Even though the expression from the Regexp Cookbook uses a very nifty approach with an 'anchored' look-ahead, I would rather recommend the more straightforward expression that 'WCP' also posted on my previous blogpost, which is a literal translation of the RFC 3986 ABNF for IPv6 addresses:
Finally, here's my own IPv6 validation function, fixed for the case of 8 colons (and slightly faster than using a single regular expression):
The Regexp Cookbook says "Because of the different notations, matching an IPv6 address isn't nearly as simple as matching an IPv4 address." Based on my findings with several IPv6 matching algorithms, I'd say even that is an understatement. Implementors of software that deals with IPv6 (and with validating those addresses) should be very much aware of the corner cases introduced by allowing address compression.
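To make those corner cases concrete, here is a sketch of what a literal translation of the IPv6address rule in RFC 3986 (section 3.2.2) can look like in PHP. It builds the pattern from the ABNF building blocks h16, ls32 and dec-octet, with one alternative per line of the rule. It is an independent sketch, not a copy of any of the expressions discussed above, and the function name is_valid_ipv6() is hypothetical.

```php
<?php
// Sketch: literal translation of the RFC 3986 IPv6address ABNF into PCRE.
function is_valid_ipv6(string $address): bool
{
    // dec-octet: 0-9, 10-99, 100-199, 200-249, 250-255 (no leading zeros)
    $decOctet = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])';
    $ipv4 = "$decOctet\\.$decOctet\\.$decOctet\\.$decOctet";
    // h16: one to four hexadecimal digits
    $h16 = '[0-9A-Fa-f]{1,4}';
    // ls32: two h16 groups or an embedded IPv4 address
    $ls32 = "(?:$h16:$h16|$ipv4)";

    // One alternative per line of the ABNF rule; *N( ... ) becomes {0,N}
    $alternatives = [
        "(?:$h16:){6}$ls32",
        "::(?:$h16:){5}$ls32",
        "(?:$h16)?::(?:$h16:){4}$ls32",
        "(?:(?:$h16:){0,1}$h16)?::(?:$h16:){3}$ls32",
        "(?:(?:$h16:){0,2}$h16)?::(?:$h16:){2}$ls32",
        "(?:(?:$h16:){0,3}$h16)?::$h16:$ls32",
        "(?:(?:$h16:){0,4}$h16)?::$ls32",
        "(?:(?:$h16:){0,5}$h16)?::$h16",
        "(?:(?:$h16:){0,6}$h16)?::",
    ];

    // \A and \z anchor the whole string; \z (unlike \Z) rejects a trailing newline
    $pattern = '/\A(?:' . implode('|', $alternatives) . ')\z/';
    return preg_match($pattern, $address) === 1;
}

var_dump(is_valid_ipv6('::0:a:b:c:d:e:f'));  // bool(true): 8 colons
var_dump(is_valid_ipv6('a:b:c:d:e:f:0::'));  // bool(true): 8 colons
var_dump(is_valid_ipv6(':10.0.0.1'));        // bool(false): single leading colon
var_dump(is_valid_ipv6('::ffff:010.0.0.1')); // bool(false): leading zero
```

The second and last alternatives are the ones that admit the 8-colon forms: '::' followed by five groups plus ls32, and up to seven groups followed by a trailing '::'.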
Comments
I wonder if it's only a coincidence that you reported this bug in the PHP issue tracker (http://bugs.php.net/bug.php?id=50117) only a few hours after I reported it to ZF's issue tracker: http://framework.zend.com/issues/browse/ZF-8253
So the real question becomes: are you such a big fan of ZF that you're monitoring its issue tracker, or is it mere coincidence that we reported the same issue within hours of each other?
freakingme: I think it's a coincidence.
I realised after writing this blogpost that I forgot to report the issues with PHP's filter_var when I wrote about it last time, so I went ahead and reported them last night.
I use this version.
[Comment edited on Tuesday 24 November 2009 15:42]
I just ran across this issue, which basically describes that it allows a \n after the actual IP, something that's not mentioned in the IPv6 specs: http://framework.zend.com/issues/browse/ZF-8640
You probably want to replace that \Z with \z, as is done here: http://framework.zend.com...php?r1=18986&r2=19949 (bottom few changed lines).
freakingme: you're absolutely right; I mistakenly mixed up the meaning of \Z versus \z.
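The difference between the two anchors is easy to see with PHP's PCRE functions: \Z also matches just before a final newline, while \z matches only at the very end of the subject. The character class below is a deliberately simplified, hypothetical stand-in for an address check, not the Zend Framework pattern.

```php
<?php
// \Z: end of subject OR just before a final newline
// \z: end of subject only
$withUpperZ = '/\A[0-9a-f:]+\Z/i';
$withLowerZ = '/\A[0-9a-f:]+\z/i';

$input = "::1\n"; // a valid address followed by a trailing newline

var_dump(preg_match($withUpperZ, $input)); // int(1): the newline slips through
var_dump(preg_match($withLowerZ, $input)); // int(0): rejected
```

The same caveat applies to $ without the D modifier, which also matches before a final newline.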