IPv6 validation - more caveats
Last week I was taking a nice hot bath while reading the Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan. Really, there is no better way of relaxing.
But then chapter 7.17 made me jump out of the tub, rush to my computer, and - while still wet - start typing the regular expression printed on page 387. The chapter was called 'Matching IPv6 Addresses'.
Having blogged about IPv6 validation just a couple of months ago, with the conclusion that most IPv6 validation routines 'out there' get it wrong on some (or many) counts, I naturally wanted to know whether the expression offered in this book was any better (and frankly, Jan and Steven are both experts whom I admire greatly), especially since I have been made aware that my own validation routine is incorrect as well. Fortunately for me, Jan and Steven didn't get it 100% correct either.
Here's the expression (using PHP):
PHP:
In fact, their expression failed on exactly the same cases my routine failed on, plus two more. The first: their expression allows a leading 0 in the IPv4 part of a mixed IPv6 address for numbers between 10 and 99, which, according to the ABNF of RFC3986, is not allowed. The second: it fails to reject an address in the form of ':10.0.0.1' (only one leading colon instead of the two needed to mark a compressed form).
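The leading-zero rule follows from the dec-octet production in the RFC3986 ABNF. As a minimal sketch (assuming PHP 7 or later; the function name is made up for illustration and comes from neither the book nor my earlier post), the IPv4 part could be checked like this:

```php
<?php
// dec-octet per RFC 3986: 0-9, 10-99, 100-199, 200-249, 250-255.
// A leading zero such as "010" or "00" cannot be derived from that grammar.
// Hypothetical helper name, chosen for illustration only.
function is_rfc3986_ipv4(string $address): bool
{
    $decOctet = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])';
    // Four dec-octets separated by dots, anchored at both ends.
    $pattern = '/\A(?:' . $decOctet . '\.){3}' . $decOctet . '\z/';
    return preg_match($pattern, $address) === 1;
}
```

With this sketch, is_rfc3986_ipv4('10.0.0.1') is accepted while is_rfc3986_ipv4('010.0.0.1') is rejected.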
More interesting are the cases they failed to correctly identify as valid addresses which I overlooked as well. Those are the cases 'WCP' also pointed out in my previous blogpost: addresses in the form of '::0:a:b:c:d:e:f' and 'a:b:c:d:e:f:0::'. Normally an IPv6 address using compression for "one or more groups of 16 bits of zeros" cannot have more than a total of 7 colons, unless it's the first or the last group (and only that group) that is being compressed, in which case there is a total of 8 colons in the address. Both my approach and the one from the Regexp Cookbook only allowed for a total of 7 colons (of which one double colon).
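To make the colon arithmetic concrete, here is a small sketch (assuming PHP 7 or later; the helper name and the array keys are hypothetical) that only counts colons and reports where the double colon sits. It does not validate anything:

```php
<?php
// Reports the number of colons and whether the '::' compression sits at
// the very start or very end of the address. Hypothetical helper for
// illustration; it performs no validation of the address itself.
function describe_compression(string $address): array
{
    $pos = strpos($address, '::');
    $atEdge = $pos !== false
        && ($pos === 0 || $pos === strlen($address) - 2);

    return [
        'colons'     => substr_count($address, ':'),
        'compressed' => $pos !== false,
        'at_edge'    => $atEdge,
    ];
}
```

For both '::0:a:b:c:d:e:f' and 'a:b:c:d:e:f:0::' this reports 8 colons with the compression at an edge, whereas 'a:b:c::d:e:f:0' has 7 colons with the compression in the middle.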
Even though the expression from the Regexp Cookbook uses a very nifty approach with an 'anchored' look-ahead, I would rather recommend the more straightforward expression that 'WCP' posted in my previous blogpost, which is a literal translation of the RFC3986 ABNF for IPv6 addresses:
PHP:
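The posted expression itself is not reproduced here. As a rough sketch of what such a literal translation can look like (assuming PHP 7 or later; the function and variable names are my own, and this is not necessarily identical to the expression 'WCP' posted), each alternative of the IPv6address rule in the RFC3986 ABNF becomes one branch of the pattern:

```php
<?php
// Builds a pattern by transcribing the IPv6address rule of RFC 3986
// branch by branch. Names are hypothetical; this is an illustration,
// not the expression from the original comment.
function rfc3986_ipv6_pattern(): string
{
    $h16      = '[0-9a-fA-F]{1,4}';
    $decOctet = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])';
    $ipv4     = "(?:$decOctet\\.){3}$decOctet";
    $ls32     = "(?:$h16:$h16|$ipv4)";

    $branches = [
        "(?:$h16:){6}$ls32",                          //   6( h16 ":" ) ls32
        "::(?:$h16:){5}$ls32",                        //   "::" 5( h16 ":" ) ls32
        "(?:$h16)?::(?:$h16:){4}$ls32",               //   [ h16 ] "::" 4( h16 ":" ) ls32
        "(?:(?:$h16:){0,1}$h16)?::(?:$h16:){3}$ls32", //   [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
        "(?:(?:$h16:){0,2}$h16)?::(?:$h16:){2}$ls32", //   [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
        "(?:(?:$h16:){0,3}$h16)?::$h16:$ls32",        //   [ *3( h16 ":" ) h16 ] "::" h16 ":" ls32
        "(?:(?:$h16:){0,4}$h16)?::$ls32",             //   [ *4( h16 ":" ) h16 ] "::" ls32
        "(?:(?:$h16:){0,5}$h16)?::$h16",              //   [ *5( h16 ":" ) h16 ] "::" h16
        "(?:(?:$h16:){0,6}$h16)?::",                  //   [ *6( h16 ":" ) h16 ] "::"
    ];

    // \A and \z anchor the whole subject, so a trailing newline is rejected.
    return '/\A(?:' . implode('|', $branches) . ')\z/';
}

function is_rfc3986_ipv6(string $address): bool
{
    return preg_match(rfc3986_ipv6_pattern(), $address) === 1;
}
```

Under this sketch the two 8-colon forms '::0:a:b:c:d:e:f' and 'a:b:c:d:e:f:0::' are accepted, while ':10.0.0.1' and '::ffff:010.0.0.1' are rejected. Building the pattern from named pieces keeps each branch comparable to the corresponding ABNF line.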
Finally, here is my own IPv6 validation function, fixed for the case of 8 colons (and slightly faster than using a single regular expression):
PHP:
The Regexp Cookbook says "Because of the different notations, matching an IPv6 address isn't nearly as simple as matching an IPv4 address." Based on my findings with several IPv6 matching algorithms, I'd say that even that is an understatement. Implementors of software that deals with IPv6 addresses (and their validation) should be very much aware of the corner cases introduced by allowing address compression.
Comments
I wonder if it's only coincidence that you reported this bug in the php issue tracker ( http://bugs.php.net/bug.php?id=50117 ) only a few hours after I reported it to ZF's issue tracker: http://framework.zend.com/issues/browse/ZF-8253
So the real question becomes: are you such a big fan of ZF that you're monitoring its issue tracker, or is it mere coincidence that we reported the same issue within hours of each other?
freakingme: I think it's coincidence
I realised after writing this blogpost that I forgot to report the issues with PHP's filter_var when I wrote about it last time, so I went ahead and reported them last night.
I use this version.
PHP:
[Comment edited on Tuesday 24 November 2009 15:42]
I just ran across this issue: http://framework.zend.com/issues/browse/ZF-8640 It basically describes that the expression allows a trailing \n after the actual IP address, something that's not mentioned in the IPv6 specs.
You probably want to replace that \Z with \z, as is done here: http://framework.zend.com...php?r1=18986&r2=19949 (bottom few changed lines).
freakingme: you're absolutely right; I mistakenly mixed up the meaning of \Z versus \z.
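For readers who want to see the difference, here is a minimal sketch (assuming PHP 7 or later; the helper name is hypothetical, and the pattern is a deliberately simplified stand-in, not a real IPv6 validator). In PCRE, \Z matches at the very end of the subject or just before a final newline, while \z matches only at the very end:

```php
<?php
// Simplified stand-in pattern (hex digits and colons only). It is NOT an
// IPv6 validator; it only exists to show how the two end anchors differ.
// Hypothetical helper name, for illustration.
function matches_with_anchor(string $anchor, string $input): bool
{
    return preg_match('/\A[0-9a-f:]+' . $anchor . '/i', $input) === 1;
}
```

With this, matches_with_anchor('\Z', "::1\n") succeeds because \Z tolerates the trailing newline, while matches_with_anchor('\z', "::1\n") fails. Both succeed for "::1" without a newline.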
crisp, do you have a corrected version of your test-suite available? I think I've worked out which ones you meant were wrong, but I could be wrong. 
Crisp,
I have added the four test cases cited above to Dartware's compendium of IPv6 Regex test cases. It's at:
http://forums.dartware.com/viewtopic.php?t=452
I would be willing to add other test cases from your suite.
Crisp - two more thoughts:
1) We also have an IPv6 Validator page that gives a go/no-go indication for a particular address, and also reformats it into the "best text representation" for display. It's at:
http://www.intermapper.com/ipv6validator
2) Do you know if the current PHP filter_var (PHP >= 5.2) function passes all these test cases?
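One way to check this on a given PHP installation is to run the test cases through filter_var directly. The sketch below assumes PHP 7 or later and uses a hypothetical helper name; it claims no expected results for the IPv6 corner cases, since those depend on the PHP version in use:

```php
<?php
// Runs candidate addresses through PHP's filter extension and records
// whether each one is accepted as IPv6. Hypothetical helper name; the
// results for corner cases depend on the installed PHP version.
function check_with_filter_var(array $addresses): array
{
    $results = [];
    foreach ($addresses as $address) {
        $results[$address] =
            filter_var($address, FILTER_VALIDATE_IP, FILTER_FLAG_IPV6) !== false;
    }
    return $results;
}
```

Feeding it the four cases from the post ('::0:a:b:c:d:e:f', 'a:b:c:d:e:f:0::', ':10.0.0.1' and a mixed address with a leading zero in the IPv4 part) shows at a glance where the installed version stands.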
As richb-hanover hasn't read the mail I sent a very long time ago:
Here is my homepage describing the shortest possible IPv6 validation regex, the number of IPv6 address representations, test cases and some tips:
http://home.deds.nl/~aeron/regex/
zyzygy wrote on Wednesday 28 September 2011 @ 15:11:
Crisp, I would like to use your code in an open source project. What license is on the code?
Consider it GPL.
zyzygy wrote on Wednesday 19 October 2011 @ 16:21:
Would it be possible to dual license it under Apache 2.0 also?
I have no problem with that. As a matter of fact, LGPL would be fine with me as well. I'm not that familiar with all those different licenses, nor do I care much.