IPv6 validation - more caveats

By crisp on Monday 09 November 2009 00:12 - Comments (13)
Categories: Internet, PHP, Views: 6.775

Last week I was taking a nice hot bath while reading the Regular Expression Cookbook by Jan Goyvaerts and Steven Levithan. Really, there is no better way of relaxing :) But then chapter 7.17 made me jump out of the tub, rush to my computer, and - while still wet - start typing the regular expression printed on page 387. The chapter was called 'Matching IPv6 Addresses'.

Having blogged about IPv6 validation just a couple of months ago, with the conclusion that most IPv6 validation routines 'out there' get it wrong on some (or many) counts, I naturally wanted to know whether the expression offered in this book was any better (and frankly, Jan and Steven are both experts whom I admire greatly), especially since I have been made aware that my own validation routine is incorrect as well. Fortunately for me, Jan and Steven didn't get it 100% correct either :P

Here's the expression (using PHP):

PHP:

<?php
// This is the regular expression taken from the Regular Expression Cookbook
// by Jan Goyvaerts and Steven Levithan
function validateIPv6($IP)
{
    return preg_match('/\A
        (?:
            # mixed
            (?:
                # Non-compressed
                (?:[A-F0-9]{1,4}:){6}
                # Compressed with at most 6 colons
                |(?=(?:[A-F0-9]{0,4}:){0,6}
                    (?:[0-9]{1,3}\.){3}[0-9]{1,3}    # and 4 bytes
                    \Z)                # and anchored
                # and at most 1 double colon
                (([A-F0-9]{1,4}:){0,5}|:)((:[A-F0-9]{1,4}){1,5}:|:)
            )
            # 255.255.255.
            (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
            # 255
            (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
            # Standard
            |(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
            # Compressed with at most 7 colons
            |(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
                \Z) # anchored
            # and at most 1 double colon
            (([A-F0-9]{1,4}:){1,7}|:)((:[A-F0-9]{1,4}){1,7}|:)
        )\Z/ix'
,
        $IP
    );
}
?>

In fact, their expression failed on exactly the same cases as my routine, plus two more. The first of those two: their expression allows a leading 0 in the IPv4 part of a mixed IPv6 address for numbers between 10 and 99 (for example '010'), which the ABNF of RFC3986 does not allow. The second: it fails to reject an address of the form ':10.0.0.1' (only one leading colon instead of the two needed to mark a compressed form).
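To make the leading-zero difference concrete, here is a minimal sketch that runs a single octet through both octet patterns. The two alternations are copied from the Cookbook expression and from the RFC3986-based expression in this post; the variable names and the anchoring with \A and \z are only there for illustration.

```php
<?php
// Octet pattern as used in the Cookbook expression (accepts a leading zero).
$cookbookOctet = '/\A(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\z/';
// Octet pattern following the dec-octet rule of RFC3986 (no leading zero).
$rfcOctet = '/\A(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\z/';

// Print 1 (match) or 0 (no match) for each pattern.
foreach (array('10', '010', '099', '255') as $octet) {
    printf(
        "%-4s cookbook: %d  rfc3986: %d\n",
        $octet,
        preg_match($cookbookOctet, $octet),
        preg_match($rfcOctet, $octet)
    );
}
```

Only the first pattern accepts '010' and '099'; both accept '10' and '255'.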

More interesting are the cases they failed to correctly identify as valid addresses which I overlooked as well. Those are the cases 'WCP' also pointed out in my previous blogpost: addresses in the form of '::0:a:b:c:d:e:f' and 'a:b:c:d:e:f:0::'. Normally an IPv6 address using compression for "one or more groups of 16 bits of zeros" cannot have more than a total of 7 colons, unless it's the first or the last group (and only that group) that is being compressed, in which case there is a total of 8 colons in the address. Both my approach and the one from the Regexp Cookbook only allowed for a total of 7 colons (of which one double colon).
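The colon arithmetic is easy to check by hand, but a small sketch may help. It counts the colons in the two addresses mentioned above and in one extra address with the compression in the middle; that third address is made up for comparison.

```php
<?php
// Compressed addresses that each stand for one omitted all-zero group.
$addresses = array(
    '::0:a:b:c:d:e:f',   // first group compressed
    'a:b:c:d:e:f:0::',   // last group compressed
    'a:b:c::d:e:f:0',    // a middle group compressed (illustrative only)
);

// Print each address with its colon count.
foreach ($addresses as $address) {
    printf("%-18s %d colons\n", $address, substr_count($address, ':'));
}
```

The first two addresses contain 8 colons, the third only 7, even though all three write out seven groups and compress one.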

Even though the expression from the Regexp Cookbook uses a very nifty approach with an 'anchored' look-ahead, I would rather recommend the more straightforward expression that 'WCP' also posted in my previous blogpost, which is a literal translation of the RFC3986 ABNF for IPv6 addresses:

PHP:

<?php
// literally from the ABNF in rfc3986 (thanks to 'WCP')
function validateIPv6($IP)
{
    return preg_match('/\A
        (?:
            (?:
                    (?:[a-f0-9]{1,4}:){6}
                |
                    ::(?:[a-f0-9]{1,4}:){5}
                |
                    (?:[a-f0-9]{1,4})?::(?:[a-f0-9]{1,4}:){4}
                |
                    (?:(?:[a-f0-9]{1,4}:){0,1}[a-f0-9]{1,4})?::(?:[a-f0-9]{1,4}:){3}
                |
                    (?:(?:[a-f0-9]{1,4}:){0,2}[a-f0-9]{1,4})?::(?:[a-f0-9]{1,4}:){2}
                |
                    (?:(?:[a-f0-9]{1,4}:){0,3}[a-f0-9]{1,4})?::[a-f0-9]{1,4}:
                |
                    (?:(?:[a-f0-9]{1,4}:){0,4}[a-f0-9]{1,4})?::
            )
                (?:
                        [a-f0-9]{1,4}:[a-f0-9]{1,4}
                    |
                        (?:(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}
                            (?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])
                )
            |
                (?:
                        (?:(?:[a-f0-9]{1,4}:){0,5}[a-f0-9]{1,4})?::[a-f0-9]{1,4}
                    |
                        (?:(?:[a-f0-9]{1,4}:){0,6}[a-f0-9]{1,4})?::
                )
        )\z/ix'
,
        $IP
    );
}
?>

Finally, here is my own IPv6 validation function, fixed for the case of 8 colons (and slightly faster than using a single regular expression):

PHP:

<?php
// valid only if the string survives a round trip through ip2long()/long2ip()
function validateIPv4($IP)
{
    return $IP == long2ip(ip2long($IP));
}

function validateIPv6($IP)
{
    if (strlen($IP) < 3)
        return $IP == '::';

    // mixed notation: validate the IPv4 tail, then replace it by two zero groups
    if (strpos($IP, '.'))
    {
        $lastcolon = strrpos($IP, ':');
        if (!($lastcolon && validateIPv4(substr($IP, $lastcolon + 1))))
            return false;

        $IP = substr($IP, 0, $lastcolon) . ':0:0';
    }

    // no compression: exactly 8 groups
    if (strpos($IP, '::') === false)
    {
        return preg_match('/\A(?:[a-f0-9]{1,4}:){7}[a-f0-9]{1,4}\z/i', $IP);
    }

    // compression with fewer than 8 colons in total
    $colonCount = substr_count($IP, ':');
    if ($colonCount < 8)
    {
        return preg_match('/\A(?::|(?:[a-f0-9]{1,4}:)+):(?:(?:[a-f0-9]{1,4}:)*[a-f0-9]{1,4})?\z/i', $IP);
    }

    // special case with ending or starting double colon
    if ($colonCount == 8)
    {
        return preg_match('/\A(?:::)?(?:[a-f0-9]{1,4}:){6}[a-f0-9]{1,4}(?:::)?\z/i', $IP);
    }

    return false;
}
?>

The Regexp Cookbook says "Because of the different notations, matching an IPv6 address isn't nearly as simple as matching an IPv4 address." Based on my findings with several IPv6 matching algorithms, I'd say that even that is an understatement. Implementors of software that deals with IPv6 addresses (and their validation) should be very much aware of the corner cases introduced by address compression.


Comments


By Tweakers user Freeaqingme, Monday 09 November 2009 14:51

I wonder if it's only coincidence that you reported this bug in the php issue tracker ( http://bugs.php.net/bug.php?id=50117 ) only a few hours after I reported it to ZF's issue tracker: http://framework.zend.com/issues/browse/ZF-8253

So the real question becomes; are you such a big fan of ZF that you're monitoring its issue tracker, or is it mere coincidence that we report the same issue within hours from each other? :D

By Tweakers user crisp, Monday 09 November 2009 14:57

freakingme: I think it's coincidence ;) I realised after writing this blogpost that I forgot to report the issues with PHP's filter_var when I wrote about it last time, so I went ahead and reported them last night.

By Tweakers user s.stok, Tuesday 24 November 2009 15:42

I use this version.


PHP:

<?php
    /**
     * Check whether this is a legal IP address.
     *
     * @param string $psIPAdress
     * @param bool $pbIPv6
     * @return boolean
     */

    function isIPAddr($psIPAdress, $pbIPv6=false)
    {
        if ($pbIPv6 === true)
        {
            return preg_match('/^((([0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){6}:[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){5}:([0-9A-Fa-f]{1,4}:)?[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){4}:([0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){3}:([0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){2}:([0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){6}((\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b)\.){3}(\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b))|(([0-9A-Fa-f]{1,4}:){0,5}:((\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b)\.){3}(\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b))|(::([0-9A-Fa-f]{1,4}:){0,5}((\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b)\.){3}(\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b))|([0-9A-Fa-f]{1,4}::([0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})|(::([0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){1,7}:))$/', $psIPAdress);
        }

        return preg_match('/^\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b$/is', $psIPAdress);
    }
?>

[Comment edited on Tuesday 24 November 2009 15:42]


By Tweakers user Freeaqingme, Saturday 30 January 2010 04:14

I just ran across this issue: http://framework.zend.com/issues/browse/ZF-8640 which basically describes that it allows \n after the actual IP, something that's not mentioned in the IPv6 specs ;)

You probably want to replace that \Z with \z, as is done here: http://framework.zend.com...php?r1=18986&r2=19949 (bottom few changed lines).

By Tweakers user crisp, Monday 01 February 2010 22:39

freakingme: you're absolutely right; I mistakenly mixed up the meaning of \Z versus \z 8)7
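For anyone who wants to see the difference between the two anchors, here is a minimal sketch. The input string and the tiny pattern are made up for the demonstration and are not part of any of the validation routines above.

```php
<?php
// A valid address followed by a stray newline.
$input = "::1\n";

// \Z matches at the very end of the subject or just before a final newline.
var_dump(preg_match('/\A::1\Z/', $input)); // int(1)

// \z matches only at the very end of the subject.
var_dump(preg_match('/\A::1\z/', $input)); // int(0)
```

With \Z the trailing newline slips through; with \z the same input is rejected.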

By Geoffrey Sneddon, Saturday 22 May 2010 18:47

crisp, do you have a corrected version of your test-suite available? I think I've worked out which ones you meant were wrong, but I could be wrong. :)

By richb-hanover, Friday 22 October 2010 12:45

Crisp,

I have added the four test cases cited above to Dartware's compendium of IPv6 Regex test cases. It's at:

http://forums.dartware.com/viewtopic.php?t=452
I would be willing to add other test cases from your suite.

By richb-hanover, Friday 22 October 2010 14:40

Crisp - two more thoughts:

1) We also have an IPv6 Validator page that gives a go/no-go indication for a particular address, and also reformats it into the "best text representation" for display. It's at:

http://www.intermapper.com/ipv6validator

2) Do you know if the current PHP filter_var (PHP >= 5.2) function passes all these test cases?

By Aeron, Tuesday 21 December 2010 14:49

As richb-hanover hasn't read the mail I sent a very long time ago:
Here is my homepage describing the shortest possible IPv6 validation regex, the number of IPv6 address representations, test cases and some tips:
http://home.deds.nl/~aeron/regex/

By zyzygy, Wednesday 28 September 2011 15:11

Crisp, I would like to use your code in an open source project. What license is on the code?

By Tweakers user crisp, Thursday 29 September 2011 09:17

zyzygy wrote on Wednesday 28 September 2011 @ 15:11:
Crisp, I would like to use your code in an open source project. What license is on the code?
Consider it GPL :)

By zyzygy, Wednesday 19 October 2011 16:21

Would it be possible to dual license it under Apache 2.0 also?

By Tweakers user crisp, Wednesday 19 October 2011 22:15

zyzygy wrote on Wednesday 19 October 2011 @ 16:21:
Would it be possible to dual license it under Apache 2.0 also?
I have no problem with that. As a matter of fact LGPL would be fine with me as well. I'm not that familiar with all those different licenses nor do I care much.
