IPv6 validation - more caveats

By crisp on Monday 09 November 2009 00:12
Categories: Internet, PHP, Views: 2247

Last week I was taking a nice hot bath while reading the Regular Expressions Cookbook by Jan Goyvaerts and Steven Levithan. Really, there is no better way of relaxing :) But then chapter 7.17 made me jump out of the tub, rush to my computer, and - while still wet - start typing in the regular expression printed on page 387. The chapter was called 'Matching IPv6 Addresses'.

Having blogged about IPv6 validation just a couple of months ago, with the conclusion that most IPv6 validation routines 'out there' get it wrong on some (or many) counts, I naturally wanted to know whether the expression offered in this book was any better (and frankly, Jan and Steven are both experts whom I admire greatly), especially since I have been made aware that my own validation routine is incorrect as well. Fortunately for me, Jan and Steven didn't get it 100% correct either :P

Here's the expression (using PHP):

PHP:

<?php
// This is the regular expression taken from the Regular Expression Cookbook
// by Jan Goyvaerts and Steven Levithan
function validateIPv6($IP)
{
    return preg_match('/\A
        (?:
            # mixed
            (?:
                # Non-compressed
                (?:[A-F0-9]{1,4}:){6}
                # Compressed with at most 6 colons
                |(?=(?:[A-F0-9]{0,4}:){0,6}
                    (?:[0-9]{1,3}\.){3}[0-9]{1,3}    # and 4 bytes
                    \Z)                # and anchored
                # and at most 1 double colon
                (([A-F0-9]{1,4}:){0,5}|:)((:[A-F0-9]{1,4}){1,5}:|:)
            )
            # 255.255.255.
            (?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
            # 255
            (?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
            # Standard
            |(?:[A-F0-9]{1,4}:){7}[A-F0-9]{1,4}
            # Compressed with at most 7 colons
            |(?=(?:[A-F0-9]{0,4}:){0,7}[A-F0-9]{0,4}
                \Z) # anchored
            # and at most 1 double colon
            (([A-F0-9]{1,4}:){1,7}|:)((:[A-F0-9]{1,4}){1,7}|:)
        )\Z/ix',
        $IP
    );
}
?>

In fact, their expression failed on exactly the same cases as my routine, plus two more. As for those two: their expression allows a leading 0 in the IPv4 part of a mixed IPv6 address for numbers between 10 and 99 (for example '010') which, according to the ABNF of RFC 3986, is not allowed. The other one is the failure to identify an address in the form of ':10.0.0.1' (only one leading colon instead of the two needed to mark a compressed form) as invalid.
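To make the leading-zero difference concrete, here is a small sketch that isolates the octet sub-pattern from the Cookbook expression and the dec-octet alternatives from the RFC 3986 translation, both copied from the listings in this post. The helper name `matchesOctet` is made up for this illustration.

```php
<?php
// Octet sub-pattern as it appears in the Cookbook expression.
$cookbookOctet = '/\A(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\z/';

// dec-octet alternatives as they appear in the RFC 3986 translation.
$rfcOctet = '/\A(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\z/';

// Hypothetical helper: does a single octet match the given pattern?
function matchesOctet(string $pattern, string $octet): bool
{
    return preg_match($pattern, $octet) === 1;
}

var_dump(matchesOctet($cookbookOctet, '10'));  // bool(true)
var_dump(matchesOctet($rfcOctet, '10'));       // bool(true)

// The leading zero is where the two disagree.
var_dump(matchesOctet($cookbookOctet, '010')); // bool(true)
var_dump(matchesOctet($rfcOctet, '010'));      // bool(false)
```

Both patterns still reject out-of-range values such as '256'; only the treatment of the leading zero differs.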

More interesting are the valid addresses their expression failed to recognise, cases which I had overlooked as well. These are the cases 'WCP' also pointed out in my previous blogpost: addresses in the form of '::0:a:b:c:d:e:f' and 'a:b:c:d:e:f:0::'. Normally an IPv6 address using compression for "one or more groups of 16 bits of zeros" cannot have more than 7 colons in total, unless it is the first or the last group (and only that group) that is being compressed, in which case the address contains 8 colons. Both my approach and the one from the Regexp Cookbook only allowed for a total of 7 colons (including one double colon).
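The colon arithmetic is easy to check by hand. The sketch below counts the colons and reports where the double colon sits; `describeCompression` is a hypothetical helper written for this illustration, not part of any of the validators discussed here.

```php
<?php
// Hypothetical helper: count the colons and locate the '::' in an address.
function describeCompression(string $ip): array
{
    $pos = strpos($ip, '::');

    return [
        'colons'   => substr_count($ip, ':'),
        'leading'  => $pos === 0,
        'trailing' => $pos !== false && $pos === strlen($ip) - 2,
    ];
}

// One group compressed in the middle: 7 colons.
print_r(describeCompression('a:b:c::d:e:f:0'));

// One group compressed at the start or the end: 8 colons.
print_r(describeCompression('::0:a:b:c:d:e:f'));
print_r(describeCompression('a:b:c:d:e:f:0::'));
```

A validator that simply caps the colon count at 7 therefore rejects the last two addresses, even though each of them still describes exactly 8 groups.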

Even though the expression from the Regexp Cookbook uses a very nifty approach with an 'anchored' look-ahead, I would rather recommend the more straightforward expression that 'WCP' also posted in my previous blogpost, which is a literal translation of the RFC 3986 ABNF for IPv6 addresses:

PHP:

<?php
// literally from the ABNF in rfc3986 (thanks to 'WCP')
function validateIPv6($IP)
{
    return preg_match('/\A
        (?:
            (?:
                    (?:[a-f0-9]{1,4}:){6}
                |
                    ::(?:[a-f0-9]{1,4}:){5}
                |
                    (?:[a-f0-9]{1,4})?::(?:[a-f0-9]{1,4}:){4}
                |
                    (?:(?:[a-f0-9]{1,4}:){0,1}[a-f0-9]{1,4})?::(?:[a-f0-9]{1,4}:){3}
                |
                    (?:(?:[a-f0-9]{1,4}:){0,2}[a-f0-9]{1,4})?::(?:[a-f0-9]{1,4}:){2}
                |
                    (?:(?:[a-f0-9]{1,4}:){0,3}[a-f0-9]{1,4})?::[a-f0-9]{1,4}:
                |
                    (?:(?:[a-f0-9]{1,4}:){0,4}[a-f0-9]{1,4})?::
            )
                (?:
                        [a-f0-9]{1,4}:[a-f0-9]{1,4}
                    |
                        (?:(?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\.){3}
                            (?:[0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])
                )
            |
                (?:
                        (?:(?:[a-f0-9]{1,4}:){0,5}[a-f0-9]{1,4})?::[a-f0-9]{1,4}
                    |
                        (?:(?:[a-f0-9]{1,4}:){0,6}[a-f0-9]{1,4})?::
                )
        )\z/ix',
        $IP
    );
}
?>
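For readers who want to see how the alternatives line up with the ABNF rules, here is a sketch that assembles an equivalent pattern from named pieces (h16, dec-octet, ls32). The function names `buildRfc3986Ipv6Pattern` and `validateIPv6Rfc3986` and the two closures are my own assumptions for illustration. This is not WCP's code, just one possible way of writing the same translation.

```php
<?php
// Sketch: assemble the RFC 3986 IPv6address pattern from named ABNF pieces.
// All names below are illustrative assumptions.
function buildRfc3986Ipv6Pattern(): string
{
    $h16      = '[0-9a-f]{1,4}';
    $decOctet = '(?:25[0-5]|2[0-4][0-9]|1[0-9][0-9]|[1-9][0-9]|[0-9])';
    $ipv4     = '(?:' . $decOctet . '\.){3}' . $decOctet;
    $ls32     = '(?:' . $h16 . ':' . $h16 . '|' . $ipv4 . ')';

    // ABNF "[ *n( h16 ":" ) h16 ]": optional run of up to n+1 groups before '::'
    $prefix = function (int $n) use ($h16): string {
        return '(?:(?:' . $h16 . ':){0,' . $n . '}' . $h16 . ')?';
    };

    // ABNF "n( h16 ":" )": exactly n groups, each followed by a colon
    $groups = function (int $n) use ($h16): string {
        return '(?:' . $h16 . ':){' . $n . '}';
    };

    // One entry per alternative of the IPv6address rule, in ABNF order.
    $alternatives = [
        $groups(6) . $ls32,
        '::' . $groups(5) . $ls32,
        $prefix(0) . '::' . $groups(4) . $ls32,
        $prefix(1) . '::' . $groups(3) . $ls32,
        $prefix(2) . '::' . $groups(2) . $ls32,
        $prefix(3) . '::' . $groups(1) . $ls32,
        $prefix(4) . '::' . $ls32,
        $prefix(5) . '::' . $h16,
        $prefix(6) . '::',
    ];

    // \z (not \Z) so that a trailing newline is not accepted.
    return '/\A(?:' . implode('|', $alternatives) . ')\z/i';
}

function validateIPv6Rfc3986(string $ip): bool
{
    return preg_match(buildRfc3986Ipv6Pattern(), $ip) === 1;
}

var_dump(validateIPv6Rfc3986('::0:a:b:c:d:e:f')); // bool(true)
var_dump(validateIPv6Rfc3986('a:b:c:d:e:f:0::')); // bool(true)
var_dump(validateIPv6Rfc3986(':10.0.0.1'));       // bool(false)
```

Building the pattern from pieces costs a few string concatenations per call, so in real code the result would probably be cached. The benefit is that each array entry can be compared against one line of the ABNF.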

Finally, here is my own IPv6 validation function, fixed for the case of 8 colons (and slightly faster than using a single regular expression):

PHP:

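<?php
function validateIPv4($IP)
{
    // ip2long() returns false for anything that is not a dotted quad
    $long = ip2long($IP);

    // round-tripping rejects non-canonical forms such as leading zeros
    return $long !== false && $IP === long2ip($long);
}

function validateIPv6($IP)
{
    if (strlen($IP) < 3)
        return $IP == '::';

    // mixed notation: validate the trailing IPv4 part separately, then
    // replace it with two dummy groups so the rest is checked as plain IPv6
    if (strpos($IP, '.'))
    {
        $lastcolon = strrpos($IP, ':');
        if (!($lastcolon && validateIPv4(substr($IP, $lastcolon + 1))))
            return false;

        $IP = substr($IP, 0, $lastcolon) . ':0:0';
    }

    // no compression: exactly 8 groups
    if (strpos($IP, '::') === false)
    {
        return preg_match('/\A(?:[a-f0-9]{1,4}:){7}[a-f0-9]{1,4}\z/i', $IP);
    }

    // compression with at most 7 colons
    $colonCount = substr_count($IP, ':');
    if ($colonCount < 8)
    {
        return preg_match('/\A(?::|(?:[a-f0-9]{1,4}:)+):(?:(?:[a-f0-9]{1,4}:)*[a-f0-9]{1,4})?\z/i', $IP);
    }

    // special case with ending or starting double colon
    if ($colonCount == 8)
    {
        return preg_match('/\A(?:::)?(?:[a-f0-9]{1,4}:){6}[a-f0-9]{1,4}(?:::)?\z/i', $IP);
    }

    return false;
}
?>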

The Regexp Cookbook says "Because of the different notations, matching an IPv6 address isn't nearly as simple as matching an IPv4 address." Based upon my findings with several IPv6 matching algorithms I'd say that even that is an understatement. Implementors of software that deals with IPv6 (and with validation of those addresses) should be very much aware of the corner cases introduced by allowing address compression.
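One way to keep those corner cases in view is a small table of inputs that any candidate validator can be run against. The sketch below collects the cases mentioned in this post and its comments, with the verdicts I would expect under the RFC 3986 ABNF. The function names and the deliberately naive demo validator are assumptions made for this illustration.

```php
<?php
// Corner cases from this post, paired with the expected verdict.
function ipv6CornerCases(): array
{
    return [
        ['::', true],
        ['::1', true],
        ['1:2:3:4:5:6:7:8', true],
        ['::0:a:b:c:d:e:f', true],    // 8 colons, leading '::'
        ['a:b:c:d:e:f:0::', true],    // 8 colons, trailing '::'
        ['::10.0.0.1', true],         // mixed notation
        [':10.0.0.1', false],         // single leading colon
        ['::010.0.0.1', false],       // leading zero in the IPv4 part
        ['1:2:3:4:5:6:7:8:9', false], // too many groups
        ['1::2::3', false],           // two double colons
        ["::1\n", false],             // trailing newline
    ];
}

// Returns the inputs for which $validator disagrees with the table.
function findMismatches(callable $validator, array $cases): array
{
    $mismatches = [];
    foreach ($cases as [$ip, $expected]) {
        if ((bool) $validator($ip) !== $expected) {
            $mismatches[] = $ip;
        }
    }

    return $mismatches;
}

// Deliberately naive validator for the demo: accepts the full form only.
$naive = function (string $ip): bool {
    return preg_match('/\A(?:[a-f0-9]{1,4}:){7}[a-f0-9]{1,4}\z/i', $ip) === 1;
};

$failed = findMismatches($naive, ipv6CornerCases());

// json_encode() makes control characters such as "\n" visible in the output.
echo implode("\n", array_map('json_encode', $failed)), "\n";
```

Passing any of the `validateIPv6` functions from this post as `$validator` gives a quick regression check; the table is by no means exhaustive.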


Comments


By T.net user freakingme, Monday 09 November 2009 14:51

I wonder if it's only coincidence that you reported this bug in the php issue tracker ( http://bugs.php.net/bug.php?id=50117 ) only a few hours after I reported it to ZF's issue tracker: http://framework.zend.com/issues/browse/ZF-8253

So the real question becomes: are you such a big fan of ZF that you're monitoring its issue tracker, or is it mere coincidence that we reported the same issue within hours of each other? :D

By T.net user crisp, Monday 09 November 2009 14:57

freakingme: I think it's coincidence ;) I realised after writing this blogpost that I forgot to report the issues with PHP's filter_var when I wrote about it last time, so I went ahead and reported them last night.

By T.net user s.stok, Tuesday 24 November 2009 15:42

I use this version.


PHP:

<?php
    /**
     * Check whether this is a legal IP address.
     *
     * @param string $psIPAdress
     * @param bool $pbIPv6
     * @return boolean
     */

    function isIPAddr($psIPAdress, $pbIPv6=false)
    {
        if ($pbIPv6 === true)
        {
            return preg_match('/^((([0-9A-Fa-f]{1,4}:){7}[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){6}:[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){5}:([0-9A-Fa-f]{1,4}:)?[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){4}:([0-9A-Fa-f]{1,4}:){0,2}[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){3}:([0-9A-Fa-f]{1,4}:){0,3}[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){2}:([0-9A-Fa-f]{1,4}:){0,4}[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){6}((\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b)\.){3}(\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b))|(([0-9A-Fa-f]{1,4}:){0,5}:((\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b)\.){3}(\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b))|(::([0-9A-Fa-f]{1,4}:){0,5}((\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b)\.){3}(\b((25[0-5])|(1\d{2})|(2[0-4]\d)|(\d{1,2}))\b))|([0-9A-Fa-f]{1,4}::([0-9A-Fa-f]{1,4}:){0,5}[0-9A-Fa-f]{1,4})|(::([0-9A-Fa-f]{1,4}:){0,6}[0-9A-Fa-f]{1,4})|(([0-9A-Fa-f]{1,4}:){1,7}:))$/', $psIPAdress);
        }

        return preg_match('/^\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b$/is', $psIPAdress);
    }
?>

[Comment edited on Tuesday 24 November 2009 15:42]


By T.net user freakingme, Saturday 30 January 2010 04:14

I just ran across this issue: http://framework.zend.com/issues/browse/ZF-8640, which basically describes that it allows \n after the actual IP, something that's not mentioned in the IPv6 specs ;)

You probably want to replace that \Z with \z, as is done here: http://framework.zend.com...php?r1=18986&r2=19949 (bottom few changed lines).

By T.net user crisp, Monday 01 February 2010 22:39

freakingme: you're absolutely right; I mistakenly mixed up the meaning of \Z versus \z 8)7
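For reference, the difference between the two anchors can be reproduced with a minimal sketch. The variable names are illustrative, and the pattern is reduced to a literal '::1' so that only the anchor matters.

```php
<?php
// An address followed by a single trailing newline.
$subject = "::1\n";

// \Z matches at the very end, but also just before a final newline.
$withUpperZ = preg_match('/\A::1\Z/', $subject);

// \z matches only at the very end of the subject.
$withLowerZ = preg_match('/\A::1\z/', $subject);

var_dump($withUpperZ); // int(1)
var_dump($withLowerZ); // int(0)
```

The same applies to `$` without the `D` modifier, which also matches before a final newline.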
