Some thoughts on HTML5's getElementsByClassName
HTML5 will bring us some really cool things, not only in terms of new markup features but also in terms of extended features for forms and a very handy DOM extension in the form of a native (and thus lightning fast) getElementsByClassName method.
Firefox 3 beta already has a native implementation of getElementsByClassName, and it's a real kicker. Once JavaScript libraries start using it as a branch in their selector functions (some already have), you will start to see dramatic performance improvements for certain DOM queries.
Now there are two things that I would like to point out. One is actually a warning, and the other is something that I feel is an oversight in the specification.
First the warning: most JavaScript libraries return a static array of elements from their DOM query functions. This is fine, and most of the time it is also the only way. However, the native getElementsByClassName method is supposed to return a 'live NodeList'. That is also mostly fine, until you start manipulating elements from this NodeList in ways that affect the list itself.
A quick example:
JavaScript:
var foo = document.getElementsByClassName('foo');
for (var i = 0; i < foo.length; i++)
  foo[i].parentNode.removeChild(foo[i]);
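That will quickly make you wonder why not all elements with a className of 'foo' have been removed in the end. Because the list is live, every element you remove from the document also drops out of the list, the remaining elements shift down one position, and the incrementing counter steps right over some of them.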
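To see the skipping in action without a browser, here is a minimal sketch. It assumes a hypothetical stand-in for a live NodeList: `makeLiveList`, `removeElement` and the `elements` array are illustrative only and not part of any DOM API. The list recomputes its contents on every access, the way a live NodeList reflects the current document.

```javascript
// Hypothetical stand-in for a document: a flat array of "elements".
var elements = [
  { id: 'a', className: 'foo' },
  { id: 'b', className: 'foo' },
  { id: 'c', className: 'foo' },
  { id: 'd', className: 'foo' }
];

// Hypothetical live list: length and item() always reflect the
// current contents of `elements`, like a live NodeList would.
function makeLiveList(className) {
  function current() {
    return elements.filter(function (el) {
      return el.className === className;
    });
  }
  return {
    get length() { return current().length; },
    item: function (i) { return current()[i]; }
  };
}

// Stand-in for el.parentNode.removeChild(el).
function removeElement(el) {
  elements.splice(elements.indexOf(el), 1);
}

var foo = makeLiveList('foo');
for (var i = 0; i < foo.length; i++) {
  // After removing item i, the next element slides into slot i,
  // and i++ then skips over it.
  removeElement(foo.item(i));
}

var survivors = elements.map(function (el) { return el.id; });
console.log(survivors); // [ 'b', 'd' ]
```

In this simulation every second element survives, which is the same symptom the browser example shows.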

That is also why, in my implementation, I convert the NodeList to a static array using Array.slice(), so that it matches the return type of the alternative branches.
Now there is one thing that made me wonder: getElementsByClassName can be fed multiple class names (space-separated) in order to find elements that have all of those class names, so:
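A minimal sketch of such a conversion, assuming only that the input is array-like (a numeric `length` plus indexed properties). `toStaticArray` is a hypothetical helper name, and the sketch uses `Array.prototype.slice.call` rather than the Mozilla-specific `Array.slice()` generic so that it also runs outside Firefox.

```javascript
// Hypothetical helper: copy an array-like object into a real array.
function toStaticArray(list) {
  return Array.prototype.slice.call(list);
}

// Array-like stand-in for a NodeList (illustrative only).
var nodeListLike = { 0: 'first', 1: 'second', 2: 'third', length: 3 };
var snapshot = toStaticArray(nodeListLike);

// Changing the source afterwards does not affect the snapshot.
delete nodeListLike[2];
nodeListLike.length = 2;

console.log(snapshot.length);         // 3
console.log(Array.isArray(snapshot)); // true
```

In a browser the same helper would be called as `toStaticArray(document.getElementsByClassName('foo'))`, after which a plain forward loop can remove every element safely.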
JavaScript:
var foobar = document.getElementsByClassName('foo bar');
will find all elements that have both className 'foo' and 'bar' (in any order).
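The matching rule itself can be sketched in plain JavaScript. `splitClasses` and `hasAllClasses` are hypothetical helpers, not part of any specification; the sketch assumes class names are separated by whitespace and treats an element as a match when its `className` contains every requested name, in any order.

```javascript
// Hypothetical helper: split a class string on whitespace, dropping empties.
function splitClasses(str) {
  return str.split(/\s+/).filter(function (name) {
    return name.length > 0;
  });
}

// True when el.className contains every name listed in classNames.
function hasAllClasses(el, classNames) {
  var present = splitClasses(el.className);
  return splitClasses(classNames).every(function (name) {
    return present.indexOf(name) !== -1;
  });
}

console.log(hasAllClasses({ className: 'bar baz foo' }, 'foo bar')); // true
console.log(hasAllClasses({ className: 'foo' }, 'foo bar'));         // false
```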
That's also nice, but what if I want to select all elements that have either class name 'foo' or 'bar'? From a technical point of view, the 'all of' case is the easy one to build yourself: you filter the outcome of getElementsByClassName('foo') on the existence of class name 'bar' for each element. The 'either' case is harder, because you have to merge the results of getElementsByClassName('foo') and getElementsByClassName('bar') into a union without duplicates.
In fact, JavaScript 1.6 already has an Array.filter() generic that can easily be used to accommodate the use case of selecting elements that match all of several class names, but a native method for merging two lists into a union does not exist. If getElementsByClassName were specified to be applicable to NodeLists as well, it would be even easier to accommodate the use case of selecting elements that have all of several class names, because then you could just as well do:
JavaScript:
var foo_and_bar = document.getElementsByClassName('foo').getElementsByClassName('bar');
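The 'either foo or bar' case, by contrast, has to be assembled by hand. The sketch below is one possible approach, not an established API: `getElementsByAnyClassName` is a hypothetical name, and `query` stands for any function that returns an array-like list for a single class name (in a browser, a small wrapper around `document.getElementsByClassName`). It merges the results and drops duplicates, but it does not restore document order.

```javascript
// Hypothetical helper: elements matching ANY of the space-separated names.
function getElementsByAnyClassName(query, classNames) {
  var result = [];
  classNames.split(/\s+/).forEach(function (name) {
    if (!name) return; // ignore empty fragments from stray whitespace
    // Snapshot each result so a live list cannot change under us.
    var matches = Array.prototype.slice.call(query(name));
    matches.forEach(function (el) {
      if (result.indexOf(el) === -1) result.push(el); // skip duplicates
    });
  });
  return result;
}

// Stand-in data and query function so the sketch runs outside a browser.
var a = { id: 'a', className: 'foo' };
var b = { id: 'b', className: 'foo bar' };
var c = { id: 'c', className: 'bar' };
var d = { id: 'd', className: 'baz' };
var all = [a, b, c, d];

function fakeQuery(name) {
  return all.filter(function (el) {
    return el.className.split(/\s+/).indexOf(name) !== -1;
  });
}

var either = getElementsByAnyClassName(fakeQuery, 'foo bar');
console.log(either.map(function (el) { return el.id; })); // [ 'a', 'b', 'c' ]
```

The duplicate check is what makes this more work than the 'all of' case: element `b` matches both queries but must appear only once.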
So did the WHATWG/HTML5 WG aim for a 'quick win' here, choosing the most easily implemented solution and disregarding the other, just as valid, use case where you want to select elements with any one of a list of class names? I wonder...
Comments
You could try to reverse the handling of the foo-list:
var foo = document.getElementsByClassName('foo');
for (var i = foo.length - 1; i >= 0; i--)
  foo[i].parentNode.removeChild(foo[i]);
Whether this actually works is up to the browser implementation of getElementsByClassName...
Indeed, reversing the loop and not re-evaluating the length property on each iteration would probably work, but that's beside the point I was making.
You are right that it also depends on browser implementation: the order in which the elements are returned in the NodeList is still left undefined in the specification. This issue has already been raised on the HTML WG mailing list. I too think that it would be wise to explicitly define the order. Document order would seem to be a rational choice.
[Comment edited on Tuesday 12 February 2008 10:22]
The more I find out about HTML5, the better it looks.
I can't wait for the first HTML5/CSS3 compliant browsers.
For the more complicated cases you can probably use the Selectors API. The current complexity matches that of class-based selectors in CSS, which seems about right.
.foo.bar or .foo, .bar doesn't look like a huge difference to me 

The latter version is actually two separate selectors so there is some difference. I suppose that if lots of requests come in for being able to do that case the getElementsByClassName() API might be enhanced in some way.
To remove all items with className foo:
var foo = document.getElementsByClassName('foo');
while (foo.length) foo[0].parentNode.removeChild(foo[0]);
There is nothing 'wrong' with the NodeList model, you simply need to use it the right way.
Bozozo: I never said that there is something wrong with the NodeList model, just that its behaviour needs to be understood, and that because all non-native getElementsByClassName implementations return a static array, this might lead to problems in current applications.
