HTML5 - why not use XML syntax?

By crisp on Sunday 8 July 2007 02:00 - Comments (26)
Category: HTML5, Views: 22.294

The XML-fanboys are at it again, this time tripping over the actual syntax used in W3C documents such as the HTML 5 differences from HTML 4 doc. Next comes a flurry of mails from people suggesting that HTML5 should actually make XML-syntax an author-conformance requirement.

Let's take a look at some of the arguments:

- It is claimed that XML-syntax is more logical and is easier to teach.
I agree to some extent; consistency is a great thing, and I do encourage people to, for instance, always quote their attributes and include tags that are optional by spec. However, I do not see any logic in having to explicitly close elements that by nature cannot have any content, such as <br>. Also, how many people always write out implied elements such as <tbody> in their XHTML markup today?

- It is claimed that XML-syntax is easier to parse
I agree that an XML-parser is much less complex than a parser that has to deal with less strict syntax requirements, but we're not talking about XML-parsers here but about HTML-parsers, which are still required to deal with non-strict syntax whether that's conforming or not. The parser won't be any less complex, so parsing probably won't be much faster either.
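
To make the difference concrete, here is a minimal sketch using Python's standard-library parsers as stand-ins (they are not what browsers use, and the sample markup is made up for illustration). The lenient HTML parser reports events for markup with unquoted attributes and omitted end tags, while the XML parser rejects the same input outright.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical sample: fine as HTML 4.01 body content, not well-formed XML.
SAMPLE = "<p class=intro>first<br>second<p>third"


class TagLogger(HTMLParser):
    """Record start tags, end tags and text as the lenient parser sees them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def html_events(markup):
    """Tokenize markup with the lenient HTML parser and return its events."""
    parser = TagLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


def xml_accepts(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(html_events(SAMPLE))
    print(xml_accepts(SAMPLE))
```

Note that this HTML parser only tokenizes; deciding where the omitted end tags belong is the extra work a real HTML tree builder has to do either way.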

- It is claimed that XML-syntax is easier to read
Since when is markup-syntax meant for human consumption? Should indentation also be made a conformance-requirement? You can always use a prettifier or just reserialize to XML-syntax from a built DOM-tree. This is clearly a non-issue.
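
As a rough sketch of the "reserialize from a built tree" idea, the following uses Python's standard library: a small hypothetical `TreeBuilder` class turns HTML-syntax markup into an ElementTree, which is then written out in XML syntax. It assumes a short, illustrative list of void elements and does not implement the optional-end-tag rules of a real HTML parser, so it only suits input whose non-void elements are explicitly closed.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that can never have content (a subset, for illustration only).
VOID = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Build an ElementTree under a synthetic <body> root (an assumption)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("body")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        # A valueless attribute such as `disabled` becomes disabled="disabled".
        attrib = {k: (k if v is None else v) for k, v in attrs}
        node = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        parent = self.stack[-1]
        if len(parent):  # text that follows an already closed child
            parent[-1].tail = (parent[-1].tail or "") + data
        else:
            parent.text = (parent.text or "") + data


def to_xml_syntax(markup):
    """Parse HTML-syntax markup and serialize the tree in XML syntax."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return ET.tostring(builder.root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml_syntax("<p class=a>one<br>two</p><input disabled>"))
```

The result can be fed straight back into an XML parser, which is the point: whoever prefers to read XML syntax can generate it instead of requiring every author to write it.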

- It is claimed that you can use XML-syntax in HTML today because every browser supports it
That is not true. Browsers have never implemented HTML as a true SGML-application (by nature only validators use a true SGML parser; you can't blame them for following the specifications when implementations don't), so they won't trip over the XML short-close syntax for empty elements, but it is still not possible to use, for instance, <script/>. Also, the use of an XML-declaration will force some browsers into quirksmode.

Now I will present some arguments against making XML-syntax an author-conformance requirement:

- The more strict you make author conformance the less likely people will comply
HTML has been around for some time now and most people have learnt by example. Making the rules more strict will only confuse these people and drive them away from standards as they will not understand why suddenly what they've learned isn't "valid" anymore.

- It will punish people who have done the "right thing" in the past using HTML
Some people who are standards-aware have consciously avoided the "faux XHTML" trap and conformed to strict HTML compliance. An XML-syntax requirement would place an extra burden upon these people to make their documents HTML5-compliant.

- It makes markup needlessly verbose
And therefore doesn't cater to people who need to make their documents as small as possible while still conforming.

XML-fanboys are still allowed to use the XML-serialization of HTML5 (whether it be called XHTML1.5, XHTML5, HTML5/XML or whatever), or are encouraged to join the XHTML2 WG, whose work will probably never see a browser-implementation.


Comments


By mynthon, Thursday 22 January 2009 10:42

2 years later im sure youre wrong:

"Let's take a look at some of the arguments:

- It is claimed that XML-syntax is more logical and is easier to teach."

True!
Why i can do
<h1>title</h1>
<p>text
<p>text

and i cant

<h1>title
<p>text
<p>text

? Why i have to remember all exceptions? whay some attributes have valueas and some dn't?

"- It is claimed that XML-syntax is easier to parse"
:: its not about speed. Stricter syntax == less errors.

"- It is claimed that XML-syntax is easier to read
Since when is markup-syntax meant for human consumption? " - since peple write html pages. Developers are humans too (i suppose).

"- It is claimed that you can use XML-syntax in HTML today because every browser supports it"
:: xml syntax != xml document. i can use xml suntax in text document. it is all about less error prone syntax. XML syntax is simple. You open tag - you close tag. Every attribute has value. Thats all.

"Now I will present some arguments against making XML-syntax an author-conformance requirement:"

"- The more strict you make author conformance the less likely people will comply
HTML has been around for some time now and most people have learnt by example. Making the rules more strict will only confuse these people and drive them away from standards as they will not understand why suddenly what they've learned isn't "valid" anymore."
yes, you are right. We still have quirks mode. it is not strict. Everyone should use it.
xhtml showed that peple need it, because it is better for developers, screen readers, browser vendors, and you can use existing tools to parse document (you are able to parse any xhtml with proper syntax using xml parser!)

- It will punish people who have done the "right thing" in the past using HTML
Why? And what is "in the past". 15 years ago? It will also punish peple who didint use doctype! in html2 it was properly! And what about html3? And what about <font color="" size=""/> ???

- It makes markup needlessly verbose
??? do you really think that closing <p /> is needlessy verbose? Do you usehtml only for blog? Have you ever wrote web spiders? Or bots to compare shop offer? For end-user syntax is transparent. For developers it is big simplification if it is clear, without exceptions.
Even IDEs will have problems with "simplified" syntax.

Sorry, all you wrote are lies.

By Rafiki, Thursday 12 March 2009 11:29

mynthon, you are 100% right!


By jindw, Saturday 14 March 2009 13:12

mynthon, you are 100% right!!!!

waiting for xhtml2.

By Tweakers user crisp, Sunday 15 March 2009 01:58

waiting for xhtml2
XHTML2 may have its uses, but as a markup language there won't be any support from the main browservendors any time soon (if ever)...

so, let me just go into mynthon's reply in more detail:
Why i have to remember all exceptions? whay some attributes have valueas and some dn't?
You probably meant: "why do some elements have optional end-tags"; well, that's just a feature of HTML and it actually doesn't hurt anybody if you decide to use this feature. Browsers are perfectly capable of detecting where the end-tags should be placed because of the content-model (another HTML/SGML feature) of those elements. Nevertheless nobody will keep you from adding those end-tags in your markup; in fact, I encourage you to do so since it will probably speed up parsing, and since I don't expect anybody to know exactly which end-tags are optional and which are not, it is easier to teach people to always include them.
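
To illustrate how a parser can work out where an omitted end-tag belongs, here is a simplified sketch in Python. It only handles the P element, using the list of following elements from the HTML5 draft wording quoted further down in this thread; the class and function names are made up, and closing an open P at any block-level end-tag is a shortcut for the "no more content in the parent element" rule, not the real algorithm.

```python
from html.parser import HTMLParser

# Start tags that end an open <p>, per the draft wording quoted in this thread.
CLOSES_P = {
    "address", "article", "aside", "blockquote", "datagrid", "dialog", "dir",
    "div", "dl", "fieldset", "footer", "form", "h1", "h2", "h3", "h4", "h5",
    "h6", "header", "hr", "menu", "nav", "ol", "p", "pre", "section", "table",
    "ul",
}


class EndTagInserter(HTMLParser):
    """Echo the markup back, adding the </p> tags that the author omitted."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.p_open = False

    def close_p(self):
        if self.p_open:
            self.out.append("</p>")
            self.p_open = False

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P:
            self.close_p()
        self.out.append(self.get_starttag_text())
        if tag == "p":
            self.p_open = True

    def handle_startendtag(self, tag, attrs):
        # Tags written as <hr/> are echoed unchanged, without a second end tag.
        if tag in CLOSES_P:
            self.close_p()
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p":
            self.close_p()  # a stray </p> with no open <p> is dropped
            return
        if tag in CLOSES_P:
            self.close_p()  # simplification: a closing block ends its last <p>
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)


def add_p_end_tags(markup):
    parser = EndTagInserter()
    parser.feed(markup)
    parser.close()
    parser.close_p()  # end of input also ends an open <p>
    return "".join(parser.out)


if __name__ == "__main__":
    print(add_p_end_tags("<h1>title</h1>\n<p>text\n<p>text\n"))
```

The rule table is data; the logic around it is a few lines. That is roughly why optional end-tags for elements like P were cheap to support in parsers.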

However, the same doesn't really apply to elements that cannot have content, or that have explicit end-tags but may not have content; those are just as exceptional in XHTML (following appendix C) as they are in HTML: you cannot write <br></br> in XHTML nor <script src="foo.js"/> when you plan to serve it as text/html.
"Since when is markup-syntax meant for human consumption? " since peple write html pages. Developers are humans too (i suppose).
That is exactly why HTML has features that make it less verbose to author by hand ;)
xml syntax != xml document
True, but this article is about using XML syntax in documents served as HTML. Even IE8 won't support true XHTML and in most cases documents using XML syntax are still invalid HTML documents (although HTML5 will be less strict and does consider the solidus for empty elements as valid).
yes, you are right. We still have quirks mode. it is not strict. Everyone should use it.
I guess this was meant to be an ironic remark. Quirksmode has nothing to do with HTML versus XHTML or syntax as a whole; quirksmode is a browser rendering mode that gets triggered by a missing doctype or an invalid or incomplete one. It has nothing to do with the misconception that XHTML would be strict and HTML not; both can be used in a strict or transitional setting (after all, XHTML is nothing more than the XML-serialization of HTML), but for a browser it is all the same unless XHTML is actually served as an XML-application.
xhtml showed that peple need it, because it is better for developers, screen readers, browser vendors, and you can use existing tools to parse document (you are able to parse any xhtml with proper syntax using xml parser!)
The fact that most XHTML on the web is malformed or invalid shows that people are not able to deal with strict requirements. HTML5 shows that there is no interest in an even more strict markup-language such as XHTML2 and that backwards-compatibility and error-handling are more important. Besides, you are able to parse any HTML document using an HTML parser, and those are no less available than XML parsers.
Why? And what is "in the past". 15 years ago? It will also punish peple who didint use doctype!
No, I meant people who now write valid HTML4.01; requiring XML syntax for HTML5 would invalidate almost all valid HTML4.01 (with proper DTD) documents which would be totally unacceptable.
??? do you really think that closing <p /> is needlessy verbose?
All I'm saying is that I like the existing features of HTML markup that enable me to create less-verbose documents ;)

[Comment edited on Sunday 15 March 2009 01:58]


By mynthon, Wednesday 13 May 2009 12:54

look at:
http://www.python.org/dev/peps/pep-0008/

its not about xhtml but about python. Now find out why someone wrote guide "how to write python code". I still think that there should be only one way to do it. It makes code cleaner and easier to mantain.

You can use automats for checking pages, content, etc. without writing your own parser. Just use one designed for xml. And if you want less verbose write everything in single line.

By Tweakers user crisp, Wednesday 13 May 2009 23:23

You can use automats for checking pages, content, etc. without writing your own parser. Just use one designed for xml.
And you can just as easily use a parser designed for HTML, so that's really not an argument ;)
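
For the spider or shop-comparison use case mentioned earlier, a sketch with Python's standard-library HTML parser might look like this. The `LinkCollector` class and the sample markup are made up for illustration; the markup deliberately mixes unquoted and quoted values, upper-case tag names and omitted end-tags.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element, however sloppy the markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased from the parser.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(markup):
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


if __name__ == "__main__":
    soup = '<ul><li><a href=/shop/1>One<li><A HREF="/shop/2">Two</a></ul>'
    print(extract_links(soup))
```

No custom parser is needed here, and the same code keeps working on documents that happen to use XML syntax.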

By mynthon, Friday 15 May 2009 08:38

yup. instead of one parser now there will be need of using 2 (xml + html5).

Especially joining documents where one is xml and second is "new syntax" will be soooo easy. No more boring ctrl+c ; ctrl+v - now we need converters. Wow! it is realy simplification.

and you still didn't explained me why:

<p class=ko>kokoko
<p class="moho">kokoko

is ok, while:

<h1>koko
<p class=pol ok>kokoko

isn't? Cant you see all the beginers and their problems? Why sometimes its ok and sometimes its not? I don't understand. How without reading specification i can find out if <video> have to be closed or not? Tell me where in the specification is list of all elements that don't need to be closed? Show me a tutorial for newbie and explain why sometimes
<head>
<meta...>
<p>...

is ok and sometimes not? This is simplification? With xml-compatible syntax there are no such questions! Open-close; quote values - THAT IS ALL YOU NEED TO KNOW!

And all that moaning about backward compatibility. Since <video /> element is in specification THERE IS NO BACKWARD COMPATIBILITY!

Its fun that specification wants to be compatible with IE6 but now ie6 is almost dead (maybe 1 year and it will be below 5%) so there is no need to be compatible with it. Maybe ie7/8 doesnt suport "real" xhtml byt it works with xhtml as text/html very well WITHOUT need for new parsers. I really cant understand why spec-writers cares about 15-years-old websites and they don't care about 2-years-old websites.

By Tweakers user crisp, Sunday 17 May 2009 00:27

yup. instead of one parser now there will be need of using 2 (xml + html5).
Yes, but you'll need that anyway with all the legacy content on the web, and all the non-XML markup that's being created even today. Even with XHTML DTD's most markup today isn't XML-valid and I doubt that will ever change...
Besides that, for XHTML a treebuilding parser still has to accommodate several rules specific to (X)HTML.
Especially joining documents where one is xml and second is "new syntax" will be soooo easy.
The HTML WG is working hard on solutions for mixing XML application syntax within the HTML serialisation. The alternative, using namespaces in X(HT)ML isn't really simple either. In fact, I dare say that it's a real pain actually...
and you still didn't explained me why:
Why should I explain when I already stated in my own article that I recommend closing such tags even in HTML? It's a left-over from SGML that I find illogical as well, but can be explained by the fact that saving bytes was more important in the old days and for some tags it was easy to implement the concept of optional (end)tags in parsers.
Tell me where in the specification is list of all elements that don't need to be closed?
Sure: http://www.w3.org/TR/html401/index/elements.html
And all that moaning about backward compatibility. Since <video /> element is in specification THERE IS NO BACKWARD COMPATIBILITY!
Backwards compatibility means that new parsers should be able to cope with legacy documents, it's not about old parsers being able to cope with new documents.
Its fun that specification wants to be compatible with IE6 but now ie6 is almost dead
No, the specification wants to be compatible with documents that have been written relying on some of IE's quirks to render correctly. It handles cases that are too common to ignore. Even though these cases may be invalid according to the HTML4.01 specification, browsers had to and have implemented compatibility workarounds so it's better to just update the specification to match current browser behaviour. If there's a specific example you would like to share I'd love to hear it.
Maybe ie7/8 doesnt suport "real" xhtml byt it works with xhtml as text/html very well WITHOUT need for new parsers.
"it works" is a bad excuse for condoning bad behaviour. XHTML as text/html "works" because browsers have never implemented HTML to the letter, but that doesn't mean that XML-syntax is 'better' nor that XML is 'better'. It means that specwriters found a way to push XML into a markup language with some guidelines to also make it work in browsers of that time that only supported HTML (note: not SGML). You have to realize that XHTML(1.x), even with appendix C, conflicts at points with the HTML(4.01) specification. When people actually started using XHTML in that way (in a text/html setting, mostly just because an XHTML DTD was 'hip') there was no way a browser could actually ever implement HTML4 to the spec.

And with browsers not being able to implement HTML4 but having to keep workarounds to make XHTML as text/html work I don't think it's bad to put that into a new specification, together with other behaviour that to that point had to be reverse-engineered from the monopolist on the browser-market.
I really cant understand why spec-writers cares about 15-years-old websites and they don't care about 2-years-old websites.
Care to explain that with some examples? Do you think that HTML5 will break 2-year-old sites, but not 15-year-old sites?

By mynth, Saturday 6 June 2009 00:57

> Sure: http://www.w3.org/TR/html401/index/elements.html

eh, this is for html41 and it really show how messy html5 can be without easier rules but with dozen of new elemnts.

> "it works" is a bad excuse for condoning bad behaviour.

why, entire html5 spec is written that way: "hey we can add new unnecessary element because it works in every browser" -> applet, embed, iframe, object, canvas -> lol, what is the difference between embed and object, and why iframe cannot be replaced with object?

By Tweakers user crisp, Saturday 6 June 2009 01:39

eh, this is for html41 and it really show how messy html5 can be without easier rules but with dozen of new elemnts.
ok then: http://www.w3.org/TR/html5/syntax.html#optional-tags

but do note that HTML5 is still a work in progress which would explain some of the 'messy-ness' in the current WD. You could however raise your concern in the WG which is open to anyone...
why, entire html5 spec is written that way: "hey we can add new unnecessary element because it works in every browser"
unnecessary in *your* opinion... I think that most additions are very welcome since it will improve semantics.
-> applet, embed, iframe, object, canvas -> lol, what is the difference between embed and object, and why iframe cannot be replaced with object?
backwards compatibility with existing content maybe?

[Comment edited on Saturday 6 June 2009 01:39]


By mynthon, Monday 8 June 2009 13:34

> ok than: http://www.w3.org/TR/html5/syntax.html#optional-tags

huh, easy to remember. Especially for "common peple". There is only assillion of unnecessary exceptions.

> backwards compatibility with existing content maybe?

and what about <FONT />? And embed is for backwards compatibility? Only gecko browsers support it, but gecko browsers support also object, so what is embed for? Whats the difference between embed and object. You say, that mr. smith creating webpage about its cat will know the difference?

I am working with html everyday and i still dont know every exception bacause i don't have time to read carefully every page of specification. Do you really think that peple who are creating their hobbysite or family site will spend dozens of hours on serching if:
"An option element's end tag may be omitted if the option element is immediately followed by another option element, or if it is immediately followed by an optgroup element, or if there is no more content in the parent element.". You are kidding - right?

See now. It is part of what you want (from spec):
------
A p element's end tag may be omitted if the p element is immediately followed by an address, article, aside, blockquote, datagrid, dialog, dir, div, dl, fieldset, footer, form, h1, h2, h3, h4, h5, h6, header, hr, menu, nav, ol, p, pre, section, table, or ul, element, or if there is no more content in the parent element and the parent element is not an a element.

An rt element's end tag may be omitted if the rt element is immediately followed by an rt or rp element, or if there is no more content in the parent element.

An rp element's end tag may be omitted if the rp element is immediately followed by an rt or rp element, or if there is no more content in the parent element.

An optgroup element's end tag may be omitted if the optgroup element is immediately followed by another optgroup element, or if there is no more content in the parent element.

An option element's end tag may be omitted if the option element is immediately followed by another option element, or if it is immediately followed by an optgroup element, or if there is no more content in the parent element.
------

This is what i want:
------
A p element's end tag is necessary

An rt element's end tag is necessary

An rp element's end tag is necessary

An optgroup element's end tag is necessary

An option element's end tag is necessary
------

or shorter version:

Every element must have end tag.

Sorry, but if we want sematic web created by pepole it HAVE TO BE EASY TO CREATE. Otherwise we will end up with dozen of new formats, standars, converters, methods and options. Making mistakes a legal code is not right way. You cant tell that 2+2=5 just because some peple think this is right so we change rules for them. And now if you write 2+2=4 is ok, but 2 +2=5 (with space) it is also ok.

>>> but do note that HTML5 is still a work in progress which would explain some of the 'messy-ness' in the current WD. You could however raise your concern in the WG which is open to anyone...

my english is not good enaugh to participate in discussion.

By Tweakers user crisp, Monday 8 June 2009 14:26

huh, easy to remember. Especially for "common peple". There is only assillion of unnecessary exceptions.
I'd at least expect that in future the spec would contain a convenient table of these 'exceptions' as an appendix. Moreover there has been talk in the WG of producing a document specifically targeted at authors. Like I said: HTML5 is still a work in progress, but it is for the larger part also backwards compatible with HTML4.01, so certain parts of HTML4.01 also apply (so meanwhile, as long as HTML5 is not a recommendation yet, you could and should refer to that).
and what about <FONT />?
FONT is still part of HTML5 as a useragent requirement but it is deprecated and thus not conforming from an authoring perspective.
And embed is for backwards compatibility? Only gecko browsers support it
That's not true; almost every browser supports EMBED even though it was originally a Netscape extension. Support for EMBED in some browsers (mainly IE < 8) is better than support for OBJECT. So yes, EMBED is also for backwards compatibility and has even been made conforming in HTML5.
Whats the difference between embed and object. You say, that mr. smith creating webpage about its cat will know the difference?
I'd say EMBED was included in HTML5 as a convenience method for embedding a plug-in in cases where fallback content is not needed or available, and because it has better support in older browsers (the most common case where EMBED is used - as fallback-content in an OBJECT-element).
See now. It is part of what you want (from spec):
It's not just what I want: it is needed in order to be backwards-compatible with previous HTML specifications, implementations and with existing content on the web. Besides that, it doesn't do any harm to keep allowing omission of certain tags. The common author doesn't actually have to know, because it is perfectly valid to close your P's, TD's, OPTION's etcetera - I even recommend it!
my english is not good enaugh to participate in discussion.
It's good enough to participate in discussion with me ;)

[Comment edited on Monday 8 June 2009 14:28]


By mynthon, Monday 8 June 2009 19:44

>FONT is still part of HTML5 as a useragent requirement but it is deprecated and thus not conforming from an authoring perspective.

You cant use it but browser has to support it. It isnt even funny. And what about blink and marquee? I realy like blink. Without marquee we will lost a lot of good pages: http://web_4_all.republika.pl/witamy.htm

> That's not true, almost every browser supports EMBED even though it is from origin indeed a Netscape extension.

you are right. My mistake. Ive used embed only for flash and i get used to it as hack for mozilla browsers.

> I'd say EMBED was included in HTML5...

<object /> is quite cool. but that was discussion about syntax not elements...

> It's not just what I want: it is needed in order to be backwards-compatible with previous HTML specifications, implementations and with existing content on the web.

XHTML1 wasn't compatible and web didnt collapse. Well, it became more semantic. It took 10 years but now something moved forward. HTML5 is imo step backward. Legalizing wrong behavior is way to improve statistics not the web. now browsers supports few parsers so why we cant add another one or maybe there is better solution. i dont understand why html5 can't be line between past and tomorrow or cant be based on xhtml1.

What belongs to past will remain in past. If you want new cool elements like standarized embed, or <font/> supported by browsers by forbidden to developers you have to use HTML5.

you know. In poland every holidays police catches few thousend drunken drivers. For me lagalizing driving after few beers will improve statistics not safety.

By Tweakers user crisp, Monday 8 June 2009 21:42

You cant use it but browser has to support it. It isnt even funny.
You can use it, but that would make your document non-HTML-conformant ;) And why would you want to use it for new documents? All browsers support CSS these days which is a far better alternative than littering your documents with presentational tags and attributes. FONT isn't even conforming in HTML4.01 Strict.
And what about blink and marquee?
blink is a CSS text-decoration property and marquee will be specced for HTML5 ;)
XHTML1 wasn't compatible and web didnt collapse.
XHTML1 can be written in a way that is compatible with HTML4.01 (or at least with existing implementations of HTML4.01), there is a whole appendix devoted to that in the spec. In fact; 99% of all documents with an XHTML DTD are actually served as HTML and thus also treated as HTML by browsers. The majority of those documents would actually break when served as real XHTML (in XHTML-capable browsers, which excludes IE) because they don't even follow the XML syntax requirements. So who's saying that XML is easier?
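
A quick way to see what "break when served as real XHTML" means is to run typical text/html fragments through a strict XML parser. The sketch below uses Python's xml.etree.ElementTree as a stand-in for a browser's XML parser; the fragments are made-up examples of two very common mistakes (an unescaped ampersand in a URL and an unclosed BR).

```python
import xml.etree.ElementTree as ET


def well_formed(markup):
    """Return (True, "") if a strict XML parser accepts the markup,
    otherwise (False, <parser error message>)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        return False, str(err)
    return True, ""


if __name__ == "__main__":
    fragments = [
        # Unescaped "&" in a query string: fine for HTML parsers, fatal in XML.
        '<p><a href="list?page=2&sort=asc">next</a></p>',
        # <br> without the short-close syntax: also fatal in XML.
        "<p>line one<br>line two</p>",
        # The same content written so that an XML parser accepts it.
        '<p><a href="list?page=2&amp;sort=asc">next</a><br/>line two</p>',
    ]
    for fragment in fragments:
        print(well_formed(fragment)[0], fragment)
```

In an XML-capable browser each failing case means an error page instead of a document, which is the strictness most authors using an XHTML DTD never actually experience.
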
Well, it became more semantic.
No, XHTML1.x is just the XML serialisation of HTML4.01, so it isn't more or less semantic.
It took 10 years but now something moved forward. HTML5 is imo step backward.
Specifying what browsers have been doing for themselves the last 10 years (and copying/reverse engineering from each other with the obvious result of differences in implementations) is a *major* step forward, especially now with every major browservendor participating.
Legalizing wrong behavior is way to improve statistics not the web.
What 'wrong behaviour' is being legalized? If you're talking about optional (close)tags: that has always been conforming in all HTML specifications. Do you mean making elements such as EMBED and MARQUEE conformant? Well, you just mention the latter yourself as being quite useful.
now browsers supports few parsers so why we cant add another one or maybe there is better solution.
So basically you want to 'fork' the web? Hoping that every author on the planet will follow the new direction, replace all their existing HTML-based tools and software, and suddenly everyone *is* going to conform to more strict syntax and do it right this time (which failed with XHTML)? And browservendors will happily abide by that by adding yet another parser to their browsers, whereas they'd rather see fewer rendering modes (except for Microsoft, who seem to think that adding a new parser for every new version of IE is a good idea 8)7 )

I'd say HTML5 is the best direction we can currently take, even when it is a very difficult route.
i dont understand why html5 can't be line between past and tomorrow or cant be based on xhtml1.
You seem to forget (or didn't know yet) that HTML5 actually makes XML-like syntax conformant in the HTML serialization (which is not the case with HTML4.01) and also has an XML-serialization.
What belongs to past will remain in past.
Left to rot and inaccessible to all mankind in a matter of years? Mankind creating its own dark era without a digital Rosetta Stone for our offspring? If we can't learn from our past we are unworthy of the future.
you know. In poland every holidays police catches few thousend drunken drivers. For me lagalizing driving after few beers will improve statistics not safety.
You seem to be referring to the 'legalizing wrong behavior' again here for which you didn't mention any concrete examples. You make a nice analogy, but like most it can be reversed as well: criminalizing good-intended behaviour may improve safety but at the much higher cost of losing freedom ;)

[Comment edited on Monday 8 June 2009 21:59]


By mynthon, Tuesday 9 June 2009 16:02

> You seem to be referring to the 'legalizing wrong behavior'

font, marquee, not closing tags, mixing quotes and unquoted values etc. There was a discussion about reserved classes. Dont know where it is now, but an idea was wrong. Isn't it breaking backward comptibility?

> No, XHTML1.x is just the XML serialisation of HTML4.01, so it isn't more or less semantic.

yup, but it made people aware about semantics, structure, code etc. Since now everything you create will be good. No matter how f*cked up it is. You can mix elements. Quote values or not. Can i do this:

<img src=dot.gif> ?
and this
<img src=my dot.gif> ?

I really cant understand it. In real world there is a progress. You know. Vinyl was replaced by tapes. CDs replaced tapes. And now blue-ray/mp3 replaces CDs. And someone could sey: hey, we have super invention:cd but we will not using it because its not backward compatible with magnetic tapes.

> So basically you want to 'fork' the web?
it is forked, and with this 'specification' it will be forked more, because there is a lot of places where mistake can be make. HTML5 does not simplyfy things but complicates it more. If there is an difference in interpreting code you will have to choose only one solution so others will stop working. Like borders for <object/> in IE or styling <hr/>. Displaying alt as title and mixing attributes. As i said, html5 adds new exceptions and you have hope that complicating thing will made it easier for people. Well i think not. I love simple rules. Wash teeth after meal! Not: wash teeth after meal unless it is sunday 25th of may or modulo of sum year+month and day will gave 3 and jupiter is seen on western part of the sky.

Well, there is loose/strict/transitional. I really dont understand why strict cannot be extended and made more xml-like (not xml, but xml-like) by simplyfying rules. As i sad before: tag, quote, values : you opened, you close - simple!

And link you gave: http://www.w3.org/TR/html5/syntax.html#optional-tags says it all. It should be presented for every project manager with title: How to complicate things unnecessary. Well, its my favourite web page for now. It is funny how much energy is wasted on making simpe things really complicated or rather on standarizing things that where done wrong.

and one thing about accesibility - your captcha really sucks. It is my third try to enter it correctly.
well fouth.

By mynthon, Wednesday 10 June 2009 09:21

yeah, i know Molly is bought by Micro-evil-soft but:
http://www.molly.com/2009...real-why-xhtml-discussion

I have another idea. We have to change css spec because i saw a lot of peple writing:
div{margin: 10 10 20 30; padding: 40 px 30 px 10 px 90 px font: sans serif;}

One good thing with this messes syntax in html5 is that google eventually will be standards compliant (well, we can do better sites by we are so rich and big that we rather change spec).

And i have good advice for web authors: You want to have valid html5 page tomorrow? Just write invalid xhtml page today! You will save a lot of time!

By Tweakers user crisp, Thursday 11 June 2009 11:25

font, marquee
afaik font will not be author-conforming (there has been some talk about making it conformant for WYSIWYG-editor output, but I think that's impossible to automatically validate). marquee is a bit of a strange case; I think it is good that it's being specced (it helps interoperability and it is in use on the web), but I feel that such a typical behavioral element shouldn't really be author-conformant in a markup language. However the WG probably saw not much harm in it - as long as the behaviour itself is not a UA-requirement accessibility is not really at stake.
not closing tags, mixing quotes and unquoted values etc
Those are all things that are *optional*. You don't *have to* omit optional end-tags or omit quotes. The fact that you can (because it is allowed and browsers support it) doesn't make it a bad thing. Besides, there is no 'legalizing' here; it's been 'legal' since forever.
There was a discussion about reserved classes. Dont know where it is now, but an idea was wrong. Isn't it breaking backward comptibility?
Are microformats breaking backwards compatibility? Or the fact that IE uses certain predefined classnames for their webslices feature? Extending the use of something that already exists is not uncommon; it's easier to implement and degrades gracefully in UA's that don't support it.
yup, but it made people aware about semantics, structure, code etc. Since now everything you create will be good. No matter how f*cked up it is. You can mix elements. Quote values or not. Can i do this: [...]
I must admit that indeed XHTML made people more aware of semantics and structure, but it isn't XHTML that really brought us that because HTML already had all that. And again you are giving an example that really doesn't provide an argument for XHTML because also in HTML it is good practice to always use quotes for attribute values - especially when you don't know the exact rules for when it is allowed to omit them (and you don't have to know those rules if you stick to always quoting attribute values).
I really cant understand it. In real world there is a progress.
It depends on what you call progress. Your example with vinyl being replaced by cd's isn't in all aspects an example of progress because cd's do have some drawbacks: not in the last place because people have to upgrade their stereo's and repurchase all albums they already had on vinyl (because maybe some day recordplayers may cease to exist). Some records may never be republished on cd (loss of heritage), soundquality from cd is suboptimal compared to vinyl (especially with the loudness wars), and cd (and dvd, and blu-ray) itself is still a suboptimal medium because it is so d*mn fragile.
HTML5 does not simplyfy things but complicates it more. [...] html5 adds new exceptions and you have hope that complicating thing will made it easier for people
HTML5 tries to specify as much as possible in order to achieve a couple of goals, among which interoperability is one. That it makes for a complicated spec is a fact, but specifications have a very specific audience. For the average author not everything in the specification will be relevant. They can just go to sites like w3schools.com and learn there everything that is relevant. And maybe the HTML5 WG will at some point produce a document specifically targetted at authors. A specification should be complete and cover all edge-cases. When specifications leave open such cases you cannot rely on interoperable behaviour from browsers (as we already see practice).
I really dont understand why strict cannot be extended and made more xml-like (not xml, but xml-like) by simplyfying rules.
You can already practice those 'simplified rules'. HTML5 even made the solidus conforming for elements with empty content-model in the HTML serialization. The only thing you still can't do as long as you send your documents with an HTML mimetype are things like <img></img> or <script src="foo.js"/>, but that's a choice between being backwards compatible and having some exceptions (that are generally well-known) and being consistent for the mere sake of purity.
and one thing about accesibility - your captcha really sucks. It is my third try to enter it correctly.
well fouth.
We are considering replacing it with reCaptcha, but imo captcha's in general just suck :P Unfortunately we cannot do without...
I have another idea. We have to change css spec because i saw a lot of peple writing:
div{margin: 10 10 20 30; padding: 40 px 30 px 10 px 90 px font: sans serif;}
In general most languages have features that allow for a more compact syntax. When it doesn't present a problem for implementations I don't see any reason why in this case the CSS specification cannot say "if a unit is missing after a length value assume 'px'" (some browsers already do that, so it would help interoperability).
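The rule suggested here ("if a unit is missing after a length value assume 'px'") could be sketched roughly as follows. This is only a hypothetical illustration in Python, not something taken from the CSS specification or from any browser; the name `assume_px` is made up, and it only handles a whitespace-separated list of simple tokens.

```python
import re

# Hypothetical helper: not part of any CSS spec or browser implementation.
# Matches a bare number such as "10", "1.5", ".5" or "-3" (no unit attached).
_BARE_NUMBER = re.compile(r"^[+-]?(\d+\.?\d*|\.\d+)$")

def assume_px(value: str) -> str:
    """Append 'px' to every unitless, non-zero number in a CSS value."""
    fixed = []
    for token in value.split():
        if _BARE_NUMBER.match(token) and float(token) != 0:
            # Unitless non-zero length: assume pixels.
            fixed.append(token + "px")
        else:
            # Zero, keywords, and values that already carry a unit stay as-is.
            fixed.append(token)
    return " ".join(fixed)

print(assume_px("10 10 20 30"))     # 10px 10px 20px 30px
print(assume_px("0 auto 1.5em 4"))  # 0 auto 1.5em 4px
```

Note that this sketch does nothing for input like `40 px` (a space between number and unit) or for the missing semicolon in the quoted example; those are different kinds of errors than a missing unit.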
>You want to have valid html5 page tomorrow? Just write invalid xhtml page today!
You can just as well write valid HTML4.01 (Strict) today :)

[Comment edited on Thursday 11 June 2009 11:31]


By mynthon, Friday 12 June 2009 22:59

>Are microformats breaking backwards compatibility? Or the fact that IE uses certain predefined classnames for their webslices feature? Extending the use of something that already exists is not uncommon;

well, microformats are using class because its valid and there was no other attribute that can be used. Maybe here is a place where something really useful can be done (role, rel, another attribute)?

I always thought that css is for separation between data and view (like in MVC). Style and class belongs to css. Adding and standardizing special values of class is... well return to good old times that every webmaster/developer etc. would like to forget. Mixing content with presentation. And as i said before - another exception to remember.

And example with IE shows where it all now leads. To another browser wars. But now everything will be "standardized" without true steps forward. Well. Everything i can say is that i hope im wrong.

>The only thing you still can't do as long as you send your documents with an HTML mimetype are things like <img></img> or <script src="foo.js"/>

as you see - html5 really sucks. <script /> will be really useful! What sense is in <script ..></script>? And <img /> - well. It should be changed. How? This way.

<img src="img.png" alt="This is deprecated attribute because of...">
This image shows how html5 sucks. There are 5 items on the chart. Vacuum cleaner has 20 points, GW bush 35 points, jet engine 120 points, jena jameson 145 points and html5 339 points of suckness.
</img>

HERE it needs changes not adding video tag - well youtube and THOUSANDS of sites work well without video so it is not first need. First need is better alt for images or simpler/better handling of longdesc.

>In general most languages have features that allow for a more compact syntax.
??? i know no language that guesses variable/class/method/library name or value/address just because user made syntax error.

>You can just as well write valid HTML4.01 (Strict) today
well, if you havent used reserved class by accident on element where it doesnt belongs to :) but wait. its backwards compatible...

By Tweakers user crisp, Friday 12 June 2009 23:38

>I always thought that css is for separation between data and view (like in MVC). Style and class belongs to css.
No, class is a multipurpose attribute, as defined in HTML4.01:
The class attribute has several roles in HTML:
  • As a style sheet selector (when an author wishes to assign style information to a set of elements).
  • For general purpose processing by user agents.
more later

[Comment edited on Friday 12 June 2009 23:39]


By mynthon, Saturday 13 June 2009 13:05

and what about backwards compatibility? Sometimes it is necessary sometimes not.

By Tweakers user crisp, Sunday 14 June 2009 00:44

mynthon: (part of) your posts are starting to make less sense to me; it looks to me like you're dragging things into this discussion that are not relevant or at least in such way that I cannot put them into context. It's also strange for me to have this discussion in a blogpost that I wrote almost 2 years ago. I must say that I still believe that a more strict syntax requirement is unnecessary, and the good thing about HTML5 is that everyone can have it the way they like. You can use the XML serialization and use an XHTML mimetype to serve your documents. Even XML syntax in the HTML serialization has been made conformant. You can tell people that it's good practice to cross the t's (don't omit optional end-tags) and dot the i's (use quotes for attribute values), what more do you want?

You're complaining about some other minor things that are in your opinion 'wrong' in HTML5. Well, some things might be wrong, but they are mostly wrong for a reason; being backwards compatible with what browsers are doing today may be one of those reasons. That does however give us the benefit of early adoption; you can start writing conformant HTML5 documents today without much problem. If the WG were to design a completely new markup language it would take at least a decade before we would see sufficient support in practice, and only if browser vendors would actually want to support it.

Another reason why things could be wrong in your opinion is because it's an opinion, and not everybody may share it. Sometimes compromises must be made, that's life.

Now on some of those detailed points you're trying to make:
>as you see - html5 really sucks. <script /> will be really useful!
I'd say this is a technical issue. The HTML serialization doesn't give any meaning to the solidus (because of backwards compatibility reasons), and a parser should know beforehand what the element's content-model is in order to be able to parse it correctly. It's a minor detail (and in the XML serialization you could use it!), does that warrant saying that HTML5 sucks?
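To illustrate why the solidus cannot decide anything in the HTML serialization, here is a toy sketch in Python. It is not the HTML5 parsing algorithm: the `VOID_ELEMENTS` set is an abbreviated list, `open_elements_after` is a made-up name, and raw-text handling of script content is left out entirely. The only point is that the parser looks the tag name up in a table it knows beforehand and ignores a trailing `/`.

```python
import re

# Abbreviated, illustrative list of elements with an empty content model.
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}

# Matches start tags and end tags; a trailing "/" is captured but never used.
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>")

def open_elements_after(markup: str) -> list[str]:
    """Return the stack of elements still open after reading `markup`."""
    stack = []
    for match in TAG.finditer(markup):
        closing, name, _solidus = match.groups()
        name = name.lower()
        if closing:
            # End tag: pop up to and including the nearest matching element.
            if name in stack:
                while stack.pop() != name:
                    pass
        elif name not in VOID_ELEMENTS:
            # Non-void element: it stays open, solidus or not.
            stack.append(name)
        # Void elements are never pushed, with or without a solidus.
    return stack

print(open_elements_after('<br>'))                     # []
print(open_elements_after('<br/>'))                    # []
print(open_elements_after('<script src="foo.js"/>'))   # ['script']
```

In this toy model `<br>` and `<br/>` behave identically, while `<script src="foo.js"/>` leaves a script element open, which is the situation described above.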
><img /> - well. It should be changed. How? This way.
So you want rich fallback for images? Why not use <object> then? This is another case where you have a technical limit (how can a parser know beforehand, when it encounters an <img> start-tag, whether it has an end-tag and thus content or not?). Besides, the use-case is very limited given that there's an alternative (<object>), and for most cases alt or longdesc will suffice - lots of authors don't even bother to put a proper alt on images...
>HERE it needs changes not adding video tag - well youtube and THOUSANDS of sites work well without video so it is not first need. First need is better alt for images or simpler/better handling of longdesc.
So being dependent on proprietary technology for something that's being used on the web more and more is less of a problem than not having enough means of fallback for images, which authors aren't likely to provide anyway, and for which they already have alternatives when the current means are insufficient?
>??? i know no language that guesses variable/class/method/library name or value/address just because user made syntax error.
A markup language is in a different class from programming languages. It has a whole different audience, and markup authors shouldn't be compared to programmers, nor should you require the same skills from them. Besides, in most cases the guesses are in line with the author's intentions. In case they are not, it is only good if every browser acts the same.
>well, if you havent used reserved class by accident on element where it doesnt belongs to :) but wait. its backwards compatible...
I'm not sure what the status of that is in HTML5; maybe they already dropped it.
>and what about backwards compatibility? Sometimes it is necessary sometimes not.
When is it necessary and when not in your opinion?

[Comment edited on Sunday 14 June 2009 00:48]


By mynthon, Sunday 14 June 2009 21:52

You are right. This dialogue wasn't about classes, new tags, etc. I would like to see <script /> but if it doesnt work in html5 its ok. Its just few bytes more and we are living in youtube era. I dont care about reserved classes. Ive tried to implement microformats on my companys page but i failed. There was lack of good examples and tutorials for people not involved in microformats, but for me idea is good. I can (and google, yahoo, ms) really easy extract interesting data from webpages. I have in my mind few situations where it will be really helpful.

I just wanted to show that sometimes backwards compatibility is urgent for spec writers (syntax) and sometimes not (reserved classes). So moaning about backward compatibility is something i dont believe in.

>So you want rich fallback for images? Why not use <object> then?

no, it was just example that some things can be done better or in different way. I can say the same about <embed />. So you want to embed something on the page? Why not use <object />. And <video /> - you want to embed movie on the site? Why not use <object />, etc.

>It's also strange for me to have this discussion in a blogpost that I wrote almost 2 years ago.

well. you can close comments for this entry or enjoy probably better position in google. Or just stop answering. Or i have better idea. If you are not comfortable i promise you that it is my last comment. I feel thats ok, because i would like to write about syntax in html5, not about <video /> tag with ogg support (who knows what is ogg? People are using quicktime or avi and i thought that html5 is for people not for geeks. And if for geeks - well: why not use xml syntax :) )

and remember. if in 2015 you would like to give people ability to put your logo on their page you should write:

You want link to my blog? If you are using html5 put this code on your page:
<a href="crisp"><img src="crisp.png" alt="crisps blog"></a>
But if you are using html5 in xhtml5 mode you HAVE TO put this code:
<a href="crisp"><img src="crisp.png" alt="crisps blog" /></a>
You are not sure what option choose - ask your admin for headers sent with your document and read long and very boring specification which will help you to choose option 1 or 2 (or maybe even 3 if html6 will be released)

When 2 documents written as compatible with specification could be not compatible with each other there is something wrong with specification.
2+2=4
2*2=4
but
2*2 != 2+2
?

By Tweakers user crisp, Sunday 14 June 2009 22:52

>well. you can close comments for this entry or enjoy probably better position in google. Or just stop answering. Or i have better idea. If you are not comfortable i promise you that it is my last comment.
I didn't mean it in that way; I like it when a discussion gets active again, but maybe it warrants a follow-up post then ;)

By mynthon_booster, Friday 30 October 2009 16:21

mynthon is 100% correct.

By e-sushi, Friday 30 October 2009 20:18

@mynthon,
>You want link to my blog? If you are using html5 put this code on your page:
><a href="crisp"><img src="crisp.png" alt="crisps blog"></a>
>But if you are using html5 in xhtml5 mode you HAVE TO put this code:
><a href="crisp"><img src="crisp.png" alt="crisps blog" /></a>
Sorry to say so, but you are wrong! Since HTML5 supports ALL previous doctypes, from HTML2.0 up till XHTML1.1 (which I used to use), it does not matter which of the two you use!

Furthermore, you will probably send "text/html" MIMEs from your server anyway... that is, unless you want your site to look like unparsed XML in Internet Explorer (which still is the most-used browser around).
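
The MIME-type choice mentioned here is sometimes handled with a small piece of content negotiation. The Python sketch below is only an illustration under assumptions: the name `choose_mime_type` is made up, and the policy (serve `application/xhtml+xml` only when the request's `Accept` header lists it explicitly with a non-zero q-value, otherwise fall back to `text/html`) is one possible choice, not a rule from any specification.

```python
def choose_mime_type(accept_header: str) -> str:
    """Pick a Content-Type for an (X)HTML document from an Accept header."""
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if media_type != "application/xhtml+xml":
            continue
        # Default quality is 1.0 when no q parameter is given.
        q = 1.0
        for param in fields[1:]:
            key, _, val = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # Malformed q-value: treat as "not accepted".
        if q > 0:
            return "application/xhtml+xml"
    # Wildcards such as */* are deliberately not treated as XHTML support.
    return "text/html"

print(choose_mime_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_mime_type("image/gif, image/jpeg, */*"))
```

With this policy a client that only sends `*/*` gets `text/html`, so the same document has to be written so that it works under both parsers.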

Comments are closed