xhtml tutorial inaccuracies

croatiankid · June 11, 2007

http://www.w3schools.com/xhtml/default.asp

First page, first sentence, wrong. See html tutorial inaccuracies.

http://www.w3schools.com/xhtml/xhtml_intro.asp

XHTML is aimed to replace HTML

No.

XHTML is a stricter and cleaner version of HTML

No.

XHTML is HTML defined as an XML application

Correct!

W3C defines XHTML as the latest version of HTML. XHTML will gradually replace HTML.

No.

All New Browsers Support XHTML. XHTML is compatible with HTML 4.01. All new browsers have support for XHTML.

Oh my god, this was wrong then and is wrong now! Internet Explorer does not support XHTML! Its parsing bugs allow it to render XHTML as HTML, but it does not support it at all!

http://www.w3schools.com/xhtml/xhtml_why.asp is so wrong. Using XHTML that is backwards (HTML) compatible (follows Appendix C) has NO advantages over using HTML.

XML is a markup language where everything has to be marked up correctly, which results in "well-formed" documents.

HTML doesn't require things to be marked up correctly???

In HTML, some elements can be improperly nested within each other, like this: <b><i>This text is bold and italic</b></i>

They must be properly nested, just as in XHTML. No difference.

http://www.w3schools.com/xhtml/xhtml_syntax.asp

Attribute values must be quoted

Same as in HTML.

That's it for now!

boen_robot · June 12, 2007

XHTML is aimed to replace HTML

What? Not right? How come? And don't give me the HTML 5 speech again. It's not what the world wants, it's what browser vendors (Microsoft in particular) want. And it's still very fresh, draft and so on.

XHTML is a stricter and cleaner version of HTML

To say something is a "reformulation" is like saying it's another version, with its differences (stricter and cleaner being the ones in this case). The above statement is correct, though only for XHTML served with the proper MIME type.

XML is a markup language where everything has to be marked up correctly, which results in "well-formed" documents.

HTML does too, but it recovers from errors. XML and XML based languages don't. At least not when served with the correct MIME type.
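To make the difference concrete, here is a minimal sketch using only the Python standard library. It is an illustration, not how any browser works: `html.parser` is just a tolerant tokenizer (it does no browser-style repair), and the names `TagCollector`, `html_events` and `xml_is_well_formed` are made up for this example. The point is only that the same mis-nested markup is accepted by an HTML-style reader and rejected outright by an XML parser.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Mis-nested markup: <b> is closed before the <i> it contains.
MISNESTED = "<p><b><i>bold and italic</b></i></p>"


class TagCollector(HTMLParser):
    """Records start and end tags in the order the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def html_events(markup):
    """Tokenize markup leniently; bad nesting does not raise an error."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.events


def xml_is_well_formed(markup):
    """Return True only if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(html_events(MISNESTED))         # tags are reported, no exception
    print(xml_is_well_formed(MISNESTED))  # the XML parser refuses the input
```

The XML side has no recovery step at all: one nesting error and the whole document is rejected, which is what happens to XHTML served with an XML MIME type.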

All New Browsers Support XHTML. XHTML is compatible with HTML 4.01. All new browsers have support for XHTML.

OK. I agree with this one. It is wrong. W3Schools should say "All modern browsers support XHTML written in html files..." and perhaps further on explain the MIME type issue. And of course, mention that IE can't yet use the new MIME type.

croatiankid · June 12, 2007

What? Not right? How come? And don't give me the HTML 5 speech again. It's not what the world wants, it's what browser vendors (Microsoft in particular) want. And it's still very fresh, draft and so on.

To say something is a "reformulation" is like saying it's another version, with its differences (stricter and cleaner being the ones in this case).

Where does it say that it's aimed to replace it in the specs? People (w3schools tutorial writers) assumed this.

Not exactly, XHTML contains HTML 4.01, but is expanded. This is similar in my view to comparing something like C with C# or C++.

Also don't forget that just because the W3C isn't developing HTML anymore, doesn't mean that HTML is not being developed. Remember that HTML 2.0 was made by the IETF.

boen_robot · June 12, 2007

Read the last paragraph of "1. What is XHTML?" of the XHTML 1.0 spec. Doesn't the sentence "The XHTML family is the next step in the evolution of the Internet." mean that XHTML is sort of "the next HTML", where "HTML" stands for "the language of the web" rather than "an SGML based markup language"?

Not exactly, XHTML contains HTML 4.01, but is expanded. This is similar in my view to comparing something like C with C# or C++.

That's exactly my point.

Oh, and

W3C defines XHTML as the latest version of HTML.

After reading that exact section from the spec above, I completely agree with this too. W3C instead defines XHTML as

XHTML is a family of current and future document types and modules that reproduce, subset, and extend HTML 4

A statement which could be simplified to what the last paragraph says, but not to what W3Schools says.

aspnetguy · June 12, 2007

Where does it say that it's aimed to replace it in the specs? People (w3schools tutorial writers) assumed this.

Not exactly, XHTML contains HTML 4.01, but is expanded. This is similar in my view to comparing something like C with C# or C++.

Also don't forget that just because the W3C isn't developing HTML anymore, doesn't mean that HTML is not being developed. Remember that HTML 2.0 was made by the IETF.

Since you mentioned C, C++, and C#, I'll put this out there.

Comparing C and C++ is like comparing HTML3 and HTML4 (with no deprecation). C++ is an extension of the C language. Anything you write in C can be, without any changes, ported to C++.

Now C# is nothing like C or C++; in fact the closest language to C# is Java.

croatiankid · June 12, 2007

And of course, mention that IE can't yet use the new MIME type.

IE doesn't just not support the MIME type, it doesn't support XHTML. Almost all browsers have a parsing bug that ignores the slash in self-closing tags.

Also, XHTML doesn't use a more strict and cleaner syntax, it uses a different syntax.

justsomeguy · June 12, 2007

Also, XHTML doesn't use a more strict and cleaner syntax, it uses a different syntax.

... which is more strict and cleaner. You're splitting hairs here.

croatiankid · June 12, 2007

Perhaps, but how is it cleaner? You have more required tags (you can't omit html, head, body, tbody (if you omit it, the tbody element doesn't exist, unlike in HTML, where you can omit the tag, but the element exists)).

justsomeguy · June 12, 2007

Clean code is explicit code. It makes sense for a body tag to appear in a document and have a body element exist. It does not make sense for a body tag to not appear in a document and still have the element exist. The code is clean because it is explicit; it is easier to understand than trying to make sense of badly formed HTML that the browser still renders. It doesn't mean that XHTML requires less code to do the same thing, it means that XHTML code is easier to understand than HTML code is.

croatiankid · June 13, 2007

You can write bad XHTML just as you can write bad HTML, only when using certain MIME types (xml and xhtml), it will bring an error and not display it at all.

boen_robot · June 13, 2007

You can write bad XHTML just as you can write bad HTML, only when using certain MIME types (xml and xhtml), it will bring an error and not display it at all.

Which is exactly why XHTML (the real one) is stricter. Because XHTML served with the right MIME type is "real" XHTML. The rest, as you already know, is just a file with XHTML syntax served (and thus interpreted) as HTML. Therefore: when you write bad XHTML, you're practically writing bad HTML.

Technically speaking, XHTML served as HTML should be interpreted as bad HTML too, but as we all know, user agents don't work this way and they won't. Not in the near future anyway. And certainly not before IE supports the right MIME type.

SuperKoko · July 19, 2007

Comparing C and C++ is like comparing HTML3 and HTML4 (with no deprecation). C++ is an extension of the C language. Anything you write in C can be, without any changes, ported to C++.

No, C++ is not "the next version of C".

HTML 4 renders HTML 3.2 obsolete (i.e. now, HTML means HTML 4), and everybody is strongly advised to use HTML 4 in new Web developments, though you're still allowed to use HTML 3.2. Among other things, it means that flaws, and small and big bugs, in the documentation of HTML 3.2 won't ever be corrected. HTML 3.2 is not maintained anymore by the W3C.

Similarly, C99 renders C90 obsolete. C90 won't have any new Technical Corrigenda. The "C programming language" now is C99.

Note that converting C to C++ very often requires changes. Try to compile this valid C90 program in C++.

enum c{r,g}c(c) int c; {return c;}char static*p0;static void*new=(void*)(const struct v {struct i {char k;} k;}*)(sizeof(int)//**/sizeof('a')-sizeof(*p0));static char*p0=("hello world$"+1)-1;const struct v ex={'a'};main(){extern const struct v ex;enum c{r,g}c();char arr[1]="$";struct p0 static;typedef struct i v;struct k {v i;int v;} auto k;auto u=ex.k.k;if(0[arr]!=*p0){goto lbl;return;}if(*p0==c(0)){char p1[2]={0};lbl:0[p1]=*p0++;1[p1]=sizeof(int)-sizeof(r);puts(p1);main();}return 0;}

Note that most changes from C90 to C++ are straightforward (the main incompatibility is the implicit conversion from void* to any_object_type*), but C99 and C++ are quite incompatible. For example, floating-point computation in C99 is much richer than in C++.

C++ includes too many features to be easily and efficiently implemented on all platforms. There are good open source C++ implementations (correctly conforming, with zero-overhead exceptions, and with not-too-code-bloating templates) on major platforms (excluding Win32, on which I'm not aware of any implementation without huge-overhead exceptions).

There are many reasons to prefer C over C++ in specific contexts, as there are reasons to prefer C++ over C. So, C++ and C are different languages. Both are useful, but not for the same projects. Compatibility with legacy code is one reason, among others. Once I wrote about it here: http://phpfi.com/233158

That's why there's still a huge demand for trained C engineers. http://www.tiobe.com/tpci.htm

Personally, in the projects I'm involved in, I mostly use C++. But I don't see it as the next version of C. Ideally, C++ ought to be a strict extension of C. That's what Bjarne Stroustrup thinks. But in no way is it a successor.

Objective-C is a strict extension of C. So, in this context, is Objective-C or C++ the successor/next version of C?

IE doesn't just not support the mime type, it doesn't support XHTML. Almost all browsers have a parsing bug that ignores the slash in self closing tags.

Which doesn't interact well with the SHORTTAG feature of SGML:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<!-- html and head tag implicitly open -->
<meta http-equiv=Content-Type content="text/html; charset=us-ascii">
<title/Valid but unsupported document!/
<!-- head tag implicitly closed, and body tag implicitly open -->
<p> This is a <em/valid/ and <em/conforming/ HTML document. If you don't believe me, read ISO/IEC 8879:1986.
<!-- p implicitly closed and reopen -->
<p> You can try with the W3C validator.
<!-- p, body and html implicitly closed-->

The buggy ignore-slash behavior makes it impossible to support this SGML feature. Anyway, I don't put this feature high on my wish list for browser features. CDATA and RCDATA support would be much more useful.

XML is a markup language where everything has to be marked up correctly, which results in "well-formed" documents.
HTML doesn't require things to be marked up correctly???

HTML requires things to be marked up correctly. But the notion of a well-formed document is specific to XML.

In XML, an acceptable document can be well-formed or valid. A well-formed document has a root element, properly nested elements, properly quoted attributes, etc., but is not necessarily valid. A valid XML document is a well-formed document that has a document type declaration and respects the grammar implied by the DTD. For example, a document can be well-formed but invalid.

In SGML, there's a single notion: validity. A document is either invalid or valid; it can't be half-valid...

The notion of well-formedness can exist in XML because XML documents can be parsed (not perfectly, but enough for many usages) without the context of a given DTD. In SGML documents using tag minimization, an SGML declaration and a DTD are needed to parse a document, because the declarations in the internal and external subsets are required for the SGML parser to find the implicit start and end tags, to translate short references to the contents of their entities, to interpret attribute specifications whose attribute name has been omitted when the attribute value is limited to a declared list of values, and for entity expansion.

So, on this point W3Schools is right, but confusing.
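As a rough illustration of the two notions, the sketch below feeds a non-validating parser (Python's `xml.etree.ElementTree`, which never consults a DTD) a document that is well-formed but would not validate against the XHTML 1.0 Strict DTD: it has no head, and it puts title and li directly inside body. The sample document and the helper name `body_children` are invented for this example, and no actual DTD validation is performed here, since that needs a validating parser outside the standard library.

```python
import xml.etree.ElementTree as ET

# Well-formed: single root, proper nesting, every element closed.
# Not valid XHTML: no <head>, and <title>/<li> are not allowed as
# direct children of <body> by the XHTML 1.0 Strict DTD.
WELL_FORMED_BUT_INVALID = (
    "<html>"
    "<body>"
    "<title>Misplaced title</title>"
    "<li>List item outside any list</li>"
    "</body>"
    "</html>"
)


def body_children(document):
    """Parse without a DTD and list the element names found under <body>."""
    root = ET.fromstring(document)
    body = root.find("body")
    return [child.tag for child in body]


if __name__ == "__main__":
    # The non-validating parser builds a tree without complaint.
    print(body_children(WELL_FORMED_BUT_INVALID))
```

A parser like this can work with the tree without ever knowing the grammar, which is exactly what the DTD-free notion of well-formedness buys you.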

Attribute values must be quoted
Same as in HTML.

I didn't expect that from you. I thought you were a heavy SGML user...

Attribute values don't need to be quoted in HTML if they contain only name characters ([A-Za-z0-9_.:-]). This is due to the SHORTTAG feature set to YES in the SGML declaration of HTML. See: http://www.w3.org/TR/html401/HTML4.decl

A good reference is available at: http://www.is-thought.co.uk/book/sgml-5.htm#Attributes

There are many misconceptions about SGML. I would like to point out that:

SGML is more human friendly (both writable and readable) than XML. Anybody who uses SGML every day will confirm that point. Closing every paragraph, list item, table cell, and table row explicitly puts visual garbage into your document, and slows down both reading and writing. This is especially annoying when the tags/character-data ratio rises above 50%. It's redundant, because when opening a paragraph, it's clear that the previous one is closed... It may be useful to explicitly close every tag for an SGML or XML application in which you won't write more than a very few documents, when you don't know the DTD well yet. However, it's obvious that for a network protocol such as SOAP, requiring a heavy DTD-aware parser would be overkill.
XML is more computer friendly than SGML, though it's often argued to be neither human-friendly nor computer-friendly. The need to quote every attribute is a good example. The SGML feature of not having to quote every attribute doesn't involve any DTD, so it could have been implemented in XML without changing the overall aims of XML. When writing more than one or two documents, this SGML feature saves a lot of time. Moreover, it produces perfectly readable documents. I highly doubt anybody could be confused by <element attribute=value>. It saves bandwidth. It's even a bit easier to read when there are many attributes, because quotes (especially double quotes) reduce the visibility of the quoted content. However, it's a bit easier to write an XML parser which only recognizes quoted values. Moreover, I feel that XML was designed to be SGML simplified as much as possible. This feature wasn't needed.
XML is more extensible than SGML with the OMITTAG feature. With omitted tags, adding new elements (and new short references) may sometimes cause problems, as old software reading new documents cannot know where the new elements are closed or opened (if they're implicitly closed or opened), unless the old software has a true, generic SGML parser, and the document contains a doctype declaration pointing to a system resource (e.g. a URI) containing the new DTD. With XML, old software can easily ignore new elements. As I said, SGML used with generic parsers is as extensible as XML. Since a generic SGML parser is far more complex to create than a non-validating or even a validating generic XML parser, XML scores a point here. Note also that if new SGML elements have no implicit start and end tags, there should be no extensibility problem.
XML is even more extensible, thanks to namespaces. Namespaces provide a finer-grained specification of the content models of elements and attributes. But the main advantage of namespaces is their use as a modularization tool. It's possible (as in XHTML 1.1) to provide modules. The ones understood by the user application will be treated as needed, and others will be ignored. Namespaces are incompatible with the OMITTAG feature of SGML.
XML simplifies transmission over the Internet, because it has a single character repertoire, ISO/IEC 10646 (aka Unicode). Here XHTML and HTML are on equal footing. But for other SGML applications, with more uncommon SGML declarations, there are portability problems. On the other hand, for internal use, SGML is more flexible.
Inserting XML documents inside other XML documents is easy with namespaces, e.g. an ATOM newsfeed can contain an XHTML element. With HTML, that's not that easy, though RCDATA may help. Note that an ATOM newsfeed can quite easily contain HTML code (in a CDATA section), IF the HTML code doesn't contain any ]]> sequence. If it contains such a sequence, then there's no option but using many ugly &lt; entity references.
Finally, XML is easier to learn in its entirety for a human being who doesn't have much time to invest in the technology. This human factor seems a bit ridiculous to long-time SGML users, who have gained much more time through the power of SGML than they lost learning it (with a good reference manual, one day should be sufficient to learn SGML), but I'm pretty sure this has been one of the major reasons for XML's success.
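The point about embedding HTML in a feed can be sketched as follows. This is only an illustration under assumptions: `embed_html` and `wrap_in_content` are hypothetical helper names, and the bare `<content>` element stands in for a real Atom entry. The helper uses a CDATA section when the fragment contains no `]]>` sequence and falls back to entity references otherwise.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def embed_html(html_fragment):
    """Return the fragment as XML character data.

    A CDATA section is used when it is safe; a fragment containing
    "]]>" would end the section early, so it is escaped instead.
    """
    if "]]>" not in html_fragment:
        return "<![CDATA[" + html_fragment + "]]>"
    return escape(html_fragment)  # turns &, < and > into entity references


def wrap_in_content(html_fragment):
    """Wrap the embedded fragment in a stand-in <content> element."""
    return "<content>" + embed_html(html_fragment) + "</content>"


if __name__ == "__main__":
    for fragment in ("<p>Hello</p>", "<p>a ]]> b</p>"):
        xml_text = wrap_in_content(fragment)
        # Either way, an XML parser gives back the original HTML text.
        print(xml_text, "->", ET.fromstring(xml_text).text)
```

Both forms carry the same character data once parsed; the CDATA form is simply much easier on the eyes in the feed source.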

XML is good for unstable, quickly evolving applications, especially if they're treated (parsed and produced) very much by computers, but not much by human persons. SGML is good for more stable applications (though SGML applications are flexible enough to be extended quite well) OR for data that is to be massively produced by human persons and/or read by human persons. SGML also has advanced features that may be very useful sometimes: SUBDOC and CONCUR (concurrent documents... The lack of this feature in XML is sometimes very painful). SGML is also good for internal usage within an enterprise.

Maybe there should be an XML editor that permits the user to type SGML (with RCDATA, SHORTREF, OMITTAG) and then translates it to XML. The only requirement would be an SGML DTD. The internal subset could be used, but would be automatically removed in the final output, at least for the SGML-specific features.

Overall, I prefer to distribute HTML 4.01 documents, though for storage (e.g. as nodes of XML files) and manipulation, XHTML is fine.

Is XML cleaner than SGML? The term "cleaner" is not appropriate here. I think that "simpler", in the sense of "fewer A4 pages in the specification", is more appropriate. This is an advantage... It's quicker to learn and teach... Another advantage is that, since the average guy is much more likely to learn XML properly than SGML, he is more likely to produce correct XML documents (though for XHTML this is not exactly true), while the average non-technical guy, who learnt HTML in HTML for Dummies, is likely to produce awful HTML code.

There is a VERY major factor here. Most HTML books, manuals, references and tutorials are bad, except the W3C recommendation itself, of course. They teach tag soup, not HTML. For XHTML, they are bad, but maybe LESS bad. That's sad, as learning XML by reading the spec, and then reading the XHTML DTD, can be done in a few days, and allows anybody to learn the language correctly enough to use it.

And I'm sure that I could teach, in half an hour, the content model of HTML and XHTML: which elements and text can be put in which elements, and which elements can have their start or end tag omitted. This is not sufficient to learn HTML or XHTML, but it is a first step: teaching guys to produce conforming code (the W3C validator can help).

You have more required tags (you can't omit html, head, body, tbody (if you omit it, the tbody element doesn't exist, unlike in HTML, where you can omit the tag, but the element exists)).

This point is interesting. I remember having seen that the DOM produced for XHTML documents by the Opera browser included the tbody, even when it was not explicitly there... Which is not correct... This is correct for HTML, though. Well, IIRC, I tested that with XHTML documents served as text/html or whose file extension was .html, so that's probably not relevant.

Anyway, this XHTML oddity (allowing tr outside tbody) is annoying when writing CSS stylesheets or XSLT, or simply when interpreting XHTML code from source.
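The XML half of this can be checked with a small sketch. It only shows what a generic XML parser does (here Python's `xml.etree.ElementTree`); the HTML half, where the tbody element exists even without its tags, is not demonstrated because the standard library has no tree-building HTML parser. `child_tags` is a name made up for the example.

```python
import xml.etree.ElementTree as ET

TABLE_WITHOUT_TBODY = "<table><tr><td>cell</td></tr></table>"
TABLE_WITH_TBODY = "<table><tbody><tr><td>cell</td></tr></tbody></table>"


def child_tags(markup):
    """Return the names of the root element's direct children."""
    return [child.tag for child in ET.fromstring(markup)]


if __name__ == "__main__":
    # An XML parser adds nothing: no tbody in the source, none in the tree.
    print(child_tags(TABLE_WITHOUT_TBODY))
    print(child_tags(TABLE_WITH_TBODY))
```

This is why a selector or XPath such as table/tbody/tr can match the same table in an HTML DOM and miss it in a real XHTML one.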

Which is exactly why XHTML (the real one) is stricter. Because XHTML served with the right MIME type is "real" XHTML.

Right, but that's not what is suggested by W3Schools. The W3Schools tutorial should maybe contain something like:

"HTML, because of a long history, is treated as tag soup by browsers. XHTML, served as application/xhtml+xml, is interpreted strictly, which is an opportunity to end the vicious circle of developers producing worse and worse HTML code as the browsers are more and more tolerant and use more and more complex algorithms to guess the meaning of the code, which is one of the reasons that new releases of browsers are slower than older ones."

I wonder why browsers don't support a "very strict" HTML mode? For example, when the ISO/IEC 15445 document type declaration is detected, this mode could be set.

I hope my post is not too lengthy. OK, it is. Excuse me for this. (Maybe some people have found interesting info in it.)

boen_robot · July 19, 2007

Wow. I'm impressed... A LOT. And... I can't believe that the HTML code you posted is actually valid according to the W3C validator.

Yep, I learned quite a lot from your... well... let's call it an "article", as you've practically written enough for one. Do you have a blog or something? Such a long post could have been put there.

What is this site about BTW? It seems like a free "text" storage to me.

SGML is more human friendly (both writable and readable) than XML.

I wouldn't be that sure about that. For me personally, having all required stuff is more readable than omitting everything that is known by default. For example, having quotes around all attributes automatically forces me to read the value from start to end, rather than searching for the end and then reading the value. Am I thinking like a machine? God... I hope so. Anyhow... for others, what seems readable to me may not be readable to them. In other words, readability is subjective. Writability... too, because it doesn't only depend on the person, but also on the editor.

SGML also has advanced feature that may be very useful sometimes: SUBDOC and CONCUR (concurrent documents... The lack of this feature in XML is sometimes very painful).

I'm curious. What are those features and why are they useful?

XML is a markup language where everything has to be marked up correctly, which results in "well-formed" documents.
HTML doesn't require things to be marked up correctly???
HTML requires things to be marked up correctly. But the notion of well-formed document is specific to XML.

I guess you could simply say the definition of "marked up correctly" is different.

XML is good for unstable, quickly evolving, applications, especially if they're treated (parsed and produced) very much by computers, but not much by human persons. SGML is good for more stable applications (though, SGML applications are flexible enough to be extended quite well) OR for data that is to be massively produced by human persons and/or read by human persons

Well, the whole idea of a markup language is that it would be processed by a computer to do something. Content creators were, at least when XML was created, supposed to rely on editors to create a file in whatever sort of standard language they'd be using. Only creators of custom markup languages would've needed to worry about typing XML manually.

In the real world though, I guess (X)HTML is always produced dynamically, and thus it has to be manually embedded in some other document, or other sorts of code have to be embedded in it to generate more (X)HTML, as is the case with most S3Ls. This all makes WYSIWYG editors useless beyond the actual design of a page, forcing the site author to eventually need readable code at hand.

Languages like XSLT are already getting close to simplifying that stuff. Stylus Studio has a WYSIWYG XSLT editor for when XSLT is used to create (X)HTML pages, and XSLT itself is powerful in creating dynamic content if parameters are used. New languages like XProc are taking the next step in creating portable pipelines (a series of actions performed on XML-based documents, such as XSLT transformations, validations, etc.), simplifying even further the process of creating dynamically driven web sites. The real potential of all this is still to be seen anyway.

I wonder why browsers don't support a "very strict" HTML mode? For example, when the ISO/IEC 15445 document type declaration is detected, this mode could be set.

Because SGML is harder to implement, and so is HTML as an SGML application. They'd only create such a mode if a generic SGML parser were used rather than a tag soup parser, and that will probably never happen.

Most HTML books, manuals, references and tutorials are bad, except the W3C recommendation itself, of course. They teach tag soup, not HTML.

Considering the way HTML is implemented across browsers, HTML has become, by definition, a tag soup. Not according to the specification I mean... just in the real world. So nobody other than HTML's implementers is really wrong here.

This point is interesting. I remember having seen that the DOM produced for XHTML documents, by the Opera browser, included the tbody, even when it was not explicitly there... Which is not correct... This is correct for HTML, though. Well, IIRC, I tested that with XHTML documents served as text/html or whose file extension was .html, so, that's probably not relevant.

Right. As files served with text/html are always processed as tag soup, even if they contain the XHTML DTD.
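The rule discussed here, that the declared media type and not the DOCTYPE decides how a document is parsed, can be sketched like this. It is a deliberate simplification: real browsers apply more rules than this, the set `XML_MEDIA_TYPES` and the function `parsing_mode` are assumptions made for the example, and the header parsing is borrowed from Python's `email.message` module.

```python
from email.message import Message

# Media types handed to an XML parser (a simplified assumption).
XML_MEDIA_TYPES = {"application/xhtml+xml", "application/xml", "text/xml"}


def parsing_mode(content_type_header):
    """Pick a parsing mode from a Content-Type header value.

    The document's DOCTYPE is never looked at: an XHTML DOCTYPE sent
    as text/html still ends up in the tag soup parser.
    """
    message = Message()
    message["Content-Type"] = content_type_header
    media_type = message.get_content_type()  # lowercased, parameters dropped
    return "xml" if media_type in XML_MEDIA_TYPES else "tag soup"


if __name__ == "__main__":
    print(parsing_mode("application/xhtml+xml; charset=utf-8"))
    print(parsing_mode("text/html; charset=ISO-8859-1"))
```

So the same XHTML file can be strict or fault tolerant depending only on how the server labels it.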

The W3Schools tutorial should maybe contain something like: "HTML, because of a long history, is treated as tag soup by browsers. XHTML, served as application/xhtml+xml, is interpreted strictly, which is an opportunity to end the vicious circle of developers producing worse and worse HTML code as the browsers are more and more tolerant and use more and more complex algorithms to guess the meaning of the code, which is one of the reasons that new releases of browsers are slower than older ones."

That would be way too technical to worry newbies about. Paraphrasing it into something more newbie friendly would be better, such as perhaps:

"HTML files are fault tolerant. If you make a mistake in them, the browser will try to recover from it. HTML files can contain XHTML content, which is unfortunately also fault tolerant. XHTML files however are not fault tolerant, forcing you to write good code that will be compatible across browsers that support XHTML."

It's just that speaking about "files" is closer to a newbie than a "MIME type" or "tag soup" or "XML well-formedness".

SuperKoko · July 20, 2007

Do you have a blog or something? Such a long post could have been put there.

Not yet, but I intend to write my own blogging system.

I wouldn't be that sure about that. For me personally, having all required stuff is more readable than omitting everything that is known by default.

I guess there are three levels of understanding of HTML code: basic, intermediate and advanced. At the basic level, omitting tags improves the clarity of code. For example, ask your little sister, grandmother or a friend who has never read a line of HTML what this document means:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<title>Font test page</title>
<p> The quick brown fox jumps over the lazy dog.
<ul>
    <li> The sentence above contains all the alphabet characters.
    <li> It's short.
    <li> It's not my dog! Mine is not lazy!
</ul>

If necessary, explain the elements they don't understand, after some time: <li> means "list item", <p> means "paragraph", the DOCTYPE line describes the format of the document. Then, try again with this document:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Font test page</title>
</head>
<body>
  <p> The quick brown fox jumps over the lazy dog.</p>
  <ul>
    <li> The sentence above contains all the alphabet characters.</li>
    <li> It's short.</li>
    <li> It's not my dog! Mine is not lazy!</li>
  </ul>
</body>
</html>

For an intermediate developer, who has learnt HTML or XHTML, sees which elements are in which elements, and basically understands how they nest, but doesn't know the DTD well, the second version is maybe clearer. But for an advanced developer, both are clear. The first one is just smoother and quicker to decode.

Note that the p element is an abbreviation for paragraph. For somebody who has never read a single line of HTML code, it would be clearer if the element were named paragraph. But I'm sure that, after thirty seconds of HTML learning, <p> becomes very clear forever. It's simple enough to visually mark the source code, as if it were an icon. So, for intermediate and advanced programmers, <p> and <li> are perfectly clear, and contain less noise than if they were called <paragraph> and <list-item>.

For example, having quotes around all attributes automatically forces me to read the value from start to end, rather than searching for the end and then reading the value.

It's fine when reading source code that contains little markup and much text... But when reading source code that is 70% markup, any redundancy hides the true content of the document! When reading source code, you're often happy to be able to run smoothly over an attribute value (except for class attributes, which are quickly recognized by the long class= prefix), because they're far from being an essential part of the document; they're neither part of the essence of the structure of the document, nor part of the contents of the document. Are you really sure of your statement?

<element lang="fr"></element>
<element lang=fr></element>
<!-- Are not the fr letters more visible in the second sample? Well, that's probably pretty subjective -->
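For anyone generating HTML 4 who wants to drop quotes only where that is allowed, the name-character rule mentioned earlier in the thread (letters, digits, hyphens, periods, underscores and colons) is easy to check. The sketch below is an illustration only; `may_be_unquoted` and `format_attribute` are made-up names, and quoting is always the safe fallback.

```python
import html
import re

# HTML 4 name characters: letters, digits, hyphen, period, underscore, colon.
NAME_CHARACTERS = re.compile(r"[A-Za-z0-9_.:-]+")


def may_be_unquoted(value):
    """True if an HTML 4 attribute value may legally appear without quotes."""
    return NAME_CHARACTERS.fullmatch(value) is not None


def format_attribute(name, value):
    """Write name=value, adding quotes (and escaping) only when required."""
    if may_be_unquoted(value):
        return f"{name}={value}"
    return f'{name}="{html.escape(value, quote=True)}"'


if __name__ == "__main__":
    print(format_attribute("lang", "fr"))
    print(format_attribute("content", "text/html; charset=us-ascii"))
```

Note that a value like text/html still needs quotes because of the slash. None of this applies to XHTML, where every attribute value must be quoted.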

Moreover, machine-generated content usually contains repeated patterns (e.g. posts on a forum). When reading source code, the first one takes long to read. But for the next ones, you can visually recognize the pattern and identify the elements. This is easier if the number and size of markup tags is minimal.

This becomes obvious when looking at more extreme syntaxes. Have a look at the HTML 3.0 math formula syntax (HTML 3.0 was far too ambitious for its time, and the project was dropped). To read the samples, you just need to understand that _ means <sub>, ^ means <sup>, { means <box>, } means </box>. http://www.w3.org/MarkUp/html3/maths.html

Now, compare it to presentational MathML. http://www.w3.org/TR/MathML/chapter2.html#fund.examples

If you had to read more than three formulas on a math web site, which syntax would you prefer?

Am I thinking like a machine? God... I hope so

Maybe the code samples you've read are too simple, and you didn't need to seek the content hidden behind tag noise. Maybe you're thinking like a machine. Congratulations!

Writability... too, because it doesn't only depend on the person, but also on the editor.

For writability of anything that contains many tags (HTML code is full of tags), there are quite simple criteria.

Number of keystrokes.
Number of carpal tunnel syndrome.

An XHTML editor that lets you omit closing tags and automatically emits them, either after you open the element or, preferably, when you open a new one which closes the previous one, will do well on the criteria described above.

I'm curious. What are those features and why are they useful?

SUBDOC is not a much-needed feature, as it can be simulated with specific XML elements. It's just a standard way to create documents in several parts and group them together as you wish. HTML doesn't have the SUBDOC feature enabled.

With the SUBDOC feature, you could, for example, define a book (e.g. a W3C recommendation) which contains a number of sections and subsections. For each subsection, a file would be produced, as for the W3C HTML 4.01 recommendation. Then, it would be possible to define a main page that includes the contents of all the sections, and gives a result comparable to the all-in-one-page HTML5 draft.

Benefits:

You could download the entire book at once, by following the sub-documents links.
And yet, the subsections would still be in separate files, easy to access individually.
You might even use an editor that has several rendering modes for the page... One that replaces sub-documents with links working like usual anchors, and another that literally puts the contents of the sub-document inside the mother document.

Currently, following the anchors is not a good idea, as some of them point to external resources. As I said, this feature could easily be provided in XML, for example with a simple convention (which could make a candidate for a microformat) of using iframes or anchors with a specific class to integrate sub-documents.

In SGML, sub-documents are typically declared as general entities in the internal subset. If HTML did support SUBDOC, the syntax would be:

<!-- Invalid HTML document; HTML doesn't support the SUBDOC feature -->
<!-- This illustrates the use of the SGML feature -->
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" [
  <!ENTITY section1 SYSTEM "section1.html" SUBDOC
    -- note that the sub-document may be a different SGML application, with a different DTD...
       if the document interpreter supports it -->
  <!ENTITY section2 SYSTEM "section2.sgml" SUBDOC>
]>
<title>Entity test</title>
<h1>Book</h1>
  <h2>Section1</h2>
     &section1;
  <h2>Section2</h2>
     &section2;

Reference: http://www.is-thought.co.uk/book/sgml-6.htm

The feature of SGML that XML lacks is not SUBDOC, but CONCUR, which is disabled in HTML too. Concurrent documents allow one to specify two different trees, for two different document types, in a single document, without duplicating the contents, because a single document often contains more than one way to express the data. The idea is great, though the way SGML did it is not perfect.
http://www.is-thought.co.uk/book/sgml-10.htm#CONCUR

For example, one may want to show a blog as an RSS feed as well as an HTML or XHTML document. (Currently this is possible by using XSLT as an XML stylesheet translating the document to XHTML.) You may also put logical information that has already been standardized into the XHTML document. This is the most interesting thing, in my opinion. For example, you may want to attach RDF semantic information to the many resources that you present in legible XHTML form on a site, blog or forum, and still give the RDF form to automatic tools.

Thank goodness, for simple semantic information, we have microformats. Microformats are an XHTML-specific way to put several document trees into one... Since several classes can be given to a single element, microformats have the power of giving as many "document views" as you wish of a single document. But microformats are not generic. They're specific to HTML and XHTML, and cannot be generalized to arbitrary XML documents. Moreover, they create an asymmetry between the XHTML or HTML document and the concurrent document, and I feel they are not general enough to express very complex trees.

I don't argue that something similar to CONCUR could not be implemented in XML. There's just no current standard for that, and XML parsers don't handle this type of thing. I can even think of a candidate syntax... Extending the notion of namespaces. If one day XML parsers adopt it (i.e. letting the software see only the view of the document types it understands, without bothering at all with what it doesn't understand), this may be a powerful thing.

<!-- this element may be at a deep level in an XML document -->
<both:element xmlns-concurr:both="first=URI1 second=URI2">
  <first:second:someElement> <!-- first:second: is equivalent to both: -->
    <first:element1>
      <second:element2><second:element3>
        <xmltext:both>Text data emitted in both documents</xmltext:both> <!-- xmltext would indicate that the PCDATA will be emitted in the specified documents -->
        <first:element4>
          <xmltext:first>Text data emitted only in the first document</xmltext:first>
        </first:element4>
      </second:element3></second:element2>
    </first:element1>
  </first:second:someElement>
</both:element>

Of course, you could use the "global" namespace to store any of the three namespace prefixes. The first:second: notation could be useful if more than two concurrent documents are used. For example, if there are ten concurrent documents, with namespace names first, second, ... ten, the five:three:seven: prefix would use these three namespaces.

The example above would be a way to specify these two documents:

<element xmlns="URI1">
  <someElement>
    <element1>
      Text data emitted in both documents
      <element4>
        Text data emitted only in the first document
      </element4>
    </element1>
  </someElement>
</element>

And:

<element xmlns="URI2">
  <someElement>
    <element2><element3>
      Text data emitted in both documents
    </element3></element2>
  </someElement>
</element>

I guess you could simply say the definition of "marked up correctly" is different.

Actually, the notion of "valid" is the same in SGML and XML. Validity covers the syntax as well as the grammar, checked against a DTD. XML also has a weaker notion: well-formedness. A well-formed document is a document that has no obvious syntax error when the DTD is ignored. SGML is too versatile and too dependent on the DTD to make this notion useful since, without looking at the DTD, a parser really cannot say whether the document looks syntactically correct or not. This is mainly due to tag omission and short references.

In order of correctness, for HTML:
semantically conforming > syntactically conforming > valid.
In XHTML:
semantically conforming > syntactically conforming > valid > well-formed.

The distinction between "semantically conforming" and "syntactically conforming" is not made by the standard, but only by me... Syntactically non-conforming means that automated tools may see that something is wrong. Semantically non-conforming means that only humans can see that it's wrong.

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- well-formed, invalid XHTML document -->
<html>
<em> <hello> Hello!</hello> </em>
  <font size="48">
    <tr><table> well-formed document!</table></tr>
  </font>
</html>

<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Valid, syntactically non-conforming XHTML document -->
<!-- Automated tools may complain -->
<html>
<head><title>Valid non-conforming document!</title></head>
<body>
  <ins>Inline content forbidden in an &lt;ins&gt; at block level!</ins>
  <pre>
    Inside a <small>preformatted</small> block, inline elements changing the font are <big>not</big> permitted.
    These subtle content models cannot be expressed with the limited XML DTD power.
  </pre>
</body>
</html>

Similarly, a valid HTML transitional document may contain attributes like width="hundred%" which are not conforming!

<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!-- Semantically non-conforming XHTML document -->
<html lang="fr" xmlns="http://www.w3.org/1999/xhtml"> <!-- claims the content is in French -->
  <head><title>Hello world!</title></head> <!-- title in English -->
  <body>
    <p>Some dialog recorded there</p>
    <ul> <!-- a dialog is an ordered list! If the XHTML interpreter re-orders the ul, it will mess up the dialog -->
      <li>How far are you from <acronym title="New extended wall">New</acronym> York?</li>
      <li>22 kilometers.</li>
      <li>How tall are you?</li>
      <li>1m70.</li>
    </ul>
  </body>
</html>

For the conformance and validity notions, XHTML and HTML are the same. Well-formedness is specific to XML. But the wording on W3Schools suggested that the notion of a conforming, or even valid, HTML document doesn't exist.
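To make the well-formed versus valid distinction concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree, which checks well-formedness and never looks at a DTD. The helper name is_well_formed and the two sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML; no DTD validation is attempted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Balanced nesting, nonsense vocabulary: well-formed, but invalid as XHTML.
WELL_FORMED_INVALID = (
    "<html><em><hello>Hello!</hello></em>"
    "<tr><table>oops</table></tr></html>"
)

# Acceptable HTML 4.01 (the </li> end tags are optional), but not well-formed XML.
HTML_TAG_OMISSION = "<ul><li>one<li>two</ul>"

print(is_well_formed(WELL_FORMED_INVALID))  # True
print(is_well_formed(HTML_TAG_OMISSION))    # False
```

Rejecting the first string would take a validating parser working against the XHTML DTD; ElementTree does not validate, so it only sees the weaker notion.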

Well, the whole idea of a markup language is that it would be processed by a computer to do something.

SGML is perfectly processable by computers. The parser is just much more complex to write than for XML. However, it only needs to be written once! Theoretically, writing an SGML parser in C or C++, portable to all platforms, is not hard. So, once somebody has written libsgml, nobody should have to rewrite it. Unfortunately, on the Win32 platform, with all the commercial issues, everything tends to be rewritten by everybody.

Content creators were, at least when XML was created, supposed to rely on editors to create a file in whatever sort of standard language they'll be using.

It's hard to find a good XHTML editor. Typing it manually with a generic XML editor, or even a generic text editor, is usually easier than trying to make an XHTML editor do its job correctly. And, of course, WYSIWYG editors are awful.

SGML was designed to be an editable form of documents... Editable with a simple plain-text editor. That's what I call a human-friendly format. Moreover, everything that is directly produced by a human, with a text editor, tends to be legible in this form. Indentation is optimal. The places where tags are omitted, or where empty start or end tags are used, are naturally chosen for optimal legibility. The choice of legible, intuitive, simple short references can improve legibility and writability too. Automatic indentation gives correct results, but manual indentation gives better layouts, because of subtle exceptions to the general rules that humans naturally adopt.

The same format for every platform... No need to have a specific editor. I think that SGML conventions are a good thing. Even if you use a generic XML editor which, from the DTD, allows you to type SGML-like code, the XML file you distribute will be less editable and a bit less legible than its SGML form.

SGML is specifically designed to be input by humans, processed by machines (e.g. to produce PostScript documents), stored in its unprocessed form, and read in its unprocessed form. XML is designed to be generated by machines (while still possible to input by hand... just more painfully than with SGML), for example from another more human-friendly form (e.g. an SGML form) or purely generated by machine (e.g. for the SOAP protocol), processed by machines, and stored in a machine-readable, relatively human-legible form. Because of the need to store a human-friendly form, I think that XML with a good editor cannot replace SGML well. It's not too bad, though.

Something that could be interesting: create an SGML format for every specific purpose (as currently, many people create an XML format for every specific purpose), input documents, archive them -- because everything that has been directly typed by a human ought to be archived; personally, I archive everything I type -- translate them to XML, store the XML file in a quick-to-access database, then use XSLT to produce a specific language of interest (e.g. XHTML), and, if judged useful, convert from XHTML to HTML by minimizing tags to save bandwidth and produce more legible documents.

That would be way too technical to worry newbies about. Paraphrasing it into something more newbie friendly would be better, such as perhaps:
"HTML files are fault tolerant. If you make a mistake in them, the browser will try to recover from it. HTML files can contain XHTML content, which is unfortunately also fault tolerant. XHTML files however are not fault tolerant, forcing you to write good code that will be compatible across browsers that support XHTML."
It's just that speaking about "files" is closer to a newbie than a "MIME type" or "tag soup" or "XML well-formedness".

You're right. It's much better.

boen_robot · July 20, 2007

Do you have a blog or something? Such a long post could have been put there.
Not yet, but I intend to write my own blogging system.

Yeah. Me too.

There's actually even a standard equivalent to SUBDOC in XML - XInclude. Actually, XLink also defines its own way of referencing (parts of) other documents, placing them right there in the output. Creating a custom method could also be done easily, as you said. I guess SUBDOC was not added to keep the core language clean. And by clean I mean "easy to learn, implement, etc.".

The CONCUR thing is interesting. But if you ask me, the core WG wouldn't implement it in that way. What I think they might agree on is if it's something like that:

<both:element xmlns:both="URI1 URI2" xmlns:first="URI1" xmlns:second="URI2">

Not only would it be easier to implement and easier to make backward compatible (since the URI spec doesn't allow a space in a single URI anyway), but I actually think it's more readable too (again: readability is subjective). I admit it would be more verbose though, so the shortened form you're suggesting could also be useful if agreed upon.

Then again, the combination of namespaces with XInclude could allow you to include the document itself with another namespace, which would have been something like the CONCUR feature. That is, if XInclude didn't preserve namespaces. So an extension to XInclude to tweak the namespace resolving is probably a more feasible solution.

In any case, we'll have to wait until something like XML 2.0 or XInclude 1.1 is in the W3C cloud.
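As a rough illustration of XInclude playing the role of SUBDOC, here is a small Python sketch based on the standard library's xml.etree.ElementInclude module, which supports only a subset of XInclude. The section names, their contents and the load_section loader are assumptions for the example; a real setup would read the sub-documents from files or URLs.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical sub-documents, kept in memory instead of separate files.
SECTIONS = {
    "section1.xml": "<section><h2>Section 1</h2><p>First part.</p></section>",
    "section2.xml": "<section><h2>Section 2</h2><p>Second part.</p></section>",
}

# The "book" only references its sections, much like the SUBDOC entities would.
BOOK = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <h1>Book</h1>
  <xi:include href="section1.xml"/>
  <xi:include href="section2.xml"/>
</book>"""


def load_section(href, parse, encoding=None):
    """Loader callback: return the parsed sub-document named by an xi:include."""
    if parse != "xml":
        raise ValueError("this sketch only handles parse='xml'")
    return ET.fromstring(SECTIONS[href])


def build_book():
    """Parse the book and replace each xi:include with the section it names."""
    root = ET.fromstring(BOOK)
    ElementInclude.include(root, loader=load_section)
    return root


book = build_book()
print([child.tag for child in book])
```

The main document stays small and each section remains individually addressable, which is the benefit claimed for SUBDOC, while the core parser knows nothing about inclusion.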

Maybe, you're thinking like a machine. Congratulations!

Thank you. You're too kind.

It's hard to find a good XHTML editor. Typing it manually with a generic XML editor, or even a generic text editor, is usually easier than trying to make an XHTML editor do its job correctly. And, of course, WYSIWYG editors are awful.

I never said it's easy to find one. It's just how it was all supposed to be in a perfect world and as we all know, the world is not perfect.

SuperKoko · July 20, 2007

I guess SUBDOC was not added to keep the core language clean. And by clean I mean "easy to learn, implement, etc.".

I wholeheartedly agree. This thing is better put outside of the core language, as it would uselessly bloat parsers and applications that would not do anything with it but ignore it!

The CONCUR thing is interesting. But if you ask me, the core WG wouldn't implement it in that way. What I think they might agree on is if it's something like that:
<both:element xmlns:both="URI1 URI2" xmlns:first="URI1" xmlns:second="URI2">

That's cleaner, indeed. It also has one advantage: it gives a better notation when more than two namespaces are involved... For example, it gives a way to provide a prefix for three or more namespaces inside a set of ten namespaces!

I admit it would be more verbose though, so the shortened form you're suggesting could also be useful if agreed upon.

Allowing both forms is not a great idea, as it would complicate things and create dilemmas... Your form is more consistent with the spirit of namespaces. It's more symmetric, and I like it a lot. This time, I find it more readable too. Unless there are many concurrent documents, I don't think it would bloat the contents too much.
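The "concurrent namespaces" notation is not a standard and no parser implements it, so the following is only a thought experiment in Python: it treats a namespace name containing blanks as a set of URIs and extracts the view belonging to one of them. The function names split_tag and project, and the sample document, are hypothetical. Text handling is deliberately naive: an element's leading text follows the element, and text after a closing tag is dropped.

```python
import xml.etree.ElementTree as ET


def split_tag(tag):
    """Split an ElementTree tag into (set of namespace URIs, local name)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return set(namespace.split()), local
    return set(), tag


def project(elem, uri):
    """Return the elements that elem contributes to the view identified by uri."""
    namespaces, local = split_tag(elem.tag)
    visible_children = []
    for child in elem:
        visible_children.extend(project(child, uri))
    if uri not in namespaces:
        # Invisible in this view: promote whatever is visible below it.
        return visible_children
    view = ET.Element("{%s}%s" % (uri, local))
    view.text = elem.text
    view.extend(visible_children)
    return [view]


DOC = (
    '<both:element xmlns:both="URI1 URI2" xmlns:first="URI1" xmlns:second="URI2">'
    '<first:element1><second:element2>'
    '<both:p>Text in both documents</both:p>'
    '<first:element4>Text only in the first document</first:element4>'
    '</second:element2></first:element1>'
    '</both:element>'
)

root = ET.fromstring(DOC)
first_view = project(root, "URI1")[0]
second_view = project(root, "URI2")[0]
print([split_tag(e.tag)[1] for e in first_view.iter()])
print([split_tag(e.tag)[1] for e in second_view.iter()])
```

A tool that only understands URI2 never sees element1 or element4, which is the "view of the document types it understands" idea. The sketch relies on the parser accepting a namespace name with a blank in it, which is exactly what makes the notation easy to retrofit.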

Not only would it be easier to implement and easier to make backward compatible (since the URI spec doesn't allow a space in a single URI anyway)

This is not very different for xmlns:both="first=URI1 second=URI2"... Once the space is detected, it's easy to see what it means. But, again, I find your notation more orthogonal and more flexible. I find it refreshing. It improved my view of the concept. Really.

If a shortened form had to be provided, maybe something like:

<both:element xmlns:first="URI1" xmlns:second="URI2" xmlns:third="URI3" xmlns:second_third_and_another="second third URI_another">

The value of an xmlns attribute would be a blank-separated list of URIs and namespace names defined by previous attributes. Something should also be done to ensure that URIs and namespace names are not confused. Maybe a notation like nm://namespace_name.

I didn't weigh the benefits against the drawbacks of this shortened form, so it may be a bad idea. [History shows that untested ideas are often bad ones -- like 'export' and exception specifications in C++]

Ideally, if this shortened version had to be introduced, it would not be introduced in the first version of the "concurrent namespaces" feature, but only in the next version if, and only if, experience proves that the verbosity is a problem.

There's actually even a standard equivalent to SUBDOC in XML - XInclude.

I'm happy to learn that.

Overall, I'm amazed to see how powerful the XML "horse" is, while being so much lighter than the robust SGML "camel"... Namespaces and XML schemas are astonishing... They couldn't have been thought of in the full SGML spec... But they became a relatively natural idea in XML, which is basically a dumbed-down SGML... By removing things, new concepts appear... Less is better, sometimes.
