SuperKoko's Achievements





  1. If possible, it would refuse to create such a thread, with a message: Please use a short, descriptive summary of your problem, and don't use non-descriptive words such as "help me", "need help", "urgent!!!!!!!!!!!!!!!!!!!". Another possibility: automatically sending a PM to the thread author, so that he changes the title of his thread himself. The first solution is the better one, if it's possible.
  2. You are joking, aren't you? Malicious people would use it as a feature to remove threads!
  3. I wholeheartedly agree. This thing is better put outside of the core language, as it would uselessly bloat parsers and applications that would not do anything with it but ignore it! That's cleaner, indeed. It also has one advantage: it gives a better notation when more than two namespaces are involved... For example, it gives a way to provide a prefix for three or more namespaces inside a set of ten namespaces! Allowing both forms is not a great idea, as it would complicate things and create a dilemma... Your form is more consistent with the spirit of namespaces. It's more symmetric, and I like it a lot. This time, I find it more readable too. Unless there are many concurrent documents, I don't think it would bloat the contents too much. This is not very different from xmlns:both="first=URI1 second=URI2"... Once the space is detected, it's easy to see what it means. But, again, I find your notation more orthogonal and more flexible. I find it refreshing. It improved my view of the concept. Really. If a shortened form had to be provided, maybe something like: <both:element xmlns:first="URI1" xmlns:second="URI2" xmlns:third="URI3" xmlns:second_third_and_another="second third URI_another"> The value of an xmlns attribute would be a blank-separated list of URIs and of namespace names defined by previous attributes. Something should also be done to ensure that URIs and namespace names are not confused. Maybe a notation such as nm://namespace_name. I didn't weigh the benefit against the drawback of this shortened form, so it may be a bad idea. [History shows that untested ideas are often bad ones, like 'export' and exception specifications in C++.] Ideally, if this shortened version had to be introduced, it would not be introduced in the first version of the "concurrent namespaces" feature, but only in the next version if, and only if, experience proves that the verbosity is a problem.
I'm happy to learn that. Overall, I'm amazed to see how powerful the XML "horse" is, while being so much lighter than the robust SGML "camel"... Namespaces and XML schemas are astonishing... They couldn't have been thought of in the full SGML spec... But they became a relatively natural idea in XML, which is basically a dumbed-down SGML... Through removing things, new concepts appear... Less is better, sometimes.
  4. Good try. Strangely, your guess was wrong. The vendor distributes a browser whose name starts with an 'O' and ends with an 'a', with a 'p', an 'e' and an 'r' between those two letters. And this is my favorite browser.
  5. Right, readability is subjective.
  6. Right, one advantage of XML over SGML is that more tools are available.On the other hand, it's possible to first input the data in SGML, translate it in XML with a command line tool that uses SGML, and then, internally use only the XML form. The SGML form would be archived, of course. The HTML5 specification defines, in a single document, the HTML5 DOM, which is an abstract view of the data tree, as well as two syntaxes, and an API, for this DOM. This is one of the good sides of the HTML5 specification.The two syntaxes are the HTML syntax (looking like SGML, but not an SGML application), and the XML syntax (a real XML application).HTML5 with the HTML syntax is called "HTML5".HTML5 with the XML syntax is called "XHTML5".There are a few differences, related to the limits of syntaxes, though: XHTML 2.0 will have very few elements, and will be more based on good semantic practice such as specifications of the role of elements with role attributes, so that, microformats will be used as much as possible, giving an extremely dynamically extensible language, giving a good presentation in every browser, and yet, allowing anybody to create extensions that will be displayed or handled even better in browsers recognizing the new element roles. The elements will still be recognized in browsers unaware of this role, and properly output, but the semantic benefit of the element role won't be used by these browsers.XHTML5 and HTML5 continue in the way of HTML 4.01, but dropping the SGML part not supported by current browsers, and adding new elements, some of them being really useful.The divergence of XHTML5 and XHTML 2.0, in my opinion, proves that the Web standardizing community has internal conflicts... 
This is a bad thing. Some people say that the W3C wants to drop HTML in favor of XHTML, though the W3C didn't say that... But the position of the WHATWG is relatively clear: I predict flamewars of WHATWG-HTML/XHTML vs W3C-XHTML... For an unaware person wanting to learn a hypertext language, which one should they choose? XHTML5 or XHTML 2.0? I predict conflicts that will be much worse than the gentle flamewars of XHTML 1.0 vs HTML 4.01. The war began with the creation of the WHATWG, with some reactions from gurus: http://www.molly.com/2007/06/14/defy-the-p...t-stop-for-now/ With hot reactions. Issues are raised... HTML5 drops the document type... How will browsers recognize HTML6? HTML5 browsers are required to render HTML 4.01 code, interpreted as HTML5. The slogan "fixing the web" is a strength of the WHATWG advocates, as it's easy to assert without giving any argument as to how it fixes anything, or what was broken in the first place... Yourself, how would you choose between XHTML 2.0 and XHTML5? Myself, I'm not sure. Google and Wikipedia will give you more information about the WHATWG...
  7. Not yet, but I intend to write my own blogging system. I guess there are three levels of understanding of HTML code: basic, intermediate and advanced.At the basic level, omitting tags improve the clarity of code.For example, ask your little sister, grand mother or friend who never read any line of HTML, what this document means:<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><title>Font test page</title><p> The quick brown fox jumps over the lazy dog.<ul> <li> The sentence above contains all the alphabet characters. <li> It's short. <li> It's not my dog! Mine is not lazy!</ul> If necessary, explain the elements they don't understand, after some time. <li> means "list item", <p> means "paragraph", the DOCTYPE line describes the format of the document.Then, try again with this document: <?xml version="1.0"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><html xmlns="http://www.w3.org/1999/xhtml/"><head><title>Font test page</title></head><body> <p> The quick brown fox jumps over the lazy dog.</p> <ul> <li> The sentence above contains all the alphabet characters.</li> <li> It's short.</li> <li> It's not my dog! Mine is not lazy!</li> </ul></body></html> For an intermediate developer, who learnt HTML or XHTML and see which elements are in which elements, and basically understand how they nest, but doesn't know the DTD well, then, the second version is maybe, more clear.But, for an advanced developer, both are clear. The first one is just smoother and quicker to decode.Note that the p element is an abbreviation for paragraph.For somebody who never read a single line of HTML code, if the element was named paragraph, it would be clearer.But, I'm sure that, after thirty seconds of HTML learning, <p> becomes very clear for ever. 
It's simple enough to visually mark the source code, as if it was an icon.So, for intermediate and advanced programmers, <p> and <li> are perfectly clear, and contains less noise than if they were called <paragraph> and <list-item>. It's fine when reading source codes contain few markups and much text... But, when reading source code that contains 70% of markups, any redundancy hides the true content of the document!When reading a source code, you're often happy to be able to run smoothly over an attribute value (except for class attributes, which are quickly recognized by the long class= prefix), because they're far from being essential part of the document; they're neither part of the essence of the structure of the document, not part of the contents of the document.Are you really sure of your statment.<element lang="fr"></element><element lang=fr></element><!-- Are not the fr letters more visible in the second sample? Well, that's probably pretty subjective --> Moreover, machine-generated content usually contains repeated patterns (e.g. posts on forum).When reading source code, the first one is long to read.But, for the next ones, you can visually recognize the pattern, and identify the elements.This is easier if the number and size of markups is minimal.This become obvious when looking at more extreme syntaxes.Have a look at the HTML 3.0 math formula syntax (HTML 3.0 was far too ambitious for its time, and the project was drop).To read the samples, you just need to understand that _ means <sub>, ^ means <sup>, { means <box>, } means </box>.http://www.w3.org/MarkUp/html3/maths.htmlNow, compare it to the presentational MathML.http://www.w3.org/TR/MathML/chapter2.html#fund.examplesIf you had to read more than three formula on a math web site. Which syntax would you prefer? Maybe the code samples you've read are too simple, and you didn't need to seek for the content hidden behind a tag noise.Maybe, you're thinking like a machine. Congratulations! 
For writability of anything that contains many tags (HTML code is full of tags), there are quite simple criteria. Number of keystrokes. Number of carpal tunnel syndrome. An XHTML editor that lets yourself omitting closing tags and automatically emit them, either after you open the element or, preferably, when you open a new one which close the previous one, will be good for the above described criteria. SUBDOC is not a very needed feature, as it can be simluated with specific XML elements. It's just a way to standardly create documents in several parts and group them together as you wish.HTML doesn't have the SUBDOC feature enabled.With the SUBDOC feature, you could, for example, define a book (e.g. a W3C recommandation) which contains a number of sections and subsections.For each subsection, a file would be produced, as for the W3C HTML 4.01 recommandation.Then, it would be possible to define, a main page that includes the contents of all the sections, and gives a result that could be compared to the all-in-one-page HTML5 draft.Benefits: You could download the entire book at once, by following the sub-documents links. And yet, the subsections would still be in separate file, easy to acess individually. You might even use an editor that has several rendering mode for the page... One that replaces sub-documents with links working like usual anchors, and another that literally puts the contents of the sub-document inside the mother document. Currently, following the anchors is not a good idea, as some of them point to external resources.As I said, this feature could easily be put in XML. 
For example, using the simple convention (that could make a candidate for a microformat) of using iframes or anchors with a specific class type, to integrate sub-documents.Sub-documents are declared as general entities, in the internal subset, typically.If HTML did support SUBDOC, the syntax would be: <!-- Invalid HTML document; HTML doesn't support the SUBDOC feature --><!-- This illustrates the use of the SGML feature --><!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" [ <!ENTITY section1 SYSTEM "section1.html" SUBDOC -- note that the sub-document may be a different SGML application, with a different DTD... if the document interpretor supports it --> <!ENTITY section2 SYSTEM "section2.sgml" SUBDOC>]><title>Entity test</title><h1>Book</h1> <h2>Section1</h2> &section1; <h2>Section2</h2> &section2; Reference: http://www.is-thought.co.uk/book/sgml-6.htmThe feature of SGML that XML lacks is not SUBDOC, but CONCUR, which is disabled in HTML too.Concurrent documents allow one to specify two different trees, for two different document types, in a single document, without duplicating the contents.Because, a single document often contains more than one way to express the data.The idea is great, though the way SGML did it is not perfect.http://www.is-thought.co.uk/book/sgml-10.htm#CONCURFor example, one may want to show a blog as an RSS feed as well as an HTML or XHTML document.(Currently this is possible by using XSLT as an XML stylesheet translating the document to XHTML).You may also put logical information that has already been standardized, into the XHTML document. This is the most interesting thing, in my opinion.For example, you may want to put RDF semantic information for many resources for which you give in legible XHTML format in an XHTML site, blog or forum, and yet want to give the RDF form to automatic tools.Thanks goodness, for simple semantic information, we have microformats. Microformats are an XHTML specific way to put several document trees into one... 
Since several classes can be given to a single element, microformats have the power of giving as many "document views" as you wish of a single document.But, microformats are not generic. They're specific to HTML and XHTML, and cannot be generalized to arbitrary XML documents. Moreover, they give an assymetry between the XHTML or HTML document and the concurrent document, and I feel they are not general enough to express very complex trees.I don't argue that something similar to CONCUR could not be implemented in XML. There's just no current standard for that, and XML parsers don't handle this type of thing.I can even think of a candidate syntax... Extending the notion of namespaces. If a day, XML parsers adopt it (i.e. permitting the software to have only the view of the document types it understands without bothering at all with what it doesn't understand), this may be a powerful thing. <!-- this element may be at a deep level in an XML document --><both:element xmlns-concurr:both="first=URI1 second=URI2"> <first:second:someElement> <!-- first:second: is equivalent to both: --> <first:element1> <second:element2><second:element3> <xmltext:both>Text data emitted in both documents</xmltext:both> <!-- xmltext would indicate that the PCDATA will be emitted in the specified documents --> <first:element4> <xmltext:first>Text data emitted only in the first document</xmltext:first> </first:element4> </second:element3></second:element2> </first:element1> </first:second:someElement></both:element> Of course, you could use the "global" namespace to store any of the three namespace prefixes.The first:second: notation could be useful if more than two concurrent documents are used.For example, if there are ten concurrent documents, with namespaces names first, second, ... ten. 
The five:three:seven: prefix would open use these three namespaces.Would be a way to specify the two documents: <element xmlns="URI1"> <someElement> <element1> Text data emitted in both documents <element4> Text data emitted only in the first document </element4> </element1> </someElement></element> And: <element xmlns="URI2"> <someElement> <element2><element3> Text data emitted in both documents </element2></element3> </someElement></element> Actually, the notion of "valid" is the same in SGML and XML. Validity measures the syntax as well as the grammar checked against a DTD.XML has a weaker notion. Well-formedness. A well-formed document is a document that has no obvious syntax error when ignoring the DTD.SGML is too versatile and dependent on the DTD to make this notion useful, as, without looking at the DTD, parsers really cannot say whether the document looks syntaxically correct or not.This is mainly due to tag omission and short references.In the order of correctness, for HTML:semantically conforming > syntaxically conforming > valid.In XHTML:semantically conforming > syntaxically conforming > valid > well-formed.The difference between "semantically conforming" and "syntaxically conforming" is not done by the standard, but only by me...Syntaxically non-conforming means that, automated tools may see that something is wrong.Semantically non-conforming means that, only human people can see that it's wrong.<?xml version="1.0"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><!-- well-formed, invalid XHTML document><html><em> <hello> Hello!</hello> </em> <font size="48"> <tr><table> well-formed document!</table></tr> </font></html> <?xml version="1.0"? 
encoding="US-ASCII"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><!-- Valid, syntaxically non-conforming XHTML document --><!-- Automated tools may complain --><html><head><title>Valid non-conforming document!</title></head><body> <ins>Inline content forbidden in an <ins> at block level!</ins> <pre> Inside a <small>preformatted</small> block, inline elements changing the font are <big>not</big> permitted. These subtle content models cannot be expressed with the limited XML DTD power. </pre></body></html> Similarly, a valid HTML transitional document may contain attributes like width="hundred%" which are not conforming! <?xml version="1.0"? encoding="US-ASCII"?><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"><!-- Semantically non-conforming XHTML document --><html lang="fr" xmlns="http://www.w3.org/1999/xhtml"> <!-- argues to use french language --> <head><title>Hello world!</title></head> <!-- title in english --> <body> <p>Some dialog recorded there</p> <ul> <!-- a dialog is an ordered list! If the XHTML interpretor re-orders the ul, it will mess up the dialog --> <li>How far are you from <acronym title="New extended wall">New</acronym> York?</li> <li>22 kilometers.</li> <li>How tall are you?</li> <li>1m70.</li> </ul> </body> </html> For the conformance and validity notions, XHTML and HTML are the same.Well-formedness is specific to XML.But, the wording of the W3schools expressed the idea that the notion of conforming or even valid HTML document, doesn't exist. SGML is perfectly processable by computers. The parser is just much more complex to write than for XML.However, it only needs to be written once! 
Theorically, writing a SGML parser in C or C++, portable to all platforms, is not hard.So, once guys have written libsgml, nobody should have to re-write it.Unfortunately, on the Win32 platform, with all the commercial issues, everything tends to be re-written by everybody. It's hard to find a good XHTML editor.Typing it manually with a generic XML editor, or even a generic text editor, is usually easier then trying to make an XHTML editor, do its job correctly.And, of course, WYSIWYG editors are awful.SGML was designed to be an editable form of documents... Editable with a simple pure text editor.That's what I call a human-friendly format.Moreover, everything that is directly produced by a human, with a text editor, tends to be legible in this form.Indentation is optimal. The places where tags are omitted, empty start or end tags are used, are naturally chosen for optimal legibility. The choice of legible, intuitive, simple, short references, can improve legibility and writability too.Automatic indentation gives correct results, but manual indentation, gives better layouts, because of subtle exceptions to general rules, that human people naturally adopt.The same format for every platform... No need to have a specific editor. I think that SGML conventions are a good thing.Even if you use a generic XML editor, which, from the DTD, allow you to type SGML-like code, when distributing the XML file, it will be less editable and a bit less legible than its in SGML form.SGML is specifically designed to be input by human, processed by machines (e.g. to produce postscript documents), stored in their unprocessed form, and read in their unprocessed form.XML is designed to be generated by machines (but still be possible to input by human... just more painfully than with SGML), for example, from another more human-friendly form (e.g. an SGML form) or purely generated by machine (e.g. 
for the SOAP protocol), processed by machines, stored in a machine-readable relatively-human-legible form.Because the need of storage of a human-friendly form, I think that XML with a good editor cannot replace SGML well.It's not too bad, though.Something that could be interesting: Create an SGML format for every specific purpose (as currently, many people create an XML format for every specific purpose), input documents, archive them -- because everything that has been directly typed by a human ought to be archived; personally, I archive everything I type -- translate them to XML, store the XML file in a quick to access database, then XSLT to a specific language of interest (e.g. XHTML), and, if judged useful, convert from XHTML to HTML by minimizing tags to save bandwidth and produce more legible documents. You're right. It's much better.
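The well-formed versus valid distinction made earlier in this post can be tried out mechanically. Below is a minimal sketch in Python, assuming the standard-library xml.etree.ElementTree parser, which is non-validating: it only checks well-formedness and never looks at a DTD. The function name is_well_formed and the two sample strings are made up for the illustration; the first sample is the body of the "well-formed, invalid" document above, without its DOCTYPE.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if `text` parses as XML, ignoring any DTD grammar."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Nonsense as XHTML (a <table> inside a <tr>, an unknown <hello> element),
# yet every element is properly nested and closed.
WELL_FORMED_BUT_INVALID = (
    "<html><em><hello>Hello!</hello></em>"
    "<font size='48'><tr><table>well-formed document!</table></tr></font></html>"
)

# Fine as HTML 4.01 (end tags of <li> may be omitted), but not well-formed XML.
TAG_OMISSION = "<ul><li>first<li>second</ul>"

print(is_well_formed(WELL_FORMED_BUT_INVALID))
print(is_well_formed(TAG_OMISSION))
```

A non-validating parser accepts the first sample and rejects the second, which is the point: without the DTD, it cannot know that a new <li> closes the previous one.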
  8. Strangely, I read the first line more quickly and smoothly than the second line, although I thought it would be less legible... I didn't expect that. Anyway, used sparingly, it isn't harmful. Personally, I don't like it much, but it is in the standard, and if I had to write a browser, I would support it in "strict" mode. You're allowed to.
  9. It shows that the HTML SGML declaration enables the null end tag (NET) construct, part of the SHORTTAG set of features (not to be confused with DATATAG, which the HTML declaration leaves disabled), which makes: <element/Some text/ equivalent to: <element>Some text</element> This is not a well-known or widely used feature, though it might be useful for things like <em/a few words/. That makes <br/> have different semantics than in XML... In HTML, it is equivalent to: <br>> Browsers are not the only tools that parse HTML. The non-conformance of browsers makes the interpretation of HTML code by libsgml, or other SGML tools, different from theirs. If you think that the behavior of non-conforming browsers imposes the standard, then you should recognize that CDATA is not part of HTML, as it's supported by no browser... except Opera, which, by the de facto standard of all other browsers, is buggy. That causes problems for: People wishing to use this not-very-much-used feature. People using proper tools such as libsgml. Something that works in a way the standard doesn't agree with, while claiming to conform to this standard, has a bug. The intent of B.3.7 is to remind implementors that these uncommonly used SGML features, which have had very poor browser support, are part of the language and should be implemented, and to remind web authors that they should be careful, as most of these features are currently poorly supported. I highly doubt you can convince anybody that CDATA is useless and should not be implemented. For example, when HTML code is displayed in HTML documents, the bandwidth saved and the greatly increased legibility of the source code are worth it. If it has poor support, it's because browser vendors are more focused on interpreting tag soup and adding new fancy features than on introducing features that don't appear in the "HTML for dummies" book, and so are not used by enough people.
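To make the <element/Some text/ shorthand above concrete, here is a toy sketch in Python that rewrites it into the explicit form. It is only an illustration under strong assumptions: it is a regular expression, not an SGML parser, it knows nothing about the DTD, it ignores start tags that carry attributes, and it can misfire on XML-style <br/> followed by text containing a slash. The names NET_TAG and expand_net are made up for this example.

```python
import re

# "<name/" opens the element, the next "/" closes it.
# Group 1 is the element name, group 2 the enclosed character data.
NET_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)/([^/<]*)/")


def expand_net(markup: str) -> str:
    """Rewrite <name/text/ as <name>text</name>; leave everything else alone."""
    return NET_TAG.sub(r"<\1>\2</\1>", markup)


print(expand_net("<em/a few words/"))
print(expand_net("<p>This is a <em/valid/ and <em/conforming/ HTML document."))
```

A real SGML parser does this with full knowledge of the SGML declaration and the DTD, which is why an EMPTY element such as br behaves differently there.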
  10. XHTML 1.0 is a reformulation of HTML 4.01 as an XML application and, for example, can be used to store the contents of blog articles and comments inside enclosing XML files, which are then served either as XHTML or HTML... A translation from XHTML to HTML is possible, and saves bandwidth by omitting useless tags and using tag minimization. However, HTML5/XHTML5 and XHTML 2.0 seem to fork... In my opinion, this is a bad thing, and may cause many problems. Personally, I don't like the direction HTML5 is taking... Making it a non-SGML application because... just because no browser on earth is correct enough to use an SGML parser... So this becomes the de facto standard. Nevertheless, HTML5 and XHTML5 will be equivalent languages with different syntaxes. HTML5 is not the latest version... It's the "next" version, and isn't likely to be released for some time. Well, I once sent a bug report to a browser vendor for a defect in its support of HTML 4.01, and they answered that it was "irrelevant", because this HTML feature wouldn't be present in HTML5... So you're somewhat right: HTML5 is already used as a reason not to fix bugs in current browsers, because HTML5 is about admitting that current tools are broken and normalizing their current broken behavior. E.g. Microsoft doesn't really participate in this working group, but everything will be done to adapt the standard so that IE conforms. Okay, I'll stop bashing HTML5. HTML5 has good sides too.
I understood that the W3C didn't specifically recommend using upper case, but only told that, for the sake of clarity, they would use, in the W3C standard document, uppercase letters for element names and lowercase letters for attributes.This paragraph is in the "Document conventions" section.The point is to make a clear difference, in the standard paper, between these two types of tokens.Similarly, if you read some other standard papers, you may see that, there are conventions such as underlining identifiers and highlighting keywords to make them appear differently... That doesn't mean you must (or even can) do it in real code.However, it's sure that, the W3C endorses this practice, and, a normal author who read the W3C standard, would normally tend to use this canonical notation.Moreover, the sentence "the convention is meant to encourage readability", seems to indicate that this convention is not limited to their document, otherwise I don't see why they would have used the word "encourage".So, you're right. Or, simply, libsgml... to translate it to an XML document and manipulate it with thousands of XML tools.I agree on the fact that, the XML form is more convenient to process with tools. There are more XML tools available than SGML tools. You're right, although the behavior of IE for an XHTML DOCTYPE, HTML 4.0 Strict DOCTYPE and an unrecognized DOCTYPE is identical.The quirk-mode/strict-mode switch algorithm is available at:http://msdn2.microsoft.com/en-us/library/ms535242.aspxBut that's not real XHTML support. IE just has XHTML in its "non-tag-soup" list of markup languages.
  11. No, C++ is not "the next version of C".HTML 4 renders HTML 3.2 obsolete (i.e. now, HTML means HTML 4), and, everybody is strongly advised to use HTML 4 in new Web developments. Though, you're still allowed to use it.Among other things, it means that, flaws, small and big bugs in the documentation of HTML 3.2 won't ever be corrected.HTML 3.2 is not maintained anymore by the W3C.Similarly, C99 renders C90 obsolete. C90 won't have new any Technical Corrigenda. The "C programming language" now is C99.Note that, to convert C to C++, you need changes, very often.Try to compile this valid C90 program in C++.enum c{r,g}c(c) int c; {return c;}char static*p0;static void*new=(void*)(const struct v {struct i {char k;} k;}*)(sizeof(int)//**/sizeof('a')-sizeof(*p0));static char*p0=("hello world$"+1)-1;const struct v ex={'a'};main(){extern const struct v ex;enum c{r,g}c();char arr[1]="$";struct p0 static;typedef struct i v;struct k {v i;int v;} auto k;auto u=ex.k.k;if(0[arr]!=*p0){goto lbl;return;}if(*p0==c(0)){char p1[2]={0};lbl:0[p1]=*p0++;1[p1]=sizeof(int)-sizeof(r);puts(p1);main();}return 0;} Note that most changes from C90 to C++ are straightforward (the main incompatibility are implicit conversions from void* to any_object_type*), but C99 and C++ are quite incompatible. For example, the floating point number computing in C99 is much richer than in C++.C++ includes too many features to be easily and efficiently implemented on all platforms.There are good open source C++ implementations (correctly conforming, with zero-overhead exceptions, and with not-too-code-bloating templates) on major platforms (excluding Win32 on which I'm not aware of any implementation without huge-overhead-exceptions).There are many reasons to prefer C over C++, in specific contexts, as there are reasons to prefer C++ over C.So, C++ and C are different languages. 
Both are useful, but not for the same projects.Compatibility with legacy code is one reason, among others.Once I wrote about it here:http://phpfi.com/233158That's why, there's still a huge demand for trained C engineers.http://www.tiobe.com/tpci.htmPersonally, in the projects I'm involved in, I mostly use C++. But I don't see it as the next version of C.Ideally, C++ ought to be a strict extension of C. That's what Bjarne Stroustrup thinks.But, in no way is it a successor.Objective-C is a strict extension of C.So, in this context, is Objective-C or C++ the successor/next-version of C? Which doesn't interact well with the DATATAG feature of SGML:<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN"><!-- html and head tag implicitly open --><meta http-equiv=Content-Type content="text/html; charset=us-ascii"><title/Valid but unsupported document!/<!-- head tag implicitly closed, and body tag implicitly open --><p> This is a <em/valid/ and <em/conforming/ HTML document. If you don't believe me, read ISO/IEC 8879:1986.<!-- p implicitly closed and reopen --><p> You can try with the W3C validator.<!-- p, body and html implicitly closed--> The buggy ignore-slash thing, makes it impossible to support this SGML feature.Anyway, I don't put this feature high on my wish list for browser features. CDATA and RCDATA support would be much more useful. HTML doesn't require things to be marked up correctly???HTML requires things to be marked up correctly. But the notion of well-formed document is specific to XML.In XML, an acceptable document can be well-formed or valid.A well-formed document has a root element, properly nested elements, properly quoted attributes, etc., but is not necessarily valid.A valid XML document is a well-formed document that has a document type declaration and respects the grammar implied by the DTD.For example, this document is well-formed but invalid:In SGML, there's a single notion: Validity[/i]. 
A document is either invalid or valid, it can't be half-valid...The notion of well-formedness can exists in XML, because XML documents can be parsed (not perfectly, but enough for many usages) without the context of a given DTD.In SGML documents using tag minimization, an SGML declaration and a DTD are needed to parse a document, because the declarations in the internal and external subsets are required This is required for the SGML parser to find the implicit start and end tags, to translate short references to the contents of their entities, to interpret attribute specifications whose attribute name has been omitted when the attribute value is limited to a declared list of values, and for entities expansion.So, on this point W3Schools is right, but confusing. Same as in HTML. I didn't expect that from you. I thought you were an heavy SGML user...Attributes values don't need to be quoted in HTML if they contain only name characters ([A-Za-z0-9_.-:]).This is due to the SHORTTAG feature set to YES in the SGML declaration of HTML.See:http://www.w3.org/TR/html401/HTML4.declA good reference is available at:http://www.is-thought.co.uk/book/sgml-5.htm#AttributesThere are many misconceptions about SGML.I would like to point that: SGML is more human friendly (both writable and readable) than XML.Anybody who uses SGML everyday will confirm that point.Closing every paragraph, list item, table cell, and table rwo explicitly puts visual garbage into your document, and slow down both reading and writing. This is especially annoying when the tags/character-data ratio is raised over the limit of 50%.It's redundant, because when opening a paragraph, it's clear that the previous one is closed...It may be useful to explicit close every tag for an SGML or XML application with which you won't write more than very few documents, when you don't know well the DTD yet.However, it's obvious that for a network protocol such as SOAP, requiring a heavy DTD-aware parser would be an overkill. 
XML is more computer friendly than SGML, though it's often argued to be neither human-friendly nor computer-friendly.
The need to quote every attribute is a good example.
The SGML feature of not having to quote every attribute doesn't involve any DTD, so it could have been implemented in XML without changing the overall aims of XML.
When writing more than one or two documents, this SGML feature saves a lot of time.
Moreover, it produces perfectly readable documents. I highly doubt anybody could be confused by <element attribute=value>.
It saves bandwidth. It's even a bit easier to read when there are many attributes, because quotes (especially double quotes) reduce the visibility of the quoted content.
However, it's a bit easier to write an XML parser which only recognizes quoted values.
Moreover, I feel that XML was designed to be SGML simplified as much as possible. This feature wasn't needed.

XML is more extensible than SGML with the OMITTAG feature.
With omitted tags, adding new elements (and new short references) may sometimes cause problems: old software, when reading new documents, cannot know where the new elements are closed or opened (if they're implicitly closed or opened), unless the old software has a true, generic SGML parser, and the document contains a doctype declaration pointing to a system resource (e.g. a URI) containing the new DTD.
With XML, old software can easily ignore new elements.
As I said, SGML used with generic parsers is as extensible as XML. Since a generic SGML parser is far more complex to create than a non-validating or even a validating generic XML parser, XML scores a point here.
Note also that if new SGML elements have no implicit start and end tags, there should be no extensibility problem.
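The "old software can easily ignore new elements" point can be sketched in a few lines of Python. This is only an illustration: the vocabulary (`title`, `para`), the newer `rating` element and the function name `extract_known` are all hypothetical. Because every end tag is explicit, the parser finds the boundaries of the unknown element without any DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary understood by the "old" software.
KNOWN = {"title", "para"}


def extract_known(markup: str):
    """Return (tag, text) for known elements; unknown elements are skipped."""
    root = ET.fromstring(markup)
    result = []
    for elem in root.iter():  # document order, root included
        if elem.tag in KNOWN:
            result.append((elem.tag, (elem.text or "").strip()))
    return result


# A "new" document using an element (rating) the old software never heard of.
NEW_DOC = """<doc>
  <title>Release notes</title>
  <rating stars="4"><para>Inside a new element</para></rating>
  <para>Plain paragraph</para>
</doc>"""

for tag, text in extract_known(NEW_DOC):
    print(tag, "->", text)
```

With OMITTAG-style markup, the same old software would need the new DTD to know where `rating` ends, which is the asymmetry described above.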
XML is even more extensible, thanks to namespaces.
Namespaces provide a finer-grained specification of the content models of elements and attributes.
But the main advantage of namespaces is their use as a modularization tool.
It's possible (as in XHTML 1.1) to provide modules. The ones understood by the user application will be treated as needed, and the others will be ignored.
Namespaces are incompatible with the OMITTAG feature of SGML.

XML simplifies transmission over the Internet, because it has a single character repertoire, ISO/IEC 10646 (aka Unicode).
On this point XHTML and HTML are on an equal footing.
But for other SGML applications, with more uncommon SGML declarations, there are portability problems. On the other hand, for internal use, SGML is more flexible.

Inserting XML documents inside other XML documents is easy with namespaces.
E.g. an Atom newsfeed can contain an XHTML element.
With HTML, that's not that easy, though RCDATA may help.
Note that an Atom newsfeed can quite easily contain HTML code (in a CDATA section), IF the HTML code doesn't contain any ]]> sequence. If it contains such a sequence, the CDATA section has to be split around it, or the markup has to be escaped with many ugly &lt; entity references.

Finally, XML is easier to learn in its entirety for someone who doesn't have much time to invest in the technology.
This human factor seems a bit ridiculous to long-time SGML users, who have gained much more time through the power of SGML than they've lost learning it (with a good reference manual, one day should be sufficient to learn SGML), but I'm pretty sure this has been one of the major reasons for the success of XML.
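About the ]]> caveat for HTML embedded in a feed: one known workaround, besides escaping everything as entity references, is to end the CDATA section in the middle of the ]]> sequence and immediately open a new one. The sketch below is a minimal Python illustration; the function name `to_cdata` and the `content` element are just placeholders, not part of any feed library.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting every ']]>' across two sections."""
    # ']]>' becomes ']]' + end of section + start of section + '>'
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


# HTML fragment that happens to contain the forbidden sequence.
html = "<p>if (a[b[0]]>c) alert('x');</p>"
xml_doc = "<content>" + to_cdata(html) + "</content>"

# The parser joins the adjacent CDATA sections back into one string.
round_trip = ET.fromstring(xml_doc).text
print(round_trip == html)
```

It is not pretty either, but it keeps the embedded markup readable in the source.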
XML is good for unstable, quickly evolving applications, especially if they're mostly handled (parsed and produced) by computers rather than by people.
SGML is good for more stable applications (though SGML applications are flexible enough to be extended quite well) OR for data that is to be massively produced and/or read by people.
SGML also has advanced features that may sometimes be very useful: SUBDOC and CONCUR (concurrent documents... The lack of this feature in XML is sometimes very painful).
SGML is also good for internal use within a single enterprise.
Maybe there should be an XML editor that permits the user to type SGML (with RCDATA, SHORTREF, OMITTAG) and then translates it to XML.
The only requirement would be an SGML DTD. The internal subset could be used, but would be automatically removed in the final output, at least for the SGML-specific features.
Overall, I prefer to distribute HTML 4.01 documents, though for storage (e.g. as nodes of XML files) and manipulation, XHTML is fine.

Is XML cleaner than SGML?
The term "cleaner" is not appropriate here.
I think that "simpler", in the sense of "fewer A4 pages in the specification", is more appropriate.
This is an advantage... It's quicker to learn and teach...
Another advantage is that, since the average guy is much more likely to learn XML properly than SGML, he is more likely to produce correct XML documents (though, for XHTML, this is not exactly true), while the average non-technical guy, who learnt HTML from "HTML for Dummies", is likely to produce awful HTML code.
There is a VERY major factor here.
Most HTML books, manuals, references and tutorials are bad, except the W3C recommendation itself, of course.
They teach tag soup, not HTML.
For XHTML, they are bad, but maybe LESS bad.
That's sad, as learning XML by reading the spec, and then reading the XHTML DTD, can be done in a few days, and allows anybody to learn the language correctly enough to use it.
And I'm sure that I could teach, in half an hour, the content model of HTML and XHTML: which elements and text can be put in which elements, and which elements can have their start or end tag omitted.
This is not sufficient to learn HTML or XHTML, but it is a first step: teaching guys to produce conforming code (the W3C validator can help).

This point is interesting. I remember having seen that the DOM produced for XHTML documents by the Opera browser included the tbody, even when it was not explicitly there... Which is not correct... This is correct for HTML, though.
Well, IIRC, I tested that with XHTML documents served as text/html or whose file extension was .html, so that's probably not relevant.
Anyway, this XHTML oddity (allowing tr outside tbody) is annoying when writing CSS stylesheets or XSLT, or simply when interpreting XHTML code from source.

Right, but that's not what is suggested by W3Schools.
The W3Schools tutorial should maybe contain something like:
"HTML, because of its long history, is treated as tag soup by browsers. XHTML, served as application/xhtml+xml, is interpreted strictly, which is an opportunity to end the vicious circle of developers producing worse and worse HTML code as browsers become more and more tolerant and use more and more complex algorithms to guess the meaning of the code, which is one of the reasons why new releases of browsers are slower than older ones."
I wonder why browsers don't support a "very strict" HTML mode? For example, when the ISO/IEC 15445 document type declaration is detected, this mode could be set.

I hope my post is not too lengthy.
OK, it is.
Excuse me for this.
(Maybe some people have found interesting info in it.)