Can't HTML be more compact as a language?

martin_angelov1992 · September 22, 2012

Hello, I recently started trying to make the HTML that is sent to the browser more compact. I made a template engine that keeps the original HTML files but compresses the output sent to users' browsers by removing whitespace. I also started writing things like name="<?=$input['url']?>", which outputs something like name="1" instead of the usual name="url", to save even more space. I can do the same with class names and IDs, so the browser sees identifiers like 1, 2, 3 instead of content-footer and the like, while my server-side code still knows which one is content-footer because it can be written as <?=$class['content-footer']?> there. I also know that gzip compression can be applied, but I feel that even more compression is possible.

The problem is that tags like "div", "script", "head" and so on can't be replaced with identifiers like 1, 2, 3 (at least I haven't heard of a way). I see that HTML5 adds even more descriptive tags like "audio" and "video". Attributes like "href" and "type" are also a problem.

My idea is: can't HTML be like C++, for example? C++ is easy for people to read and extend, but when it has to be executed by a computer, it is compiled down to machine code (ASM), so the program is small and easy for the computer to read. I think the computer should not be given descriptive words to read, but bytes instead.

Is there a reason something like this hasn't been done with HTML, and are there any plans to do it? I didn't find a more appropriate place to ask this. Thanks.
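A minimal Python sketch of the renaming-plus-whitespace scheme described above. `NameMinifier` and `strip_inter_tag_whitespace` are hypothetical helpers, not part of any library. The sketch hands out letters rather than the digits 1, 2, 3, because a CSS class selector such as `.1` is not valid without escaping.

```python
import re
from itertools import count


class NameMinifier:
    """Hypothetical helper: maps descriptive class/ID names to short identifiers
    ("a", "b", ..., "z", "ba", ...) and remembers the mapping, so server-side
    templates can keep using the descriptive names."""

    ALPHABET = "abcdefghijklmnopqrstuvwxyz"

    def __init__(self):
        self.mapping = {}        # descriptive name -> short identifier
        self._next_id = count()  # 0, 1, 2, ...

    def _encode(self, n):
        # Plain base-26 with "a" as the zero digit; every result consists of
        # letters only, so it also works as a CSS class selector.
        digits = []
        while True:
            n, r = divmod(n, len(self.ALPHABET))
            digits.append(self.ALPHABET[r])
            if n == 0:
                break
        return "".join(reversed(digits))

    def __getitem__(self, name):
        # The same descriptive name always yields the same short identifier.
        if name not in self.mapping:
            self.mapping[name] = self._encode(next(self._next_id))
        return self.mapping[name]


def strip_inter_tag_whitespace(html):
    # Removes whitespace runs that sit entirely between two tags. Caveat: this
    # also removes spaces that matter between inline elements and inside
    # <pre>/<textarea>, so a production minifier has to be more careful.
    return re.sub(r">\s+<", "><", html).strip()


cls = NameMinifier()
template = f"""
<div class="{cls['content-footer']}">
    <span class="{cls['copyright']}">2012</span>
</div>
"""
print(strip_inter_tag_whitespace(template))
# <div class="a"><span class="b">2012</span></div>
```

The same `mapping` would also have to be applied to every stylesheet and script that refers to these names, otherwise the short identifiers would no longer match.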

Edited September 22, 2012 by martin_angelov1992

Ingolme · September 22, 2012

HTML is not a programming language; its purpose is completely different from that of C++. It is meant to describe data. HTML was designed to be human-readable, and there are no plans to make it more compact.

The syntax <?=$class['content-footer']?> is PHP shorthand (which is actually not supported on all servers) for <?php echo $class['content-footer']; ?>

As for class names: it is best to give classes and IDs descriptive names so that people can understand what each element is for, because semantics are an important feature of HTML.

martin_angelov1992 · September 22, 2012

Yes, but I feel there must be a way to send browsers something that is easy (and compact) for them to read, rather than something that is easy for people to read. Most people don't look at the source code to understand what an element is for.

boen_robot · September 22, 2012

The W3C* already has EXI (Efficient XML Interchange), but no browser supports it natively yet. If you're enough of a C++ developer, consider either writing your own EXI parser for one or more open-source browsers (the closed-source ones will follow if both Mozilla and Chrome support it), or taking an existing parser (e.g. EXIP) and hooking it into browsers.

HTML was originally designed as a language based on the SGML markup language (similar to how XHTML is an XML-based language). SGML is designed to "save" typing by letting the implementation infer the rest from the language's DTD. This in turn means far fewer bytes being sent. Having mentioned EXI's problem above, can you guess what SGML's problem was?

Yep, you guessed it. No browser supports the full feature set of SGML. This is because they never had an SGML parser to begin with; instead they built "HTML parsers" that aren't actually compatible with SGML.

* In case you don't realize it, W3Schools is not affiliated with W3C.

P.S. Damn... you, me and EXIP's developer are all Bulgarians... doesn't anyone else think about these things? :lol:

davej · September 22, 2012

Why bother? If the webpage contains even one image that image file is probably vastly larger than the html file.

martin_angelov1992 · September 22, 2012

Images get cached

boen_robot · September 22, 2012

Well, strictly speaking, other assets can be cached as well... unless we're talking about a PHP page that displays up-to-the-moment information (like a forum, or some sort of item list).

And in that case, whatever time you'd save over the wire would be spent encoding and decoding the content, so minifying the HTML (by eliminating whitespace) and applying gzip compression is more than enough.
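A small Python sketch (standard library only) for checking that claim on your own markup: it compares the raw size, the whitespace-minified size, and the gzip-compressed size of both. The sample page and the `minify`/`report` helpers are hypothetical. gzip's DEFLATE algorithm replaces repeated strings with back-references, so long, repetitive tag and class names cost relatively little once compressed; the actual numbers will depend on the page.

```python
import gzip
import re


def minify(html):
    # Crude minification: drop whitespace that sits between two tags.
    return re.sub(r">\s+<", "><", html).strip()


def report(html):
    # Byte sizes (UTF-8) of the page before and after each step.
    raw = html.encode("utf-8")
    mini = minify(html).encode("utf-8")
    return {
        "raw": len(raw),
        "minified": len(mini),
        "gzip(raw)": len(gzip.compress(raw)),
        "gzip(minified)": len(gzip.compress(mini)),
    }


# Hypothetical sample page: a repetitive item list, like a forum topic index.
rows = "\n".join(
    f'    <li class="topic-row">\n        <a href="/topic/{i}">Topic {i}</a>\n    </li>'
    for i in range(200)
)
page = '<html>\n<body>\n  <ul class="topic-list">\n' + rows + "\n  </ul>\n</body>\n</html>\n"

for label, size in report(page).items():
    print(f"{label:>15}: {size} bytes")
```

On markup like this you would typically expect the gap between the two gzip figures to be much smaller than the gap between the raw and minified figures, since the compressor already absorbs most of the repetition.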

martin_angelov1992 · September 22, 2012

The most used pages are PHP pages, and it's true that waiting 1.5 seconds for a page instead of 1 second or less is not a big pain, but I still feel that improvements can be made, for example by creating a new, more lightweight web protocol.

boen_robot · September 22, 2012

Yes, but again, we're not talking about PHP pages that merely happen to be PHP pages, like most pages on the web. Those can be cached. The fact that people often don't cache them is a developer problem, not a technology problem.

No, we're talking about pages that are applications, which can't really be cached, since they change too often.

One of the biggest problems for such dynamic applications is maintaining session information, which is actually why a lot of work that could otherwise be done by the client is done by the server, making caching infeasible.

Browser vendors are addressing that part with Google's SPDY protocol, which is an extension of HTTP and will eventually be formalized into HTTP 2.0. Combined with WebSockets and app manifests (which are implemented in the latest browsers today), you'd pretty much be able to cache the app itself on the client (which may slow down the initial load but significantly improve overall performance) and download only the actual data over the wire (rather than the data plus the whole page, as you do now).

martin_angelov1992 · September 22, 2012

Thanks. It's nice that you are so well informed. I must admit that I had never heard of EXI or SPDY. I did search before posting here, but I couldn't find any information on making web pages lighter beyond HTML and JS minification and gzip compression. It would be nice if SPDY gets the green light.

davej · September 24, 2012

Generally I see a second or more of delay when going to a new address, and it certainly isn't the 50 kB HTML file that requires a second of transmission time, because that should take less than 0.025 seconds.
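The 0.025 s figure implies a connection of about 16 Mbit/s (50 kB × 8 bits per byte ÷ 0.025 s = 16 Mbit/s, taking 1 kB as 1,000 bytes). The rough Python model below, a sketch under stated assumptions, adds a fixed cost per network round trip to that raw transfer time. The `transfer_time` helper, the round-trip count and the latency in the second example are hypothetical, and the model ignores TCP slow start, server processing and rendering.

```python
def transfer_time(size_bytes, bandwidth_bps, rtt_s=0.0, round_trips=0):
    """Rough page-fetch estimate in seconds: latency paid per round trip plus
    the time to push size_bytes through a link of bandwidth_bps bits/second.
    Ignores TCP slow start, server think time and rendering."""
    latency = round_trips * rtt_s
    payload = size_bytes * 8 / bandwidth_bps
    return latency + payload


# Payload alone: 50 kB over 16 Mbit/s.
print(transfer_time(50_000, 16_000_000))  # 0.025

# Hypothetical: the same page with 3 round trips (e.g. DNS lookup, TCP
# connection setup, and the HTTP request itself) at 100 ms each.
print(transfer_time(50_000, 16_000_000, rtt_s=0.1, round_trips=3))
```

With these assumed numbers the latency term is more than ten times the payload term, so shaving bytes off the HTML changes the total very little.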

justsomeguy · September 25, 2012

I would tend to agree that other bottlenecks, like browser rendering speed and network bandwidth, have a much greater effect on the time to download and display a page than the size of the actual content. It has already been pointed out that images aren't compressed when they are sent and that a single image may be larger than all other resources. Minifying HTML or JavaScript and compressing the data over the wire reduces the file size enough that other issues dominate at that point. I have a JavaScript file that's almost 1.5 MB in the development version, and once the server sends it, it's only about 140 KB. That means it takes very little time to effectively download 1.5 MB of code. The delay in the application comes from the browser rendering the interface, not the time to download the code. This is especially true with older browsers like IE6 or IE7.

davej · September 25, 2012

I'm sure someone has made a nifty graph of all the delays -- but I couldn't find one in a quick search. If anyone finds or knows of a nice summary please post the link.

Edited September 25, 2012 by davej

justsomeguy · September 25, 2012

You can see a timing graph in most browser developer consoles.
