gidmeister2 Posted January 31, 2015

I wrote a program that downloads web pages, parses the HTML, removes dangerous tags and attributes, and lets the user specify, to some extent, what to keep. There are "HTML sanitizers" out there, but the ones I tried (with ASP.NET pages) were defective. Unfortunately, I wrote the program without a parsing guide to HTML: I just took the tags I knew, pushed them onto a stack, popped the stack, and so on. I would think w3schools has a guide to all of HTML that I could feed into a parser, perhaps marking which tags are dangerous. So my question is: where is that guide? Thanks
Ingolme Posted January 31, 2015

There's a reference of every single HTML tag here: http://www.w3schools.com/tags/default.asp

The rules for parsing HTML are pretty complicated because the language allows so many unusual syntaxes: some tags don't have closing tags, element and attribute names aren't case-sensitive, and attribute values may or may not be wrapped in quotation marks. Building an HTML parser from scratch would require a lot of work.
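Since hand-rolling a parser is a lot of work, one option is to let an existing tolerant parser handle the odd syntaxes and only write the filtering step. Below is a minimal sketch in Python using the standard library's `html.parser`, which already lowercases tag and attribute names and accepts quoted or unquoted attribute values. The allowlists (`ALLOWED_TAGS`, `ALLOWED_ATTRS`, `VOID_TAGS`, `DROP_CONTENT`) and the `sanitize` helper are hypothetical names chosen for illustration, not a vetted security policy, and this is not the original poster's program.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for illustration only; tune them for real use.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "ul", "ol", "li", "br"}
ALLOWED_ATTRS = {"a": {"href"}}      # attributes kept, per tag
VOID_TAGS = {"br"}                   # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}   # tags whose text content is discarded too


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []          # pieces of sanitized output
        self.skip_depth = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # The parser has already lowercased tag and attribute names.
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, and anything unlisted
            value = value or ""
            # Only keep links with an explicit http(s) scheme.
            if name == "href" and not value.lower().startswith(("http://", "https://")):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped so it cannot reintroduce markup.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = Sanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


dirty = ('<P onclick="x()">Hi <SCRIPT>alert(1)</SCRIPT>'
         '<a href="javascript:evil()">link</a> <b>bold</b></P>')
print(sanitize(dirty))
```

The sketch uses an allowlist rather than a list of "dangerous" tags, so anything unrecognized is dropped by default. It does not rebalance unclosed or misnested tags, so a production sanitizer would still need that step or a maintained library.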
amandakilimanjaro Posted February 1, 2015

Why not use the w3.org tag list? They define the standard HTML tags.