I created a program that downloads web pages, parses the HTML, removes dangerous tags and attributes, and lets the user specify, to some extent, what to keep. There are "HTML sanitizers" out there, but the ones I tried (with ASP.NET pages) were defective.
Unfortunately, I wrote the program without any reference for parsing HTML. I just took the tags I knew, pushed them onto a stack as I encountered them, then popped the stack, and so on.
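For illustration, here is a minimal sketch of that kind of stack-based tag matching. It assumes a hypothetical pre-tokenized input of `("start", tag)`, `("end", tag)` and `("text", data)` tuples; the tokenizer itself is not shown. It also shows why a complete element list matters: void elements such as `br` and `img` never get an end tag, so pushing them onto the stack would produce bogus closing tags. The `VOID_ELEMENTS` set below is the void-element list from the HTML standard.

```python
# Void elements have no end tag, so they must never be pushed onto the stack.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}

def balance_tags(tokens):
    """Return a token list in which every start tag has a matching end tag.

    tokens: hypothetical (kind, value) tuples, kind in {"start", "end", "text"}.
    """
    stack = []  # currently open (non-void) elements, innermost last
    out = []
    for kind, value in tokens:
        if kind == "start":
            out.append(("start", value))
            if value not in VOID_ELEMENTS:
                stack.append(value)
        elif kind == "end":
            if value in stack:
                # Implicitly close anything opened after this element.
                while stack:
                    top = stack.pop()
                    out.append(("end", top))
                    if top == value:
                        break
            # Otherwise it is a stray end tag with no open element: drop it.
        else:
            out.append((kind, value))
    # Close whatever is still open at the end of the input.
    while stack:
        out.append(("end", stack.pop()))
    return out
```

The design choice here is to repair rather than reject: unclosed elements are closed implicitly and stray end tags are dropped, which is roughly what browsers do with sloppy markup.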
I would expect W3Schools to have a reference covering all of HTML, which I could feed into a parser, perhaps marking which tags are dangerous. So my question is: where can I find such a reference?
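The "feed a list of tags into a parser" idea can be sketched as a table-driven allowlist on top of Python's standard `html.parser.HTMLParser`. This is only a sketch under stated assumptions: the `ALLOWED` table, the `DROP_CONTENT` set and the `SAFE_SCHEMES` tuple are hypothetical examples, not a vetted policy. Tags not in the table are dropped but their text is kept; `script` and `style` are dropped together with their contents; `href` values are kept only when they start with an allowed scheme, so relative links are dropped too.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: tag -> set of permitted attributes.
ALLOWED = {
    "p": set(), "b": set(), "i": set(), "em": set(), "strong": set(),
    "ul": set(), "ol": set(), "li": set(), "br": set(),
    "a": {"href", "title"},
}
DROP_CONTENT = {"script", "style"}  # drop these tags *and* their text
SAFE_SCHEMES = ("http://", "https://", "mailto:")

class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED:
            return
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            # Allowlist URL schemes rather than trying to blocklist "javascript:".
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        # br is a void element, so it never gets a closing tag.
        if self.skip_depth or tag not in ALLOWED or tag == "br":
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))  # re-escape text for safe output

def sanitize(html_text):
    parser = Sanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)
```

For example, `sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>')` returns `<p>Hi <b>there</b></p>`. An allowlist is generally preferred over marking "dangerous" tags, because anything the table does not know about (including elements added to HTML later) is dropped by default.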
Thanks