vchris (posted July 30, 2008): Does anyone know of a good script that converts Word files to valid XHTML? I don't like the standalone programs people create; they're usually slow and I can't add any custom behaviour. A small script in ColdFusion or another server-side language could do the job a lot faster and would be easy to customize. I'm converting large documents with large tables, images, tables of contents, graphics...
Natechs (posted July 30, 2008): Try HTML-Kit. It has an option to remove Word's surplus tags, and then you can convert the result into XHTML!
23.12.2012 (posted July 30, 2008): I think he meant Microsoft Word-to-XHTML.
Natechs (posted July 30, 2008): Yeah. If you use HTML-Kit, you can remove all those Word tags and turn the result into XHTML.
vchris (posted July 30, 2008): Can I do batch jobs with this program? Will it automatically clean the HTML created by MS Word? Edit: forgot to mention it has to be free!
justsomeguy (posted July 30, 2008): PSPad also includes HTML Tidy. You can configure HTML Tidy to do several different things; the settings are in one of PSPad's menus.
Natechs (posted July 30, 2008): Quoting vchris: "Can I do batch jobs with this program? Will it automatically clean the html create by MS word? edit: forgot to mention free!" No, it won't allow batch jobs. You have to do it all at once. But it will clean it up, and it is also free!
vchris (posted August 1, 2008): HTML-Kit is a nice little program. It's lightweight, which makes it fast, and with its plugins it's as powerful as Dreamweaver. HTML Tidy cleans up the Word HTML, but not as much as I need. I'm creating a plugin to do the rest of the work.
vchris (posted August 4, 2008): What regular expression would remove all attributes except colspan and rowspan? So far I have this:
\swidth="?[0-9]+"?|\sheight="?[0-9]+"?|\salign="?.*"?|\svalign="?.*"?|\sstyle=['|"](.|\s)*['|"]|\snowrap|\sclass="?.*"?|\slang="?.*"?|\sxml:lang="?.*"?
The only problem I have with this pattern is the line breaks inside the style attribute; \s doesn't seem to match them.
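One possible approach to the question above, sketched in Python rather than as an HTML-Kit plugin (the function name `strip_attributes` and the `KEEP` set are made up for this illustration). Instead of listing every attribute to delete, it keeps a whitelist and drops everything else. Attribute values are matched with negated character classes such as `[^"]*`, which stop at the first closing quote and also span line breaks, so a multi-line style attribute should not need `\s` or `.` to cross a newline. Note that in the pattern above, `.*` and `(.|\s)*` are greedy and can run on to the last quote they can reach, and `['|"]` also accepts a literal `|`. This is a regex-based sketch that assumes reasonably regular markup such as Tidy output, not a full HTML parser.

```python
import re

# Attributes to keep; everything else is dropped (hypothetical whitelist).
KEEP = {"colspan", "rowspan"}

# One attribute without capture groups, used to locate whole opening tags.
# The value may be double-quoted, single-quoted, or unquoted; the negated
# classes [^"]* and [^']* also match line breaks inside a value.
_ATTR = r"""\s+[^\s"'<>/=]+(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'<>]+))?"""

# The same attribute pattern with capture groups for the name and the value.
ATTR = re.compile(
    r"""\s+([^\s"'<>/=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'<>]+)))?"""
)

# An opening (or self-closing) tag: name, attribute run, optional slash.
TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9:]*)((?:" + _ATTR + r")*)\s*(/?)>")


def strip_attributes(html):
    """Remove every attribute except those named in KEEP."""

    def clean_tag(match):
        tag_name, attrs, slash = match.group(1), match.group(2), match.group(3)
        kept = []
        for attr in ATTR.finditer(attrs):
            name = attr.group(1).lower()
            if name in KEEP:
                # Take whichever value form matched; a bare attribute
                # falls back to its own name as the value.
                value = next(
                    (g for g in attr.groups()[1:] if g is not None), name
                )
                # Re-emit with double quotes, as XHTML requires.
                kept.append(f' {name}="{value}"')
        return "<" + tag_name + "".join(kept) + (" /" if slash else "") + ">"

    return TAG.sub(clean_tag, html)


sample = (
    '<td width="100" colspan=2 style=\'border:none;\npadding:0\' '
    'nowrap rowspan="3" class=MsoNormal>x</td>'
)
print(strip_attributes(sample))
```

Closing tags and text content are left untouched, because the tag pattern only matches `<` followed directly by a letter. To keep more attributes (for example `href` or `src`), add them to `KEEP`.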