vchris Posted July 30, 2008 Does anyone know of a good script that converts Word files to valid XHTML? I don't like the standalone programs people create; they're usually slow and I can't add any custom processing to them. A small script in ColdFusion or another server-side language could do the job a lot faster and could be customized easily. I'm converting large documents with large tables, images, tables of contents, graphics...
Natechs Posted July 30, 2008 Try HTML-Kit. It has an option to remove the surplus Word tags, and then you can convert the result into XHTML!
23.12.2012 Posted July 30, 2008 I think he meant Microsoft Word to XHTML.
Natechs Posted July 30, 2008 Yeah. If you use HTML-Kit, you can remove all those Word tags and turn the result into XHTML.
vchris Posted July 30, 2008 Can I do batch jobs with this program? Will it automatically clean the HTML created by MS Word? Edit: I forgot to mention that it needs to be free!
justsomeguy Posted July 30, 2008 PSPad also includes HTML Tidy. You can configure HTML Tidy to do several different things; the settings are in one of PSPad's menus.
Natechs Posted July 30, 2008 "Can I do batch jobs with this program? Will it automatically clean the HTML created by MS Word? Edit: forgot to mention free!" No, it won't allow batch jobs. You have to do it all at once. But it will clean the HTML up, and it is free!
vchris Posted August 1, 2008 HTML-Kit is a nice little program. It's lightweight, which makes it fast, and with its plugins it's as powerful as Dreamweaver. HTML Tidy cleans up the Word HTML, but not as much as I need. I'm creating a plugin to do the rest of the work.
vchris Posted August 4, 2008 What regular expression do I need to remove all attributes except colspan and rowspan? This is what I have so far:
\swidth="?[0-9]+"?|\sheight="?[0-9]+"?|\salign="?.*"?|\svalign="?.*"?|\sstyle=['|"](.|\s)*['|"]|\snowrap|\sclass="?.*"?|\slang="?.*"?|\sxml:lang="?.*"?
The only problem I have with this pattern is the line breaks inside the style attribute; \s doesn't seem to match them.
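One way to approach the question above is to invert it: instead of listing every attribute to delete, match each opening tag and keep only the attributes on an allow-list. The sketch below is in Python rather than as an HTML-Kit plugin, and the names `strip_attributes`, `KEEP`, `ATTR`, and `TAG` are made up for illustration. It uses negated character classes such as `[^"]*` for attribute values, which match across line breaks (unlike `.`) and stop at the closing quote (unlike a greedy `.*`). It assumes reasonably well-formed markup and does not special-case comments or script blocks; a real HTML parser would be more robust.

```python
import re

# Attributes to keep; every other attribute on a tag is dropped.
KEEP = {"colspan", "rowspan"}

# An attribute value: double-quoted, single-quoted, or bare.
# The negated classes match line breaks too, so a multi-line
# style="..." value is consumed as a whole.
VALUE = r"""(?:"[^"]*"|'[^']*'|[^\s"'>]+)"""
NAME = r"[\w:.-]+"

# One attribute: leading whitespace, a name, and an optional =value.
ATTR = re.compile(r"\s+(" + NAME + r")(?:\s*=\s*" + VALUE + r")?")

# An opening (or self-closing) tag with its whole attribute list.
TAG = re.compile(
    r"<([A-Za-z][\w:.-]*)((?:\s+" + NAME + r"(?:\s*=\s*" + VALUE + r")?)*)\s*(/?)>"
)


def strip_attributes(html):
    """Return html with every attribute removed except those in KEEP."""

    def clean_tag(match):
        name, attrs, slash = match.group(1), match.group(2), match.group(3)
        kept = [
            " " + a.group(0).strip()
            for a in ATTR.finditer(attrs)
            if a.group(1).lower() in KEEP
        ]
        return "<" + name + "".join(kept) + (" /" if slash else "") + ">"

    return TAG.sub(clean_tag, html)


sample = '<td width="100" colspan="2" style="color:red;\n font-size:10pt" nowrap>x</td>'
print(strip_attributes(sample))  # prints <td colspan="2">x</td>
```

The allow-list also avoids having to extend the pattern each time Word emits an attribute that was not anticipated.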