
How to clean up text


migroo


Okay, I have a text field that people are going to cut and paste into a lot, and many of the characters they use, such as ', are often not read properly; a small square box shows up instead. Is there a way of filtering this with a PHP function or something?


If you're talking about the Unicode quotes and things that Word likes to use instead of actual quotes, you can use this function to replace those characters in a string:

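function sanitize_ms_chars(&$val, $i)
{
  // Characters to look for: curly quotes, ellipsis, and dashes
  $find = array(
    // literal characters, in whatever encoding this source file is saved as
    '“', '”', '‘', '’', '…', '—', '–',
    // Windows-1252 single bytes: ‘ ’ “ ” and the em dash
    chr(145), chr(146), chr(147), chr(148), chr(151),
    // UTF-8 byte sequences
    chr(0xe2) . chr(0x80) . chr(0x98),  // left single quote
    chr(0xe2) . chr(0x80) . chr(0x99),  // right single quote
    chr(0xe2) . chr(0x80) . chr(0x9c),  // left double quote
    chr(0xe2) . chr(0x80) . chr(0x9d),  // right double quote
    chr(0xe2) . chr(0x80) . chr(0x93),  // en dash
    chr(0xe2) . chr(0x80) . chr(0x94)   // em dash
  );

  // Plain ASCII replacements, in the same order as $find
  $replace = array(
    '"', '"', "'", "'", '...', '-', '-',
    "'", "'", '"', '"', '-',
    "'", "'", '"', '"', '-', '-'
  );

  // $val is passed by reference, so the caller's string is changed in place
  $val = str_replace($find, $replace, $val);
}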

Note that the function doesn't return a value; it operates directly on the string you pass to it, since $val is taken by reference. Because of that, you can use it to replace those characters in every element of an array, e.g.:

array_walk_recursive($_POST, 'sanitize_ms_chars'); // sanitize all of $_POST
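To make the by-reference behaviour concrete, here is a rough, self-contained sketch. It assumes the pasted text arrives as UTF-8. The function name demo_sanitize_quotes and the sample data are made up for illustration; it is a trimmed-down stand-in for the function above, not a replacement for it.

```php
<?php
// Hypothetical, shortened stand-in for sanitize_ms_chars() so this runs on its own.
// Assumption: input is UTF-8. The "\xE2\x80\x.." strings are the UTF-8 bytes
// of the curly quotes, dashes, and ellipsis.
function demo_sanitize_quotes(&$val, $key = null)
{
    $map = array(
        "\xE2\x80\x98" => "'",    // left single quote
        "\xE2\x80\x99" => "'",    // right single quote
        "\xE2\x80\x9C" => '"',    // left double quote
        "\xE2\x80\x9D" => '"',    // right double quote
        "\xE2\x80\x93" => '-',    // en dash
        "\xE2\x80\x94" => '-',    // em dash
        "\xE2\x80\xA6" => '...',  // ellipsis
    );
    // Nothing is returned; the caller's variable is modified through the reference.
    $val = str_replace(array_keys($map), array_values($map), $val);
}

// Single string: call it directly and the variable itself changes.
$title = "\xE2\x80\x9CHello\xE2\x80\x9D \xE2\x80\x94 it\xE2\x80\x99s here\xE2\x80\xA6";
demo_sanitize_quotes($title);

// Nested array standing in for $_POST: array_walk_recursive() visits every
// non-array element and passes it to the callback as (value, key).
$form = array(
    'name' => "O\xE2\x80\x99Brien",
    'tags' => array("\xE2\x80\x98a\xE2\x80\x99", 'plain'),
);
array_walk_recursive($form, 'demo_sanitize_quotes');
```

After this runs, $title and every string inside $form contain only the plain ASCII versions of those characters. If your input might be Windows-1252 rather than UTF-8, you would also need the single-byte entries (chr(145) and friends) from the full function.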


<rant>I just have to say that it drives me nuts when I read online news stories from respected organizations and they are chock-full of control characters where quotation marks, line breaks, and em-dashes should be.</rant>


My company has a system that has several HTML boxes on the admin side, things like a message to show on the login page, descriptions for content, etc. Everyone always pastes stuff into those boxes directly from Word, and it comes through with so much extra cruft attached to it that it's just ridiculous. They paste in some text and it includes a bunch of <p>, <span>, and <font> tags, all of the extra mso tags and attributes, plus any random characters Word wants to substitute for perfectly usable ASCII, and then they wonder why the text doesn't look like the rest of the text on the screen. I blame Word!


Thanks for all the tips, guys. I believe most of what is being cut and pasted in comes from Adobe InDesign, and it has all the goofy little half quotes and such. THANK you very much.


Archived

This topic is now archived and is closed to further replies.
