
Character Encoding


ShadowMage


So, first a little background on the setup/system we're running.  We have an ERP system where users enter memos and notes on quotes/orders. We also have an intranet that pulls this information back out to run reports and such.  The problem I'm facing is that users are copy/pasting data into the memo text, which is creating a character encoding issue.  The memo text displays in the web browser with a black diamond containing a question mark in place of certain specific characters (like apostrophes).  Manually typed text doesn't seem to have this issue, nor does text that comes from the intranet and is inserted into the ERP system.  It is only data that's copy/pasted directly into the ERP system.  How do I detect these wrongly encoded strings and display them properly?

I did try to just use utf8_encode() on the text, but that changes the black diamond to a white box.  I also see there is an iconv() function to convert strings to specific encodings, but it requires an input encoding, and I have no idea what the starting charset is.  I found the mb_detect_encoding() function, but calling it gives me a fatal error.  I tried to enable the mbstring extension in php.ini, but that didn't seem to make any difference.


If your HTML document declares UTF-8 as its encoding in a <meta> tag and your PHP files are UTF-8 encoded, the browser should submit any form data as UTF-8 no matter what the user pastes in.

If you have a database connection, make sure that the encoding of the connection is UTF-8 as well.


The problem is not data coming from an HTML form.  As mentioned, memos that are entered from the intranet (the HTML forms) are functioning correctly.  The problem appears when users copy/paste directly into the ERP system.

To try to further clarify, memos can be entered in one of two ways: through a form on the intranet or directly into the ERP system.  Those memos that come from the intranet form are functioning fine and do not have any character encoding issues.  Entering memos directly doesn't cause issues UNLESS the user copy/pastes data (from an Outlook email, for example) into the memo.
The intranet and the ERP system are two separate entities, not really connected in any way except for sharing data back and forth through an ODBC connection.  The intranet is of our own design and we have full control over it.  As mentioned, it connects to the ERP system's database through ODBC to retrieve data and, occasionally, to write a very limited amount of data to Memos and certain other similar things.  The ERP system (Epicor, in case you're wondering) is not our own, and we have very, very limited control over the database it uses and none over how it interacts with said database.


I'm not sure what an ERP is, but it seems to be storing data with the wrong encoding in the database. The only solution is to detect the encoding of the string and convert it after retrieving it from the database, as you were trying earlier.
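One way to find out what the starting charset is, without needing mbstring, is to look at the raw bytes of a memo that displays wrongly. Below is a minimal sketch; the function name non_ascii_bytes() is made up for illustration, and it only reports which non-ASCII byte values occur. If a pasted apostrophe shows up as the single byte 92, that would point to Windows-1252; if it shows up as e2 80 99, the text is already UTF-8 and the problem lies elsewhere.

```php
<?php
// Hypothetical diagnostic helper (not part of PHP): returns the distinct
// non-ASCII bytes found in $text, each as a two-digit hex string.
function non_ascii_bytes(string $text): array
{
    // Without the /u modifier, PCRE works byte by byte, so this is safe
    // to run on text in any (or an unknown) encoding.
    preg_match_all('/[\x80-\xFF]/', $text, $matches);

    $hex = array_map('bin2hex', array_unique($matches[0]));
    sort($hex, SORT_STRING);

    return $hex;
}
```

Running this on a few affected memos, for example print_r(non_ascii_bytes($memo)) where $memo is whatever came back over ODBC, should show fairly quickly whether the same handful of bytes keeps turning up.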

utf8_encode() assumes its input is ISO-8859-1, so if it is not working, the data is probably in some other encoding. I don't think I can help much if mb_detect_encoding() is not working. You will have to find out why the function is not available.
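If the bytes do turn out to be Windows-1252 (text pasted from Outlook on Windows often is, and that is only a guess here), something along these lines could convert just the affected memos using iconv(), which does not depend on mbstring. This is a sketch under that assumption, not a tested fix for Epicor data; memo_to_utf8() and its default source encoding are made up for illustration. In Windows-1252 the curly apostrophe is byte 0x92, which ISO-8859-1 treats as a control character, and that may be why utf8_encode() produced a white box.

```php
<?php
// Hypothetical helper (not part of PHP). The default source encoding is an
// assumption; change it if the raw bytes suggest something else.
function memo_to_utf8(string $text, string $assumedSource = 'Windows-1252'): string
{
    // With the /u modifier, preg_match() returns false when the subject is
    // not valid UTF-8, so a result of 1 means the text can be left alone.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }

    // iconv() returns false (and raises a notice) if it hits a byte that is
    // undefined in the source encoding; fall back to the original text then.
    $converted = @iconv($assumedSource, 'UTF-8', $text);

    return $converted === false ? $text : $converted;
}
```

Memos that are already valid UTF-8 (including plain ASCII and anything entered through the intranet form) pass through unchanged, so it should be safe to call on every memo just before output, e.g. echo htmlspecialchars(memo_to_utf8($memo), ENT_QUOTES, 'UTF-8').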
