Javascript dealing with multibyte characters


yapning

My colleague told me that the last time he worked on this, he ran into quite a lot of unsolvable problems when handling certain multibyte/multilingual characters in JavaScript. Is that true? In other words, might some JavaScript code, such as data validation or any code that deals with text/string values, not fully work when it interacts with certain problematic multilingual characters?

I have not developed many multilingual web pages before, so I would really appreciate opinions or suggestions to help me decide whether to use more JavaScript/AJAX that handles text/string values without problems to build more interactive web pages.

If you just compare strings, you will be fine. But be aware that string length, charAt() and similar functions may surprise you if you save the file with the wrong character encoding. Here is an example using Korean (the output may vary depending on the encoding of the file):

<script type="text/javascript">
var a = "가나다";
var b = "가나다";
var i; // declare the loop counter instead of creating an implicit global

/* prints differently depending on browser (some print "undefined" for a[i]) */
for (i = 0; i < a.length; i++) {
  document.write(a[i] + " ");
}
document.write("<br>");

/* prints each character of the string */
for (i = 0; i < a.length; i++) {
  document.write(a.charAt(i) + " ");
}
document.write("<br>");

/* prints the length (should be 3) */
document.writeln("length = " + a.length + "<br>");

/* prints true, because the strings are the same */
document.writeln("a=b?  " + (a == b) + "<br>");
</script>
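
The Korean syllables in this example each fit in a single UTF-16 code unit, so `length` really is 3. JavaScript strings are sequences of UTF-16 code units, though, so a character outside the Basic Multilingual Plane (some rare CJK ideographs, for instance) counts as two, and `charAt()` returns only half of it. A minimal sketch, assuming a modern engine with ES2015 features (`\u{...}` escapes, `Array.from`, `codePointAt`, `for...of`), which postdate this thread; the helper `codePointLength` and the sample character U+20B9F are illustrative choices, not from the original post:

```javascript
// JavaScript strings are sequences of UTF-16 code units.
// "length" and charAt() count code units, not characters.
const korean = "가나다";  // three BMP syllables, one code unit each
const rare = "\u{20B9F}"; // U+20B9F, a CJK ideograph outside the BMP

// Count code points instead of code units (the ES2015 string iterator
// walks code points, so Array.from splits on whole characters).
function codePointLength(str) {
  return Array.from(str).length;
}

console.log(korean.length);          // 3
console.log(rare.length);            // 2 (stored as a surrogate pair)
console.log(codePointLength(rare));  // 1

// charAt(0) returns only the high surrogate; codePointAt(0) returns the full code point.
console.log(rare.charAt(0) === "\uD842");      // true
console.log(rare.codePointAt(0).toString(16)); // 20b9f

// for...of also iterates by code point: this prints four lines, one character each.
for (const ch of rare + korean) {
  console.log(ch);
}
```

If you ever need "the number of characters" for a length limit, counting code points like this is safer than `length`, although it still counts a base letter plus combining marks as more than one.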

I'm sure you can find many resources about dealing with these issues if you search the web... Trust me, Korean sites use a lot of JavaScript, AJAX, etc., too!

I just tested your script above in IE, Firefox and Opera, and each browser seems to parse it differently: http://www.ifcode.com/test-korean.html

In IE, a few "undefined"s show up in the output of your document.write and writeln, but the length is correctly displayed as 3, all three characters are shown by the charAt() loop, and a=b is true.

In Firefox, everything shows up as it should, with no escape codes.

In Opera, the result is the same as in IE with the "undefined"s, and the Korean text shows up as blocks.

The page is uploaded as Unicode with no charset defined; I'm not sure whether that matters. Firefox seems to sort it all out correctly anyway, and the length of the string is correct in every browser, so it isn't parsing the escape codes in any of them, is it?

It seems that accessing a string like an array is unreliable: charAt() works fine in IE and FF, but array-style access only worked in FF.

The results do seem to be affected by the encoding of the file. I had actually only tried the script above in the "Try it" editor at w3schools, and didn't realize that page wasn't being sent as Unicode. If you paste the code into that editor you will see what I was talking about... maybe. I wonder if the Try it page was doing some conversion, and that's why the escape codes were being printed. I have edited the post above to reflect some of Blue's insight.

I have written a vocabulary quiz in PHP and JavaScript in the past, and I found that file encoding was critical: the file encoding should match whatever character encoding you specify, either through HTTP headers or HTML meta tags. Unicode solves MANY problems.
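
The effect of a mismatched file encoding can be reproduced without a browser. The sketch below assumes an engine that provides `TextEncoder`/`TextDecoder` (modern browsers and Node.js). It encodes the Korean sample as UTF-8 bytes, which is what a UTF-8 file on disk contains, then decodes those bytes two ways: correctly as UTF-8, and incorrectly one byte per character, the way a strict ISO-8859-1 decoder would. The helper `decodeAsLatin1` is hypothetical and written by hand for illustration (browsers actually map the ISO-8859-1 label to windows-1252, which differs only for bytes 0x80 to 0x9F):

```javascript
const original = "가나다";

// What a UTF-8 encoded file actually stores: 3 bytes per Hangul syllable here.
const bytes = new TextEncoder().encode(original);

// Correct case: the declared charset matches the file's real encoding.
const correct = new TextDecoder("utf-8").decode(bytes);

// Mismatched case: every byte becomes one character (byte value = code point),
// as in strict ISO-8859-1 decoding.
function decodeAsLatin1(byteArray) {
  let out = "";
  for (const b of byteArray) {
    out += String.fromCharCode(b);
  }
  return out;
}
const garbled = decodeAsLatin1(bytes);

console.log(bytes.length);         // 9
console.log(correct.length);       // 3
console.log(correct === original); // true
console.log(garbled.length);       // 9
console.log(garbled === original); // false
```

This is why `length` and `charAt()` "change" with the file encoding: JavaScript counts whatever characters the browser produced when it decoded the page, and with a mismatched charset those are the wrong characters.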

Hi, thanks for replying. I was wondering what happens if I am not processing text character by character, for example:

1. I just use JavaScript to get the value the user typed into a text field and display it interactively on the web page through innerHTML, or submit the form.
2. I dynamically add another select box containing Korean characters (or any other multibyte characters) to the web page when the user clicks the Add button.
3. I display all the records on the web page after retrieving XML data from an AJAX call, where the data can be in any encoding and language.

Will there be any problems, or will the text/strings be corrupted when they pass through JavaScript? I have tested the examples above with some multilingual characters and they work fine; however, the characters I used in my test are only a very small part of the multilingual characters that could appear on the web page. Anyway, most of my JavaScript handling of text/strings, like the three examples above, should involve only assignment (=), not charAt() or other character-based functions, so would that be a problem?

Thanks.
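
For cases like the three above, JavaScript only copies string values around, so the characters themselves are not altered; the risk sits at the boundaries where text is turned into bytes (saving the file, submitting a form, sending an AJAX request). A minimal sketch, with the browser DOM left out so it runs anywhere; the field value is hypothetical. It shows that assignment, concatenation and a JSON round trip preserve the text, and that `encodeURIComponent`, which you would use when building an AJAX query string by hand, always percent-encodes as UTF-8:

```javascript
// Hypothetical value typed into a text field (Korean, Japanese and Russian mixed).
const fieldValue = "가나다 日本語 Привет";

// Assignment and concatenation copy the string without altering any character.
const copy = fieldValue;
const label = "[" + copy + "]";

// A JSON round trip, as in a typical AJAX payload, preserves every character too.
const restored = JSON.parse(JSON.stringify({ text: copy })).text;

// encodeURIComponent() percent-encodes using UTF-8, regardless of the page's charset.
const encoded = encodeURIComponent("가");
const decoded = decodeURIComponent(encoded);

// A lone surrogate (half of a pair, e.g. left by a careless substring) cannot be encoded.
let loneSurrogateRejected = false;
try {
  encodeURIComponent("\uD800");
} catch (e) {
  loneSurrogateRejected = e instanceof URIError;
}

console.log(copy === fieldValue);               // true
console.log(label.slice(1, -1) === fieldValue); // true
console.log(restored === fieldValue);           // true
console.log(encoded);                           // %EA%B0%80
console.log(decoded === "가");                  // true
console.log(loneSurrogateRejected);             // true
```

In the browser, the same string can be handed to `element.textContent` or `new Option(text, value)` when adding select box entries; both take the JavaScript string as-is. A normal (non-AJAX) form submission, by contrast, encodes the fields using the page's character encoding by default, which is another reason to keep the file encoding and the declared charset in agreement.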

I don't think there should be any problem. Just use Unicode as your encoding if you plan to use many languages on a page.

As for JavaScript corrupting the strings: in the example posted above, Blue helped me realize that about the only problem comes when the file is not encoded properly.
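
One place where "assignment only" code can still trip over multilingual text is data validation with regular expressions, which the original question mentions. In JavaScript, `\w` matches only ASCII letters, digits and underscore, so a pattern written with English in mind rejects Korean input. Comparing strings can also fail when the same Korean word arrives precomposed (NFC) in one place and as separate jamo (NFD) in another. A sketch assuming an ES2018+ engine (for `\p{...}` with the `u` flag) and ES2015 `normalize()`; the function `isValidName` and its 1 to 20 character rule are hypothetical:

```javascript
// ASCII-only pattern: \w is equivalent to [A-Za-z0-9_].
const asciiName = /^\w+$/;

// Unicode-aware pattern: letters or numbers in any script, plus underscore
// (ES2018+, requires the "u" flag).
const unicodeName = /^[\p{L}\p{N}_]+$/u;

// Hypothetical rule: 1 to 20 letters/digits, counted as code points.
function isValidName(input) {
  const name = input.normalize("NFC"); // compose Hangul jamo into syllables first
  const count = Array.from(name).length;
  return unicodeName.test(name) && count >= 1 && count <= 20;
}

console.log(asciiName.test("hello_123")); // true
console.log(asciiName.test("가나다"));    // false
console.log(isValidName("가나다"));       // true

// The same word, precomposed vs. decomposed into jamo.
const composed = "가나다";
const decomposed = composed.normalize("NFD");
console.log(decomposed.length);                        // 6
console.log(composed === decomposed);                  // false
console.log(composed === decomposed.normalize("NFC")); // true
```

Normalizing both sides (NFC is the usual choice for web text) before comparing or validating keeps plain `==` comparisons reliable.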
