justsomeguy Posted April 21, 2009 Share Posted April 21, 2009 I'm trying to store German and Chinese text and I'm running into problems getting everything out of the database and onto a web page. Does anyone use a SQL Server 2000 database to store UTF-8 data? Link to comment Share on other sites More sharing options...
jesh Posted April 21, 2009 Share Posted April 21, 2009 Are you storing it with varchar/text or are you using nvarchar/ntext? Link to comment Share on other sites More sharing options...
justsomeguy Posted April 21, 2009 Author Share Posted April 21, 2009 I've got a test table set up with a text column and an ntext column. This is the text I'm testing with:Überblick schließen über Befähiger Möglichkeits欢迎来到六西格玛基础课程。我是萨In the table, through Query Analyzer, I'm seeing this text get stored:Überblick schließen über Befähiger Möglichkeits欢迎来到六西格玛基础课程。我是萨 (the Chinese text in the DB is actually stored in ampersand-hex codes, not regular characters; the German is regular characters).I've got a loop that builds a Javascript array full of objects to use for JSON (the actual application sends the data back through XHR), and prints a table, like so: while (!dbcon.eof){ o = new Object(); o.c1 = dbcon.fields.item("comment").value; o.c3 = dbcon.fields.item("comment3").value; o.c1utf = UTF8.encode(new String(dbcon.fields.item("comment").value)); o.c3utf = UTF8.encode(new String(dbcon.fields.item("comment3").value)); c.push(o); Response.Write("<tr>"); Response.Write("<td>" + dbcon.fields.item("comment").value + "</td>"); Response.Write("<td>" + UTF8.encode(new String(dbcon.fields.item("comment").value)) + "</td>"); Response.Write("<td>" + dbcon.fields.item("comment3").value + "</td>"); Response.Write("<td>" + UTF8.encode(new String(dbcon.fields.item("comment3").value)) + "</td>"); Response.Write("</tr>"); dbcon.MoveNext();} ..which produces this table: <tr><td>Überblick schließen über Befähiger Möglichkeits</td><td>Ã?berblick schlieÃ?en über Befähiger Möglichkeits</td><td>Überblick schließen über Befähiger Möglichkeits</td><td>Ã?berblick schlieÃ?en über Befähiger Möglichkeits</td></tr><tr><td>欢迎来到六西格玛基础课程。我是萨</td><td>欢迎来到六西格玛基础课程。我是萨</td><td>欢迎来到六西格玛基础课程。我是萨</td><td>欢迎来到六西格玛基础课程。我是萨</td></tr> So, the Chinese text looks fine, it prints the same characters I saved, and the German looks fine without the UTF8 encoding. With the UTF8 it seems to munge the German characters, for both field types.Lastly, I print the JSON version of the array I built: [ { "c1":"Überblick schließen über Befähiger Möglichkeits", "c3":"Überblick schließen über Befähiger Möglichkeits", "c1utf":"Ã\u009cberblick schlieÃ\u009fen über Befähiger Möglichkeits", "c3utf":"Ã\u009cberblick schlieÃ\u009fen über Befähiger Möglichkeits" }, { "c1":"欢迎来到六西格玛基础课程。我是萨\r\n", "c3":"欢迎来到六西格玛基础课程。我是萨\r\n", "c1utf":"欢迎来到六西格玛基础课程。我是萨\r\n", "c3utf":"欢迎来到六西格玛基础课程。我是萨\r\n" }] I've got 2 versions of the JSON stringify, one that UTF8-encodes the string values first, and one that doesn't. If it doesn't encode the values, I get an example JSON object like this:{"id":9047,"proj":393,"name":"Steve","uid":"steve","app_ver":138,"loc":"","page":"","post_date":1240332651434,"val_date":0,"val_by":"","comment":"Überblick \r\nschließen \r\nüber\r\nBefähiger\r\nMöglichkeits","resolution":"","resolved":0,"fix_ver":""}So it looks like it sends the data through normally, but when it displays on the page it writes it like this:�berblickschlie�en�berBef�higerM�glichkeitsIf I use the version that does encode, the JSON comes through like this:{"id":9047,"proj":393,"name":"Steve","uid":"steve","app_ver":138,"loc":"","page":"","post_date":1240332651434,"val_date":0,"val_by":"","comment":"Ã\u009cberblick \r\nschlieÃ\u009fen \r\nüber\r\nBefähiger\r\nMöglichkeits","resolution":"","resolved":0,"fix_ver":""}And gets displayed like this:�œberblickschlie�ŸenüberBefähigerMöglichkeitsI believe that JSON data is supposed to be UTF-8 encoded, which explains why the last 3 lines show up correctly, but I'm at a loss why the other characters aren't coming through. For the Chinese text, the JSON string and browser text are both made up entirely of question marks.My experience with international text has been thankfully limited, but this issue has been going on for a while and I can't see what the problem is. Link to comment Share on other sites More sharing options...
jesh Posted April 21, 2009 Share Posted April 21, 2009 So it sounds like the data is being stored correctly in the database but it is failing when it gets displayed in the browser. Is that right?If so, have you tried setting the Content-Type for the page? On our website, I just submitted data ("Überblick schließen über Befähiger Möglichkeits") with a form which got saved in the database in an nvarchar column, and when I load that content back up on the next page, it displays correctly. This is the HTML that is included on that page: <meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> We don't do any character encoding of the data other than HTML Encoding or URL Encoding, depending on where we're using it. And, in this particular, case, we just display the content on the page as it comes from the database.Here's a snippet of what it looks like in C# to fetch the data into an object: string header = datareader["ColumnName"] as string; And this is a snippet of how that string would be written into the HTML: htmlBuilder.Append("<div>" + header + "</div>"); I haven't yet had a reason to pass this type of data in a JSON object, however. Link to comment Share on other sites More sharing options...
justsomeguy Posted April 22, 2009 Author Share Posted April 22, 2009 It's been a while since I initially set this thing up so it's taking me a while to break away from what I'm working on and go through this again. Some of what I write below might be too much info, I'm just trying to get my thoughts in writing.This is a project management site, this stuff probably applies to all pages but I'm just testing with the comments page. So it looks like it uses a common header for the pages that includes this content type:<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">I'm thinking that might need to be UTF-8. Is the charset just "UTF-8" (apparently it is, I see your tag above)?It looks like I'm also using the IE7 Javascript library for this, but I'm not testing with IE so that probably isn't affecting anything. So then for the comments page it prints a table like so: <table class="commentTable" cellpadding="0" cellspacing="0"> <thead> <tr> <td> </td> <td><a href="java script:void(0);" onclick="set_sort('name');">Name</a></td> <td><a href="java script:void(0);" onclick="set_sort('app_ver');">Application</a></td> <td><a href="java script:void(0);" onclick="set_sort('loc');">Location</a></td> <td><a href="java script:void(0);" onclick="set_sort('page');">Page</a></td> <td>Change Request</td> <td><a href="java script:void(0);" onclick="set_sort('post_date');">Post Date</a></td> <td><a href="java script:void(0);" onclick="set_sort('val_date');">Validated</a></td> <td>Resolution</td> <td><a href="java script:void(0);" onclick="set_sort('fix_ver');">Fix Version</a></td> </tr> </thead> <tbody id="comment_rows"> <tr> <td colspan="10" style="text-align: center;">Loading comments...</td> </tr> </tbody> </table> <iframe id="file_download" width="0" height="0" scrolling="no" frameborder="0"></iframe> <script type="text/javascript">get_comments();</script> The get_comments function sends out a request, and it looks like it's setting content-type there as well: query = "page_mode=get_comments"; query += "&pid=" + pid; xmlHttp.onreadystatechange = do_get_comments; xmlHttp.open("post", req_script, true); xmlHttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"); xmlHttp.send(query); I wonder if using UTF-8 there and displaying it on the page with the other charset is going to cause a problem.The AJAX callback evals the JSON object and then calls the other functions to sort and show: var retval = xmlHttp.responseText.split(sep); // check response if (retval[0] != "0") { comments = eval(retval[1]); sort_comments(); show_comments(); } Sorting is probably irrelevant, the show function does some filter checks and then starts displaying the row. Here's the td for the actual comment text: c = ce('td'); e.appendChild(c); c.innerHTML = comments[i].comment.split("\n").join("<br>"); ce = document.createElement. So it looks like it's just setting the text using innerHTML.So the AJAX request that goes out includes a content-type header, which I'm not sure even matters (that only applies to the request I assume, not the response), but the ASP page that processes that request does not send any headers at all other than the default and some cache-busting stuff. This is the code that runs to get the comments: case 'get_comments': pid = get_form_var("pid"); pid = parseInt(pid, 10); if (isNaN(pid)) { echo("0"); Response.End(); } var json_obj = new Array; dbcon.open("SELECT * FROM proj_comments WHERE proj=" + pid + " ORDER BY post_date"); while (!dbcon.eof) { cur_obj = new Object; cur_obj.id = dbcon.fields.item("id").value; cur_obj.proj = dbcon.fields.item("proj").value; cur_obj.name = dbcon.fields.item("name").value; cur_obj.uid = dbcon.fields.item("uid").value; cur_obj.app_ver = dbcon.fields.item("app_ver").value; cur_obj.loc = dbcon.fields.item("loc").value; cur_obj.page = dbcon.fields.item("page").value; cur_obj.post_date = dbcon.fields.item("post_date").value; cur_obj.val_date = dbcon.fields.item("val_date").value; cur_obj.val_by = dbcon.fields.item("val_by").value; cur_obj.comment = dbcon.fields.item("comment").value; cur_obj.resolution = dbcon.fields.item("resolution").value; cur_obj.resolved = dbcon.fields.item("resolved").value; cur_obj.fix_ver = dbcon.fields.item("fix_ver").value; json_obj.push(cur_obj); dbcon.movenext(); } dbcon.close(); echo("1" + sep); echo(JSON.stringify(json_obj)); break; The JSON.stringify is what encodes as UTF8 for all string values. echo is just an alias for Response.Write.So I guess I need to try 2 things. One is to change the content type on the page in general to UTF8. I'll try that first and hopefully it won't break everything else. The second thing is to have the AJAX response send back a content-type header also. If that doesn't work, I'll force the rest of the world to speak English only. Honeywell doesn't need their stuff translated to Mandarin Chinese anyway, right? I'll have to give China a call and have a little heart-to-heart. And really, who speaks German? Link to comment Share on other sites More sharing options...
justsomeguy Posted April 22, 2009 Author Share Posted April 22, 2009 I'm not sure if this has anything to do with SQL anymore, but whatever.From everything I can tell, it looks like all charsets are now set to utf-8, but I'm seeing exactly the same thing. There's one thing that's really bugging me though. According to Firebug, the text is coming through like this:Ã\u009cberblick \r\nschlieÃ\u009fen \r\nüber\r\nBefähiger\r\nMöglichkeitsIt's showing up on the page like this:�œberblickschlie�ŸenüberBefähigerMöglichkeitsWhy is it showing the last 3 correct but not the first 2? That doesn't make any sense to me, it would seem like it should be either all or none. Either it's UTF, or it's not. It looks like it is, so I'm confused. Link to comment Share on other sites More sharing options...
jesh Posted April 22, 2009 Share Posted April 22, 2009 I realized that there was a place on our site that uses AJAX to fetch the data back from the server where the response is a JSON string. When I look in Firebug at the response that comes back, the section of the data that is returned looks like this: \u003eÜberblick schließen über Befähiger Möglichkeits\u003c Where \u003e equates to ">" and \u003c equates to "<".Also, the Response Header has the following: Content-Type: application/json; charset=utf-8 Link to comment Share on other sites More sharing options...
Recommended Posts
Archived
This topic is now archived and is closed to further replies.