Jump to content

Sql Server 2000 Utf


Recommended Posts

I've got a test table set up with a text column and an ntext column. This is the text I'm testing with:Überblick schließen über Befähiger Möglichkeits欢迎来到六西格玛基础课程。我是萨In the table, through Query Analyzer, I'm seeing this text get stored:Überblick schließen über Befähiger Möglichkeits欢迎来到六西格玛基础课程。我是萨 (the Chinese text in the DB is actually stored in ampersand-hex codes, not regular characters; the German is regular characters).I've got a loop that builds a Javascript array full of objects to use for JSON (the actual application sends the data back through XHR), and prints a table, like so:

while (!dbcon.eof){  o = new Object();  o.c1 = dbcon.fields.item("comment").value;  o.c3 = dbcon.fields.item("comment3").value;  o.c1utf = UTF8.encode(new String(dbcon.fields.item("comment").value));  o.c3utf = UTF8.encode(new String(dbcon.fields.item("comment3").value));  c.push(o);  Response.Write("<tr>");  Response.Write("<td>" + dbcon.fields.item("comment").value + "</td>");  Response.Write("<td>" + UTF8.encode(new String(dbcon.fields.item("comment").value)) + "</td>");  Response.Write("<td>" + dbcon.fields.item("comment3").value + "</td>");  Response.Write("<td>" + UTF8.encode(new String(dbcon.fields.item("comment3").value)) + "</td>");  Response.Write("</tr>");  dbcon.MoveNext();}

..which produces this table:

<tr><td>Überblick schließen über Befähiger Möglichkeits</td><td>Ã?berblick schlieÃ?en über Befähiger Möglichkeits</td><td>Überblick schließen über Befähiger Möglichkeits</td><td>Ã?berblick schlieÃ?en über Befähiger Möglichkeits</td></tr><tr><td>欢迎来到六西格玛基础课程。我是萨</td><td>欢迎来到六西格玛基础课程。我是萨</td><td>欢迎来到六西格玛基础课程。我是萨</td><td>欢迎来到六西格玛基础课程。我是萨</td></tr>

So, the Chinese text looks fine, it prints the same characters I saved, and the German looks fine without the UTF8 encoding. With the UTF8 it seems to munge the German characters, for both field types.Lastly, I print the JSON version of the array I built:

[  {	"c1":"Überblick schließen über Befähiger Möglichkeits",	"c3":"Überblick schließen über Befähiger Möglichkeits",	"c1utf":"Ã\u009cberblick schlieÃ\u009fen über Befähiger Möglichkeits",	"c3utf":"Ã\u009cberblick schlieÃ\u009fen über Befähiger Möglichkeits"  },  {	"c1":"欢迎来到六西格玛基础课程。我是萨\r\n",	"c3":"欢迎来到六西格玛基础课程。我是萨\r\n",	"c1utf":"欢迎来到六西格玛基础课程。我是萨\r\n",	"c3utf":"欢迎来到六西格玛基础课程。我是萨\r\n"  }]

I've got 2 versions of the JSON stringify, one that UTF8-encodes the string values first, and one that doesn't. If it doesn't encode the values, I get an example JSON object like this:{"id":9047,"proj":393,"name":"Steve","uid":"steve","app_ver":138,"loc":"","page":"","post_date":1240332651434,"val_date":0,"val_by":"","comment":"Überblick \r\nschließen \r\nüber\r\nBefähiger\r\nMöglichkeits","resolution":"","resolved":0,"fix_ver":""}So it looks like it sends the data through normally, but when it displays on the page it writes it like this:�berblickschlie�en�berBef�higerM�glichkeitsIf I use the version that does encode, the JSON comes through like this:{"id":9047,"proj":393,"name":"Steve","uid":"steve","app_ver":138,"loc":"","page":"","post_date":1240332651434,"val_date":0,"val_by":"","comment":"Ã\u009cberblick \r\nschlieÃ\u009fen \r\nüber\r\nBefähiger\r\nMöglichkeits","resolution":"","resolved":0,"fix_ver":""}And gets displayed like this:�œberblickschlie�ŸenüberBefähigerMöglichkeitsI believe that JSON data is supposed to be UTF-8 encoded, which explains why the last 3 lines show up correctly, but I'm at a loss why the other characters aren't coming through. For the Chinese text, the JSON string and browser text are both made up entirely of question marks.My experience with international text has been thankfully limited, but this issue has been going on for a while and I can't see what the problem is.

Edited by justsomeguy
Link to comment
Share on other sites

So it sounds like the data is being stored correctly in the database but it is failing when it gets displayed in the browser. Is that right?If so, have you tried setting the Content-Type for the page? On our website, I just submitted data ("Überblick schließen über Befähiger Möglichkeits") with a form which got saved in the database in an nvarchar column, and when I load that content back up on the next page, it displays correctly. This is the HTML that is included on that page:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

We don't do any character encoding of the data other than HTML Encoding or URL Encoding, depending on where we're using it. And, in this particular, case, we just display the content on the page as it comes from the database.Here's a snippet of what it looks like in C# to fetch the data into an object:

string header = datareader["ColumnName"] as string;

And this is a snippet of how that string would be written into the HTML:

htmlBuilder.Append("<div>" + header + "</div>");

I haven't yet had a reason to pass this type of data in a JSON object, however.

Edited by jesh
Link to comment
Share on other sites

It's been a while since I initially set this thing up so it's taking me a while to break away from what I'm working on and go through this again. Some of what I write below might be too much info, I'm just trying to get my thoughts in writing.This is a project management site, this stuff probably applies to all pages but I'm just testing with the comments page. So it looks like it uses a common header for the pages that includes this content type:<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">I'm thinking that might need to be UTF-8. Is the charset just "UTF-8" (apparently it is, I see your tag above)?It looks like I'm also using the IE7 Javascript library for this, but I'm not testing with IE so that probably isn't affecting anything. So then for the comments page it prints a table like so:

			  <table class="commentTable" cellpadding="0" cellspacing="0">				<thead>				  <tr>					<td> </td>					<td><a href="java script:void(0);" onclick="set_sort('name');">Name</a></td>					<td><a href="java script:void(0);" onclick="set_sort('app_ver');">Application</a></td>					<td><a href="java script:void(0);" onclick="set_sort('loc');">Location</a></td>					<td><a href="java script:void(0);" onclick="set_sort('page');">Page</a></td>					<td>Change Request</td>					<td><a href="java script:void(0);" onclick="set_sort('post_date');">Post Date</a></td>					<td><a href="java script:void(0);" onclick="set_sort('val_date');">Validated</a></td>					<td>Resolution</td>					<td><a href="java script:void(0);" onclick="set_sort('fix_ver');">Fix Version</a></td>				  </tr>				</thead>				<tbody id="comment_rows">				  <tr>					<td colspan="10" style="text-align: center;">Loading comments...</td>				  </tr>				</tbody>			  </table>			  <iframe id="file_download" width="0" height="0" scrolling="no" frameborder="0"></iframe>			  <script type="text/javascript">get_comments();</script>

The get_comments function sends out a request, and it looks like it's setting content-type there as well:

	query = "page_mode=get_comments";	query += "&pid=" + pid;	xmlHttp.onreadystatechange = do_get_comments;	xmlHttp.open("post", req_script, true);	xmlHttp.setRequestHeader("Content-Type", "application/x-www-form-urlencoded; charset=UTF-8");	xmlHttp.send(query);

I wonder if using UTF-8 there and displaying it on the page with the other charset is going to cause a problem.The AJAX callback evals the JSON object and then calls the other functions to sort and show:

	  var retval = xmlHttp.responseText.split(sep);	  // check response	  if (retval[0] != "0")	  {		comments = eval(retval[1]);		sort_comments();		show_comments();	  }

Sorting is probably irrelevant, the show function does some filter checks and then starts displaying the row. Here's the td for the actual comment text:

	c = ce('td');	e.appendChild(c);	c.innerHTML = comments[i].comment.split("\n").join("<br>");

ce = document.createElement. So it looks like it's just setting the text using innerHTML.So the AJAX request that goes out includes a content-type header, which I'm not sure even matters (that only applies to the request I assume, not the response), but the ASP page that processes that request does not send any headers at all other than the default and some cache-busting stuff. This is the code that runs to get the comments:

  case 'get_comments':	pid = get_form_var("pid");	pid = parseInt(pid, 10);	if (isNaN(pid))	{	  echo("0");	  Response.End();	}	var json_obj = new Array;	dbcon.open("SELECT * FROM proj_comments WHERE proj=" + pid + " ORDER BY post_date");	while (!dbcon.eof)	{	  cur_obj = new Object;	  cur_obj.id = dbcon.fields.item("id").value;	  cur_obj.proj = dbcon.fields.item("proj").value;	  cur_obj.name = dbcon.fields.item("name").value;	  cur_obj.uid = dbcon.fields.item("uid").value;	  cur_obj.app_ver = dbcon.fields.item("app_ver").value;	  cur_obj.loc = dbcon.fields.item("loc").value;	  cur_obj.page = dbcon.fields.item("page").value;	  cur_obj.post_date = dbcon.fields.item("post_date").value;	  cur_obj.val_date = dbcon.fields.item("val_date").value;	  cur_obj.val_by = dbcon.fields.item("val_by").value;	  cur_obj.comment = dbcon.fields.item("comment").value;	  cur_obj.resolution = dbcon.fields.item("resolution").value;	  cur_obj.resolved = dbcon.fields.item("resolved").value;	  cur_obj.fix_ver = dbcon.fields.item("fix_ver").value;	  json_obj.push(cur_obj);	  dbcon.movenext();	}	dbcon.close();	echo("1" + sep);	echo(JSON.stringify(json_obj));  break;

The JSON.stringify is what encodes as UTF8 for all string values. echo is just an alias for Response.Write.So I guess I need to try 2 things. One is to change the content type on the page in general to UTF8. I'll try that first and hopefully it won't break everything else. The second thing is to have the AJAX response send back a content-type header also. If that doesn't work, I'll force the rest of the world to speak English only. Honeywell doesn't need their stuff translated to Mandarin Chinese anyway, right? I'll have to give China a call and have a little heart-to-heart. And really, who speaks German?

Edited by justsomeguy
Link to comment
Share on other sites

I'm not sure if this has anything to do with SQL anymore, but whatever.From everything I can tell, it looks like all charsets are now set to utf-8, but I'm seeing exactly the same thing. There's one thing that's really bugging me though. According to Firebug, the text is coming through like this:Ã\u009cberblick \r\nschlieÃ\u009fen \r\nüber\r\nBefähiger\r\nMöglichkeitsIt's showing up on the page like this:�œberblickschlie�ŸenüberBefähigerMöglichkeitsWhy is it showing the last 3 correct but not the first 2? That doesn't make any sense to me, it would seem like it should be either all or none. Either it's UTF, or it's not. It looks like it is, so I'm confused.

Link to comment
Share on other sites

I realized that there was a place on our site that uses AJAX to fetch the data back from the server where the response is a JSON string. When I look in Firebug at the response that comes back, the section of the data that is returned looks like this:

\u003eÜberblick schließen über Befähiger Möglichkeits\u003c

Where \u003e equates to ">" and \u003c equates to "<".Also, the Response Header has the following:

Content-Type: application/json; charset=utf-8

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now

  • Create New...