I have a form that I use to send me data that includes the ½ value. Instead of %00BD, I am getting %C2%BD. Can anyone tell me how to correct this, or if it's even correctable?

Wayne

 


I think it looks correct. The form is passing the character as two UTF-8 bytes. In URL percent-encoding, each % is followed by exactly two hexadecimal digits, which represent a single byte. If you have a multi-byte character, each byte is encoded on its own. The server should be able to decode it without a problem as long as it treats strings as UTF-8.
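To see the byte-by-byte encoding in action, here is a minimal sketch using Python's standard `urllib.parse` module. Python is only an illustration here; the thread does not say what language the form or server uses.

```python
from urllib.parse import quote, unquote

half = "\u00bd"  # the ½ character, code point U+00BD

# quote() encodes the string as UTF-8 by default, then percent-encodes each byte.
encoded = quote(half)
print(encoded)                       # %C2%BD
print(half.encode("utf-8"))          # b'\xc2\xbd'

# Decoding with the same charset restores the original character.
print(unquote(encoded) == half)      # True

# For comparison: a single-byte charset such as Latin-1 yields one byte, %BD.
print(quote(half, encoding="latin-1"))  # %BD
```

Either way, each `%` is followed by exactly two hex digits, so a form like `%00BD` would not be read as a single character.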

2 hours ago, Ingolme said:

I think it looks correct. The form is passing the character as two UTF-8 bytes. In URL percent-encoding, each % is followed by exactly two hexadecimal digits, which represent a single byte. If you have a multi-byte character, each byte is encoded on its own. The server should be able to decode it without a problem as long as it treats strings as UTF-8.

OK. I can accept that answer. However, I am having the data sent to me via email, and when I look at it in my text editor the %C2 character is different from any character I am using. Why does the %C2 portion of the field get encoded to %C2 and not %00?


In UTF-8 encoding, any Unicode character with a code point higher than 127 is split into multiple bytes. The first byte starts with as many 1s as there are bytes in the sequence, followed by a 0. Every following (continuation) byte starts with 10. UTF-8 encoded bytes have structures like the following:

  • 0XXXXXXX
  • 110XXXXX 10XXXXXX
  • 1110XXXX 10XXXXXX 10XXXXXX
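The patterns above can be checked mechanically. The following is a rough sketch in Python; `utf8_sequence_length` is a hypothetical helper name, and the four-byte case (lead byte 11110XXX) is included for completeness even though it is not listed above.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return how many bytes a UTF-8 sequence has, judging by its first byte."""
    if lead_byte >> 7 == 0b0:        # 0XXXXXXX: plain ASCII, one byte
        return 1
    if lead_byte >> 5 == 0b110:      # 110XXXXX: start of a two-byte sequence
        return 2
    if lead_byte >> 4 == 0b1110:     # 1110XXXX: start of a three-byte sequence
        return 3
    if lead_byte >> 3 == 0b11110:    # 11110XXX: start of a four-byte sequence
        return 4
    # 10XXXXXX is a continuation byte and cannot start a sequence.
    raise ValueError(f"0x{lead_byte:02X} is not a valid UTF-8 lead byte")


print(utf8_sequence_length(0xC2))  # 2
```

Passing `0xBD` on its own raises `ValueError`, because in UTF-8 that byte pattern is only valid as a continuation byte.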

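Your character has to be split into two bytes because its Unicode value (U+00BD, decimal 189) is higher than 127. Its binary representation is: 10111101. A two-byte sequence has room for 11 bits, so the value is padded to 00010111101 and split into the following two UTF-8 bytes:

11000010 10111101

The first byte carries the prefix 110 plus the top five bits (00010), and the second carries the prefix 10 plus the low six bits (111101). The hexadecimal representation of these bytes is C2 BD.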
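The same bit manipulation can be sketched in a few lines of Python. `encode_two_byte` is a hypothetical helper written only for this illustration; in practice `str.encode("utf-8")` does the job.

```python
def encode_two_byte(code_point: int) -> bytes:
    """Manually encode a code point in the range U+0080..U+07FF as UTF-8."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("code point does not fit the two-byte form")
    lead = 0b11000000 | (code_point >> 6)                # 110 + top five bits
    continuation = 0b10000000 | (code_point & 0b111111)  # 10 + low six bits
    return bytes([lead, continuation])


result = encode_two_byte(0xBD)
print(result.hex())                          # c2bd
print(result == "\u00bd".encode("utf-8"))    # True
```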

 

If you're actually seeing the C2 byte as a separate character (typically displayed as Â in front of the ½), it means that the software reading the bytes is not aware that the text is UTF-8 encoded. The email has to contain a header indicating that it is using UTF-8 as the character encoding.
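Here is a small sketch of both halves of that point, again in Python and only as an illustration, since the thread does not say how the email is generated. It shows the stray character appearing when the bytes are read as Latin-1, and one way to build a message that declares its charset using the standard `email` package. The body text `quantity=½` is made up for the example.

```python
from email.message import EmailMessage

raw = "\u00bd".encode("utf-8")   # b'\xc2\xbd'

# Read as UTF-8, the two bytes form one character.
print(raw.decode("utf-8"))       # ½

# Read as Latin-1, each byte becomes its own character.
print(raw.decode("latin-1"))     # Â½

# Declaring the charset lets the mail client decode the body correctly.
msg = EmailMessage()
msg.set_content("quantity=\u00bd", charset="utf-8")
print(msg.get_content_charset())  # utf-8
```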
