WayneCa Posted May 23, 2020
I have a form that I use to send me data that includes the ½ value. Instead of %00BD, I am getting %C2%BD. Can anyone tell me how to correct this, or if it's even correctable?
Wayne
Ingolme Posted May 23, 2020
I think it looks correct. It is passing a two-byte UTF-8 encoded character. The %## encoding in a URL can only have two hexadecimal digits following the %, so each escape represents exactly one byte. If you have a multi-byte character, each byte will be encoded on its own. The server should be able to decode it without a problem as long as it is treating strings as UTF-8.
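To illustrate the point about each byte being escaped on its own, here is a minimal Python sketch using the standard library's `urllib.parse`. It is not the code behind the form in question (that is unknown); the variable names are made up for the example. It shows ½ percent-encoded as UTF-8, the same character percent-encoded as Latin-1 for comparison, and the round trip back.

```python
from urllib.parse import quote, unquote

half = "\u00bd"  # the ½ character, code point U+00BD

# quote() encodes to UTF-8 by default, then escapes each byte as %XX.
utf8_form = quote(half)

# For comparison only: a single-byte legacy encoding yields one escape.
latin1_form = quote(half, encoding="latin-1")

print(utf8_form)    # %C2%BD
print(latin1_form)  # %BD

# unquote() assumes UTF-8 by default, so the two escapes become one character.
decoded = unquote(utf8_form)
print(decoded == half)  # True
```

There is no way to get a four-digit escape such as %00BD out of this scheme; a receiver that decodes as UTF-8 will read %C2%BD as a single ½.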
WayneCa (Author) Posted May 23, 2020
2 hours ago, Ingolme said: "I think it looks correct. It is passing a two-byte UTF-8 encoded character. The %## encoding in a URL can only have two characters following the %. If you have a multi-byte character, each byte will be encoded on its own. The server should be able to decode it without a problem as long as it is treating strings as UTF-8."
OK. I can accept that answer. However, I am having the data sent to me via email, and when I look at it in my text editor the %C2 character is different from any character I am using. Why does the %C2 portion of the field get encoded to %C2 and not %00?
Ingolme Posted May 23, 2020
In UTF-8 encoding, any Unicode character with a code point higher than 127 is split into multiple bytes. The first byte of the sequence starts with a run of 1s whose length is the total number of bytes used for that character, and every following byte starts with 10. UTF-8 encoded bytes have structures like the following:

0XXXXXXX
110XXXXX 10XXXXXX
1110XXXX 10XXXXXX 10XXXXXX

Your character has to be split into two bytes because its Unicode value (189, or BD in hexadecimal) is higher than 127. Its binary representation is 10111101. Padded with zeros to the 11 bits that a two-byte sequence carries (00010 111101), these bits fill the following two UTF-8 bytes: 11000010 10111101. The hexadecimal representation of those bytes is C2 BD.

If you're actually seeing the C2 byte as a separate character (in software that assumes Latin-1 or Windows-1252 it typically shows up as Â in front of the ½), it means that the software reading the bytes is not aware that the text is UTF-8 encoded. The email has to contain a header indicating that it is using UTF-8 as the character encoding, for example Content-Type: text/plain; charset=UTF-8.
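The bit-splitting described above can be sketched in a few lines of Python. The helper `utf8_two_byte` is a hypothetical name written for this example and handles only the two-byte case; in real code you would simply call `str.encode("utf-8")`. The last line shows what happens when the same two bytes are read as Latin-1, which is one likely explanation for a stray character appearing in a text editor.

```python
def utf8_two_byte(code_point: int) -> bytes:
    """Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("this sketch only handles two-byte code points")
    lead = 0b11000000 | (code_point >> 6)            # 110xxxxx: top 5 bits
    continuation = 0b10000000 | (code_point & 0x3F)  # 10xxxxxx: low 6 bits
    return bytes([lead, continuation])


encoded = utf8_two_byte(ord("\u00bd"))  # ½ is U+00BD
hex_form = " ".join(f"{b:02X}" for b in encoded)
print(hex_form)  # C2 BD

# The hand-rolled result matches Python's built-in encoder.
print(encoded == "\u00bd".encode("utf-8"))  # True

# Reading the same bytes with the wrong charset produces two characters.
misread = encoded.decode("latin-1")
print(misread)  # Â½
```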