ANSI question


sepoto

Does PHP5 support ANSI character encoding for the fgets() function? I cannot find any information on this topic. Also, what if my text file were UTF-8? What if it were UTF-16? I am trying to track down some problems I am having with strings in PHP5. Thanks a lot, this is a very helpful forum.

Eric


ANSI is an encoding, not a charset. UTF-8 is both an encoding and a charset.

"Encoding" is how numbers are written to and read from a file. ANSI specifies that each byte is the number of one character, which allows 256 different numbers (0 to 255). The UTF-8 encoding is a little more complex: certain values in one byte indicate that the following bytes are also part of the same number. UTF-8 uses one to four bytes per character and can represent every Unicode code point up to U+10FFFF (three bytes are enough for numbers up to 65535).

"Charset" is a number-to-character mapping. While the ANSI encoding may say "200", it's up to the charset to decide whether 200 is the character È (which is what ISO 8859-1 would say), И (which is what Windows-1251 would say) or something else.

Now, why was this long intro needed? PHP doesn't support any encodings or charsets directly. It writes whatever bytes you give it. For text typed literally in your script, those bytes depend on the encoding of your PHP file: if the file is saved as ANSI, PHP will write ANSI bytes to files, and if the PHP file is saved as UTF-8, UTF-8 bytes will be written.

As for reading, AFAIK, PHP5 only reads bytes from the file, ANSI style. However, since all ANSI charsets and UTF-8 share the first 128 characters (bytes 0 to 127), that doesn't stop PHP's functions from working where they deal with newlines and the like. That includes fgets(), of course. UTF-16 doesn't share that property (every character takes at least two bytes), so byte-oriented functions like fgets() won't handle it correctly unless you convert it first.

Also, if you're outputting the data you read, that data is output "as is"; it's up to the browser to read the bytes and display the characters. Therefore, data from a UTF-8 file is displayed properly only if the whole output is encoded as UTF-8 and declared as such.
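To make the "PHP only sees bytes" point concrete, here is a minimal sketch. The sample strings and the temporary file are made up for illustration; only core functions (strlen(), bin2hex(), tmpfile(), fwrite(), rewind(), fgets()) are used, so no conversion extension is assumed.

```php
<?php
// PHP strings are byte sequences: strlen() counts bytes, not characters.
$ansi = "\xC8";      // one byte, value 200: È in ISO 8859-1, И in Windows-1251
$utf8 = "\xC3\x88";  // the same È encoded as UTF-8 takes two bytes

echo strlen($ansi), "\n"; // 1
echo strlen($utf8), "\n"; // 2

// fgets() splits on the newline byte 0x0A, which is identical in ANSI
// charsets and UTF-8, so UTF-8 lines come back intact, byte for byte.
$tmp = tmpfile();                              // throwaway file for the demo
fwrite($tmp, "caf\xC3\xA9\nna\xC3\xAFve\n");   // "café" and "naïve" as UTF-8
rewind($tmp);
$first = fgets($tmp);                          // first line, newline included
fclose($tmp);

echo bin2hex($first), "\n"; // 636166c3a90a
```

The hex dump shows that fgets() returned the two UTF-8 bytes of "é" untouched, followed by the newline. Whether they display as "é" depends entirely on the charset the reader (for example the browser) is told to use.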


Thank you, that was very educational. I had been having a lot of problems with strpos(), but I finally have it dialed in. I am using Notepad to create an ANSI .txt file that I read into PHP using the standard I/O functions. It wasn't working at first, so I started a new .txt file, retyped everything from scratch, and now it works just fine. For the life of me I still don't know what happened, but now that I understand a bit about how PHP deals with reading files, I will be better prepared in the future.

Thanks!
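The cause of the strpos() trouble was never identified in this thread. One way to investigate a problem like it is to dump the raw bytes of a line. The sketch below assumes a hypothetical line saved as UTF-8 with a byte order mark (BOM) and Windows line endings. That is only a guess at the kind of invisible bytes that can shift strpos() results, not a diagnosis of the original file.

```php
<?php
// Hypothetical line: UTF-8 BOM (EF BB BF) + text + CRLF, as fgets() would return it.
$line = "\xEF\xBB\xBFname=Eric\r\n";

echo bin2hex($line), "\n";          // efbbbf6e616d653d457269630d0a
$rawPos = strpos($line, 'name');
var_dump($rawPos);                  // int(3): the three BOM bytes come first

// Strip the trailing line-ending bytes and a leading BOM before searching.
$clean = rtrim($line, "\r\n");
if (strncmp($clean, "\xEF\xBB\xBF", 3) === 0) {
    $clean = substr($clean, 3);
}

$cleanPos = strpos($clean, 'name');
var_dump($cleanPos);                // int(0)
var_dump($cleanPos !== false);      // bool(true)
var_dump($clean === 'name=Eric');   // bool(true)
```

Note that a match at position 0 is still a match: compare the result of strpos() with `!== false` rather than treating it as a boolean.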


Archived

This topic is now archived and is closed to further replies.
