
convert an ANSI text file to utf-8 format


Hi all,

I upload a text file to extract info to put into my database on GoDaddy, and when I run my PHP code on it, it tells me that it can't read the file because it is in ansi-xxxx format. In my PHP code I'm using $var = fgets() to read each line and then put the $vars into the correct table of the database.

So I clicked the button at the top of the code editor and converted the text file to UTF-8, but the conversion leaves the file with 2 odd characters at the beginning of the file and puts a blank line between each line. When I delete the 2 characters and the blank lines and run my code, everything works as it should and updates my tables.

My question is: can I do a conversion on the ANSI text file using PHP without any manual manipulation? I've read about utf8_encode(), but it says it encodes an ISO-8859-1 file, and mine is ANSI. Or is ISO-8859-1 an umbrella for a lot of different encodings? Can I convert the whole file at once, or do I set up a loop to read a line, convert it and write it to a new file? Am I interpreting this correctly?

I'd appreciate a code snippet so I can see how to set it up, or a reference to more reading so I can learn. Thank you very much!


the conversion leaves the file with 2 odd characters at the beginning

That's the UTF-8 BOM (byte order mark). You can use this to strip a BOM from the beginning of a string:

// Check for a variety of byte order marks at the beginning of the string and remove one if present.
function strip_bom($str)
{
    // The UTF-32 BOMs come before the UTF-16 ones because the UTF-32 (LE) BOM starts with the UTF-16 (LE) BOM.
    $boms = [
        pack('CCC', 0xef, 0xbb, 0xbf),        # UTF-8
        pack('CCCC', 0x0, 0x0, 0xfe, 0xff),   # UTF-32 (BE)
        pack('CCCC', 0xff, 0xfe, 0x0, 0x0),   # UTF-32 (LE)
        pack('CC', 0xfe, 0xff),               # UTF-16 (BE)
        pack('CC', 0xff, 0xfe),               # UTF-16 (LE)
    ];

    foreach ($boms as $b) {
        if (substr($str, 0, strlen($b)) === $b) {
            return substr($str, strlen($b));
        }
    }

    return $str;
}

You can also check if the line is empty and skip it if so.

// $handle is a file handle opened with fopen()
while (($line = fgets($handle, 4096)) !== false) {
    $line = trim(strip_bom($line));
    if ($line === '') {
        continue; // skip blank lines
    }

    // process $line
}

You can also convert a character encoding:
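
A minimal sketch of such a conversion, assuming the "ANSI" file is really Windows-1252 (what Windows editors usually mean by ANSI; it matches ISO-8859-1 except in the 0x80-0x9F range) and that the mbstring extension is enabled. The helper name `ansi_to_utf8` and the file names are made up for illustration. Note that utf8_encode() only handles ISO-8859-1 and is deprecated as of PHP 8.2.

```php
<?php
// Hypothetical helper: convert a Windows-1252 ("ANSI") string to UTF-8.
// Assumes the source really is Windows-1252; pass another $from if it is not.
function ansi_to_utf8(string $text, string $from = 'Windows-1252'): string
{
    return mb_convert_encoding($text, 'UTF-8', $from);
}

// Whole file at once (fine for small files; file names are placeholders):
// file_put_contents('data-utf8.txt', ansi_to_utf8(file_get_contents('data.txt')));

// Or line by line inside the fgets() loop:
// $line = ansi_to_utf8($line);
```

Converting in PHP this way does not add a BOM, so there is nothing to strip afterwards.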



If your database is set up to store UTF-8 data, make sure you insert the data as UTF-8 as well.
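
One way to keep the connection in UTF-8 with PDO and MySQL is to name the character set in the DSN. This is only a sketch: the thread doesn't say which database or driver is in use, so MySQL, PDO, `utf8mb4`, the helper name `utf8_dsn` and the credentials are all assumptions.

```php
<?php
// Hypothetical helper: build a PDO MySQL DSN that asks for a UTF-8 connection.
// utf8mb4 is MySQL's full UTF-8 character set.
function utf8_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8mb4";
}

// Usage (placeholder host, database and credentials):
// $pdo = new PDO(utf8_dsn('localhost', 'mydb'), 'user', 'password');
// With mysqli the equivalent is: $mysqli->set_charset('utf8mb4');
```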

Obviously, if the file starts with the correct encoding then you don't need to do anything special.
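
To find out whether a line is already valid UTF-8 before converting it, something like the following could work. It is a sketch that assumes mbstring is enabled and that anything which is not valid UTF-8 is Windows-1252; `to_utf8_if_needed` is a hypothetical name.

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, convert everything else.
// Assumption: input that is not valid UTF-8 is Windows-1252.
function to_utf8_if_needed(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8 (plain ASCII counts too)
    }
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}
```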
