UTF CSV parsing stripping characters


justsomeguy

I'm still doing research, but I thought I would ask here to see if anyone has run into this before. I'm trying to parse a CSV file that is saved as UTF-8 with BOM. This is the code I'm testing:

<?php
header('Content-Type: text/html; charset=UTF-8');
ini_set('display_errors', 1);
error_reporting(E_ALL);

$fname = 'French_Language_Upload_File_full_utf.csv';
$contents = file_get_contents($fname);

echo '<pre>' . print_r(mb_detect_order(), true) . '</pre><br>';
echo 'encoding: ' . mb_detect_encoding($contents) . '<br>';
echo '<pre>' . $contents . '</pre><br>';

$csv = new SplFileObject($fname);
$csv->setFlags(SplFileObject::READ_CSV | SplFileObject::READ_AHEAD | SplFileObject::SKIP_EMPTY);

echo '<pre>';
foreach ($csv as $row)
{
  print_r($row);
}
echo '</pre>';

What I'm seeing is that some fields that start with certain characters are having those characters stripped. One of the lines in the CSV file contains this, for example:

common,inactive_students,Inactive students,Inactive students,Étudiants inactifs

After CSV parsing, it looks like this:

Array
(
    [0] => common
    [1] => inactive_students
    [2] => Inactive students
    [3] => Inactive students
    [4] => tudiants inactifs
)

This only happens when the field starts with one of those characters. If the same accented capital E appears in the middle of the text, it does not get lost. It also doesn't strip every accented leading character. This row is parsed correctly:

Array
(
    [0] => common
    [1] => ug_admin_news_permission
    [2] => At this time User Group Admins are only able to add news items to their own user groups.
    [3] => At this time User Group Admins are only able to add news items to their own user groups.
    [4] => À l’heure actuelle, les administrateurs peuvent seulement ajouter des nouveaux items à leurs propres groupes d’utilisateurs.
)

This line, though:

common,r_u_sure_add,Are you sure you want to add the selected users?,Are you sure you want to add the selected users?,Êtes-vous sûr(e) de vouloir ajouter les utilisateurs sélectionnés ?

ends up like this:

Array
(
    [0] => common
    [1] => r_u_sure_add
    [2] => Are you sure you want to add the selected users?
    [3] => Are you sure you want to add the selected users?
    [4] => tes-vous sûr(e) de vouloir ajouter les utilisateurs sélectionnés ?
)

The same thing happens with fgetcsv. Has anyone run into this before? Any ideas? The server is running PHP 5.2.17.
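To narrow this down, one option is to isolate a single affected line and compare the raw bytes that go in with the bytes that fgetcsv returns, alongside the current locale. The PHP manual notes that fgetcsv takes the locale setting into account, so the locale is worth ruling out, though that is only a guess here and not a confirmed cause. The sketch below is a minimal diagnostic under those assumptions. The helper name `parse_csv_line` is hypothetical, and the sample line is the first failing line from the post, with the É written as its UTF-8 bytes.

```php
<?php
// Hypothetical helper (not part of the original script): run one CSV line
// through fgetcsv using an in-memory stream and return the parsed fields.
function parse_csv_line($line)
{
    $fh = fopen('php://temp', 'r+');
    fwrite($fh, $line);
    rewind($fh);
    $row = fgetcsv($fh);
    fclose($fh);
    return $row;
}

// The failing line from the post; "\xC3\x89" is the UTF-8 encoding of É.
$expected = "\xC3\x89tudiants inactifs";
$line = "common,inactive_students,Inactive students,Inactive students," . $expected . "\n";

// Passing '0' asks setlocale for the current setting without changing it.
echo 'LC_CTYPE: ' . setlocale(LC_CTYPE, '0') . "\n";

$row = parse_csv_line($line);

// Compare byte for byte, so the browser's charset handling is out of the picture.
echo 'expected: ' . bin2hex($expected) . "\n";
echo 'parsed:   ' . bin2hex($row[4]) . "\n";
echo ($row[4] === $expected) ? "field intact\n" : "leading bytes were lost\n";
```

If the parsed hex is missing the leading `c389`, the loss is happening inside the CSV parser, not in the output step. It may then be worth repeating the run after calling `setlocale(LC_CTYPE, ...)` with a UTF-8 locale that is actually installed on the server (for example `en_US.UTF-8`, which is an assumption about that machine) to see whether the result changes.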
