ala888 Posted October 6, 2014

Title says it all. How do I normalize all whitespace across all charsets and so on?

preg_replace('/ +/',' ',$str);

does not work if the string is not in ASCII.
Ingolme Posted October 6, 2014

The space character is the same in nearly all encodings. If you just want to remove spaces, that will work. If you want to include line breaks and tabs as whitespace, then you can select the appropriate characters.

Functions like trim() consider the following characters whitespace, and since they're all single-byte characters they should work for pretty much any encoding:

\s or \x20 or " "    Space
\t                   Tab
\n                   New line
\r                   Carriage return
\0                   Null byte
\x0B                 Vertical tab

The following regular expression will normalize all of those:

preg_replace('/[\x20\t\n\r\0\x0B]+/',' ',$str);

For these characters the character set doesn't matter, because almost all character sets share the same characters from 0 to 127.
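A minimal sketch of the approach above, next to a variant for whitespace that is not single-byte, such as the non-breaking space (U+00A0), which the byte-oriented pattern leaves untouched. The function names are made up for illustration, and the second function assumes the input is valid UTF-8; for other encodings you would likely convert first (for example with mb_convert_encoding()).

```php
<?php
// Hypothetical helper: collapses the single-byte whitespace characters
// listed in the post (space, tab, LF, CR, NUL, vertical tab) to one space.
function normalize_ascii_whitespace(string $str): string
{
    $result = preg_replace('/[\x20\t\n\r\0\x0B]+/', ' ', $str);
    // preg_replace() returns null on error; fall back to the input.
    return $result ?? $str;
}

// Hypothetical helper, assuming $str is UTF-8: the /u modifier makes PCRE
// treat pattern and subject as UTF-8, and \p{Z} matches any Unicode
// separator (non-breaking space, em space, line separator, ...).
function normalize_utf8_whitespace(string $str): string
{
    $result = preg_replace('/[\s\0\p{Z}]+/u', ' ', $str);
    // With /u, preg_replace() returns null if $str is not valid UTF-8.
    return $result ?? $str;
}
```

With these, normalize_ascii_whitespace("a \t\r\n b") gives "a b", while a string such as "a\u{00A0}\u{2003}b" is only collapsed by normalize_utf8_whitespace().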
ala888 Posted October 7, 2014

As someone who is new to the agonizingly painful world of strings and their various encodings: is there a good online tutorial that goes through everything from collations to hex, and how it all fits together? I don't know what's going on with strings in general.