son Posted January 15, 2009
I really struggle with regular expressions. I want to allow all letters, numbers, spaces and ;:-' and used
if (!eregi ('^[[:alnum:][:punct:][:space:]]{2,40}$', stripslashes(trim($_POST['heading'])))) {
only to find that this does not allow the £ symbol. I tried adding it:
if (!eregi ('^[[:alnum:][:punct:][:space:]\£]{2,40}$', stripslashes(trim($_POST['heading'])))) {
without luck. How would I best achieve my objective?
Thanks,
Son
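A minimal sketch of one way to do this on a current PHP, assuming the form is submitted as UTF-8 and PHP 7 or later. The ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0, so this swaps in PCRE's preg_match() with the /u modifier, which makes the pattern work on characters rather than bytes. isValidHeading() is a hypothetical name, and the allowed set follows the stated goal (letters, digits, whitespace, ;:-' and £) rather than the broader [:punct:] class; note that \p{L} accepts any Unicode letter, accented ones included.

```php
<?php
// Hypothetical helper: 2-40 characters drawn from Unicode letters (\p{L}),
// digits (\p{N}), whitespace, ; : ' - and the pound sign U+00A3.
// The /u modifier makes PCRE treat pattern and subject as UTF-8, so {2,40}
// counts characters and \x{A3} matches the whole two-byte UTF-8 sequence.
function isValidHeading(string $heading): bool
{
    // preg_match() returns 1 on a match, 0 on no match, and false on error
    // (including a subject that is not valid UTF-8), so compare strictly.
    return preg_match('/^[\p{L}\p{N}\s;:\'\-\x{A3}]{2,40}$/u', $heading) === 1;
}

// Usage in the form handler would look like:
// if (!isValidHeading(trim($_POST['heading']))) { /* reject */ }
var_dump(isValidHeading("Price \u{A3}5"));   // bool(true)
var_dump(isValidHeading('<b>bold</b>'));     // bool(false)
```

The stripslashes() call in the original is only needed when magic_quotes_gpc is on; that setting was removed in PHP 5.4.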
boen_robot Posted January 15, 2009
How about:
if (!eregi ('^([[:alnum:][:punct:][:space:]]|\£){2,40}$', stripslashes(trim($_POST['heading'])))) {
?
son Posted January 16, 2009
I tried this before, without luck. I was also wondering whether there is a more general way of allowing all characters that display correctly in the browser and rejecting those that cause problems. For example, the UTF-8 byte sequences \xE2\x80\x99 (’), \xE2\x80\x98 (‘), \xE2\x80\x9C (“), \xE2\x80\x9D (”) and \xE2\x80\x93 (–) display as a question mark in a box instead of the intended character. That is exactly what I was trying to achieve with the regular expression...
Son
boen_robot Posted January 16, 2009
Make sure you use UTF-8 everywhere. Otherwise, there will be differences from one encoding to the next, and the only sure way would be to turn the sign into its entity code.
I believe in regex you can use \xhh to match the character with the hex code "hh", e.g.
if (!eregi ('^([[:alnum:][:punct:][:space:]]|\xA3){2,40}$', stripslashes(trim($_POST['heading'])))) {
(I'm not completely sure whether A3 is the hex code for the pound sign... I looked it up in the Windows Character Map app.)
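A3 is the pound sign's code in ISO-8859-1 and Windows-1252 (and its Unicode code point, U+00A3), but in UTF-8 the same character is the two bytes C2 A3. The sketch below shows why that matters for a byte-oriented pattern. Since ereg no longer exists in current PHP, it uses preg_match() without the /u modifier as a stand-in for the same byte-by-byte matching; that substitution is an assumption, not the original code. The last line shows the entity route mentioned above.

```php
<?php
// The pound sign is Unicode code point U+00A3.
//   ISO-8859-1 / Windows-1252: one byte, 0xA3
//   UTF-8: two bytes, 0xC2 0xA3
$latin1Pound = "\xA3";    // the single byte a Latin-1 page would send
$utf8Pound   = "\u{A3}";  // PHP 7+ escape; yields the UTF-8 bytes C2 A3

echo bin2hex($latin1Pound), "\n"; // a3
echo bin2hex($utf8Pound), "\n";   // c2a3

// Without /u, PCRE matches bytes, so \xA3 in a class matches only the byte 0xA3.
// In the default C locale the leading byte 0xC2 is not alnum, punct or space,
// so a UTF-8 heading containing a pound sign fails.
$byteClass = '/^[[:alnum:][:punct:][:space:]\xA3]{2,40}$/';
var_dump(preg_match($byteClass, 'Price ' . $latin1Pound . '5')); // int(1)
var_dump(preg_match($byteClass, 'Price ' . $utf8Pound . '5'));   // int(0)

// An entity avoids the encoding question for output entirely.
echo htmlentities($utf8Pound, ENT_QUOTES, 'UTF-8'), "\n"; // &pound;
```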
son Posted January 19, 2009
My pages start with:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
so this side of things should be fine. I use the regular expressions to stop characters that cause display issues from getting through. Is there another way to achieve this? There are professional CMSs, and I wondered how they are programmed so that this does not happen. Their corporate users surely do a lot of copying and pasting from Office applications (as my friend does).
Son
Also, just to say: the DB table is set to latin1_swedish_ci collation. Not sure whether that could cause the issues I am getting...
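With UTF-8 used end to end (form, PHP, database connection and table, and the Content-Type header), curly quotes and dashes pasted from Office display correctly and do not need to be filtered out. If plain ASCII punctuation is preferred anyway, one approach, sketched here under the assumption that input arrives as UTF-8, is to check that the input is valid UTF-8, map the specific characters from the question to ASCII, and escape on output. isUtf8() and normalizeOfficePunctuation() are hypothetical names; this is not a description of how any particular CMS does it.

```php
<?php
// Hypothetical helper: true if $text is well-formed UTF-8. With /u, preg_match()
// returns false (not 0) when the subject is not valid UTF-8.
function isUtf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

// Hypothetical helper: replace the "smart" punctuation that word processors
// insert with plain ASCII. Keys are the UTF-8 byte sequences from the question.
function normalizeOfficePunctuation(string $text): string
{
    return strtr($text, [
        "\xE2\x80\x98" => "'", // U+2018 left single quotation mark
        "\xE2\x80\x99" => "'", // U+2019 right single quotation mark
        "\xE2\x80\x9C" => '"', // U+201C left double quotation mark
        "\xE2\x80\x9D" => '"', // U+201D right double quotation mark
        "\xE2\x80\x93" => '-', // U+2013 en dash
    ]);
}

// Example input as it might arrive after a copy/paste from a word processor.
$pasted = "\xE2\x80\x9CIt\xE2\x80\x99s\xE2\x80\x9D \xE2\x80\x93 fine";
if (isUtf8($pasted)) {
    // Escape at output time instead of filtering characters on input.
    echo htmlspecialchars(normalizeOfficePunctuation($pasted), ENT_QUOTES, 'UTF-8'), "\n";
}
```

Escaping with htmlspecialchars() at output time is what keeps markup safe; the character mapping is only a presentation choice.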
boen_robot Posted January 19, 2009
I said "everywhere", so yeah. If the string you're testing comes from a DB that is not in UTF-8, the problem may well be there.
In addition, having a charset meta tag doesn't guarantee you get UTF-8 in the end. You should explicitly set it as an HTTP header, like so:
<?php header('Content-Type: text/html; charset=utf-8'); ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
son Posted January 19, 2009
The DB itself is UTF-8; I only meant the collation. In addition, I have
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
on all pages...?
Son
boen_robot Posted January 19, 2009
And the header? (Note the first line in my example.) Does it change anything?
son Posted January 19, 2009
I tried the first line, but I still get the weird characters (Â, for example).
Son
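The "Â" is a classic sign of double encoding: UTF-8 bytes read as ISO-8859-1 (or Windows-1252) somewhere along the way and then encoded as UTF-8 again. The pound sign is C2 A3 in UTF-8, and the byte C2 is "Â" in Latin-1, so "£" turns into "Â£". The sketch below reproduces this with a hypothetical latin1ToUtf8() helper written by hand, so it needs neither the mbstring nor the iconv extension. Given the Swedish Latin-1 collation mentioned earlier (MySQL's latin1_swedish_ci belongs to the latin1 character set), a latin1 column or connection is one plausible place for the mix-up, though the thread does not confirm it.

```php
<?php
// Hypothetical helper: treat every byte as an ISO-8859-1 character and encode
// it as UTF-8. Applying this to text that is already UTF-8 reproduces the
// double encoding that puts "Â" in front of the pound sign.
function latin1ToUtf8(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= $bytes[$i]; // ASCII is identical in both encodings
        } else {
            // Latin-1 0x80-0xFF become two-byte UTF-8 sequences 110xxxxx 10xxxxxx
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$pound = "\u{A3}";                        // UTF-8 bytes C2 A3
echo latin1ToUtf8($pound), "\n";          // Â£
echo bin2hex(latin1ToUtf8($pound)), "\n"; // c382c2a3
```

If a page shows "Â£", each hop (form submission, PHP, database connection, table, output header) is worth checking for the one that is not treating the data as UTF-8.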
This topic is now archived and is closed to further replies.