Allow specific chars to username


loonie35

This regular expression allows alphanumeric characters (upper and lower case) and the underscore. If you want to allow other symbols, tell us which ones: the pattern will need some changes, and some symbols have to be escaped inside a character class while others don't.

$username = $whatever;
$pattern = '/^\w+$/';
if (!preg_match($pattern, $username)) {
    // BAD USERNAME
}

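(Before anyone walks off thinking they aren't needed, see the question at the bottom of this post.) The following regex adds the two characters š and ž to the allowed group, provided the username arrives Windows-1252 encoded (0x9A and 0x9E are their byte values in that encoding):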

'/^[\w\x9a\x9e]+$/'
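
The encoding dependence of that pattern is easy to check. A minimal sketch (the function name is made up for illustration); it uses \z instead of $ so that a trailing newline is not accepted. Under PHP's default C locale, \w covers only ASCII word characters, so the Windows-1252 bytes pass while the UTF-8 bytes for the same letters fail:

```php
<?php
// Sketch: the byte-oriented pattern from the post (no u modifier).
// \x9a and \x9e are the Windows-1252 bytes for š and ž, so this only
// accepts those letters when the username is Windows-1252 encoded.
// \z instead of $ so that "abc\n" is rejected.
function is_valid_cp1252_username(string $bytes): bool
{
    return preg_match('/^[\w\x9a\x9e]+\z/', $bytes) === 1;
}

var_dump(is_valid_cp1252_username("\x9a\x9e_ok"));         // "šž_ok" in Windows-1252: bool(true)
var_dump(is_valid_cp1252_username("\xc5\xa1\xc5\xbe_ok")); // "šž_ok" in UTF-8: bool(false)
```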

If the other characters are that important, you'll have to validate usernames with a different technique than regexes alone, probably one that is more expensive in server resources. This is because, without the u (UTF-8) modifier, PHP's regexes work on single bytes, so they only understand the characters numbered 0-255 (hexadecimal 00-ff). All of these letters are above U+00FF in Unicode, and only š and ž also have a single-byte value (in Windows-1252), so the rest cannot be written with the two hex digits such a pattern requires.

For starters, if the form page uses a single-byte encoding, the browser submits these characters as numeric character references (ā, for example, arrives as &#257;). So the username would, at the database level, actually contain those references. To validate the username, we could

  1. find all the necessary characters and record their positions;
  2. remove them from the string;
  3. validate the string via regex;
  4. insert the references at the recorded positions;
  5. store the username.
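
The steps above can be sketched in PHP as follows. This is an illustration only, assuming the browser submitted the special letters as decimal references like &#257;; the ALLOWED_REFS whitelist and the function name are hypothetical. Step 4 simply rebuilds the submitted string, so in practice you could return $raw once step 3 passes; the explicit reinsertion just mirrors the list.

```php
<?php
// Hypothetical whitelist: numeric references for the letters listed in this thread.
const ALLOWED_REFS = ['&#353;', '&#382;', '&#257;', '&#269;', '&#275;', '&#291;',
                      '&#299;', '&#311;', '&#316;', '&#326;', '&#333;', '&#363;'];

// Returns the string to store (step 5), or null if the username is invalid.
function validate_username_with_refs(string $raw): ?string
{
    // Step 1: find every decimal reference and its byte offset.
    preg_match_all('/&#\d+;/', $raw, $matches, PREG_OFFSET_CAPTURE);

    $stripped = '';  // the username with the references removed
    $found = [];     // [position in $stripped, reference] pairs
    $cursor = 0;     // next unread offset in $raw
    foreach ($matches[0] as [$ref, $offset]) {
        if (!in_array($ref, ALLOWED_REFS, true)) {
            return null; // a reference that is not on the whitelist
        }
        // Step 2: copy the text before the reference, then skip the reference.
        $stripped .= substr($raw, $cursor, $offset - $cursor);
        $found[] = [strlen($stripped), $ref];
        $cursor = $offset + strlen($ref);
    }
    $stripped .= substr($raw, $cursor);

    // Step 3: validate the rest with the ordinary regex (\z rejects a trailing newline).
    if ($stripped === '' && $found === []) {
        return null; // empty username
    }
    if (preg_match('/^\w*\z/', $stripped) !== 1) {
        return null;
    }

    // Step 4: put the references back, last one first so earlier positions stay valid.
    foreach (array_reverse($found) as [$pos, $ref]) {
        $stripped = substr_replace($stripped, $ref, $pos, 0);
    }
    return $stripped;
}

var_dump(validate_username_with_refs('J&#257;nis_1')); // string(12) "J&#257;nis_1"
var_dump(validate_username_with_refs('J&#1234;nis'));  // NULL
```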

The references for all of the characters follow (the first two may be unnecessary):

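&#353; (š), &#382; (ž), &#257; (ā), &#269; (č), &#275; (ē), &#291; (ģ), &#299; (ī), &#311; (ķ), &#316; (ļ), &#326; (ņ), &#333; (ō), &#363; (ū)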
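
To check that a numeric reference corresponds to the intended letter, PHP's built-in html_entity_decode can turn decimal references back into UTF-8 text. A small sketch; the wrapper name is hypothetical:

```php
<?php
// Sketch: decode numeric character references (&#NNN;) into UTF-8 text.
function refs_to_utf8(string $refs): string
{
    return html_entity_decode($refs, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

echo refs_to_utf8('&#353; &#382; &#257; &#269; &#275; &#291; &#299; &#311; &#316; &#326; &#333; &#363;'), "\n";
// prints: š ž ā č ē ģ ī ķ ļ ņ ō ū
```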

Now how can we make PHP understand these characters when the user submits the registration form?
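
One route this thread does not take: if the registration page is served as UTF-8, the form submits these letters as UTF-8 bytes, and PHP's PCRE functions can match them as code points when the pattern has the u modifier. A minimal sketch, assuming UTF-8 input; the function name is hypothetical, and only the lowercase letters from the thread are allowed:

```php
<?php
// Sketch (not from the thread): match the Latvian letters as code points.
// Assumes $username is UTF-8 encoded.
function is_valid_username_utf8(string $username): bool
{
    // ASCII letters, digits, underscore, plus the lowercase letters from the
    // thread: š ž ā č ē ģ ī ķ ļ ņ ō ū, written as \x{...} code points.
    $pattern = '/^[A-Za-z0-9_\x{0161}\x{017E}\x{0101}\x{010D}\x{0113}\x{0123}'
             . '\x{012B}\x{0137}\x{013C}\x{0146}\x{014D}\x{016B}]+\z/u';

    // The u modifier makes PCRE read the subject as UTF-8 code points;
    // preg_match returns false for invalid UTF-8, which is rejected here too.
    return preg_match($pattern, $username) === 1;
}

var_dump(is_valid_username_utf8('Jānis_1')); // bool(true)
var_dump(is_valid_username_utf8('Jānis 1')); // bool(false)
```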

I think I found the answer, but it will take a good bit of work. The Multibyte String (mbstring) extension can handle characters above 255, but your PHP installation has to be compiled with it enabled (--enable-mbstring). Does anyone know what's involved in ripping out the old PHP and replacing it with a new build?

Once that's done, you can validate usernames with mb_ereg_match, which uses the older, slower, slightly more esoteric "ereg" syntax (which apparently allows four hex digits per character code). Most importantly, POSIX ereg syntax has no generic character classes like \w, so it's safest to list the allowed characters explicitly.

According to http://www.regextester.com/index.html, the ereg regex (note that no digits are mentioned)

^[a-zA-Zēūīōāšģķļžčņ]+$

matches the expected "ēūīōāšģķļžčņFooBar" but also "ēūīōāšģķļžčņFooBar0". That might be a bug in the site's implementation of ereg rather than a real feature, so here's the regex I suggest (until someone familiar with ereg shows up):

^[a-zA-Z0-9ēūīōāšģķļžčņ]+$
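
For reference, this is roughly how the suggested pattern could be passed to mb_ereg_match. A sketch only, assuming the mbstring extension with its regex support is installed and that both the username and the PHP source file are UTF-8; the function name is hypothetical:

```php
<?php
// Sketch: the suggested pattern used with mbstring's regex functions.
// Assumes mbstring (with regex support) is available and all text is UTF-8.
function is_valid_username_mb(string $username): bool
{
    // Tell mbstring's regex engine how to interpret pattern and subject bytes.
    mb_regex_encoding('UTF-8');
    return mb_ereg_match('^[a-zA-Z0-9ēūīōāšģķļžčņ]+$', $username);
}
```

For example, is_valid_username_mb('ēūīōāšģķļžčņFooBar0') should return true, because the digit is now in the class, while is_valid_username_mb('Foo Bar') should return false.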

Archived

This topic is now archived and is closed to further replies.