Regular Expression


zanfranceschi

<pre>
<?php
// Address to test and the pattern to test it against.
$target = "my.e-mail_with_symbols@domain.com.br";
$re = "/^([a-z0-9\._\-]*)(@{1})([a-z0-9\._\-]*)(\.[a-z0-9]{1,3})$/i";

echo "<b>$re</b> \t\t\t <b>$target</b>\n\n\n";

// preg_match() returns 1 on a match and fills $out with the captured groups.
if (preg_match($re, $target, $out)) {
	echo "<span style=\"color: blue;\">matches</span>";
} else {
	echo "<span style=\"color: red;\">doesn't match</span>";
}

echo "\n\n\n\n";
print_r($out);
?>
</pre>

I just started trying some things with regular expressions, and I found them not that difficult, but I'm sure I only understand the basics. If someone who knows regexp could check whether what I wrote is what I mean, I'd be very thankful!

Here it goes:

^ = must start with the first "block"
([a-z0-9\._\-]*) = a to z, 0 to 9, with ".", "_" and "-" allowed, infinite times
(@{1}) = "@" must come after the first block, only once
([a-z0-9\._\-]*) = 3rd block, the same as the first
(\.[a-z0-9]{1,3}) = last block, preceded by a dot (mandatory), having alphanumeric chars. Can find this pattern at least once and at most 3 times.
$ = must end after the above
i = makes the pattern case insensitive

Is all this right? Am I on the right track with regexp? Am I using some unusual or bad practice in the patterns?

Thanks a lot!!


You're generally on the right track, but there is a "more correct" way to do the last part. Instead of doing this:

([a-z0-9\._\-]*)(\.[a-z0-9]{1,3})

I think this will work better:

([a-z0-9_\-]+\.)+([a-z]{2,3})

This means that you look for a sequence of one or more letters and numbers (plus _ and -), followed by a dot. You look for that entire sequence one or more times, and then after that is a 2-3 character sequence. You probably don't need digits in the TLD; I don't think any TLDs have numbers in them, and there are no 1-character TLDs.

I also replaced the first asterisk with a plus (1 or more instead of 0 or more, since 0 characters for the first subpattern is not valid), so the new pattern looks like this:

$re = "/^([a-z0-9\._\-]+)(@{1})([a-z0-9_\-]+\.)+([a-z]{2,3})$/i";
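To see what the two changes do in practice, here is a minimal sketch that runs the revised pattern (with the 2-3 letter TLD described above) against a few addresses. The sample addresses other than the first are made up for illustration, and `$results` is just a helper array introduced here.

```php
<?php
// Revised pattern: "+" instead of "*" for the first block,
// one or more "label." groups after the @, then a 2-3 letter TLD.
$re = '/^([a-z0-9\._\-]+)(@{1})([a-z0-9_\-]+\.)+([a-z]{2,3})$/i';

// Hypothetical sample addresses, chosen only to exercise each rule.
$samples = array(
    'my.e-mail_with_symbols@domain.com.br', // original test address
    '@domain.com',                          // empty first block
    'user@domain.c',                        // 1-character TLD
    'user@domain',                          // no dot after the @
);

$results = array();
foreach ($samples as $address) {
    // preg_match() returns 1 when the pattern matches, 0 when it does not.
    $results[$address] = (preg_match($re, $address) === 1);
    echo $address, ' => ', $results[$address] ? 'matches' : "doesn't match", "\n";
}
```

Only the first address should match: the others fail on the "+" for the first block, the 2-character minimum for the TLD, and the required "label." group respectively.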


Thanks a lot!!! I learned new things from you.

I didn't know that "*" meant 0 or more and "+" 1 or more. And you used a lot of parentheses too, so I see it's right to separate pattern groups with ().

And why didn't you put the last dot with the last group? Is this just a matter of taste, or is there a logic behind it that I can't see?

Thanks a lot again!!! I see a new world of possibilities with regexp!!!


I moved the last dot because that's just how I see it in my head. After the @ sign, there is this group, 1 or more times:

([a-z0-9_\-]+\.)

All of these will match that pattern:

www.
akjsdf.
w3schools.
mail.

Since there are 1 or more allowed, you can have as many of those patterns as you want:

(www.)(mail.)(w3schools.)
www.mail.w3schools.

followed by the TLD, which is just "com" or whatever:

www.mail.w3schools.com

So, if you strip off the TLD, then you can separate the entire domain into sequences of letters and numbers followed by a dot.

The way you were matching it earlier, like this:

([a-z0-9\._\-]*)(\.[a-z0-9]{1,3})

would have also matched this:

guy@.....com

which obviously isn't a valid domain name.
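A quick way to check this is to run both patterns against the same address. The sketch below assumes the revised pattern uses the 2-3 letter TLD discussed earlier in the thread; the variable names are only for illustration. It also prints the captured groups for a multi-label domain, because a repeated group keeps only its last repetition.

```php
<?php
// Pattern from the first post and the revised pattern from the reply.
$original = '/^([a-z0-9\._\-]*)(@{1})([a-z0-9\._\-]*)(\.[a-z0-9]{1,3})$/i';
$revised  = '/^([a-z0-9\._\-]+)(@{1})([a-z0-9_\-]+\.)+([a-z]{2,3})$/i';

$address = 'guy@.....com';

// The original allows dots anywhere in the domain block, so it accepts this.
$originalMatches = (preg_match($original, $address) === 1);
// The revised one needs at least one letter, digit, "_" or "-" before each dot.
$revisedMatches = (preg_match($revised, $address) === 1);

echo 'original: ', $originalMatches ? 'matches' : "doesn't match", "\n";
echo 'revised:  ', $revisedMatches ? 'matches' : "doesn't match", "\n";

// For a repeated group such as ([a-z0-9_\-]+\.)+, only the last
// repetition is kept in $out, here "w3schools." rather than "www.".
preg_match($revised, 'guy@www.mail.w3schools.com', $out);
print_r($out);
```

If you need every label separately, one option is to capture the whole domain in a single group and split it on the dots afterwards.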
