Regular Expressions


ThePsion5

Ok, I'm using the following regular expression to try and match HTML frame tags:

<(frame|FRAME)[\s\S]*?/(\1)?>

To break it down:

<(frame|FRAME) - Should match the opening tag
[\s\S]*? - Should match any number of letters or numbers inside the tag
/(\1)?> - Is a back reference to the first subpattern (frame|FRAME) plus the closing slash and bracket, and should match the closing tags /frame>, /FRAME>, or />

This seems correct to me, but so far I'm not getting any matches for my test data... any ideas?


How about using the "i" pattern modifier to ignore case, and escaping the \ in \1?

<?php
$html = "<FRAME src='test.html'></FRAME>";
$re = "/<(frame)(.*)><\/\\1>/i";
echo preg_match($re, $html, $matches);
print_r($matches);
?>

HTH
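For anyone wondering why the extra backslash matters: inside a double-quoted PHP string, "\1" is read as an octal escape (a single byte with value 1) before the regex engine ever sees it, so the backreference is silently lost. A small sketch of the difference; the variable names are only for illustration:

```php
<?php
// In double quotes, "\1" is an octal escape: one byte with value 1.
$unescaped = "\1";
// "\\1" yields the two characters \ and 1, which PCRE reads as a backreference.
$escaped = "\\1";

var_dump(strlen($unescaped)); // int(1)
var_dump(strlen($escaped));   // int(2)

$html = "<FRAME src='test.html'></FRAME>";

// Backreference intact: the closing tag must repeat what group 1 captured.
$withBackref = preg_match("/<(frame)(.*)><\/\\1>/i", $html);

// Backreference lost: the pattern now looks for a literal 0x01 byte.
$withoutBackref = preg_match("/<(frame)(.*)><\/\1>/i", $html);

var_dump($withBackref);    // int(1)
var_dump($withoutBackref); // int(0)
```

Writing the pattern in single quotes is another way around this, since single-quoted strings leave \1 alone.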


That actually worked out very well, thanks :)

Now I have another, slightly more complex difficulty - I'm trying to parse out the attributes of an HTML/XML tag using a regular expression. This is what I have so far:

This code matches the tag itself:

<[\s\S]+>

And this code matches the attributes and their respective values:

 [\S]+[\s]*=[\s]*(\'|")?[\s\S]+(\1)

I break the second regular expression down like this:

[\S]+[\s]* - At least one letter + 0 or more spaces
=[\s]*(\'|")? - The equal sign plus 0 or 1 opening quotations
[\s\S]+(\1) - The contents of the attribute (1 or more characters) and the closing quotations (back referencing the quotation subpattern)

Is this anywhere near correct? lol

EDIT: I've been trying the aforementioned regular expression using this test data:

href="www.google.com" title="Google"

With the preg_grep() function... unfortunately, it returns the entire string as a single match, while I would like the expression to return each attribute as a match... What might I be doing incorrectly? Thanks in advance!
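One thing that may explain the "entire string" result: preg_grep() filters an array and returns every element that contains a match, whole, rather than the matched portions. preg_match_all() is the function that collects each match inside a single string. A rough sketch, assuming every value is wrapped in double quotes; the pattern is a simplified one written for this example, not the one from the post:

```php
<?php
$attrs = 'href="www.google.com" title="Google"';

// preg_grep() filters an array: it returns every element that matches, whole.
$grepped = preg_grep('/\w+\s*=/', [$attrs]);
print_r($grepped); // one element containing the entire string

// preg_match_all() collects each separate match inside one string.
// [^"]* stops at the next double quote, so each value ends where it should.
preg_match_all('/(\w+)\s*=\s*"([^"]*)"/', $attrs, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
```

With PREG_SET_ORDER each entry of $matches is one attribute, with the name in index 1 and the value in index 2.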


Still Make my Brain Hurt
Me to0o0, you got off in the deep water :) Here's something that might help you.

haacked.com

Then you have to consider those nested tags. :)

EDIT: One thing I can recommend is to make the patterns match across newlines with the "s" modifier:

<?php
$html = "<FRAME
src='test.html'></FRAME>";
$re = "/<(frame)(.*)><\/\\1>/is";
echo preg_match($re, $html, $matches);
print_r($matches);
?>
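To illustrate what the "s" modifier changes: by default the dot does not match a newline, so a tag whose attributes wrap onto a second line gets missed. A small sketch, with a test string made up for this example:

```php
<?php
// A tag whose attribute sits on the next line.
$html = "<FRAME\nsrc='test.html'></FRAME>";

// Without "s", (.*) cannot cross the newline, so there is no match.
$without = preg_match("/<(frame)(.*)><\/\\1>/i", $html);

// With "s" (PCRE_DOTALL), the dot also matches newlines.
$with = preg_match("/<(frame)(.*)><\/\\1>/is", $html, $matches);

var_dump($without); // int(0)
var_dump($with);    // int(1)
```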


Unfortunately, I'm still stuck on my second regular expression designed to match attributes. My modified regular expression looks like this:

\w+\s*=\s*("|')?[^\1]+\1

\w+\s* - Matches 1 or more 'word' characters and any number of whitespace characters
=\s*("|')? - Matches the equal sign, any number of whitespace characters, and 0-1 single or double quote characters
[^\1]+\1 - Matches 1 or more characters not equal to the end quote, and the end quote itself

I'm still not sure why this doesn't work... testing it with the string href="http://www.google.com" title="Google" returns the entire string as a match. Any ideas?

EDIT: I've tried breaking the regex into bits and working with them from there... this is the segment that should match the attribute values:

=("|\')[a-zA-Z]+\1

But this returns the entire test string as well, which seems very strange to me... I wonder what may be up? I feel like I'm missing something very basic.
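A possible explanation for the modified pattern: a backreference has no meaning inside a character class, so [^\1] does not mean "anything except the opening quote". PCRE reads \1 there as the character with code 1, so [^\1]+ accepts practically everything, quotes included, and because it is greedy it runs on to the last quote in the string. One common workaround is a lazy quantifier with the backreference placed outside the class. A sketch, assuming the values are always quoted; the second pattern is a variant written for this example:

```php
<?php
$attrs = 'href="http://www.google.com" title="Google"';

// Pattern from the post: [^\1] is "any character except chr(1)",
// so the greedy + swallows everything up to the LAST quote.
$greedy = '/\w+\s*=\s*("|\')?[^\1]+\1/';
preg_match_all($greedy, $attrs, $m1);
print_r($m1[0]); // one element: the entire string

// Variant: require a quote, match the value lazily, then close with \1.
$lazy = '/\w+\s*=\s*("|\')(.*?)\1/s';
preg_match_all($lazy, $attrs, $m2);
print_r($m2[0]); // two elements, one per attribute
```

The lazy (.*?) stops at the first quote that matches the one that opened the value, which is why a single-quoted value may still contain double quotes and vice versa.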

