Fmdpa Posted November 9, 2010
I just discovered a bug in my regex pattern for parsing UBB code. It happens only in very specific situations.
$text = '[i]i asdfasdf[/i]';
$regex = preg_match('/\[i\]([^(\[\/i\])].*)\[\/i\]/s', $text);
The match fails when there is an "i" directly following the opening bbcode tag. That's all. If I put a space between the "i" and the tag, the match is successful. What's wrong?
justsomeguy Posted November 9, 2010
This part:
[^(\[\/i\])]
is telling it to look for any single character that is not an open or close paren, an open or close bracket, a slash, or an i. It's not negating that entire sequence, it's negating each character. That's what the square brackets do: they define a character class. So it should fail the same way if you have a paren, a square bracket, or a slash in place of the i.
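The behaviour described above can be checked with a short sketch. The helper name `acceptedByClass` and the extra sample strings are assumptions added for illustration; the pattern itself is the one from the first post.

```php
<?php
// The character class from the original pattern, anchored so it tests one character.
$class = '/^[^(\[\/i\])]$/';

// Hypothetical helper: true when the single character $ch is accepted by the class.
function acceptedByClass(string $class, string $ch): bool
{
    return preg_match($class, $ch) === 1;
}

// The original pattern from the first post.
$original = '/\[i\]([^(\[\/i\])].*)\[\/i\]/s';

$failsOnI     = preg_match($original, '[i]i asdfasdf[/i]');  // content starts with "i"
$failsOnParen = preg_match($original, '[i](asdfasdf)[/i]');  // content starts with "("
$worksOnSpace = preg_match($original, '[i] i asdfasdf[/i]'); // content starts with " "

var_dump($failsOnI, $failsOnParen, $worksOnSpace); // int(0) int(0) int(1)
```

Any of the six characters listed in the class causes the same failure when it comes first; the letter "i" is just the one most likely to appear in real text.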
ShadowMage Posted November 9, 2010
I think that you can replace:
([^(\[\/i\])].*)
with
(.*?)
to make it ungreedy, meaning it will take as few characters as possible while still matching the pattern. I can't remember if that's the exact syntax, though.
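For what it's worth, `.*?` is valid PCRE syntax for an ungreedy quantifier. A minimal sketch of the difference, assuming a sample string with two tag pairs (the sample text and variable names are made up for illustration):

```php
<?php
// Sample input with two separate [i]...[/i] pairs (illustrative only).
$text = '[i]one[/i] and [i]two[/i]';

// Greedy: .* runs to the LAST closing tag it can reach.
preg_match('/\[i\](.*)\[\/i\]/s', $text, $greedy);

// Ungreedy: .*? stops at the FIRST closing tag.
preg_match('/\[i\](.*?)\[\/i\]/s', $text, $lazy);

echo $greedy[1], "\n"; // one[/i] and [i]two
echo $lazy[1], "\n";   // one
```

The ungreedy form also matches the original problem string `[i]i asdfasdf[/i]`, since nothing restricts the first character of the content.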
Fmdpa (Author) Posted November 9, 2010
Actually, I had the non-greedy match earlier, but the error still persisted.
[^(\[\/i\])]
I thought this meant that the entire group was negated. How would I negate the group as a whole?
justsomeguy Posted November 9, 2010
It can't be in square brackets. Square brackets don't indicate a group or pattern, they indicate a class or range of characters.
Fmdpa (Author) Posted November 9, 2010
Well... I can only negate characters if they are in a character class, right?
justsomeguy Posted November 10, 2010
I believe so. It sounds like you want to use an ungreedy match like ShadowMage suggested. It's not going to handle nested tags, but neither would the original pattern.
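Single characters are negated with a character class, but PCRE can also reject a whole sequence at a given position with a negative lookahead, `(?!...)`. The following is only a sketch of that idea, not something proposed in the thread: each content character is consumed only if the text at that position does not start with `[/i]`. The variable names are assumptions for illustration.

```php
<?php
// (?!\[\/i\]) fails wherever the closing tag begins, so the repeated group
// consumes characters one at a time until it reaches "[/i]".
$pattern = '/\[i\]((?:(?!\[\/i\]).)+)\[\/i\]/s';

// The string from the first post now matches, leading "i" included.
preg_match($pattern, '[i]i asdfasdf[/i]', $first);
echo $first[1], "\n"; // i asdfasdf

// With two pairs, the match stops at the first closing tag.
preg_match($pattern, '[i]one[/i] and [i]two[/i]', $second);
echo $second[1], "\n"; // one
```

For a single non-nested tag this behaves like the ungreedy `(.*?)`, so the simpler form is usually enough.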
Fmdpa (Author) Posted November 10, 2010
This is the only regex I could come up with that ensures a properly opened/closed pair of tags. Would a regex that handles nested tags be much more complicated?
justsomeguy Posted November 10, 2010
Frankly, I'm not even sure how to write that. You would need to either use backreferences and only support a limited nesting depth, or use recursion in the pattern.
http://www.php.net/manual/en/regexp.reference.recursive.php
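One way the recursion option might look is sketched below. It uses `(?R)`, which PCRE documents as a recursive call of the whole pattern. The function name `convertItalics` and the choice to convert to `<i>` tags are assumptions for illustration, not something given in the thread.

```php
<?php
// Hypothetical helper: converts balanced, possibly nested [i]...[/i] pairs.
function convertItalics(string $text): string
{
    // Content is any mix of:
    //   [^\[]++          text without an opening bracket (possessive),
    //   \[(?!\/?i\])     a "[" that does not start [i] or [/i],
    //   (?R)             a complete nested [i]...[/i] pair (recursion).
    $pattern = '/\[i\]((?:[^\[]++|\[(?!\/?i\])|(?R))*)\[\/i\]/';

    $result = preg_replace_callback(
        $pattern,
        function (array $m): string {
            // Convert the inner content too, so nested pairs are handled.
            return '<i>' . convertItalics($m[1]) . '</i>';
        },
        $text
    );

    // preg_replace_callback returns null on error; fall back to the input.
    return $result ?? $text;
}

echo convertItalics('[i]a[i]b[/i]c[/i]'), "\n";     // <i>a<i>b</i>c</i>
echo convertItalics('[i]open [i]closed[/i]'), "\n"; // [i]open <i>closed</i>
```

An opening tag with no matching close is left as literal text, because the pattern only matches balanced pairs.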
birbal Posted November 10, 2010
I split the opening tag and the closing tag into different patterns and replaced each with the corresponding HTML tag. I am not matching the content ("some text") in preg_match; it simply formats the text between properly opened and closed bbcode tags. So it handles nested tags as well (HTML only, not necessarily valid XHTML). Something like this (pattern >> replacement):
[i] >> <i>
[/i] >> </i>
[b] >> <b>
[/b] >> </b>
Text: [i][b]some text[/b][/i]
Changes into: <i><b>some text</b></i>
Text: [b ][i ]some text[/i ][/b ]
Changes into: <b><i>some text</i></b>
So far it is working well for me.
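The post does not show the actual code, so the following is only a guess at the tag-by-tag approach, using `strtr` with a replacement map instead of separate regex patterns. The names `$map` and `convertTags` are assumptions.

```php
<?php
// Each bbcode tag is mapped to its HTML counterpart independently,
// without looking at the text between the tags.
$map = [
    '[i]'  => '<i>',
    '[/i]' => '</i>',
    '[b]'  => '<b>',
    '[/b]' => '</b>',
];

// Hypothetical helper wrapping strtr for readability.
function convertTags(string $text, array $map): string
{
    return strtr($text, $map);
}

echo convertTags('[i][b]some text[/b][/i]', $map), "\n"; // <i><b>some text</b></i>
echo convertTags('[b]bold text', $map), "\n";            // <b>bold text
```

Nesting works because nothing is checked, which is also why an unclosed `[b]` passes straight through as an unclosed `<b>`.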
Fmdpa (Author) Posted November 10, 2010
birbal, what happens when you don't close the bbcode tag, such as this:
[b]bold text
Will it convert to an unclosed HTML <b> tag, causing the rest of the page to be bold?
birbal Posted November 10, 2010
Quoting Fmdpa: "birbal, what happens when you don't close the bbcode tag, such as this: [b]bold text Will it convert to an unclosed html <b> tag, causing the rest of the page to be bold?"
Yes, if the closing tag is not there, it will make everything after it bold. :/
birbal Posted November 10, 2010
This morning I was trying to fix the problem above and found a way, though it may still have some problems. It seems OK to me so far and it validates. If I am wrong in any way, correct me; I'd like some correction and confirmation from the experts.
Pattern 1: (\[b\])[a-zA-Z0-9\s\[\]]{1,1000}(?=\[/b\])
Pattern 2: (?<=\[b\])[a-zA-Z0-9\s\[\]]{1,1000}(\[\/b\])
Replacement 1: <b>
Replacement 2: </b>
What it does: the first pattern replaces with "<b>" only when there are at least 1 and at most 1000 characters followed by a closing "[/b]" tag. If that is not the case, nothing is replaced. The second pattern replaces with "</b>" when there is an opening tag followed by at least one character of text. I think it will now handle nested tags and also check that there is a closing tag.
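One caveat with the patterns as posted: the text between the tags is part of the match, so replacing the whole match with `<b>` or `</b>` would drop that text. A lookahead is zero-width, so moving the text inside `(?=...)` leaves it untouched. The sketch below shows that variant next to a simpler capture-group version; both function names are assumptions, and neither handles nested `[b]` tags.

```php
<?php
// Replaces an opening [b] only when a closing [/b] follows within 1000 characters.
// The lookahead checks the text ahead without consuming it.
function convertOpeningBold(string $text): string
{
    return preg_replace('/\[b\](?=.{1,1000}?\[\/b\])/s', '<b>', $text) ?? $text;
}

// Replaces a complete [b]...[/b] pair in one step, keeping the text via $1.
function convertBoldPairs(string $text): string
{
    return preg_replace('/\[b\](.{1,1000}?)\[\/b\]/s', '<b>$1</b>', $text) ?? $text;
}

echo convertOpeningBold('[b]bold[/b]'), "\n";              // <b>bold[/b]
echo convertOpeningBold('[b]bold text'), "\n";             // [b]bold text
echo convertBoldPairs('[b]bold[/b] and [b]unclosed'), "\n"; // <b>bold</b> and [b]unclosed
```

The capture-group version converts both tags together, so an unclosed `[b]` stays as plain text and never becomes an unclosed `<b>`.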
Fmdpa (Author) Posted November 10, 2010
I never figured out how to use look-ahead assertions. It looks like they might be the key.
This topic is now archived and is closed to further replies.