ckrudelux, posted November 24, 2010:
Only the first row is replaced, not the other.
    preg_replace("/<item>(.*)<\/item>/", "<p>$1</p>", $content);
    <item>content</item><item>content</item>
How do I make both of them get replaced?
Ingolme, posted November 24, 2010:
Your regular expression is probably matching all the content between the first <item> tag and the last </item> tag. You should try a search that's not so greedy, like <item>(.*)?<\/item>
Since this is XML, I'd actually recommend using the DOM or XSLT instead.
ckrudelux (Author), posted November 24, 2010:
Ingolme wrote: Since this is XML, I'd actually recommend using the DOM or XSLT instead.
No, this was for another thing; it was just easier to explain this way.
ckrudelux (Author), posted November 24, 2010:
Well, adding the "?" didn't make any difference in the output.
Ingolme, posted November 24, 2010:
Actually, you might need to add "g" as a modifier as well.
birbal, posted November 24, 2010:
ckrudelux wrote: This first row is replaced but not the other.
I am not sure whether you are trying to match the whole string <item>content</item><item>content</item> as $content, or trying to match each element separately: <item>content</item> and <item>content</item>. I am also not sure which one you mean by the first/second row.
For the second case, though, I think this will work: <item>[a-z\s]*<\/item>
jeffman, posted November 24, 2010:
EDIT: See Posts 12-13 below. The principle here is correct, but the character class is messed up.
The biggest problem is that the dot character specifically will not match line breaks, so pretty-printed XML/HTML creates a problem. Birbal's solution comes close, but it will miss caps, digits, etc. To really capture EVERYTHING, try this:
    "/<item>([^\n\n]*)<\/item>/"
The central part will be interpreted as "match any character that is a line break or not a line break," which really does mean everything. Of course, you can do that with any character, but using the line break sequence will remind you why you are formatting the expression this way.
ckrudelux (Author), posted November 24, 2010:
birbal wrote: I am not sure which one you mean by the second row.
$content is just referring to this: <item>content</item><item>content</item>
This is the first match: <item>content</item>
This is the second match (starting at row 2): <item>content</item>
The second match isn't found for some reason. Maybe preg_replace can only find matches within a line and not across several lines, which I have a hard time believing.
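The symptom described here can be reproduced with a small sketch. The input below is an assumption (the thread never shows the real data): the first item sits on one line and the second item contains a line break. By default the dot does not match "\n", so only the first item is replaced.

```php
<?php
// Assumed input: the second <item> spans two lines (hypothetical data, not from the thread).
$content = "<item>one</item>\n<item>two\nthree</item>";

// Without the "s" modifier, "." stops at "\n", so the multi-line item can never match.
$result = preg_replace("/<item>(.*)<\/item>/", "<p>$1</p>", $content);

echo $result, "\n";
```

So preg_replace itself is not limited to a single line; it is the dot in this particular pattern that refuses to cross a line break.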
ckrudelux (Author), posted November 24, 2010:
jeffman wrote: To really capture EVERYTHING, try this: "/<item>([^\n\n]*)<\/item>/"
Why are you using two line breaks?
birbal, posted November 24, 2010:
^\n will match every character except a newline, and \n will match the newline as well.
ckrudelux (Author), posted November 24, 2010:
birbal wrote: ^\n will match every character except a newline, and \n will match the newline as well.
It still doesn't work; nothing gets replaced :)
This is my line:
    preg_replace("/<item>([^\n\n]*)<\/item>/", "replaced", $file)
jeffman, posted November 24, 2010:
Yeah, you're right. I changed your HTML without realizing it, so I didn't notice the character class is broken. Give me a minute.
jeffman, posted November 24, 2010:
What I forgot was that the negation operator ( ^ ) affects everything in the brackets, not just the character immediately following. So we'll try a similar idea with a character that has a matching negation character:
    "/<item>([\s\S]*?)<\/item>/"
\s == all whitespace characters, including \n and \r
\S == all characters that are not whitespace characters.
So [\s\S] means "match all whitespace characters and all non-whitespace characters," which again means everything.
The ? operator is needed for this now, just as Ingolme explained in Post #5.
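A minimal sketch of why the ? matters once the class can cross line breaks. The input string is an assumption made up for illustration; the two patterns are the one from the post and the same pattern without the ?.

```php
<?php
// Assumed input: two items, the second containing a line break (hypothetical data).
$content = "<item>one</item>\n<item>two\nthree</item>";

// Lazy: [\s\S]*? stops at the first </item> it can reach, so each item matches on its own.
$lazy = preg_replace('/<item>([\s\S]*?)<\/item>/', '<p>$1</p>', $content);

// Greedy: [\s\S]* runs to the LAST </item>, swallowing everything in between as one match.
$greedy = preg_replace('/<item>([\s\S]*)<\/item>/', '<p>$1</p>', $content);

echo $lazy, "\n---\n", $greedy, "\n";
```

With the lazy version both items become separate <p> elements; with the greedy version there is a single <p> wrapped around the whole input.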
jeffman, posted November 24, 2010:
Okay, I really am sleepy today. I was sure a simple modifier would handle this, but I couldn't remember it. Using the \s character reminded me. The whole pattern can be simplified to this, which is almost what you had to begin with:
    "/<item>(.*?)<\/item>/s"
where the "s" modifier allows the dot to match line breaks as well.
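As a quick check of the "s" modifier, here is a small sketch that uses preg_match_all to list what the capture group picks up. The input string is assumed for illustration only.

```php
<?php
// Assumed input: the second item contains a line break (hypothetical data).
$content = "<item>one</item>\n<item>two\nthree</item>";

// The "s" modifier lets "." match "\n"; the "?" keeps each match as short as possible.
$count = preg_match_all('/<item>(.*?)<\/item>/s', $content, $matches);

// $matches[1] holds the text captured by the first group for every match.
print_r($matches[1]);
```

Both items are found, and the second capture keeps its embedded line break.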
ckrudelux (Author), posted November 24, 2010:
jeffman wrote: So we'll try a similar idea with a character that has a matching negation character: "/<item>([\s\S]*?)<\/item>/"
That was very helpful, and yes, it works now. Thank you very much for this information.
Dilated, posted November 24, 2010:
To clear things up:
The simplest way to do it is: preg_replace('/<item>(.*?)<\/item>/s', '<p>$1</p>', $content);
An ungreedy sign must always be put directly after the quantifier, not after the grouping. If it comes after anything but a quantifier, it becomes the "{0,1}" quantifier.
A negated set negates all characters in the set. [^\n\n] is redundant and is the same thing as [^\n].
The "g" modifier doesn't exist in PHP; this behavior is enabled by default for preg_replace. (You can limit preg_replace's iterations by using the 4th parameter.)
Also, the reason so many people are inclined to use "/[\s\S]*/" over "/.*/s" is that JavaScript's regular expression flavor did not have the "s" modifier when this was written, and a lot of people are used to JavaScript.
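The points in this summary can be checked with a short sketch. The input strings and variable names are made up for illustration and are not taken from the original poster's data.

```php
<?php
// Assumed single-line input with two items (hypothetical data).
$line = "<item>a</item><item>b</item>";

// "?" after the group only makes the group optional; the ".*" inside is still greedy.
$optionalGroup = preg_replace('/<item>(.*)?<\/item>/', '<p>$1</p>', $line);

// "?" directly after the quantifier makes the quantifier ungreedy.
$lazy = preg_replace('/<item>(.*?)<\/item>/', '<p>$1</p>', $line);

// preg_replace replaces every match by default; the 4th parameter caps the count.
$firstOnly = preg_replace('/<item>(.*?)<\/item>/', '<p>$1</p>', $line, 1);

// [^\n\n] and [^\n] are the same class: neither can cross a line break.
$multi = "<item>two\nthree</item>";
$doubled = preg_replace('/<item>([^\n\n]*)<\/item>/', 'replaced', $multi);
$single  = preg_replace('/<item>([^\n]*)<\/item>/', 'replaced', $multi);

echo $optionalGroup, "\n", $lazy, "\n", $firstOnly, "\n";
```

Only the second pattern replaces both items separately, and neither negated class touches the multi-line item, which matches what was reported earlier in the thread.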
This topic is now archived and is closed to further replies.