
Migrating to PCRE


boen_robot

I have a certain place in a project where I do

$out = ereg_replace('\\\$', '\\\\', $in);

The basic idea is that if the string ends with a "\", that backslash should be replaced with "\\". And this works for now.

Because this function is deprecated in PHP 5.3, and is probably scheduled for removal in PHP 6, I want to rewrite it to the PCRE equivalent. For some reason, the direct equivalent:

$out = preg_replace('/\\\$/', '\\\\', $in);

is not performing as expected. This is one sample input string:

\"changed\

The expected output is

\"changed\\

but what happens instead is that the original string is returned, i.e. it looks as if nothing is matched.

Is there some kind of subtle difference in regex dialects I'm not aware of? What could it be? Ideas?


One slash to escape the slash in PHP, one "literal" slash (from PHP's perspective) for the regex escape slash, one literal (to both PHP and the regex) slash. I verified that

'/\\\$/'

is evaluated as

/\\$/

(or at least var_dump() shows it like that).

Which (AFAIK) should mean a literal backslash at the end of the string, though like I said, it doesn't match for some reason.

For the sake of experiment, I tried using two and four slashes.

'/\\$/'

is evaluated as

/\$/

and obviously doesn't work.

'/\\\\$/'

which, similarly to having three slashes, is evaluated as

/\\$/

but again doesn't work properly.
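For anyone who wants to check the evaluation themselves, here is a minimal sketch that prints what PHP actually hands to the regex engine for each single-quoted literal, and whether each pattern finds the trailing backslash in the sample input. The array keys and variable names are made up for illustration.

```php
<?php
// In a single-quoted PHP string only \\ and \' are escape sequences;
// a backslash before any other character (such as $) is kept as typed.
$patterns = [
    'two'   => '/\\$/',
    'three' => '/\\\$/',
    'four'  => '/\\\\$/',
];

foreach ($patterns as $label => $pattern) {
    // strlen() counts the characters the regex engine will actually see.
    echo $label, ': ', $pattern, ' (', strlen($pattern), " chars)\n";
}

// The sample input from the first post: one backslash, "changed" in
// double quotes' opening mark, then one trailing backslash.
$subject = '\"changed\\';

foreach ($patterns as $label => $pattern) {
    // preg_match() returns 1 on a match and 0 on no match.
    echo $label, ' matches: ', preg_match($pattern, $subject), "\n";
}
```

The three-slash and four-slash literals come out as the same five-character pattern, and both of them do find the backslash at the end of the sample string, so the pattern is probably not the part to be suspicious of.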


I have a certain place in a project where I do
$out = ereg_replace('\\\$', '\\\\', $in);

The basic idea is that if the string ends with a "\", the slash itself should be replaced with "\\". And this works for now.

I am surprised that this works, as it would appear that the first backward slash in the pattern \\\$ escapes the second, and that the third escapes the $ metacharacter that would otherwise indicate the end of the line.

Roddy

At first, I was surprised myself. I thought that a backslash escapes a backslash in PHP only if it's at the end of a string. As it turns out, it escapes it everywhere, but it's optional, so if you want to write "\\a", you have to write "\\\a" or "\\\\a", but not just "\\a".

But again, what could PCRE be doing?
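A quick way to see the "optional" escaping in single-quoted strings is to compare the lengths of the three spellings. This is only a sketch, and the labels are hypothetical.

```php
<?php
// Each value is the single-quoted literal exactly as typed in the source.
$literals = [
    'two typed backslashes'   => '\\a',
    'three typed backslashes' => '\\\a',
    'four typed backslashes'  => '\\\\a',
];

foreach ($literals as $label => $value) {
    // The printed value is what PHP stores after processing the literal.
    echo $label, ' => ', $value, ' (', strlen($value), " chars)\n";
}
```

Two typed backslashes collapse into one, while the three-slash and four-slash spellings both end up as two backslashes followed by "a".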


The only thing I can think of would be that it's parsing the pattern incorrectly, where for some reason it's seeing \\$ as a slash followed by an escaped $. I can't imagine why it would parse it that way, it's pretty counter-intuitive, but I guess parentheses could clear that up.


At first, I was surprised myself. I thought that a backslash escapes a backslash in PHP only if it's at the end of a string. As it turns out, it escapes it everywhere, but it's optional, so if you want to write "\\a", you have to write "\\\a" or "\\\\a", but not just "\\a".
No, no! If you were using the character a only as an example, it is the wrong character to be used for that. According to the TextWrangler manual, \a is a special character for the so-called hexadecimal BEL or alarm character: \x07.

By way of further suggestion, why not substitute hexadecimal characters where you are having trouble? For example, you could rewrite $ as \x24. Parentheses might also work, as Justsomeguy suggested.

Roddy

After doing some interesting combinations with the pattern (the parentheses and hex included), I decided to go wild and try something with the replacement... and it worked (why must it always be the one place you thought of as "unlikely to affect the result"?). I later found out why.

The following did the trick:

$out = preg_replace('/\\\$/m', '\\\\\\', $in);

Notice the difference? The pattern itself was OK (though I added the "m" modifier... just in case), but the replacement had to have 6 slashes (evaluated as 3 slashes), not 4 slashes (evaluated as 2 slashes).

The reason, for anyone interested, is on preg_replace()'s page, more precisely in the description of the replacement parameter. It says that backreferences are accessible with "\\n", where n is the number. The tricky part is that this is what you write as a PHP string, but it is evaluated as "\n" (not to be confused with the double-quoted PHP string "\n", which is evaluated as a new line).

So, to escape it all, you have to have 4 slashes, evaluated as 2 in the string, which evaluate to a literal "\" in the replacement, plus 2 slashes, evaluated as one more "\" in the string, which is then treated literally in the replacement, thanks to the fact that it's the very last character.

What happens with 4 slashes instead is that the last slash does get matched... but it's replaced by a single slash again, so it looked as if nothing had occurred.

Issues like this remind me why I love the likes of XPath and avoid regular expressions.
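To make the 4-versus-6 difference concrete, here is a small sketch that runs both replacements on the sample input, plus one alternative that avoids backslashes in the replacement by reusing the match through $0. The variable names are made up, and the "m" modifier is left out on purpose.

```php
<?php
// The sample input from the first post; it ends with one backslash.
$in = '\"changed\\';

// 4 typed backslashes: PHP passes 2, and preg_replace() reads that pair
// as one escaped backslash, so the output is the same as the input.
$four = preg_replace('/\\\\$/', '\\\\', $in);

// 6 typed backslashes: PHP passes 3. The first pair yields one backslash
// and the trailing lone backslash is kept as it is, giving two in total.
$six = preg_replace('/\\\\$/', '\\\\\\', $in);

// Alternative sketch: $0 stands for the whole match, so repeating it
// doubles the backslash with no backslashes in the replacement at all.
$viaBackref = preg_replace('/\\\\$/', '$0$0', $in);

echo $four, "\n", $six, "\n", $viaBackref, "\n";
```

One thing to keep in mind about the "m" modifier: it makes $ match at the end of every line, so on multi-line input a backslash at the end of each line would be doubled. If only the very end of the string should count, \z can be used in place of $.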

