Working with Octal Numbers in Regular Expressions


iwato

BACKGROUND: The following is a quote from a page entitled Escape Sequences taken from the online PHP Manual. I would not say that I have no idea what it means, but if someone asked me to explain it, I could not.

After "\0" up to two further octal digits are read. In both cases, if there are fewer than two digits, just those that are present are used. Thus the sequence "\0\x\07" specifies two binary zeros followed by a BEL character. Make sure you supply two digits after the initial zero if the character that follows is itself an octal digit.
What I believe I understand:
  • ddd - An up to three-digit decimal number expressing some character.
  • \ooo - An up to three-digit octal number expressing some character.
  • \xhh - An up to two-digit hexadecimal number expressing some character.

Three different ways to write the alarm (BEL) character.

  • 7
  • \007
  • \x07

Based upon the above, I see in the expression \0\x\07 the following:

  • \07 -- a short-hand for the octal number 007.
  • \x\07 -- another expression for the hexadecimal number 07 that could also be written as \x07 with the same value and meaning.
  • \0\x\07 -- a short-hand for the octal number 007 that could also be written as \007.

Now, could someone please interpret the quoted paragraph above in a manner that I might understand? Please address the correctness or wrongness of my stated knowledge in your interpretation.

Roddy


I hate these kinds of wicked regexp witchcraft... eval() is the only thing that's more evil... at least this regexp witchcraft is sometimes the only way, so that sort of redeems it.

Now... as for how "\0\x\07" is read... it's from left to right, so here's basically what the computer ends up thinking (in a rather colorful sense):

  • \ -- escaping something...
  • 0 -- a number... it's an octal character code then... two more digits to follow, here we come...
  • \ -- escaping something else... wait, what? Fine... let's collect the digits up until now... 0... the character 0 (NULL) then. OK, let's see what we'll escape next...
  • x -- The character "x"... Oh, a hex character code then... two digits to follow... let's see...
  • \ -- Escaping something else... come again? What about the character code? Let's see what we have for it... nothing... that's like 0, right? OK, so the character 0 (NULL) then... let's see what are we escaping next...
  • 0 -- Another octal escape sequence... great... two more digits... next...
  • 7 -- OK, we have "07" so far... one more... next...
  • -end of string- -- What? Done already? OK, fine... we have 07 up until now... that's the character 007 (BELL).
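
The same left-to-right walk can be sketched as a tiny decoder. This is only an illustration of the rule quoted from the manual (after "\0" or "\x", read up to two further digits and use whatever is present, even none); it is not PCRE's actual code, and the name `decode_escapes` is made up for this example. It is written in Python purely so it can be run standalone, and it treats any other escaped character as a literal, which is a simplification.

```python
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def decode_escapes(pattern):
    """Return the character codes obtained by reading `pattern` left to right.

    Hypothetical helper: handles only "\\0" (octal) and "\\x" (hex) escapes,
    each followed by at most two further digits, as the quoted text describes.
    """
    codes = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch != "\\":
            codes.append(ord(ch))  # ordinary character, taken literally
            i += 1
            continue
        i += 1  # step past the backslash
        if i >= len(pattern):
            raise ValueError("pattern ends with a lone backslash")
        kind = pattern[i]
        i += 1
        if kind == "0":
            digits, base = OCTAL_DIGITS, 8
        elif kind == "x":
            digits, base = HEX_DIGITS, 16
        else:
            codes.append(ord(kind))  # simplification: other escapes are literal
            continue
        start = i
        # Collect up to two further digits; stop early at any non-digit.
        while i < len(pattern) and i - start < 2 and pattern[i] in digits:
            i += 1
        text = pattern[start:i]
        codes.append(int(text, base) if text else 0)  # no digits means code 0
    return codes


print(decode_escapes(r"\0\x\07"))  # [0, 0, 7]: two binary zeros, then BEL
print(decode_escapes(r"\0007"))    # [0, 55]: NUL, then the literal digit "7"
```

The second call shows why the manual says to supply two digits after the initial zero when the next character is itself an octal digit: "\07" alone would be read as a single BEL, while "\0007" gives a binary zero followed by a literal "7".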

Very nice, boen_robot. Fun, informative, and clear.

Read left to right when working with regular expressions. Got it. Having lived in Asia for a very long time, I am fine with either direction.

I was a little surprised that \x is interpreted as zero, however.

Roddy

I was a little surprised that \x is interpreted as zero, however.
That's one of the easier surprises to figure out... why do you think I think regexes are wicked? Exactly because they have more surprises like this. The only way to figure out what a wicked regex does is to break it down and parse it manually yourself in the fashion above.
