
regex to get id and date from string


thescientist


Hey. So I am working on a mobile mail web app, and I need to implement a different sort of fetching of messages from an IMAP server due to performance issues. Unfortunately I have very little regex experience and need some help accomplishing the retrieval of data from the response.

Sample Response

* 1 FETCH (FLAGS (\Seen) INTERNALDATE " 2-Jul-2009 20:12:11 -0400" RFC822.SIZE 1462)
* 2 FETCH (FLAGS () INTERNALDATE " 2-Jul-2009 20:21:25 -0400" RFC822.SIZE 1462)
* 3 FETCH (FLAGS () INTERNALDATE " 2-Jul-2009 20:33:53 -0400" RFC822.SIZE 1461)
..
* 25 FETCH (FLAGS () INTERNALDATE "11-Jul-2009 06:31:59 -0400" RFC822.SIZE 1491)

The first number is the messageId, and between the quotes is a timestamp of when the message was received. I need to be able to get the number _and_ the date out. I have the date using its own preg_match_all, and I'm sure I can do the ID just as easily, then loop through both match arrays and build a multidimensional array in which each index holds a corresponding ID and timestamp.

However, I am wondering if a regex could somehow do them both at the same time (and what might that look like), or should I extract each piece of info separately and then "merge" the two arrays into a custom structure that I can use in my app?

Thanks in advance for any advice and consideration.

edit: if anyone has any advice on how to just get the ID, that would be helpful too... :)

edit edit: I guess I could just explode on the * and get the text before the first space.


edit edit: I guess I could just explode on the * and get the text before the first space.
you just need the number right? if so, explode on '* ', then explode on ' '; that would get you the id number.

EDIT: as far as getting the date, you have a discrepancy in format: the top 3 have a space after the " and the last one doesn't.
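A minimal sketch of that explode approach, assuming the response looks like the sample in the first post and that "* " only ever appears at the start of a message line. The function name `extractIds` and the shortened `$resp` sample are made up for illustration.

```php
<?php
// Hypothetical sample: the first two lines of the response from the first post.
$resp = <<<'EOT'
* 1 FETCH (FLAGS (\Seen) INTERNALDATE " 2-Jul-2009 20:12:11 -0400" RFC822.SIZE 1462)
* 2 FETCH (FLAGS () INTERNALDATE " 2-Jul-2009 20:21:25 -0400" RFC822.SIZE 1462)
EOT;

function extractIds(string $resp): array
{
    $ids = array();
    // Splitting on "* " leaves one chunk per message, plus an empty first chunk.
    foreach (explode('* ', $resp) as $chunk) {
        if ($chunk === '') {
            continue;
        }
        // The ID is the text before the first space in the chunk.
        $parts = explode(' ', $chunk, 2);
        $ids[] = (int) $parts[0];
    }
    return $ids;
}

print_r(extractIds($resp));
```

This only recovers the IDs; the dates still need a second step.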

right. but i need the date too. What I'm asking, I suppose, is whether it's possible to get them both with one regex, or should I just take two passes at the response and get the IDs first, then the dates, then make my final multi-D array from that.

edit: at this point I'm just trying to look at things from an efficiency point of view.
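For comparison, the two-pass idea could be sketched roughly like this. It assumes one message per line and that both passes find the same number of matches; `twoPassParse` and the shortened `$resp` sample are hypothetical names and data for illustration.

```php
<?php
// Hypothetical sample: two lines from the response in the first post.
$resp = <<<'EOT'
* 1 FETCH (FLAGS (\Seen) INTERNALDATE " 2-Jul-2009 20:12:11 -0400" RFC822.SIZE 1462)
* 25 FETCH (FLAGS () INTERNALDATE "11-Jul-2009 06:31:59 -0400" RFC822.SIZE 1491)
EOT;

function twoPassParse(string $resp): array
{
    // Pass 1: the number after the leading "* " on each line.
    preg_match_all('/^\* (\d+) /m', $resp, $ids);
    // Pass 2: whatever sits between the double quotes.
    preg_match_all('/"([^"]+)"/', $resp, $dates);

    // Pair the Nth ID with the Nth date.
    return array_map(
        function ($id, $date) {
            return array('id' => (int) $id, 'date' => trim($date));
        },
        $ids[1],
        $dates[1]
    );
}

print_r(twoPassParse($resp));
```

The pairing relies purely on position, so if one pass misses a line the IDs and dates drift out of step. That is one practical argument for capturing both in a single pattern.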


You can do everything with one pattern. preg_match_all returns every match for every subpattern, so you should experiment with it to see what the result array looks like with multiple subpatterns and multiple matches. In the pattern, each thing in parentheses is a capturable subpattern, so the ID and the date both need parens around them, e.g.:

$pattern = '#^\* ([0-9]*) [^"]*"([^"]*)".*$#m';

The two subpatterns in the parens are going to be returned as matches: the first is the ID, and the second is the date. Make sure to play around with the return array to figure out the format and order that the matches will be in. You can always add more subpatterns just for clarity in the regex and ignore them in the return array.
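As a rough sketch of what that could look like against the sample response, assuming one FETCH entry per line: the function name `parseFetch` and the shortened `$resp` are invented for illustration. The sketch uses only the m modifier, because with s also set the trailing .* would run to the end of the whole response and only the first message would match. It also uses [0-9]+ so that an empty ID cannot match.

```php
<?php
// Hypothetical sample: three lines from the response in the first post.
$resp = <<<'EOT'
* 1 FETCH (FLAGS (\Seen) INTERNALDATE " 2-Jul-2009 20:12:11 -0400" RFC822.SIZE 1462)
* 2 FETCH (FLAGS () INTERNALDATE " 2-Jul-2009 20:21:25 -0400" RFC822.SIZE 1462)
* 25 FETCH (FLAGS () INTERNALDATE "11-Jul-2009 06:31:59 -0400" RFC822.SIZE 1491)
EOT;

function parseFetch(string $resp): array
{
    // Subpattern 1 is the ID, subpattern 2 is the quoted date.
    $pattern = '#^\* ([0-9]+) [^"]*"([^"]*)".*$#m';
    $parsed = array();

    // PREG_SET_ORDER groups the captures per match: $m[1] and $m[2] belong together.
    if (preg_match_all($pattern, $resp, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // trim() drops the padding space in dates such as " 2-Jul-2009 ...".
            $parsed[] = array('id' => (int) $m[1], 'date' => trim($m[2]));
        }
    }
    return $parsed;
}

print_r(parseFetch($resp));
```

With the default PREG_PATTERN_ORDER the same data comes back as $matches[1] (all IDs) and $matches[2] (all dates), which is the layout to expect when experimenting with the return array.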


sweet, thanks for the help JSG. I was able to implement getting the ID using simple string functions, but only because this was a minor part of the overall task. Hopefully after my push tomorrow and the greater part of the functionality is resolved, I can go back and make adjustments to this as I go back and clean up the code. (or if I have to make more fixes...)


any thoughts on how to do the same thing but with this kind of string? I don't have the convenience of the parens around the UID or the quotes around the date now.

* 1 FETCH (UID 7 BODY[HEADER.FIELDS (date)] {41} Date: Mon, 15 Nov 2010 11:14:52 -0500 )
* 2 FETCH (UID 10 BODY[HEADER.FIELDS (date)] {47} Date: Mon, 21 Mar 2011 11:32:26 -0400 (EDT) )
* 3 FETCH (UID 67 BODY[HEADER.FIELDS (date)] {41} Date: Thu, 09 Jun 2011 12:11:39 -0400 )
..
* 18 FETCH (UID 164 BODY[HEADER.FIELDS (date)] {41} Date: Fri, 05 Aug 2011 15:46:02 -0400 )

I mean, I got it using string functions, but if there's a way to do it with regex, it might look a little cleaner and be more efficient.

$parsedResponse = array();
$messages = array();
$resp = $this->sendCmd('FETCH 1:* (body[header.fields (date)] UID)');
$logger->debug('sendCmd Response => ');
$logger->dump($resp);

$messages = explode('*', $resp);
array_shift($messages);
$logger->dump($messages);

for($i = 0, $l = count($messages); $i < $l; $i++){
  $string = ltrim($messages[$i]);
  $uidStartPos = (strpos($string, 'UID') + 4);
  $uidEndPos = strpos($string, ' ', $uidStartPos);
  $dateStartPos = (strpos($string, 'Date') + 5);
  $dateEndPos = strrpos($string, ')');
  $id = rtrim(ltrim(substr($string, $uidStartPos, $uidEndPos - $uidStartPos)));
  $timestamp = rtrim(ltrim(substr($string, $dateStartPos, $dateEndPos - $dateStartPos)));
  array_push($parsedResponse, array('id' => $id, 'timestamp' => strtotime($timestamp)));
}

ugh... I'm such a regex n00b... :)


so the final string format has come to be this:

* 1 FETCH (UID 7 INTERNALDATE "15-Nov-2010 11:14:52 -0500")
* 2 FETCH (UID 10 INTERNALDATE "21-Mar-2011 11:32:51 -0400")
* 3 FETCH (UID 67 INTERNALDATE " 9-Jun-2011 12:11:41 -0400")
* 4 FETCH (UID 76 INTERNALDATE " 7-Jun-2011 14:16:39 -0400")
* 5 FETCH (UID 79 INTERNALDATE "30-Jun-2011 12:52:57 -0400")
* 6 FETCH (UID 80 INTERNALDATE "15-Jul-2011 15:06:41 -0400")

individually, in separate preg_match_all calls, I can get them, but I'm having trouble combining them into one pattern, even using what you've provided.

preg_match_all('/UID (\d+)/', $resp, $m, PREG_PATTERN_ORDER);
var_dump($m[1]);
echo '<br><hr>';
preg_match_all('/"([^"]+)"/', $resp, $p, PREG_PATTERN_ORDER);
var_dump($p[1]);
echo '<br><hr>';
echo 'ALL in One<br>';
$pattern = '#^\* UID (\d+) .* "([^"]+)"$#ms';
//$pattern = '/UID (\d+) "([^"]+)"/';
preg_match_all($pattern, $resp, $x, PREG_PATTERN_ORDER);
var_dump($x);

I don't really understand a lot of the characters at the beginning and end, just the basic pattern stuff

UID ([0-9]*)"([^"]+)"

I actually thought I had them both, but then changed something and it all broke. :)


You want it to start looking at the beginning of each line, and stop at the end of each line:

^$

You could specify exactly what should be at the beginning of the line, but since you don't care then you're just matching any character, 0 or more times:

^.*$

Then you want the text "UID", followed by a space, followed by 1 or more numbers, which you want to capture:

^.*UID ([0-9]+)$

Then skip any character that is not a quote:

^.*UID ([0-9]+)[^"]*$

Then get a quote, then one or more characters that are not a quote (and capture them), then another quote. Might as well add the space before the quote for clarity, since it exists in the string before the quote also:

^.*UID ([0-9]+)[^"]* "([^"]+)"$

Then, any number of characters after the last quote, until the end of the line:

^.*UID ([0-9]+)[^"]* "([^"]+)".*$

Then add the m modifier, so it works line by line:

$pattern = '#^.*UID ([0-9]+)[^"]* "([^"]+)".*$#m';
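Here is a minimal sketch of that final pattern applied to the UID / INTERNALDATE format, building the same id/timestamp structure as the string-function version earlier in the thread. The function name `parseUidDates` and the shortened `$resp` sample are assumptions for illustration, and the sketch assumes each FETCH entry sits on its own line.

```php
<?php
// Hypothetical sample: three lines in the final response format.
$resp = <<<'EOT'
* 1 FETCH (UID 7 INTERNALDATE "15-Nov-2010 11:14:52 -0500")
* 2 FETCH (UID 10 INTERNALDATE "21-Mar-2011 11:32:51 -0400")
* 3 FETCH (UID 67 INTERNALDATE " 9-Jun-2011 12:11:41 -0400")
EOT;

function parseUidDates(string $resp): array
{
    $pattern = '#^.*UID ([0-9]+)[^"]* "([^"]+)".*$#m';
    $parsed = array();

    // One entry in $matches per line: [0] whole line, [1] UID, [2] date text.
    preg_match_all($pattern, $resp, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $parsed[] = array(
            'id' => (int) $m[1],
            // trim() removes the padding space in dates such as " 9-Jun-2011 ...".
            'timestamp' => strtotime(trim($m[2])),
        );
    }
    return $parsed;
}

print_r(parseUidDates($resp));
```

Because the date string carries its own UTC offset, the resulting timestamps should not depend on the server's default timezone.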

The main problem with this pattern:

$pattern = '#^\* UID (\d+) .* "([^"]+)"$#ms';

is the spaces. A space in a pattern matches an actual space character in the string, and your pattern has a space before "UID" which doesn't appear in the text. You also have a backslash at the start which should be a dot.
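One quick way to see the difference is to compare the return value of preg_match_all, which is the number of full matches found. A small sketch, using a shortened two-line sample as a stand-in for the real response (the variable names are made up for illustration):

```php
<?php
// Hypothetical sample: two lines in the final response format.
$resp = <<<'EOT'
* 1 FETCH (UID 7 INTERNALDATE "15-Nov-2010 11:14:52 -0500")
* 2 FETCH (UID 10 INTERNALDATE "21-Mar-2011 11:32:51 -0400")
EOT;

// The attempt from the question: requires "* UID" at the very start of a line.
$broken = '#^\* UID (\d+) .* "([^"]+)"$#ms';
// The step-by-step pattern: allows anything before "UID" and after the closing quote.
$fixed = '#^.*UID ([0-9]+)[^"]* "([^"]+)".*$#m';

$brokenCount = preg_match_all($broken, $resp, $x);
$fixedCount = preg_match_all($fixed, $resp, $y);

echo "broken: $brokenCount, fixed: $fixedCount\n"; // prints "broken: 0, fixed: 2"
```

Every line in the sample starts with "* 1 FETCH (" or similar, so the first pattern never gets past its opening "\* UID".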


It's just a series of small steps. This may look confusing:

^.*UID ([0-9]+)[^"]* "([^"]+)".*$

but it's made up of individual things that were added one at a time, each to solve one specific problem.
agreed. thanks for the breakdown. :)

Archived

This topic is now archived and is closed to further replies.
