Get Contents From E-Mail And Write It To Mysql Table


IanArcher


The way I see it, you need this information: from the headers, you need the content-type and content-id headers. The content-type header contains the filename that you can extract, and the content-id header contains the CID used in the email. You need to process those to get a mapping between the CID and the filename, so that you can look up a particular CID to find out what the original filename was. If you store those in an associative array where the CID is the key and the filename is the value, that will make that part easy.

Once you have that, you can get all of the img tags from the email and loop through them. For each img tag, you need to check the src to see if it starts with "cid:". If it does, you extract the CID from the src and look up the original filename in the array described above. Then you replace the value of the src attribute to point to the file instead of the CID.

I'm not sure what you're asking about with regard to creating a scope. Do you want to create a namespace to store variables in?

When I say scope, I mean, for example, that mimeheaders.txt contains:

Content-Type: multipart/alternative;
	boundary="----=_NextPart_001_0015_01CCD6A7.473AA550"

Content-Type: image/gif; name="Image1[2].gif"
Content-Transfer-Encoding: base64
Content-ID: <813DEC0642E941F0845447B680DA566A@BrinardHP>

Content-Type: image/gif; name="Image2[2].gif"
Content-Transfer-Encoding: base64
Content-ID: <6829CB8F2D5D43AAB115DE06572E770A@BrinardHP>

The question is how to distinguish which name= attribute belongs to which Content-ID. So there is the matter of creating a scope of a certain range, so the script knows where to look. PHP says: "Oh, so you have this CID match, but there are multiple images in the entire email's MIME headers? Let's shrink the range to the correct placement and scope the proper name element match." So the above example for mimeheaders.txt becomes:

Content-Type: image/gif; name="Image2[2].gif"
Content-Transfer-Encoding: base64
Content-ID: <6829CB8F2D5D43AAB115DE06572E770A@BrinardHP>

PHP says: "We noticed this CID match is on the same line as Content-ID:, and a few lines away from it there is a name attribute with the value "Image2[2].gif", so this must be the name match you're looking for." That's really what I'm getting at.


Don't get all of the headers and write them to a text file. Get each set of headers individually, process that one set (get the name and CID from it), and add that to the array: $images[$cid] = $filename; Then get the next set of headers and process those. The end result should be an array that contains a mapping between CID and filename values, not a text file that contains all of the headers from every section in the email.
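To make the `$images[$cid] = $filename;` idea concrete, here is a minimal sketch that processes one set of headers at a time. The header strings are hard-coded stand-ins for what imap_fetchmime() would return for each part (copied from the example earlier in the thread), and the helper name `addImageFromHeaders` is hypothetical. It assumes the name= parameter sits on the same line as Content-Type, so folded headers would need extra handling.

```php
<?php
// Hypothetical helper: pull the name and Content-ID out of ONE part's headers
// and record them in the lookup array (CID => filename).
function addImageFromHeaders(array $images, string $headers): array
{
    $hasName = preg_match('/^Content-Type:[^\r\n]*\bname="([^"]+)"/mi', $headers, $nameMatch);
    $hasCid  = preg_match('/^Content-ID:\s*<([^>]+)>/mi', $headers, $cidMatch);

    // Only add an entry when this part has both values.
    if ($hasName === 1 && $hasCid === 1) {
        $images[$cidMatch[1]] = $nameMatch[1];
    }
    return $images;
}

// Stand-ins for the per-part headers (assumed data, taken from the thread).
$parts = [
    "Content-Type: multipart/alternative;\r\n\tboundary=\"----=_NextPart_001_0015_01CCD6A7.473AA550\"\r\n",
    "Content-Type: image/gif; name=\"Image1[2].gif\"\r\nContent-Transfer-Encoding: base64\r\nContent-ID: <813DEC0642E941F0845447B680DA566A@BrinardHP>\r\n",
    "Content-Type: image/gif; name=\"Image2[2].gif\"\r\nContent-Transfer-Encoding: base64\r\nContent-ID: <6829CB8F2D5D43AAB115DE06572E770A@BrinardHP>\r\n",
];

$images = [];
foreach ($parts as $headers) {
    $images = addImageFromHeaders($images, $headers);
}

print_r($images);
```

Because each name and CID is read from the same set of headers, there is no need to work out afterwards which name belongs to which Content-ID.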

I don't really follow that last example you gave:
$images[$cid] = $filename;

Could you give me an example/sample please?


That is the example. Instead of your numerically indexed array starting at 0, use the CIDs as the indexes instead.

(
	["813DEC0642E941F0845447B680DA566A"] => Image1.gif,
	["6829CB8F2D5D43AAB115DE06572E770A"] => Image2.gif
)


I did this:
$srcMatch[2][0] = $image;
$cidMatch[1][0] = $cid;
$image[$cid] = $filename;
print_r($filename);

and it prints

Image2.gif

For learning purposes, how come it prints the filename of the image, though? I also changed the indexes to 1, for it to look for Image1[2] instead:

$srcMatch[2][1] = $image;
$cidMatch[1][1] = $cid;

But it still prints Image2[2].gif


The only thing you're telling it to print is the filename. If you want to print the entire array then use $image instead of $filename.
I know, but what I'm trying to understand is why the combination of $image and [$cid] prints the filename of the second image. Even if I do not define these variables as the img tag source matches and the CID matches, it appears anyway. Why is that?

You're not printing $image[$cid], you're printing $filename. I would need to see all of your code though, it looks like you've still got 2 arrays (an array of CIDs, and an array of filenames) when you should only have 1 that contains both.
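One possible source of the confusion, judging only from the lines quoted in the thread: PHP assignment copies the value on the right into the variable on the left, so `$srcMatch[2][0] = $image;` writes into `$srcMatch` rather than reading from it. A minimal sketch with made-up values (the contents of `$cidMatch` are assumptions, not the poster's real data):

```php
<?php
// Made-up stand-ins for the regex results in the thread.
$cidMatch = [1 => ['813DEC0642E941F0845447B680DA566A@BrinardHP']];
$filename = 'Image1[2].gif';

// The right-hand side is read, the left-hand side is written:
$cid = $cidMatch[1][0];      // read the CID out of the match array
$images = [];
$images[$cid] = $filename;   // store the filename under that CID

print_r($filename);          // prints only the filename string
echo "\n";
print_r($images);            // prints the whole CID => filename array
echo $images[$cid], "\n";    // looks the filename up by its CID
```

Printing `$filename` shows whatever was last assigned to it, regardless of what is stored in the array.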

Here's a copy of all of my code: http://pastebin.com/bRARHt0u

The $cidMatch and $nameAttr arrays are the two arrays that need to be joined. They need to be built at the same time. When you loop through to get the headers for each email you need to do the regular expression match to get the CID and name, and add those to one array.

$images = array();
for ($i = 1; $i <= $num_parts; $i++)
{
  $mimepart = imap_fetchmime($mbox, $no, $i);
  // use regex to get CID and name from headers
  $images[$cid] = $name;
}

You can figure out how many parts are in the email, you don't need to hardcode that at 12. If you use imap_fetchstructure, for example, one of the items it returns is an array of the parts. Once you build the above array to map the CID to filenames, then you can loop through the array of img tags and get the src of each one to check if it contains a CID and look up the CID in the previous array to get the filename of the image that it corresponds to.


Also, I am using the code below to strip the 'cid:' string off the CID match, but it keeps hitting the else branch:

if (array_search('cid:', $srcArr)) {
    str_replace('cid:', '', $srcArr);
} else {
    echo 'No cid affix found<br />';
}
echo '<hr>';

I also tried in_array and preg_replace, to no avail. Any ideas? This is how I am going to match up the Content-ID values with the cid matches.


array_search() is case sensitive and only matches whole elements, so it is not matching. You can use str_ireplace() for case-insensitive replacement. Rather than searching through the array, you can replace the cid directly using str_ireplace(). If you need to know how many replacements occurred, you can use the 4th parameter of that function: http://php.net/str_ireplace. That would be more efficient.

$srcArr = str_ireplace('cid:', '', $srcArr, $occurance);
if ($occurance === 0)
    echo 'No cid affix found<br />';
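A small sketch of that suggestion, using assumed sample src values. It relies on two documented behaviours: array_search() compares the needle against each whole element, so a prefix such as 'cid:' is not found, and str_ireplace() accepts an array subject and returns a new array rather than modifying the original.

```php
<?php
// Assumed sample data: src values collected from the img tags.
$srcArr = [
    'cid:813DEC0642E941F0845447B680DA566A@BrinardHP',
    'CID:6829CB8F2D5D43AAB115DE06572E770A@BrinardHP',
    'http://example.com/logo.gif',
];

// array_search() compares whole elements, so the prefix alone is not found.
var_dump(array_search('cid:', $srcArr));

// str_ireplace() returns a NEW array, so the result must be assigned.
$count = 0;
$cids = str_ireplace('cid:', '', $srcArr, $count);

if ($count === 0) {
    echo "No cid prefix found\n";
}
print_r($cids);
```

Note that str_ireplace() removes 'cid:' wherever it appears in a value, not only at the start, which is fine for this data but may be too broad in general.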


OK, so following your guide, justsomeguy, I have mapped the CIDs to the names using:

echo '<br />Array Joins<br/>';
$mergeArr = array_combine($nameArr, $cidArr);
echo '<pre>';
print_r($mergeArr);
echo '</pre>';

Which prints:

Array Joins
Array
(
    [Image1[2].gif] => 813DEC0642E941F0845447B680DA566A@BrinardHP
    [Image2[2].gif] => 6829CB8F2D5D43AAB115DE06572E770A@BrinardHP
    [] =>
)

Just to be sure, is this the kind of mapping you were referring to? If so, I was trying to search the array using a regexp to test it, but it kept throwing the "expects parameter to be a string, array given" error, as before. As far as I can see, there is nothing else in the manual that does this:

//preg_match_all('/[(.*)]\s+=>\s+(.*)\n/i');
if (preg_match_all('/[(.*)]\s=>\s(.*)/', $mergeArr, $mergeMatch)) {
    echo 'Matches<br />';
} else {
    echo 'No matches<br />';
}

This prints 'No matches'. If the above were successful, I would use something along the lines of the following code to do a replace:

echo 'Is $srcArr[0] equal to $cidArr[0]?<br />Answer:';
if ($srcArr[0] == $cidArr[0]) {
    // prints 'Yes'
    echo '<h1>Yes</h1>';
} else {
    echo '<h1>No</h1>';
}
preg_replace($srcArr[0], $mergeMatch[2], $html_part);

Where am I going wrong with the regexp and the replace?


I was suggesting to build the array at once, instead of building 2 arrays and then combining them. You can't necessarily guarantee that each entry in one array matches with the same entry in the other; maybe one of the sets of headers doesn't have a content ID, for example. I was suggesting to loop through each part to get the headers for just that part, check for a filename and content ID for that one part, and add them to the array at that time, rather than getting all content IDs, all filenames, and assuming they match up.

Other than that, the array values and keys are reversed. The key should be the content ID and the value should be the filename.

Regular expressions aren't used to search through arrays, they are used to search through strings. Hence the error. If you want to loop through the array and search each one you can, but the point of building that lookup array is so that you don't need to search. If you use a regular expression to match all of the img tags in the HTML, then you can loop through them, get the src for each tag, and extract the content ID. You can use the content ID to look up the filename in the array without needing to search through it. Once you have the filename, you go back and replace the src that contains the CID with the filename.
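The lookup-and-replace step might be sketched as follows. This is one possible approach, not code from the thread: it uses preg_replace_callback() so that every img tag is visited, and both the lookup array and the HTML body are assumed sample data.

```php
<?php
// Assumed lookup array (CID => filename).
$images = [
    '813DEC0642E941F0845447B680DA566A@BrinardHP' => 'Image1[2].gif',
    '6829CB8F2D5D43AAB115DE06572E770A@BrinardHP' => 'Image2[2].gif',
];

// Assumed HTML body containing two inline images.
$html_part = '<p><img src="cid:813DEC0642E941F0845447B680DA566A@BrinardHP" alt="one">'
           . '<img alt="two" src="cid:6829CB8F2D5D43AAB115DE06572E770A@BrinardHP"></p>';

// Visit every src="cid:..." inside an img tag and swap in the filename.
$result = preg_replace_callback(
    '/(<img\b[^>]*?\bsrc=")cid:([^"]+)(")/i',
    function (array $m) use ($images): string {
        $cid = $m[2];
        // Leave the tag untouched when the CID is not in the lookup array.
        if (!isset($images[$cid])) {
            return $m[0];
        }
        return $m[1] . $images[$cid] . $m[3];
    },
    $html_part
);

echo $result, "\n";
```

The callback runs once per matching img tag, so no manual loop or index bookkeeping is needed, and the array is only ever accessed by key.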


So if i have:

for ($i = 1; $i <= $num_parts; $i++) {
    $fp = fopen('mimeheaders.txt', 'w');
    $mimepart = imap_fetchmime($mbox, $no, $i);    // 1st part
    fwrite($fp, $mimepart);
    fclose($fp);
    $mimeSave = file_get_contents('mimeheaders.txt'); // save a copy of the headers
    $images = array();
}

Is that what you're referring to when you say:

"loop through each part to get the headers for just that part, check for a filename and content ID for that one part, and add them to the array at that time"

Using a regex, right? I'm not sure how to begin writing something like that. Could this be an example of what you are referring to?

$attachments = array();

// PARAMETERS
// get all parameters, like charset, filenames of attachments, etc.
$params = array();
if ($p->ifparameters)
    foreach ($p->parameters as $x)
        $params[ strtolower( $x->attribute ) ] = $x->value;
if ($p->ifdparameters)
    foreach ($p->dparameters as $x)
        $params[ strtolower( $x->attribute ) ] = $x->value;

// ATTACHMENT
// Any part with a filename is an attachment,
// so an attached text file (type 0) is not mistaken as the message.
if (!empty($params['filename']) || !empty($params['name'])) {
    // filename may be given as 'Filename' or 'Name' or both
    $filename = (!empty($params['filename'])) ? $params['filename'] : $params['name'];
    // filename may be encoded, so see imap_mime_header_decode()
    $attachments[$filename] = $data;  // this is a problem if two files have same name
}


That's the right loop to use, but instead of writing everything to a text file you run the regex on each one. You use the exact same regular expression you were already using, the only difference is that you're checking one set of headers at a time instead of collecting all of them in a text file and checking them all at once.

$images = array();
for ($i = 1; $i <= $num_parts; $i++)
{
  $headers = imap_fetchmime($mbox, $no, $i);
  $fname = '';
  $cid = '';
  $matches = array();

  preg_match('/Content-Type:(.*);\s+name="(.*)"/i', $headers, $matches);
  if (count($matches) >= 3)
    $fname = $matches[2];

  preg_match('/Content-ID:(.*)<(.*)>/i', $headers, $matches);
  if (count($matches) >= 3)
    $cid = $matches[2];

  if ($fname != '' && $cid != '')
    $images[$cid] = $fname;
}

print_r($images);


I ran your code, but nothing gets printed to the browser

Did you define $num_parts?
I was looking at the for loop, saw the $num_parts variable, and was trying to figure out whether something was being done with it. I have never actually used a for loop before, only foreach loops, so it should have occurred to me anyhow. But what should it actually be? I'm not too sure myself.
$num_parts = '';

or

$num_parts = array();


I mentioned in post 86 that you can use the return value from imap_fetchstructure to get the number of parts. http://www.php.net/manual/en/function.imap-fetchstructure.php One of the properties of the object that it returns is an array called parts; you can use that array to figure out how many parts there are.
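A rough sketch of counting the parts, using a hand-built object in place of the one imap_fetchstructure() returns, since no mailbox is available here. The helper name `listSections` and the sample layout are assumptions. The dotted numbering for nested multipart containers ("1.1", "1.2") follows the usual IMAP section convention; embedded message/rfc822 parts are numbered differently and are not handled.

```php
<?php
// Hypothetical helper: walk a structure object shaped like the one
// imap_fetchstructure() returns and collect IMAP section numbers.
function listSections(object $structure, string $prefix = ''): array
{
    $sections = [];
    if (empty($structure->parts)) {
        return $sections;
    }
    foreach ($structure->parts as $index => $part) {
        $section = $prefix . ($index + 1);   // sections are 1-based
        $sections[] = $section;
        // Recurse into nested multipart containers ("1.1", "1.2", ...).
        foreach (listSections($part, $section . '.') as $child) {
            $sections[] = $child;
        }
    }
    return $sections;
}

// Hand-built stand-in for a real structure (assumed layout: an
// alternative text/HTML container followed by two image parts).
$alternative = (object) ['parts' => [(object) [], (object) []]];
$structure   = (object) ['parts' => [$alternative, (object) [], (object) []]];

$num_parts = count($structure->parts);   // top-level parts only
echo $num_parts, "\n";
print_r(listSections($structure));
```

With a real connection, `$structure` would come from `imap_fetchstructure($mbox, $no)`, and each section string could be passed to imap_fetchmime() in place of the loop counter.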


  • 2 weeks later...

So I made my edits to the code you gave, and it does the replacement. However, the test email I'm working with has two image tags in the body, and it only replaces one. I've been working for a while now trying to get it to move on to the other image tags in the body and replace all of them with the proper filenames, but no luck. I've used preg_match_all and if statements, and even tried putting the code in a function called imageConvert. Forgive me if this is simple to you; I'm still unclear on how I should arrange it to properly replace all of the image tags' sources with the filenames, even after re-reading your advice.

// The Situation:
// Once the replace has occurred, you will move on to the next image tag
for ($no = 1; $no <= $box->Nmsgs; $no++) // loop through the messages
{
    preg_match('/<img\s+([^>]*)src="(([^"]*))"\s+([^>]*)(.*?)>/i', $html_part, $srcMatch); // if an image tag is found, start the process for that
    $cidPos = strpos($srcMatch[2], 'cid:'); // find cid: in html body
    if ($cidPos !== false)
    {
        $srcMatch[2] = str_ireplace('cid:', '', $srcMatch[2], $occurance);
    } // slice off cid:
    elseif ($occurance === 0)
    {
        echo 'No occurence';
    }

    // count the amount of parts there are in the email
    $num_parts = 2;
    for ($i = 1; $i <= $num_parts; $i++)
    {
        $mimeheaders = imap_fetchmime($mbox, $no, $i);
        $fname = '';        // empty string
        $cid = '';          // empty string
        $matches = array(); // empty array
        $images = array();  // empty array

        preg_match('/Content-Type:(.*);\s+name="(.*)"/i', $mimeheaders, $matches);
        if (count($matches) >= 3)
            $fname = $matches[2];

        preg_match('/Content-ID:(.*)<(.*)>/i', $mimeheaders, $matches);
        if (count($matches) >= 3)
            $cid = $matches[2];

        if ($fname != '' && $cid != '')
            $images[$cid] = $fname; // map the $cid to the $image, producing the image filename

        if ($srcMatch[2] == $cid) { // if the match out of the image tag is equiv to the cid
            $html_part = str_ireplace('cid:', '', $html_part, $returnstring); // slice off cid: (again), **fix this**
            echo '<b><a style="color:orange"><br />Success</a></b><br />';
            $srcMatch[2] = '/'.$srcMatch[2].'/'; // needs delimiters
            echo $srcMatch[2];
            $html_part = preg_replace($srcMatch[2], $image, $html_part);
        } else {
            echo '<b><a style="color:orange"><br />Cannot continue replace procedure</a></b>';
        }
    } // end of $num_parts loop
} // end of Nmsgs loop


Some of the issues I see are that you are hard-coding $num_parts as 2, so it's only going to look at 2 parts of the mail (probably the body and one attachment), and you're redefining the $images array every time in the for loop so it's only going to contain data from the last part that was processed. In my example code I defined $images outside the loop. The loop is supposed to add items to the array, not clear it out every time.

I hard-coded $num_parts like that only for a short time; I intend to get the exact number once I have completed this part of the testing. I moved the $images array out of the $num_parts for loop, yet still only one image is being properly replaced. The other still remains a cid: entity.