
Get Contents From E-Mail And Write It To Mysql Table


IanArcher


I would still suggest using the regular expression I posted in posts 29 and 39 to convert all filenames to add your images folder to them. That's really all you need to do in terms of the regular expression. When you go through the list of attachments to save then you might not save all of the attachments and might end up with img elements that don't point to an actual file, but that's their fault for emailing a file that you don't support. It's another issue if you want to clean up the HTML afterwards to remove any img elements that don't link to actual files, but in general I would convert the filenames to something usable, save whatever attachments you think are worth saving, and that's it.
OK, I'm going to try something and post back here. Also,
$content = preg_match('/Message: (.*)/', $html_part, $matches) ? $matches[1] : '';   

This only captures what's on the first line. I have set certain rules for the PHP script as a whole, so everything after Message: will be the body content for the news article. I want to get everything (paragraphs, images) after Message:. The regex above only gets what's on that line, though. I modified it to this:

$content = preg_match('/Message: ([^"@]*)/', $html_part, $matches) ? $matches[1] : ''; 

But that only gets one paragraph. I need support for multiple paragraphs, line breaks and such. Could you show me a modified regex that will do this?


From the script you gave me, do you know any way I can see the entire source of an email message, the one that shows all the X-Mailer, Return-Path, Delivery-date, X-Originating-IP headers, and so forth? I searched through most of the imap functions in the manual and couldn't find anything that could.


In the code I posted in post 12, the $raw_headers variable contains all of the headers from the message. It uses the imap_fetchheader function. The $original_message variable contains the entire unprocessed body of the email. The original email as received by the mail server is the headers, then an empty line, then the body.
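
To make the "headers, then an empty line, then the body" layout concrete, here is a minimal sketch. The sample message is made up for illustration; in the real script the text would come from imap_fetchheader and the body fetch rather than a hard-coded string.

```php
<?php
// Hypothetical raw message, for illustration only.
$raw_message = "Return-Path: <sender@example.com>\r\n"
             . "X-Mailer: Example Mailer 1.0\r\n"
             . "Subject: Test\r\n"
             . "\r\n"
             . "Title: My article\r\n"
             . "Message: Hello";

// The first empty line (CRLF CRLF) separates the header block from the body.
// The limit of 2 keeps any later blank lines inside the body.
list($headers, $body) = explode("\r\n\r\n", $raw_message, 2);

echo $headers, "\n---\n", $body, "\n";
```

Everything before the first blank line ends up in $headers and everything after it in $body.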

OK thanks, I got it done. Also, this is a snippet of code from my script. Could you tell me why, whenever I run it to insert into the table, I get this error with a test email containing just text and paragraphs:
ERROR: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'font-size:12.0pt'>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse lib' at line 1
I looked at the source formatting and saw that the style for this paragraph is shown as style='font-size:12.0pt'> instead of with double-quotes. That should be the issue right?
function newsupdate ($text_part, $html_part)
{
    //Login to MySQL Database
    $hostname = "localhost";
    $db_user = "user";
    $db_password = "password";
    $database = "database";
    $db_table = "table";
    $db = mysql_connect($hostname, $db_user, $db_password);
    mysql_select_db($database,$db);

    //These functions search the open e-mail for the prefix defining strings.
    //Need a function to search after the space after the strings because the subject, categories, snippet, tags and message are constant-changing.
    $subject = preg_match('/Title: (.*)/', $text_part, $matches) ? $matches[1] : '';
    $subject = str_replace("Title: ", "" ,$subject);
    $categories = preg_match('/Categories: (.*)/', $text_part, $matches) ? $matches[1] : '';
    $categories = str_replace("Categories: ", "" ,$categories);
    $tags = preg_match('/Tags: (.*)/', $text_part, $matches) ? $matches[1] : '';
    $tags = str_replace("Tags: ", "" ,$tags);
    $snippet = preg_match('/Snippet: (.*)/', $text_part, $matches) ? $matches[1] : '';
    $snippet = str_replace("Snippet: ", "" ,$snippet);    // removes the snippet prefix
    $content = preg_match('/Message: ([^"@]*)/s', $html_part, $matches) ? $matches[1] : '';
    $content = str_replace("Message: ", "" ,$content);
    $when = strtotime("now");
    $uri = strtolower(str_replace(" ", "-", $subject));
    $uri = substr($uri, 0, 20);

    /*** THIS CODE WILL TELL MYSQL TO INSERT THE DATA FROM THE EMAIL INTO YOUR MYSQL TABLE ***/
    $sql = "INSERT INTO $db_table(`caption`,`snippet`,`content`,`when`,`uri`,`tags`,`categories`,`DATE`) values ('$subject','$snippet','$content','$when','$uri','$tags','$categories','$when')";
    if($result = mysql_query($sql ,$db)) {
    } else {
        echo "ERROR: ".mysql_error();
    }
    //echo "<h1>News Article added!</h1>"; //uncomment for testing purposes
}    //end defining the function NewsUpdate


The query is being broken by the quotes. You need to escape the inputs with mysql_real_escape_string() before they are inserted into the database.


I used

mysql_real_escape_string($content, $db);

before the SQL insert and I still receive the same error. I then changed $html_part to $text_part, and the insert still has a problem with any kind of quotes.

$content = preg_match('/Message: ([^"@]*)/s', $text_part, $matches) ? $matches[1] : '';

Result:

ERROR: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 't work with humans as they can adapt and enter whatever is required, including ' at line 1

mysql_real_escape_string() returns the escaped string, which you need to store in a variable and then use that variable in the insert query:

$content=mysql_real_escape_string($content, $db);
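
The point about the return value can be shown without a database. This is only a sketch: addslashes() stands in for mysql_real_escape_string() so the snippet runs with no connection, and it is not a safe substitute for it. The sample text is made up.

```php
<?php
// ASSUMPTION: addslashes() is a stand-in used only to show the
// "returns a new string" behaviour; it is NOT a safe replacement
// for mysql_real_escape_string() in a real query.
$content = "It won't work";

addslashes($content);             // return value discarded: $content is unchanged
$unchanged = $content;

$content = addslashes($content);  // return value stored: $content is now escaped

echo $unchanged, "\n";
echo $content, "\n";
```

PHP strings are passed by value here, so calling the function without assigning its result leaves the original variable exactly as it was.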


Perfect! That worked. I just didn't immediately think of re-declaring it as $content

I must ask, for learning purposes: why does this snippet of code, even with the 's' modifier, not get multiple lines?

$content = preg_match('/Message: ([^"@]*)/s', $html_part, $matches) ? $matches[1] : '';

But the code below does

$content = preg_match('/Message: (.*)/s', $html_part, $matches) ? $matches[1] : '';

The regexes in the brackets are different, but which characters or other entities does [^"@]* exclude?


That pattern says to match anything that is not a double quote or @ character. When the character class starts with ^, it means match anything except what is listed. So it will stop matching once it finds a double quote or @ sign.
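
A small sketch of the difference, using a made-up message body. It compares the negated character class with the dot, with and without the s modifier.

```php
<?php
// Hypothetical message body, for illustration only.
$html_part = "Message: First paragraph.\n"
           . "<p style=\"color:red\">Second paragraph.</p>\n"
           . "Third paragraph.";

// Negated class: matches across newlines, but stops at the first " or @.
preg_match('/Message: ([^"@]*)/s', $html_part, $m);
$negated = $m[1];

// Dot with the s modifier: the dot also matches newlines, so it runs to the end.
preg_match('/Message: (.*)/s', $html_part, $m);
$dot_all = $m[1];

// Dot without the s modifier: the dot stops at the first newline.
preg_match('/Message: (.*)/', $html_part, $m);
$first_line = $m[1];

var_dump($negated, $dot_all, $first_line);
```

The s modifier only changes what the dot matches. The negated class already matches newlines, so in this sample it is the double quote in the style attribute that ends the match, not the line break.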

Oh, thanks for letting me know. Is there a regex manual equivalent to the PHP manual? I'm only aware of regular-expressions.info

  • 2 weeks later...

I think it's time to follow your advice and create a 'universal' image replacement, because I have completed and tested the handling of images for Microsoft Outlook:

/***************************** 1st image in email **********************************/
if (preg_match('/cid:([^"@]*).(png|jpg|jpeg|gif|bmp)@([^"]*)/', $html_part, $m))
{
    $find = '/cid:'.$m[1].'.'.$m[2].'@([^"]*)/';
    if ($m[2] == 'png') $replace = $png1;
    if ($m[2] == 'jpg') $replace = $jpg1;
    if ($m[2] == 'gif') $replace = $gif1;
    if ($m[2] == 'bmp') $replace = $bmp1;
    if ($m[2] == 'jpeg') $replace = $jpeg1;
    $html_part = preg_replace($find, $replace, $html_part);
}

/***************************** 2nd image in email **********************************/
if (preg_match('/cid:([^"@]*).(png|jpg|jpeg|gif|bmp)@([^"]*)/', $html_part, $m))
{
    $find = '/cid:'.$m[1].'.'.$m[2].'@([^"]*)/';
    if ($m[2] == 'png') $replace = $png2;
    if ($m[2] == 'jpg') $replace = $jpg2;
    if ($m[2] == 'gif') $replace = $gif2;
    if ($m[2] == 'bmp') $replace = $bmp2;
    if ($m[2] == 'jpeg') $replace = $jpeg2;
    $html_part = preg_replace($find, $replace, $html_part);
}

But when it comes down to other email clients I've tested:

Windows Live Mail:

src=3D"cid:813DEC0642E941F0845447B680DA566A@BrinardHP"

Hotmail:

src="https://*****.storage.live.com/y1pZinLnQnaoBClW=RMsQ5sAzEF1H4HGond-KoaAjdcEX0GCp9HzrWa2RSJwI9ngxR7WIub5M9Ps810/cloudsprite.=jpg"

iPhone's default mail app:

src="cid:1D8E2297-F260-4555-8223-304A9DB08CC5/image.jpeg"

So instead of possibly spending days coding an image handler based on each specific email client's cid: references, your idea of looping through all the images in a directory is more feasible, but I truly have no idea where to start with that. Any ideas?


You'll need to do some tests by looping through the attachments in the email to see what information there is about each attachment, and look for how the CID value relates to the image information. The hotmail example is a link, that's not even an attachment.


The full reference for that is actually:

<img style=3D"width:213px=3Bheight:213=px=3Bborder:0=3B" src="https://******.storage.live.com/y1pZinLnQnaoBClW=RMsQ5sAzEF1H4HGond-KoaAjdcEX0GCp9HzrWa2RSJwI9ngxR7WIub5M9Ps810/cloudsprite.=jpg">';

Is there a function in the manual that could help me loop through? I've looked for a while and haven't come across any.


  • 3 weeks later...
You would use either a for loop or a foreach loop to go through the array of attachments. http://www.php.net/m...res.foreach.php
OK, so for example, the script lists all images in the directory as an array. Now I have:
foreach($imgmatch as $image){preg_replace('/src="(.*)"/', '/src="$image"/', $html_part)}

What concerns me is when i have images that are shrouded in <img> tags like this:

src=3D"cid:813DEC0642E941F0845447B680DA566A@BrinardHP"
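
The =3D in that sample looks like quoted-printable encoding, where =3D stands for an equals sign. Assuming the HTML part really is declared as Content-Transfer-Encoding: quoted-printable (worth checking in the part's headers), decoding it before running any regex would give a normal src attribute. A minimal sketch:

```php
<?php
// The attribute exactly as it appears in the raw Windows Live Mail part.
$encoded = 'src=3D"cid:813DEC0642E941F0845447B680DA566A@BrinardHP"';

// quoted_printable_decode() turns =XX hex sequences back into characters,
// so =3D becomes a plain "=".
$decoded = quoted_printable_decode($encoded);

echo $decoded, "\n";
```

After decoding, the attribute has the same src="cid:..." shape as the Outlook case.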


That loop is going to replace the contents of every "src" attribute with every image in the array. When it finishes, all "src" attributes will point to the last filename that was in the array, regardless of what they started as.

Like I said above, you'll need to do some testing. Save the original HTML email, get the list of information for all of the attachments, and examine the HTML and the attachment information to see how you link up the attachments. I would assume that the CID is given in the information for the attachments, but you need to verify that.
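
To see the overwriting effect described above, here is a small sketch with made-up HTML and filenames. It uses [^"]* instead of (.*) so each src attribute is matched on its own, and it assigns the result of preg_replace back to $html_part, which the loop in the question does not do.

```php
<?php
// Hypothetical HTML and filenames, for illustration only.
$html_part = '<img src="cid:aaa@host"> <img src="cid:bbb@host">';
$images = array('one.gif', 'two.gif');

foreach ($images as $image) {
    // Each pass rewrites EVERY src attribute, so the previous pass is overwritten.
    $html_part = preg_replace('/src="[^"]*"/', 'src="' . $image . '"', $html_part);
}

echo $html_part, "\n";
```

Both tags end up pointing at two.gif, the last entry in the array, which is why each attachment needs to be matched to its own tag instead.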
No, a CID won't always be the case; sometimes those <img> tags may actually carry the original filename of the picture. But I'll try analyzing the attachments against the HTML.

I followed your advice on seeing how to link up the attachments with the HTML body. In the first for loop of the script you provided, I included this:

    $fp = fopen('mimeheaders.txt', 'w');
    $mimepart = imap_fetchmime($mbox, $no, 1);      //1st part
    $mimepart .= imap_fetchmime($mbox, $no, 2);     //2nd part
    $mimepart .= imap_fetchmime($mbox, $no, 3);     //3rd part
    $mimepart .= imap_fetchmime($mbox, $no, 4);     //4th part
    $mimepart .= imap_fetchmime($mbox, $no, 5);     //5th part
    $mimepart .= imap_fetchmime($mbox, $no, 6);     //6th part
    $mimepart .= imap_fetchmime($mbox, $no, 7);     //7th part
    $mimepart .= imap_fetchmime($mbox, $no, 8);     //8th part
    $mimepart .= imap_fetchmime($mbox, $no, 9);     //9th part
    $mimepart .= imap_fetchmime($mbox, $no, 10);    //10th part
    $mimepart .= imap_fetchmime($mbox, $no, 11);    //11th part
    $mimepart .= imap_fetchmime($mbox, $no, 12);    //12th part
    fwrite($fp, $mimepart);
    fclose($fp);
    $rawmime = file_get_contents('mimeheaders.txt');

So in the mimeheaders.txt file, we have:

    Content-Type: multipart/alternative;
     boundary="----=_NextPart_001_0015_01CCD6A7.473AA550"
    Content-Type: image/gif; name="Image1[2].gif"
    Content-Transfer-Encoding: base64
    Content-ID: <813DEC0642E941F0845447B680DA566A@BrinardHP>
    Content-Type: image/gif; name="Image2[2].gif"
    Content-Transfer-Encoding: base64
    Content-ID: <6829CB8F2D5D43AAB115DE06572E770A@BrinardHP>

Now i wrote this to match all <img> tags in the email body:

if (preg_match_all('/<img\s+([^>]*)src="(([^"]*))"\s+([^>]*)(.*?)>/i', $html_part, $srcMatch)) {
    echo 'Matched image tag sources';
    echo '<pre>';
    $srcArr = array(
        $srcMatch[2][0],        //list up to 12 image matches
        $srcMatch[2][1],
        $srcMatch[2][2],
        $srcMatch[2][3],
        $srcMatch[2][4],
        $srcMatch[2][5],
        $srcMatch[2][6],
        $srcMatch[2][7],
        $srcMatch[2][8],
        $srcMatch[2][9],
        $srcMatch[2][10],
        $srcMatch[2][11],
        $srcMatch[2][12],
    );
    print_r($srcArr);
    echo '</pre>';
} else {
    echo 'No matches found for image tag';
}

I wrote this to match the filename/name:

//Here we are setting up real filename matches
// var $rawmime (See imapScript.php)
if(preg_match_all('/Content-Type:(.*);\s+name="(.*)"/', $rawmime, $nameAttr)){
    echo 'Matches found';
    echo '<pre>';
    $nameArr = array(
        $nameAttr[2][0],    //number of array items equiv. to source
        $nameAttr[2][1],
        $nameAttr[2][2],
        $nameAttr[2][3],
        $nameAttr[2][4],
        $nameAttr[2][5],
        $nameAttr[2][6],
        $nameAttr[2][7],
        $nameAttr[2][8],
        $nameAttr[2][9],
        $nameAttr[2][10],
        $nameAttr[2][11],
        $nameAttr[2][12],
    );
    print_r($nameArr);
    echo '</pre>';
} else {
    echo 'No match, Check your regular expression';
}

I tried something like:

if($srcMatch[2] == $nameAttr[2]){preg_replace($srcMatch, $nameAttr, $html_part);}

But I receive parameter errors about a string being expected and an array given. So I would use an implode function to convert it to a string, but even then preg_replace expects delimiters. A bigger problem is that I want to create a scope so the script knows where to look when looking for filename/name elements, instead of matching all of them at once. How can I do both of these? Or is there something I'm doing completely wrong? I just want to replace the name match with the correct img tag source match.


The only error I see in the code is that you have an extra comma at the end of each array. Other than that, I would need to see the output from print_r for each array and the error messages that you're seeing. It looks like the Content-Id header for each attachment contains the CID that is used in the email body.
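
If the Content-ID header does carry the CID used in the body, one way to link the two is to build a lookup from each Content-ID to its attachment name and then rewrite only the matching cid: references. This is a sketch under assumptions, not a tested part of the script: it takes the MIME headers as one header per line, assumes each name="..." is followed by its own Content-ID, and uses a hypothetical images/ folder prefix.

```php
<?php
// MIME headers in the shape shown in mimeheaders.txt (one header per line assumed).
$rawmime = "Content-Type: image/gif; name=\"Image1[2].gif\"\n"
         . "Content-Transfer-Encoding: base64\n"
         . "Content-ID: <813DEC0642E941F0845447B680DA566A@BrinardHP>\n"
         . "Content-Type: image/gif; name=\"Image2[2].gif\"\n"
         . "Content-Transfer-Encoding: base64\n"
         . "Content-ID: <6829CB8F2D5D43AAB115DE06572E770A@BrinardHP>\n";

$html_part = '<img src="cid:813DEC0642E941F0845447B680DA566A@BrinardHP">'
           . '<img src="cid:6829CB8F2D5D43AAB115DE06572E770A@BrinardHP">';

// Pair each name="..." with the Content-ID that follows it.
$cid_to_file = array();
$pattern = '/name="([^"]+)".*?Content-ID:\s*<([^>]+)>/s';
if (preg_match_all($pattern, $rawmime, $sets, PREG_SET_ORDER)) {
    foreach ($sets as $set) {
        // 'images/' is a hypothetical folder name.
        $cid_to_file[$set[2]] = 'images/' . $set[1];
    }
}

// Rewrite only the cid: references that have a known attachment;
// anything else is left exactly as it was.
$html_part = preg_replace_callback(
    '/src="cid:([^"]+)"/',
    function ($m) use ($cid_to_file) {
        return isset($cid_to_file[$m[1]])
            ? 'src="' . $cid_to_file[$m[1]] . '"'
            : $m[0];
    },
    $html_part
);

echo $html_part, "\n";
```

Using a callback keeps each replacement tied to the CID found in that particular tag, so there is no need to pass whole arrays to preg_replace or to match every name at once.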


This is how it appears. The first array is the matches from the <img> tags and the second is the matches from the name elements of the MIME headers. And there's still the issue of creating a scope.

Matched image tag sources
Array
(
    [0] => cid:813DEC0642E941F0845447B680DA566A@BrinardHP
    [1] => cid:6829CB8F2D5D43AAB115DE06572E770A@BrinardHP
    [2] =>
    [3] =>
    [4] =>
    [5] =>
    [6] =>
    [7] =>
    [8] =>
    [9] =>
    [10] =>
    [11] =>
    [12] =>
)
Matches found
Array
(
    [0] => Image1[2].gif
    [1] => Image2[2].gif
    [2] =>
    [3] =>
    [4] =>
    [5] =>
    [6] =>
    [7] =>
    [8] =>
    [9] =>
    [10] =>
    [11] =>
    [12] =>
)
