IanArcher Posted January 20, 2012 (edited)

> I would still suggest using the regular expression I posted in posts 29 and 39 to convert all filenames to add your images folder to them. That's really all you need to do in terms of the regular expression. When you go through the list of attachments to save then you might not save all of the attachments and might end up with img elements that don't point to an actual file, but that's their fault for emailing a file that you don't support. It's another issue if you want to clean up the HTML afterwards to remove any img elements that don't link to actual files, but in general I would convert the filenames to something usable, save whatever attachments you think are worth saving, and that's it.

OK, I'm going to try something and post back here.

Also:

    $content = preg_match('/Message: (.*)/', $html_part, $matches) ? $matches[1] : '';

This only captures what's on the first line. I have set a number of rules for the PHP script as a whole, so everything after Message: will be the body content for the news article. I want to get everything (paragraphs, images) after Message:, but this regex only gets what's on that line. I modified it to this:

    $content = preg_match('/Message: ([^"@]*)/', $html_part, $matches) ? $matches[1] : '';

But that only gets one paragraph. I need support for multiple paragraphs, line breaks and such. Could you show me a modified regex that will do this?

justsomeguy Posted January 23, 2012

You can find that information in the manual, but it looks like php.net is having problems today. The s modifier makes the dot match newline characters as well, so the match can span multiple lines.

    preg_match('/Message: (.*)/s', $html_part, $matches)

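As a rough illustration of the s modifier, the sketch below runs the same pattern with and without it. The sample text in $html_part is made up for the example; in the real script it comes from the email. For comparison, the m modifier only changes how ^ and $ anchor and would not help here.

```php
<?php
// Hypothetical sample input; the real $html_part comes from the email.
$html_part = "Title: Demo\nMessage: First paragraph.\n\nSecond paragraph.";

// Without /s the dot stops at the first newline.
preg_match('/Message: (.*)/', $html_part, $single);

// With /s (PCRE_DOTALL) the dot also matches newlines,
// so the capture runs to the end of the input.
preg_match('/Message: (.*)/s', $html_part, $multi);

echo "Without s: ", $single[1], "\n";
echo "With s: ", $multi[1], "\n";
```

Because (.*) is greedy, the s version captures everything after Message: up to the end of the string, which matches the rule that the message body is the last field.
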
IanArcher Posted January 25, 2012

From the script you gave me, do you know of any way I can see the entire source code of an email message, the view that shows all the X-Mailer, Return-Path, Delivery-date, X-Originating-IP headers and so forth? I searched through most of the imap functions in the manual and couldn't find anything that could.

justsomeguy Posted January 25, 2012

In the code I posted in post 12, the $raw_headers variable contains all of the headers from the message. It uses the imap_fetchheader function. The $original_message variable contains the entire unprocessed body of the email. The original email as received by the mail server is the headers, then an empty line, then the body.

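To make the "headers, then an empty line, then the body" layout concrete, here is a small sketch that splits a raw message at the first empty line. The function name split_raw_message and the sample message are hypothetical; in the real script the two halves would come from imap_fetchheader and the raw body rather than from a hand-built string.

```php
<?php
// Split a raw message into its header block and its body.
// Hypothetical helper, not part of the script from post 12.
function split_raw_message(string $raw): array
{
    // The headers end at the first empty line (CRLF CRLF, or LF LF).
    $parts = preg_split('/\r?\n\r?\n/', $raw, 2);
    return [
        'headers' => $parts[0],
        'body'    => $parts[1] ?? '',
    ];
}

// Made-up sample standing in for a message fetched over IMAP.
$raw = "Return-Path: <sender@example.com>\r\n"
     . "X-Mailer: ExampleMailer 1.0\r\n"
     . "Subject: Test\r\n"
     . "\r\n"
     . "Title: Demo\r\nMessage: Hello";

$message = split_raw_message($raw);
echo $message['headers'], "\n";
```

Going the other way, printing the header string, an empty line, and the unprocessed body in that order should reproduce the full source view of the message.
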
IanArcher Posted January 27, 2012

> In the code I posted in post 12, the $raw_headers variable contains all of the headers from the message. It uses the imap_fetchheader function. The $original_message variable contains the entire unprocessed body of the email. The original email as received by the mail server is the headers, then an empty line, then the body.

OK thanks, I got it done.

Also, this is a snippet of code from my script. Could you possibly tell me why, whenever I run it to insert into the table, I get this error with a test email containing just text and paragraphs:

    ERROR: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'font-size:12.0pt'>Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse lib' at line 1

I looked at the source formatting and saw that the style for this paragraph is written as style='font-size:12.0pt'> with single quotes instead of double quotes. That should be the issue, right?

    function newsupdate ($text_part, $html_part)
    {
        // Log in to the MySQL database
        $hostname = "localhost";
        $db_user = "user";
        $db_password = "password";
        $database = "database";
        $db_table = "table";
        $db = mysql_connect($hostname, $db_user, $db_password);
        mysql_select_db($database, $db);

        // These calls search the open email for the prefix-defining strings.
        // Need to search after the space following each prefix, because the subject,
        // categories, snippet, tags and message change constantly.
        $subject = preg_match('/Title: (.*)/', $text_part, $matches) ? $matches[1] : '';
        $subject = str_replace("Title: ", "", $subject);
        $categories = preg_match('/Categories: (.*)/', $text_part, $matches) ? $matches[1] : '';
        $categories = str_replace("Categories: ", "", $categories);
        $tags = preg_match('/Tags: (.*)/', $text_part, $matches) ? $matches[1] : '';
        $tags = str_replace("Tags: ", "", $tags);
        $snippet = preg_match('/Snippet: (.*)/', $text_part, $matches) ? $matches[1] : '';
        $snippet = str_replace("Snippet: ", "", $snippet); // removes the snippet prefix
        $content = preg_match('/Message: ([^"@]*)/s', $html_part, $matches) ? $matches[1] : '';
        $content = str_replace("Message: ", "", $content);
        $when = strtotime("now");
        $uri = strtolower(str_replace(" ", "-", $subject));
        $uri = substr($uri, 0, 20);

        /*** THIS CODE WILL TELL MYSQL TO INSERT THE DATA FROM THE EMAIL INTO YOUR MYSQL TABLE ***/
        $sql = "INSERT INTO $db_table(`caption`,`snippet`,`content`,`when`,`uri`,`tags`,`categories`,`DATE`) values ('$subject','$snippet','$content','$when','$uri','$tags','$categories','$when')";
        if ($result = mysql_query($sql, $db)) {
        } else {
            echo "ERROR: ".mysql_error();
        }
        //echo "<h1>News Article added!</h1>"; //uncomment for testing purposes
    } //end defining the function NewsUpdate

birbal Posted January 27, 2012 (edited)

The query is being broken by the quotes. You need to escape the inputs with mysql_real_escape_string() before they are inserted into the database.

IanArcher Posted January 27, 2012 (edited)

I used mysql_real_escape_string($content, $db); before the SQL insert and I still receive the same error.

I have now changed $html_part to $text_part, and the insert still has a problem with any kind of quotes.

    $content = preg_match('/Message: ([^"@]*)/s', $text_part, $matches) ? $matches[1] : '';

Result:

    ERROR: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 't work with humans as they can adapt and enter whatever is required, including ' at line 1

birbal Posted January 27, 2012

mysql_real_escape_string() returns the escaped string. You need to store that return value in a variable and then use that variable in the insert query.

    $content = mysql_real_escape_string($content, $db);

IanArcher Posted January 27, 2012

> mysql_real_escape_string() returns escaped string which you need to store in some variable and later use that variable to get insert into db
> $content=mysql_real_escape_string($content, $db);

Perfect! That worked. I just didn't immediately think of assigning the result back to $content.

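The mysql_* functions used in this thread were removed in PHP 7, so on a current PHP the same problem is usually handled with prepared statements instead of manual escaping. The sketch below swaps in PDO for that purpose; it is not the approach from the thread. It assumes the PDO SQLite driver is available and uses an in-memory database and a made-up two-column table in place of the real MySQL table.

```php
<?php
// Assumption: PDO with the SQLite driver is installed. The in-memory
// database and the "news" table are stand-ins for the real MySQL table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE news (caption TEXT, content TEXT)');

// Values containing the single quotes that broke the original query.
$subject = "It's a test";
$content = "<p style='font-size:12.0pt'>Lorem ipsum dolor sit amet</p>";

// Placeholders keep the data separate from the SQL text,
// so quotes in the values cannot end the string literal early.
$stmt = $pdo->prepare('INSERT INTO news (caption, content) VALUES (:caption, :content)');
$stmt->execute([':caption' => $subject, ':content' => $content]);

$stored = $pdo->query('SELECT content FROM news')->fetchColumn();
echo $stored, "\n";
```

With MySQL the same prepare and execute calls apply; only the connection string passed to the PDO constructor changes.
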
IanArcher Posted January 29, 2012 (edited)

I must ask, for learning purposes: why does this snippet of code, even with the 's' modifier, not get multiple lines?

    $content = preg_match('/Message: ([^"@]*)/s', $html_part, $matches) ? $matches[1] : '';

But the code below does:

    $content = preg_match('/Message: (.*)/s', $html_part, $matches) ? $matches[1] : '';

The regexes in the brackets are different, but which characters or other entities does the [^"@]* exclude?

justsomeguy Posted January 30, 2012

That pattern says to match anything that is not a double quote or @ character. When the character class starts with ^, it means match anything except what is listed. So it will stop matching once it finds a double quote or @ sign.

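A short sketch of the difference, using a made-up $html_part. The negated class already matches newlines on its own, so the s modifier changes nothing for it; what cuts the match short is the first double quote or @, which in an HTML email tends to appear very early (attribute values, email addresses).

```php
<?php
// Hypothetical sample input with a double quote on the second line.
$html_part = "Message: First line\nSecond line with \"quotes\" and more";

// [^"@]* matches any run of characters except " and @, newlines included,
// so the capture ends right before the first double quote.
preg_match('/Message: ([^"@]*)/', $html_part, $negated);

// .* with the s modifier matches everything up to the end of the input.
preg_match('/Message: (.*)/s', $html_part, $dotall);

var_dump($negated[1]);
var_dump($dotall[1]);
```
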
IanArcher Posted January 30, 2012

> That pattern says to match anything that is not a double quote or @ character. When the character class starts with ^, it means match anything except what is listed. So it will stop matching once it finds a double quote or @ sign.

Oh, thanks for letting me know. Is there a regex manual equivalent to the PHP manual? I'm only aware of regular-expressions.info.

justsomeguy Posted January 30, 2012

There are several regular expression references out there; many of them focus on language-specific versions.

IanArcher Posted January 31, 2012

I see.

IanArcher Posted February 10, 2012

I think it's time to follow your advice and create a 'universal' image replacement, because I have completed and tested the handling of images for Microsoft Outlook:

    /***************************** 1st image in email **********************************/
    if (preg_match('/cid:([^"@]*).(png|jpg|jpeg|gif|bmp)@([^"]*)/', $html_part, $m)) {
        $find = '/cid:'.$m[1].'.'.$m[2].'@([^"]*)/';
        if ($m[2] == 'png') $replace = $png1;
        if ($m[2] == 'jpg') $replace = $jpg1;
        if ($m[2] == 'gif') $replace = $gif1;
        if ($m[2] == 'bmp') $replace = $bmp1;
        if ($m[2] == 'jpeg') $replace = $jpeg1;
        $html_part = preg_replace($find, $replace, $html_part);
    }

    /***************************** 2nd image in email **********************************/
    if (preg_match('/cid:([^"@]*).(png|jpg|jpeg|gif|bmp)@([^"]*)/', $html_part, $m)) {
        $find = '/cid:'.$m[1].'.'.$m[2].'@([^"]*)/';
        if ($m[2] == 'png') $replace = $png2;
        if ($m[2] == 'jpg') $replace = $jpg2;
        if ($m[2] == 'gif') $replace = $gif2;
        if ($m[2] == 'bmp') $replace = $bmp2;
        if ($m[2] == 'jpeg') $replace = $jpeg2;
        $html_part = preg_replace($find, $replace, $html_part);
    }

But the other email clients I've tested reference images differently.

Windows Live Mail:

    src=3D"cid:813DEC0642E941F0845447B680DA566A@BrinardHP"

Hotmail:

    src="https://*****.storage.live.com/y1pZinLnQnaoBClW=RMsQ5sAzEF1H4HGond-KoaAjdcEX0GCp9HzrWa2RSJwI9ngxR7WIub5M9Ps810/cloudsprite.=jpg"

iPhone's default mail app:

    src="cid:1D8E2297-F260-4555-8223-304A9DB08CC5/image.jpeg"

So instead of possibly spending days coding an image handler for each email client's cid: references, your idea of looping over all the images in a directory is more feasible, but I truly have no idea where to start with that. Any ideas?

justsomeguy Posted February 10, 2012

You'll need to do some tests by looping through the attachments in the email to see what information there is about each attachment, and look for how the CID value relates to the image information. The Hotmail example is a link; that's not even an attachment.

IanArcher Posted February 10, 2012

The full reference for that is actually:

    <img style=3D"width:213px=3Bheight:213=px=3Bborder:0=3B" src="https://******.storage.live.com/y1pZinLnQnaoBClW=RMsQ5sAzEF1H4HGond-KoaAjdcEX0GCp9HzrWa2RSJwI9ngxR7WIub5M9Ps810/cloudsprite.=jpg">';

Is there a function in the manual that could help me loop through? I've looked for a while and haven't come across any.

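The =3D and =3B sequences and the stray = signs in these samples look like quoted-printable encoding: =3D encodes "=", =3B encodes ";", and an = at the end of a line is a soft line break. If that is the case here (an assumption; it depends on the part's Content-Transfer-Encoding header), PHP's quoted_printable_decode can be run on the HTML part before any src matching. The $raw string below is a made-up fragment in that encoding.

```php
<?php
// Hypothetical quoted-printable fragment with a soft line break ("=\r\n")
// in the middle of the style attribute.
$raw = "<img style=3D\"width:213px=3Bheight:213=\r\npx=3Bborder:0=3B\" "
     . "src=3D\"cid:813DEC0642E941F0845447B680DA566A@BrinardHP\">";

// Decode the escapes and remove the soft line breaks.
$decoded = quoted_printable_decode($raw);

echo $decoded, "\n";
```

After decoding, the attribute reads src="cid:...", so a single pattern written for plain src="..." attributes covers it without a special case for =3D.
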
justsomeguy Posted February 10, 2012

You would use either a for loop or a foreach loop to go through the array of attachments.

http://www.php.net/manual/en/control-structures.foreach.php

IanArcher Posted February 29, 2012 (edited)

> You would use either a for loop or a foreach loop to go through the array of attachments. http://www.php.net/m...res.foreach.php

OK, so for example, the script lists all images in the directory as an array. Now I have:

    foreach ($imgmatch as $image) {
        preg_replace('/src="(.*)"/', '/src="$image"/', $html_part);
    }

What concerns me is when I have images that are shrouded in <img> tags like this:

    src=3D"cid:813DEC0642E941F0845447B680DA566A@BrinardHP"

justsomeguy Posted February 29, 2012

That loop is going to replace the contents of every "src" attribute with every image in the array. When it finishes, all "src" attributes will point to the last filename that was in the array, regardless of what they started as.

> What concerns me is when i have images that are shrouded in <img> tags like this:

Like I said above, you'll need to do some testing. Save the original HTML email, get the list of information for all of the attachments, and examine the HTML and the attachment information to see how you link up the attachments. I would assume that the CID is given in the information for the attachments, but you need to verify that.

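One way to avoid the "every src ends up as the last filename" problem is to rewrite each src on its own, looking its CID up in a map. The sketch below uses preg_replace_callback for that. The $cid_to_file map and the file paths are hypothetical; how the CID relates to each saved attachment is exactly what still needs to be verified against real emails.

```php
<?php
// Hypothetical map from Content-ID to the path of the saved attachment.
$cid_to_file = [
    '813DEC0642E941F0845447B680DA566A@BrinardHP' => 'images/Image1.gif',
    '6829CB8F2D5D43AAB115DE06572E770A@BrinardHP' => 'images/Image2.gif',
];

// Made-up HTML body with two cid: references and one ordinary link.
$html_part = '<img src="cid:813DEC0642E941F0845447B680DA566A@BrinardHP">'
           . '<img src="cid:6829CB8F2D5D43AAB115DE06572E770A@BrinardHP">'
           . '<img src="https://example.com/remote.jpg">';

// Rewrite each cid: reference individually. Unknown CIDs and ordinary
// URLs are left exactly as they were.
$html_part = preg_replace_callback(
    '/src="cid:([^"]+)"/i',
    function (array $m) use ($cid_to_file) {
        $cid = $m[1];
        return isset($cid_to_file[$cid])
            ? 'src="' . $cid_to_file[$cid] . '"'
            : $m[0];
    },
    $html_part
);

echo $html_part, "\n";
```

Because the callback sees one match at a time, each image keeps its own replacement, and no per-client or per-position copies of the code are needed.
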
IanArcher Posted February 29, 2012

> That loop is going to replace the contents of every "src" attribute with every image in the array. When it finishes, all "src" attributes will point to the last filename that was in the array, regardless of what they started as. Like I said above, you'll need to do some testing. Save the original HTML email, get the list of information for all of the attachments, and examine the HTML and the attachment information to see how you link up the attachments. I would assume that the CID is given in the information for the attachments, but you need to verify that.

No, a CID won't always be present; sometimes those <img> tags may actually carry the original filename of the picture. But I'll try comparing the attachments with the HTML.

astralaaron Posted March 4, 2012

Wow, that is great code. Thanks for posting it, JSG.

IanArcher Posted March 8, 2012

I followed your advice about seeing how the attachments link up with the HTML body. In the first for loop in the script you provided, I included this:

    $fp = fopen('mimeheaders.txt', 'w');
    $mimepart = imap_fetchmime($mbox, $no, 1); //1st part
    $mimepart .= imap_fetchmime($mbox, $no, 2); //2nd part
    $mimepart .= imap_fetchmime($mbox, $no, 3); //3rd part
    $mimepart .= imap_fetchmime($mbox, $no, 4); //4th part
    $mimepart .= imap_fetchmime($mbox, $no, 5); //5th part
    $mimepart .= imap_fetchmime($mbox, $no, 6); //6th part
    $mimepart .= imap_fetchmime($mbox, $no, 7); //7th part
    $mimepart .= imap_fetchmime($mbox, $no, 8); //8th part
    $mimepart .= imap_fetchmime($mbox, $no, 9); //9th part
    $mimepart .= imap_fetchmime($mbox, $no, 10); //10th part
    $mimepart .= imap_fetchmime($mbox, $no, 11); //11th part
    $mimepart .= imap_fetchmime($mbox, $no, 12); //12th part
    fwrite($fp, $mimepart);
    fclose($fp);
    $rawmime = file_get_contents('mimeheaders.txt');

So in the mimeheaders.txt file, we have:

    Content-Type: multipart/alternative;boundary="----=_NextPart_001_0015_01CCD6A7.473AA550"
    Content-Type: image/gif; name="Image1[2].gif"
    Content-Transfer-Encoding: base64
    Content-ID: <813DEC0642E941F0845447B680DA566A@BrinardHP>
    Content-Type: image/gif; name="Image2[2].gif"
    Content-Transfer-Encoding: base64
    Content-ID: <6829CB8F2D5D43AAB115DE06572E770A@BrinardHP>

Now I wrote this to match all <img> tags in the email body:

    if (preg_match_all('/<img\s+([^>]*)src="(([^"]*))"\s+([^>]*)(.*?)>/i', $html_part, $srcMatch)) {
        echo 'Matched image tag sources';
        echo '<pre>';
        $srcArr = array(
            $srcMatch[2][0], //list up to 12 image matches
            $srcMatch[2][1],
            $srcMatch[2][2],
            $srcMatch[2][3],
            $srcMatch[2][4],
            $srcMatch[2][5],
            $srcMatch[2][6],
            $srcMatch[2][7],
            $srcMatch[2][8],
            $srcMatch[2][9],
            $srcMatch[2][10],
            $srcMatch[2][11],
            $srcMatch[2][12],
        );
        print_r($srcArr);
        echo '</pre>';
    } else {
        echo 'No matches found for image tag';
    }

I wrote this to match the filename/name:

    //Here we are setting up real filename matches
    // var $rawmime (See imapScript.php)
    if (preg_match_all('/Content-Type:(.*);\s+name="(.*)"/', $rawmime, $nameAttr)) {
        echo 'Matches found';
        echo '<pre>';
        $nameArr = array(
            $nameAttr[2][0], //number of array items equiv. to source
            $nameAttr[2][1],
            $nameAttr[2][2],
            $nameAttr[2][3],
            $nameAttr[2][4],
            $nameAttr[2][5],
            $nameAttr[2][6],
            $nameAttr[2][7],
            $nameAttr[2][8],
            $nameAttr[2][9],
            $nameAttr[2][10],
            $nameAttr[2][11],
            $nameAttr[2][12],
        );
        print_r($nameArr);
        echo '</pre>';
    } else {
        echo 'No match, Check your regular expression';
    }

I tried something like:

    if ($srcMatch[2] == $nameAttr[2]) {
        preg_replace($srcMatch, $nameAttr, $html_part);
    }

But I receive parameter errors about a string being expected and an array given. I could use implode to convert it to a string, but even then preg_replace expects delimiters.

A bigger problem is that I want to create a scope, so that the script knows where to look when looking for filename/name elements instead of matching all of them at once. How can I do both of these? Or is there something I'm doing completely wrong? I just want to replace each img tag source match with the correct name match.

justsomeguy Posted March 8, 2012

The only thing I notice in the code is the extra comma at the end of each array, which PHP allows. Other than that, I would need to see the output from print_r for each array and the error messages that you're seeing. It looks like the Content-ID header for each attachment contains the CID that is used in the email body.

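A possible way to get the "scope" asked about above is to keep each part's MIME headers separate instead of concatenating them into one string, and to read the name and the Content-ID from the same part. The sketch below does that. The function build_cid_map and the $part_headers array are hypothetical (one header string per part, for example one imap_fetchmime result per element); the sample values are the ones shown in mimeheaders.txt.

```php
<?php
// Build a Content-ID => filename map, reading one part's headers at a time.
// Hypothetical helper; $part_headers holds one header string per MIME part.
function build_cid_map(array $part_headers): array
{
    $map = [];
    foreach ($part_headers as $headers) {
        // Both values must come from the same part to be paired up.
        $has_name = preg_match('/^Content-Type:[^\r\n]*;\s*name="([^"]+)"/mi', $headers, $name);
        $has_cid  = preg_match('/^Content-ID:\s*<([^>]+)>/mi', $headers, $cid);
        if ($has_name && $has_cid) {
            $map[$cid[1]] = $name[1];
        }
    }
    return $map;
}

$part_headers = [
    "Content-Type: multipart/alternative;\r\n"
        . "\tboundary=\"----=_NextPart_001_0015_01CCD6A7.473AA550\"\r\n",
    "Content-Type: image/gif; name=\"Image1[2].gif\"\r\n"
        . "Content-Transfer-Encoding: base64\r\n"
        . "Content-ID: <813DEC0642E941F0845447B680DA566A@BrinardHP>\r\n",
    "Content-Type: image/gif; name=\"Image2[2].gif\"\r\n"
        . "Content-Transfer-Encoding: base64\r\n"
        . "Content-ID: <6829CB8F2D5D43AAB115DE06572E770A@BrinardHP>\r\n",
];

$cid_map = build_cid_map($part_headers);
print_r($cid_map);
```

The resulting map is keyed by the same value that appears after cid: in the HTML body, so each src can be looked up directly rather than compared array against array.
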
IanArcher Posted March 8, 2012 (edited)

This is how it appears. The first array is the matches from the <img> tags and the second is the matches from the name elements of the MIME headers. And there is still the issue of creating a scope.

    Matched image tag sources
    Array
    (
        [0] => cid:813DEC0642E941F0845447B680DA566A@BrinardHP
        [1] => cid:6829CB8F2D5D43AAB115DE06572E770A@BrinardHP
        [2] =>
        [3] =>
        [4] =>
        [5] =>
        [6] =>
        [7] =>
        [8] =>
        [9] =>
        [10] =>
        [11] =>
        [12] =>
    )

    Matches found
    Array
    (
        [0] => Image1[2].gif
        [1] => Image2[2].gif
        [2] =>
        [3] =>
        [4] =>
        [5] =>
        [6] =>
        [7] =>
        [8] =>
        [9] =>
        [10] =>
        [11] =>
        [12] =>
    )