
Get Contents From E-Mail And Write It To MySQL Table


IanArcher


Your script works: it gets the images as attachments and saves them to the folder "images". You were completely right; Outlook and Roundcube were showing these images inline and not as attachments to the email. Now what I'm trying to do is replace every instance of src="cid:image001.png@ with src="images/image001.png". I use str_replace, but it keeps bringing back the same "cid" string.

// Call if functions into play here, (remember .gif, .png, .jpg, .bmp)
// for .png images
$png1 = "images/image001.png";
$png2 = "images/image002.png";
$png3 = "images/image003.png";
$png4 = "images/image004.png";
$png5 = "images/image005.png";

// for .jpg images
$jpg1 = "images/image001.jpg";
$jpg2 = "images/image002.jpg";
$jpg3 = "images/image003.jpg";
$jpg4 = "images/image004.jpg";
$jpg5 = "images/image005.jpg";

// for .gif images
$gif1 = "images/image001.gif";
$gif2 = "images/image002.gif";
$gif3 = "images/image003.gif";
$gif4 = "images/image004.gif";
$gif5 = "images/image005.gif";

// for .bmp images
$bmp1 = "images/image001.bmp";
$bmp2 = "images/image002.bmp";
$bmp3 = "images/image003.bmp";
$bmp4 = "images/image004.bmp";
$bmp5 = "images/image005.bmp";

str_replace('src="cid:image001.png@', 'src="$png1"', $html_part);
str_replace('src="cid:image002.jpg@', 'src="$jpg2"', $html_part);
str_replace('src="cid:image003.png@', 'src="$jpg3"', $html_part);

There's also the problem that random-looking numbers and letters are generated after the "@", for example: src="cid:image001.png@01CCC701.5A896430". That text isn't static, it's dynamic, so every time this is run (or rather, whenever an email is sent), a new string is placed there. Any suggestions on this?
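For anyone hitting the same wall, a small sketch of why the str_replace() calls above appear to do nothing: the function returns the new string instead of changing $html_part in place, single-quoted strings don't expand $png1, and a literal search can't cover the suffix after "@" that changes with every message. The sample src value is the one quoted in this post.

```php
<?php
// Demonstrates the three reasons the str_replace() calls above have no visible effect.
$html_part = '<img src="cid:image001.png@01CCC701.5A896430">';
$png1 = "images/image001.png";

// 1) The return value is discarded, so $html_part itself never changes.
str_replace('src="cid:image001.png@', 'src="' . $png1 . '"', $html_part);

// 2) Single quotes do not interpolate, so the literal text $png1 is inserted.
$literal = str_replace('src="cid:image001.png@', 'src="$png1"', $html_part);

// 3) Even when assigned and interpolated, the per-message suffix after "@" remains.
$partial = str_replace('src="cid:image001.png@', "src=\"$png1", $html_part);

echo $html_part, PHP_EOL;
echo $literal, PHP_EOL;
echo $partial, PHP_EOL;
```

Because the suffix is different in every message, a regular expression that matches "anything up to the closing quote" is the usual way around point 3.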


Thanks for that snippet. I have to ask, for learning purposes: what is the difference between the regex you posted and the one I planned on using?

/src="cid:(.*).png@(.*)/


That pattern may match more than you intend. If the markup has this:

<img src="cid:image1.png@..."><img src="cid:image2.png@...">

then your entire pattern will match everything from the first src to the end of the line:

src="cid:image1.png@..."><img src="cid:image2.png@...">

And the first subpattern will match this:

image1.png@..."><img src="cid:image2

Also, you would need to escape the dot before "png". The dot has a special meaning, so you need to escape it if you want to match a literal dot character. With the pattern I wrote, it also doesn't matter what the extension is; you don't need a different pattern for each extension.
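A quick way to see the difference is to run both patterns through preg_match_all() on two adjacent tags. The sample markup below is made up, following the example in this reply.

```php
<?php
// Two img tags on one line, as in the example above (suffixes shortened).
$html = '<img src="cid:image1.png@AAA"><img src="cid:image2.png@BBB">';

// Greedy (.*) runs as far as it can, so a single match spans both tags.
// The unescaped "." before "png" would also accept any character there.
preg_match_all('/src="cid:(.*).png@(.*)/', $html, $greedy);

// Negated classes stop at the first " or @, so each tag is matched on its own,
// and no extension is hard-coded into the pattern.
preg_match_all('#src="cid:([^"@]*)@([^"]*)"#', $html, $strict);

print_r($greedy[1]);
print_r($strict[1]);
```

The greedy version returns one garbled capture, while the negated-class version returns one clean filename per tag.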


OK, thanks for clearing that up for me. I don't want it to seem as if I run here for every little problem to be spoon-fed; I do a tremendous amount of research and work on my own for this, but this is something I really need help with.

If a .png is the first image found when the script reaches this line, then the email has a .png image first, and it will be named automatically (by some email rule?) as image001.png. If a .jpg is found first, then the first image will be named image001.jpg. Even if the first image is a .png (named image001.png) and the next image found is a .jpg, it would be named image002.jpg, because the email doesn't number images by file extension but by their position among all the images in the email, which is why a number sequence is appended to each image filename.

What if the first image is a .png? Then I want it to be image001.png. But if the next is a .bmp, then I want it to be image002.bmp and not image002.png. But what if it is a .gif for the third? I want it to be image001.gif and not image001.png. What if the second image was a .png? Then I want the script to know to let it be image002.png. And so forth through that sequence of possibilities. At most, this will support up to 12 images for the entire process of updating the MySQL table.

What is the process for doing this? Also, the filename extensions have to be specified in the regex so the if statements know when to come into play. I really need an example of what I believe would mainly be based around if statements, and also some guidance on where to look in the manual for a solution.

#    Haystack =
// Call if functions into play here, (remember .gif, .png, .jpg, .bmp)
// for .png images
$png1 = 'src="images/image001.png"';
$png2 = 'src="images/image002.png"';
$png3 = 'src="images/image003.png"';
$png4 = 'src="images/image004.png"';
$png5 = 'src="images/image005.png"';
$png6 = 'src="images/image006.png"';
$png7 = 'src="images/image007.png"';
$png8 = 'src="images/image008.png"';
$png9 = 'src="images/image009.png"';
$png10 = 'src="images/image010.png"';
$png11 = 'src="images/image011.png"';
$png12 = 'src="images/image012.png"';

// for .jpg images
$jpg1 = 'src="images/image001.jpg"';
$jpg2 = 'src="images/image002.jpg"';
$jpg3 = 'src="images/image003.jpg"';
$jpg4 = 'src="images/image004.jpg"';
$jpg5 = 'src="images/image005.jpg"';
$jpg6 = 'src="images/image006.jpg"';
$jpg7 = 'src="images/image007.jpg"';
$jpg8 = 'src="images/image008.jpg"';
$jpg9 = 'src="images/image009.jpg"';
$jpg10 = 'src="images/image010.jpg"';
$jpg11 = 'src="images/image011.jpg"';
$jpg12 = 'src="images/image012.jpg"';

// for .gif images
$gif1 = 'src="images/image001.gif"';
$gif2 = 'src="images/image002.gif"';
$gif3 = 'src="images/image003.gif"';
$gif4 = 'src="images/image004.gif"';
$gif5 = 'src="images/image005.gif"';
$gif6 = 'src="images/image006.gif"';
$gif7 = 'src="images/image007.gif"';
$gif8 = 'src="images/image008.gif"';
$gif9 = 'src="images/image009.gif"';
$gif10 = 'src="images/image010.gif"';
$gif11 = 'src="images/image011.gif"';
$gif12 = 'src="images/image012.gif"';

// for .bmp images
$bmp1 = 'src="images/image001.bmp"';
$bmp2 = 'src="images/image002.bmp"';
$bmp3 = 'src="images/image003.bmp"';
$bmp4 = 'src="images/image004.bmp"';
$bmp5 = 'src="images/image005.bmp"';
$bmp6 = 'src="images/image006.bmp"';
$bmp7 = 'src="images/image007.bmp"';
$bmp8 = 'src="images/image008.bmp"';
$bmp9 = 'src="images/image009.bmp"';
$bmp10 = 'src="images/image010.bmp"';
$bmp11 = 'src="images/image011.bmp"';
$bmp12 = 'src="images/image012.bmp"';
# End of Haystack

///////////// First image in email
if first_occurence is .png get $png1 needle from $haystack (
    $find = '#src="cid:([^"@]*)@([^"]*)"#';   //this needs to be modified to look for .png specifically
    $replace = $png1
    );
else
    first_occurence is .jpg? then get $jpg1 (
        $find = '#src="cid:([^"@]*)@([^"]*)"#';   //this needs to be modified to look for .jpg specifically
        $replace = $jpg1
        );
else
    first_occurence is .gif ? (
        $find = '#src="cid:([^"@]*)@([^"]*)"#';   //this needs to be modified to look for .gif specifically
        $replace = $gif1
        );
else
    first_occurence is .bmp ? (
        $find = '#src="cid:([^"@]*)@([^"]*)"#';   //this needs to be modified to look for .bmp specifically
        $replace = $bmp1
        );

//////////////// Second image in email
if second_occurence is .png get $png2 needle from haystack (
    $find = '#src="cid:([^"@]*)@([^"]*)"#';   //this needs to be modified to look for .png specifically
    $replace = $png2
    );
else
    second_occurence is .jpg? then get $jpg2 needle from haystack (
        $find = '#src="cid:([^"@]*)@([^"]*)"#';   //this needs to be modified to look for .jpg specifically
        $replace = $jpg2
        );
else
    second_occurence is .gif ? then get $gif2 needle from haystack (
        $find = '#src="cid:([^"@]*)@([^"]*)"#';   //this needs to be modified to look for .gif specifically
        $replace = $gif2
        );
else
    second_occurence is .bmp ? then get $bmp2 needle from haystack (
        $find = '#src="cid:([^"@]*)@([^"]*)"#';   //this needs to be modified to look for .bmp specifically
        $replace = $bmp2
        );

//AND SO FORTH FOR 12 OCCURENCES

//Performs the matches
$html_part = preg_replace($find, $replace, $html_part);

Edited by IanArcher

I think I'm a little confused. The client that sends the email will typically name the images like that automatically, so they might not all have the names you're looking for, depending on the email client. Are you saying you want to rename the images?

I don't think it's wise to automatically assume that the first <img> tag in the HTML markup corresponds to the first attachment. The attachments can be listed in any order; the filenames, not the order in which they appear, are what distinguish them.

I was working under the assumption that you want to save all attachments in the email regardless of their filenames, which is why I don't have the filenames in my regular expression patterns. If you want to capture the extension of the filename you can do that too, but I don't see any reason why you need all of this duplicate code with different patterns and replacements. You should be able to get all of the information you need by matching the filename in the img tag and looking it up in the list of attachments. If you want to know the extension, you can get it when you get the filename.

Maybe I'm not understanding your requirements, but I'm not sure what the concern is with filenames in the form "image###", with distinguishing between extensions, or with the order in which they appear in the HTML markup.
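A minimal sketch of "match the filename in the img tag and look it up in the list of attachments", assuming the attachments have already been decoded into an array keyed by filename. That $attachments shape and the find_cid_images() name are hypothetical; adapt them to whatever your mail-reading code produces.

```php
<?php
// Collects each cid: filename from the HTML together with its extension and
// whether a matching attachment exists. $attachments is assumed (hypothetically)
// to be an array of filename => decoded file contents.
function find_cid_images(string $html, array $attachments): array
{
    preg_match_all('#src="cid:([^"@]*)@[^"]*"#', $html, $matches);

    $found = [];
    foreach ($matches[1] as $filename) {
        $found[] = [
            'filename'  => $filename,
            // pathinfo() pulls out the extension; lowercase it for comparisons.
            'extension' => strtolower(pathinfo($filename, PATHINFO_EXTENSION)),
            'attached'  => array_key_exists($filename, $attachments),
        ];
    }
    return $found;
}

$html = '<img src="cid:image001.jpg@01CCD11C.6AFFF2F0"><img src="cid:logo.PNG@x">';
$attachments = ['image001.jpg' => '...jpeg bytes...'];
print_r(find_cid_images($html, $attachments));
```

One pattern handles every extension; the extension is just data you read off the filename afterwards, so no per-extension if statements are needed.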


OK, I'll explain further. Over time this method will be used more and more for posting new news articles. Every time a new email with new images is sent, the client, like you said, will automatically name them in that sequence, so eventually, when it comes time to post, the images from a previous article in that same folder will be overwritten. If someone goes back to look at a previous article, its images will be gone, replaced by the new ones, which will confuse them. That is why I implemented this series of functions. Using this method, I set a rule: a maximum of 12 images can be inserted. That is where

    //AND SO FORTH FOR 12 OCCURENCES

comes from. In my head, I believe PHP reads and processes the script line by line, unless a line refers to a variable or function defined before or after it, just as it would read and process each <img> in the email body. My idea was to use mkdir to create a new directory, named after the subject, to hold the images for that specific article.

// These lines search the open e-mail for the prefix-defining strings.
// Need a function to search after the space after the strings, because the subject, categories, snippet, tags and message are constantly changing.
$subject = preg_match('/Title: (.*)/', $text_part, $matches) ? $matches[1] : '';
// ereg_replace() was removed in PHP 7; preg_replace() does the same job here
$stripped = strtolower(preg_replace('/[^A-Za-z0-9 ]/', '', $subject)); // removes special characters from article/folder name
$subject = str_replace("Title: ", "", $subject);

// Creating a directory to hold the images for this article
$subject = strtolower(str_replace(" ", "_", $stripped)); // builds a directory name based on the article's name
$dirname = $subject;
$dir = mkdir($dirname, 0777); // creates the directory

With mkdir, though, I get a "No such file or directory" error message. Did that clear it up for you?


mkdir resolves a bare name relative to the current working directory, which may not be the folder you expect, so give it the full path instead of just a name. I'm still not clear why you need the images to all be named "image###", or why the file extension matters. It seems like an easy task to save all images in a directory of your choice, regardless of how many there are, and to use a single regular expression to handle all of them. I'm not sure why it needs to be more complex than that. The only thing I would add is a filter to make sure you don't save files with extensions that might compromise the server, like a PHP file.

Any time I see a lot of duplicated code like what you posted in post 32, that's an indication that the design probably isn't correct. You shouldn't need to duplicate code like that.

If you need to limit how many images there are, then I'm not sure how you can do that other than by finding the first 12 img tags, figuring out which files they reference, and saving those files from the list of attachments but not the others. You would also need to remove the img tags that point to files that didn't get saved.
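If the 12-image limit stays, a sketch of the "first 12 img tags" approach: collect the cid filenames in document order, keep the first $limit tags, and drop repeats so a file referenced twice is only saved once. The first_cid_filenames() helper is hypothetical.

```php
<?php
// Returns the filenames referenced by the first $limit cid: images, in the
// order they appear. first_cid_filenames() is a hypothetical helper name.
function first_cid_filenames(string $html, int $limit = 12): array
{
    preg_match_all('#src="cid:([^"@]*)@[^"]*"#', $html, $matches);

    // Keep the first $limit img tags, then drop repeated filenames so a file
    // used twice is only saved once; array_values() renumbers the keys.
    $first = array_slice($matches[1], 0, $limit);
    return array_values(array_unique($first));
}

// Build markup with 14 Outlook-style images to try it out.
$html = '';
for ($i = 1; $i <= 14; $i++) {
    $html .= sprintf('<img src="cid:image%03d.jpg@01CCD11C.6AFFF2F0">', $i);
}

$keep = first_cid_filenames($html);
echo count($keep), PHP_EOL; // 12
```

The returned list is what the attachment-saving loop would check against; anything not in it is skipped.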


These are very simple articles, though, with just bold, italic, and underlined text and images. It's not my choice for all the images to be named image###; by default, email clients name all images inserted into an email in that sequence. For example, I just inserted 4 images into an email, sent it to myself, and here is the source of the email. Before I sent it, these images had other filenames.

<img width=960 height=612 id="Picture_x0020_1" src="cid:image001.jpg@01CCD11C.6AFFF2F0">

<img width=1126 height=2048 id="Picture_x0020_2" src="cid:image002.jpg@01CCD11C.6AFFF2F0">

<img width=1052 height=2048 id="Picture_x0020_3" src="cid:image003.jpg@01CCD11C.6AFFF2F0">

<img width=903 height=2048 id="Picture_x0020_4" src="cid:image004.jpg@01CCD11C.6AFFF2F0">

<img width=500 height=419 id="Picture_x0020_5" src="cid:image005.jpg@01CCD11C.6AFFF2F0">

You see what I'm getting at?

Edited by IanArcher

It's not my choice for all the images to be named image###, by default email clients name all images inserted into the emails in that sequence.
SOME email clients do that. Don't assume that they all do.
You see what i'm getting at?
Do you see what I'm getting at? Why limit yourself to working with only certain filenames when you can use regular expressions that don't care what the files are called? Why limit yourself?

OK, I see where you're coming from. I came to that conclusion from the email clients I used, though when I tried to research it, there aren't many help articles or much documentation that tells you how email works "under the hood". What regular expression can I use to do this, including the filename extensions?


That's what this will do:

$find = '#src="cid:([^"@]*)@([^"]*)"#';
$replace = 'src="images/$1"';
$html_part = preg_replace($find, $replace, $html_part);

If the markup has this:

<img src="cid:some_file.ext@01CCD11C.6AFFF2F0">

then it will be replaced with this:

<img src="images/some_file.ext">

Presumably, there would be an entry in the attachments array with the filename "some_file.ext" that you would save to the images folder.
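Run against markup shaped like the Outlook sample earlier in the thread, the same pattern rewrites every tag in one call. The optional fifth argument of preg_replace() reports how many replacements were made, which can be handy for logging when nobody is watching the script run.

```php
<?php
// The reply's pattern applied to two tags shaped like the Outlook sample above.
$html_part = '<img width=960 height=612 id="Picture_x0020_1" src="cid:image001.jpg@01CCD11C.6AFFF2F0">'
           . '<img width=500 height=419 id="Picture_x0020_5" src="cid:image005.jpg@01CCD11C.6AFFF2F0">';

$find = '#src="cid:([^"@]*)@([^"]*)"#';
$replace = 'src="images/$1"'; // $1 = the filename captured before "@"

// -1 means "no limit"; $count receives the number of replacements made.
$html_part = preg_replace($find, $replace, $html_part, -1, $count);

echo $count, PHP_EOL; // 2
echo $html_part, PHP_EOL;
```

Only the src attribute is touched; width, height and id are left as they were.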



Thanks for that, though how would the script be able to differentiate between the different image file types that could be inserted into the email (.gif, .png, .jpg, .bmp) and then save them to the images folder?

Edited by IanArcher

I should've mentioned this early on: this script will be run solely from cron jobs, to periodically update the news feed whenever an email is sent and to check at a given date and time. That is why I want a lot of things to run automatically without much manual editing, except for the initial setup. Does this change anything I should know about?

Edited by IanArcher

No, I run my script that checks emails automatically also. Certain things in PHP change when you run from a command line though, like the working directory. Make sure to use absolute paths for all of your filenames, even included files.
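Following the absolute-path advice, a sketch that builds a per-article image folder from the title (the idea from earlier in the thread) under a fixed base directory. In the real script $base would typically be something like __DIR__ . '/images'; the example uses the system temp directory only so it can run anywhere. The function name, the 0755 mode (instead of 0777), and the empty-name check are my assumptions.

```php
<?php
// Builds (and creates if needed) an absolute, per-article image directory.
// article_image_dir() is a hypothetical helper; $base should be absolute.
function article_image_dir(string $base, string $title): string
{
    // Keep letters, digits and spaces, lowercase, then spaces -> underscores
    // (the same idea as the folder-name code earlier in the thread).
    $slug = strtolower(preg_replace('/[^A-Za-z0-9 ]/', '', $title));
    $slug = str_replace(' ', '_', trim($slug));
    if ($slug === '') {
        // An empty name would make mkdir() fail, so stop early with a clear error.
        throw new InvalidArgumentException('Title produced an empty directory name');
    }

    $dir = rtrim($base, '/') . '/' . $slug;
    // recursive = true also creates any missing parent directories.
    if (!is_dir($dir) && !mkdir($dir, 0755, true)) {
        throw new RuntimeException("Could not create $dir");
    }
    return $dir;
}

$dir = article_image_dir(sys_get_temp_dir() . '/news_images', 'My First Article!');
echo $dir, PHP_EOL;
```

Because the path never depends on the working directory, the script behaves the same whether it is run from a browser, a shell, or cron.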


The regular expression doesn't do that, but when you're going through the list of attachments you can check each extension and only save the ones you want to.
But knowing what I told you about using cron jobs, wouldn't this still conflict with that? I mean, nothing will really be done manually, so I can't go through the list of attachments. You said the regex could be modified to look for specific filename extensions? I believe you posted an example of it on the previous page, though it was an exact copy of what you gave me before.

You wouldn't go through the list of attachments manually anyway; you would write a loop in PHP that goes through all of them and checks each filename to decide whether to save that file based on its extension. I'm just assuming a certain set of rules; you need to define the actual rules you want your code to check yourself. You can change the filename part of the pattern so that it explicitly looks for the dot and a specific extension if you want to add that requirement to the regex.
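A sketch of that loop, assuming the decoded attachments are available as an array of filename => file contents (a hypothetical shape; your mail code may expose them differently). Only extensions on an allow-list are written, and basename() strips any directory part from a sender-supplied name.

```php
<?php
// Saves only image attachments whose extension is on the allow-list and
// returns the names that were written. $attachments (filename => bytes) and
// save_image_attachments() are hypothetical.
function save_image_attachments(array $attachments, string $dir): array
{
    $allowed = ['jpg', 'jpeg', 'png', 'gif', 'bmp'];
    $saved = [];

    foreach ($attachments as $filename => $data) {
        // basename() drops any "../" or folder part a sender might include.
        $name = basename($filename);
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        if (!in_array($ext, $allowed, true)) {
            continue; // skip .php, .exe and anything else not on the list
        }
        if (file_put_contents($dir . '/' . $name, $data) !== false) {
            $saved[] = $name;
        }
    }
    return $saved;
}

$dir = sys_get_temp_dir() . '/attachments_' . uniqid();
mkdir($dir, 0755, true);
$saved = save_image_attachments(
    ['a.PNG' => 'x', 'evil.php' => '<?php echo 1;', '../b.jpg' => 'y'],
    $dir
);
print_r($saved);
```

Nothing here needs a person at the keyboard, so it runs the same way from a cron job.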


How can I modify the regex to look for specific filename extensions? I really don't know that much about editing it.
#src="cid:([^"@]*)@([^"]*)"#

EDIT: Well, I edited it this way:

#src="cid:([^"@]*).jpg@([^"]*)"#

and it worked in a test preg_replace call I did. Just to be sure, is that the correct way? Should I have any reservations or precautions about just adding the filename extension smack in the middle like that?

Edited by IanArcher

I found out you were also right about the filename comment. I had assumed that all email clients name their images the way Outlook does ("image001.png"). When I tested with Windows Live Mail, it named the images "Image1[2].gif", "Image2[2].gif". I'd possibly also have to try Gmail and Yahoo. What do you recommend so that I can avoid building a series of if statements that look at the X-Mailer: header to know how to handle the images in that email?


Just to be sure, is that the correct way? Should i have any reserves or precautions about just adding the filename extension smack in the middle like that?
That pattern will only match jpg images, if that's what you want it to do, but you would need to escape the period so that it looks for a literal period instead of any character.
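To illustrate: unescaped, the dot also matches a filename with no dot before "jpg" at all; escaped, it only matches a literal dot. If you do want to restrict the regex to a set of extensions, a non-capturing alternation covers them in one pattern (the i modifier also accepts uppercase extensions). The test filenames are made up.

```php
<?php
// "." unescaped matches any character; "\." matches only a literal dot.
$unescaped = '#src="cid:([^"@]*).jpg@([^"]*)"#';
$escaped   = '#src="cid:([^"@]*)\.jpg@([^"]*)"#';
// One pattern for several extensions, using a non-capturing alternation.
$images    = '#src="cid:([^"@]*\.(?:jpe?g|png|gif|bmp))@([^"]*)"#i';

$odd = 'src="cid:photoXjpg@123"'; // no dot before "jpg"
var_dump(preg_match($unescaped, $odd)); // int(1): the "." matched the X
var_dump(preg_match($escaped, $odd));   // int(0)

preg_match($images, 'src="cid:image002.PNG@01CCD11C"', $m);
echo $m[1], PHP_EOL; // image002.PNG
```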
What do you recommend so i would have to avoid building a series of if statements that looks for the X-Mailer: portion to know how to handle the images in that email?
I would recommend allowing all filenames and all image extensions (that you want to support) instead of expecting anything to be in a certain format. It's not feasible to try and build a list of all possible email clients and what rules they follow. Some email clients may not rename the images at all, so you have to assume that the filenames could literally be any valid filename.
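As a sketch of "any valid filename": the pattern from earlier already ignores how the name is built, so Outlook-style and Windows Live Mail-style names go through the same code. The variant below also makes the "@..." suffix optional, on the assumption that some clients may produce cid values without it; that is worth confirming against real messages before relying on it.

```php
<?php
// Variant of the thread's pattern with the "@..." suffix made optional
// (an assumption about some clients; check real messages before relying on it).
$find = '#src="cid:([^"@]*)(?:@[^"]*)?"#';

// "image001.png@01CCC701.5A896430" is the Outlook value from this thread; the
// other two src values are made-up examples of different naming styles.
$html = '<img src="cid:image001.png@01CCC701.5A896430">'
      . '<img src="cid:Image1[2].gif@01CCD120.1234ABCD">'
      . '<img src="cid:IMG_0042.JPG">';

$html = preg_replace($find, 'src="images/$1"', $html);
echo $html, PHP_EOL;
```

No X-Mailer checks are involved: whatever name the client chose is carried straight through to the images folder.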

Could you point me in the right direction (i.e. the php.net manual pages), or could you possibly draw up some example code of what you are suggesting? I want to avoid writing a set of functions that look for each email client's specific image-naming scheme.

I would still suggest using the regular expression I posted in posts 29 and 39 to convert all filenames so they point into your images folder. That's really all you need to do in terms of the regular expression. When you go through the list of attachments to save, you might not save all of them and might end up with img elements that don't point to an actual file, but that's the sender's fault for emailing a file type you don't support. Cleaning up the HTML afterwards to remove any img elements that don't link to actual files is a separate issue, but in general I would convert the filenames to something usable, save whatever attachments you think are worth saving, and that's it.
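For the optional clean-up step, a sketch that removes img tags pointing into the images folder whose file was not saved. $saved is assumed to be the list of filenames the attachment loop actually wrote, and remove_unsaved_images() is a hypothetical name. Regex-based HTML editing is fine for simple markup like this but can miss unusual attribute quoting.

```php
<?php
// Drops <img> tags whose src="images/..." file is not in $saved.
// remove_unsaved_images() is hypothetical; $saved comes from the save loop.
function remove_unsaved_images(string $html, array $saved): string
{
    return preg_replace_callback(
        '#<img\b[^>]*\bsrc="images/([^"]*)"[^>]*>#i',
        function (array $m) use ($saved): string {
            // Keep the whole tag if its file was saved, otherwise remove it.
            return in_array($m[1], $saved, true) ? $m[0] : '';
        },
        $html
    );
}

$html = '<p>Hi</p><img src="images/a.png"><img width=10 src="images/evil.php">';
echo remove_unsaved_images($html, ['a.png']), PHP_EOL;
```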
