destrugter Posted September 7, 2009

Hi, it's been a while since I needed help with PHP. I came across a very helpful command called explode. For my purposes, I need to grab a webpage and use explode to split it into strings on the newline character, "\n". I got that far, and I got exactly the result I wanted, BUT it's far too slow: the page takes 31 seconds to load using explode, versus around 0.0006 seconds for most of my other, database-driven pages. Does anyone know a faster alternative? Or is PHP just not able to handle an external webpage of 300+ lines within seconds? If there is a faster alternative, please tell me. Here's my current code:

```php
function Find_Minimum($Item_Number)
{
    $txt = file_get_contents('http://itemdb-rs.runescape.com/viewitem.ws?obj=' . $Item_Number);
    $data = explode("\n", $txt);
    $ind = 1;
    foreach ($data as $line) {
        set_time_limit(0);
        if ($ind == 220) {
            $pattern = "<b>Minimum price:</b>";
            $Item_Minimum_Price = Cut_String($pattern, $line);
        }
        $ind++;
    }
    return $Item_Minimum_Price;
}
```
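As a sanity check on where the time actually goes, explode() itself can be timed in isolation. A minimal sketch using synthetic text of a similar size (a few thousand short lines, not the live page):

```php
<?php
// Hypothetical timing check: split several hundred KB of synthetic
// text on "\n" and measure how long explode() alone takes. This is
// invented data for illustration, not the real item page.
$txt = str_repeat(str_repeat('x', 120) . "\n", 3000); // ~360 KB, 3000 lines

$start = microtime(true);
$data = explode("\n", $txt);
$elapsed = microtime(true) - $start;

echo count($data) . " pieces in " . round($elapsed, 4) . " seconds\n";
```

If this finishes in a small fraction of a second, the 31 seconds must be coming from somewhere other than the string splitting.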
Synook Posted September 7, 2009

Most of the time is probably taken up by fetching the external page, not the actual explode routine, especially if you are calling that function multiple times. Benchmark each section!
Ingolme Posted September 7, 2009

You can use file() instead of file_get_contents(), since it returns the file as an array of lines, which is exactly what you're trying to produce.
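For reference, a minimal sketch of the difference, demonstrated on a local temp file (fetching a URL works the same way when allow_url_fopen is enabled):

```php
<?php
// Sketch: file() returns the contents as an array of lines, so the
// explode("\n", ...) step disappears entirely. A local temp file is
// used here just to keep the example self-contained.
$tmp = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($tmp, "line one\nline two\nline three\n");

$lines = file($tmp, FILE_IGNORE_NEW_LINES); // strip trailing newlines
echo $lines[1] . "\n"; // prints "line two"

unlink($tmp);
```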
destrugter Posted September 7, 2009 (Author)

I tried that, and in terms of ease of use and code cleanup it works great. I tested it, and the page measured 37.408 seconds to load. When I save one of these pages as a .mht file (the one where "obj" is 2), the resulting file is 364 KB on disk. So if PHP is having to sort through a 364 KB file and then spit out information, I can see why this would cause 30-second load times. I don't think it's possible to get this under 30 seconds; to do so, I would have to make PHP skip the first couple hundred lines and read only the information I need into my variable.

[EDIT] Here is my code after switching to file():

```php
function Find_Minimum($Item_Number)
{
    $txt = file('http://itemdb-rs.runescape.com/viewitem.ws?obj=' . $Item_Number);
    $pattern = "<b>Minimum price:</b> ";
    $Item_Minimum_Price = Cut_String($pattern, $txt[219]);
    return $Item_Minimum_Price;
}
```
justsomeguy Posted September 8, 2009

It takes milliseconds to explode several hundred KB of text, not seconds. Explode is not your problem. Since you have the page request inside the function:

```php
$txt = file('http://itemdb-rs.runescape.com/viewitem.ws?obj=' . $Item_Number);
```

if you call that function 10 times to get 10 different things, it's going to send 10 requests to the server. That's where the time is coming from: waiting for the runescape.com server to return several different responses. Ideally you would fetch the page once and then search through it for everything you're looking for. Also, if they ever change the format of their page, even just adding one more line, it will break your code. It's better to use a regular expression for this.
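A regular-expression version along those lines might look like the sketch below. The $html sample is invented for illustration; the real page's markup (and price format) may differ, so the pattern would need adjusting against the actual source.

```php
<?php
// Hedged sketch: pull the value after "<b>Minimum price:</b>" with a
// regex instead of relying on a fixed line number (line 220). The
// sample markup and the "gp" suffix are assumptions, not the real page.
$html = "<tr><td><b>Minimum price:</b> 1,250gp</td></tr>";

if (preg_match('/<b>Minimum price:<\/b>\s*([\d,]+)gp/', $html, $m)) {
    $minPrice = $m[1];   // "1,250"
    echo $minPrice . "\n";
}
```

Because the regex searches the whole string, it keeps working even if the site adds or removes lines above the price.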
destrugter Posted September 9, 2009 (Author)

Yes, I understand that. For this example, though, I am only calling the function once for one specific thing. I made it as a test to check the load times, so I grabbed just one line from the page. I understand that I could do better later by loading the page once, but for testing purposes I limited it to one part of the file. I also understand that if they change the page, it would break my code; I am prepared for that as well. Like I said, this was just a test to see if everything would check out, and the only problem so far is the load time.
justsomeguy Posted September 9, 2009

It's easy enough to add timers to see how long everything takes. The microtime() function returns the current time with microsecond precision, e.g.:

```php
function Find_Minimum($Item_Number)
{
    $start = microtime(true);
    $txt = file('http://itemdb-rs.runescape.com/viewitem.ws?obj=' . $Item_Number);
    echo 'got file in ' . (microtime(true) - $start) . ' seconds<br>';

    $pattern = "<b>Minimum price:</b> ";
    $start = microtime(true);
    $Item_Minimum_Price = Cut_String($pattern, $txt[219]);
    echo 'cut string in ' . (microtime(true) - $start) . ' seconds<br>';

    return $Item_Minimum_Price;
}
```

You can add timing like that inside the Cut_String function as well, to see how long the various steps there take.
destrugter Posted September 9, 2009 (Author)

The actual retrieval of the file is what makes it take so long. When I visit that page in a browser, it doesn't take 30 seconds to load, so why would it take 30 seconds in this case?
justsomeguy Posted September 9, 2009

That would be a question for your host. If you have access, it would probably be a good idea to run some traceroutes or something like that from the server.
destrugter Posted September 9, 2009 (Author)

If I substitute that site with, say, a wiki site, it fetches near-instantly, even when performed multiple times. It has to be something with RuneScape not wanting people to do this, which I guess I can understand.
justsomeguy Posted September 9, 2009

Maybe so; they might have an API they want people to use instead of just scraping the HTML pages. They might also be checking the user-agent string. You might be able to use something like the cURL library to send the request and specify the user agent, so the server thinks you're a regular browser instead of a bot.
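A hedged sketch of that approach with cURL is below. The user-agent string and the timeout value are arbitrary examples; whether the server actually keys on the user agent is only a guess, so this may or may not fix the slowdown.

```php
<?php
// Sketch: fetch a page with cURL while sending a browser-like
// User-Agent header. The UA string below is an example value, not
// something the target site is known to require.
function fetch_as_browser($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body instead of printing it
    curl_setopt($ch, CURLOPT_USERAGENT,
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // fail fast instead of hanging
    $body = curl_exec($ch);
    curl_close($ch);
    return $body; // false on failure
}
```

The timeout in particular is useful for diagnosis: if the request fails fast instead of taking 30 seconds, the delay is on the remote side rather than in your parsing code.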
destrugter Posted September 9, 2009 (Author)

Wow, that's pretty clever. I have a question, though: say I go through with using the cURL library as you're describing. Is any part of what I'm doing illegal in any way? All I'm doing is connecting to the website and grabbing 3 pieces of information, so that the people who visit my site have the most accurate information about that item possible.
justsomeguy Posted September 9, 2009

They may claim some copyright over the information you're gathering, but there's nothing inherently illegal about accessing online content.
Archived
This topic is now archived and is closed to further replies.