real_illusions (Posted February 3, 2011)

Hi all,

Does anyone know of any scripts that gather data from external sites and display it on a page? Something like the SEO/website grading tools, which show things like the page title, meta tags, images and whether they have alt attributes, inbound links, and so on.

Is there an easy way of doing this? Can it all be PHP-based, or will some of it have to be done with cron jobs (which I don't have a clue about) or by building a website robot to gather that info (which I also don't have a clue about)?
boen_robot (Posted February 3, 2011)

A cron job and a robot are good options if you want to keep a local copy of what you're scraping and keep that copy updated without user intervention.

But for plain screen scraping, if the targeted site has valid (or almost valid) code, you can use DOMDocument::load() (for well-formed X(HT)ML files; DOMDocument::loadXML() does the same for a string you already have in memory) or DOMDocument::loadHTMLFile() (for valid and almost valid HTML files). Once the file is loaded, you can use the rest of the DOM API to extract whatever you want from the page.

If even that fails, the only options are to write your own HTML parser or to run regular expressions over the remote file's contents. Extracting the content you want is a little harder in that case.
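The DOM approach described above might look roughly like the sketch below. It assumes the page source is already in a string and uses DOMDocument::loadHTML(), the string counterpart of loadHTMLFile(). The function name summarize_page() and the shape of the returned array are made up for illustration, not part of any library.

```php
<?php
// Hypothetical helper: pull a few SEO-style facts out of an HTML string.
function summarize_page(string $html): array
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid, so collect parser warnings
    // internally instead of letting them be printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Page title (null when there is no <title> element).
    $titles = $doc->getElementsByTagName('title');
    $title = $titles->length > 0 ? trim($titles->item(0)->textContent) : null;

    // Named meta tags, keyed by lower-cased name.
    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        if ($tag->hasAttribute('name')) {
            $meta[strtolower($tag->getAttribute('name'))] = $tag->getAttribute('content');
        }
    }

    // Count images and record the src of each one lacking an alt attribute.
    $images = 0;
    $missingAlt = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $images++;
        if (!$img->hasAttribute('alt')) {
            $missingAlt[] = $img->getAttribute('src');
        }
    }

    return [
        'title'       => $title,
        'meta'        => $meta,
        'images'      => $images,
        'missing_alt' => $missingAlt,
    ];
}
```

If you would rather let DOM do the fetching as well, replacing loadHTML($html) with loadHTMLFile($url) should work the same way, subject to the same remote-access settings as any other URL read in PHP.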
birbal (Posted February 3, 2011)

You can use file_get_contents() to fetch pages from another site, but you need to enable allow_url_fopen in your php.ini for that to work. That covers a single page. If you want to keep pulling pages from the other site on a regular basis, you will probably need some cron jobs that act like spiders.
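A minimal sketch of the file_get_contents() route, assuming you want a null result rather than a warning when the fetch fails. The function name fetch_page(), the timeout default, and the user agent string are all placeholders chosen for this example.

```php
<?php
// Hypothetical helper: return the body of $url, or null if it cannot be read.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    // Remote reads only work when allow_url_fopen is enabled in php.ini.
    $isRemote = preg_match('#^https?://#i', $url) === 1;
    if ($isRemote && !filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return null;
    }

    // Give the HTTP wrapper a timeout and a user agent (placeholder values).
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleGrader/0.1',
        ],
    ]);

    // file_get_contents() returns false on failure; suppress its warning
    // and report the failure as null instead.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

The returned string can then be handed to whatever parsing you use, DOM or regular expressions. Running the same script from a cron job is what would turn this single-page fetch into the periodic spider mentioned above.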
real_illusions (Posted February 4, 2011)

Thanks. Looks like that part is relatively straightforward. It's just the other parts associated with those scripts that seem a little trickier.