website scraping kinda stuff


real_illusions

Hi all,

Does anyone know of any scripts that gather data from external sites and display it on a page? I mean something like those SEO/website grading tools, which show things like the page title, meta tags, images and whether they have alt attributes, inbound links, and so on.

Is there an easy way of doing this? Can it all be PHP based, or will some of it need a cron job (which I don't have a clue about) or a website robot to gather that info (which I also don't have a clue about)?

A cron job and a robot would be a good approach if you want to keep a local copy of whatever you're scraping and keep that copy updated without user intervention.

But for plain screen scraping, if the targeted site has valid (or almost valid) code, you can use DOMDocument::load() for well-formed X(HT)ML files (or DOMDocument::loadXML() if you already have the markup in a string), or DOMDocument::loadHTMLFile() for valid and almost valid HTML files. Once the file is loaded, you can use the rest of the DOM API to extract whatever you want from the page.

If even that fails, the only options are to write your own HTML parser or to run regular expressions over the remote file's contents. Extracting the contents you want is a little harder in that case.
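
To make the DOM approach a bit more concrete, here's a rough sketch of pulling out the kind of things the original post asks about (title, meta tags, images and whether they have alt attributes). It assumes the markup is already in a string, so it uses DOMDocument::loadHTML() rather than loadHTMLFile(). The function name summarize_html() and the sample markup are made up for illustration, so treat it as a starting point rather than a finished grader.

```php
<?php
// Hypothetical helper: summarise a page's title, named meta tags, and images.
function summarize_html(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely perfectly valid, so collect parser
    // warnings internally instead of letting them be printed.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Page title (null when there is no <title> element).
    $titles = $doc->getElementsByTagName('title');
    $title = $titles->length > 0 ? trim($titles->item(0)->textContent) : null;

    // Meta tags that carry a name attribute, keyed by lower-cased name.
    $meta = [];
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        if ($tag->hasAttribute('name')) {
            $meta[strtolower($tag->getAttribute('name'))] = $tag->getAttribute('content');
        }
    }

    // Every image, with a flag saying whether an alt attribute is present.
    $images = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $images[] = [
            'src'     => $img->getAttribute('src'),
            'has_alt' => $img->hasAttribute('alt'),
        ];
    }

    return ['title' => $title, 'meta' => $meta, 'images' => $images];
}

// Made-up sample markup standing in for a fetched page.
$sample = '<html><head><title>Example page</title>'
    . '<meta name="description" content="A sample page">'
    . '</head><body><img src="a.png" alt="Logo"><img src="b.png"></body></html>';

$summary = summarize_html($sample);
```

For a remote page you could swap loadHTML($html) for loadHTMLFile($url), assuming URL access is allowed on your server. Inbound links can't be read from the page itself, so those would have to come from somewhere else.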

You can use file_get_contents() to fetch pages from another site, but you need to enable allow_url_fopen in php.ini. That only gets you a single page per call, so you'll probably need some cron jobs acting like spiders to keep fetching pages from the other site on a schedule.
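
Something along these lines is what I mean, as a minimal sketch. The helper name fetch_page(), the 10 second timeout, and the user agent string are my own assumptions, not anything required by PHP. It checks allow_url_fopen before trying a URL and returns null when the fetch fails.

```php
<?php
// Hypothetical helper: return the body of a page, or null on failure.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    // file_get_contents() can only open http(s) URLs when
    // allow_url_fopen is enabled in php.ini.
    $isRemote = preg_match('#^https?://#i', $url) === 1;
    $urlFopen = filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
    if ($isRemote && !$urlFopen) {
        return null;
    }

    // Assumed settings: a timeout so a slow site cannot hang the script,
    // and a made-up user agent string identifying the scraper.
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'ExampleScraper/0.1',
        ],
    ]);

    // Suppress the warning on failure and report it as null instead.
    $body = @file_get_contents($url, false, $context);

    return $body === false ? null : $body;
}
```

You'd call it as fetch_page('http://example.com/') and hand the result to whatever parses the page. To repeat it on a schedule, a crontab entry such as `0 * * * * php /path/to/fetch.php` (the path is a placeholder) would run the script once an hour.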
