Skemcin Posted June 24, 2011

So I've been digging around trying to find a software program that I can use to mine some basic information from a website. The short of it is, I've got a new job and will be managing several web sites. I need to get an idea of how each of these sites links back and forth to the others or to other areas. I also need to understand the myriad server- and client-side technologies being used - a sort of detailed summary of how many times a particular file extension is used in a link or form.

I've downloaded these two applications - both falling short of my requirements:

GSiteCrawler
DRKSpider

Again, basic requirements for a report/export:

a.) a count and list of unique domains and sub-domains referenced
b.) a count and list of file extensions referenced throughout the site - a href, form action, document.location, etc.

Anyone have any suggestions?
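For what it's worth, requirements a.) and b.) can be roughed out with nothing but the Python standard library. This is only a sketch under some assumptions: the page HTML has already been fetched into a string, `LinkCollector` and `summarize` are made-up names for illustration, and only a href and form action are handled (document.location lives in JavaScript and is not covered here).

```python
from collections import Counter
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href> and <form action> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        wanted = {"a": "href", "form": "action"}.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative links against the page's own URL.
                self.urls.append(urljoin(self.base_url, value))


def summarize(html, base_url):
    """Return (host counts, file extension counts) for one page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    hosts = Counter()
    extensions = Counter()
    for url in collector.urls:
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        hosts[parts.hostname or "(unknown)"] += 1
        ext = PurePosixPath(parts.path).suffix.lower()
        extensions[ext or "(none)"] += 1
    return hosts, extensions
```

Running `summarize` over every page and adding the counters together would give the site-wide totals. The tolerant `html.parser` module is used here because it does not need the markup to be well formed.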
boen_robot Posted June 26, 2011

Don't tell me... the sites are not well-formed XHTML, right? If they were, writing your own bot would be almost trivial (except for document.location... analyzing JavaScript-originated code would ###### big time).
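To illustrate the "almost trivial" case, here is a minimal sketch that assumes the page really is well-formed XHTML, so a plain XML parser can read it. `extract_targets` is a hypothetical helper name, not something from either crawler mentioned above. On malformed markup (or on entities like `&nbsp;` that the parser has no definition for), `ElementTree` raises `ParseError` instead of guessing, which is exactly why this only works for well-formed pages.

```python
import xml.etree.ElementTree as ET


def extract_targets(xhtml_text):
    """Return href/action values from a well-formed XHTML document."""
    root = ET.fromstring(xhtml_text)  # raises ET.ParseError if malformed
    targets = []
    for element in root.iter():
        if not isinstance(element.tag, str):
            continue  # defensive: skip non-element nodes
        # Drop the "{namespace}" prefix that ElementTree puts on tag names.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "a" and element.get("href"):
            targets.append(element.get("href"))
        elif local_name == "form" and element.get("action"):
            targets.append(element.get("action"))
    return targets
```

The returned strings are still raw attribute values, so relative links would need resolving and counting separately.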
Skemcin Posted June 27, 2011 Author

That is funny - a corporate site done in well formatted [anything]. lol

My analysis thus far has:

2,328 pages (not that bad)
523 web forms
206 mailto references (which equals spam)
75 server side mail functions
2,484 orphaned files (lol - but understand that includes things like .htaccess 'cause not linked to)
1,108 broken links
8 public application integration points (where we use a vendor with an associated domain)
11 private application integration points (where we use a vendor with an associated domain)
2 databases with 34 tables between them

Needless to say, I wasn't expecting to see well-formatted pages - lol.
This topic is now archived and is closed to further replies.