web comic aggregation website general approach?


mysteriousmonkey29

I am trying to create a web comic aggregation website where users can view a variety of web comics in one place, in a user-friendly format. I am currently trying to decide on the best way to pull comics from other people's websites. (If I can get the site working, I eventually plan to pay for content, but right now I'm just trying to get it working in the easiest way possible.)

After some research, it seems I could either parse the built-in RSS feeds that most of these websites provide (though the feeds vary from site to site and may not include all the content I want), or use some kind of web scraper.

Specifically, I would like to be able to access every comic on a given website in an automated fashion (I would like users to be able to view comics by date).
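To make the RSS route concrete, here is a minimal sketch in Python of parsing a feed and ordering its entries by date. The feed content, the `comic.example` URLs, and the `parse_comics` helper are all made up for illustration; real feeds differ in which fields they include, and many list only recent entries, so this alone may not reach a site's full archive.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical RSS 2.0 snippet; real feeds differ from site to site.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Comic</title>
    <item>
      <title>Strip 2</title>
      <link>https://comic.example/2</link>
      <pubDate>Tue, 02 Jan 2024 09:00:00 +0000</pubDate>
    </item>
    <item>
      <title>Strip 1</title>
      <link>https://comic.example/1</link>
      <pubDate>Mon, 01 Jan 2024 09:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def parse_comics(feed_xml):
    """Return the feed's entries as dicts, sorted oldest-first by date."""
    root = ET.fromstring(feed_xml)
    comics = []
    for item in root.iter("item"):
        pub_date = item.findtext("pubDate")
        if pub_date is None:
            continue  # skip entries with no date; they cannot be ordered
        comics.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            # RSS 2.0 dates use the RFC 822 format, which this helper parses
            "published": parsedate_to_datetime(pub_date),
        })
    return sorted(comics, key=lambda c: c["published"])
```

Calling `parse_comics(SAMPLE_FEED)` gives a list that a "view by date" page could walk through in order. A scraper would have to produce the same kind of list by reading each site's HTML instead, which tends to break whenever a site changes its layout.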

What are the advantages and disadvantages of using an RSS feed as opposed to a web scraper? Is there another possible approach that I have missed?

Also, in answers to other questions, people have suggested that I use jFeed, an extension to jQuery, to parse RSS feeds if I go that route. However, I am a little confused about how a web scraper would be implemented. Currently, I am writing the website in HTML5, CSS3, and JavaScript. Would I need a server-side scripting language to build a web scraper? If so, how would that language interact with the HTML and other code? Or would it replace it entirely?

Any help would be much appreciated.


The only problem with a JavaScript-only solution is that cross-domain requests will only work if the owner of the RSS feed has explicitly enabled them, typically by sending CORS headers.
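As a rough illustration of what "explicitly enabled" means, the sketch below models the check a browser applies before letting a page's script read a cross-domain response. It is a simplification that covers only a plain GET without cookies, and the function name and origins are invented for the example.

```python
def browser_allows_read(page_origin, response_headers):
    """Simplified model of the CORS check for a plain GET without credentials.

    The script on `page_origin` may read the response only if the feed's
    server sent an Access-Control-Allow-Origin header naming that origin
    or the wildcard "*".
    """
    # Header names are case-insensitive, so normalise them first.
    headers = {name.lower(): value for name, value in response_headers.items()}
    allowed = headers.get("access-control-allow-origin")
    if allowed is None:
        return False  # feed owner has not opted in
    return allowed == "*" or allowed == page_origin
```

A feed served with no such header fails this check for every other site, which is why a page running only in the browser cannot read most feeds directly.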


Thanks for the example; I will check it out.

As for the cross-domain request comment, that also seems like a good point. After a little research, it definitely looks like that would be a problem for my website. It also seems that the best way around it is to create a server-side proxy in PHP or a similar language, so it looks like I will have to use server-side code after all. Fortunately, I also get the impression that developing my website this way will provide some other advantages, although I'm not sure what they are yet (it just seems like everyone I talk to suggests doing it this way).

Thanks
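For reference, a server-side proxy of the kind mentioned above could look roughly like the sketch below. It is written in Python rather than PHP, uses only the standard library, and assumes the aggregator page is served from the same origin as the proxy. The `/feed?url=...` path, the `ALLOWED_HOSTS` set, and the `comic.example` host are hypothetical choices for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
from urllib.request import urlopen

# Hypothetical allowlist: only hosts whose feeds you have decided to aggregate.
ALLOWED_HOSTS = {"comic.example"}


def is_allowed(url):
    """Accept only http(s) URLs whose host is on the allowlist."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and parts.hostname in ALLOWED_HOSTS


class FeedProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expect requests of the form /feed?url=<feed address>
        query = parse_qs(urlparse(self.path).query)
        target = query.get("url", [""])[0]
        if not is_allowed(target):
            self.send_error(403, "Feed host not on the allowlist")
            return
        try:
            # The server fetches the feed, so browser cross-domain rules
            # do not apply to this request.
            with urlopen(target, timeout=10) as upstream:
                body = upstream.read()
        except OSError:  # covers URLError, HTTPError, and timeouts
            self.send_error(502, "Could not fetch the feed")
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/xml")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Start the proxy on localhost; call this manually to try it out."""
    HTTPServer(("127.0.0.1", port), FeedProxy).serve_forever()
```

The page's JavaScript would then request `/feed?url=...` from its own server instead of contacting the comic site directly. The allowlist matters because a proxy that fetches any URL it is given can be abused by third parties.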


If I were you I would ask permission from each of the comic owners before making your site. Otherwise you may get into legal trouble and that can be expensive.

