
mysteriousmonkey29

Everything posted by mysteriousmonkey29

  1. Okay, let's say that the phrase "comic strip" refers to a whole strip (XKCD, The Far Side, etc.; good list by the way, some of my favorites), and the word "comic" refers to each individual cohesive joke image within a comic strip. I agree that the relationship between comic strips and posts is one-to-many. However, from what I've seen, the relationship between posts and comics/images is actually one-to-one in all the comics I've checked so far. And I've checked a lot (about 30), so I think this is a safe assumption for future comics. The relationship between posts and publication dates is also one-to-many, although this one is confusing because different comic strips will inevitably have different comics with the same publication date. So does this mean that I should have:

     • one table for comic strips, including their names and primary IDs,
     • one table for image source links, including the links, primary IDs, and foreign keys of the comic strips, and
     • one table for publication dates, including the dates, primary IDs, and foreign keys of the image source links (but not the comic strips' foreign keys, because those can be deduced from the image source links table)?
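     To make that concrete, here is a rough sketch of those three tables in MySQL. All of the table and column names are placeholders I made up, and I'm assuming the InnoDB engine since it enforces foreign keys:

         -- one row per web comic strip (XKCD, The Far Side, etc.)
         CREATE TABLE comic_strips (
             id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
             name VARCHAR(255) NOT NULL UNIQUE
         ) ENGINE=InnoDB;

         -- one row per comic image, pointing back at its strip
         CREATE TABLE image_links (
             id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
             strip_id INT UNSIGNED NOT NULL,
             image_src VARCHAR(2083) NOT NULL,
             FOREIGN KEY (strip_id) REFERENCES comic_strips (id)
         ) ENGINE=InnoDB;

         -- one row per publication date, pointing back at its image link;
         -- the strip is deducible by joining through image_links
         CREATE TABLE publication_dates (
             id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
             image_link_id INT UNSIGNED NOT NULL,
             published_on DATE NOT NULL,
             FOREIGN KEY (image_link_id) REFERENCES image_links (id)
         ) ENGINE=InnoDB;

     One nice side effect of this split: a feed that doesn't supply dates just contributes no publication_dates rows, so that table can keep its NOT NULL constraint.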
  2. Hello, I am currently working on a web comic aggregation website. My general design is an RSS reader that grabs the publishing date and image source link from various web comic RSS feeds each time they are updated, and uses this to update a database. Then the front end accesses this database info to be displayed in a user-friendly format each time someone accesses the website. I might eventually add a user account system to store user preferences about comic arrangements, which would also need to be put in the database.

     I am currently writing the back end in PHP and MySQL, and am trying to decide on a general database design. I googled how to design an efficient relational database, which provided a lot of information, but also filled me with fear that my database will be massively inefficient (because I've never done this before) and that it will eventually make my site slow and unusable. So I thought I would check my design on some forums before I proceed.

     My original plan was to have two databases: one for comics, then one for users later (I'm not too concerned about this one, because I'm sure there are a ton of examples of how to make an efficient username/password/info database). The comics database would have one table for each different comic, and each table would have a column for publishing date and image source link. However, after reading about database normalization, I thought of an alternative design: one table for comics, including their name and a primary ID, and then one table for publishing dates and one table for image source links, both including their own primary ID, and also the foreign key of the comic ID for each link or date. However, then I realized that the publishing dates table isn't going to make any sense, because the publishing date is kind of like a key for the image source link. So basically I'm pretty confused as to how to generally arrange my tables and fields, and I was wondering if anyone has any suggestions?

     I also have another concern. Some RSS feeds do not include publication dates, so it's possible that I am going to need some null entries for publishing dates (the sketch below shows what I mean by a nullable date column). The MySQL manual told me to use "NOT NULL" whenever possible, so I was wondering if not using it for this column will be a problem?

     Finally, there are a few miscellaneous things that I was wondering about. I have read that I can change table types, storage engines, and row formats (compact, dynamic, compressed, etc.). I have also read that sometimes websites sacrifice normalization (and thus total storage space) for speed somehow. I was wondering if these more nitpicky details are important for my website? Although I plan on accumulating very long lists of individual comics (at least thousands, maybe tens of thousands for each different web comic), and the website could potentially become accessed very frequently if it became popular, it doesn't seem like what I am trying to do with my database is nearly as complicated as the kinds of things most people are trying to do. So I was wondering how much of this optimization stuff is really worth the time, and how much of it is just going to make insignificant differences in the performance of my website? Help would be much appreciated.
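     For concreteness, here is the simplest normalized layout I can think of, with the date column left nullable for the feeds that omit it. All the names are placeholders, and I'm again assuming InnoDB for the foreign key:

         CREATE TABLE comic_strips (
             id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
             name VARCHAR(255) NOT NULL UNIQUE
         ) ENGINE=InnoDB;

         -- one row per individual comic; pub_date is nullable because
         -- some feeds don't supply publication dates
         CREATE TABLE comics (
             id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
             strip_id INT UNSIGNED NOT NULL,
             image_src VARCHAR(2083) NOT NULL,
             pub_date DATE NULL,
             FOREIGN KEY (strip_id) REFERENCES comic_strips (id)
         ) ENGINE=InnoDB;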
  3. Okay, makes sense. Unfortunately I don't have any experience in this kind of thing, so fingers crossed about designing an efficient database, ha ha. I guess I can google tips on that sort of thing. Either way, thank you all!
  4. Great, thanks. So you think the scalability problems are more about complexity than sheer size?
  5. I am in the process of trying to create a web comic aggregation website in PHP/HTML/JavaScript. I already have a mostly functional PHP RSS feed reader that grabs the image source URLs from a given feed. I am now trying to add the functionality to add these URLs to a database, as well as to download the images to a file system.

     I am trying to decide on a database system, but am having trouble. From my research, it seems like MySQL is pretty much perfect for what I am trying to do, unless the website grows too popular or the database grows too large, in which case I am told that MySQL will slow down significantly. Initially, the database shouldn't be that large, but over time it will grow from RSS feeds and probably from the addition of new comics. Also, I don't think the website will become incredibly popular, but if it did and then promptly ceased functioning due to the lack of a scalable database, that would certainly be a shame.

     I am also considering other SQL databases, such as Microsoft SQL Server and Oracle. However, from my research, it seems like MySQL is easier to use, cheaper, and has more resources explaining how to use it. So I am torn. Any recommendations for which one to choose?

     Another option I am considering is to start developing in MySQL, and then, if I ever run into significant performance issues from database size or website popularity, to switch to one of the more robust databases. I have read that this isn't too hard, but am suspicious. Can anyone give a second opinion on this assessment?

     PS: I am currently running Windows 7 and developing in NetBeans 8 with XAMPP, although if and when I ever launch my website, I plan on getting a hosting plan and not hosting it myself. Help would be much appreciated.
  6. More specifically, I have read a lot about MySQL. It seems basically perfect, except that I have read that it has major performance issues if you attempt to scale it up a lot. Do you guys agree with this assessment, and if so, how large does a website need to get in order to run into these performance issues? I was also considering some other options, like Microsoft SQL Server and Oracle.
  7. Actually, I think I just figured out the answer to my question. My guess is that it's just included in the hosting plan you get. I think I generally get how database and file system storage work in the context of actually producing a website now; thanks for the help. Any recommendations as to different types of file system storage and/or database storage?
  8. Okay, noted; thanks again for the advice. Do you know how storing stuff in a file system works if you don't own the actual server and are renting it?
  9. Okay, cool. If I decided on a particular database to use (say MySQL), would that limit my options for purchasing hosting plans? Also, would most hosting plans include databases capable of handling tens or hundreds of thousands of lines of text, and maybe also images? As for the copyright issue, thanks. If I can get the site to work locally, then I plan on going to the next step and contacting the content owners to pay for content. But for now I'm just seeing if I can get it to work.
  10. Hello, I am trying to create a web comic aggregation website in PHP/HTML/JavaScript. Currently, my general design plan for the website is an RSS reader that periodically checks various web comics for new comics and then adds these comics to an ever-growing database, which is then accessed via the client side of the website and displayed in a user-friendly and interactive format. I also may eventually attempt to use web scraping to backfill the database with all the previous comics from before the RSS updater was launched.

     Currently, I have created an RSS reader that gets the image source of all the comics from a given RSS feed. So mostly I would want to be storing text strings (the URLs) along with some kind of organizational keys (probably based on publication date). Currently, it is not particularly important to me that this information be kept secure at all, although I may eventually add an account creation option, in which case I would need to securely store usernames and passwords. I would also like to back up all the comic links I gather by actually downloading the images somewhere, in case the links ever get changed or taken down (a rough sketch of what I mean is below). Finally, this database can be the same for every user of the website, although it needs to be able to get very long (probably between thousands and hundreds of thousands of individual comics), and of course I would like it to be reasonably quickly accessible so my site isn't incredibly slow.

     After some research online, it seems like I could use simple file storage or some kind of database system, such as MySQL or Microsoft SQL Server. I was wondering what the advantages and disadvantages of these various options might be (and if I missed any major options), along with how server-side web storage generally works? For example, when I actually launch my website, I know that I would have to rent a server from somewhere or set one up myself. Would I also have to rent database storage along with this? How would that generally work?
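     For the image backup part, what I have in mind is roughly the following PHP (the function name and directory are placeholders, and this assumes the backup directory already exists):

         <?php
         //rough sketch: download a comic image and save a local copy,
         //in case the original link ever changes or disappears
         function backupImage($imageSrc, $saveDir)
         {
             $data = @file_get_contents($imageSrc); //fetch the remote image; @ hides fetch warnings
             if ($data === false) {
                 return false; //download failed; would need real logging/handling later
             }
             //derive a local file name from the URL path,
             //e.g. ".../comics/2016-07-04.png" becomes "2016-07-04.png"
             $fileName = basename(parse_url($imageSrc, PHP_URL_PATH));
             return file_put_contents($saveDir . '/' . $fileName, $data) !== false;
         }

         //example call (made-up image URL):
         //backupImage('http://imgs.xkcd.com/comics/some_comic.png', 'comic_backups');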
  11. Thanks, you were correct. Here is the code I ended up with. I replaced everything inside the foreach loop with just a getImageSrc function that calls a getImageTag function:

         //function to find an image tag within a specific section, if there is one
         function getImageTag($item, $tagName)
         {
             //pull the desired section from the given item
             $section = $item->getElementsByTagName($tagName)->item(0);

             //reparse the section as if it were a string, because for some reason PHP won't
             //let you go directly to the source image with getElementsByTagName
             $decoded_section = htmlspecialchars_decode($section->nodeValue);
             $section_xml = new DOMDocument();
             @$section_xml->loadHTML($decoded_section); //the @ is to suppress a bunch of warnings about characters this parser doesn't like

             //pull the image tag from the section, if there
             $image_tag = $section_xml->getElementsByTagName('img')->item(0);
             return $image_tag;
         }

         //function to get the image source URL from a given item
         function getImageSrc($item)
         {
             $image_tag = getImageTag($item, 'description');
             if (is_null($image_tag)) //nothing with the tag name of img in the description section
             {
                 //check the content:encoded section, because that's the next most likely place
                 $image_tag = getImageTag($item, 'encoded');
                 if (is_null($image_tag)) //nothing with the tag name of img in the encoded content section
                 {
                     //if the program gets here, it's probably because the feed is crap and doesn't include images,
                     //or because this particular item doesn't have a comic image in it
                     $image_src = ''; //THIS EXCEPTION WILL PROBABLY NEED TO BE HANDLED LATER TO AVOID POTENTIAL ERRORS
                 }
                 else
                 {
                     $image_src = $image_tag->getAttribute('src');
                 }
             }
             else
             {
                 $image_src = $image_tag->getAttribute('src');
             }
             return $image_src;
         }
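     In case it helps anyone later, the calling code ends up looking roughly like this (same item loop as my logger script; the feed URL is just one of my test feeds):

         <?php
         //load one feed and print the image source URL for each item
         $xmlDoc = new DOMDocument();
         $xmlDoc->load('http://www.smbc-comics.com/rss.php');
         foreach ($xmlDoc->getElementsByTagName('item') as $item) {
             $image_src = getImageSrc($item);
             if ($image_src !== '') { //skip items with no comic image
                 echo ($image_src . "<br>");
             }
         }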
  12. Hello, I am trying to create an RSS reader based on this example: http://www.w3schools.com/php/php_ajax_rss_reader.asp

     Specifically, I am attempting to modify this example so that the reader will access and display all the available comic images (and nothing else) from any given web comic RSS feed. I realize that it may be necessary to make the code at least a little site-specific, but I am trying to make it as general-purpose as possible. Currently, I have modified the initial example to produce a reader that displays all the comics of a given list of RSS feeds. However, it also displays other unwanted text information that I am trying to get rid of. Here is my code so far, with a few feeds that are giving me trouble in particular.

     index.php (pretty sure there's nothing wrong with this file; I think the problems arise in the next one, although I included this one for completeness):

         <html>
         <head>
         <script>
         function showRSS() {
             if (window.XMLHttpRequest) {
                 // code for IE7+, Firefox, Chrome, Opera, Safari
                 xmlhttp = new XMLHttpRequest();
             } else {
                 // code for IE6, IE5
                 xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
             }
             xmlhttp.onreadystatechange = function() {
                 if (xmlhttp.readyState == 4 && xmlhttp.status == 200) {
                     document.getElementById("rssOutput").innerHTML = xmlhttp.responseText;
                 }
             }
             xmlhttp.open("GET", "logger.php", true);
             xmlhttp.send();
         }
         </script>
         </head>
         <body onload="showRSS()">
         <div id="rssOutput"></div>
         </body>
         </html>

     logger.php:

         <?php
         //function to get all comics from an RSS feed
         function getComics($xml)
         {
             $xmlDoc = new DOMDocument();
             $xmlDoc->load($xml);
             $items = $xmlDoc->getElementsByTagName('item');
             foreach ($items as $item) {
                 $comic_image = $item->getElementsByTagName('description')->item(0)->childNodes->item(0)->nodeValue;
                 //output the comic
                 echo ($comic_image . "</p>");
                 echo ("<br>");
             }
         }

         //create array of all RSS feed URLs
         $URLs = [
             "SMBC" => "http://www.smbc-comics.com/rss.php",
             "garfieldMinusGarfield" => "http://garfieldminusgarfield.net/rss",
             "babyBlues" => "http://www.comicsyndicate.org/Feed/Baby%20Blues",
         ];

         //loop through all RSS feeds
         foreach ($URLs as $xml) {
             getComics($xml);
         }
         ?>

     Because this method includes extra text in between the comic images (a lot of random stuff with SMBC, just a few advertisement links for gMg, and a copyright link for Baby Blues), I looked at the RSS feeds and concluded that the problem is that it's the description tag that includes the image source, but it also includes other stuff.

     Next, I tried modifying the getComics function to scan directly for the image tag, rather than first looking for the description tag. I replaced the part between the DOMDocument creation/loading and the URL list with:

         $images = $xmlDoc->getElementsByTagName('img');
         print_r($images);
         foreach ($images as $image) {
             //echo $image->item(0)->getAttribute('src');
             echo $image->item(0)->nodeValue;
             echo ("<br>");
         }

     but apparently getElementsByTagName doesn't pick up the image tag embedded inside the description tag, because I get no comic images outputted, and the following output from the print_r statement:

         DOMNodeList Object ( [length] => 0 )
         DOMNodeList Object ( [length] => 0 )

     Finally, I tried a combination of the two methods, trying to use getElementsByTagName('img') inside the code that parses out the description tag contents. I replaced the line:

         $comic_image = $item->getElementsByTagName('description')->item(0)->childNodes->item(0)->nodeValue;

     with:

         $comic_image = $item->getElementsByTagName('description')->item(0)->getElementsByTagName('img');
         print_r($comic_image);

     But this also finds nothing, producing the output:

         DOMNodeList Object ( [length] => 0 )

     So sorry for the really long background, but I'm wondering if there is a way to parse just the img src out of a given RSS feed, without the other text and links I don't want? Help would be much appreciated.
  13. Thanks for the logging and request response information; I'll check that stuff out. As for the Xdebug stuff, I will check on those settings, thanks. The weird thing though is that when I run projects in debug mode from NetBeans, I don't get any error messages, and I get something appended to the URL indicating that an Xdebug session is started. It's like it's enabled but not working
  14. Yes, this is what that section says:

     xdebug support: enabled
     Version: 2.4.0
     IDE Key: Randy
     Supported protocols: DBGp - Common DeBuGger Protocol, Revision $Revision: 1.145 $

     Directive (Local Value / Master Value):

     xdebug.auto_trace: Off / Off
     xdebug.cli_color: 0 / 0
     xdebug.collect_assignments: Off / Off
     xdebug.collect_includes: On / On
     xdebug.collect_params: 0 / 0
     xdebug.collect_return: Off / Off
     xdebug.collect_vars: Off / Off
     xdebug.coverage_enable: On / On
     xdebug.default_enable: On / On
     xdebug.dump.COOKIE: no value / no value
     xdebug.dump.ENV: no value / no value
     xdebug.dump.FILES: no value / no value
     xdebug.dump.GET: no value / no value
     xdebug.dump.POST: no value / no value
     xdebug.dump.REQUEST: no value / no value
     xdebug.dump.SERVER: no value / no value
     xdebug.dump.SESSION: no value / no value
     xdebug.dump_globals: On / On
     xdebug.dump_once: On / On
     xdebug.dump_undefined: Off / Off
     xdebug.extended_info: On / On
     xdebug.file_link_format: no value / no value
     xdebug.force_display_errors: Off / Off
     xdebug.force_error_reporting: 0 / 0
     xdebug.halt_level: 0 / 0
     xdebug.idekey: no value / no value
     xdebug.max_nesting_level: 256 / 256
     xdebug.max_stack_frames: -1 / -1
     xdebug.overload_var_dump: On / On
     xdebug.profiler_aggregate: Off / Off
     xdebug.profiler_append: Off / Off
     xdebug.profiler_enable: Off / Off
     xdebug.profiler_enable_trigger: Off / Off
     xdebug.profiler_enable_trigger_value: no value / no value
     xdebug.profiler_output_dir: \ / \
     xdebug.profiler_output_name: cachegrind.out.%p / cachegrind.out.%p
     xdebug.remote_addr_header: no value / no value
     xdebug.remote_autostart: Off / Off
     xdebug.remote_connect_back: Off / Off
     xdebug.remote_cookie_expire_time: 3600 / 3600
     xdebug.remote_enable: Off / Off
     xdebug.remote_handler: dbgp / dbgp
     xdebug.remote_host: localhost / localhost
     xdebug.remote_log: no value / no value
     xdebug.remote_mode: req / req
     xdebug.remote_port: 9000 / 9000
     xdebug.scream: Off / Off
     xdebug.show_error_trace: Off / Off
     xdebug.show_exception_trace: Off / Off
     xdebug.show_local_vars: Off / Off
     xdebug.show_mem_delta: Off / Off
     xdebug.trace_enable_trigger: Off / Off
     xdebug.trace_enable_trigger_value: no value / no value
     xdebug.trace_format: 0 / 0
     xdebug.trace_options: 0 / 0
     xdebug.trace_output_dir: \ / \
     xdebug.trace_output_name: trace.%c / trace.%c
     xdebug.var_display_max_children: 128 / 128
     xdebug.var_display_max_data: 512 / 512
     xdebug.var_display_max_depth: 3 / 3

     xml: XML Support active; XML Namespace Support active; libxml2 Version 2.9.4
     xmlreader: XMLReader enabled
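     Looking at that dump, xdebug.remote_enable is Off, and from what I've read, NetBeans step-debugging needs Xdebug's remote mode switched on. So I'm guessing the php.ini changes would look roughly like this (Xdebug 2.x directive names; the extension path is just a guess at a default XAMPP install):

         ; enable remote debugging so the IDE can attach
         zend_extension = "C:\xampp\php\ext\php_xdebug.dll"   ; path is a guess for XAMPP
         xdebug.remote_enable = 1
         xdebug.remote_host = localhost
         xdebug.remote_port = 9000
         xdebug.idekey = netbeans-xdebug   ; should match the Session ID NetBeans is configured to use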
  15. I'm not sure; how would I tell? I'm trying to post the output of phpinfo here, but it's too long.
  16. Okay, I opened up Chrome developer tools and looked in the error console, and there was an error. I'm not sure what you mean by the file request and response, and I'm also not sure how to add logging to PHP and JavaScript. I'm using NetBeans 8. However, I did look up a video of how Xdebug is supposed to work with NetBeans (probably should've done this earlier). When I run my code with a breakpoint like in the video, the browser does not pause. I am given the option to stop debugging, but the step-through buttons are all greyed out. Any idea why this is happening? It seems like I have the debugger installed, but that it's not functioning correctly.
  17. Hello, I am trying to create a simple RSS reader using HTML 5, JavaScript+jQuery+jFeed, and now, PHP. I originally was not using PHP, but then I ran into a cross-domain request issue when trying to access an RSS feed (I am using the web comic XKCD as a test) with the following script in the body:

         <!--include jQuery-->
         <script type="text/javascript" src="jquery-3.0.0.js"></script>
         <!--include jFeed-->
         <script type="text/javascript" src="jfeed.js"></script>
         <script type="text/javascript">
         jQuery.getFeed({
             url: 'http://xkcd.com/rss.xml',
             success: function(feed) {
                 alert('jFeed test');
                 alert(feed.title);
             }
         });
         </script>

     which produces the following error:

         XMLHttpRequest cannot load http://xkcd.com/rss.xml. No 'Access-Control-Allow-Origin' header is present on the requested resource. Origin 'http://localhost:8383' is therefore not allowed access. (18:01:48:198 | error, javascript) at public_html/index.html

     which, after research, is supposedly solvable by using a PHP server proxy to forward the RSS page to the client side of my website. So I added the following PHP file to my project (my attempt at a proxy, basically just copied from examples online):

         <?php
         // Set your return content type
         header('Content-type: application/xml');

         // Website url to open
         $url = 'http://xkcd.com/rss.xml';

         // Get that website's content
         $handle = fopen($url, "r");

         // If there is something, read and return
         if ($handle) {
             while (!feof($handle)) {
                 $buffer = fgets($handle, 4096);
                 echo $buffer;
             }
             fclose($handle);
         }
         ?>

     And then changed the RSS URL in the HTML code to that of my PHP proxy file, so that it now reads:

         url: 'localhost/XKCD_proxy.php',

     However, when I test this on my local server using XAMPP, nothing happens: no jQuery alerts, as included in the success function. I also don't get any errors, so I'm pretty confused. Does anyone know what I'm doing wrong? Thanks in advance.
  18. Thanks, my current plan is just to develop the website to see if I can actually get it working. Then if I can, I will talk to the owners of the content before actually launching it.
  19. Thanks for the example; I will check it out. As for the cross-domain request comment, this also seems like a good point. After a little research, it definitely seems like that would be a problem for my website. It also seems like the best way around it is to create a server-side proxy in PHP or some similar language. So it looks like I will have to use server-side code after all. Fortunately, I also get the impression that developing my website this way will provide some other advantages, although I'm not sure what yet (it just seems like everyone I talk to suggests doing it this way). Thanks
  20. I am trying to create a web comic aggregation website in which users can view a variety of different web comics all on one website in a user-friendly format. I am currently trying to decide on the best way to pull comics from other people's websites (if I can get this website to work, I eventually plan on paying for content, but right now I'm just trying to see if I can get it to work in the easiest way possible).

     After some research, it seems like I could either try to parse the built-in RSS feeds which most of these websites include (but which vary from website to website and may not include all the content I want), or I could try to use some kind of web scraper. Specifically, I would like to be able to access every comic on a given website in an automated fashion (I would like users to be able to view comics by date). What are the advantages and disadvantages of using an RSS feed as opposed to a web scraper? Is there another possible approach that I have missed?

     Also, from other questions, people have suggested that I use jFeed, an extension to jQuery, to parse RSS feeds if I decide to go this route. However, I am a little bit confused as to the possible implementation of a web scraper. Currently, I am attempting to write the website in HTML 5, CSS 3, and JavaScript. Would I need to use a server-side scripting language in order to build a web scraper? And if so, how would that language interact with the HTML and other code? Or would it replace it entirely? Help would be much appreciated.
  21. Okay, thanks for the info. After reading your answer and a few other similar questions on other forums, I have some follow-up questions.

     First, I am concerned about the overall availability of RSS feeds, or at least of those that output what I want. After doing some research on various web comics to see if they included RSS feeds, I found that they almost overwhelmingly did, although some of them seem to include crippled feeds that don't actually include any images; they just have links to each individual comic on the original website. I also read somewhere else that this is becoming a trend (to make sure users see advertisements on the original website). I was wondering if there is any way around this kind of thing? Because the whole point of my website idea is to make the comics easy to read and all on one webpage, not to include a bunch of different links to different webpages.

     Second, I am still a little confused as to the capabilities of RSS feeds. Do these feeds provide content for articles and comics in only a range of recent dates? Or can a typical RSS feed access all articles/comics from the original website across its entire history? Because most of what I'm reading online refers to RSS feeds as a way to get daily updated news articles. Don't get me wrong, I would like my website to update comics daily, but I would also like users to be able to access very old comics, so I'm wondering if a typical RSS feed will provide this capability.

     Third, I'm wondering about the advantages and disadvantages of the suggestion to use server-side code in conjunction with client-side code to create my website. How exactly would this benefit a website in search engine rankings? Are there other advantages?

     Finally, I'm curious about the general implementation of a combination of server-side and client-side code in a website. After some research, it looks like I would have to use a different language for the server-side code, like PHP or Ruby or whatever, and I was wondering, if this is the case, how do the different languages interact? Would it be like the interaction between HTML, CSS, and JavaScript? For example, would I "include" the server-side code file inside the client-side HTML file? Thanks again
  22. Hello, I am trying to create a web comic aggregation website using HTML 5, CSS 3, and JavaScript. I would like users to be able to view comics of different dates from different websites all in one place. After some research, it seems like I'm probably going to need to use an RSS feed to accomplish this. However, I don't fully understand the capabilities and usage of an RSS feed. First, would it be possible to pull images from comic websites in an automated and orderly fashion using an RSS feed? Or would I need to use something else? Or would it not be possible at all? If it is possible with an RSS feed, I'm also somewhat confused about the general implementation. Would the code for the feed be in HTML, or JavaScript, or both? Would I need to use existing libraries and/or APIs? Is there existing code with a similar enough function that I could use it as a starting point? Thanks