Jump to content

trying to design an efficient database for my website?


Recommended Posts

Hello, I am currently working on a web comic aggregation website. My general Design is an RSS reader that grabs the publishing date and image source link From various web comic RSS feeds each time they are updated, and uses this to update a database. Then the front end accesses this database info to be displayed in a user-friendly format each time someone accesses the website. I might eventually add a user account system to store user preferences about comic arrangements, which would also need to be put in the database. I am currently writing the back end in PHP and MySQL, and am trying to decide on a general database design.

 

I googled how to design an efficient relational database, which provided a lot of information, but also filled me with fear that my database will be massively inefficient (because I've never done this before), and that it will eventually make my site slow and unusable. So I thought I would check my design on some forums before I proceed.

 

My original plan was to have two databases: one for comics, then one for users later (am not too concerned about this one, because I'm sure there are a ton of examples for how to make an efficient username/password/info database). The comics database would have one table for each different comic, and each table would have a column for publishing date and image source link. However, after reading about database normalization, I thought of an alternative design: one table for comics, including their name and a primary ID,and then one table for publishing dates and one table for image source links, both including their own primary ID, and also the foreign key of the comic ID for each link or date. However, then I realized that the publishing dates table isn't going to make any sense because the publishing date is kind of like a key for the image source link.

 

So basically I'm pretty confused as to how to generally arrange my tables and fields. I was wondering if anyone has any suggestions?

 

I also have another concern. Some RSS feeds do not include publication dates, so it's possible that I am going to need to have some null entries for publishing dates. The MySQL manual told me to use the "NOT NULL" whenever possible, so I was wondering if not using it for this category will be a problem?

 

Finally, there are a few miscellaneous things that I was wondering about. I have read that I can change table types, storage engines, and row format (compact, dynamic, compresse, etc.). I have also read that sometimes websites sacrifice normalization (and thus total storage space) for speed somehow. I was wondering if these more nitpicky details are important for my website? Although I plan on accumulating very long lists of individual comics (at least thousands, maybe tens of thousands for each different web comic), and the website could potentially become accessed very frequently if it became popular, it doesn't seem like what I am trying to do with my database is nearly as complicated as the kinds of things that most people are trying to do. So I was wondering how much of this optimization stuff is really worth the time, and how much of it is just going to make insignificant differences in the performance of my website?

 

Help would be much appreciated

Link to post
Share on other sites

Figure out the relations between everything. When you say a comic, I assume you mean an entire strip like XKCD, Far Side, etc. And that each comic would have multiple posts or whatever you want to call them, and each post might have multiple images. So, you need to define those relationships. The relationship between a comic and a post is one-to-many, because one comic can have many posts, but each post only belongs to one comic. The relationship between posts and images would also be one-to-many. You might also identify relationships that are one-to-one or many-to-many. Each of those relationships requires a different database structure. A one-to-many relationship only needs 2 tables, one to hold the parent object and the other to hold the child objects, where the child table has a column for the parent ID for each child row. A many-to-many relationship requires 3 tables. Objects with a one-to-one relationship can all go in a single table unless there's a particular reason for separating them.

 

I was wondering if these more nitpicky details are important for my website?

Not really, but if you want to be able to define foreign keys in your database, and you're using MySQL, then choose InnoDB as the storage engine instead of MyISAM. MyISAM is fine for a lot of sites but more complex data or sites where the data frequently changes may be better with InnoDB.
Link to post
Share on other sites

Okay, let's say that the phrase "comic strip" refers to a whole strip--XKCD, the far side, etc. (good list by the way, some of my favorites), and the word "comic" refers to each individual cohesive joke image within a comic strip. I agree that the relationship between comic strips and posts is one-to-many. However, from what I've seen, the relationship between posts and comics/images is actually one-to-one in all the comics I've checked so far. And I've checked a lot--about 30--so I think this is a safe assumption for future comics. The relationship between posts and publication dates is also one-to-many, although this one is confusing because different comics strips will inevitably have different comics with the same publication date. So does this mean that I should have…

 

one table for comic strips, including their names and primary IDs,

one table for image source links, including the links, primary IDs, and foreign keys of the comic strips, and

one table for publication dates, including the dates, primary IDs, and foreign keys of the image source links (but not the comic strip's Foreign keys because those can be deduced from the image source links table)

 

?

Link to post
Share on other sites

You just need one table. Something like this:

  • id (an AUTO_INCREMENT field used for adding, editing and removing)
  • title (Name of this particular strip "Today's comic")
  • image (path or filename of the image file that represents the strip, like "xkcd-todays-strip.png")
  • date (Date this strip was posted "2016-08-12". This field can be indexed for faster searching)
  • comic_strip (The comic this strip belongs to, for example "xkcd". This field can be indexed for faster searching)

If you want detailed information about the comic that he strips belong to, you can set the comic_strip field to an INT type and make a separate table for the comic strip. It might look something like this:

  • id (an AUTO_INCREMENT field used to identify the conic strip)
  • name (For example "xkcd")
  • description (For example "A webcomic of romance, sarcasm, math, and language.")
  • image (A file path. Maybe an icon to represent this comic)
Link to post
Share on other sites

Do I only need one table like that in the logical arrangement sense, or are you saying that you think that is one of the most efficient ways to design it? I am currently considering a different table arrangement suggested to me on another forum:

 

comic_strip
===========
id (integer, primary key, auto-incremented)
name (varchar)

comic
=======
id (integer, primary key, auto-incremented)
comic_strip_id(integer, foreign key to comic_strip.id)
publish_date (date or timestamp)
image_url (varchar)

 

Having only one table as you suggested, or one table for each comic strip as I was originally thinking make the most sense to me just from an intuitive layout point of view, but I'm trying to choose the most efficient table layout possible. What is your opinion on that?

 

And also what do you mean by indexing the date for faster searching? And finally, do you think it would be a better idea to store the publishing date as a date or timestamp?

Link to post
Share on other sites

The table structure depends strictly on what data you want to store. If the only thing you're storing about the comic strip is its name, then you don't need another table for it. It's unnecessary. Just store the comic strip's name in a field in the comic table. If you're storing multiple things about the comic strip, like description, website and icon, then you need a separate table for it.

 

When creating a table you can mark a field as indexed, this makes it faster to search and order by that field.

 

Whether you want a date or timestamp is up to you. When displaying the date next to the comic do you want to see the exact time the comic was published or just on which day?

Link to post
Share on other sites

However, from what I've seen, the relationship between posts and comics/images is actually one-to-one in all the comics I've checked so far.

That's fine if you want to make that assumption, but that means that your application won't be able to support anything that uses multiple images for a single post or whatever.
Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...
×
×
  • Create New...