
Handling Cross-references Of Two Large Data Sets?


IAmBill


I don't know this particular paradigm of web development, and I was hoping some more experienced developers could help me with this one. I'm not posting this in a medium-specific forum because I'm uncertain which medium to use.

Let's say I have a registration-based website with the potential for a large amount of traffic. Now, let's say the site has functionality where users are able to write personal notes about any other user, which they can recall at a later time. Since this is a cross-reference of users against users, the number of possible combinations grows quadratically with each new user. What is the best way to store this?

Would it be best to just throw it all into a single table in a database with three fields and let it get massive? Do MySQL tables have row restrictions? Would it be better to give each user their own table? What about storing the data in unique XML files per user? I could see having thousands of tables in a database being an issue, but would it be absurd to have thousands of XML files stored?

I'm just concerned that if I lump it all into one table, it might take a while to query. I think it would be less time-consuming to maintain a reference table mapping users to their respective XML files, and just parse those to get the info I need.


Do MySQL tables have row restrictions?
I don't think so, but they do have overall size restrictions. The maximum table size is something like 2 GB or 4 GB, though that depends on the storage engine and the server platform. For what it's worth, I've got a system with over 75,000 users, and the single largest table is just over 100 MB. Most sites that handle tens of thousands of users and database accesses per day end up using more than one database server; you can set up a MySQL cluster where several servers all handle the same database.
Would it be better to give each user their own table?
I've never considered that a good idea; it makes a lot more sense to just have one user table where each user has one record.
I'm just concerned that if I lump it into one table that it might take a while to query it.
It might, but indexes will speed up searches. I've got a small lookup table with 2 columns that has over a million records in it. If I look up all of the records for a random user, I get 50 records returned in 0.0094 seconds. Since that table has an index on the user ID column, looking up records by user ID is very fast. You're going to have a lot more overhead waiting for the filesystem to find and open a file than you will waiting for the database to find records with good indexes set up. For the largest table I mentioned above, which is over 100 MB with about 30 columns (just under 400k total records), looking up the same user's records returns 20 records in 0.011 seconds, and that's returning a lot more data than the smaller 2-column lookup table. That larger table also has an index on the user ID field.

You really only need 2 tables to handle this. You need one table to hold the users and another to hold the notes. The notes table can have one user ID column for the user who created the note, another user ID column for the user the note is about, and then whatever note details there are. You can look up all notes written by a certain user, all notes about a certain user, etc., without needing to touch the filesystem at all or mess around with parsing XML files. Using the filesystem involves a lot more overhead than using the database; the database is specifically optimized for this type of usage.
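To make the two-table layout concrete, here's a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. All table and column names (users, notes, author_id, about_id, etc.) are illustrative, not anything from the thread:

```python
# Sketch of the two-table design described above, using SQLite
# as a stand-in for MySQL. Table/column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT NOT NULL
);

CREATE TABLE notes (
    note_id   INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES users(user_id), -- who wrote the note
    about_id  INTEGER NOT NULL REFERENCES users(user_id), -- who the note is about
    body      TEXT NOT NULL
);

-- Indexes so lookups by either user column stay fast as the table grows.
CREATE INDEX idx_notes_author ON notes(author_id);
CREATE INDEX idx_notes_about  ON notes(about_id);
""")

cur.execute("INSERT INTO users (user_id, username) VALUES (1, 'alice'), (2, 'bob')")
cur.execute("INSERT INTO notes (author_id, about_id, body) VALUES (1, 2, 'met at conference')")

# All notes alice wrote:
print(cur.execute("SELECT body FROM notes WHERE author_id = 1").fetchall())
# All notes about bob:
print(cur.execute("SELECT body FROM notes WHERE about_id = 2").fetchall())
```

The point is that "notes by user X" and "notes about user X" are each a single indexed lookup on one table, no matter how many users exist; the quadratic space of possible user pairs never has to be materialized.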

Hmmm... you're getting me thinking, but I'm on the day after an all-nighter, so I'm probably not thinking clearly. I know I couldn't possibly have larger tables than some of the more major websites... and it's sensible to break the data up into two tables that cross-reference each other based on comparable keys... in fact, it feels obvious, but my brain is liquid so I can't quite comprehend it. In fact, I'm quite sure I've done plenty of it in a previous database I've built for a company... maybe I should sleep.

It's never a good idea to create new tables on the fly. Things just get really confusing.


Archived

This topic is now archived and is closed to further replies.
