
Facebook Website Question


Don E


Hello everyone, I was messing around on Facebook and it is obvious that they are using the state of the art in regards to web development/design. I was wondering about their way of storing data for a person. We all know they use PHP and MySQL. They store so much data for one person, how do you think they go about structuring their database/tables to do this? For instance, when a new user signs up, do you think a table is created just for that user and columns like movies, likes, interests, friends, basic info, contact info, wall posts, etc are in that table? For so much data that is stored for a person, a whole table for that person would be appropriate it seems. Thanks.


You should never create new tables dynamically; it is a very, very bad idea. As with any relational database, the data should be stored in a relational fashion. There may be one table for "basic data" that each person has exactly one of (e.g., FBID and birthday), another table for wall posts, related to the first on a one-to-many basis, another table for friends, which associates users with other users in a many-to-many relation, and so on. The basic concepts of good database design remain (more or less) the same no matter how large your dataset gets, so you shouldn't be intimidated into doing weird things like creating tables on the fly just because you have "too much data". Optimisation can be performed at the implementation level; for example, Facebook makes extensive use of caching. They also have many non-relational databases that store some of the information on their system.
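To make the layout above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (users, wall_posts, friendships) are made up for illustration and are not Facebook's actual schema; the point is only that new users become new rows, never new tables.

```python
import sqlite3

# Hypothetical schema for illustration only; not Facebook's real design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (              -- "basic data": one row per person
    user_id  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    birthday TEXT
);
CREATE TABLE wall_posts (         -- one-to-many: many posts per user
    post_id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(user_id),
    body    TEXT NOT NULL
);
CREATE TABLE friendships (        -- many-to-many: users linked to users
    user_id   INTEGER NOT NULL REFERENCES users(user_id),
    friend_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (user_id, friend_id)
);
""")
# Note: SQLite only enforces REFERENCES when PRAGMA foreign_keys is ON;
# here they simply document how the tables relate.

conn.executemany(
    "INSERT INTO users (user_id, name, birthday) VALUES (?, ?, ?)",
    [(1, "Ann", "1990-01-01"), (2, "Bob", "1985-06-15"), (3, "Cy", "1992-03-30")],
)
conn.executemany(
    "INSERT INTO wall_posts (user_id, body) VALUES (?, ?)",
    [(1, "Hello"), (1, "Second post"), (2, "Hi from Bob")],
)
# Each friendship is stored once per direction in this sketch.
conn.executemany(
    "INSERT INTO friendships (user_id, friend_id) VALUES (?, ?)",
    [(1, 2), (2, 1), (1, 3), (3, 1)],
)

def posts_for(user_id):
    """Return the bodies of one user's wall posts, oldest first."""
    rows = conn.execute(
        "SELECT body FROM wall_posts WHERE user_id = ? ORDER BY post_id",
        (user_id,),
    )
    return [row[0] for row in rows]

def friends_of(user_id):
    """Return the names of one user's friends, alphabetically."""
    rows = conn.execute(
        "SELECT u.name FROM friendships f "
        "JOIN users u ON u.user_id = f.friend_id "
        "WHERE f.user_id = ? ORDER BY u.name",
        (user_id,),
    )
    return [row[0] for row in rows]
```

Signing up a fourth user would just be one more INSERT into users; the three tables stay the same however many people join.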


Thanks Synook for your input. I figured it was the way you explained it, but thought maybe they had something else going on because of how much data they have to store per person. Another thing about Facebook and PHP came to mind. I read some time ago that because of the size of Facebook, PHP doesn't have the "scalability" to handle such a massive website. Can you explain what they mean by this? Looking at PHP, how can it not be scalable? Now that it is fully object-oriented, don't you think PHP could handle such a massive site with no problems? I also read somewhere that, because of this, they use or have something called PHP Hip Hop or something. It takes PHP code and turns it into C++ code, I believe.


They do have a lot more going on, but it wouldn't involve doing strange things to their database structure. However, there are "non-relational" database systems that store data in different ways that are perhaps more suited to large-scale applications, like Google's BigTable.

On the PHP side, a language being object-oriented does not necessarily mean it has higher performance; in fact, the extra effort required to compile/interpret object-oriented structures probably makes them slower to compile/run.* However, the main problem with PHP is that it is a fully interpreted language, which means that it must usually be converted into machine instructions on the fly, and that incurs a significant overhead. I don't know how large this overhead is, but if it grows superlinearly then you have a scalability problem (and it's just slower anyway). By using their HipHop "code transformer", they can convert PHP code into C++ code and then compile it, allowing their applications to run natively instead of being interpreted all the time.

* On an interesting note, our algorithms subject at uni is taught in C and not C++ because C is not object-oriented, so algorithms can be implemented directly, with minimal extraneous code.
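As a rough analogy for the translation overhead described above, here is a small Python sketch (Python standing in for PHP; this is not how HipHop itself works). It contrasts translating the source on every run with translating it once and reusing the result. SOURCE and both function names are invented for the example.

```python
# Python stands in for PHP here; an analogy only, not HipHop's mechanism.
SOURCE = "result = sum(i * i for i in range(n))"

def run_translating_every_time(n):
    """Parse and compile the source text on every call, then execute it."""
    scope = {"n": n}
    exec(compile(SOURCE, "<page>", "exec"), scope)
    return scope["result"]

# Translate once, up front, and keep the compiled code object around.
COMPILED = compile(SOURCE, "<page>", "exec")

def run_precompiled(n):
    """Execute the already-compiled code object; no parsing step per call."""
    scope = {"n": n}
    exec(COMPILED, scope)
    return scope["result"]
```

Both functions give the same answer; the second simply skips the repeated translation step, which is the kind of work an ahead-of-time approach moves out of the request path.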


Regarding PHP being a fully interpreted language that is converted into machine instructions on the fly, does this also go for other web development languages, like ASP, .NET, JSP, ColdFusion? Is there an actual best or better web development language, or in the end do they all basically come out the same? If you look at bank websites, for example, I barely see PHP, if at all. "Important" websites like banks seem to use languages like ASP, but from what I read on here once, it's because of the customer support that comes with this language. So since they have customer support etc., does this make those other languages better than PHP? It makes me wonder how PHP is funded. I'm assuming it's funded somehow but I'm not sure how. In many circles on the internet, a lot of people bash PHP, saying it's no good etc., but if you look again, it's coming from people who are biased toward another language and are trying to make PHP look bad so that you go for the other language that costs money, instead of the free one (PHP). Thanks Synook for your insight. It is greatly appreciated!


I believe ASP and ASP.NET can be both interpreted and compiled, depending on what language you use (remember, those are just frameworks). With Java (JSP), the source is converted into "bytecode" (a sort of contrived assembly language), which is then run on a virtual machine. Modern PHP does this too, but I don't think the process is as advanced. For example, Java has a "JIT" compilation mechanism that can cache common routines in machine code and just execute them instead of having to use the bytecode. ColdFusion is almost certainly interpreted.

I wouldn't say there is an absolutely "best" programming language or framework for web development, but different ones suit different purposes. For example, along with the existence of first-party support, ASP.NET (classic ASP has been superseded and is definitely not a good framework to use) integrates well with Microsoft systems such as Windows Server, SQL Server and Active Directory, and thus is good for organisations that use Windows. Popular application programming languages, such as C#, C++, and VB.NET, can also be used with ASP.NET, which reduces the learning curve. The same is true of JSP.

In the case of bank websites, it is important to have a high level of support, but other features of PHP, such as its open-source nature, also reduce its applicability for those sorts of applications. Remember the crypt() bug in PHP?* The service-level agreements (SLAs) you get when you purchase licenses for proprietary products can provide some assurance in that regard. As mentioned above, too, using something like ASP.NET means that your programmers don't have to learn a new language from scratch.

The reason many people don't like PHP, as far as I can tell, is that it is a very weird language. Many languages, such as Java or even Haskell, are designed from the ground up with very clear rules and conventions in mind (the language will operate in this way, our syntax will look like this, etc.), and thus are very... uniform, if you like. PHP, on the other hand, was hacked together from a very simple start to the very complex language it is today, and thus has many little quirks and other inconsistencies that make it less "beautiful" to program in. For example, PHP only recently introduced an object-oriented model, and many of the libraries still use the old structured style (e.g. where you see a = create(); do_something(a)). There are many syntactic irregularities, too, such as with HEREDOC syntax. As a result, it can feel much more... unstructured writing in PHP than in some of the other languages, and this also leads many people to program in strange ways (e.g. combining object-oriented and structured programming).

I don't know how PHP gets funding, but it probably involves some combination of sponsorship, donations, and merchandise. I don't think the PHP Group sells support (unlike how, e.g., MySQL AB used to).

* It basically caused the crypt() function to return the same thing for every input string.


Synook, I basically understand the one-to-many relation when it comes to database tables, but I'm a little confused about many-to-many relation tables. To clarify: one-to-many means one table in a database having a relation with another table in that same database. For example:

Table A has columns: user_id | firstName | lastName | email
Table B has columns: id | favMovies | favMusic | favFood | user_id

In the scenario above, these two tables are related (linked) to each other because of the 'user_id' columns. Person A in Table A (one) can have many rows of information in Table B (-to-many). So when it comes to a many-to-many relation, is this correct?

Table A has columns: user_id | firstName | lastName | email
Table B has columns: id | favMovies | favMusic | favFood | user_id
Table C has columns: id | friends | likes | status | user_id

In the above, Table B and Table C are related (linked) to each other because of the user_id column. Actually, all the tables are because of that. Do Tables B and C have a many-to-many relation because both tables can have many rows of information for a particular user? Is this a good example of a many-to-many relation?


A many-to-many relation would require an intermediate table.

Table A: user_id | username | password
Table B: data_id | field_1 | field_2
Table A_to_B: user_id | data_id

Table A_to_B can associate any user with many different records of the data table, and vice versa.
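A small sketch of that intermediate-table idea, again with Python's sqlite3 module. Here "movies" plays the role of Table B and user_movies the role of A_to_B; all names and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users  (user_id  INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title    TEXT NOT NULL);
-- The intermediate ("A_to_B") table: one row per user/movie pairing.
CREATE TABLE user_movies (
    user_id  INTEGER NOT NULL REFERENCES users(user_id),
    movie_id INTEGER NOT NULL REFERENCES movies(movie_id),
    PRIMARY KEY (user_id, movie_id)
);
INSERT INTO users  VALUES (1, 'ann'), (2, 'bob');
INSERT INTO movies VALUES (10, 'Alien'), (11, 'Brazil');
INSERT INTO user_movies VALUES (1, 10), (1, 11), (2, 10);
""")

def movies_for_user(user_id):
    """One user, many movies: follow the link table from users to movies."""
    rows = conn.execute(
        "SELECT m.title FROM user_movies um "
        "JOIN movies m ON m.movie_id = um.movie_id "
        "WHERE um.user_id = ? ORDER BY m.title",
        (user_id,),
    )
    return [row[0] for row in rows]

def users_for_movie(movie_id):
    """One movie, many users: follow the same link table the other way."""
    rows = conn.execute(
        "SELECT u.username FROM user_movies um "
        "JOIN users u ON u.user_id = um.user_id "
        "WHERE um.movie_id = ? ORDER BY u.username",
        (movie_id,),
    )
    return [row[0] for row in rows]
```

The same link table answers the question in both directions, which is what makes the relation many-to-many rather than two separate one-to-many relations hanging off user_id.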


Facebook has open-sourced much of their code. This is one of their servers, Scribe, which aggregates log data: https://github.com/facebook/scribe

More information:
https://github.com/facebook/
http://developers.facebook.com/opensource/

For instance, when a new user signs up, do you think a table is created just for that user and columns like movies, likes, interests, friends, basic info, contact info, wall posts, etc are in that table?
Just to point out, that wouldn't be a single table. That would be one table per type of data (user data, contacts, posts, friends, likes, etc). There's no reason to create that many tables every time a user signs up. If you need to change the structure or optimize the tables then all of a sudden you have a lot of work to do if you're keeping track of multiple tables per user (for 300 million users, that would be a lot of tables to optimize). The bottleneck for large amounts of data like this is the speed of the database. The database needs to be able to update and return information quickly. Other than the techniques that you should apply to a database of any size, like proper indexing, the major change to make something this scalable is to use a database cluster. A cluster is several database servers all acting as a single large database. Some versions of MySQL support server clusters. I would imagine that something as large as Facebook has a database cluster on the order of thousands of servers. The hardware to support that is very expensive, but then again Facebook makes a lot of money by selling your information to advertisers.
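To illustrate the "proper indexing" point, here is a minimal sqlite3 sketch (SQLite rather than MySQL, purely so it runs anywhere; the table and index names are hypothetical). It asks the database how it plans to run the same query before and after an index on user_id exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wall_posts ("
    "post_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, body TEXT NOT NULL)"
)
# 1000 made-up posts spread across 100 made-up users.
conn.executemany(
    "INSERT INTO wall_posts (user_id, body) VALUES (?, ?)",
    [(i % 100, f"post {i}") for i in range(1000)],
)

QUERY = "SELECT body FROM wall_posts WHERE user_id = ?"

def query_plan():
    """Return SQLite's description of how it would execute QUERY."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " ".join(row[-1] for row in rows)

plan_before = query_plan()   # without an index: a scan of the whole table
conn.execute("CREATE INDEX idx_wall_posts_user ON wall_posts (user_id)")
plan_after = query_plan()    # with the index: a search via idx_wall_posts_user

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the shift from scanning every row to searching through the index is the part that matters, and it applies to one shared wall_posts table just as well as it would to millions of per-user tables, without the maintenance burden.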


Archived

This topic is now archived and is closed to further replies.
