Which is better for performance

MrAdam · December 21, 2010

Hi guys,I was just curious as to which you thought would be the most efficient method of storing friendships.a) One row per relationship. Draw-backs will obviously be when you have to check against 2 columns to retrieve a list of a particular user's friends, or just checking if a certain user is in their friend's list. The returned result set would also be harder to work with. Or, Two rows per friendship. Main draw-back being that twice the memory is needed, and inserting/updating would be a little more complex, however the queries would be much more optimized, and the returned data would be consistent.I'm more included towards b, but what would you say?Thanks for any responses,Adam

[dx] · December 21, 2010

How about table friendships where you haveID_friendship (auto increment)ID_userID_friendand use query: SELECT ID_friend FROM friendships WHERE ID_user='$usersid'So there could be duplicated items. Example:I'm 1, you are 2So:ID_friendship --- ID_user --- ID_friend // Query on this will result my friends------ 1 ------ ------- 1---- ------- 2 -----And also it must be:ID_friendship --- ID_user --- ID_friend // Query on this will result your friends------ 2 ------ ------- 2---- ------- 1 -----

boen_robot · December 21, 2010

I'm not sure I see what you're talking about with this two rows per relationship model... care to give an example?If you make both ID_user and ID_friend a primary (non auto incrementing) key, you don't need ID_frindship... that would be the best space wise and integrity wise approach.

MrAdam · December 21, 2010

Okay let's say the table is structured roughly as you say, boen_robot, like:

create table friendships (	user_id mediumint unsigned not null,	friend_id mediumint unsigned not null,	primary key (user_id, friend_id));

And we have the following data:

insert into friendships values (1, 2), (3, 1), (2, 3);

So 1 is friends with 2 and 3, 2 is friends with 3 (and vice-versa obviously, given that the friendship would be a two way thing). We store the user who requested the friendship in `user_id`, and the friend who accepted in `friend_id`.So as an example, let's say I wanted to return all the friends of 1, I'd have to check both columns:

select * from friendships where user_id = 1 or friend_id = 1;

That would give me:+---------+-----------+| user_id | friend_id |+---------+-----------+| 1 | 2 || 3 | 1 |+---------+-----------+It wouldn't be difficult based on the ID we queried with to arrange these correctly, but that's adding complexity to select queries, as well as overhead on the database to check both columns.Instead if we inserted two rows for each friendship:

truncate friendships;insert into friendships values (1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2);

We could then query with:

select * from friendships where user_id = 1;

To produce:+---------+-----------+| user_id | friend_id |+---------+-----------+| 1 | 2 || 1 | 3 |+---------+-----------+Which would mean cleaner, faster select statements, but obviously twice the memory.What I'm unsure about is if my logic is right; or would the memory issue out-weigh the cleaner/simpler statements?

boen_robot · December 21, 2010

A friendship is not necesarily a mutual and transit act... you may have me as someone who you think of as a friend, but should I have you as mine if I don't know you myself (but a friend of mine knows you) or don't want you as my friend anymore for whatever reason even though you still want us to be friends?The first kind of table shows you not only who is friends with who, but in which user's list the friend is. As Haris_S put it, you can do

SELECT friend_id FROM friendships WHERE user_id=$usersid

and get all friends in a certain user's list... not necesarily mutual friends. And you can use something like:

SELECT f.friend_id FROM friendships AS f WHERE f.user_id=$usersid AND $usersid IN (SELECT mf.friend_id FROM friendships AS mf WHERE mf.user_id = f.friend_id)

to get only mutual friends.Yes, the query for mutual friends is more complex, but the processing overhead (in the above query at least; there may be a more efficient query for the same thing) is only going to be big if all friends are mutual.

MrAdam · December 21, 2010

Okay well, assume for this friendship's are mutual.

boen_robot · December 21, 2010

Well, the same queries would work for the double row model too, only this time you can assume all friends are mutual if you care to add both combinations at insert time, and therefore improve perfroamance by using the first query... at the price of using double the space as you point out.With a single row model... I think you may be able to do something like

SELECT user_id WHERE friend_id = $useridUNIONSELECT friend_id WHERE user_id = $userid

to get all (allegedly mutual) friends of a user, all conviniently palced in a single column in the result set. The statement is a little more complex, but the overhead is probably less than that of the previous more complex query (a UNION basically appends one set to the other, removing items that duplicate with ones from the first along the way)

MrAdam · December 21, 2010

Ah that's perfect. I knew of UNION, but never used it enough / read properly into it, for it to come to mind.Thanks for your help!

Sign In

Which is better for performance

Recommended Posts

MrAdam

Link to comment

Share on other sites

[dx]

Link to comment

Share on other sites

boen_robot

Link to comment

Share on other sites

MrAdam

Link to comment

Share on other sites

boen_robot

Link to comment

Share on other sites

MrAdam

Link to comment

Share on other sites

boen_robot

Link to comment

Share on other sites

MrAdam

Link to comment

Share on other sites

Archived

Browse

Activity