Jump to content

Db Design


tal

Recommended Posts

Hello I am building a DB for a website (mysql 5.1.3, and my site with php)and have this situation I have one big table "main"main:-------id, (unique)user email, (unique)country,religion,education,... (about 20 more columns with data that is the same for all users (not unique))so i figured to make another table "dictionary", in order to save in disk space in "main" tableby storing the real value once in "dictionary" table and store only pointers in "main" tabledictionary:------------id, (unique)country,religion,education,... (all the columns with data that can be pointed to)now i have a few questions - should i make a separate dictionary table for each pointed column ?- what is better - saving disk space while doing more select queries that are more complicated as i need to get data from more tables ? - or the opposite having less queries and simpler queries and take more disk space (as one big table) ?thank youTal

Link to comment
Share on other sites

No you should fragment your tables as much as possible, for things like religion, country, obviously, there's only a certain amount of options, so you should have a country table, with an id/code and country name, then in main you'd reference a line of country, same for all other info, can I see your full list of columns?One info that I always hesitate is for ###### -> haha i tried to say s/e/x, okay, lets say "gender", M/F should you create a separate table? in theory this is the same situation as country / language / religion, but with only 2 options, so...

Link to comment
Share on other sites

hy djpmy two create table are below in main table all the columns with small length are pointers to the dictionary table now my question is should i make a different table to each of the columns in dictionary table or leave it as it is with 23 columns in example country column has 219 values while height has 29 (and is the second longest column)also i have read about Foreign key but not sure how in my case it will help with my DB designas the data in dictionary table does not change any other suggestions about how to build the tables are welcome thanks for helpingTalCREATE TABLE main ( id INTEGER (11) UNSIGNED NOT NULL AUTO_INCREMENT, date DATE, first_name VARCHAR (20) NOT NULL, family_name VARCHAR (20), email VARCHAR (40) NOT NULL, password VARCHAR (150) NOT NULL, personal_question VARCHAR (1), personal_ansewr VARCHAR (40), gender VARCHAR (1), first_born VARCHAR (3), date_of_birth DATE, country_of_birth INTEGER (3) UNSIGNED, family_status INTEGER (1) UNSIGNED, number_of_children INTEGER (1) UNSIGNED, children_residens INTEGER (1) UNSIGNED, country_of_residence INTEGER (3) UNSIGNED, city_of_residence VARCHAR (20), city_of_residence_abroad VARCHAR (20), residence_adress VARCHAR (20), zipcode VARCHAR (5), religion VARCHAR (20), education INTEGER (2) UNSIGNED, occupation INTEGER (2) UNSIGNED, finance_status INTEGER (1) UNSIGNED, health INTEGER (1) UNSIGNED, smoking_habit INTEGER (1) UNSIGNED, drinking_habit INTEGER (1) UNSIGNED, massenger_email VARCHAR (40), skype_user_name VARCHAR (20), phone_number VARCHAR (11), phone_number_second VARCHAR (11), height INTEGER (2) UNSIGNED, body_type INTEGER (1) UNSIGNED, eye_color INTEGER (1) UNSIGNED, heir_style INTEGER (1) UNSIGNED, heir_color INTEGER (1) UNSIGNED, heir_lenght INTEGER (1) UNSIGNED, film_participate VARCHAR (3), interest_fields VARCHAR (511), PRIMARY KEY (id) );CREATE TABLE dictionary ( id INTEGER (3) UNSIGNED, personal_question VARCHAR (20), gender VARCHAR (6), first_born VARCHAR (3), country VARCHAR (22), family_status VARCHAR (13), number_of_children VARCHAR (17), children_residens VARCHAR (28), religion VARCHAR (12), religion_level VARCHAR (12), education VARCHAR (24), occupation VARCHAR (20), finance_status VARCHAR (10), health VARCHAR (20), smoking_habit VARCHAR (22), drinking_habit VARCHAR (26), height VARCHAR (28), body_type VARCHAR (10), eye_color VARCHAR (20), heir_style VARCHAR (12), heir_color VARCHAR (13), heir_lenght VARCHAR (12), film_participate VARCHAR (3), PRIMARY KEY (id) );

Link to comment
Share on other sites

in your main table I suggest you keep only the following as real info:iddate first_namefamily_nameemail passwordpersonal_answergender first_born (1 or 0) date_of_birthnumber_of_children children_residenscountry_of_residence city_of_residence city_of_residence_abroadresidence_adress zipcode finance_status massenger_email skype_user_name phone_numberphone_number_second height heir_lenght film_participate And the following in their own tables with a reference in the main table: personal_question (because this may only be say 5 to 10 pre-defined questions)country_of_birthfamily_statusreligioneducationoccupationhealthsmoking_habit, drinking_habit (no, yes, occasional)body_typeeye_color, heir_color (you could just have a color table)heir_styleinterest_fields (you could just a have a pre-defined list of say 20 popular pass times)As for the children you'd need to think of a different format, because if the number_of_children is zero, or NULL, then the next information is just not needed

Link to comment
Share on other sites

hy djpso you say dividing the dictionary table into many tables one for each category is the way to go(in example hair color and eye color will be one table)now the question pops up my head iswill the extra queries i will make have a major effect on my over all performance ?or will it run fine with many tables that are small in size ?and if so how will joins and unions effect my db with many small tables ?thanksTal

Link to comment
Share on other sites

It won't be a problem if you preform the joins correctly, and at the right point, your queries will run faster this way, for example:SELECT * FROM country, usersWHERE country.code = users.country_refAND country_name LIKE 'A%'vsSELECT * FROM country, usersWHERE country_name LIKE 'A%' AND country.code = users.country_refThe first one is logical, its the way you learn sql, join the tables first, so you don't forget to after, but, look at this more closely and you can see that by joining the tables at that point, you're storing all the fields of both tables what are linked in the sql server's memory, which could be a very large amount of information, then in the next step you filter out the countries and it's references by only retaining the countries starting with A.But in the second sql, we knock off all the countries which don't have an A to start with, and so we've only got... I don't know 10 or so countries? And then we join, we only end up storing a reduced amount of information in the memory, and it runs faster.So there's the aspect that you need to make your tables small, but also, by looking closer at your sql, you can reduce execution time, I have a complex query last week, took 40,6 seconds to execute and I tried changing some stuff and got it down to around 20 seconds, and then I looked harder and make it less memory obsessed and it got down to 1.27 seconds ! That was working on a 4 tables with around 8 columns max. but one table had 16000 rows !

Link to comment
Share on other sites

Those two queries are actually going to run about the same, it still starts with a cartesian product of both tables. It would be more efficient to do this:SELECT * FROM country INNER JOIN usersON country.code = users.country_refWHERE country_name LIKE 'A%'Since it joins on a certain column, instead of creating a cartesian product it's only going to have all of the rows that actually do join, and then the where clause is going to filter that result. So you start with a smaller result set instead of filtering the entire product.It's going to be faster if you don't have an extra table for things that don't have many options and don't change. Things like gender, hair color, eye color, smoking/drinking etc are better off as ENUM columns where you say what the possible values are instead of a reference to another table. It makes sense for countries to go in their own table, or maybe things like religion or occupation or whatever, but things like height or number of children can just be a number and don't even need a set of possible values.

Link to comment
Share on other sites

thanks both for answering but i still dont understand a few things and i now understand how important planing my DB before starting to work really is :)1 - about many tables --------------------------in my dictionary table, i have the unique id as my primary key column and in my select queries i specify what column and what row i want to read (select dictionary.country from dictionary where dictionary.id=main.country)will it still make more seance dividing the dictionary table (that now has 23 columns that dont change) to "about" 23 tables each with 2 columns (for example - id,eye color)will the queries be faster or less memory expansive when doing it like that - to divide the dictionary table ?2 - about the enum -----------------------that is a list of possible values that the row/column cell can take for example if i have a color emun of (red,black,blue,orange,white) and i insert the enum row/column cell the second option in my color enum it will store black at the row/column cell, and so will not save space (like storing 2 and reffer it to the dictionary table)it is 1 char "2" insted of 5 chars "black" (for each user in my main table (many rows long))or are you taking only at single note columns as gender, or short number as height column ?3 - about the select query ------------------------------i have the country code in my main table (the one with the unique user id)and i have the corrosponding country code as a key in my dictionary table (the one with the country list)and a coulmn for the actual country names as for now my select query looks like this $query="SELECT dictionary.country FROM dictionary,main WHERE main.id=(unique user id) AND main.country=dictionary.id ";i see what you have done in your select query example switching the order of the WHERE rules and i have two WHERE rules in my select query one to select the unique user from main two select the right country from dictionarytaking that main table will have many many rows and dictionary table will stay the same what WHERE rule should i put first - unique user or country name ?thanksTal

Link to comment
Share on other sites

1 - about many tables
I've never seen the setup you're trying to use, that's sort of odd. Most people don't use a single lookup table for several different possibilities, you have one table per item. That doesn't mean you need 23 tables, it's pretty pointless to have an eye color table where the only purpose is to hold 5 or 6 eye colors that aren't going to change. The way I see it, you need 1 table to hold your main information. The lookup tables should be country, religion, and occupation. Maybe "hair style" if the list of hair styles is long and you want to easily add new ones. All other columns, including city, gender, color of anything, etc, don't need to be in some separate table, just put the data in the main users table. Make the columns like eye color just regular varchar columns that hold whatever text you have for it. The eye color could also be an enum like gender.When you add too many tables it's going to increase query time a lot for certain lookups. Keep the number of tables to a minimum (that doesn't mean one giant lookup table for all possible data), but enough so that you don't have to duplicate a lot of data. Think of each table as an object. You have your person object, country object, religion object, etc, and they have relationships where a person lives in a country, practices a religion, etc. When you have a 1-to-1 or 1-to-many relationship, where you have many users and many countries, and each user only lives in one country, and any country might have several users living in it, the best way to represent that is to have a users table with a country ID column that points to the country record in the countries table. If you have a many-to-many relationship, like if you were allowed to select more than one religion, where one user might have several religions, and one religion might have several users, then in that case you need a users table, plus a religions table, plus a third table to hold user ID/religion ID combos to say that a certain user practices a certain religion. Something like eye color sounds like a 1-to-1 relationship where you would need another table, it just seems pretty pointless to have an eye color table where the only data that the table will ever hold is records for "blue", "brown", "green", "grey", etc. The main advantage of storing the choices in another table is that it makes it easy to add new choices or edit the ones already there without needing to update all of the users that use that option. You can change "South Korea" to "Republic of Korea" just by changing the one name in the countries table, and don't need to update all of the users that point to that ID, the ID won't change. Over time you might want to add new religions or occupations or whatever, but they aren't going to all of a sudden invent a new eye color or gender.
it is 1 char "2" insted of 5 chars "black" (for each user in my main table (many rows long))
You're assuming that it actually stores the string "black" in the field. It's probably more likely that it stores the enum values as a numbered list, and stores the number in the column instead. You don't specify a length for an enum column, only the possible values. That makes me assume that it's not storing the actual value in the field, just a reference to it in the list. Sort of like using a lookup table but more efficient.
what WHERE rule should i put first - unique user or country name ?
The order of things in the WHERE clause doesn't matter much, more important is the order that you do joins in. Taking that example, instead of doing this:SELECT dictionary.country FROM dictionary,main WHERE main.id=(uid) AND main.country=dictionary.idIt would be better to do this:SELECT dictionary.country FROM dictionary INNER JOIN main ON main.country=dictionary.idWHERE main.id=(uid)The reason that's better is because with a join it creates a product of two tables. If your users table has 5000 users in it, and your dictionary table has 250, doing the join you have in the first example is going to produce a temporary table with (5000 * 250 = 1250000) rows in it, and then it's going to filter those million-plus rows with the conditions in the WHERE clause. The reason it takes the product of the entire table is because you're not telling it which column to join on, you're just telling it to join the two tables and then filter the result. The second example tells it which column to join on, so instead of joining all possible rows, it's only going to join the rows that match. It's not going to join one user with all 250 rows from the other table, it's only going to join the user with the 1 row that matches. So you end up with a much smaller data set to filter for the ID column.Just to be clear, this syntax, where you list tables separated by commas:SELECT dictionary.country FROM dictionary,main WHERE main.id=(uid) AND main.country=dictionary.idis the same as this:SELECT dictionary.country FROM dictionary INNER JOIN mainWHERE main.id=(uid) AND main.country=dictionary.idIt just does an inner join, but not on any particular column, just a straight cartesian product. It essentially multiplies the two tables together. The WHERE conditions are not used in the join, they are used to filter the result. That's why it's better to use an explicit inner join and specify the column to join on, and then filter those results.I feel like I'm rambling at this point, so digest that and let us know if you have other questions.
Link to comment
Share on other sites

hy justsomeguythanks for explaining all of that i have figured it is better for me to work more with "enum"as i have many "many-to-one" relationship a few that dont change at alland more that shouldn't change, but is still possible for them to change so is there a problem ALTER-ing a "enum" column after the DB is running and has rows writen to it ?will it damage my data already stored ?or is there a way of doing it graisfully ?as for the specific joins your explanation was good enough for me to get how it can work thanks again it is really helpful for me having some one to ask for helpTal

Link to comment
Share on other sites

so is there a problem ALTER-ing a "enum" column after the DB is running and has rows writen to it ?
I don't think there would be a problem with adding new values. There might be a problem removing or changing existing values.
Link to comment
Share on other sites

really thank you for the time you put into this forum justsomeguyit really help me with my sql skills so now i have a question about seperating the dictionary table (my refarance table)into two because i have 8 columns with less then 10 rows one column with 100 rows and another with 220 rows so my question is - does having such a difference between column row number (for the referance table) effect the efficency of the DB with queries ?thank youTal

Link to comment
Share on other sites

Yeah you don't want to put all of those columns in the same table. Each row includes all of the columns, if your table has 10 columns then your 220 rows that only need 1 column are still going to take up space for the other 9 columns. If you have unused columns for several rows that's a good hint that you should break it up into more than one table.

Link to comment
Share on other sites

okso i am very close to doing thisand i came to another problem i have done a select query writen below in order to get all the data on one user, with the data from the dictionary table already translated (from number to the actual data)the problem is:in my main table i have two columns about country (country_of_birth and country_of_residance)in my dictionary table i have one column for country names (because why do i need two columns the same)but with my select query a can not get the translation for both the country columns(with different column names, then what there is in the dictionary table)so as i understand if i want to do my select query as it is belowi do need two columns for country, with different column name (one for birth and one for residance) ?did i get it right ?is there a better why of doing the select query ?(so i will get all of my user data (after translation - with the dictionary table))(and with one column for the country names) ?thanks again for helpingTal SELECT *FROM data INNER JOIN (SELECT dictionary.occupation FROM main INNER JOIN dictionary ON main.occupation=dictionary.id WHERE main.id=$user_id) AS temp1 INNER JOIN (SELECT dictionary.country_of_birth FROM main INNER JOIN dictionary ON main.country_of_birth=dictionary.id WHERE main.id=$user_id) AS temp2 INNER JOIN (SELECT dictionary.country_of_residance FROM main INNER JOIN dictionary ON main.country_of_residance=dictionary.id WHERE main.id=$user_id) AS temp3...and so on to 10 columns of "translated" data...ON main.id=$user_id

Link to comment
Share on other sites

You need a join for each column that references another table, so if you have 10 columns that reference the dictionary table then you need to use 10 joins in the query. That's one of the reasons why it's better to put everything you can in the main table, so that you can cut down on joins.

Link to comment
Share on other sites

what i wanted to understand is do i have to have two country columns in my dictionary table (as refference for my two columns in my main table)or can i refference the two columns from my main table to one column in my dictionary table ?and so save space on a column in the dictionary table thanksTal

Link to comment
Share on other sites

oh so maybe i am doing something wrong this is what i am doing my main table and my dictionary tablethat both columns from main table (main.country_of_birth and main.country_of_residence)refference the real data from (dictionary.country)and my select query all below am i doing it wrong, because it does not work for me i get only one column refferenced right and the other one is left as a number and not as the country name Thank you very muchTalmain table ------------ CREATE TABLE main ( id INTEGER (11) UNSIGNED NOT NULL AUTO_INCREMENT, country_of_birth INTEGER (3) UNSIGNED, country_of_residence INTEGER (3) UNSIGNED, PRIMARY KEY (id) );dictionary table -------------------CREATE TABLE dictionary ( id INTEGER (3) UNSIGNED, country VARCHAR (24), PRIMARY KEY (id) );select query ---------------SELECT * FROM main INNER JOIN (SELECT dictionary.country FROM main INNER JOIN dictionary ON main.country_of_birth=dictionary.id WHERE main.id=$user_id) AS temp1 INNER JOIN (SELECT dictionary.country FROM main INNER JOIN dictionary ON main.country_of_residence=dictionary.id WHERE main.id=$user_id) AS temp2 ON main.id=$user_id;

Link to comment
Share on other sites

SELECT main.id, d1.country AS country_of_birth, d2.country AS country_of_residence FROM main INNER JOIN dictionary AS d1 ON main.country_of_birth=d1.idINNER JOIN dictionary AS d2 ON main.country_of_residence=d2.idWHERE main.id=$user_id

Link to comment
Share on other sites

wow thanks it works great now one last question as for now i am on JOIN #10and each JOIN is done ON that a spesific column for example(INNER JOIN $sqltable3 AS d1 ON $sqltable1.personal_question=d1.idINNER JOIN $sqltable3 AS d2 ON $sqltable1.number_of_children=d2.id)is there a way of merging these JOIN's to one as they are made on the same table but not the same column maybeINNER JOIN $sqltable3 AS d1,d2 ON $sqltable1.personal_question=d1.id,$sqltable1.number_of_children=d2.idwill it create the same size of table as the two separate JOIN's ?will it save on prossecor time ?(got an error when trying it, but the idea is there)really thank you justsomeguyyou sure helped me with my SQLTal

Link to comment
Share on other sites

Archived

This topic is now archived and is closed to further replies.

×
×
  • Create New...