Jump to content

SELECT JOIN - A Failed Query


iwato

Recommended Posts

PROBLEM:  Turn a two-step SELECT procedure into a one-step procedure. The two-step procedure is 

1) Discover with a query to a parent table what other rows in the parent table are related to the queried row.

2) Obtain selected data for the queried row and other related rows from the parent table and a child table that are connected by a FOREIGN KEY.

BACKGROUND:  I have two tables -- a parent (parent_table) and a child (child_table) table -  connected by a valid FOREIGN KEY and an additional table (ref_table) that contains information about the relationship among the rows of both the parent and child tables.  Please find below the results of three SHOW CREATE TABLE  statements to help you in your understanding of the table structure.  In addition, I have included the INSERT statements for the child_table and ref-table.  I accidentally destroyed the INSERT statement for the parent_table.  This said, the parent and child tables are very similar in structure.

DISCLAIMER:  Please understand that the problem that I have created is heuristic in nature and is being used to create a prototype for subsequent, more practical use.  

The PARENT Table

parent_table
CREATE TABLE `parent_table` (
	`id` int(3) NOT NULL DEFAULT '0',
	`usertype` enum('1','2','3') CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '2' COMMENT '1=good, 2=neutral, 3=bad',
	`username` char(150) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
	 PRIMARY KEY (`id`,`usertype`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The CHILD Table

child_table
CREATE TABLE `child_table` (
	`id` int(3) NOT NULL DEFAULT '0',
	`usertype` enum('1','2','3') CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '2' COMMENT '1=good, 2=neutral, 3=bad',
	`userbio` varchar(500) DEFAULT NULL,
	 KEY `parent` (`id`,`usertype`),
	 CONSTRAINT `child_table_ibfk_1`
	 FOREIGN KEY (`id`, `usertype`)
	 REFERENCES `parent_table` (`id`, `usertype`)
	 ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=latin1

INSERT INTO
	`child_table` (`id`, `usertype`, `userbio`)
	 VALUES ('1', '1', 'I am Roddy.'),
			('2', '2', 'I am Beth.'),
			('3', '3', 'I am Matt.'),
			('4', '3', 'I am Tim.'),
			('5', '2', 'I am Tylor.'),
			('6', '1', 'I am Liz.'),
			('7', '1', 'I am Aldo.'),
			('8', '1', 'I am Adzit.'),
			('9', '3', 'I am Jason.'),
			('10', '3', 'I am David.')

The REFERENCE Table

ref_table
CREATE TABLE `ref_table` (
	`ref_id` int(3) NOT NULL DEFAULT '0',
	`id` int(3) DEFAULT NULL,
	`ref` int(3) DEFAULT NULL,
	`count_ref` int(1) DEFAULT NULL,
	 KEY `par_ref` (`ref_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

INSERT INTO ref_table (ref_id,id,ref,count_ref)
	VALUES  ('1','1','2','1'),
			('2','1','6','2'),
			('3','1','7','3'),
			('4','2','6','1'),
			('5','2','9','2'),
			('6','4','1','1'),
			('7','4','2','2'),
			('8','4','6','3'),
			('9','4','10','4'),
			('10','5','6','1'),
			('11','6','1','1'),
			('12','6','2','2'),
			('13','6','9','3'),
			('14','7','5','1'),
			('15','7','6','2'),
			('16','8','1','1'),
			('17','9','10','1'),
			('18','10','4','1'),
			('19','10','6','2'),
			('20',10, NULL,'1');

EXPLANATION BY EXAMPLE:  Say a user is interested in row 4 of the parent_table.  When the database is queried a SELECT JOIN statement looks in the id field of the ref_table and finds four corresponding rows identified by ref_id  6, 7, 8, and 9.   With each of these latter rows is associated a different row -- namely, 1, 2, 6, 10.  Without further selection is returned all of the information contained in the parent_table and child_table associated with rows 1, 2, 46, and 10.  This is what is supposed to happen, but does not.

The REJECTED SQL STATEMENT

SELECT * FROM parent_table OUTER JOIN child_table ON parent_table.id = child_table.id OUTER JOIN ref_table ON parent_table.id = ref_table.id

QUESTION ONE:  Does the table structure make sense?  Are the constraints properly set?

QUESTION TWO:  Why is my SQL statement rejected as poorly formatted?

This is my first attempt to use the JOIN clause.  So, please be as thorough with your answer as possible.  I must believe that I have a long road ahead with MySQL and should prepare for it as i move forward.

Roddy

Link to comment
Share on other sites

Does the table structure make sense?

No.  These are some of the questions I have:

Why are you using a composite primary key?  It doesn't look like you have more than one row for any one ID, so why isn't the ID the primary key?

Why is the username char and not varchar?  And why uft8_bin for it?

What's the purpose of the child table at all?  There's only one additional field there - will each ID/usertype combo have multiple descriptions?  If not, then I don't see a reason for it.

What is the meaning of each of the four fields in the ref table?

This might make more sense if you export the data from the parent table so we know what's in there also.

Link to comment
Share on other sites

UPDATE:   Since I last posted I have acquired new information and am now well positioned to pose a more precise question.

BACKGROUND:  The following query on the above tables yields the data shown below the query.

SELECT * FROM parent_table LEFT OUTER JOIN child_table ON parent_table.id = child_table.id LEFT OUTER JOIN ref_table ON parent_table.id = ref_table.id WHERE parent_table.id = '2'
id: 2
usertype: 2
username: beth
userbio: I am Beth.
ref_id: 4
ref: 6
count_ref: 1

id: 2
usertype: 2
username: beth
userbio: I am Beth.
ref_id: 5
ref: 9
count_ref: 2

The GOAL:  Obtain the above information as well as that corresponding to each of the rows identified in the ref field by the values 6 and 9.  In particular,

id: 2
usertype: 2
username: beth
userbio: I am Beth.

id: 6
usertype: 1
username: liz
userbio: I am Liz.

id: 9
usertype: 3
username: jason
userbio: I am Jason.

QUESTION:  How to I write the SQL statement to obtain the above information?

Edited by iwato
Link to comment
Share on other sites

INSERT INTO
	`parent_table` (`id`, `usertype`, `username`)
	 VALUES  ('1', '1', 'roddy.'),
			 ('2', '2', 'beth'),
			 ('3', '3', 'matt'),
			 ('4', '3', 'tim'),
			 ('5', '2', 'tyler'),
			 ('6', '1', 'liz'),
			 ('7', '1', 'aldo'),
			 ('8', '1', 'adjit'),
			 ('9', '3', 'jason'),
			 ('10', '3', 'david')

The parent_table Data Set

Link to comment
Share on other sites

RESPONSE TO JSG's QUESTIONS

QUESTION:  Why are you using a composite primary key?  It doesn't look like you have more than one row for any one ID, so why isn't the ID the primary key?

RESPONSE:  The above prototype will be applied (if I can get it to work) to podcast episodes. As it is possible, even likely that some podcasts will require revision, there may be multiple entries for a single podcast -- in effect, multiple rows with the same podcast number.

---

QUESTION:  Why is the username char and not varchar?  And why uft8_bin for it?

RESPONSE:  The primary difference between the CHAR and VARCHAR type is one of length.  As this is a heuristic prototype and the data of the variables (fields/columns) associated with a particular podcast must fit into a template, I must be able to control their length.  Assigning more characters than is necessary simply clutters the database with unused volume. 

---

QUESTION:  And why uft8_bin for it?

RESPONSE:  Yes, if this were not a multipurpose prototype, then I would agree:  the username would be restricted to an ASCII format.  This said,  I am catering to an international audience and other fields cannot be restricted to the ASCII format.

---

QUESTION:  What's the purpose of the child table at all?  There's only one additional field there - will each ID/usertype combo have multiple descriptions?  If not, then I don't see a reason for it.

RESPONSE:  This prototype will eventually be applied for the production of RSS feeds.  Apple Computer uses its own namespace that is different from the generic RSS2 channel and item elements.  There will likely be other podcast hosts that will require custom namespaces.  I have three tables for creating RSS2 feeds:  channel, item, and iTunes.  Placing everything into one table is confusing and partitioning a table adds unwanted complexity.

---

QUESTION:  What is the meaning of each of the four fields in the ref table?

RESPONSE:

  1. ref_id is the ref_table row number.
  2. id corresponds to a particular row in the parent_table.
  3. ref corresponds to a row that is referenced by the selected row in the parent_table.  One parent_table row can refer to multiple other rows within the parent_table.
  4. ref_count is an ordinal counter that both counts and ranks the relative importance of the  referenced rows listed in the ref column for a particular parent_table row.

 

Link to comment
Share on other sites

VARCHAR almost always takes less space than CHAR, because it allows fewer characters than the limit while CHAR requires the full space to be occupied. If you put a smaller string in a CHAR field, it pads the rest with null bytes.

If you put the string "Hello" in a CHAR(150) field, what you essentially have in the table is "Hello\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"

  • Thanks 1
Link to comment
Share on other sites

The above prototype will be applied (if I can get it to work) to podcast episodes. As it is possible, even likely that some podcasts will require revision, there may be multiple entries for a single podcast -- in effect, multiple rows with the same podcast number.

I always use a single numeric ID as the primary key unless I have a really good reason to do anything else, like if it is only a lookup table with 2 columns and then I make both columns primary.  The ID is used internally by the database only and would never need to be changed, I do not use it as an identifier that anyone else would need to know.  If you want to give your podcasts a number for people to identify them, make that another database field other than the ID.  If you want a revision number, make that another database field.  That way the ID will refer to a specific episode and revision.  For as long as I've been doing this I haven't found a great use case for not just using an auto-increment primary key.

The primary difference between the CHAR and VARCHAR type is one of length.

No, it's one of storage.  A char field always stores the maximum number of characters that the field allows.  A varchar only stores what is necessary but has a maximum length.  See here for storage requirements.  Note also that an int field always uses 4 bytes regardless of the length you gave it.  If a number has a smaller range than use one of the smaller int types.

Assigning more characters than is necessary simply clutters the database with unused volume.

I agree, and that is what char fields do, they always store the maximum number of characters.  You would only use a char field if the values will always be the same length, like for a GUID.

This said,  I am catering to an international audience and other fields cannot be restricted to the ASCII format.

So why not utf8_unicode_ci?  Do you want to allow people to have the same username with different capitalization or characters that look the same but are different unicode code points?  That's what using utf8_bin would allow, it looks at binary data only and does not take into account case or whether two characters have the same meaning but are different code points.

Placing everything into one table is confusing and partitioning a table adds unwanted complexity.

It's just a question about data normalization.  Databases have been around for decades, we have this figured out now.  If there is a one-to-one relationship, that goes in a single table.  If there is a one-to-many relationship, that's a parent and child table.  If there is a many-to-many relationship, that's 2 tables with a third lookup table.  The database is specifically designed to work with this kind of data, let it do its job.  Don't increase the time required to insert, update, or select data because it's less complex for you.  Data storage should be optimized for a computer, not a person.  You can display it however you want it to show up, but make it easy for the database to do its job.

ref_id is the ref_table row number.

That should be an auto-increment primary key.

ref_count is an ordinal counter that both counts and ranks the relative importance of the  referenced rows listed in the ref column for a particular parent_table row.

There's no reason to store a counter like that in a table, you can use SQL to get the count and then it's always going to be correct and you don't have to worry about updating it when things change.  The way that's set up now, if you select the rows and order them by ID then it makes the count redundant, you don't need to store that data.  I do use columns like that for display order if I want to allow the display order to be specified (as opposed to ordering by date or some other field), where it will be a sequential series, but if you're only storing the count there's no reason to do that.

  • Thanks 1
Link to comment
Share on other sites

Having received all of the very useful information about table structure, column constraints, variable type, variable choice, and character set -- much of which I will certainly implement in my final production I feel no closer to the solution of my original problem -- the failed grammar of my SQL statement .  So as to renew focus  I would like to draw your attention to the following entry 

This entry provides a SQL statement with two sets of output.  The first set of output is what is realized by the provided SQL statement.  The second set of output is what is actually desired -- namely, full information for not only the selected row, but also each of the rows to which the selected row refers. The question, then, is how to write the SQL statement to achieve not the realized result, but the desired result.

Once again, as I am still relatively new to MySQL, please accompany your solution with an explanation.  Reading the online MySQL manual is simultaneously instructive and confusing, and I feel very fortunate to have got anything returned at all.

Roddy

Edited by iwato
Link to comment
Share on other sites

It's important to design the database right from the start, that's the purpose of the other commentary.  There's no point in trying to fit a square peg into a round hole when you can instead just make the peg round.

This is kind of a mess to do with joins, it's probably easier with a union.  Maybe something like this might work:

SELECT * FROM table WHERE id=10
	UNION
	SELECT * FROM table WHERE id IN (SELECT ref FROM ref_table WHERE id=10)

Link to comment
Share on other sites

Is there perhaps a missing alias in your proposed statement?  The phpMyAdmin SQL console is reporting the following error:

#1064 - You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near table.

WHERE
  id = 10
UNION
SELECT
  *
FROM
  table
WHERE

Roddy

Link to comment
Share on other sites

SELECT
  * parent_table
WHERE
  id = 10
UNION
SELECT
  *
FROM
  parent_table
WHERE
  id IN(
  SELECT
    ref
  FROM
    ref_table
  WHERE
    id = 10
)

This was copied from the SQL module window of the myPhpAdmin application.  As JSG's reply assumed that I had combined the parent_table with the child_table, I ran his response only for the parent_table.

I also changed the value of id in the belief that the NULL value entered into row 10 might affect the result.  The error message was identical.

Edited by iwato
Link to comment
Share on other sites

Yes, that is correct.  It now returns the information contained in parent_table for the originally queried row as well as the each of the rows to which the originally queried row refers in ref_table.  Now, in order to add the corresponding information from the child table would it be sufficient to simply add child_table after parent_table as shown below,

 

SELECT
  *
FROM
  parent_table
WHERE
  id = 10
UNION
SELECT
  *
FROM
  parent_table, child_table
WHERE
  id IN(
  SELECT
    ref
  FROM
    ref_table
  WHERE
    id = 10
)

or must I add another UNION statement such as what follows?

SELECT
  *
FROM
  parent_table
WHERE
  id = 10
UNION
SELECT
  *
FROM
  parent_table
WHERE
  id IN(
  SELECT
    ref
  FROM
    ref_table
  WHERE
    id = 10
)
UNION
SELECT
  *
FROM
  child_table
WHERE
  id IN(
  SELECT
    ref
  FROM
    ref_table
  WHERE
    id = 10
)

Roddy

Link to comment
Share on other sites

Yes, that is correct.  It now returns the information contained in parent_table for the originally queried row as well as the each of the rows to which the originally queried row refers in ref_table. 

id: 10
usertype: 3
username: david

id: 4
usertype: 3
username: tim

id: 6
usertype: 1
username: liz

The next step is to do the same, but include the data from the chile_table.  Unfortunately, the following SQL statement does not return any data from child_table, and duplicate like variables:

SELECT * FROM parent_table LEFT OUTER JOIN child_table ON child_table.id = parent_table.id WHERE parent_table.id IN(SELECT ref FROM ref_table WHERE ref_table.id = 10) UNION SELECT * FROM parent_table LEFT OUTER JOIN child_table ON child_table.id = parent_table.id WHERE parent_table.id IN(SELECT ref FROM ref_table WHERE ref_table.id = 10)

RETURN VALUES

id usertype username id    usertype userbio
4  3        tim      NULL  NULL     NULL
6  1        liz      NULL  NULL     NULL

I can do UNION or JOIN, but i cannot do both together.

Roddy

Link to comment
Share on other sites

jSG:  What exactly did  you have in mind.  I have now tried just about every associative relation possible.  The result is, without exception, a failed query.  i believe a new approach is required.

Roddy

Link to comment
Share on other sites

I mean you need to group the queries to make it explicit what you're trying to do:

(SELECT * FROM parent_table LEFT OUTER JOIN child_table ON child_table.id = parent_table.id WHERE parent_table.id IN(SELECT ref FROM ref_table WHERE ref_table.id = 10))
	UNION
	(SELECT * FROM parent_table LEFT OUTER JOIN child_table ON child_table.id = parent_table.id WHERE parent_table.id IN(SELECT ref FROM ref_table WHERE ref_table.id = 10))

Link to comment
Share on other sites

Insofar as the query is not rejected, your grouping is better than all of those that I tried.  Unfortunately,  the results are nearly an empty set and do not provide any information for the original query -- only those to which the original query referred.

id: 
usertype: 
username: tim
userbio: 

id: 
usertype: 
username: liz
userbio: 

Compare the following results with those above 

id: 10
usertype: 3
username: david

id: 4
usertype: 3
username: tim

id: 6
usertype: 1
username: liz

The desired, but not forthcoming result is

id: 10
usertype: 3
username: david
userbio: I am David.

id: 4
usertype: 3
username: tim
userbio: I am Tim. 

id: 6
usertype: 1
username: liz
userbio: I am Liz.

Roddy

ps.  Depending on whether I run the statement with MySQLi or myPhpAdmin the results are different, but still incomplete. 

Edited by iwato
Link to comment
Share on other sites

Now we're back to database design.  I don't see a reason for you to design the database like you did, one-to-one data should go in the same table.  You shouldn't need a join at all.

There's another problem I just noticed though.  The two queries that are being unioned are the same query.  There's no reason to do that, they're going to return the same data.  That's not what I was describing when I suggested that query format, the queries that I union together are not the same.  One of them selects the primary row and one of them selects the related rows.

Depending on whether I run the statement with MySQLi or myPhpAdmin the results are different

That suggests that your PHP code is incorrect somewhere.  phpMyAdmin is a PHP program, it's probably using PDO or maybe mysqli, it should be no different than your own code.  If you're getting different results then it sounds like your code is doing something wrong.

Link to comment
Share on other sites

Some progress has been achieved.  Now all matched rows are reporting, but still there is no data coming from child_table.

Also, myPhpAdmin and MySQLi are reporting different results.  

(SELECT * 
	FROM parent_table
	LEFT OUTER JOIN child_table
		ON child_table.id = parent_table.id
		WHERE parent_table.id = 9
)
UNION
(SELECT * 
	FROM parent_table
	LEFT OUTER JOIN child_table
		ON child_table.id = parent_table.id
		WHERE parent_table.id
			IN (SELECT ref
					FROM ref_table
					WHERE ref_table.id = 9
			)
)
MySQLI

id: 
usertype: 
username: jason
userbio: 

id: 
usertype: 
username: david
userbio: 
myPhpAdmin

id usertype username id    usertype userbio

9  3        jason    NULL  NULL     NULL
10 3        david    NULL  NULL     NULL

 

Link to comment
Share on other sites

A closer look suggests confusion in what is being read.  Notice the difference in the number of ID values between the previous entry and this latter one.  In the previous case were received values for three IDs.  In this case only two IDs are reported.  What is more, the reported values are completely different!

Roddy :huh::o:mellow:

p.s.  Please do not interpret my reluctance to rebuild my database structure. I need to move ahead quickly.  Already a year has past, and I have yet to publish a single podcast!  Once my project has been tested, and I know that it is has been worth the investment, I can return and refurbish.  i have kept good record of what I have done  and still have much to do, before i can launch.

Edited by iwato
Link to comment
Share on other sites

In the previous case were received values for three IDs.  In this case only two IDs are reported.  What is more, the reported values are completely different!

Well yeah, you're asking for different data sets.  One of them was for ID 10, and this one is for ID 9.  That ID should change in the query depending on whatever your primary record is.

At this point I would specify the columns you want from each table.  e.g.:

SELECT p.id, p.usertype, p.username, c.userbio
    FROM parent_table AS p
    LEFT OUTER JOIN child_table AS c

phpMyAdmin is showing all of the columns from both tables because it's accessing the row differently than you're doing in your code.  In your code, $row['id'] only refers to a single value, phpMyAdmin is looping through the columns and showing them that way.  There's no reason to return the duplicate ID and usertype columns though (there's also no reason to store the usertype in more than one table at all, it wastes space and creates a consistency problem if you update one table but forget to update another one).

Once my project has been tested, and I know that it is has been worth the investment, I can return and refurbish.

Like they say, you can either take the time to do it right or make the time to do it over.  People often choose the latter, even though it's more difficult to unlearn and relearn things like this.

Link to comment
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!

Register a new account

Sign in

Already have an account? Sign in here.

Sign In Now
×
×
  • Create New...