SQL Joins tutorial / refresher

Considering the following simple schema:

CREATE TABLE one (
id int,
c_name varchar(255),
c_info1 varchar(255),
c_info2 varchar(255));
CREATE TABLE two (
id int,
c_name varchar(255),
c_info1 varchar(255),
c_info2 varchar(255));

insert into one values(1,'aaa','','');
insert into one values(2,'bbb','','');
insert into one values(3,'ccc','c','');
insert into one values(4,'ccc','','');
insert into one values(5,'ccc','','');
insert into one values(6,'ddd','d','');
insert into one values(7,'ddd','','');
insert into one values(8,'eee','','');

insert into two values(1,'aaa','','');
insert into two values(2,'ccc','','');
insert into two values(3,'ccc','','');
insert into two values(4,'ddd','','');
insert into two values(5,'eee','e','');
insert into two values(6,'fff','','');


select count(distinct c_name) from one; -- 5
select count(*) from one; -- 8

Inner joins

The default join is an inner join, therefore the ‘inner’ modifier does not need to be specified. Inner joins are about getting the commonality between the tables as defined in the join clause. There is no extra left/right consideration and swapping the tables around makes no difference to the content of the resulting table.
The following 4 selects are equivalent:

select * from one join two on one.c_name=two.c_name;
select * from one inner join two on one.c_name=two.c_name;
select * from two join one on two.c_name=one.c_name;
select * from two inner join one on two.c_name=one.c_name;

Every row in the left table is evaluated in turn, against every row in the right table based on the criteria expressed in the join clause. If the conditional expression part of the join clause evaluates to true, (in this case, that the value in the c_name column/field in the left row is the same as the c_name value in the row under consideration in the right table) then the rows are joined and included in the resultant table.

-- 10 records
1 aaa 1 aaa
3 ccc c 2 ccc
4 ccc 2 ccc
5 ccc 2 ccc
3 ccc c 3 ccc
4 ccc 3 ccc
5 ccc 3 ccc
6 ddd d 4 ddd
7 ddd 4 ddd
8 eee 5 eee e

In the above example the join key ‘ccc’ occurs 3 times in the left table and 2 times in the right table. Each of the 3 records in the left table matches each of the 2 records in the right table, giving rise to (3×2) six records in the output. Therefore, you can have more records in the result set than in either table.

Note: the row from table one with id 2 and the row from table two with id 6 have been are omitted from the result set by the join based on the join clause.

Cross Joins – just give me everything.

If you take the join conditional clause out of an inner join, and formalize this by putting the ‘cross’ modifier in – you get a cross join. A cross join the same as an inner join in that every combination of pairs of records (one from each table) is considered, but as there’s no limiting join clause, all possible candidate joined rows are accepted for inclusion in the resulting table. The number of entries in the result set is always the number of rows in the left table multiplied by the number of rows in the right table.

select * from one cross join two;
-- yields 8 x 6 = 48 records

Outer joins

We’re not always interested in what is common between two tables, we also want to know what is unique to either or both tables. Outer joins help us with this. There are 3 types of outer join

Left Outer

Changing the join type to left outer causes the query to return additional rows above and beyond an inner join. A “left outer” join specifies that whether or not a row in the left table has a matching record in the right table, it should be included in the result set. The columns in the result table which would normally be populated with the values from the matching row in the right table contain NULLs.

select *
from one left outer join two
on one.c_name = two.c_name

yields a table with one additional row above that yielded by the inner join, ie the left table row not included in the inner join as it has no matching right table row according to the join clause criteria.

1 aaa 1 aaa
2 bbb NULL NULL NULL NULL
3 ccc c 2 ccc
3 ccc c 3 ccc
4 ccc 2 ccc
4 ccc 3 ccc
5 ccc 2 ccc
5 ccc 3 ccc
6 ddd d 4 ddd
7 ddd 4 ddd
8 eee 5 eee e

This union of the commonality and the left table specific data is generally not what we’re after. Mostly we’re after the left table only data.  The nulled out data in the last result set gives us a hook for excluding the inner join rows by adding a where clause:

select *
from one left outer join two
on one.c_name = two.c_name
where two.id is null;

yields the rows specific to the left table

2 bbb NULL NULL NULL NULL

Right Outer

The complement to left outer. The resulting table is going to include all right table rows irrespective of the join clause.

select *
from one right outer join two
on one.c_name = two.c_name

yields and 11 row table containing the 10 inner join rows along with the single row specific to the right table, according to the join clause.

1 aaa 1 aaa
3 ccc c 2 ccc
4 ccc 2 ccc
5 ccc 2 ccc
3 ccc c 3 ccc
4 ccc 3 ccc
5 ccc 3 ccc
6 ddd d 4 ddd
7 ddd 4 ddd
8 eee 5 eee e
NULL NULL NULL NULL 6 fff

If adding a where clause to filter out the inner join rows, it must be based on null values in the left table columns.

select *
from one right outer join two
on one.c_name = two.c_name
where one.id is null

yields the row specific to the right table

NULL NULL NULL NULL 6 fff

Full Outer

A Full Outer join is the union of left and right outer joins

select *
from one full outer join two
on one.c_name = two.c_name

1 aaa 1 aaa
2 bbb NULL NULL NULL NULL
3 ccc c 2 ccc
3 ccc c 3 ccc
4 ccc 2 ccc
4 ccc 3 ccc
5 ccc 2 ccc
5 ccc 3 ccc
6 ddd d 4 ddd
7 ddd 4 ddd
8 eee 5 eee e
NULL NULL NULL NULL 6 fff

To exclude the inner join rows in order to limit the results to the rows specific to the left and right tables extend the where clause to detect NULLs from left and right.

select *
from one full outer join two
on one.c_name = two.c_name
where one.id is null or two.id is null;

2 bbb NULL NULL NULL NULL
NULL NULL NULL NULL 6 fff

Relationship between types of joins

If an inner join returns M rows, a left outer join on the same tables returns M + N rows, a right outer returns M + O, then a full join returns M + N + O, and a cross returns M x N.

Illegal mix of collations (latin1_general_ci,IMPLICIT) and (latin1_swedish_ci,IMPLICIT)

http://bugs.mysql.com/bug.php?id=18227
[14 Mar 2006 16:55] Mats-Ove Fant
Redefining the VIEW as this solves the problem:

CREATE VIEW test_view AS SELECT sysid,name FROM test_table WHERE name COLLATE
latin1_swedish_ci =
substring_index(substring_index(CURRENT_USER(),’@’,1),’.’,-1);