union
, intersect
and except
are used on 2 tables with same fields. Where as join
is performed for denormalizing 2 tables into one on a common column.
Consider 2 (similar) tables representing doctors and teachers with first name, last name and age.
doctors <- data.frame(first=c('A','B','C'), last=c('W','X','Y'), age=c(20,30,40))
teachers <- data.frame(first=c('A','B','D'), last=c('W','X','Z'), age=c(20,30,50))
# Union - doctors or teachers
sqldf("SELECT first, last FROM doctors UNION
SELECT first, last FROM teachers")
## first last
## 1 A W
## 2 B X
## 3 C Y
## 4 D Z
# Intersect - doctors and teachers
sqldf("SELECT first, last FROM doctors INTERSECT
SELECT first, last FROM teachers")
## first last
## 1 A W
## 2 B X
# Except - doctors who are not teachers
sqldf("SELECT first, last FROM doctors EXCEPT
SELECT first, last FROM teachers")
## first last
## 1 C Y
Consider 2 (very different) tables representing customers and orders. The orders table will store only customer ID and not all the customer information to save space. To find shipping address for an order, orders and customers tables are joined on common customer ID column.
customers <- data.frame(cust_id=c(1,2,3), name=c('A','B','C'), age=c(20,30,50))
orders <- data.frame(order_id=c(1,2,3), cust_id=c(1,2,4), order_qty=c(4,7,8))
# Inner join
sqldf("SELECT * FROM orders o INNER JOIN customers c ON o.cust_id=c.cust_id")
## order_id cust_id order_qty cust_id name age
## 1 1 1 4 1 A 20
## 2 2 2 7 2 B 30
# Left join
sqldf("SELECT * FROM orders o LEFT JOIN customers c ON o.cust_id=c.cust_id")
## order_id cust_id order_qty cust_id name age
## 1 1 1 4 1 A 20
## 2 2 2 7 2 B 30
## 3 3 4 8 NA <NA> NA
# Right join (simulate, because SQLITE doesn't support)
sqldf("SELECT * FROM customers c LEFT JOIN orders o ON o.cust_id=c.cust_id")
## cust_id name age order_id cust_id order_qty
## 1 1 A 20 1 1 4
## 2 2 B 30 2 2 7
## 3 3 C 50 NA NA NA
# Full join (simulate, because SQLITE doesn't support)
sqldf(
"SELECT order_id, order_qty, o.cust_id, name, age FROM orders o LEFT JOIN customers c ON o.cust_id=c.cust_id
UNION
SELECT order_id, order_qty, c.cust_id, name, age FROM customers c LEFT JOIN orders o ON o.cust_id=c.cust_id"
)
## order_id order_qty o.cust_id name age
## 1 NA NA 3 C 50
## 2 1 4 1 A 20
## 3 2 7 2 B 30
## 4 3 8 4 <NA> NA