Introduction

Set operations like union, intersect and except are used on 2 tables with same fields. Where as join is performed for denormalizing 2 tables into one on a common column.

Set Operations

Consider 2 (similar) tables representing doctors and teachers with first name, last name and age.

doctors  <- data.frame(first=c('A','B','C'), last=c('W','X','Y'), age=c(20,30,40))
teachers <- data.frame(first=c('A','B','D'), last=c('W','X','Z'), age=c(20,30,50))

# Union - doctors or teachers
sqldf("SELECT first, last FROM doctors UNION 
       SELECT first, last FROM teachers")
##   first last
## 1     A    W
## 2     B    X
## 3     C    Y
## 4     D    Z
# Intersect - doctors and teachers
sqldf("SELECT first, last FROM doctors INTERSECT 
       SELECT first, last FROM teachers")
##   first last
## 1     A    W
## 2     B    X
# Except - doctors who are not teachers
sqldf("SELECT first, last FROM doctors EXCEPT 
       SELECT first, last FROM teachers")
##   first last
## 1     C    Y

Joins

Consider 2 (very different) tables representing customers and orders. The orders table will store only customer ID and not all the customer information to save space. To find shipping address for an order, orders and customers tables are joined on common customer ID column.

customers <- data.frame(cust_id=c(1,2,3), name=c('A','B','C'), age=c(20,30,50))
orders    <- data.frame(order_id=c(1,2,3), cust_id=c(1,2,4), order_qty=c(4,7,8))

# Inner join
sqldf("SELECT * FROM orders o INNER JOIN customers c ON o.cust_id=c.cust_id")
##   order_id cust_id order_qty cust_id name age
## 1        1       1         4       1    A  20
## 2        2       2         7       2    B  30
# Left join
sqldf("SELECT * FROM orders o LEFT JOIN customers c ON o.cust_id=c.cust_id")
##   order_id cust_id order_qty cust_id name age
## 1        1       1         4       1    A  20
## 2        2       2         7       2    B  30
## 3        3       4         8      NA <NA>  NA
# Right join (simulate, because SQLITE doesn't support)
sqldf("SELECT * FROM customers c LEFT JOIN orders o ON o.cust_id=c.cust_id")
##   cust_id name age order_id cust_id order_qty
## 1       1    A  20        1       1         4
## 2       2    B  30        2       2         7
## 3       3    C  50       NA      NA        NA
# Full join (simulate, because SQLITE doesn't support)
sqldf(
 "SELECT order_id, order_qty, o.cust_id, name, age FROM orders o LEFT JOIN customers c ON o.cust_id=c.cust_id
  UNION
  SELECT order_id, order_qty, c.cust_id, name, age FROM customers c LEFT JOIN orders o ON o.cust_id=c.cust_id"
)
##   order_id order_qty o.cust_id name age
## 1       NA        NA         3    C  50
## 2        1         4         1    A  20
## 3        2         7         2    B  30
## 4        3         8         4 <NA>  NA