An upsert is a combination of "insert" and "update" where you want to insert a new row into a table if it doesn't already exist, or update the existing row if it does. This process can be tricky to handle efficiently, but PostgreSQL provides a powerful feature that makes upserts a breeze. In this blog post, we'll explore the upsert operation in PostgreSQL with examples to help you master this useful technique.
Understanding the Anatomy of Upsert
In PostgreSQL, the upsert operation is accomplished using the INSERT ... ON CONFLICT statement, available since PostgreSQL 9.5. The key to making it work is the ON CONFLICT clause, which lets you specify what action to take when the new row conflicts with an existing row in the table.
Here's the basic syntax for the upsert operation:
INSERT INTO table_name (column1, column2, ..., columnN)
VALUES (value1, value2, ..., valueN)
ON CONFLICT (conflict_column)
DO UPDATE SET column1 = value1, column2 = value2, ..., columnN = valueN;
Let's break down the components:
table_name: The name of the target table where you want to perform the upsert.
(column1, column2, ..., columnN): The list of columns you want to insert data into.
VALUES (value1, value2, ..., valueN): The values you want to insert into the specified columns.
ON CONFLICT (conflict_column): The conflict target, that is, the column (or columns) covered by a primary key, unique constraint, or unique index whose violation should trigger the update.
DO UPDATE SET ...: The columns to update, and their new values, when a conflict occurs.
Example 1: Simple Upsert
Let's consider a hypothetical table called "employees" with the following structure:
id (Primary Key) | name | department |
1 | John | Engineering |
2 | Jane | Marketing |
Suppose we want to upsert a new employee record or update an existing one based on the "id" column. We can use the following query:
INSERT INTO employees (id, name, department)
VALUES (3, 'Alice', 'Finance')
ON CONFLICT (id)
DO UPDATE SET name = 'Alice', department = 'Finance';
In this example, if there is no record with id = 3, a new row will be inserted. However, if a row with id = 3 already exists, its "name" and "department" fields will be updated with the new values.
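To see both branches of this upsert, here is a minimal sketch you can run in a scratch database. The column types are an assumption (the post only shows the column names), and the second statement uses a different department purely to make the update visible.

```sql
-- Hypothetical setup matching the "employees" table shown above.
-- Column types are assumed; the original only lists column names.
CREATE TABLE employees (
    id         integer PRIMARY KEY,
    name       text NOT NULL,
    department text NOT NULL
);

INSERT INTO employees (id, name, department)
VALUES (1, 'John', 'Engineering'),
       (2, 'Jane', 'Marketing');

-- First run: id = 3 does not exist yet, so the row is inserted.
INSERT INTO employees (id, name, department)
VALUES (3, 'Alice', 'Finance')
ON CONFLICT (id)
DO UPDATE SET name = 'Alice', department = 'Finance';

-- Second run with the same id: the insert conflicts on the primary key,
-- so the existing row is updated instead.
INSERT INTO employees (id, name, department)
VALUES (3, 'Alice', 'Accounting')
ON CONFLICT (id)
DO UPDATE SET name = 'Alice', department = 'Accounting';

-- The table now holds three rows; row 3 shows the updated department.
SELECT id, name, department FROM employees ORDER BY id;
```

Note that the values are written twice in each statement, once in VALUES and once in SET. That duplication is easy to get out of sync, which is the problem the EXCLUDED pseudo-table solves.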
Example 2: Upsert with Constraint Violation
Consider a table called "students" with the following structure:
roll_no (Unique Constraint) | name | age |
101 | John | 21 |
102 | Jane | 22 |
Now, let's perform an upsert on the "students" table:
INSERT INTO students (roll_no, name, age)
VALUES (103, 'Alice', 20)
ON CONFLICT (roll_no)
DO UPDATE SET name = EXCLUDED.name, age = EXCLUDED.age;
In this example, if a student with roll_no = 103 is not present in the table, a new row will be inserted. However, if a row with roll_no = 103 already exists, its "name" and "age" fields will be updated with the new values, which are read from the special EXCLUDED pseudo-table.
Understanding the EXCLUDED Pseudo-table
We mentioned the use of the EXCLUDED pseudo-table within the DO UPDATE SET clause. EXCLUDED is a special table that holds the row proposed for insertion, that is, the incoming row that could not be inserted because it conflicted with an existing one. It lets you reference the incoming row's values during the update phase of the upsert.
EXCLUDED is what makes upserts against unique constraints and primary keys convenient, because it gives you the values that would have been inserted had there been no conflict, without repeating them in the SET clause. Note that ON CONFLICT DO UPDATE only works with unique indexes and constraints; exclusion constraints are not supported as the conflict target for DO UPDATE.
Let's dive deeper into how EXCLUDED works:
EXCLUDED Columns
When you use EXCLUDED in the DO UPDATE SET clause, you can refer to any column of the proposed row, not only the column that caused the conflict. For instance, in the "students" table example from the previous section, the roll_no column had a unique constraint. When an upsert attempts to insert a row with a roll_no that already exists, EXCLUDED.roll_no holds that conflicting value, while EXCLUDED.name and EXCLUDED.age hold the rest of the incoming row.
Here's the relevant part of the query from the previous example:
ON CONFLICT (roll_no)
DO UPDATE SET name = EXCLUDED.name, age = EXCLUDED.age;
In this case, EXCLUDED.name and EXCLUDED.age refer to the values that would have been inserted if the roll_no conflict hadn't occurred. By referencing EXCLUDED in the SET clause, you can easily update the conflicting row with the new values.
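The difference between the existing row and the proposed row is easiest to see when both appear in the same SET clause. The sketch below assumes a "students" table like the one in Example 2 (column types are a guess) and uses GREATEST only as an illustration of combining the two; it is not part of the original example.

```sql
-- Hypothetical setup matching the "students" table from Example 2.
CREATE TABLE students (
    roll_no integer UNIQUE,
    name    text,
    age     integer
);

INSERT INTO students (roll_no, name, age)
VALUES (101, 'John', 21),
       (102, 'Jane', 22);

-- roll_no 101 already exists (John, 21), so this insert conflicts.
-- Inside DO UPDATE, "students" refers to the row already in the table
-- and EXCLUDED refers to the row that was proposed for insertion.
INSERT INTO students (roll_no, name, age)
VALUES (101, 'Johnny', 22)
ON CONFLICT (roll_no)
DO UPDATE SET
    name = EXCLUDED.name,                         -- proposed value: 'Johnny'
    age  = GREATEST(students.age, EXCLUDED.age)   -- existing 21 vs. proposed 22
RETURNING roll_no, name, age;
```

The RETURNING clause reports the row as it stands after the upsert, here (101, Johnny, 22), whether it was inserted or updated.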
Other Methods to Achieve Upsert
In addition to using the INSERT ... ON CONFLICT statement with the EXCLUDED pseudo-table, PostgreSQL offers other ways to write an upsert. Let's explore two more approaches: the MERGE statement (PostgreSQL 15 and later) and CTEs (Common Table Expressions).
Method 1: Upsert with MERGE Statement
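The MERGE statement is defined in the SQL standard and has been adopted by several database systems; PostgreSQL supports it from version 15 onward. It lets you perform insert, update, or delete actions depending on whether each source row matches a row in the target table, which makes it a good fit for upsert scenarios. Unlike INSERT ... ON CONFLICT, MERGE does not handle conflicts caused by concurrent inserts, so a unique violation is still possible if another session inserts the same key at the same time.
Here's the basic syntax for using the MERGE statement in PostgreSQL: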
MERGE INTO target_table AS target
USING source_table AS source
ON target.conflict_column = source.conflict_column
WHEN MATCHED THEN
    -- Column names on the left side of SET must not be prefixed with the target table name or alias
    UPDATE SET column1 = source.column1, column2 = source.column2, ..., columnN = source.columnN
WHEN NOT MATCHED THEN
    INSERT (column1, column2, ..., columnN)
    VALUES (source.column1, source.column2, ..., source.columnN);
Let's illustrate this with an example:
Assume we have a table called "books" with the following structure:
isbn (Primary Key) | title | author |
9781234567890 | Book A | Author X |
9789876543210 | Book B | Author Y |
We want to upsert new book records based on the isbn column. We can use the MERGE statement as follows:
MERGE INTO books AS target
USING (VALUES ('9780123456789', 'Book C', 'Author Z')) AS source (isbn, title, author)
ON (target.isbn = source.isbn)
WHEN MATCHED THEN
UPDATE SET title = source.title, author = source.author
WHEN NOT MATCHED THEN
INSERT (isbn, title, author)
VALUES (source.isbn, source.title, source.author);
In this example, if the book with isbn = '9780123456789' exists in the "books" table, its title and author will be updated with the new values. Otherwise, a new row with the specified isbn, title, and author will be inserted.
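A single MERGE can exercise both branches when the source contains more than one row. The following sketch assumes PostgreSQL 15 or later and a "books" table with text columns (the types and the "2nd ed." title are illustrative assumptions, not taken from the example above).

```sql
-- Requires PostgreSQL 15 or later.
-- Hypothetical setup matching the "books" table shown above.
CREATE TABLE books (
    isbn   text PRIMARY KEY,
    title  text NOT NULL,
    author text NOT NULL
);

INSERT INTO books (isbn, title, author)
VALUES ('9781234567890', 'Book A', 'Author X'),
       ('9789876543210', 'Book B', 'Author Y');

MERGE INTO books AS target
USING (VALUES
    ('9781234567890', 'Book A (2nd ed.)', 'Author X'),  -- matches an existing row
    ('9780123456789', 'Book C', 'Author Z')             -- no match in the table
) AS source (isbn, title, author)
ON target.isbn = source.isbn
WHEN MATCHED THEN
    UPDATE SET title = source.title, author = source.author
WHEN NOT MATCHED THEN
    INSERT (isbn, title, author)
    VALUES (source.isbn, source.title, source.author);

-- Three rows remain: Book C was inserted and Book A now carries the new title.
SELECT isbn, title, author FROM books ORDER BY isbn;
```

Each source row should match at most one target row; if the source contained the same isbn twice, PostgreSQL would reject the statement rather than update the same row twice.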
Method 2: Upsert with Common Table Expressions (CTE)
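Common Table Expressions (CTEs) define temporary, named result sets that a subsequent SQL statement can use. A CTE does not perform the upsert by itself: it stages the incoming rows, and INSERT ... ON CONFLICT then applies them. This is handy when the source rows come from a query or when you want to upsert several rows in one statement.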
Here's how to achieve an upsert using CTEs:
WITH source_data (conflict_column, column1, column2, ..., columnN) AS (
VALUES ('conflict_value', 'value1', 'value2', ..., 'valueN')
)
INSERT INTO target_table (conflict_column, column1, column2, ..., columnN)
SELECT conflict_column, column1, column2, ..., columnN
FROM source_data
ON CONFLICT (conflict_column)
DO UPDATE SET column1 = EXCLUDED.column1, column2 = EXCLUDED.column2, ..., columnN = EXCLUDED.columnN;
Let's demonstrate this with an example:
Suppose we have a table called "inventory" with the following structure:
product_code (Primary Key) | quantity |
1001 | 50 |
1002 | 30 |
We want to upsert inventory data based on the product_code column. Here's the CTE-based upsert query:
WITH source_data (product_code, quantity) AS (
VALUES (1003, 20)
)
INSERT INTO inventory (product_code, quantity)
SELECT product_code, quantity
FROM source_data
ON CONFLICT (product_code)
DO UPDATE SET quantity = EXCLUDED.quantity;
In this example, if a record with product_code = 1003 doesn't exist in the "inventory" table, a new row will be inserted with the specified product code and quantity. If the product code already exists, the quantity will be updated with the new value.
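The CTE form becomes more useful once the staged data has several rows. Here is a minimal sketch, assuming an "inventory" table with integer columns (the types and the extra row for product 1002 are assumptions added for illustration).

```sql
-- Hypothetical setup matching the "inventory" table shown above.
CREATE TABLE inventory (
    product_code integer PRIMARY KEY,
    quantity     integer NOT NULL
);

INSERT INTO inventory (product_code, quantity)
VALUES (1001, 50),
       (1002, 30);

-- Stage two incoming rows: one for an existing product, one for a new product.
WITH source_data (product_code, quantity) AS (
    VALUES (1002, 45),
           (1003, 20)
)
INSERT INTO inventory (product_code, quantity)
SELECT product_code, quantity
FROM source_data
ON CONFLICT (product_code)
DO UPDATE SET quantity = EXCLUDED.quantity
RETURNING product_code, quantity;
```

Product 1002 is updated to 45 and product 1003 is inserted with 20, while product 1001 is left untouched. Each product_code should appear only once in the staged rows; PostgreSQL raises an error if a single ON CONFLICT DO UPDATE statement would affect the same row twice.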