To find duplicate values in a SQL table, you need to identify rows where specific columns (or combinations of columns) have the same values. Below are detailed methods and examples for detecting duplicates:
1. Basic Method: GROUP BY
and HAVING
Identify duplicates based on one or more columns using aggregation.
Example 1: Find Duplicates in a Single Column
SELECT column_name, COUNT(*) AS duplicate_count
FROM table_name
GROUP BY column_name
HAVING COUNT(*) > 1;
Example 2: Find Duplicates Across Multiple Columns
SELECT column1, column2, COUNT(*) AS duplicate_count
FROM table_name
GROUP BY column1, column2
HAVING COUNT(*) > 1;
2. List All Duplicate Rows
To see all rows that are duplicates (not just the grouped keys), join the table with the grouped duplicates.
Example:
SELECT t.*
FROM table_name t
INNER JOIN (
SELECT column1, column2
FROM table_name
GROUP BY column1, column2
HAVING COUNT(*) > 1
) dup
ON t.column1 = dup.column1 AND t.column2 = dup.column2;
3. Use ROW_NUMBER()
to Identify Duplicates
Assign a row number to duplicates and filter them. This is useful for deleting duplicates later.
Example:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY column1, column2
ORDER BY (SELECT NULL)
) AS row_num
FROM table_name
)
SELECT * FROM cte WHERE row_num > 1;
4. Find Entire Duplicate Rows
Check for rows where all column values are identical.
Example:
SELECT *
FROM table_name
GROUP BY column1, column2, column3, ... -- List all columns
HAVING COUNT(*) > 1;
5. Delete Duplicates (Optional)
Once duplicates are identified, you can delete them. Use a CTE
or subquery for safety.
Example:
WITH cte AS (
SELECT *,
ROW_NUMBER() OVER (
PARTITION BY column1, column2
ORDER BY id -- Use a unique identifier to decide which row to keep
) AS row_num
FROM table_name
)
DELETE FROM cte WHERE row_num > 1;
Key Notes
- Columns Matter: Duplicates are defined by the columns you include in
GROUP BY
orPARTITION BY
. - NULL Values:
NULL = NULL
is treated as not equal in SQL. Adjust your logic if NULLs are present. - Performance: Use indexes on columns involved in grouping/partitioning for large tables.
- Backup: Always back up data before deleting duplicates.
Example Walkthrough
Assume a table employees
with duplicates in email
:
id | name | |
---|---|---|
1 | Alice | alice@mail.com |
2 | Bob | bob@mail.com |
3 | Alice | alice@mail.com |
Step 1: Identify Duplicate Emails
SELECT email, COUNT(*) AS duplicate_count
FROM employees
GROUP BY email
HAVING COUNT(*) > 1;
Result:
duplicate_count | |
---|---|
alice@mail.com | 2 |
Step 2: List All Duplicate Rows
SELECT * FROM employees
WHERE email IN (
SELECT email
FROM employees
GROUP BY email
HAVING COUNT(*) > 1
);
Result:
id | name | |
---|---|---|
1 | Alice | alice@mail.com |
3 | Alice | alice@mail.com |
Summary
- Use
GROUP BY
andHAVING
to find duplicates. - Use
ROW_NUMBER()
to isolate duplicates for deletion. - Adjust logic based on your definition of a “duplicate” (specific columns vs. entire rows).