To remove duplicate rows in SQL Server, you can use several methods depending on your specific requirements. Below are the most common and effective approaches:
1. Using ROW_NUMBER()
with a CTE (Common Table Expression)
This method identifies duplicates using a window function and deletes them while retaining one instance of each duplicate group.
Example:
WITH CTE AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY Column1, Column2, Column3 -- Columns defining duplicates
ORDER BY (SELECT 0) -- Optional: Order to decide which row to keep
) AS row_num
FROM
YourTable
)
DELETE FROM CTE WHERE row_num > 1;
Steps:
PARTITION BY
: List the columns that define duplicates (e.g.,FirstName
,LastName
).ORDER BY
: Determines which duplicate to keep (e.g.,CreatedDate DESC
to keep the newest row).- Delete rows where
row_num > 1
(retains the first row in each partition).
2. Using GROUP BY
and MIN/MAX
(for tables with a unique identifier)
If your table has a unique column (e.g., an ID
), delete duplicates by retaining one ID per group.
Example:
DELETE FROM YourTable
WHERE ID NOT IN (
SELECT MIN(ID) -- Keep the smallest ID in each duplicate group
FROM YourTable
GROUP BY Column1, Column2, Column3 -- Columns defining duplicates
);
3. Using a Temporary Table
Copy distinct rows into a temporary table, truncate the original table, and re-insert the deduplicated data.
Example:
-- Step 1: Select distinct rows into a temp table
SELECT DISTINCT * INTO #TempTable FROM YourTable;
-- Step 2: Truncate the original table
TRUNCATE TABLE YourTable;
-- Step 3: Re-insert the deduplicated data
INSERT INTO YourTable SELECT * FROM #TempTable;
-- Step 4: Drop the temp table
DROP TABLE #TempTable;
4. Using DELETE
with a Self-Join
Identify duplicates and delete them using a self-join.
Example:
DELETE t1
FROM YourTable t1
JOIN YourTable t2 ON
t1.Column1 = t2.Column1 AND
t1.Column2 = t2.Column2 AND
t1.Column3 = t2.Column3 -- Columns defining duplicates
WHERE t1.ID > t2.ID; -- Keep the row with the smallest ID
Key Considerations
- Backup First: Always test with a
SELECT
statement to verify which rows will be deleted.
-- Test before deleting
WITH CTE AS (...)
SELECT * FROM CTE WHERE row_num > 1;
- Indexes: Add indexes on the columns defining duplicates to improve performance for large tables.
- Unique Constraints: After deduplication, add a
UNIQUE
constraint to prevent future duplicates:
ALTER TABLE YourTable
ADD CONSTRAINT UC_YourTable UNIQUE (Column1, Column2, Column3);
Example Workflow
- Identify Duplicates:
SELECT
Column1, Column2, Column3,
COUNT(*) AS DuplicateCount
FROM YourTable
GROUP BY Column1, Column2, Column3
HAVING COUNT(*) > 1;
- Delete Duplicates (using the CTE method).
- Add Constraints to prevent recurrence.
Summary
Method | Best For | Pros | Cons |
---|---|---|---|
ROW_NUMBER() with CTE | Most scenarios, flexible ordering | Retains control over which row to keep | Requires understanding of window functions |
GROUP BY & MIN/MAX | Tables with a unique identifier | Simple logic | Requires a unique column (e.g., ID ) |
Temporary Table | Simple deduplication | Works for all duplicate types | Resets identity columns, slower for large tables |
Self-Join | Tables with unique IDs | Explicit control over deletions | Complex for multiple duplicate columns |
Choose the method that best fits your table structure and requirements!