How do I remove duplicates from a list, while preserving order in Python?

To remove duplicates from a list while preserving the original order in Python, you can use the following methods:

1. Using a Loop with a `seen` Set (All Python Versions)

Iterate through the list, track seen elements with a set, and build a new list with first occurrences.

def remove_duplicates_preserve_order(lst):
    seen = set()
    result = []
    for item in lst:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

# Example:
original_list = [3, 2, 1, 2, 4]
print(remove_duplicates_preserve_order(original_list))  # Output: [3, 2, 1, 4]

Key Points:

Works in all Python versions.
Time Complexity: O(n) (efficient due to set lookups).
Preserves the order of the first occurrence of each element.

2. Using `dict.fromkeys` (Python 3.7+)

Leverage the fact that dictionaries in Python 3.7+ preserve insertion order. Convert the list to a dictionary (keys remove duplicates) and back to a list.

original_list = [3, 2, 1, 2, 4]
deduped_list = list(dict.fromkeys(original_list))
print(deduped_list)  # Output: [3, 2, 1, 4]

Key Points:

Concise and readable for Python 3.7+.
Time Complexity: O(n).
Same result as the loop method.

3. Using `collections.OrderedDict` (Python < 3.7)

For compatibility with older Python versions, use OrderedDict to maintain order.

from collections import OrderedDict

original_list = [3, 2, 1, 2, 4]
deduped_list = list(OrderedDict.fromkeys(original_list))
print(deduped_list)  # Output: [3, 2, 1, 4]

Key Points:

Works in Python 2.7+ and 3.x.
Same logic as dict.fromkeys but uses OrderedDict for older versions.

Examples with Different Data Types

Strings:

words = ["apple", "banana", "apple", "orange", "banana"]
print(list(dict.fromkeys(words)))  # Output: ['apple', 'banana', 'orange']

Mixed Types:

mixed = [1, "a", 1, 2, "a", (3, 4)]
print(list(dict.fromkeys(mixed)))  # Output: [1, 'a', 2, (3, 4)]

Handling Unhashable Types (e.g., Lists)

If your list contains unhashable elements (e.g., nested lists), convert them to a hashable type (e.g., tuples) first:

original = [[1, 2], [3, 4], [1, 2]]
# Convert inner lists to tuples
deduped = [list(t) for t in dict.fromkeys(tuple(item) for item in original)]
print(deduped)  # Output: [[1, 2], [3, 4]]

Performance Comparison

Method	Time Complexity	Space Complexity	Python Version
Loop with `seen` set	O(n)	O(n)	All versions
`dict.fromkeys`	O(n)	O(n)	3.7+
`OrderedDict.fromkeys`	O(n)	O(n)	2.7+, 3.x

Summary

Python 3.7+: Use list(dict.fromkeys(your_list)) for simplicity.
Older Versions: Use OrderedDict or the loop-with-seen-set method.
Unhashable Elements: Convert to hashable types (e.g., tuples) first.

By choosing the appropriate method, you can efficiently deduplicate a list while preserving its order.

1. Using a Loop with a seen Set (All Python Versions)

Key Points:

2. Using dict.fromkeys (Python 3.7+)

Key Points:

3. Using collections.OrderedDict (Python < 3.7)

Key Points:

Examples with Different Data Types

Strings:

Mixed Types:

Handling Unhashable Types (e.g., Lists)

Performance Comparison

Summary

Leave a Reply Cancel reply

1. Using a Loop with a `seen` Set (All Python Versions)

2. Using `dict.fromkeys` (Python 3.7+)

3. Using `collections.OrderedDict` (Python < 3.7)