How do I remove duplicates from a list, while preserving order in Python?

To remove duplicates from a list while preserving the original order in Python, you can use the following methods:

1. Using a Loop with a seen Set (All Python Versions)

Iterate through the list, track seen elements with a set, and build a new list with first occurrences.

def remove_duplicates_preserve_order(lst):
    seen = set()
    result = []
    for item in lst:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result

# Example:
original_list = [3, 2, 1, 2, 4]
print(remove_duplicates_preserve_order(original_list))  # Output: [3, 2, 1, 4]

Key Points:

  • Works in all Python versions.
  • Time Complexity: O(n) (efficient due to set lookups).
  • Preserves the order of the first occurrence of each element.

2. Using dict.fromkeys (Python 3.7+)

Leverage the fact that dictionaries in Python 3.7+ preserve insertion order. Convert the list to a dictionary (keys remove duplicates) and back to a list.

original_list = [3, 2, 1, 2, 4]
deduped_list = list(dict.fromkeys(original_list))
print(deduped_list)  # Output: [3, 2, 1, 4]

Key Points:

  • Concise and readable for Python 3.7+.
  • Time Complexity: O(n).
  • Same result as the loop method.

3. Using collections.OrderedDict (Python < 3.7)

For compatibility with older Python versions, use OrderedDict to maintain order.

from collections import OrderedDict

original_list = [3, 2, 1, 2, 4]
deduped_list = list(OrderedDict.fromkeys(original_list))
print(deduped_list)  # Output: [3, 2, 1, 4]

Key Points:

  • Works in Python 2.7+ and 3.x.
  • Same logic as dict.fromkeys but uses OrderedDict for older versions.

Examples with Different Data Types

Strings:

words = ["apple", "banana", "apple", "orange", "banana"]
print(list(dict.fromkeys(words)))  # Output: ['apple', 'banana', 'orange']

Mixed Types:

mixed = [1, "a", 1, 2, "a", (3, 4)]
print(list(dict.fromkeys(mixed)))  # Output: [1, 'a', 2, (3, 4)]

Handling Unhashable Types (e.g., Lists)

If your list contains unhashable elements (e.g., nested lists), convert them to a hashable type (e.g., tuples) first:

original = [[1, 2], [3, 4], [1, 2]]
# Convert inner lists to tuples
deduped = [list(t) for t in dict.fromkeys(tuple(item) for item in original)]
print(deduped)  # Output: [[1, 2], [3, 4]]

Performance Comparison

MethodTime ComplexitySpace ComplexityPython Version
Loop with seen setO(n)O(n)All versions
dict.fromkeysO(n)O(n)3.7+
OrderedDict.fromkeysO(n)O(n)2.7+, 3.x

Summary

  • Python 3.7+: Use list(dict.fromkeys(your_list)) for simplicity.
  • Older Versions: Use OrderedDict or the loop-with-seen-set method.
  • Unhashable Elements: Convert to hashable types (e.g., tuples) first.

By choosing the appropriate method, you can efficiently deduplicate a list while preserving its order.

Leave a Reply

Your email address will not be published. Required fields are marked *