How to Remove Duplicates from a List in Python

To remove duplicates from a list in Python, optionally preserving order, use one of the following methods:

1. Using a Set (Order Not Preserved)

Convert the list to a set to remove duplicates, then back to a list.
Note: This method does not preserve the original order.

original_list = [2, 3, 2, 1, 5, 4, 4]
unique_list = list(set(original_list))
print(unique_list)  # Output (order varies): [1, 2, 3, 4, 5]
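If a sorted result is acceptable (rather than the original order), wrapping the set in sorted() gives a deterministic output; note that sorted() already returns a list:

```python
original_list = [2, 3, 2, 1, 5, 4, 4]
# sorted() accepts any iterable and returns a new list,
# so no extra list() conversion is needed
unique_sorted = sorted(set(original_list))
print(unique_sorted)  # Output: [1, 2, 3, 4, 5]
```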

2. Using a Loop and a seen Set (Order Preserved)

Iterate through the list and append elements to a new list only if they haven’t been seen before.
Preserves the order of first occurrence.

original_list = [2, 3, 2, 1, 5, 4, 4]
seen = set()
unique_list = []
for item in original_list:
    if item not in seen:
        seen.add(item)
        unique_list.append(item)
print(unique_list)  # Output: [2, 3, 1, 5, 4]

3. Using dict.fromkeys() (Order Preserved in Python 3.7+)

Convert the list to a dictionary (keys are unique), then back to a list.
Preserves order in Python 3.7+:

original_list = [2, 3, 2, 1, 5, 4, 4]
unique_list = list(dict.fromkeys(original_list))
print(unique_list)  # Output: [2, 3, 1, 5, 4]

4. Using OrderedDict (Order Preserved in Older Python Versions)

For Python <3.7, use collections.OrderedDict:

from collections import OrderedDict
original_list = [2, 3, 2, 1, 5, 4, 4]
unique_list = list(OrderedDict.fromkeys(original_list))
print(unique_list)  # Output: [2, 3, 1, 5, 4]

5. Using List Comprehension (Order Preserved)

Combine a seen set with a list comprehension for brevity. This relies on seen.add() returning None, so the or expression both records the element and evaluates falsy:

original_list = [2, 3, 2, 1, 5, 4, 4]
seen = set()
unique_list = [x for x in original_list if not (x in seen or seen.add(x))]
print(unique_list)  # Output: [2, 3, 1, 5, 4]

6. Using itertools.groupby (Sorted List Only)

Remove consecutive duplicates after sorting the list:

from itertools import groupby
original_list = [2, 3, 2, 1, 5, 4, 4]
sorted_list = sorted(original_list)  # Sort first
unique_list = [k for k, _ in groupby(sorted_list)]
print(unique_list)  # Output: [1, 2, 3, 4, 5] (order changed)
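As a side note, groupby can also be used without sorting; in that case it only merges runs of equal adjacent elements, which collapses consecutive duplicates while leaving the rest of the original order intact:

```python
from itertools import groupby

original_list = [2, 3, 2, 1, 5, 4, 4]
# Without sorting, groupby only merges adjacent equal elements,
# so the non-adjacent duplicate 2 survives
collapsed = [k for k, _ in groupby(original_list)]
print(collapsed)  # Output: [2, 3, 2, 1, 5, 4]
```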

7. Handling Unhashable Elements (e.g., Lists of Lists)

For lists containing unhashable elements (like nested lists), convert elements to hashable types first:

original_list = [[1, 2], [3], [1, 2], [4, 5]]
seen = set()
unique_list = []
for sublist in original_list:
    # Convert list to tuple (hashable)
    tuple_sublist = tuple(sublist)
    if tuple_sublist not in seen:
        seen.add(tuple_sublist)
        unique_list.append(sublist)
print(unique_list)  # Output: [[1, 2], [3], [4, 5]]

Summary Table

Method                    | Order Preserved?   | Time Complexity | Use Case
--------------------------|--------------------|-----------------|--------------------------------------
set()                     | ❌ No              | O(n)            | Quick deduplication, order irrelevant
Loop + seen set           | ✅ Yes             | O(n)            | Order matters, general use
dict.fromkeys()           | ✅ Yes (3.7+)      | O(n)            | Concise, modern Python
OrderedDict               | ✅ Yes             | O(n)            | Python <3.7 compatibility
List comprehension + seen | ✅ Yes             | O(n)            | Compact code
itertools.groupby         | ❌ No (sorted)     | O(n log n)      | Sorted lists, consecutive duplicates

Key Takeaways

  • Preserve Order: Use dict.fromkeys() (Python 3.7+), loop with seen, or OrderedDict.
  • Speed: The set() method is fastest but doesn’t preserve order.
  • Unhashable Elements: Convert elements to hashable types (e.g., tuples) first.
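To check the speed claim on your own data, a rough timeit sketch (absolute numbers will vary by machine, Python version, and list contents):

```python
import timeit

# Sample data: 10,000 items with many duplicates
data = list(range(1000)) * 10

def dedupe_set(lst):
    # Fastest, but order is not preserved
    return list(set(lst))

def dedupe_loop(lst):
    # Preserves order of first occurrence
    seen, out = set(), []
    for x in lst:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

for fn in (dedupe_set, dedupe_loop):
    t = timeit.timeit(lambda: fn(data), number=200)
    print(f"{fn.__name__}: {t:.4f}s")
```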

By choosing the right method, you can efficiently remove duplicates while meeting your specific needs!
