How to Split a String with Multiple Delimiters in Python

In Python, the most flexible way to split a string on multiple delimiters is the re.split() function from the re (regular expressions) module. For simpler cases, str.replace() and re.findall() also work. Each method is shown below with examples:

1. Using re.split() (Recommended for Multiple Delimiters)

The re.split() method allows you to split a string using a regex pattern that matches all desired delimiters.

Example 1: Split on Spaces, Commas, and Semicolons

import re

text = "apple, banana; cherry  date"
delimiters = r"[ ,;]+"  # Matches one or more occurrences of space, comma, or semicolon

result = re.split(delimiters, text)
print(result)  # Output: ['apple', 'banana', 'cherry', 'date']
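The pattern above treats every space as a delimiter, so an item such as "red apple" would be broken in two. As a minimal sketch, assuming items may contain internal spaces and only commas and semicolons are real separators (the sample text is made up for illustration), optional whitespace can be absorbed around each delimiter instead:

```python
import re

text = "red apple, green banana ;cherry"
# \s* on both sides swallows any spaces next to a comma or semicolon,
# while spaces inside an item are left alone
result = re.split(r"\s*[,;]\s*", text)
print(result)  # ['red apple', 'green banana', 'cherry']
```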

Example 2: Split on Mixed Delimiters (e.g., |, -, /)

text = "cat|dog-bird/fish"
delimiters = r"[|/-]+"  # Split on |, -, or /
result = re.split(delimiters, text)
print(result)  # Output: ['cat', 'dog', 'bird', 'fish']
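Characters such as | and - have special meaning in regular expressions, so hand-written patterns are easy to get wrong. When the delimiters arrive as a list, one possible approach is to escape each one with re.escape() and join them with |. The helper name split_on_any below is hypothetical, chosen only for this sketch:

```python
import re

def split_on_any(text, delimiters):
    """Split text on any of the given literal delimiters (illustrative helper)."""
    if not delimiters:
        raise ValueError("at least one delimiter is required")
    # Try longer delimiters first so that "||" is preferred over "|"
    ordered = sorted(delimiters, key=len, reverse=True)
    # re.escape() makes regex metacharacters match literally
    pattern = "|".join(re.escape(d) for d in ordered)
    return re.split(pattern, text)

print(split_on_any("cat|dog-bird/fish", ["|", "-", "/"]))
# ['cat', 'dog', 'bird', 'fish']
```

Unlike the character-class pattern, this version also accepts multi-character delimiters, because each one is a separate alternative.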

Example 3: Ignore Empty Strings (Consecutive Delimiters)

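Without the + quantifier, every delimiter is split on separately, so consecutive delimiters produce empty strings. Use a list comprehension to filter them out:

text = "apple,,banana;;;cherry"
delimiters = r"[,;]"  # No +, so ",," yields an empty string between the commas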
result = [s for s in re.split(delimiters, text) if s]
print(result)  # Output: ['apple', 'banana', 'cherry']

2. Using str.replace() for Simple Cases

For simple cases, replace every delimiter with a single common one before splitting:

text = "apple, banana; cherry"
# Replace commas and semicolons with spaces
text = text.replace(",", " ").replace(";", " ")
result = text.split()
print(result)  # Output: ['apple', 'banana', 'cherry']

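Limitation: each delimiter needs its own replace() call, and each call scans the string again, so this becomes inefficient with many delimiters. It also handles only literal substrings, not patterns.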
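When all delimiters are single characters, the chain of replace() calls can be collapsed into one pass with str.translate(). This is a sketch of that alternative, not part of the method above; the extra | delimiter and the sample text are assumptions for illustration:

```python
text = "apple, banana; cherry|date"
# Build a translation table that maps each delimiter character to a space
table = str.maketrans({d: " " for d in ",;|"})
# translate() rewrites the string once; split() then splits on whitespace runs
result = text.translate(table).split()
print(result)  # ['apple', 'banana', 'cherry', 'date']
```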

3. Using re.findall() to Extract Words

Instead of splitting, extract non-delimiter segments:

import re

text = "apple, banana; cherry"
pattern = r"[^ ,;]+"  # Match sequences that are NOT delimiters
result = re.findall(pattern, text)
print(result)  # Output: ['apple', 'banana', 'cherry']
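One practical difference from re.split() is that re.findall() never returns empty strings, even when the text starts or ends with a delimiter. The comparison below uses a made-up input to show this:

```python
import re

text = ", apple, banana; cherry;"
split_result = re.split(r"[ ,;]+", text)
found_result = re.findall(r"[^ ,;]+", text)
print(split_result)  # ['', 'apple', 'banana', 'cherry', '']
print(found_result)  # ['apple', 'banana', 'cherry']
```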

4. Split on Words as Delimiters

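Use the regex alternation operator | to split on multi-character delimiters such as the words "and" or "or". The \b word boundaries keep the pattern from matching inside longer words such as "band". Note that the spaces around each delimiter stay in the results: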

text = "apple and banana or cherry"
delimiters = r"\b(?:and|or)\b"  # Split on "and" or "or"
result = re.split(delimiters, text)
print(result)  # Output: ['apple ', ' banana ', ' cherry']
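If the leftover spaces are unwanted, one option is to make the surrounding whitespace part of the delimiter. This is a minimal sketch, assuming the words are always separated from their neighbours by at least one whitespace character:

```python
import re

text = "apple and banana or cherry"
# \s+ on each side consumes the spaces and also acts as a word boundary
result = re.split(r"\s+(?:and|or)\s+", text)
print(result)  # ['apple', 'banana', 'cherry']
```

Calling strip() on each piece of the original result would work as well.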

Key Notes

  • Regex Patterns:
    • r"[ ,;]+": Split on one or more spaces, commas, or semicolons.
    • r"\b(?:and|or)\b": Split on the whole words "and" or "or".
    • r"[^a-z]+": Split on any run of characters other than lowercase a-z. Uppercase letters count as delimiters too; use r"[^a-zA-Z]+" to keep them.
  • Handling Case Sensitivity:
    Use re.IGNORECASE to split case-insensitively:
    text = "Apple AND banana OR Cherry"
    delimiters = r"\b(?:and|or)\b"
    result = re.split(delimiters, text, flags=re.IGNORECASE)
    print(result)  # Output: ['Apple ', ' banana ', ' Cherry']
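The case sensitivity of a character class is easy to overlook. The short comparison below, using a made-up input, shows how r"[^a-z]+" consumes uppercase letters as delimiters while r"[^a-zA-Z]+" keeps them:

```python
import re

text = "Apple1Banana2cherry"
lower_only = re.split(r"[^a-z]+", text)
any_case = re.split(r"[^a-zA-Z]+", text)
print(lower_only)  # ['', 'pple', 'anana', 'cherry']
print(any_case)    # ['Apple', 'Banana', 'cherry']
```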

Edge Cases

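  1. Leading/Trailing Delimiters:
     A delimiter at the start or end of the string produces an empty string at that end, so filter the result:
     text = ",,apple,banana;"
     delimiters = r"[,;]+"
     result = [s for s in re.split(delimiters, text) if s]
     print(result)  # Output: ['apple', 'banana']
  2. Adjacent Mixed Delimiters:
     Use a character class with + (e.g., r"[,;]+") so that a run such as ";," is treated as one delimiter. Without the +, it counts as two delimiters with an empty string between them.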
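The effect of the + quantifier on adjacent delimiters can be checked directly. This is a small illustration with a made-up input:

```python
import re

text = "apple;,banana"
one_at_a_time = re.split(r"[,;]", text)
as_a_run = re.split(r"[,;]+", text)
print(one_at_a_time)  # ['apple', '', 'banana']
print(as_a_run)       # ['apple', 'banana']
```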

Comparison of Methods

Method        | Pros                            | Cons
re.split()    | Handles complex patterns        | Requires regex knowledge
str.replace() | Simple for basic cases          | Inefficient for many delimiters
re.findall()  | Extracts non-delimiter segments | Less intuitive for splitting

Summary

  • Use re.split() for splitting with regex patterns (most flexible).
  • Filter empty strings with a list comprehension if needed.
  • Avoid str.split() on its own for multiple delimiters: it accepts only one separator string (or, with no argument, splits on runs of whitespace).
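To see why str.split() alone falls short, consider what it does with a mixed-delimiter string. This is a brief illustration using the same sample text as earlier sections:

```python
text = "apple, banana; cherry"
by_comma = text.split(", ")   # only the one separator string is recognized
by_whitespace = text.split()  # punctuation stays attached to the words
print(by_comma)       # ['apple', 'banana; cherry']
print(by_whitespace)  # ['apple,', 'banana;', 'cherry']
```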
