How to extract the extension from a filename in Python?

To extract the file extension from a filename in Python, you can use the following methods, depending on whether you want the extension with or without the leading dot (.):

1. Using os.path.splitext (Classic Approach)

The os.path module provides a reliable way to split the filename into its base and extension:

import os

filename = "document.txt"
root, extension = os.path.splitext(filename)
print(extension)  # Output: .txt

Handles edge cases:

  • Filenames with no extension: "file" → ("file", "").
  • Hidden files (Unix-like): ".bashrc" → (".bashrc", "").
  • Multiple dots: "image.tar.gz" → ("image.tar", ".gz").
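
As a quick check of the edge cases listed above, the sketch below runs os.path.splitext over a few sample names. The names are illustrative only, and "folder.v2/notes" is an extra case (not from the list) added to show that a dot in a directory name is not mistaken for an extension.

```python
import os

# Sample names chosen for illustration; "folder.v2/notes" is an added case.
samples = ["file", ".bashrc", "image.tar.gz", "folder.v2/notes"]

# Map each name to the (root, extension) tuple returned by splitext.
results = {name: os.path.splitext(name) for name in samples}

for name, (root, ext) in results.items():
    print(f"{name!r}: root={root!r}, ext={ext!r}")
```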

2. Using pathlib.Path (Modern Approach, Python 3.4+)

The pathlib module offers an object-oriented approach:

from pathlib import Path

file_path = Path("photo.jpg")
extension = file_path.suffix  # Includes the leading dot
print(extension)  # Output: .jpg

# To get the extension without the dot:
extension_without_dot = file_path.suffix.lstrip(".")
print(extension_without_dot)  # Output: jpg

Handles multiple extensions:

path = Path("archive.tar.gz")
print(path.suffixes)  # Output: ['.tar', '.gz']
print(path.suffix)    # Output: .gz (last extension only)
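
If you need the whole compound extension (such as ".tar.gz") as one string, one option is to join the suffixes. The helper below is a minimal sketch; the name full_extension is hypothetical and not part of pathlib. Note that it treats every dot-separated segment after the first as part of the extension, which may be more than you want for names like "report.v2.final.pdf".

```python
from pathlib import Path


def full_extension(name: str) -> str:
    """Join all suffixes of a filename into one string (hypothetical helper)."""
    return "".join(Path(name).suffixes)


compound = full_extension("archive.tar.gz")     # every suffix joined together
no_ext = full_extension("README")               # no suffixes at all
greedy = full_extension("report.v2.final.pdf")  # version-like segments are included too

print(compound, repr(no_ext), greedy)
```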

3. Using String Manipulation (Manual Method)

For simple cases (not recommended for complex paths):

filename = "data.csv"
if "." in filename:
    extension = filename.split(".")[-1]  # "csv"
else:
    extension = ""

Limitations:

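  • Misreads hidden files: ".gitignore" yields "gitignore", whereas os.path.splitext and pathlib report no extension.
  • Breaks on paths where a directory name contains a dot and the file has no extension (e.g., "folder.v2/file" yields "v2/file").
  • Returns only the last segment for multiple dots (e.g., "file.tar.gz" → "gz"), with no equivalent of pathlib's suffixes for compound extensions.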
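
To make the pitfalls concrete, the sketch below wraps the manual approach in a function and compares it with os.path.splitext. The function name naive_extension and the sample inputs are assumptions made for illustration.

```python
import os


def naive_extension(filename: str) -> str:
    """The manual split-on-dot approach, wrapped for comparison (illustrative)."""
    if "." in filename:
        return filename.split(".")[-1]
    return ""


# Sample inputs chosen to expose the differences.
cases = [".gitignore", "folder.v2/notes", "file.tar.gz"]

# Pair each input with (manual result, splitext result).
comparison = {name: (naive_extension(name), os.path.splitext(name)[1]) for name in cases}

for name, (manual, stdlib) in comparison.items():
    print(f"{name!r}: manual={manual!r}, splitext={stdlib!r}")
```

The first two inputs show where the manual method reports an extension that the standard library does not; for the third, both approaches agree on the last segment and differ only in the leading dot.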

Key Notes

  • Leading Dot: Use .lstrip(".") to remove it (e.g., ".txt" → "txt").
  • No Extension: Check if the result is empty (e.g., "").
  • Edge Cases: Prefer os.path or pathlib for cross-platform compatibility.
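
These notes can be folded into one small helper. This is a sketch only: get_extension and its include_dot parameter are hypothetical names, and it is built on pathlib rather than being a standard-library function.

```python
from pathlib import Path


def get_extension(filename: str, include_dot: bool = True) -> str:
    """Return the last extension of filename, or "" if there is none (hypothetical helper)."""
    suffix = Path(filename).suffix
    if not include_dot:
        # Drop the leading dot, e.g. ".txt" becomes "txt".
        suffix = suffix.lstrip(".")
    return suffix


for name in ["report.docx", "README", "folder/archive.tar.gz"]:
    ext = get_extension(name, include_dot=False)
    # An empty string signals that the file has no extension.
    print(f"{name!r}: {ext!r}" if ext else f"{name!r}: (no extension)")
```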

Examples

Filename            os.path.splitext(name)[1]    pathlib.Path(name).suffix
"report.docx"       ".docx"                      ".docx"
"image.png"         ".png"                       ".png"
"README"            ""                           ""
"archive.tar.gz"    ".gz"                        ".gz"
".bashrc"           ""                           ""

Best Practice

Use pathlib (Python 3.4+) for clean, modern code, or os.path.splitext for compatibility with older Python versions. Avoid manual string splitting for complex paths.
