How can I iterate over files in a given directory in Python?

To iterate over files in a directory in Python, you can use the os module, glob module, or the modern pathlib library. Below is a detailed guide with examples for each approach.

1. Using os Module

The os module provides low-level directory and file operations.

Method 1: os.listdir()

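Returns the names (not full paths) of all entries, both files and directories, in the specified directory, so each name must be joined with the directory before it can be checked.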
Example:

import os

directory = "/path/to/directory"

# List all entries in the directory
entries = os.listdir(directory)

# Iterate and filter files
for entry in entries:
    full_path = os.path.join(directory, entry)  # Create full path
    if os.path.isfile(full_path):  # Check if it's a file
        print(f"File: {entry}")

Method 2: os.scandir() (Python 3.5+)

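Returns an iterator of DirEntry objects that cache file-type information from the directory scan, so checks such as is_file() usually avoid an extra system call. This makes it more efficient than os.listdir() when you need to tell files from directories.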
Example:

import os

directory = "/path/to/directory"

with os.scandir(directory) as entries:
    for entry in entries:
        if entry.is_file():  # Directly check if it's a file
            print(f"File: {entry.name}")
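Because each DirEntry can also supply stat information, the same loop can collect file sizes without building paths by hand. The following is a minimal sketch; the helper name file_sizes is hypothetical and not part of the os module.

```python
import os

def file_sizes(directory):
    """Return sorted (name, size_in_bytes) pairs for the files in directory."""
    results = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                # entry.stat() returns an os.stat_result for this entry
                results.append((entry.name, entry.stat().st_size))
    return sorted(results)

# Example call (the path is a placeholder):
# for name, size in file_sizes("/path/to/directory"):
#     print(f"{name}: {size} bytes")
```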

Method 3: os.walk() (Recursive)

Iterates through all files in a directory and its subdirectories.
Example:

import os

root_dir = "/path/to/directory"

# Walk through all subdirectories and files
for root, dirs, files in os.walk(root_dir):
    for file in files:
        full_path = os.path.join(root, file)
        print(f"File: {full_path}")
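With the default top-down traversal, os.walk() lets you edit the dirs list in place to control which subdirectories it descends into. The sketch below uses that to skip hidden directories and files; the function name walk_visible_files and the "starts with a dot" rule are assumptions chosen for illustration.

```python
import os

def walk_visible_files(root_dir):
    """Yield paths of files under root_dir, skipping hidden directories and files."""
    for root, dirs, files in os.walk(root_dir):
        # Slice assignment edits the list in place, so os.walk()
        # will not descend into the directories removed here.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        for name in files:
            if not name.startswith("."):
                yield os.path.join(root, name)

# Example call (the path is a placeholder):
# for path in walk_visible_files("/path/to/directory"):
#     print(path)
```

Rebinding the name (dirs = [...]) would have no effect on the traversal; only in-place changes are seen by os.walk().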

2. Using glob Module

The glob module supports Unix-style path pattern matching (e.g., *.txt).

Example 1: Non-Recursive Search

import glob

directory = "/path/to/directory"

# Find all .txt files in the directory
txt_files = glob.glob(f"{directory}/*.txt")

for file_path in txt_files:
    print(f"Text File: {file_path}")

Example 2: Recursive Search

import glob

directory = "/path/to/directory"

# Search for .txt files in the directory and all its subdirectories (** is recursive)
all_txt_files = glob.glob(f"{directory}/**/*.txt", recursive=True)

for file_path in all_txt_files:
    print(f"Text File: {file_path}")
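glob.glob() builds the full list of matches before returning. For large trees, glob.iglob() yields matches one at a time instead. Here is a small sketch that wraps it; the helper name iter_matching and its parameters are hypothetical, and glob.escape() is used so that special characters in the directory name are not treated as wildcards.

```python
import glob
import os

def iter_matching(directory, pattern="*.txt", recursive=False):
    """Lazily yield paths under directory whose names match pattern."""
    # Escape [, ], * and ? in the directory part so only `pattern` acts as a wildcard
    base = glob.escape(directory)
    if recursive:
        full_pattern = os.path.join(base, "**", pattern)
    else:
        full_pattern = os.path.join(base, pattern)
    yield from glob.iglob(full_pattern, recursive=recursive)

# Example call (the path is a placeholder):
# for path in iter_matching("/path/to/directory", "*.txt", recursive=True):
#     print(path)
```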

3. Using pathlib (Python 3.4+)

The pathlib library provides an object-oriented approach for path manipulation.

Example 1: List Files in a Directory

from pathlib import Path

directory = Path("/path/to/directory")

# Iterate over files
for file in directory.iterdir():
    if file.is_file():
        print(f"File: {file.name}")

Example 2: Glob-Style Search

from pathlib import Path

directory = Path("/path/to/directory")

# Find all .csv files (non-recursive)
csv_files = directory.glob("*.csv")

for file in csv_files:
    print(f"CSV File: {file.name}")

# Recursive search for .csv files
all_csv_files = directory.rglob("*.csv")

for file in all_csv_files:
    print(f"CSV File: {file}")
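A single glob pattern covers one extension at a time. When several extensions are of interest, one option is to iterate once and filter on Path.suffix. This is only a sketch; the function name files_with_suffixes and the default extensions are assumptions.

```python
from pathlib import Path

def files_with_suffixes(directory, suffixes=(".csv", ".tsv"), recursive=False):
    """Return sorted Path objects for files whose extension is in suffixes."""
    wanted = {s.lower() for s in suffixes}  # compare extensions case-insensitively
    base = Path(directory)
    candidates = base.rglob("*") if recursive else base.iterdir()
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() in wanted)

# Example call (the path is a placeholder):
# for path in files_with_suffixes("/path/to/directory", recursive=True):
#     print(path)
```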

Key Considerations

  1. Absolute vs. Relative Paths:
  • Use os.path.abspath() or Path.resolve() to get absolute paths.
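  2. Filtering Hidden Files:
  • Skip hidden files (e.g., names starting with . on Unix):
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file() and not entry.name.startswith('.'):
                print(entry.name)
  3. Sorting Files:
  • os.listdir() returns entries in arbitrary order, so sort them if order matters:
    sorted_files = sorted(os.listdir(directory))
  4. Performance:
  • os.scandir() is usually faster than os.listdir() followed by os.path.isfile() on large directories, because the file type is often available without an extra system call per entry. pathlib is mainly a convenience layer and is generally not faster than the os functions.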
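The considerations above can be combined in one small helper. The sketch below returns absolute paths of non-hidden files in a stable, case-insensitive order; the name visible_files is hypothetical, and "hidden" is assumed to mean a name starting with a dot.

```python
from pathlib import Path

def visible_files(directory):
    """Return absolute paths of non-hidden files in directory, sorted by name."""
    base = Path(directory).resolve()  # resolve() makes the path absolute
    files = [
        p for p in base.iterdir()
        if p.is_file() and not p.name.startswith(".")
    ]
    # Sort on the lowercased name so "A.txt" and "a.txt" end up next to each other
    return sorted(files, key=lambda p: p.name.lower())

# Example call (the path is a placeholder):
# for path in visible_files("/path/to/directory"):
#     print(path)
```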

Full Example: Process Files with Metadata

import os

def process_directory(directory):
    # The with block closes the scandir iterator as soon as the loop ends
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()  # One stat result supplies both size and mtime
                print(f"File Name: {entry.name}")
                print(f"Size: {info.st_size} bytes")
                print(f"Modified: {info.st_mtime}")  # Seconds since the epoch
                print()

process_directory("/path/to/directory")
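The st_mtime value printed above is a raw timestamp in seconds since the epoch. To make it readable, it can be converted with datetime.fromtimestamp(). The variant below is a sketch that returns formatted lines instead of printing them; the name describe_files and the date format are assumptions.

```python
import os
from datetime import datetime

def describe_files(directory):
    """Return one summary line per file: name, size, and modification time."""
    lines = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                info = entry.stat()
                # Convert the epoch timestamp to a local-time datetime
                modified = datetime.fromtimestamp(info.st_mtime)
                lines.append(
                    f"{entry.name}: {info.st_size} bytes, "
                    f"modified {modified:%Y-%m-%d %H:%M:%S}"
                )
    return sorted(lines)

# Example call (the path is a placeholder):
# print("\n".join(describe_files("/path/to/directory")))
```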

Summary

  • Use os.scandir() for efficient iteration (Python 3.5+).
  • Use glob for pattern-based searches (e.g., *.txt).
  • Use pathlib for an object-oriented approach (Python 3.4+).
  • Use os.walk() for recursive traversal of directories.

Choose the method that best fits your use case and Python version!
