How to download a file over HTTP in Python?

To download a file over HTTP in Python, you can use built-in libraries like urllib or third-party libraries like requests. Below are detailed methods with examples:

Method 1: Using urllib (Standard Library)

The urllib module is part of Python’s standard library and requires no additional installation.

Example 1: Simple Download with urllib.request.urlretrieve

from urllib.request import urlretrieve

url = "https://example.com/file.zip"
save_path = "downloaded_file.zip"

# Download the file
urlretrieve(url, save_path)
print(f"File saved to {save_path}")

Example 2: Chunked Download for Large Files

For large files, read the response in chunks so the whole file never has to fit in memory:

from urllib.request import urlopen

url = "https://example.com/large_file.zip"
save_path = "large_file.zip"

# Open the URL and create a local file
with urlopen(url) as response:
    with open(save_path, "wb") as f:
        while True:
            chunk = response.read(8192)  # Read 8KB at a time
            if not chunk:
                break
            f.write(chunk)
print("Download complete.")
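If you want progress reporting with only the standard library, urlretrieve accepts a reporthook callback that is invoked after each block is transferred. The sketch below copies a temporary local file through a file:// URL purely so it runs without network access; in real use you would pass an http(s):// URL instead:

```python
import os
import tempfile
from urllib.request import urlretrieve

# Create a throwaway 64 KB source file so the demo is self-contained;
# replace `url` with a real http(s):// URL in practice.
src = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
src.write(os.urandom(64 * 1024))
src.close()

url = "file://" + src.name
save_path = src.name + ".copy"

def report(block_num, block_size, total_size):
    """Called by urlretrieve after each block; total_size is -1 if unknown."""
    downloaded = block_num * block_size
    if total_size > 0:
        percent = min(100.0, downloaded * 100 / total_size)
        print(f"\rDownloaded {percent:.1f}%", end="")
    else:
        print(f"\rDownloaded {downloaded} bytes", end="")

urlretrieve(url, save_path, reporthook=report)
print("\nDownload complete.")
```

The same reporthook works unchanged for HTTP URLs, since urlretrieve reads the Content-Length header to supply total_size.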

Method 2: Using requests (Third-Party Library)

The requests library simplifies HTTP interactions. Install it first:

pip install requests

Example 1: Basic Download

import requests

url = "https://example.com/image.jpg"
save_path = "image.jpg"

response = requests.get(url)  # response.content holds the full body in memory
if response.status_code == 200:
    with open(save_path, "wb") as f:
        f.write(response.content)
    print("File downloaded successfully.")
else:
    print(f"Failed to download: Status code {response.status_code}")

Example 2: Download Large Files in Chunks

import requests

url = "https://example.com/large_video.mp4"
save_path = "video.mp4"

response = requests.get(url, stream=True)
if response.status_code == 200:
    with open(save_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print("Large file downloaded.")
else:
    print(f"Error: {response.status_code}")

Example 3: Download with Progress Bar

Use tqdm to show download progress (install with pip install tqdm):

import requests
from tqdm import tqdm

url = "https://example.com/huge_file.iso"
save_path = "huge_file.iso"

response = requests.get(url, stream=True)
total_size = int(response.headers.get("content-length", 0))

with open(save_path, "wb") as f, tqdm(
    desc=save_path,
    total=total_size,
    unit="B",
    unit_scale=True,
    unit_divisor=1024,
) as progress_bar:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
        progress_bar.update(len(chunk))
print("Download with progress complete.")

Method 3: Handle Errors and Headers

Add error handling and custom headers (e.g., a browser-like User-Agent):

import requests

url = "https://example.com/protected_file.pdf"
save_path = "file.pdf"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

try:
    response = requests.get(url, headers=headers, stream=True, timeout=10)
    response.raise_for_status()  # Raise error for bad status codes (4xx/5xx)
    with open(save_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print("File downloaded with headers.")
except requests.exceptions.HTTPError as e:
    print(f"HTTP error: {e}")
except requests.exceptions.ConnectionError:
    print("Connection failed.")
except requests.exceptions.Timeout:
    print("Request timed out.")

Key Considerations

  1. Memory Efficiency: Use stream=True in requests or chunked downloads in urllib for large files.
  2. Error Handling: Check HTTP status codes and handle exceptions.
  3. User-Agent: Some servers block default Python user agents. Mimic a browser with headers.
  4. Progress Tracking: Use tqdm for visual feedback on large downloads.
  5. SSL Verification: Passing verify=False to requests skips TLS certificate validation for sites with broken HTTPS. Avoid it in production, since it removes protection against man-in-the-middle attacks.
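These considerations can be combined into one small helper. In the sketch below, the function name, the User-Agent string, and the verify parameter's use are illustrative, and the throwaway local HTTP server exists only so the example runs without internet access:

```python
import http.server
import os
import tempfile
import threading
from functools import partial

import requests

def download(url, save_path, chunk_size=8192, timeout=10, verify=True):
    """Stream a URL to disk with a browser-like User-Agent and error handling."""
    headers = {"User-Agent": "Mozilla/5.0 (compatible; example-downloader)"}
    # Pass verify=False only as a last resort for broken HTTPS.
    with requests.get(url, headers=headers, stream=True,
                      timeout=timeout, verify=verify) as r:
        r.raise_for_status()  # surfaces 4xx/5xx as HTTPError
        with open(save_path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
    return save_path

# Demo: serve a temporary directory over HTTP on a random free port.
serve_dir = tempfile.mkdtemp()
with open(os.path.join(serve_dir, "sample.bin"), "wb") as f:
    f.write(os.urandom(32 * 1024))

handler = partial(http.server.SimpleHTTPRequestHandler, directory=serve_dir)
server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
dest = os.path.join(serve_dir, "downloaded.bin")
download(f"http://127.0.0.1:{port}/sample.bin", dest)
server.shutdown()
print("Helper download complete.")
```

In real use you would call download() with a public URL and your own save path; the streaming loop keeps memory usage constant regardless of file size.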

Summary

  • For simplicity: Use urllib.request.urlretrieve (note that Python's documentation marks it as a legacy interface that might become deprecated).
  • For advanced needs: Use requests with chunking and error handling.
  • For large files: Always stream downloads to avoid memory overload.
