To download a file over HTTP in Python, you can use built-in libraries like `urllib` or third-party libraries like `requests`. Below are detailed methods with examples.
## Method 1: Using `urllib` (Standard Library)

The `urllib` module is part of Python's standard library and requires no additional installation.
### Example 1: Simple Download with `urllib.request.urlretrieve`

```python
from urllib.request import urlretrieve

url = "https://example.com/file.zip"
save_path = "downloaded_file.zip"

# Download the file
urlretrieve(url, save_path)
print(f"File saved to {save_path}")
```
### Example 2: Chunked Download for Large Files

For large files, use chunked reads to avoid loading the whole response into memory:

```python
from urllib.request import urlopen

url = "https://example.com/large_file.zip"
save_path = "large_file.zip"

# Open the URL and stream the response into a local file
with urlopen(url) as response:
    with open(save_path, "wb") as f:
        while True:
            chunk = response.read(8192)  # Read 8 KB at a time
            if not chunk:
                break
            f.write(chunk)

print("Download complete.")
```
## Method 2: Using `requests` (Third-Party Library)

The `requests` library simplifies HTTP interactions. Install it first:

```bash
pip install requests
```
### Example 1: Basic Download

```python
import requests

url = "https://example.com/image.jpg"
save_path = "image.jpg"

response = requests.get(url)
if response.status_code == 200:
    with open(save_path, "wb") as f:
        f.write(response.content)  # The whole body is held in memory
    print("File downloaded successfully.")
else:
    print(f"Failed to download: Status code {response.status_code}")
```
### Example 2: Download Large Files in Chunks

```python
import requests

url = "https://example.com/large_video.mp4"
save_path = "video.mp4"

response = requests.get(url, stream=True)
if response.status_code == 200:
    with open(save_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print("Large file downloaded.")
else:
    print(f"Error: {response.status_code}")
```
### Example 3: Download with a Progress Bar

Use `tqdm` to show download progress (install it with `pip install tqdm`):

```python
import requests
from tqdm import tqdm

url = "https://example.com/huge_file.iso"
save_path = "huge_file.iso"

response = requests.get(url, stream=True)
total_size = int(response.headers.get("content-length", 0))

with open(save_path, "wb") as f, tqdm(
    desc=save_path,
    total=total_size,
    unit="B",
    unit_scale=True,
    unit_divisor=1024,
) as progress_bar:
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)
        progress_bar.update(len(chunk))

print("Download with progress complete.")
```
## Method 3: Handle Errors and Headers

Add error handling and custom headers (e.g., a browser-like user agent):

```python
import requests

url = "https://example.com/protected_file.pdf"
save_path = "file.pdf"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
}

try:
    response = requests.get(url, headers=headers, stream=True, timeout=10)
    response.raise_for_status()  # Raise an error for 4xx/5xx status codes
    with open(save_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)
    print("File downloaded with headers.")
except requests.exceptions.HTTPError as e:
    print(f"HTTP error: {e}")
except requests.exceptions.ConnectionError:
    print("Connection failed.")
except requests.exceptions.Timeout:
    print("Request timed out.")
```
## Key Considerations

- **Memory Efficiency**: Use `stream=True` in `requests` or chunked reads in `urllib` for large files.
- **Error Handling**: Check HTTP status codes and handle exceptions.
- **User-Agent**: Some servers block Python's default user agent; mimic a browser with custom headers.
- **Progress Tracking**: Use `tqdm` for visual feedback on large downloads.
- **SSL Verification**: You can disable verification with `verify=False` in `requests` for problematic HTTPS sites, but this is insecure and not recommended; a safer alternative is sketched after this list.
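Instead of `verify=False`, you can point `requests` at the certificate bundle you actually trust, so verification stays on. A sketch; both the URL and the bundle path are placeholders:

```python
import requests

# Verify TLS against a specific CA bundle rather than disabling checks.
# The URL and the bundle path below are placeholders for your own values.
response = requests.get(
    "https://internal.example.com/file.zip",
    verify="/path/to/ca_bundle.pem",
    timeout=10,
)
response.raise_for_status()
```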
## Summary

- **For simplicity**: Use `urllib.request.urlretrieve`.
- **For advanced needs**: Use `requests` with chunking and error handling.
- **For large files**: Always stream downloads to avoid memory overload.