The examples below show how to get the row count of a pandas DataFrame in a range of scenarios, including edge cases, performance, and more advanced use cases:
1. Basic Row Count Methods
All methods return the total number of rows, including duplicates and NaN values.
Example 1: Simple DataFrame
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35]
})
print(len(df)) # 3 (most common)
print(df.shape[0]) # 3 (tuple: rows, columns)
print(len(df.index)) # 3 (index-based)
2. Empty DataFrame
Handling a DataFrame with zero rows:
empty_df = pd.DataFrame(columns=["A", "B"])
print(empty_df.shape[0]) # 0
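A related check is the `DataFrame.empty` attribute. The sketch below is illustrative only, and `no_cols_df` is a hypothetical frame added for contrast. It shows that `.empty` is True whenever either axis has length zero, so it is not a substitute for an explicit row count.

```python
import pandas as pd

# Zero rows, two columns
empty_df = pd.DataFrame(columns=["A", "B"])
# Three rows (taken from the index), zero columns (hypothetical example)
no_cols_df = pd.DataFrame(index=[0, 1, 2])

print(len(empty_df), empty_df.empty)      # 0 True
print(len(no_cols_df), no_cols_df.empty)  # 3 True
```

If the question is "are there any rows?", comparing `len(df)` with 0 is the unambiguous test.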
3. Filtered Rows
Count rows that meet a condition:
# Count rows where Age > 28
filtered_count = df[df["Age"] > 28].shape[0] # 2 (Bob:30, Charlie:35)
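When only the count is needed, the boolean mask can be summed directly instead of materialising the filtered DataFrame. This is a minimal sketch that reuses the sample data from Example 1; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
})

mask = df["Age"] > 28                 # boolean Series: False, True, True
count_via_sum = int(mask.sum())       # each True counts as 1
count_via_filter = df[mask].shape[0]  # same result, but builds a new DataFrame

print(count_via_sum, count_via_filter)  # 2 2
```

Both forms give the same answer; summing the mask avoids copying the matching rows.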
4. Counting Non-Null Rows
Exclude rows with NaN values:
import numpy as np
df_with_nan = pd.DataFrame({
    "A": [1, np.nan, 3],
    "B": [4, 5, np.nan]
})
# Count rows without ANY NaN values
valid_rows = df_with_nan.dropna().shape[0] # 1 (only first row)
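"Non-null rows" can mean different things depending on which columns matter. The sketch below uses the same sample data and shows three possible readings; which one applies is an assumption about your use case, not something the example above prescribes.

```python
import numpy as np
import pandas as pd

df_with_nan = pd.DataFrame({
    "A": [1, np.nan, 3],
    "B": [4, 5, np.nan],
})

# Rows with no NaN in any column (equivalent to dropna().shape[0])
complete_rows = int(df_with_nan.notna().all(axis=1).sum())
# Rows where column "A" is present, regardless of other columns
rows_with_a = len(df_with_nan.dropna(subset=["A"]))
# Rows that are not entirely NaN
not_all_nan = len(df_with_nan.dropna(how="all"))

print(complete_rows, rows_with_a, not_all_nan)  # 1 2 3
```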
5. Grouped Row Counts
Count rows per group:
# Use a separate name so the Name/Age `df` from Example 1 is not overwritten
grouped_df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Value": [10, 20, 30, 40]
})
# Group by 'Category' and count rows in each group
group_counts = grouped_df.groupby("Category").size()
# Output: A → 2, B → 2
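The difference between `size()` and `count()` on a groupby only shows up when values are missing. In the sketch below, the `sales` frame is hypothetical: it is the sample data with one `Value` replaced by NaN to make the difference visible.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the sample data with one missing Value
sales = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Value": [10, np.nan, 30, 40],
})

rows_per_group = sales.groupby("Category").size()                # counts rows
non_null_per_group = sales.groupby("Category")["Value"].count()  # skips NaN
via_value_counts = sales["Category"].value_counts()              # rows per category, no groupby

print(rows_per_group.to_dict())      # {'A': 2, 'B': 2}
print(non_null_per_group.to_dict())  # {'A': 2, 'B': 1}
```

For row counts per group, `size()` or `value_counts()` is the safer choice; `count()` answers a different question.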
6. Time-Series Data
Count rows within a date range:
date_df = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=5),
    "Sales": [100, 200, 150, 300, 250]
})
# Count rows between two dates
mask = (date_df["Date"] >= "2023-01-02") & (date_df["Date"] <= "2023-01-04")
date_filtered_count = date_df[mask].shape[0] # 3 rows
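The same date-range count can be written in two other ways, sketched here with the same sample data. Both `Series.between` (with its default settings) and label slicing on a `DatetimeIndex` include both endpoints; the variable names are illustrative.

```python
import pandas as pd

date_df = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=5),
    "Sales": [100, 200, 150, 300, 250],
})

# between() is inclusive on both ends by default
count_between = int(date_df["Date"].between("2023-01-02", "2023-01-04").sum())

# With the dates as the index, .loc label slicing is also inclusive
indexed = date_df.set_index("Date")
count_sliced = len(indexed.loc["2023-01-02":"2023-01-04"])

print(count_between, count_sliced)  # 3 3
```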
7. Performance Comparison
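All of these methods read the length of the index rather than scanning the data, so they are effectively constant-time. The benchmark below, on a large DataFrame, should show only negligible differences: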
import timeit
large_df = pd.DataFrame({"A": range(1_000_000)})
def test_len():
    return len(large_df)
def test_shape():
    return large_df.shape[0]
# Each result is the total time in seconds for 1,000 calls; exact values vary by machine
print("len(df):", timeit.timeit(test_len, number=1000))
print("df.shape[0]:", timeit.timeit(test_shape, number=1000))
8. Common Pitfalls
Using df.count()
This returns the number of non-null values in each column, not the row count:
print(df.count())
# Output (for the Name/Age DataFrame from Example 1):
# Name    3
# Age     3
# dtype: int64
Counting Unique Rows
Use nunique() to count the distinct values in a column; it does not give the row count:
print(df["Name"].nunique()) # 3 (unique names)
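To count distinct rows rather than distinct values in one column, one option is `drop_duplicates()` or `duplicated()`. The sketch below uses a hypothetical `people` frame with one fully repeated row so that the numbers differ.

```python
import pandas as pd

# Hypothetical data: the third row repeats the first
people = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice"],
    "Age": [25, 30, 25],
})

total_rows = len(people)
distinct_rows = len(people.drop_duplicates())   # rows after removing exact repeats
repeated_rows = int(people.duplicated().sum())  # rows that repeat an earlier row
distinct_names = people["Name"].nunique()       # distinct values in one column

print(total_rows, distinct_rows, repeated_rows, distinct_names)  # 3 2 1 2
```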
Summary
Use Case | Method |
---|---|
Basic row count | len(df), df.shape[0] |
Filtered rows | df[condition].shape[0] |
Non-null rows | df.dropna().shape[0] |
Grouped row counts | df.groupby(col).size() |
Time-series filtering | Boolean indexing + .shape[0] |
Choose the method that best fits your workflow!