How do I get the row count of a Pandas DataFrame?

The row count of a Pandas DataFrame is available from len(df), df.shape[0], or len(df.index). The examples below cover these basic methods along with empty, filtered, null-containing, grouped, and time-series data, plus performance and common pitfalls.

1. Basic Row Count Methods

All three methods below return the total number of rows, including duplicate rows and rows that contain NaN values.

Example 1: Simple DataFrame

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35]
})

print(len(df))           # 3 (most common)
print(df.shape[0])       # 3 (tuple: rows, columns)
print(len(df.index))     # 3 (index-based)

2. Empty DataFrame

Handling a DataFrame with zero rows:

empty_df = pd.DataFrame(columns=["A", "B"])
print(empty_df.shape[0])  # 0
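As a small sketch of a related check, the `empty` attribute reports whether a DataFrame has no elements at all. The frame `no_cols` is a hypothetical example, added here only to show that "empty" and "zero rows" are not quite the same thing:

```python
import pandas as pd

# Column labels but no rows
empty_df = pd.DataFrame(columns=["A", "B"])

row_count = len(empty_df)   # 0
is_empty = empty_df.empty   # True, because the row axis has length 0

# Hypothetical frame with three index labels but no columns:
# .empty is also True here, even though the row count is 3
no_cols = pd.DataFrame(index=[0, 1, 2])
print(len(no_cols), no_cols.empty)
```

If the question is strictly "are there zero rows?", comparing len(df) to 0 is the more direct test.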

3. Filtered Rows

Count rows that meet a condition:

# Count rows where Age > 28
filtered_count = df[df["Age"] > 28].shape[0]  # 2 (Bob:30, Charlie:35)
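An alternative sketch: because a boolean mask treats True as 1, summing the mask gives the same count without building the filtered DataFrame first. The variable names here are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35]
})

# Boolean Series: True where the condition holds
mask = df["Age"] > 28

count_via_sum = int(mask.sum())        # sums True values as 1
count_via_filter = df[mask].shape[0]   # builds the filtered frame, then counts

print(count_via_sum, count_via_filter)  # 2 2
```

On large frames the mask-sum form may be somewhat cheaper, since it skips copying the matching rows.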

4. Counting Non-Null Rows

dropna() removes every row that contains at least one NaN, so the remaining count is the number of complete rows:

import numpy as np

df_with_nan = pd.DataFrame({
    "A": [1, np.nan, 3],
    "B": [4, 5, np.nan]
})

# Count rows without ANY NaN values
valid_rows = df_with_nan.dropna().shape[0]  # 1 (only first row)
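Sometimes only certain columns matter. The following sketch, using the same sample data, shows a few variations on the non-null count; which one is appropriate depends on what "non-null row" means for your data:

```python
import numpy as np
import pandas as pd

df_with_nan = pd.DataFrame({
    "A": [1, np.nan, 3],
    "B": [4, 5, np.nan]
})

# Rows where column A is not null
rows_with_a = int(df_with_nan["A"].notna().sum())        # 2

# Rows where column B is not null, via dropna(subset=...)
rows_with_b = df_with_nan.dropna(subset=["B"]).shape[0]  # 2

# Rows with at least one non-null value (drops only all-NaN rows)
rows_any_value = df_with_nan.dropna(how="all").shape[0]  # 3
```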

5. Grouped Row Counts

Count rows per group:

# Use a separate name so the df from Example 1 stays available in later sections
group_df = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Value": [10, 20, 30, 40]
})

# Group by 'Category' and count rows in each group
group_counts = group_df.groupby("Category").size()
# Output: A → 2, B → 2
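One detail worth illustrating: on a groupby, size() counts all rows in each group, while count() counts only non-null values. The frame `sales` below is a hypothetical variation of the example with one missing value, added to make the difference visible:

```python
import numpy as np
import pandas as pd

# Hypothetical data: same categories, but one Value is missing
sales = pd.DataFrame({
    "Category": ["A", "B", "A", "B"],
    "Value": [10, np.nan, 30, 40]
})

sizes = sales.groupby("Category").size()               # A: 2, B: 2 (all rows)
non_null = sales.groupby("Category")["Value"].count()  # A: 2, B: 1 (NaN skipped)

# value_counts() on the grouping column gives the same per-group row counts
by_value_counts = sales["Category"].value_counts()     # A: 2, B: 2
```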

6. Time-Series Data

Count rows within a date range:

date_df = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=5),
    "Sales": [100, 200, 150, 300, 250]
})

# Count rows between two dates
mask = (date_df["Date"] >= "2023-01-02") & (date_df["Date"] <= "2023-01-04")
date_filtered_count = date_df[mask].shape[0]  # 3 rows
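Two other ways to express the same date-range count are sketched below: Series.between(), which is inclusive on both ends by default, and label slicing on a DatetimeIndex. Both assume the dates are parseable strings and, for the slicing form, that the index is sorted:

```python
import pandas as pd

date_df = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=5),
    "Sales": [100, 200, 150, 300, 250]
})

# between() builds the same inclusive mask as the two comparisons
in_range = date_df["Date"].between("2023-01-02", "2023-01-04")
count_between = int(in_range.sum())  # 3

# With the dates as the index, a string slice selects the range (both ends included)
indexed = date_df.set_index("Date")
count_slice = len(indexed.loc["2023-01-02":"2023-01-04"])  # 3
```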

7. Performance Comparison

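Both len(df) and df.shape[0] read the length of the index rather than scanning the data, so their cost does not grow with the number of rows. Here is a quick benchmark on a large DataFrame (the timings shown are approximate totals for 1,000 calls and will vary by machine):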

import timeit

large_df = pd.DataFrame({"A": range(1_000_000)})

def test_len():
    return len(large_df)

def test_shape():
    return large_df.shape[0]

print("len(df):", timeit.timeit(test_len, number=1000))      # ~0.001s
print("df.shape[0]:", timeit.timeit(test_shape, number=1000)) # ~0.001s

8. Common Pitfalls

Using df.count()

This returns column-wise non-null counts, not row count!

# df here is the Name/Age DataFrame from Example 1
print(df.count())
# Output:
# Name    3
# Age     3
# dtype: int64
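In the output above, the counts happen to equal the row count because nothing is missing. A short sketch with a hypothetical frame, `scores`, that contains missing values shows where the two diverge:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing name and one missing age
scores = pd.DataFrame({
    "Name": ["Alice", "Bob", None],
    "Age": [25, np.nan, 35]
})

total_rows = len(scores)     # 3: every row, regardless of missing values
per_column = scores.count()  # Name: 2, Age: 2 (non-null values per column)

print(total_rows)
print(per_column)
```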

Counting Unique Rows

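nunique() counts the distinct values in a single column. It returns neither the row count nor the number of unique rows; for distinct rows, count what remains after df.drop_duplicates():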

print(df["Name"].nunique())  # 3 (unique names)
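To make the distinction concrete, here is a minimal sketch with a hypothetical frame, `dupes`, that contains one repeated row:

```python
import pandas as pd

# Hypothetical data: the first and third rows are identical
dupes = pd.DataFrame({
    "Name": ["Alice", "Bob", "Alice"],
    "Age": [25, 30, 25]
})

total_rows = len(dupes)                         # 3
unique_rows = len(dupes.drop_duplicates())      # 2 distinct rows
duplicate_rows = int(dupes.duplicated().sum())  # 1 row repeats an earlier one
unique_names = dupes["Name"].nunique()          # 2 distinct values in one column
```

drop_duplicates() also accepts a subset argument if only some columns should define what counts as a duplicate.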

Summary

Use Case                 Method
Basic row count          len(df), df.shape[0]
Filtered rows            df[condition].shape[0]
Non-null rows            df.dropna().shape[0]
Grouped row counts       df.groupby(col).size()
Time-series filtering    Boolean indexing + .shape[0]

Choose the method that best fits your workflow!
