How to create a Pandas DataFrame by appending one row at a time?

You can build a Pandas DataFrame one row at a time in several ways. The methods below differ mainly in how often they copy existing data, which determines how well they scale.

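Method 1: Using loc (Inefficient for Large Data)

Append a row by assigning to the next unused index label, for example df.loc[len(df)] when the index is the default 0..n-1. iloc cannot be used for this: it only addresses existing positions and raises an IndexError when asked to enlarge the DataFrame.
Use Case: Small datasets or one-off additions.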

Example 1: Appending Rows with loc

import pandas as pd

# Create an empty DataFrame with columns
df = pd.DataFrame(columns=["Name", "Age", "City"])

# Append rows one by one
df.loc[0] = ["Alice", 25, "New York"]
df.loc[1] = ["Bob", 30, "London"]
df.loc[2] = {"Name": "Charlie", "Age": 35, "City": "Paris"}

print(df)

Output:

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
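
When the rows arrive in a loop, the next label does not have to be hard-coded. The following is a minimal sketch, assuming the DataFrame keeps its default integer index; the records list is made-up sample data.

```python
import pandas as pd

# Hypothetical sample records; any iterable of row values would do
records = [
    ["Alice", 25, "New York"],
    ["Bob", 30, "London"],
    ["Charlie", 35, "Paris"],
]

df = pd.DataFrame(columns=["Name", "Age", "City"])

for record in records:
    # len(df) is the next free label only while the index is 0..n-1
    df.loc[len(df)] = record

print(df)
```

If rows have been dropped or the index is not the default range, len(df) may match an existing label and overwrite that row instead of appending.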

Method 2: Using pd.concat() with Single-Row DataFrames

Collect rows as individual DataFrames and concatenate them once.
Use Case: Moderate-sized datasets where you need incremental control.

Example 2: Collect Single-Row DataFrames and Concatenate

rows = []

# Append rows as single-row DataFrames
rows.append(pd.DataFrame([{"Name": "Alice", "Age": 25, "City": "New York"}]))
rows.append(pd.DataFrame([{"Name": "Bob", "Age": 30, "City": "London"}]))
rows.append(pd.DataFrame([{"Name": "Charlie", "Age": 35, "City": "Paris"}]))

# Concatenate all rows into a DataFrame
df = pd.concat(rows, ignore_index=True)
print(df)

Output:

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris

Method 3: Collect Rows in a List (Most Efficient)

Collect rows as dictionaries in a list and create the DataFrame once.
Use Case: Large datasets or performance-critical applications.

Example 3: Build a List of Dictionaries

rows = []
rows.append({"Name": "Alice", "Age": 25, "City": "New York"})
rows.append({"Name": "Bob", "Age": 30, "City": "London"})
rows.append({"Name": "Charlie", "Age": 35, "City": "Paris"})

# Convert the list to a DataFrame in one step
df = pd.DataFrame(rows)
print(df)

Output:

      Name  Age      City
0    Alice   25  New York
1      Bob   30    London
2  Charlie   35     Paris
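
The same idea works when rows come from a loop over some data source and are stored as tuples rather than dictionaries. This is a sketch under assumptions: read_records is a hypothetical stand-in for whatever produces the rows.

```python
import pandas as pd

def read_records():
    """Hypothetical data source that yields one record at a time."""
    yield ("Alice", 25, "New York")
    yield ("Bob", 30, "London")
    yield ("Charlie", 35, "Paris")

rows = []
for name, age, city in read_records():
    rows.append((name, age, city))  # cheap list append, no DataFrame work yet

# Build the DataFrame once; columns= names the tuple positions
df = pd.DataFrame(rows, columns=["Name", "Age", "City"])
print(df.dtypes)
```

Because the DataFrame is constructed in one step, Pandas infers each column's dtype from all values at once, so Age comes out as an integer column.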

Method 4: Using pd.concat() in a Loop (Inefficient)

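Concatenate each new row onto the existing DataFrame as it arrives (e.g., from a real-time data stream). Every call copies all rows accumulated so far.
Use Case: Only when the DataFrame must be current after every row; otherwise avoid it.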

Example 4: Simulate Real-Time Appending

df = pd.DataFrame(columns=["Name", "Age", "City"])

# Simulate a loop (e.g., reading data from a source)
for i in range(3):
    new_row = pd.DataFrame([{"Name": f"User_{i}", "Age": 20+i, "City": "Berlin"}])
    df = pd.concat([df, new_row], ignore_index=True)  # Reset index after each append

print(df)

Output:

     Name  Age    City
0  User_0   20  Berlin
1  User_1   21  Berlin
2  User_2   22  Berlin

Key Notes

  1. Avoid df.append():
    Deprecated in Pandas 1.4.0 and removed in Pandas 2.0. Use pd.concat() instead.
  2. Efficiency:
  • Inefficient: Growing a DataFrame row by row (Methods 1 and 4) copies the existing data at each step, so total time grows roughly as O(n²).
  • Efficient: Collecting rows in a list (Method 3) or concatenating once (Method 2) is O(n), although Method 2 still pays the overhead of constructing one DataFrame per row.
  3. Alternatives for Large Data:
  • If Pandas itself becomes the bottleneck, consider libraries such as Polars, or Dask for distributed computing.
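
For code that still calls df.append(), the replacement can be sketched as follows. The sample data is made up for illustration; the removed call is shown only as a comment.

```python
import pandas as pd

df = pd.DataFrame([{"Name": "Alice", "Age": 25, "City": "New York"}])
new_row = {"Name": "Bob", "Age": 30, "City": "London"}

# Old (removed in Pandas 2.0): df = df.append(new_row, ignore_index=True)
# Replacement: wrap the dict in a one-row DataFrame and concatenate
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```

This is a drop-in fix for a single call. Inside a loop it has the same cost as Method 4, so collecting rows in a list first is usually the better migration.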

Performance Comparison

Method                                   Time Complexity   Use Case
loc or concat in a loop                  O(n²)             Small datasets or unavoidable incremental updates
pd.concat() with single-row DataFrames   O(n)              Moderate-sized data with incremental control
Collect rows in a list                   O(n)              Best for large datasets
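
To see the difference on your own machine, a rough timing sketch such as the one below can help. The function names and the row count n are illustrative assumptions, and the measured times will vary with hardware and Pandas version, so no expected numbers are shown.

```python
import time
import pandas as pd

def build_with_concat_loop(n):
    """Method 4 style: concatenate a one-row DataFrame on every iteration."""
    df = pd.DataFrame([{"ID": 0, "Value": 0}])
    for i in range(1, n):
        new_row = pd.DataFrame([{"ID": i, "Value": i * 10}])
        df = pd.concat([df, new_row], ignore_index=True)
    return df

def build_from_list(n):
    """Method 3 style: collect dicts, then build the DataFrame once."""
    rows = [{"ID": i, "Value": i * 10} for i in range(n)]
    return pd.DataFrame(rows)

n = 1000  # arbitrary size chosen for illustration

start = time.perf_counter()
df_loop = build_with_concat_loop(n)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
df_list = build_from_list(n)
list_seconds = time.perf_counter() - start

print(f"concat in a loop: {loop_seconds:.4f} s")
print(f"list of dicts:    {list_seconds:.4f} s")
```

Both functions produce the same DataFrame; only the construction cost differs, and the gap widens as n grows.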

Final Recommendation

  1. For Small Data: Use loc (Method 1) for simplicity.
  2. For Moderate Data: Use pd.concat() with a list of single-row DataFrames (Method 2).
  3. For Large Data: Always collect rows in a list and create the DataFrame once (Method 3).
  4. For Real-Time Data: Use a buffer (e.g., collect 1000 rows at a time) to minimize concat calls.

Example 5: Buffered Appending (Optimized for Real-Time Data)

import pandas as pd

df = None  # No data yet; the first full buffer becomes the DataFrame
buffer = []
buffer_size = 1000  # Adjust based on memory constraints

for i in range(1, 2501):  # Simulate 2500 rows of data
    row = {"ID": i, "Value": i * 10}
    buffer.append(row)

    # Append in chunks to reduce concat calls
    if len(buffer) >= buffer_size:
        chunk = pd.DataFrame(buffer)
        df = chunk if df is None else pd.concat([df, chunk], ignore_index=True)
        buffer = []  # Reset buffer

# Append remaining rows
if buffer:
    chunk = pd.DataFrame(buffer)
    df = chunk if df is None else pd.concat([df, chunk], ignore_index=True)

print(len(df))  # 2500
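
The buffering pattern above can also be wrapped in a small helper so the calling code only adds rows. This is a sketch, not an established API: RowBuffer and its methods are hypothetical names, and this variant stores each flushed chunk and concatenates only when the DataFrame is requested.

```python
import pandas as pd

class RowBuffer:
    """Hypothetical helper: buffers dict rows and flushes them in chunks."""

    def __init__(self, buffer_size=1000):
        self.buffer_size = buffer_size
        self._rows = []    # rows not yet converted to a DataFrame
        self._chunks = []  # one DataFrame per flushed buffer

    def add(self, row):
        self._rows.append(row)
        if len(self._rows) >= self.buffer_size:
            self.flush()

    def flush(self):
        # Convert pending rows to a chunk; does nothing if the buffer is empty
        if self._rows:
            self._chunks.append(pd.DataFrame(self._rows))
            self._rows = []

    def to_frame(self):
        # Flush leftovers, then concatenate all chunks in a single call
        self.flush()
        if not self._chunks:
            return pd.DataFrame()
        return pd.concat(self._chunks, ignore_index=True)

buf = RowBuffer(buffer_size=1000)
for i in range(1, 2501):  # Simulate 2500 rows of data
    buf.add({"ID": i, "Value": i * 10})

df = buf.to_frame()
print(df.shape)
```

Deferring the concat keeps the per-row cost low, at the price that an up-to-date DataFrame only exists after to_frame() is called.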

By following these methods, you can efficiently build DataFrames in Pandas while minimizing performance overhead!
