To extract a list from a pandas DataFrame column or row, select the column or row as a Series and call .tolist() on it. The methods below cover columns, rows, whole DataFrames, and unique values, with examples:
1. Extracting a Column as a List
Method: df['column_name'].tolist()
- Steps:
  - Select the column using bracket notation, df['col'] (returns a Series)
  - Convert the Series to a list with .tolist()
- Example:
import pandas as pd
# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
'Age': [25, 30, 35],
'City': ['New York', 'London', 'Paris']}
df = pd.DataFrame(data)
# Extract 'Name' column as a list
name_list = df['Name'].tolist()
print(name_list) # Output: ['Alice', 'Bob', 'Charlie']
Alternative: list(df['column_name'])
- Gives the same result, but is typically slower than .tolist() on large columns:
age_list = list(df['Age'])
print(age_list) # Output: [25, 30, 35]
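As a quick check on the two approaches above, the following sketch (using a small illustrative DataFrame, not data from the original) suggests that both return equal lists and that .tolist() hands back plain Python scalars rather than NumPy types.

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data)
df_check = pd.DataFrame({'Age': [25, 30, 35]})

via_tolist = df_check['Age'].tolist()  # Series method
via_list = list(df_check['Age'])       # built-in list() over the Series

# Both approaches yield equal lists
print(via_tolist == via_list)

# .tolist() returns Python ints, not numpy.int64
print(type(via_tolist[0]))
```

Plain Python scalars matter when the list is passed on to code that does not understand NumPy types, such as the standard json module.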
2. Extracting a Row as a List
Method 1: df.iloc[row_index].tolist()
- Steps:
  - Select the row by integer position with .iloc[]
  - Convert the row (a Series) to a list
- Example:
# Extract the first row (index 0) as a list
row_0_list = df.iloc[0].tolist()
print(row_0_list) # Output: ['Alice', 25, 'New York']
Method 2: df.loc[row_label].tolist()
- Use row labels (if using custom indices):
# Set a custom index on a copy, so df itself stays unchanged for the later examples
df_named = df.set_index('Name')
# Extract row for 'Bob'
bob_list = df_named.loc['Bob'].tolist()
print(bob_list) # Output: [30, 'London']
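One caveat when turning a row into a list: a row spans several columns, so pandas has to settle on a single dtype for the resulting Series. The sketch below uses a hypothetical all-numeric DataFrame (df_num is not part of the examples above) to show integers coming back as floats when the row also contains a float column.

```python
import pandas as pd

# Hypothetical DataFrame with one integer column and one float column
df_num = pd.DataFrame({'count': [1, 2], 'ratio': [0.5, 1.5]})

row = df_num.iloc[0]   # the row becomes a Series with one common dtype
print(row.dtype)       # float64
print(row.tolist())    # [1.0, 0.5]
```

When a row mixes strings and numbers, as in the examples above, the common dtype is object and each value keeps its own type.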
3. Extracting All Rows as a List of Lists
Method: df.values.tolist()
- Converts the entire DataFrame into a list of lists (each inner list is a row)
- Example:
all_rows = df.values.tolist()
print(all_rows)
# Output: [['Alice', 25, 'New York'],
# ['Bob', 30, 'London'],
# ['Charlie', 35, 'Paris']]
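The pandas documentation recommends DataFrame.to_numpy() over the .values attribute. As a minimal sketch on an illustrative two-row DataFrame, the snippet below shows that both give the same list of lists here, and how the column names could be prepended as a header row (the variable names are hypothetical).

```python
import pandas as pd

# Illustrative DataFrame (hypothetical data)
df_small = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

rows_values = df_small.values.tolist()      # attribute-based access
rows_numpy = df_small.to_numpy().tolist()   # method recommended by the pandas docs
print(rows_values == rows_numpy)            # True

# Prepend the column names to get a header row followed by the data rows
table = [df_small.columns.tolist()] + rows_numpy
print(table)  # [['Name', 'Age'], ['Alice', 25], ['Bob', 30]]
```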
4. Extracting Unique Values from a Column
Method: df['col'].unique().tolist()
- Gets distinct values:
unique_cities = df['City'].unique().tolist()
print(unique_cities) # Output: ['New York', 'London', 'Paris']
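The example above has no repeated cities, so it does not show any deduplication. A small sketch with repeated values (illustrative data) makes the behavior visible: duplicates are dropped and the remaining values keep their order of first appearance.

```python
import pandas as pd

# Illustrative Series containing repeated values
cities = pd.Series(['Paris', 'London', 'Paris', 'New York', 'London'])

unique_cities = cities.unique().tolist()
print(unique_cities)  # ['Paris', 'London', 'New York']
```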
5. Handling Edge Cases
A. Missing Values (NaN):
.tolist() preserves NaN values. Use dropna() to exclude them:
df_with_nan = pd.DataFrame({'Scores': [90, None, 85]})
clean_list = df_with_nan['Scores'].dropna().tolist()
print(clean_list) # Output: [90.0, 85.0]
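To see what "preserves NaN" means in practice, the sketch below (illustrative data) converts the same kind of column without dropna(), then shows fillna() as one possible alternative when the list must keep its original length. The replacement value 0 is an arbitrary choice for illustration.

```python
import math
import pandas as pd

# None is stored as NaN, which makes the Series float64
scores = pd.Series([90, None, 85])

raw = scores.tolist()
print(raw)                 # [90.0, nan, 85.0]
print(math.isnan(raw[1]))  # True

# Replace missing values instead of dropping them (0 is an arbitrary placeholder)
filled = scores.fillna(0).tolist()
print(filled)              # [90.0, 0.0, 85.0]
```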
B. Rows with Custom Indices:
- When rows have custom indices, use .loc to select by label and .iloc to select by position:
df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['x', 'y'])
row_y = df.loc['y'].tolist()
print(row_y) # Output: [2, 4]
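The distinction between label and position matters most when the index labels are themselves integers. The following sketch uses a hypothetical DataFrame whose integer labels are deliberately out of order, so that .loc[0] and .iloc[0] return different rows.

```python
import pandas as pd

# Hypothetical DataFrame: integer labels that do not match row positions
df_int = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[2, 0, 1])

by_label = df_int.loc[0].tolist()      # row whose label is 0 (second row)
by_position = df_int.iloc[0].tolist()  # first row, whatever its label

print(by_label)     # [2, 5]
print(by_position)  # [1, 4]
```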
Key Notes
| Method | Use Case | Output Type |
|---|---|---|
| df['col'].tolist() | Single column | list of values |
| df.iloc[0].tolist() | Row by integer position | list of values |
| df.loc['label'].tolist() | Row by label (custom index) | list of values |
| df.values.tolist() | Entire DataFrame (rows) | list of lists |
| df['col'].unique().tolist() | Unique column values | list of values |
Example Workflow
import pandas as pd
# Create DataFrame
df = pd.DataFrame({
'Product': ['Apple', 'Banana', 'Cherry'],
'Price': [1.20, 0.50, 2.30],
'In_Stock': [True, True, False]
})
# 1. Extract 'Product' column
products = df['Product'].tolist()
print("Products:", products) # Output: Products: ['Apple', 'Banana', 'Cherry']
# 2. Extract first row
first_row = df.iloc[0].tolist()
print("First row:", first_row) # Output: First row: ['Apple', 1.2, True]
# 3. Extract all rows
all_rows = df.values.tolist()
print("All rows:", all_rows)
# Output: All rows: [['Apple', 1.2, True], ['Banana', 0.5, True], ['Cherry', 2.3, False]]
# 4. Unique prices
unique_prices = df['Price'].unique().tolist()
print("Unique prices:", unique_prices) # Output: Unique prices: [1.2, 0.5, 2.3]
Best Practices
- Use .tolist(): Optimized for pandas and efficient.
- Avoid list(df['col']): Slower for large datasets.
- Prefer .iloc/.loc for rows: Explicit indexing prevents ambiguity.
- Handle NaN: Clean data with dropna() if needed.