To get a value from a specific cell in a pandas DataFrame, you have several methods at your disposal. Here’s a detailed explanation with examples:
Core Methods for Cell Access
- loc[]: Label-based indexing
- iloc[]: Integer position-based indexing
- at[]: Fast label-based scalar access
- iat[]: Fast integer-based scalar access
Example DataFrame Setup
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'London', 'Paris']
}
df = pd.DataFrame(data, index=['a', 'b', 'c'])
print(df)
Output:
      Name  Age      City
a    Alice   25  New York
b      Bob   30    London
c  Charlie   35     Paris
1. Using loc[] (Label-Based)
Access by row label and column name.
# Get Alice's Age (row 'a', column 'Age')
value = df.loc['a', 'Age']
print(value) # Output: 25
# Get Bob's City (row 'b', column 'City')
value = df.loc['b', 'City']
print(value) # Output: London
2. Using iloc[] (Integer-Based)
Access by row position (0-indexed) and column position.
# Get Charlie's Name (row 2, column 0)
value = df.iloc[2, 0]
print(value) # Output: Charlie
# Get Bob's Age (row 1, column 1)
value = df.iloc[1, 1]
print(value) # Output: 30
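When a row is known by label but a column only by position (or the reverse), one option is to translate labels into positions with `Index.get_loc` before calling `iloc[]`. This is a minimal sketch that assumes the index and columns are unique; with duplicate labels, `get_loc` may return a slice or boolean mask instead of a single integer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris'],
    },
    index=['a', 'b', 'c'],
)

# Translate labels into integer positions (assumes unique labels)
row_pos = df.index.get_loc('b')      # position of row label 'b'
col_pos = df.columns.get_loc('Age')  # position of column 'Age'

# Row located by label, column given by position (2 is 'City')
mixed_value = df.iloc[row_pos, 2]

# Both coordinates translated from labels; same cell as df.loc['b', 'Age']
same_value = df.iloc[row_pos, col_pos]
print(mixed_value, same_value)
```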
3. Using at[] (Fast Label Access)
Optimized for single scalar values (label-based).
# Get Alice's City
value = df.at['a', 'City']
print(value) # Output: New York
# Usually noticeably faster than loc[] for single values;
# the exact factor varies with pandas version and data
4. Using iat[] (Fast Integer Access)
Optimized for single scalar values (position-based).
# Get Charlie's Age (row 2, column 1)
value = df.iat[2, 1]
print(value) # Output: 35
# Usually noticeably faster than iloc[] for single values;
# the exact factor varies with pandas version and data
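The speed difference between the four accessors is easiest to judge by measuring it on your own machine. The following is a rough sketch using the standard-library `timeit` module; the repetition count `n` is an arbitrary choice, and the timings it prints will differ between pandas versions and hardware.

```python
import timeit

import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris'],
    },
    index=['a', 'b', 'c'],
)

n = 2_000  # number of repeated lookups per accessor (arbitrary)

# Time the same single-cell lookup through each accessor
timings = {
    'loc': timeit.timeit(lambda: df.loc['a', 'City'], number=n),
    'at': timeit.timeit(lambda: df.at['a', 'City'], number=n),
    'iloc': timeit.timeit(lambda: df.iloc[0, 2], number=n),
    'iat': timeit.timeit(lambda: df.iat[0, 2], number=n),
}

for name, seconds in timings.items():
    print(f"{name:>4}: {seconds:.4f}s for {n} lookups")
```

All four calls return the same cell, so the comparison isolates accessor overhead rather than differences in the data being read.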
Handling Edge Cases
Case 1: Duplicate Index Labels
df_dup = df.copy()
df_dup.index = ['a', 'a', 'b'] # Duplicate index 'a'
# loc[] returns a Series when duplicates exist
print(df_dup.loc['a', 'Age'])
Output:
a    25
a    30
Name: Age, dtype: int64
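Because a duplicated label turns a scalar lookup into a Series, it can help to check for duplicates up front and decide explicitly which match to keep. A minimal sketch follows; keeping the first match is an arbitrary policy chosen here for illustration.

```python
import pandas as pd

df_dup = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
    },
    index=['a', 'a', 'b'],  # label 'a' appears twice
)

# Detect duplicates before assuming a lookup yields a scalar
has_unique_index = df_dup.index.is_unique

result = df_dup.loc['a', 'Age']
if isinstance(result, pd.Series):
    # Several rows share the label; keep the first one (arbitrary policy)
    first_age = result.iloc[0]
else:
    first_age = result

print(has_unique_index, first_age)
```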
Case 2: Non-Existent Labels
# Use try-except to handle KeyErrors
try:
    value = df.loc['x', 'Salary']  # Invalid labels
except KeyError:
    print("Label not found!")
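An alternative to catching the exception is to test for the labels first and fall back to a default. The helper below, `get_cell`, is a hypothetical name introduced only for this sketch and is not part of pandas. The example also shows that position-based access signals a bad position with `IndexError` rather than `KeyError`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris'],
    },
    index=['a', 'b', 'c'],
)


def get_cell(frame, row_label, col_label, default=None):
    """Hypothetical helper: return one cell, or `default` if a label is missing."""
    if row_label in frame.index and col_label in frame.columns:
        return frame.at[row_label, col_label]
    return default


found = get_cell(df, 'a', 'Age')
missing = get_cell(df, 'x', 'Salary', default='n/a')

# Out-of-range positions raise IndexError, not KeyError
try:
    df.iloc[10, 0]
    out_of_bounds = False
except IndexError:
    out_of_bounds = True

print(found, missing, out_of_bounds)
```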
Case 3: Conditional Selection
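# Get Age where Name is 'Bob' (a single loc[] call avoids chained indexing;
# .iloc[0] takes the first match and raises IndexError if there is none)
value = df.loc[df['Name'] == 'Bob', 'Age'].iloc[0]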
print(value) # Output: 30
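A conditional lookup can match zero rows, in which case taking the first element raises `IndexError`. One way to guard against that is sketched below; `first_match` is a hypothetical helper name used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris'],
    },
    index=['a', 'b', 'c'],
)


def first_match(frame, mask, column, default=None):
    """Hypothetical helper: value of `column` in the first row where `mask` is True."""
    matches = frame.loc[mask, column]
    if matches.empty:
        return default
    return matches.iloc[0]


bob_age = first_match(df, df['Name'] == 'Bob', 'Age')
nobody = first_match(df, df['Name'] == 'Zoe', 'Age')  # no such row
print(bob_age, nobody)
```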
Performance Comparison
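| Method | Use Case | Speed |
|---|---|---|
| at[] | Single cell by label | Fast (scalar access only) |
| iat[] | Single cell by integer position | Fast (scalar access only) |
| loc[] | Label-based selection | Slower for single cells, but handles slices and lists |
| iloc[] | Position-based selection | Slower for single cells, but handles slices and lists |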
Best Practices
- Prefer at[]/iat[] for single values: significant speed advantage
- Use loc[]/iloc[] for ranges/slices:
# First two rows, 'Name' and 'City' columns
df.loc[['a','b'], ['Name','City']]
- Convert to dictionary for repeated access:
data_dict = df.set_index('Name').to_dict()
print(data_dict['Age']['Bob']) # Output: 30
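The dictionary above is keyed by column first and then by name. If lookups are usually per person, `to_dict(orient='index')` gives a row-first layout instead. This is a small sketch that assumes the `Name` values are unique, since pandas requires a unique index for this orientation.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris'],
    },
    index=['a', 'b', 'c'],
)

# Row-first dictionary: {name: {column: value}} (assumes unique names)
records = df.set_index('Name').to_dict(orient='index')

bob = records['Bob']     # every field for one person
bob_city = bob['City']   # a single cell
print(bob, bob_city)
```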
Complete Example Workflow
# Create DataFrame
df = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Cherry'],
    'Price': [1.20, 0.50, 2.75],
    'Stock': [15, 42, 8]
}, index=['Store1', 'Store2', 'Store3'])
# Access Banana's price using different methods (each prints 0.5)
print(df.loc['Store2', 'Price'])  # 0.5 (label)
print(df.iloc[1, 1])              # 0.5 (position)
print(df.at['Store2', 'Price'])   # 0.5 (fast label)
print(df.iat[1, 1])               # 0.5 (fast position)
# Get Cherry's stock (conditional)
cherry_stock = df.loc[df['Product'] == 'Cherry', 'Stock'].iloc[0]
print(cherry_stock) # 8
Key Notes
- Indexing starts at 0 for iloc[] and iat[]
- Labels matter for loc[] and at[] (check df.index and df.columns)
- Use the values attribute for NumPy array conversion:
# Get entire row as array
df.loc['a'].values # ['Alice' 25 'New York']
- Chained indexing (e.g., df['Age']['a']) may cause issues, especially on assignment, where the write can land on a temporary copy; prefer direct methods
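The chained-indexing note matters most when writing values. The sketch below reads a cell both ways and then updates cells with a single indexing call. The chained assignment itself is deliberately left out, because its behavior (a warning, or no change to the DataFrame) depends on the pandas version.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'City': ['New York', 'London', 'Paris'],
    },
    index=['a', 'b', 'c'],
)

# Reading through a chain works, but performs two separate lookups
chained_read = df['Age']['a']
direct_read = df.at['a', 'Age']

# For writes, use one indexing call so the DataFrame itself is updated.
# The chained form df['Age']['a'] = 26 is avoided here on purpose.
df.at['a', 'Age'] = 26                    # single cell by label
df.loc[df['Name'] == 'Bob', 'Age'] = 31   # conditional update

print(chained_read, direct_read, df['Age'].tolist())
```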
By mastering these techniques, you’ll efficiently extract values from DataFrames for data analysis and manipulation tasks.