To select multiple columns in a Pandas DataFrame, use one of the following methods:
1. Using Double Square Brackets [[ ]]
Pass a list of column names to select specific columns in a desired order:
import pandas as pd
# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance']
})
# Select 'Name' and 'Age' columns
selected_columns = df[['Name', 'Age']]
Output:
| | Name | Age |
|---|---|---|
| 0 | Alice | 25 |
| 1 | Bob | 30 |
| 2 | Charlie | 35 |
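The difference between single and double brackets is easy to miss, so here is a minimal sketch using the same sample data. The variable names `as_series`, `as_frame`, and `reordered` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance'],
})

# Single brackets with one name return a Series
as_series = df['Name']

# Double brackets (a list of names) return a DataFrame, even for one column
as_frame = df[['Name']]

# The order of the list sets the column order of the result
reordered = df[['Salary', 'Name']]

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(list(reordered.columns))   # ['Salary', 'Name']
```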
2. Using loc (Label-Based Selection)
Select columns by their labels (names):
# Select 'Name' and 'Department' columns for all rows
selected_columns = df.loc[:, ['Name', 'Department']]
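Because loc takes a row selector before the comma, it can filter rows and pick columns in one step. The sketch below assumes the same sample data; the `Age >= 30` condition and the name `older_staff` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance'],
})

# Rows where Age is at least 30, restricted to two columns
older_staff = df.loc[df['Age'] >= 30, ['Name', 'Department']]

print(older_staff['Name'].tolist())  # ['Bob', 'Charlie']
print(list(older_staff.columns))     # ['Name', 'Department']
```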
3. Using iloc (Position-Based Selection)
Select columns by their integer positions (indexes):
# Select the first and third columns (positions 0 and 2)
selected_columns = df.iloc[:, [0, 2]]
Output:
| | Name | Salary |
|---|---|---|
| 0 | Alice | 50000 |
| 1 | Bob | 60000 |
| 2 | Charlie | 70000 |
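Positions can also be negative (counted from the end), and they can be looked up from names when you only know the labels. This is a small sketch on the same sample data, with illustrative variable names. It uses `Index.get_indexer` to translate names into positions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance'],
})

# Negative positions count from the last column
last_two = df.iloc[:, [-2, -1]]

# Translate column names into integer positions, then select by position
positions = df.columns.get_indexer(['Name', 'Salary'])
by_position = df.iloc[:, positions]

print(list(last_two.columns))     # ['Salary', 'Department']
print(positions.tolist())         # [0, 2]
print(list(by_position.columns))  # ['Name', 'Salary']
```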
4. Using filter()
Select columns by name patterns or exact matches:
# Select columns named exactly 'Name' or 'Age'
selected_columns = df.filter(items=['Name', 'Age'])
# Use regex to match patterns (e.g., columns ending with 't')
selected_columns = df.filter(regex='t$') # Selects 'Department'
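filter() also accepts a `like` argument for plain substring matches, and `items` skips names that are missing instead of raising an error. A short sketch on the same sample data follows; 'Bonus' is a hypothetical column name that does not exist in the DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance'],
})

# Substring match; it is case-sensitive, so 'Age' (capital A) is not matched
contains_a = df.filter(like='a')

# 'Bonus' does not exist (hypothetical name); filter() drops it silently
tolerant = df.filter(items=['Name', 'Bonus'])

print(list(contains_a.columns))  # ['Name', 'Salary', 'Department']
print(list(tolerant.columns))    # ['Name']
```

This tolerance is convenient for optional columns, but it can also hide typos in column names.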
5. Using Column Ranges
Select a sequence of columns by their positions:
# Select columns at positions 0 and 1 (the end index 2 is excluded)
selected_columns = df.iloc[:, 0:2] # Columns 0 (Name) and 1 (Age)
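Ranges behave differently in iloc and loc: position slices exclude the end, while label slices include it. The sketch below shows both on the same sample data, with illustrative variable names.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance'],
})

# Position slice: the end position is excluded
by_position = df.iloc[:, 0:2]

# Label slice: the end label is included
by_label = df.loc[:, 'Name':'Salary']

# A step works too: every second column
every_other = df.iloc[:, ::2]

print(list(by_position.columns))  # ['Name', 'Age']
print(list(by_label.columns))     # ['Name', 'Age', 'Salary']
print(list(every_other.columns))  # ['Name', 'Salary']
```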
Key Notes
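- Order Preservation: When you pass a list of names or positions, columns are returned in the order you list them. Slices, regex filters, and boolean masks keep the DataFrame's original column order.
- Return Type: Each method above returns a DataFrame, even when only one column is selected. Whether the result shares memory with the original is an implementation detail, so call .copy() on it if you need an independent object to modify.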
- Common Errors:
  - KeyError: Occurs if a column name doesn't exist.
  - IndexError: Occurs if using an invalid integer position with iloc.
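Both errors can be reproduced in a few lines. The sketch below uses a hypothetical column name, 'Bonus', and an out-of-range position, 10, neither of which exists in the sample data. It ends with one possible way to keep only the names that are present.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance'],
})

# 'Bonus' is a hypothetical column that does not exist in df
wanted = ['Name', 'Bonus']

try:
    df[wanted]
except KeyError as exc:
    key_error_message = str(exc)
    print('KeyError:', key_error_message)

try:
    df.iloc[:, [0, 10]]  # position 10 is out of bounds
except IndexError as exc:
    index_error_message = str(exc)
    print('IndexError:', index_error_message)

# Keep only the requested names that actually exist
available = [col for col in wanted if col in df.columns]
safe_selection = df[available]
print(list(safe_selection.columns))  # ['Name']
```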
Advanced Selection
Combine with Conditions
# Select columns where the mean of numeric values > 30
numeric_cols = df.select_dtypes(include='number')
selected_columns = numeric_cols.loc[:, numeric_cols.mean() > 30]
Dynamic Column Selection
# Select columns containing 'Sal' in their names
selected_columns = df.loc[:, df.columns.str.contains('Sal')]
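The same idea extends to other tests on column names. This is a sketch on the same sample data; the case-insensitive match and the length rule are arbitrary examples, not recommended conventions.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance'],
})

# Case-insensitive substring match on the column names
case_insensitive = df.loc[:, df.columns.str.contains('sal', case=False)]

# Any Python condition works when building the list of names yourself
short_names = df[[col for col in df.columns if len(col) <= 4]]

print(list(case_insensitive.columns))  # ['Salary']
print(list(short_names.columns))       # ['Name', 'Age']
```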
Summary Table
Method | Use Case | Example |
---|---|---|
df[['col1', 'col2']] | Simple column selection by name | df[['Name', 'Age']] |
loc | Label-based selection with flexibility | df.loc[:, ['Name', 'Salary']] |
iloc | Position-based selection | df.iloc[:, [0, 2]] |
filter() | Select columns by name patterns/regex | df.filter(items=['Name', 'Age']) |
Use these methods to efficiently extract subsets of columns from your DataFrame!