How to Select Multiple Columns in a Pandas DataFrame

To select multiple columns in a Pandas DataFrame, use one of the following methods:

1. Using Double Square Brackets [[ ]]

Pass a list of column names to select specific columns in a desired order:

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance']
})

# Select 'Name' and 'Age' columns
selected_columns = df[['Name', 'Age']]
print(selected_columns)

Output:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
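The difference between single and double brackets is easy to miss, so here is a minimal sketch that reuses the sample data above. The variable names as_series, as_frame, and reordered are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance']
})

# Single brackets with one name return a Series
as_series = df['Name']

# Double brackets (a list of names) return a DataFrame, even for one column
as_frame = df[['Name']]

# The order of the list controls the column order of the result
reordered = df[['Salary', 'Name']]

print(type(as_series).__name__)   # Series
print(type(as_frame).__name__)    # DataFrame
print(list(reordered.columns))    # ['Salary', 'Name']
```

If you need a DataFrame downstream (for example, to pass to a function that expects two-dimensional input), the list form is the safer choice.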

2. Using loc (Label-Based Selection)

Select columns by their labels (names):

# Select 'Name' and 'Department' columns for all rows
selected_columns = df.loc[:, ['Name', 'Department']]
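Because loc takes a row selector and a column selector together, it can filter rows and pick columns in one step. The sketch below assumes the same sample data; the Age threshold of 28 and the name older_staff are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance']
})

# Row selector: a boolean condition; column selector: a list of labels
older_staff = df.loc[df['Age'] > 28, ['Name', 'Department']]

print(older_staff)
# Keeps the rows for Bob and Charlie, with only the two requested columns
```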

3. Using iloc (Position-Based Selection)

Select columns by their integer positions (indexes):

# Select the first and third columns (positions 0 and 2)
selected_columns = df.iloc[:, [0, 2]]
print(selected_columns)

Output:

      Name  Salary
0    Alice   50000
1      Bob   60000
2  Charlie   70000
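Positions can also be negative (counted from the end), and you can look positions up from names when you only know the labels. This is a rough sketch on the same sample data; it uses Index.get_indexer for the lookup, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance']
})

# Negative positions count from the last column
first_and_last = df.iloc[:, [0, -1]]
print(list(first_and_last.columns))   # ['Name', 'Department']

# Translate labels into integer positions, then select with iloc
positions = df.columns.get_indexer(['Age', 'Department'])
by_lookup = df.iloc[:, positions]
print(list(positions))                # [1, 3]
print(list(by_lookup.columns))        # ['Age', 'Department']
```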

4. Using filter()

Select columns by name patterns or exact matches:

# Select the columns named exactly 'Name' and 'Age'
selected_columns = df.filter(items=['Name', 'Age'])

# Use regex to match patterns (e.g., columns ending with 't')
selected_columns = df.filter(regex='t$')  # Selects 'Department'
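filter() accepts three mutually exclusive keyword arguments: items, like, and regex. The sketch below tries each on the same sample data; the column name 'Bonus' is hypothetical and deliberately absent, to show that items skips missing names rather than raising.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance']
})

# items: exact names; names that do not exist are silently skipped
exact = df.filter(items=['Name', 'Bonus'])
print(list(exact.columns))       # ['Name']

# like: plain substring match against each column name
with_e = df.filter(like='e')
print(list(with_e.columns))      # ['Name', 'Age', 'Department']

# regex: pattern search against each column name
anchored = df.filter(regex='^(Name|Age)$')
print(list(anchored.columns))    # ['Name', 'Age']
```

The silent skipping is convenient for optional columns, but it can hide typos, so df[[...]] may be preferable when every name is required.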

5. Using Column Ranges

Select a sequence of columns by their positions:

# Select the columns at positions 0 and 1 (the end index, 2, is excluded)
selected_columns = df.iloc[:, 0:2]  # Columns 0 (Name) and 1 (Age)
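Slices behave differently in iloc and loc: positional slices exclude the end, while label slices include it. A short sketch on the same sample data, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance']
})

# Positional slice: end position is excluded
first_two = df.iloc[:, 0:2]
print(list(first_two.columns))        # ['Name', 'Age']

# Label slice: both endpoints are included
name_to_salary = df.loc[:, 'Name':'Salary']
print(list(name_to_salary.columns))   # ['Name', 'Age', 'Salary']

# A step picks every other column
every_other = df.iloc[:, ::2]
print(list(every_other.columns))      # ['Name', 'Salary']

# A negative start takes the last columns
last_two = df.iloc[:, -2:]
print(list(last_two.columns))         # ['Salary', 'Department']
```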

Key Notes

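  • Order Preservation: When you pass a list of names or positions, columns are returned in the order listed. Slices, regex matches, and boolean masks keep the DataFrame’s original column order.
  • Return Type: Each method shown here returns a DataFrame. Whether the result is a view or a copy of the original data is not guaranteed, so call .copy() before modifying it.
  • Common Errors:
      • KeyError: Raised by df[[...]] and loc if a column name doesn’t exist. filter(items=...) skips missing names instead.
      • IndexError: Raised by iloc if an integer position in a list is out of bounds. Out-of-range slice ends are truncated rather than raising.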
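The notes above can be checked with a small sketch. It assumes the same sample data; 'Bonus' and position 10 are hypothetical values chosen because they do not exist in the DataFrame, and the flag variables are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance']
})

# A missing label raises KeyError
key_error_raised = False
try:
    df[['Name', 'Bonus']]
except KeyError:
    key_error_raised = True

# An out-of-bounds position in a list raises IndexError
index_error_raised = False
try:
    df.iloc[:, [0, 10]]
except IndexError:
    index_error_raised = True

# One defensive option: keep only the names that are actually present
wanted = [col for col in ['Name', 'Bonus'] if col in df.columns]
safe_subset = df[wanted]

# An explicit copy can be modified without touching the original
subset = df[['Name', 'Age']].copy()
subset['Age'] = subset['Age'] + 1

print(key_error_raised, index_error_raised)   # True True
print(list(safe_subset.columns))              # ['Name']
print(df['Age'].tolist())                     # [25, 30, 35]
```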

Advanced Selection

Combine with Conditions

# Select numeric columns whose mean is greater than 30
numeric_cols = df.select_dtypes(include='number')
selected_columns = numeric_cols.loc[:, numeric_cols.mean() > 30]
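To see why that condition keeps some columns and drops others, it helps to inspect the intermediate mask. The sketch below walks through it on the same sample data; the names col_means, mask, and non_numeric are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance']
})

# Keep only numeric columns: Age and Salary
numeric_cols = df.select_dtypes(include='number')

# One mean per column, indexed by column name
col_means = numeric_cols.mean()

# Boolean Series: True where the column mean exceeds 30
mask = col_means > 30

# Age has a mean of exactly 30, so only Salary passes the strict comparison
selected = numeric_cols.loc[:, mask]
print(list(selected.columns))      # ['Salary']

# exclude= gives the complementary selection
non_numeric = df.select_dtypes(exclude='number')
print(list(non_numeric.columns))   # ['Name', 'Department']
```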

Dynamic Column Selection

# Select columns containing 'Sal' in their names
selected_columns = df.loc[:, df.columns.str.contains('Sal')]
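The same boolean-mask idea works with other checks on df.columns. A few variations, sketched on the same sample data; the wanted list and the 'Bonus' name are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Department': ['HR', 'Tech', 'Finance']
})

# Columns whose names start with 'S'
starts_with_s = df.loc[:, df.columns.str.startswith('S')]
print(list(starts_with_s.columns))      # ['Salary']

# Case-insensitive substring match
case_insensitive = df.loc[:, df.columns.str.contains('sal', case=False)]
print(list(case_insensitive.columns))   # ['Salary']

# Keep whichever of the wanted names exist; the mask preserves original order
wanted = ['Department', 'Bonus', 'Name']
present = df.loc[:, df.columns.isin(wanted)]
print(list(present.columns))            # ['Name', 'Department']
```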

Summary Table

Method                Use Case                                 Example
df[['col1', 'col2']]  Simple column selection by name          df[['Name', 'Age']]
loc                   Label-based selection with flexibility   df.loc[:, ['Name', 'Salary']]
iloc                  Position-based selection                 df.iloc[:, [0, 2]]
filter()              Select columns by name patterns/regex    df.filter(items=['Name', 'Age'])

Use these methods to efficiently extract subsets of columns from your DataFrame!
