To sort a pandas DataFrame by one column, use the `sort_values()` method. Below is an explanation of its main parameters, followed by examples:
Key Parameters of sort_values()
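Parameter | Description |
---|---|
by | Column name (string), or a list of column names, to sort by. |
ascending | `True` (default) for ascending order, `False` for descending. Pass a list of booleans to set the direction per column. |
inplace | If `True`, modifies the DataFrame in place and returns `None` (default `False`). |
na_position | Where to place NaN values: `'last'` (default) or `'first'`. |
kind | Sorting algorithm: `'quicksort'` (default), `'mergesort'`, `'heapsort'`, or `'stable'`. Applies only when sorting by a single column. |
key | Function applied to the column values before sorting (pandas 1.1+). |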
Examples
1. Basic Sorting (Ascending/Descending)
```python
import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 35],
        'Salary': [50000, 80000, 45000, 90000]}
df = pd.DataFrame(data)

# Sort by 'Age' in ascending order (default)
sorted_asc = df.sort_values(by='Age')
print(sorted_asc)
```
Output:
```
      Name  Age  Salary
2  Charlie   22   45000
0    Alice   25   50000
1      Bob   30   80000
3    David   35   90000
```
```python
# Sort by 'Salary' in descending order
sorted_desc = df.sort_values(by='Salary', ascending=False)
print(sorted_desc)
```
Output:
```
      Name  Age  Salary
3    David   35   90000
1      Bob   30   80000
0    Alice   25   50000
2  Charlie   22   45000
```
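As a quick check of the default behavior, the sketch below rebuilds the sample data so it runs on its own and compares index labels before and after a descending sort. It shows that the original `df` keeps its row order, while the sorted copy carries the original labels in the new order.

```python
import pandas as pd

# Same sample data as above
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 22, 35],
                   'Salary': [50000, 80000, 45000, 90000]})

# sort_values() returns a new, reordered DataFrame by default
sorted_desc = df.sort_values(by='Salary', ascending=False)

# The original DataFrame keeps its row order
print(df.index.tolist())             # [0, 1, 2, 3]
# The sorted copy keeps each row's original index label
print(sorted_desc.index.tolist())    # [3, 1, 0, 2]
print(sorted_desc['Name'].tolist())  # ['David', 'Bob', 'Alice', 'Charlie']
```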
2. Handling Missing Values (NaN)
```python
import numpy as np
import pandas as pd

# DataFrame with a missing value (NaN) in 'Age'
data_nan = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
            'Age': [25, np.nan, 22, 35]}
df_nan = pd.DataFrame(data_nan)

# Sort with NaN at the top
sorted_nan_first = df_nan.sort_values(by='Age', na_position='first')
print(sorted_nan_first)
```
Output (the NaN row is placed first; `Age` is stored as floats because the column contains NaN):
```
      Name   Age
1      Bob   NaN
2  Charlie  22.0
0    Alice  25.0
3    David  35.0
```
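For comparison, here is a minimal sketch of the default behavior on the same data: when `na_position` is omitted, the NaN row goes to the end, and that placement does not flip when `ascending=False`.

```python
import numpy as np
import pandas as pd

df_nan = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                       'Age': [25, np.nan, 22, 35]})

# Without na_position, NaN rows go to the end (the default is 'last')
sorted_nan_last = df_nan.sort_values(by='Age')
print(sorted_nan_last['Name'].tolist())  # ['Charlie', 'Alice', 'David', 'Bob']

# NaN placement is independent of the sort direction
sorted_nan_desc = df_nan.sort_values(by='Age', ascending=False)
print(sorted_nan_desc['Name'].tolist())  # ['David', 'Alice', 'Charlie', 'Bob']
```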
3. In-Place Sorting
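Modify the original DataFrame instead of creating a new one. With `inplace=True`, `sort_values()` returns `None`:
```python
df.sort_values(by='Name', inplace=True)
print(df)
```
Output (the sample names are already in alphabetical order, so the row order does not change):
```
      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   80000
2  Charlie   22   45000
3    David   35   90000
```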
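Because the sample names are already alphabetical, the in-place example does not visibly reorder anything. The sketch below instead sorts a fresh copy of the sample data by `Age` in place, so the effect shows up in the row order and index labels, and it captures the method's return value.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 22, 35],
                   'Salary': [50000, 80000, 45000, 90000]})

# With inplace=True the method reorders df itself and returns None
result = df.sort_values(by='Age', inplace=True)
print(result)               # None
print(df['Name'].tolist())  # ['Charlie', 'Alice', 'Bob', 'David']
print(df.index.tolist())    # [2, 0, 1, 3]
```

A common mistake follows from this: writing `df = df.sort_values(by='Age', inplace=True)` replaces `df` with `None`.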
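4. Resetting the Index After Sorting
Sorting keeps each row's original index label. To relabel the sorted rows 0 to n-1 in their new order, chain `reset_index(drop=True)`; `drop=True` discards the old index instead of inserting it as a column:
```python
sorted_df = df.sort_values(by='Age').reset_index(drop=True)
print(sorted_df)
```
Output (the index now runs 0, 1, 2, 3 in the sorted order):
```
      Name  Age  Salary
0  Charlie   22   45000
1    Alice   25   50000
2      Bob   30   80000
3    David   35   90000
```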
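pandas 1.0 and later also accept `ignore_index=True` in `sort_values()`, which relabels the result 0 to n-1 in a single call. The sketch below rebuilds the sample data and checks that this produces the same DataFrame as the two-step approach.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 22, 35],
                   'Salary': [50000, 80000, 45000, 90000]})

# Two-step approach: sort, then discard the old labels
via_reset = df.sort_values(by='Age').reset_index(drop=True)

# Single call: ignore_index=True labels the result 0..n-1 (pandas 1.0+)
via_ignore = df.sort_values(by='Age', ignore_index=True)

print(via_ignore.index.tolist())     # [0, 1, 2, 3]
print(via_reset.equals(via_ignore))  # True
```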
Key Notes
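- Returns a copy: By default, `sort_values()` returns a new DataFrame and leaves the original unchanged. Use `inplace=True` to modify the original.
- Multiple columns: To sort by several columns, pass a list, e.g. `df.sort_values(by=['Column1', 'Column2'])`. Rows are ordered by the first column, and ties are broken by the next.
- Sort algorithm: The `kind` parameter selects the sorting algorithm. The default `'quicksort'` is not stable; use `kind='mergesort'` or `kind='stable'` to keep tied rows in their original order. `kind` only applies when sorting by a single column.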
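The multiple-column and stable-sort notes can be sketched together. The `Dept` column and the extra `Eve` row are hypothetical, added only to create repeated values. The first call sorts by `Dept` ascending and then by `Salary` descending within each department by passing a list to `ascending`; the second uses `kind='stable'` so rows with the same `Dept` keep their original relative order.

```python
import pandas as pd

# Hypothetical data: 'Dept' has repeated values, so a single-column sort has ties
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                   'Dept': ['IT', 'HR', 'IT', 'HR', 'IT'],
                   'Salary': [50000, 80000, 45000, 90000, 60000]})

# Multiple columns: Dept A-Z, then Salary high-to-low within each Dept
multi = df.sort_values(by=['Dept', 'Salary'], ascending=[True, False])
print(multi['Name'].tolist())   # ['David', 'Bob', 'Eve', 'Alice', 'Charlie']

# Stable single-column sort: tied rows stay in their original order
stable = df.sort_values(by='Dept', kind='stable')
print(stable['Name'].tolist())  # ['Bob', 'David', 'Alice', 'Charlie', 'Eve']
```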
Common Pitfalls
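- Column name typos: The `by` column must exist exactly as spelled (column names are case-sensitive); otherwise pandas raises a `KeyError`.
- Leftover index labels: Sorted rows keep their original index labels. Use `reset_index(drop=True)` to get a clean 0 to n-1 index after sorting.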
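A small sketch of guarding against the typo pitfall: check membership in `df.columns` before sorting. The misspelled `'age'` is deliberate, and the `try` block shows the `KeyError` that `sort_values()` raises when the name is passed anyway.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

column = 'age'  # deliberate typo: the real column is 'Age'
if column in df.columns:
    print(df.sort_values(by=column))
else:
    print(f"Column {column!r} not found; available: {list(df.columns)}")

# Passing the misspelled name directly raises KeyError
raised = False
try:
    df.sort_values(by=column)
except KeyError as exc:
    raised = True
    print('KeyError:', exc)
```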
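This method works for any column whose values can be compared with one another, such as numbers, strings, and datetimes. String sorting is case-sensitive, so uppercase letters sort before lowercase ones; for case-insensitive sorting, pass `key=lambda col: col.str.lower()` (the `key` parameter requires pandas 1.1 or later).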
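The sketch below, using hypothetical names with mixed capitalization, compares the default ordering with a `key`-based case-insensitive sort. The `key` function receives the column as a Series and must return a Series of the same length.

```python
import pandas as pd

# Hypothetical names with mixed capitalization
df = pd.DataFrame({'Name': ['bob', 'Alice', 'charlie', 'David']})

# Default comparison: uppercase letters sort before lowercase ones
case_sensitive = df.sort_values(by='Name')
print(case_sensitive['Name'].tolist())    # ['Alice', 'David', 'bob', 'charlie']

# key lowercases the values for comparison only; the stored names are unchanged
case_insensitive = df.sort_values(by='Name', key=lambda col: col.str.lower())
print(case_insensitive['Name'].tolist())  # ['Alice', 'bob', 'charlie', 'David']
```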