How to sort a pandas DataFrame by one column?

To sort a pandas DataFrame by one column, use the sort_values() method. Below is a detailed explanation with examples:

Key Parameters of sort_values()

Parameter      Description
by             Column name (string) to sort by.
ascending      True (default) for ascending order, False for descending.
inplace        If True, modifies the DataFrame in-place (default False).
na_position    Where to place NaN values: 'last' (default) or 'first'.

Examples

1. Basic Sorting (Ascending/Descending)

import pandas as pd

# Sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 22, 35],
        'Salary': [50000, 80000, 45000, 90000]}
df = pd.DataFrame(data)

# Sort by 'Age' in ascending order (default)
sorted_asc = df.sort_values(by='Age')
print(sorted_asc)

Output:

      Name  Age  Salary
2  Charlie   22   45000
0    Alice   25   50000
1      Bob   30   80000
3    David   35   90000

# Sort by 'Salary' in descending order
sorted_desc = df.sort_values(by='Salary', ascending=False)
print(sorted_desc)

Output:

      Name  Age  Salary
3    David   35   90000
1      Bob   30   80000
0    Alice   25   50000
2  Charlie   22   45000

2. Handling Missing Values (NaN)

# DataFrame with NaN
import numpy as np
data_nan = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
            'Age': [25, np.nan, 22, 35]}
df_nan = pd.DataFrame(data_nan)

# Sort with NaN at the top
sorted_nan_first = df_nan.sort_values(by='Age', na_position='first')
print(sorted_nan_first)

Output (the NaN row is placed first):

      Name   Age
1      Bob   NaN
2  Charlie  22.0
0    Alice  25.0
3    David  35.0
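
As a complement to the example above, here is a minimal sketch of how na_position interacts with ascending=False. It rebuilds the same sample data so that it runs on its own; the variable name sorted_desc_nan is only illustrative.

```python
import numpy as np
import pandas as pd

# Same sample data as in the NaN example
df_nan = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                       'Age': [25, np.nan, 22, 35]})

# Descending sort with the default na_position='last':
# the NaN row still ends up at the bottom, not at the top
sorted_desc_nan = df_nan.sort_values(by='Age', ascending=False)
print(sorted_desc_nan['Name'].tolist())  # ['David', 'Alice', 'Charlie', 'Bob']
```

In other words, na_position is applied independently of the sort direction.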

3. In-Place Sorting

Modify the original DataFrame instead of creating a new one. With inplace=True the method returns None, so do not assign its result:

df.sort_values(by='Name', inplace=True)
print(df)

Output:

      Name  Age  Salary
0    Alice   25   50000
1      Bob   30   80000
2  Charlie   22   45000
3    David   35   90000
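
A common slip with in-place sorting is assigning the return value back to a variable. The short sketch below uses a small, hypothetical two-row DataFrame to show what happens in that case.

```python
import pandas as pd

# Hypothetical sample data, deliberately out of order
people = pd.DataFrame({'Name': ['Bob', 'Alice'], 'Age': [30, 25]})

# With inplace=True the DataFrame itself is reordered and None is returned
result = people.sort_values(by='Age', inplace=True)

print(result)                    # None
print(people['Name'].tolist())   # ['Alice', 'Bob']
```

If a sorted copy is needed, omit inplace and assign the returned DataFrame instead.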

4. Resetting the Index After Sorting

Sorted rows keep their original index labels. To get a fresh 0-based index that follows the sorted order, chain reset_index(drop=True):

sorted_df = df.sort_values(by='Age').reset_index(drop=True)
print(sorted_df)

Output (index now starts at 0 and increments sequentially):

      Name  Age  Salary
0  Charlie   22   45000
1    Alice   25   50000
2      Bob   30   80000
3    David   35   90000
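
As an alternative sketch, recent pandas versions (1.0 and later) also accept ignore_index=True directly in sort_values(), which should give the same result in a single call. The sample data is rebuilt here so that the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 22, 35]})

# Two ways to get a clean 0..n-1 index after sorting
chained = df.sort_values(by='Age').reset_index(drop=True)
one_call = df.sort_values(by='Age', ignore_index=True)

print(one_call.index.tolist())    # [0, 1, 2, 3]
print(one_call['Name'].tolist())  # ['Charlie', 'Alice', 'Bob', 'David']
```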

Key Notes

  1. Returns a New DataFrame: By default, sort_values() returns a new DataFrame and leaves the original unchanged. Use inplace=True to modify the original.
  2. Multiple Columns: To sort by multiple columns, pass a list:
    df.sort_values(by=['Column1', 'Column2']).
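  3. Sort Algorithm: The kind parameter selects the algorithm: 'quicksort' (default), 'mergesort', 'heapsort', or 'stable'. Use kind='mergesort' or kind='stable' when rows with equal values must keep their original relative order. For DataFrames, kind is only applied when sorting by a single column.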
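
The notes on multiple columns and stable sorting can be sketched as follows. The staff DataFrame and its column names are hypothetical and chosen only to produce ties.

```python
import pandas as pd

# Hypothetical sample data: two departments, two rows sharing a salary
staff = pd.DataFrame({
    'Dept': ['Sales', 'IT', 'Sales', 'IT'],
    'Name': ['Dana', 'Eli', 'Aaron', 'Cara'],
    'Salary': [50000, 70000, 65000, 50000],
})

# Multiple columns: Dept ascending, then Salary descending within each Dept
by_two = staff.sort_values(by=['Dept', 'Salary'], ascending=[True, False])
print(by_two['Name'].tolist())   # ['Eli', 'Cara', 'Aaron', 'Dana']

# Single column with a stable algorithm: the two 50000 rows
# (Dana, then Cara) keep their original relative order
stable = staff.sort_values(by='Salary', kind='mergesort')
print(stable['Name'].tolist())   # ['Dana', 'Cara', 'Aaron', 'Eli']
```

Note that ascending also accepts a list, with one entry per column in by.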

Common Pitfalls

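  • Column Name Typos: Ensure the by column exists. Column names are case-sensitive, and a missing or misspelled name raises a KeyError.
  • Ignoring Index: Sorted rows keep their original index labels. Use reset_index(drop=True) to clean up the index after sorting.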
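
One way to guard against the column-name pitfall is to check for the column before sorting. This is only a sketch; the variable names column, result, and caught are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [30, 25]})

column = 'age'  # illustrative typo: the actual column is 'Age'

# Option 1: check up front and fall back gracefully
if column in df.columns:
    result = df.sort_values(by=column)
else:
    print(f"Column {column!r} not found; available: {list(df.columns)}")
    result = None

# Option 2: let pandas raise and handle the KeyError
caught = False
try:
    df.sort_values(by=column)
except KeyError:
    caught = True
print(caught)  # True
```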

sort_values() works for numeric, string, and datetime columns. For string columns, sorting is case-sensitive, so uppercase letters sort before lowercase ones. For case-insensitive sorting, pass key=lambda col: col.str.lower() (the key parameter requires pandas 1.1 or later).
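
The difference between the default and a case-insensitive sort can be seen in a short sketch, using a hypothetical names DataFrame with mixed capitalization:

```python
import pandas as pd

# Hypothetical sample data with mixed-case names
names = pd.DataFrame({'Name': ['bob', 'Alice', 'charlie', 'David']})

# Default: case-sensitive, uppercase letters sort before lowercase
default_order = names.sort_values(by='Name')
print(default_order['Name'].tolist())  # ['Alice', 'David', 'bob', 'charlie']

# Case-insensitive: compare lowercased values, keep the original text
ci_order = names.sort_values(by='Name', key=lambda col: col.str.lower())
print(ci_order['Name'].tolist())       # ['Alice', 'bob', 'charlie', 'David']
```

The key function receives the whole column as a Series and must return a Series of the same length; it only affects the comparison, not the stored values.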
