How to select first row in each GROUP BY group?

To select the first row in each GROUP BY group in SQL, use the ROW_NUMBER() window function (standard SQL) or database-specific methods like DISTINCT ON (PostgreSQL). Below are the most efficient approaches:

1. Using `ROW_NUMBER()` (Standard SQL, Works in Most Databases)

Assign a row number to each group and select the first row from each group.

Example Table (sales):

sale_id	region	sale_date	amount
1	East	2023-01-05	100
2	East	2023-01-01	150
3	West	2023-01-03	200
4	West	2023-01-02	250

Query:

WITH ranked_sales AS (
  SELECT 
    *,
    ROW_NUMBER() OVER (
      PARTITION BY region 
      ORDER BY sale_date  -- Order by the column defining "first" (e.g., earliest date)
    ) AS rn
  FROM sales
)
SELECT sale_id, region, sale_date, amount
FROM ranked_sales
WHERE rn = 1;  -- Select first row per group

Result:

sale_id	region	sale_date	amount
2	East	2023-01-01	150
4	West	2023-01-02	250

2. Using `DISTINCT ON` (PostgreSQL Only)

Directly select the first row per group without a CTE:

SELECT DISTINCT ON (region) *
FROM sales
ORDER BY region, sale_date;  -- Order by group column and the column defining "first"

3. Using a Subquery with `MIN()`/`MAX()` (Works in MySQL, SQLite, etc.)

Find the earliest/latest value in the group and join back to the original table:

SELECT s.*
FROM sales s
INNER JOIN (
  SELECT 
    region, 
    MIN(sale_date) AS earliest_date  -- Replace with MAX() for the latest row
  FROM sales
  GROUP BY region
) sub ON s.region = sub.region AND s.sale_date = sub.earliest_date;

Limitations:

Fails if multiple rows share the same MIN()/MAX() value in a group.
Requires a unique identifier for exact matches.

Key Considerations

Ordering: Use ORDER BY in the window function to define what “first” means (e.g., earliest date, smallest ID).
Ties: Use RANK() instead of ROW_NUMBER() if you want to include ties (multiple “first” rows).
Performance: The ROW_NUMBER() method is efficient for large datasets.

Handling Ties (Multiple “First” Rows)

To include all rows tied for first place, use RANK():

WITH ranked_sales AS (
  SELECT 
    *,
    RANK() OVER (PARTITION BY region ORDER BY sale_date) AS rk
  FROM sales
)
SELECT sale_id, region, sale_date, amount
FROM ranked_sales
WHERE rk = 1;

Database-Specific Notes

Database	Best Method
PostgreSQL	`DISTINCT ON`
MySQL	Subquery with `MIN()`/`MAX()`
SQL Server	`ROW_NUMBER()`
Oracle	`ROW_NUMBER()`
SQLite	`ROW_NUMBER()` (v3.25+)

Summary

Standard Approach: Use ROW_NUMBER() with a CTE.
PostgreSQL: Prefer DISTINCT ON for simplicity.
Legacy Databases: Use subqueries with MIN()/MAX() and joins.

1. Using ROW_NUMBER() (Standard SQL, Works in Most Databases)

2. Using DISTINCT ON (PostgreSQL Only)

3. Using a Subquery with MIN()/MAX() (Works in MySQL, SQLite, etc.)