To select the first row in each GROUP BY
group in SQL, use the ROW_NUMBER()
window function (standard SQL) or database-specific methods like DISTINCT ON
(PostgreSQL). Below are the most efficient approaches:
1. Using ROW_NUMBER()
(Standard SQL, Works in Most Databases)
Assign a row number to each group and select the first row from each group.
Example Table (sales
):
sale_id | region | sale_date | amount |
---|---|---|---|
1 | East | 2023-01-05 | 100 |
2 | East | 2023-01-01 | 150 |
3 | West | 2023-01-03 | 200 |
4 | West | 2023-01-02 | 250 |
Query:
WITH ranked_sales AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY region
ORDER BY sale_date -- Order by the column defining "first" (e.g., earliest date)
) AS rn
FROM sales
)
SELECT sale_id, region, sale_date, amount
FROM ranked_sales
WHERE rn = 1; -- Select first row per group
Result:
sale_id | region | sale_date | amount |
---|---|---|---|
2 | East | 2023-01-01 | 150 |
4 | West | 2023-01-02 | 250 |
2. Using DISTINCT ON
(PostgreSQL Only)
Directly select the first row per group without a CTE:
SELECT DISTINCT ON (region) *
FROM sales
ORDER BY region, sale_date; -- Order by group column and the column defining "first"
3. Using a Subquery with MIN()
/MAX()
(Works in MySQL, SQLite, etc.)
Find the earliest/latest value in the group and join back to the original table:
SELECT s.*
FROM sales s
INNER JOIN (
SELECT
region,
MIN(sale_date) AS earliest_date -- Replace with MAX() for the latest row
FROM sales
GROUP BY region
) sub ON s.region = sub.region AND s.sale_date = sub.earliest_date;
Limitations:
- Fails if multiple rows share the same
MIN()
/MAX()
value in a group. - Requires a unique identifier for exact matches.
Key Considerations
- Ordering: Use
ORDER BY
in the window function to define what “first” means (e.g., earliest date, smallest ID). - Ties: Use
RANK()
instead ofROW_NUMBER()
if you want to include ties (multiple “first” rows). - Performance: The
ROW_NUMBER()
method is efficient for large datasets.
Handling Ties (Multiple “First” Rows)
To include all rows tied for first place, use RANK()
:
WITH ranked_sales AS (
SELECT
*,
RANK() OVER (PARTITION BY region ORDER BY sale_date) AS rk
FROM sales
)
SELECT sale_id, region, sale_date, amount
FROM ranked_sales
WHERE rk = 1;
Database-Specific Notes
Database | Best Method |
---|---|
PostgreSQL | DISTINCT ON |
MySQL | Subquery with MIN() /MAX() |
SQL Server | ROW_NUMBER() |
Oracle | ROW_NUMBER() |
SQLite | ROW_NUMBER() (v3.25+) |
Summary
- Standard Approach: Use
ROW_NUMBER()
with a CTE. - PostgreSQL: Prefer
DISTINCT ON
for simplicity. - Legacy Databases: Use subqueries with
MIN()
/MAX()
and joins.