Mastering the Art of Selecting the First Row from Each GROUP BY in SQL

Answer: To select the first row in each GROUP BY group, you can use an aggregate function such as MIN().

Selecting the first row in each GROUP BY group lets you find the earliest entry in every group and retrieve its other columns as well. This article walks through several ways to do this in SQL.

Approaches to Select the First Row in each Group By cluster in SQL

First, let’s create a dataset and use this table as the reference for all the methods discussed.

Illustration:

-- Create the students table
CREATE TABLE students (
    id INT PRIMARY KEY,
    student_name VARCHAR(100),
    sport VARCHAR(50),
    enrollment_date DATE
);

INSERT INTO students (id, student_name, sport, enrollment_date) VALUES
(1, 'Chahar', 'Basketball', '2023-01-20'),
(2, 'Deepak', 'Basketball', '2023-03-05'),
(3, 'Arun', 'Football', '2023-01-10'),
(4, 'Bhuvan', 'Football', '2023-02-15'),
(5, 'Ram', 'Kabaddi', '2023-02-01');

SELECT * FROM students;

Outcome:

Approach 1: Utilizing aggregate function min() in SQL

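The MIN() aggregate function returns the lowest value within each group. Because an aggregate returns only that value and not the rest of the row, the query below joins the per-group minimum back to the table to recover the full row.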

Illustration:

SELECT s.id, s.student_name, s.sport, s.enrollment_date
FROM students s
JOIN (
    SELECT sport, MIN(enrollment_date) AS first_enrollment
    FROM students
    GROUP BY sport
) t ON s.sport = t.sport AND s.enrollment_date = t.first_enrollment;

Outcome:

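Clarification: The subquery computes MIN(enrollment_date) for each sport as first_enrollment, and the join returns the student rows whose enrollment_date matches it. If two students in a sport share the earliest date, both rows are returned.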
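To see how the join behaves when two students share the earliest date, the query can be run against a small in-memory database. The sketch below is only an illustration. It assumes Python's built-in sqlite3 module instead of a database server, stores dates as ISO text, and adds a hypothetical sixth student, Tara, who is not part of the article's dataset.

```python
import sqlite3

# In-memory SQLite database; the table mirrors the article's students table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    " id INTEGER PRIMARY KEY, student_name TEXT, sport TEXT, enrollment_date TEXT)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        (1, "Chahar", "Basketball", "2023-01-20"),
        (2, "Deepak", "Basketball", "2023-03-05"),
        (3, "Arun", "Football", "2023-01-10"),
        (4, "Bhuvan", "Football", "2023-02-15"),
        (5, "Ram", "Kabaddi", "2023-02-01"),
        # Hypothetical extra row (not in the article) that ties with Ram.
        (6, "Tara", "Kabaddi", "2023-02-01"),
    ],
)

# Same MIN() + JOIN query as above, with an ORDER BY for stable output.
min_join_rows = conn.execute(
    """
    SELECT s.id, s.student_name, s.sport, s.enrollment_date
    FROM students s
    JOIN (
        SELECT sport, MIN(enrollment_date) AS first_enrollment
        FROM students
        GROUP BY sport
    ) t ON s.sport = t.sport AND s.enrollment_date = t.first_enrollment
    ORDER BY s.sport, s.id
    """
).fetchall()

for row in min_join_rows:
    print(row)
```

Both Kabaddi students come back, because the join matches every row that carries the minimum date for its sport.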

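Approach 2: Using DISTINCT ON in PostgreSQL

DISTINCT ON (sport) keeps only the first row for each distinct value of sport, where "first" is decided by the ORDER BY clause. The ORDER BY must start with the DISTINCT ON expression; the remaining sort keys choose which row survives in each group. DISTINCT ON is a PostgreSQL extension and is not available in most other databases.

Illustration:

SELECT DISTINCT ON (sport) id, student_name, sport, enrollment_date
FROM students
ORDER BY sport, enrollment_date ASC;

Outcome:

Clarification: DISTINCT ON (sport) returns one row per sport, the one with the earliest enrollment_date: Chahar for Basketball, Arun for Football, and Ram for Kabaddi. Do not append FETCH FIRST 1 ROW ONLY to this query: it would cut the whole result to a single row (Basketball, the first sport alphabetically) instead of one row per group.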
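Since DISTINCT ON exists only in PostgreSQL, its behaviour can be easier to follow outside SQL. The following plain-Python sketch is an analogy rather than PostgreSQL's actual implementation: it sorts the article's five rows the way the ORDER BY would and keeps the first row of each sport. The variable names are illustrative.

```python
from itertools import groupby
from operator import itemgetter

# (id, student_name, sport, enrollment_date), as in the article's dataset.
students = [
    (1, "Chahar", "Basketball", "2023-01-20"),
    (2, "Deepak", "Basketball", "2023-03-05"),
    (3, "Arun", "Football", "2023-01-10"),
    (4, "Bhuvan", "Football", "2023-02-15"),
    (5, "Ram", "Kabaddi", "2023-02-01"),
]

# Equivalent of: ORDER BY sport, enrollment_date
ordered = sorted(students, key=itemgetter(2, 3))

# Equivalent of: DISTINCT ON (sport), i.e. keep the first row of each run
# of rows that share the same sport.
first_per_sport = [next(rows) for _, rows in groupby(ordered, key=itemgetter(2))]

for row in first_per_sport:
    print(row)
```

The sort decides which row counts as "first"; changing the second sort key changes which row survives in each group.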

Approach 3: Implementing JOIN with a subquery that orders students in SQL

This approach numbers the rows inside a subquery with ROW_NUMBER() and joins the numbered rows back to the table on id, keeping only the rows numbered 1. It works in PostgreSQL and in other databases that support window functions.

Illustration:

-- Number students by enrollment date within each sport, then join back to get the first enrollment per sport
-- (the alias is row_num rather than rank, because RANK is a reserved word in some databases)
SELECT s.id, s.student_name, s.sport, s.enrollment_date
FROM students s
JOIN (
    SELECT
        id,
        sport,
        ROW_NUMBER() OVER (PARTITION BY sport ORDER BY enrollment_date) AS row_num
    FROM students
) ranked_students ON s.id = ranked_students.id
WHERE ranked_students.row_num = 1;

Outcome:

Clarification: The subquery numbers the students within each sport by enrollment date (PARTITION BY sport ORDER BY enrollment_date). The join on id then returns the full row for every student numbered 1 in their group.

Approach 4: Using ROW_NUMBER() in SQL

The ROW_NUMBER() function assigns a sequential number to each row according to the ORDER BY in its OVER clause. Without PARTITION BY, the numbering runs across the whole table. This technique works in PostgreSQL.

Illustration:

WITH RankedStudents AS (
    SELECT id, student_name, sport, enrollment_date,
           ROW_NUMBER() OVER (ORDER BY enrollment_date ASC) AS row_num
    FROM students
)
SELECT id, student_name, sport, enrollment_date
FROM RankedStudents
WHERE row_num = 1;

Outcome:

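Clarification: Because the window has no PARTITION BY, the rows are numbered across the whole table, and row_num = 1 matches only the single earliest enrollment overall (Arun, Football, 2023-01-10). To get one row per sport, add PARTITION BY as shown next.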

Using ROW_NUMBER() with PARTITION BY in SQL

Adding PARTITION BY to ROW_NUMBER() splits the data into groups and restarts the numbering in each one, so the first row of every group can be selected. This works in PostgreSQL.

Illustration:

WITH RankedStudents AS (
    SELECT id, student_name, sport, enrollment_date,
        ROW_NUMBER() OVER (PARTITION BY sport ORDER BY enrollment_date ASC) AS row_num
    FROM students
)
SELECT id, student_name, sport, enrollment_date
FROM RankedStudents
WHERE row_num = 1;

Result:

Clarification: PARTITION BY sport restarts the numbering for each sport, so row_num = 1 marks the earliest enrollment in every group. Here Chahar, Arun, and Ram were the earliest to enroll in their respective sports.
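The difference between the two ROW_NUMBER() queries can be checked side by side. This is a minimal sketch under stated assumptions: it uses Python's built-in sqlite3 module with SQLite 3.25 or later (the first release with window functions), stores dates as ISO text, and adds id as a tie-breaking sort key that the article's queries do not have. The helper first_rows is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    " id INTEGER PRIMARY KEY, student_name TEXT, sport TEXT, enrollment_date TEXT)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        (1, "Chahar", "Basketball", "2023-01-20"),
        (2, "Deepak", "Basketball", "2023-03-05"),
        (3, "Arun", "Football", "2023-01-10"),
        (4, "Bhuvan", "Football", "2023-02-15"),
        (5, "Ram", "Kabaddi", "2023-02-01"),
    ],
)


def first_rows(window_clause):
    """Return (sport, student_name) for the rows numbered 1 under the given window."""
    query = f"""
        WITH RankedStudents AS (
            SELECT student_name, sport,
                   ROW_NUMBER() OVER ({window_clause}) AS row_num
            FROM students
        )
        SELECT sport, student_name
        FROM RankedStudents
        WHERE row_num = 1
        ORDER BY sport
    """
    return conn.execute(query).fetchall()


# No PARTITION BY: one numbering for the whole table, so a single row.
whole_table = first_rows("ORDER BY enrollment_date, id")

# PARTITION BY sport: numbering restarts per sport, so one row per group.
per_sport = first_rows("PARTITION BY sport ORDER BY enrollment_date, id")

print(whole_table)
print(per_sport)
```

Adding a unique column such as id to the window's ORDER BY makes the choice deterministic if two students ever share an enrollment date.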

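Approach 5: Using the FILTER clause with MIN() in SQL

The FILTER clause restricts the rows that an aggregate function sees. Here MIN(student_name) is applied only to the rows whose enrollment_date equals the earliest date for that sport, which yields the first student in each group. If several students share that date, the alphabetically first name is returned. The FILTER clause is supported in PostgreSQL.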

Illustration:

SELECT
    sport,
    MIN(student_name) FILTER (
        WHERE enrollment_date = (SELECT MIN(enrollment_date) 
                                 FROM students s2 
                                 WHERE s2.sport = s1.sport)
    ) AS first_student,
    MIN(enrollment_date) AS first_enrollment
FROM students s1
GROUP BY sport;

Result:

Clarification: The FILTER (WHERE enrollment_date = ...) condition limits the aggregate to the students who registered first for each sport, so first_student holds that student's name and first_enrollment holds the date.

How to Address Ties in the Dataset?

Oracle's KEEP (DENSE_RANK FIRST ORDER BY ...) clause handles ties inside the aggregate itself. Unlike the ROW_NUMBER() approach, it needs no subquery, which is useful when you want the first entry of each group in a single GROUP BY query. This syntax is specific to Oracle Database.

Illustration:

-- 1. Create the 'students' table
CREATE TABLE students (
    id INT,
    student_name VARCHAR2(100),
    sport VARCHAR2(50),
    enrollment_date DATE
);

-- 2. Insert records into the 'students' table
-- INSERT ALL is used because older Oracle releases do not accept a multi-row VALUES list
INSERT ALL
    INTO students VALUES (1, 'Chahar', 'Basketball', TO_DATE('2023-01-20', 'YYYY-MM-DD'))
    INTO students VALUES (2, 'Deepak', 'Basketball', TO_DATE('2023-03-05', 'YYYY-MM-DD'))
    INTO students VALUES (3, 'Arun', 'Football', TO_DATE('2023-01-10', 'YYYY-MM-DD'))
    INTO students VALUES (4, 'Bhuvan', 'Football', TO_DATE('2023-02-15', 'YYYY-MM-DD'))
    INTO students VALUES (5, 'Ram', 'Kabaddi', TO_DATE('2023-02-01', 'YYYY-MM-DD'))
SELECT 1 FROM dual;

-- 3. Select the earliest-enrolled student for each sport
SELECT
    sport,
    MIN(student_name) KEEP (DENSE_RANK FIRST ORDER BY enrollment_date ASC) AS student_name,
    TO_CHAR(MIN(enrollment_date), 'DD-MON-YYYY') AS enrollment_date
FROM students
GROUP BY sport;

Result:

Clarification: KEEP (DENSE_RANK FIRST ORDER BY enrollment_date) restricts the aggregate to the rows with the earliest enrollment date in each sport, so the query returns the student who registered first for each sport. If several students tie on that date, the aggregate that KEEP is attached to (such as MIN) determines which value is returned. Kabaddi still gets its own row even though it has only one student.
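KEEP (DENSE_RANK FIRST) is Oracle-only, but the way ranking functions treat ties can be tried in any database with window functions. The sketch below is an illustration under assumptions, not part of the original article: it uses Python's built-in sqlite3 module with SQLite 3.25 or later, a hypothetical student Esha who ties with Chahar, and a hypothetical helper named top_names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students ("
    " id INTEGER PRIMARY KEY, student_name TEXT, sport TEXT, enrollment_date TEXT)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [
        (1, "Chahar", "Basketball", "2023-01-20"),
        (2, "Deepak", "Basketball", "2023-03-05"),
        # Hypothetical row, not in the article: same sport and date as Chahar.
        (6, "Esha", "Basketball", "2023-01-20"),
    ],
)


def top_names(ranking_function):
    """Return the names ranked 1 per sport under the given ranking function."""
    query = f"""
        SELECT student_name
        FROM (
            SELECT id, student_name,
                   {ranking_function} OVER (
                       PARTITION BY sport ORDER BY enrollment_date
                   ) AS rnk
            FROM students
        ) AS ranked
        WHERE rnk = 1
        ORDER BY id
    """
    return [name for (name,) in conn.execute(query)]


# DENSE_RANK gives tied rows the same rank, so both tied students are kept.
tied_rows = top_names("DENSE_RANK()")

# ROW_NUMBER always numbers rows 1, 2, 3, ..., so exactly one student is kept;
# which of the tied students it is remains unspecified without a tie-breaker.
single_row = top_names("ROW_NUMBER()")

print(tied_rows)
print(single_row)
```

Use DENSE_RANK() (or RANK()) when every tied row should be reported, and ROW_NUMBER() with an extra sort key when exactly one row per group is required.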

Alternative Approach with PL/SQL

When the result has to be processed row by row, a PL/SQL cursor can wrap the same query. The cursor below uses ROW_NUMBER() with PARTITION BY and ORDER BY in a subquery to pick the first record of each group, and the loop prints one line per sport.

Illustration:

CREATE TABLE students (
    id INT PRIMARY KEY,
    student_name VARCHAR(50),
    sport VARCHAR(50),
    enrollment_date DATE
);

-- INSERT ALL is used because older Oracle releases do not accept a multi-row VALUES list
INSERT ALL
    INTO students VALUES (1, 'Arun', 'Football', TO_DATE('2023-01-10', 'YYYY-MM-DD'))
    INTO students VALUES (2, 'Bhuvan', 'Football', TO_DATE('2023-02-15', 'YYYY-MM-DD'))
    INTO students VALUES (3, 'Chahar', 'Basketball', TO_DATE('2023-01-20', 'YYYY-MM-DD'))
    INTO students VALUES (4, 'Deepak', 'Basketball', TO_DATE('2023-03-05', 'YYYY-MM-DD'))
    INTO students VALUES (5, 'Ram', 'Kabaddi', TO_DATE('2023-02-01', 'YYYY-MM-DD'))
SELECT 1 FROM dual;

SET SERVEROUTPUT ON;
DECLARE
    CURSOR c_students IS 
        SELECT sport, student_name, enrollment_date
        FROM (
            SELECT sport, student_name, enrollment_date,
                   ROW_NUMBER() OVER (PARTITION BY sport ORDER BY enrollment_date ASC) AS row_num
            FROM students
        )
        WHERE row_num = 1;
BEGIN
    FOR student_rec IN c_students LOOP
        DBMS_OUTPUT.PUT_LINE(student_rec.sport || ': ' || student_rec.student_name);
    END LOOP;
END;
/

Result:

Clarification: The PL/SQL block loops over the cursor and prints the first-enrolled student for each sport: Chahar for Basketball, Arun for Football, and Ram for Kabaddi.

Practical Applications of GROUP BY Clause in SQL

1. Healthcare & Patient Documentation: Healthcare facilities keep records of each patient’s medical history. With these techniques, you can extract the first appointment date for each patient.

Illustration:

-- Create the appointments table
CREATE TABLE appointments (
    patient_id INT,
    appointment_date DATE
);

-- Insert records into the appointments table
INSERT INTO appointments (patient_id, appointment_date) VALUES
(101, TO_DATE('2023-03-15', 'YYYY-MM-DD')),
(102, TO_DATE('2023-02-10', 'YYYY-MM-DD')),
(103, TO_DATE('2023-04-20', 'YYYY-MM-DD'));

-- Query to display each patient_id with its first (earliest) appointment date
SELECT patient_id, MIN(appointment_date) AS first_appointment
FROM appointments
GROUP BY patient_id;

Result:

Clarification: MIN(appointment_date) AS first_appointment, combined with GROUP BY patient_id, returns the earliest appointment for each patient. In the sample data every patient has a single appointment, so each row is returned unchanged. The example uses Oracle-style TO_DATE calls.
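The per-patient grouping only becomes visible once a patient has more than one appointment. As a hedged sketch, the snippet below uses Python's built-in sqlite3 module, ISO text dates, and two hypothetical follow-up appointments that are not in the article's data. It compares grouping by patient_id with filtering on the table-wide minimum date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE appointments (patient_id INTEGER, appointment_date TEXT)")
conn.executemany(
    "INSERT INTO appointments VALUES (?, ?)",
    [
        (101, "2023-03-15"),
        (102, "2023-02-10"),
        (103, "2023-04-20"),
        # Hypothetical additional visits, added only for this illustration.
        (101, "2023-05-02"),
        (102, "2023-01-05"),
    ],
)

# One row per patient: the earliest appointment in each patient's group.
first_appointments = conn.execute(
    """
    SELECT patient_id, MIN(appointment_date) AS first_appointment
    FROM appointments
    GROUP BY patient_id
    ORDER BY patient_id
    """
).fetchall()

# For contrast: comparing against the table-wide minimum returns only the
# single earliest appointment overall, not one row per patient.
overall_earliest = conn.execute(
    """
    SELECT patient_id, appointment_date
    FROM appointments
    WHERE appointment_date = (SELECT MIN(appointment_date) FROM appointments)
    """
).fetchall()

print(first_appointments)
print(overall_earliest)
```

The same pattern applies to any "first event per entity" question, such as the first transaction on each account.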

2. Banking & Finance: The same technique identifies the first transaction recorded for each account.

Example:

CREATE TABLE transactions (
    account_id INT,
    transaction_date DATE
);

-- Insert information into the transactions table
INSERT INTO transactions (account_id, transaction_date) VALUES
(1001, TO_DATE('2023-05-12', 'YYYY-MM-DD')),
(1002, TO_DATE('2023-03-08', 'YYYY-MM-DD')),
(1003, TO_DATE('2023-06-25', 'YYYY-MM-DD'));

-- Query to show each account_id with its first (earliest) transaction date
SELECT account_id, MIN(transaction_date) AS first_transaction
FROM transactions
GROUP BY account_id;

Result:

Clarification: The query groups the transactions by account_id and returns the earliest transaction_date for each account.

Final Thoughts

Selecting the first row in each GROUP BY group is essential when working with large datasets or time-based records such as appointments and transactions. The techniques covered here include MIN() with a join, DISTINCT ON, ROW_NUMBER() with PARTITION BY, FILTER with a subquery, and Oracle's KEEP (DENSE_RANK FIRST). Each one retrieves the earliest record of every group; they differ mainly in database support and in how they treat ties. With these techniques, you can reliably select the first row in every GROUP BY group in SQL.

The publication How to Select the First Row in Each GROUP BY Group in SQL? was first featured on Intellipaat Blog.
