window-functions-in-sql

The SQL window function is an essential capability for all individuals, as it allows for the computation of a specific subset of rows while preserving the entirety of the rows in the dataset. This concept is known as “windows.” These play a significant role in sophisticated data analysis and database administration.

In this document, we will explore the meaning of SQL window functions, their types, and practical illustrations with respect to their functionality.

Table of Contents

What are Window Functions in SQL Server?

SQL window functions enable you to derive values over a collection of records associated with the current record, all without aggregating them. Unlike aggregate functions like SUM() or AVG(), window functions maintain the total row count in the dataset rather than altering it; they instead enhance understanding of individual records within the data.

Window functions can compute running totals, ranks, or moving averages while keeping the full dataset unchanged.

Types of Window Functions

SQL window functions can be divided into three principal categories:

1. Ranking Functions

In the realm of SQL, ranking functions are utilized to assign a rank to specific rows based on a certain order. They organize the dataset without omitting any rows.

For our implementation, we will create an employee table comprising columns for employee_id, name, department, and salary.

CREATE TABLE employees (

    employee_id INT PRIMARY KEY,

    name VARCHAR(100),

    department VARCHAR(50),

    salary DECIMAL(10,2)

);

INSERT INTO employees (employee_id, name, department, salary) VALUES
(1, 'Alice', 'HR', 60000),
(2, 'Bob', 'HR', 75000),
(3, 'Charlie', 'HR', 70000),
(4, 'David', 'Engineering', 90000),
(5, 'Eve', 'Engineering', 85000),
(6, 'Frank', 'Engineering', 95000),
(7, 'Grace', 'Marketing', 72000),
(8, 'Hank', 'Marketing', 78000),
(9, 'Ivy', 'Marketing', 75000);

1. ROW_NUMBER()

This function provides a distinct value to each row in a specific order without any ties. In this context, there are no duplicates, and every row receives a unique number, such as 1, 2, 3, and so on.

SELECT employee_id, department, salary,
       ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) AS row_num
FROM employees;

2. RANK()

This function assigns a rank to each row within a specified order. Its ranking skips numbers if there are ties. For example, if two rows share a rank value of 2, the next row would be assigned a rank of 4.

 SELECT employee_id, department, salary,
       RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS rank
FROM employees;

3. DENSE_RANK()

This function acts as a variant of the RANK function. It does not bypass ranks for duplicate values. For instance, if there are two rows with a rank of 2, the next row will receive a rank value of 3.

SELECT employee_id, department, salary,
       DENSE_RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dense_rank
FROM employees;

4. NTILE()

This function divides the data into approximately equal segments (as closely as possible) after partitioning it into n sections and assigns a group number to each row. For example, if the data is segmented into 4 sections, each row will be assigned a number from 1 to 4.

SELECT employee_id, department, salary,
       NTILE(4) OVER (PARTITION BY department ORDER BY salary DESC) AS quartile

2. Aggregate Functions

Aggregate functions in SQL are unique mechanisms that consolidate values from numerous rows into a singular result. They operate in the background to furnish totals, averages, counts, etc.

1. SUM()

This function computes the total value of all entries in a particular column. It is predominantly used when deriving a total is necessary, for instance, calculating the overall revenue of a company by summing all sales.

SELECT employee_id, department, salary,

       SUM(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS running_total
FROM employees;

2. AVG()

This function determines the average value of items within a column. It is particularly useful for identifying trends, such as finding the average salary of employees in a department.

SELECT employee_id, department, salary,

       AVG(salary) OVER (PARTITION BY department) AS average_salary
FROM employees;

3. MIN() & MAX()

MIN() – This function fetches the minimum value (lowest amount) from the column. It is beneficial for identifying the smallest value, for example, determining the lowest product price in an inventory.

MAX() – This function retrieves the maximum value (highest amount) from the column. It is useful for identifying the most valuable assets, such as finding the highest marks in a class.

SELECT employee_id, department, salary,

       MIN(salary) OVER (PARTITION BY department) AS min_salary,

       MAX(salary) OVER (PARTITION BY department) AS max_salary
FROM employees;

3. Analytic Functions

Analytic functions in SQL are analogous to window functions. They execute computations within a dataset based on a defined set of rows, yet they do not modify the displayed data. These functions enable one to view data in a more complex and multivariate manner while preserving the original dataset.

1. LEAD()

This function allows access to the next row’s value in a column. It is advantageous for making comparisons between the current row and the following one.

Example: Compare today’s sales figures with those of the next day.

SELECT employee_id, department, salary,
       LEAD(salary, 1) OVER (PARTITION BY department ORDER BY salary DESC) AS next_salary
FROM employees;

2. LAG()

This allows you to access the value from the preceding row in a column. It is beneficial for contrasting the present row with the one that precedes it. For instance: Compare today’s earnings with those from the previous day.

SELECT employee_id, department, salary,
       LAG(salary, 1) OVER (PARTITION BY department ORDER BY salary DESC) AS previous_salary
FROM employees;

3. FIRST_VALUE()

This function returns the initial value within a defined set of rows. For example: Retrieve the first product price from an ordered list.

4. LAST_VALUE()

This function returns the final value within a defined set of rows. For example: Retrieve the last order date from an arranged list of orders.

SELECT employee_id, department, salary,
       FIRST_VALUE(salary) OVER (PARTITION BY department ORDER BY salary DESC) AS highest_salary,
       LAST_VALUE(salary) OVER (
    PARTITION BY department 
    ORDER BY salary DESC 
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS lowest_salary

Applications of Window Functions with Illustrations

1. Calculating Accumulated Totals

Window functions facilitate the monitoring of accumulated totals, where each row signifies the total amount from the beginning up to that row.

SELECT employee_id, sales_date, sales_amount,

       SUM(sales_amount) OVER (PARTITION BY employee_id ORDER BY sales_date) AS running_total

FROM sales;

2. Ranking Leading Performers

Using window functions, one can monitor the top n entries, such as highest sales or leading employees, without losing sight of the underlying data.

SELECT employee_id, department, performance_score,

       RANK() OVER (PARTITION BY department ORDER BY performance_score DESC) AS rank

FROM employee_performance

WHERE rank <= 3;

3. Calculating Average Trends

One can monitor average trends to eliminate short-term variations and discern underlying patterns.

SELECT sales_date, sales_amount,

       AVG(sales_amount) OVER (ORDER BY sales_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg

FROM sales;

4. Comparing Present and Past Sales

You can compare the current row’s value with that of a previous row effortlessly, for instance, by tracking today’s sales against yesterday’s.

SELECT sales_date, sales_amount,

       LAG(sales_amount, 1) OVER (ORDER BY sales_date) AS previous_sales,

       (sales_amount - LAG(sales_amount, 1) OVER (ORDER BY sales_date)) AS sales_growth

FROM sales;

5. Recognizing Initial and Final Transactions

Window functions facilitate the tracking of the first or last value within a series, such as the first transaction in a series of transactions or the last one.

SELECT customer_id, transaction_date, amount,

       FIRST_VALUE(transaction_date) OVER (PARTITION BY customer_id ORDER BY transaction_date) AS first_transaction,

       LAST_VALUE(transaction_date) OVER (PARTITION BY customer_id ORDER BY transaction_date ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_transaction

FROM transactions;

Performance Aspects of Window Functions

Similar to aggregation functions, window functions are frequently not fully optimized. For instance, ensure that the column utilized in the PARTITION BY is indexed for swift access. If it’s unindexed, the system must scan each row, leading to slow query performance. Furthermore, performance may be influenced by sorting (ORDER BY), particularly when handling extensive datasets. Processing times can escalate when multiple window functions are present within a single query. To enhance efficiency, consider limiting the number of rows processed by employing ROWS BETWEEN instead of considering the entire dataset. If queries remain sluggish, decompose them into simpler components or opt for materialized views to expedite query execution.

Final Thoughts

SQL utilizes window functions to assist in managing intricate calculations without sacrificing the nuance of each distinct record. It enables the execution of complex tasks, including data ranking, cumulative row totals, and inter-row comparisons seamlessly. Functions such as ROW_NUMBER, RANK, LEAD, and LAG significantly enhance your analysis capabilities. Consequently, mastering window functions broadens your proficiency in SQL and allows for more sophisticated data analysis methods.

The article Window Functions in SQL appeared first on Intellipaat Blog.


Leave a Reply

Your email address will not be published. Required fields are marked *

Share This