Window functions (part3) - new functions
In the first and the second part of window
function series, I talked about the OVER
clause, with/without partitions and with/without ordering. Till then, no new function had been
introduced, so it has about the already existing aggregate functions.
Now, new window-only functions has been introduced. I'll separate them in two groups: ranking and navigational. Both set of functions can be used with/without partition/ordering, but them does not make much sense without ordering.
Ranking functions (DENSE_RANK, RANK and ROW_NUMBER)
With these functions, one can create different type of incremental counters. Think about SUM(1) OVER (ORDER BY SALARY)
, these functions
does this type of thing, but all of them in different ways. Lets see an example query, also comparing with the SUM
behavior.
select
id,
salary,
dense_rank() over (order by salary),
rank() over (order by salary),
row_number() over (order by salary),
sum(1) over (order by salary)
from employee
order by salary;
And the result set:
id | salary | dense_rank | rank | row_number | sum |
---|---|---|---|---|---|
3 | 8.00 | 1 | 1 | 1 | 1 |
4 | 9.00 | 2 | 2 | 2 | 2 |
1 | 10.00 | 3 | 3 | 3 | 4 |
5 | 10.00 | 3 | 3 | 4 | 4 |
2 | 12.00 | 4 | 5 | 5 | 5 |
As you see, the functions differs when repeated values are found in the order key (the salary of 10.00). With DENSE_RANK
, no gaps are
created and all repeated values receive the same counter. With RANK
, the initial counter is used for all repeated values, but it make
gaps so the next non-repeating value (the salary of 12.00) does not consider the previously repeated values as in different positions.
With ROW_NUMBER
, each line receives an incremental value. And SUM(1)
is very like RANK
, but the value is computed after all repeated
values are summed, so the gap is before the repeated values.
Navigational functions (LAG and LEAD)
With these functions, an expression could get the value of a previously (LAG
) or a posterior (LEAD
) row of the query. Follow the
demonstrating query.
select
id,
salary,
lag(salary) over (order by salary),
lead(salary) over (order by salary)
from employee
order by salary;
And the result set:
id | salary | lag | lead |
---|---|---|---|
3 | 8.00 | <null> |
9.00 |
4 | 9.00 | 8.00 | 10.00 |
1 | 10.00 | 9.00 | 10.00 |
5 | 10.00 | 10.00 | 12.00 |
2 | 12.00 | 10.00 | <null> |
The functions have a second argument, that is implicitly 1, which means the number of previously/posterior rows will be queried. If this
row does not exist, <null>
is returned.
And of course, you can use these functions as expressions and compute, for example, the difference of an employee salary from the one greater and/or lesser.