Window functions (part3) - new functions

Posted on February 19, 2010 in firebird

In the first and the second part of window function series, I talked about the OVER clause, with/without partitions and with/without ordering. Till then, no new function had been introduced, so it has about the already existing aggregate functions.

Now, new window-only functions has been introduced. I'll separate them in two groups: ranking and navigational. Both set of functions can be used with/without partition/ordering, but them does not make much sense without ordering.

Ranking functions (DENSE_RANK, RANK and ROW_NUMBER)

With these functions, one can create different type of incremental counters. Think about SUM(1) OVER (ORDER BY SALARY), these functions does this type of thing, but all of them in different ways. Lets see an example query, also comparing with the SUM behavior.

select
    id,
    salary,
    dense_rank() over (order by salary),
    rank() over (order by salary),
    row_number() over (order by salary),
    sum(1) over (order by salary)
  from employee
  order by salary;

And the result set:

id	salary	dense_rank	rank	row_number	sum
3	8.00	1	1	1	1
4	9.00	2	2	2	2
1	10.00	3	3	3	4
5	10.00	3	3	4	4
2	12.00	4	5	5	5

As you see, the functions differs when repeated values are found in the order key (the salary of 10.00). With DENSE_RANK, no gaps are created and all repeated values receive the same counter. With RANK, the initial counter is used for all repeated values, but it make gaps so the next non-repeating value (the salary of 12.00) does not consider the previously repeated values as in different positions. With ROW_NUMBER, each line receives an incremental value. And SUM(1) is very like RANK, but the value is computed after all repeated values are summed, so the gap is before the repeated values.

Navigational functions (LAG and LEAD)

With these functions, an expression could get the value of a previously (LAG) or a posterior (LEAD) row of the query. Follow the demonstrating query.

select
    id,
    salary,
    lag(salary) over (order by salary),
    lead(salary) over (order by salary)
  from employee
  order by salary;

And the result set:

id	salary	lag	lead
3	8.00	`<null>`	9.00
4	9.00	8.00	10.00
1	10.00	9.00	10.00
5	10.00	10.00	12.00
2	12.00	10.00	`<null>`

The functions have a second argument, that is implicitly 1, which means the number of previously/posterior rows will be queried. If this row does not exist, <null> is returned.

And of course, you can use these functions as expressions and compute, for example, the difference of an employee salary from the one greater and/or lesser.