Window Functions

Posted on January 19, 2010 in firebird

By the SQL specification, window functions are a kind of aggregation, but which does not "filter" the result set of a query. The aggregated data is mixed with the query result set. That sort of functions are used with the OVER clause. Users of Oracle also knows window functions as analytical functions.

We have promissed very basic support for window functions in Firebird 3, which was the OVER () clause using the current aggregate functions. With the OVER () clause, one can mix with the query result set the aggregated data over the entire result set. Let me explain with an example.

We have a table EMPLOYEE with columns ID, NAME and SALARY, and want to show each employee with his respective salary and the percentage of his salary over the payroll. With a "normal" query, we would do:

select
    id,
    name,
    salary,
    salary / (select sum(salary) from employee) percentage
  from employee;

We need to repeat ourselves and wait so much to see the results. We can also make it hopefully faster using a cross join, but still the whole employee table will need to be read more than one time. Change the table by a complex view or add various "windows" and we'll have a performance problem for sure.

The same query could be specified in much more elegant and faster way using a window function. Here is it how:

select
    id,
    name,
    salary,
    salary / sum(salary) over () percentage
  from employee;

Here, sum(salary) over () is computed with the sum of all SALARY from the query (the employee table). This is what the OVER () clause does.

But the OVER clause is not just that, and now I've added another of its subclauses to Firebird. It's the PARTITION subclause.

A partition is a way to make the OVER aggregation based on a GROUP. Its syntax is:

<window function>([<expr>]) OVER (PARTITION BY <expr> [, <expr> ...])

So now since the aggregation is done over a group, it could produce more than one row. So the result set generated by a partition is joined with the main query using the same expression list of the partition.

For a example, lets add to our employee table a ROLE column. Now we want the same information, but instead of see the percentage of the employee salary over all employees, we want to see that value based on the employees occuping the same role. Here is how that query could be written:

select
    id,
    name,
    role,
    salary,
    salary / sum(salary) over (partition by role)
  from employee;

With the current implementation, the query (without the window functions) is executed one time and cached. Extra aggregation is done on this cached data. Partitions are joined by the new hash join algorithmn, and the main window (OVER without a partition) is joined through a cross join with the main query.

There is much more about window functions, but this subset seems to cover some major use cases of them.