DSQL/BLR compiler internals

Posted on in firebird

These are raw notes, and they will not be easy to follow for people who have never worked with this code. In fact, the old implementation was so ugly that it was not simple even for me (and I work in this area constantly) to achieve this result. It took many tries and reverts, but IMNSHO it ended very well. This was also a large piece of work that took a lot of time, and I'm just writing down some records here.

Back in 2006, just after being invited to take on the DSQL work, I started to think about what we would need to do to improve the internals of the compilers (DSQL and BLR). I started by looking at src/dsql/pass1.cpp and src/jrd/cmp.cpp. Until then, even adding a simple function was a nightmare. This dialog took place between Nickolay Samofatov and me in September 2004:

Adriano: I added the "Lower" function. I don't knew that it is so difficult. "Length" will be added using blr too?

Nickolay: why difficult?

Nickolay: just fix 15 files or so :-)

So I spent some years fixing "15 files or so" to add each feature, and became very bored. In v2.5, to implement the ALTER CHARACTER SET and AUTONOMOUS TRANSACTIONS statements, I added a "compatibility layer" to implement statements in a more OO way. I described the DDL work in this post.

As I promised in that post, it made it possible to implement subroutines, as well as packages.

Once the v3.0 development cycle started, it was clear that not only statements needed this layer, but expressions too. Expressions are much more "interesting" than statements. There are many types of expressions, and all of them were mangled together in the (now removed!) ubiquitous dsql_nod/jrd_nod structures and algorithms. These types are record sources, values, booleans, aggregate values, windowed values and lists.

The compatibility expression layer made it possible to implement window functions (more here, here and here), but at the same time it created more ugly code in the (internal) interfaces.

The complete removal of jrd_nod was easy and was done a long time ago. But dsql_nod remained. It was a lot of code that needed a complete refactoring (rewrite), which is now finally done.

This is the Node class hierarchy. Except for the lists, each of these classes has its own derived classes (omitted here).

Node
--- DdlNode
--- DmlNode
------ StmtNode
------ ExprNode
--------- BoolExprNode
--------- ValueExprNode
------------ AggNode
--------------- WinFuncNode
--------- RecordSourceNode
--------- ListExprNode
------------ ValueListExprNode
------------ RecSourceListExprNode
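
For readers who prefer code to trees, the hierarchy above can be sketched in C++ roughly like this. Only the class names and their parent/child relations come from the list above; the kind() method, the items members and the describe() helper are hypothetical and exist just to make the sketch compile. The real Firebird classes look different.

```cpp
#include <memory>
#include <string>
#include <type_traits>
#include <vector>

// Class names and inheritance follow the hierarchy listed in the post.
// Every member below is hypothetical (not taken from the Firebird sources).
struct Node
{
    virtual ~Node() = default;
    virtual std::string kind() const = 0;  // hypothetical helper
};

struct DdlNode : Node { std::string kind() const override { return "ddl"; } };

struct DmlNode : Node {};      // still abstract: kind() is not overridden

struct StmtNode : DmlNode { std::string kind() const override { return "statement"; } };
struct ExprNode : DmlNode {};  // abstract base of all expression nodes

struct BoolExprNode : ExprNode { std::string kind() const override { return "boolean"; } };
struct ValueExprNode : ExprNode { std::string kind() const override { return "value"; } };
struct AggNode : ValueExprNode { std::string kind() const override { return "aggregate"; } };
struct WinFuncNode : AggNode { std::string kind() const override { return "window function"; } };
struct RecordSourceNode : ExprNode { std::string kind() const override { return "record source"; } };

struct ListExprNode : ExprNode {};  // abstract base of the list nodes

struct ValueListExprNode : ListExprNode
{
    // Assumption: a value list owns value expressions only.
    std::vector<std::unique_ptr<ValueExprNode>> items;
    std::string kind() const override { return "value list"; }
};

struct RecSourceListExprNode : ListExprNode
{
    // Assumption: a record source list owns record sources only.
    std::vector<std::unique_ptr<RecordSourceNode>> items;
    std::string kind() const override { return "record source list"; }
};

// With distinct types, the compiler itself checks the relations that a single
// generic node structure could only check at run time.
static_assert(std::is_base_of<ValueExprNode, WinFuncNode>::value,
              "a window function is an aggregate, which is a value expression");
static_assert(!std::is_base_of<ValueExprNode, BoolExprNode>::value,
              "a boolean expression is not a value expression");

// Hypothetical helper: virtual dispatch instead of a switch on a node type tag.
std::string describe(const Node& node)
{
    return node.kind();
}
```

In this sketch, pushing a BoolExprNode into ValueListExprNode::items is a compile error, while an AggNode or a WinFuncNode is accepted because both derive from ValueExprNode.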

As I said, a lot of time was spent on this, but without it many other features would not be possible (or viable).

It's easy to say now that the old code was crap (and that is what it was, really!), but many things are involved here (like C vs C++, bison vs btyacc). The main thing is that this rework was now possible, so it was done.