Influence of network latency in the Firebird wire protocol

People often talk that Firebird performs badly in networks with high latency. I did some tests to see how the new protocol has been improved compared to the old one, and to see how it could be improved even more. My tests only cares about latency, and not about bandwidth.

It simulate network latency in Linux using Netem. With the tc utility, we can add a delay for outbound traffic. Since I did it for lo interface, it happens that it works for in/out traffic. I set delay to 100ms (that is more or less a value I found doing it in the internet with my horrible Telefonica speedy connection), using this command:

$ sudo tc qdisc add dev lo root handle 1:0 netem delay 100ms

And here is the result of a ping:

$ ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=200 ms
64 bytes from localhost (127.0.0.1): icmp_seq=2 ttl=64 time=200 ms
64 bytes from localhost (127.0.0.1): icmp_seq=3 ttl=64 time=200 ms
64 bytes from localhost (127.0.0.1): icmp_seq=4 ttl=64 time=200 ms

To later remove the delay, the command is:

$ sudo tc qdisc del dev lo root

The test uses Firebird library present on (my till unknown) CppSys project (more on CppSys in a future post). The CppSys database library has interfaces and semantics based on JDBC. The relevant test code is here:

println("Connecting to the database");
AutoPtr<FbDatabase> database(client->openDatabase(
    String("firebird:") + databaseName));

println("Starting the transaction");
AutoPtr<FbTransaction> transaction(database->startTransaction(
    TransactionIsolation::SNAPSHOT));

println("Preparing stmt1");
AutoPtr<FbPreparedStatement> stmt1(database->prepareStatement(
    transaction,
    "select rdb$relation_name from rdb$relations"
    "    where rdb$system_flag = 1"
    "    order by rdb$relation_name"));

println("Preparing stmt2");
AutoPtr<FbPreparedStatement> stmt2(database->prepareStatement(
    transaction,
    "select rdb$field_name from rdb$relation_fields"
    "    where rdb$relation_name = ?"
    "    order by rdb$field_name"));

println("Preparing stmt3");
AutoPtr<FbPreparedStatement> stmt3(database->prepareStatement(
    transaction,
    "select rdb$trigger_name from rdb$triggers"
    "    where rdb$relation_name = ?"
    "    order by rdb$trigger_name"));

println("Executing stmt1");
AutoPtr<FbResultSet> rs1(stmt1->executeQuery());

println("Starting fetch rs1");

while (rs1->fetch())
{
    println("rs1: %s", rs1->getString(1).getPtr());

    println("\tExecuting stmt2");
    stmt2->setString(1, rs1->getString(1));
    AutoPtr<FbResultSet> rs2(stmt2->executeQuery());

    println("\tExecuting stmt3");
    stmt3->setString(1, rs1->getString(1));
    AutoPtr<FbResultSet> rs3(stmt3->executeQuery());

    println("\tStarting fetch rs2");

    while (rs2->fetch())
        println("\trs2: %s", rs2->getString(1).getPtr());

    println("\tStarting fetch rs3");

    while (rs3->fetch())
        println("\trs3: %s", rs3->getString(1).getPtr());
}

The println function buffers the text and prints it in its next run, with the current time (in minus) milliseconds the time when the text was buffered. Here is its code:

static void println(const char* format, ...)
{
    static char lineBuffer[1024];
    static int64 lastTime = 0;

    timeb tb;
    ftime(&tb);

    int64 thisTime = int64(tb.time) * 1000 + tb.millitm;

    if (lastTime == 0)
        lastTime = thisTime;
    else
    {
        printf("%06d: %s\n", (int) (thisTime - lastTime),
            lineBuffer);
    }

    va_list va;
    va_start(va, format);

    vsprintf(lineBuffer, format, va);

    va_end(va);

    lastTime = thisTime;
}

As you see, the test does what every client/server database developer must knows that he/she shouldn’t do: a main query with nested queries for each record. If you ever run isql over the internet, you had see it performs very slow, and that is the reason. isql does just that. It does that because it’s written using embedded SQL with gpre and sometimes this is needed to access new columns that may not be presented on a database of an older ODS. And due to embedded SQL nature, “details” of where and when to prepare statements are just ignored.

So far, here is the results.

Result for the old protocol, with 2.0 client and 2.5 server:

000665: Connecting to the database
000201: Starting the transaction
001234: Preparing stmt1
001204: Preparing stmt2
001204: Preparing stmt3
000200: Executing stmt1
000202: Starting fetch rs1
000000: rs1: MON$ATTACHMENTS
000200:     Executing stmt2
000201:     Executing stmt3
000201:     Starting fetch rs2
000000:     rs2: MON$ATTACHMENT_ID
000000:     rs2: MON$ATTACHMENT_NAME
000000:     rs2: MON$CHARACTER_SET_ID
000001:     rs2: MON$GARBAGE_COLLECTION
000000:     rs2: MON$REMOTE_ADDRESS
000000:     rs2: MON$REMOTE_PID
000000:     rs2: MON$REMOTE_PROCESS
000001:     rs2: MON$REMOTE_PROTOCOL
000000:     rs2: MON$ROLE
000000:     rs2: MON$SERVER_PID
000000:     rs2: MON$STATE
000001:     rs2: MON$STAT_ID
000000:     rs2: MON$TIMESTAMP
000000:     rs2: MON$USER
000601:     Starting fetch rs3
000000: rs1: MON$CALL_STACK
000200:     Executing stmt2
000201:     Executing stmt3
000201:     Starting fetch rs2
000000:     rs2: MON$CALLER_ID
000000:     rs2: MON$CALL_ID
000000:     rs2: MON$OBJECT_NAME
000000:     rs2: MON$OBJECT_TYPE
000000:     rs2: MON$SOURCE_COLUMN
000000:     rs2: MON$SOURCE_LINE
000000:     rs2: MON$STATEMENT_ID
000001:     rs2: MON$STAT_ID
000000:     rs2: MON$TIMESTAMP
000601:     Starting fetch rs3
<...>
000000: rs1: RDB$CHECK_CONSTRAINTS
000200:     Executing stmt2
000200:     Executing stmt3
000200:     Starting fetch rs2
000001:     rs2: RDB$CONSTRAINT_NAME
000000:     rs2: RDB$TRIGGER_NAME
000200:     Starting fetch rs3
000000:     rs3: RDB$TRIGGER_14
000000:     rs3: RDB$TRIGGER_15
000001:     rs3: RDB$TRIGGER_16
000400:     rs3: RDB$TRIGGER_35
000000: rs1: RDB$COLLATIONS
<...>

Result for the new protocol, with 2.5 client and 2.5 server:

000670: Connecting to the database
000200: Starting the transaction
000229: Preparing stmt1
000203: Preparing stmt2
000203: Preparing stmt3
000000: Executing stmt1
000202: Starting fetch rs1
000000: rs1: MON$ATTACHMENTS
000000:     Executing stmt2
000000:     Executing stmt3
000202:     Starting fetch rs2
000000:     rs2: MON$ATTACHMENT_ID
000000:     rs2: MON$ATTACHMENT_NAME
000000:     rs2: MON$CHARACTER_SET_ID
000000:     rs2: MON$GARBAGE_COLLECTION
000000:     rs2: MON$REMOTE_ADDRESS
000000:     rs2: MON$REMOTE_PID
000000:     rs2: MON$REMOTE_PROCESS
000000:     rs2: MON$REMOTE_PROTOCOL
000000:     rs2: MON$ROLE
000000:     rs2: MON$SERVER_PID
000000:     rs2: MON$STATE
000000:     rs2: MON$STAT_ID
000000:     rs2: MON$TIMESTAMP
000000:     rs2: MON$USER
000200:     Starting fetch rs3
000000: rs1: MON$CALL_STACK
000001:     Executing stmt2
000000:     Executing stmt3
000200:     Starting fetch rs2
000000:     rs2: MON$CALLER_ID
000000:     rs2: MON$CALL_ID
000000:     rs2: MON$OBJECT_NAME
000001:     rs2: MON$OBJECT_TYPE
000000:     rs2: MON$SOURCE_COLUMN
000000:     rs2: MON$SOURCE_LINE
000000:     rs2: MON$STATEMENT_ID
000000:     rs2: MON$STAT_ID
000000:     rs2: MON$TIMESTAMP
000200:     Starting fetch rs3
<...>
000000: rs1: RDB$CHECK_CONSTRAINTS
000000:     Executing stmt2
000000:     Executing stmt3
000201:     Starting fetch rs2
000000:     rs2: RDB$CONSTRAINT_NAME
000000:     rs2: RDB$TRIGGER_NAME
000200:     Starting fetch rs3
000000:     rs3: RDB$TRIGGER_14
000000:     rs3: RDB$TRIGGER_15
000000:     rs3: RDB$TRIGGER_16
000000:     rs3: RDB$TRIGGER_35
000000: rs1: RDB$COLLATIONS
<...>

FWIW, I did tested the old protocol on real network with similar latency too. The result was very resemblant.

The total times are:

Old protocol 55.5s
New Protocol 18.5s
New Protocol minus round-trip times of rs3 10.1s

So we may see that:

  1. The old protocol is a crap.
  2. The new protocol does a good job removing some “unnecessary” round-trips.
  3. It may be improved.

The first interesting problem is the series of statement preparations. The preparation of a statement may result in an error, so it necessary involves a round-trip so we can get the status from the server. But does it really needs it? Using Oracle JDBC driver, a prepareStatement with a wrong statement does not throw an exception. It’s deferred. In fact, this is a contract, and a change in a contract would require some way for the client applications to enable or disable it.

The second interesting thing is that executing a statement that needs fetch does not involves a round-trip. Does it happens too when selecting from a SP? Won’t this be a contract change as well, since selectable SPs could write to the database? I didn’t did further tests… Anyway, that’s what allows the optimizations I talk below.

Then we start the fetches. The first fetch for rs1 necessarily involves a round-trip and buffers some records in the client. Subsequents fetches are resolved locally.

The third interesting thing is about the nested queries. Since queries are executed, a round-trip would be inevitable, but it does one for each query. A more intelligent approach would be to fetch rs3 too when asked to fetch rs2, since it’s already executed. In this test, it would run ~8s faster. And a much more intelligent approach would do it in the background (and transmitting asynchronous), without putting some overhead on the rs2 call.

With a super intelligent server/client/protocol, the client and the server may cache prepared statements, and the server may asynchronous send statements invalidations to the client. Then, preparation of statements would have no round-trip (when already prepared one time), being sure that the statement is not wrong.

So what you may do now?

  1. Do not talk with the server in this poor (and common) way. Use selectable SPs to run multiple queries.
  2. Do not prepare/close/prepare your common statements. Cache them in your application.