Table of Content

Introduction

The PG2 library version 0.1.2 is out. One of its features is a significant performance boost when processing SELECT queries. The more fields and rows you have in a result, the faster is the processing. Here is a chart that measures a query with a single column:

No difference between the previous release of PG and the new one. But with nine fields, the average execution time is less now:

Briefly, PG2 0.1.2 allows you to fetch the data 7-8 times faster than Next.JDBC does. But before we proceed with other charts and numbers, let me explain how the new processing algorithm works.

When you send a query to Postgres, it responds with a series of messages. The leading message called RowDescription describes the structure of the result. When parsed, it can be presented as an array of maps; each map holds the name of the column, its type code (so called OID), the code of a table where this field comes from, and more. One should save this message somewhere as it’s used later.

[{:name "id"
  :oid 23   ;; type int4
  :format 0 ;; text format
  ...}
 {:name "email"
  :oid 25   ;; type text
  :format 0 ;; text format
  ...}
 ]

Then, Postgres sends you a series of DataRow messages. Each DataRow message is a long array of bytes which holds concatenated values. Before each value, there are four bytes (int32) representing the lengths of the subsequent chunk of bytes to parse:

[<N of cols><len1><value1><len2><value2>, ...]

(By the way, if <lenX> is -1 than the value is NULL, and there is no bytes for <valueX>.)

The question is, how to deal with all this to make your Postgres client faster. Obviously you can parse DataRow messages into maps one by one as you’re getting them from a TCP socket. This is what PG2 did before. The problem with this approach is that it slows down reading data from the network. Parsing bytes is a really CPU-heavy task. While you’re parsing a row, you’re not reading the next one.

On one hand, you fairly parse data and have a complete map at the end. On another hand, it slows down the process of consuming the response.

There is an opposite approach: you read the data as fast as you can with no (or minimum) parsing. Then, when you’ve done with reading, you parse what you got. The speed of reading the socket increases. As a result, the time window required to execute a query shrinks. The connection might serve more queries than before because now each request takes less time to be finished — in terms of the server, of course.

To process the result in Clojure, you need to parse it first. But now you have not plain maps but byte arrays with some metadata about the structure of the result. To prevent users from manual parsing, each row pretends to be a Clojure map that, when certain methods are triggered, parses the underlying bytes.

This is what the recent version of PG2 does under the hood. Before, it blindly parsed all the data on the fly. It was pretty fast: 2 or 3 times faster than using Next.JDBC. But with lazy parsing described above, it’s even faster: 7 to 9 times in favour of PG2. It affects only SELECT... and ...RETURNING queries. The new algorithm has nothing in common with inserting or deleting the data.

Now, benchmarks and timings. Most of the technical details like the environment, code snippets and SQL queries were covered in the previous post. Please refer to it for details.

Test1. Reading a single-column random query

This benchmark reads 50.000 randomly generated rows with a single integer column. The code from this benchmark hasn’t changed since the previous post. Having one column only in a row doesn’t make any difference with version of PG2 0.1.1:

Timings:

Platform PG2 0.1.1 JDBC.Next PG2 0.1.2
core i5 2 GHz Quad-Core 16G 43.026307 127.677926 44.36297
core i9 2,4 GHz 8-Core 32G 39.551719 83.932103 27.672117
arm m1 10 cores 32G 18.517718 49.340986 14.670815

Test2. Reading a multi-column, complex random query

This test reads 50.000 randomly generated rows with 9 columns of different types: numbers, strings, and timestamps. This is where the lazy algorithm beats the old one:

Timings:

Platform PG2 0.1.1 JDBC.Next PG2 0.1.2
core i5 2 GHz Quad-Core 16G 273.866142 579.59995 80.352246
core i9 2,4 GHz 8-Core 32G 270.262248 447.326861 54.34384
arm m1 10 cores 32G 117.241426 206.371502 30.444798

Test3. Reading random JSON

In this test, we select 50.000 randomly generated JSON objects. The absolute numbers are higher yet the ratio between the timings is the same:

Timings:

Platform PG2 0.1.1 JDBC.Next PG2 0.1.2
core i5 2 GHz Quad-Core 16G 227.067587 498.160381 64.590117
core i9 2,4 GHz 8-Core 32G 230.741915 413.006218 52.063469
arm m1 10 cores 32G 105.99894 188.33619 30.75433

Test 4. Connection pools

This benchmark doesn’t bring any new information. It just confirms that the lazy algorithm works with a connection pool.

Timings:

Platform PG2 0.1.1 JDBC.Next PG2 0.1.2
core i5 2 GHz Quad-Core 16G 334.488016 445.280716 59.709788
core i9 2,4 GHz 8-Core 32G 163.220533 243.250634 56.457382
arm m1 10 cores 32G 112.234569 154.656431 31.408932

Now that we clearly see that lazy parsing is useful, it’s still a half of the picture. We measure the time spent on reading the rows from the network and wrapping each into a pseudo-map class. But how much does it take to parse the rows? There are some dedicated benchmarks for that.

Test 5. Forcibly evaluate each row after reading

This test is similar to the previous one where I select randomly generated values. The difference is, now we associate an extra key to each row after reading. Assoc-ing a key to a lazy map triggers the process of parsing all the underlying bytes and composing a real Clojure map.

The query is really tough in terms of parsing: 60 timestamp fields :–)

select
   now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
  ,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
  ,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
  ,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
  ,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
  ,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
from
  generate_series(1,50000) as s(x)

JDBC code:

(with-title "next.JDBC select many fields WITH ASSOC"
  (with-open [conn (jdbc/get-connection
                    jdbc-config)]
    (quick-bench
     (let [rows
           (jdbc/execute! conn
                          [QUERY_SELECT_RANDOM_MANY_FIELDS]
                          {:as rs/as-unqualified-maps})]
       (doseq [row rows]
         (assoc row :extra 42))))))

PG2 code:

(with-title "pg select many fields WITH ASSOC"
  (pg/with-connection [conn pg-config]
    (quick-bench
      (let [rows
            (pg/execute conn
                        QUERY_SELECT_RANDOM_MANY_FIELDS)]
        (doseq [row rows]
          (assoc row :extra 42))))))

The result:

Timings:

Platform PG2 0.1.1 JDBC.Next PG2 0.1.2
core i5 2 GHz Quad-Core 16G 1161.563 2540.014 1502.913
core i9 2,4 GHz 8-Core 32G 1330.753 953.199066 1473.492
arm m1 10 cores 32G 547.914391 448.419085 619.253412

The chart shows that PG2 is a bit slower when it comes to parsing lazy maps. I believe there is a way to improve it in the future.

Anyway, we will get the clearest score if we measure the RPS of an HTTP server that sends the data from the database in JSON. Such a server was described in the previous post. We measure it again with the new release of PG2.

Test 6. Benchmarking the HTTP server with various -c values

1 concurrent request

ab -n 1000 -c 1 -l http://127.0.0.1:18080/

The chart and metrics (here and below: more is better):

Platform PG2 0.1.1 JDBC.Next PG2 0.1.2
core i5 2 GHz Quad-Core 16G 225.22 116.24 202.88
core i9 2,4 GHz 8-Core 32G 322.93 179.68 279.36
arm m1 10 cores 32G 428.41 286.04 269.4

PG2 didn’t show great results on the M1 machine: it’s even slower than JDBC for just one concurrent request. But when it comes to many parallel requests, it’s as fast as PG 0.1.1 again.

16 concurrent requests

ab -n 1000 -c 16 -l http://127.0.0.1:18080/

The results:

Platform PG2 0.1.1 JDBC.Next PG2 0.1.2
core i5 2 GHz Quad-Core 16G 968.51 555.55 915.93
core i9 2,4 GHz 8-Core 32G 1909.19 1304 1688.72
arm m1 10 cores 32G 2999.06 1902.31 3026

PG2 version 0.1.2 is slightly slower than 0.1.1 because parsing lazy maps takes CPU resources.

64 concurrent requests

ab -n 1000 -c 64 -l http://127.0.0.1:18080/

The results:

Platform PG2 0.1.1 JDBC.Next PG2 0.1.2
core i5 2 GHz Quad-Core 16G 1025.62 543.66 945.51
core i9 2,4 GHz 8-Core 32G 2025.94 1113.81 1806.29
arm m1 10 cores 32G 3348.75 1909.23 3205.83

Both 0.1.1 and 0.1.2 are almost even in terms of performance.

Summary

Introducing lazy maps and lazy result parsing was an interesting experiment. Although it slightly slows down processing the result, it brings some benefits. First, the time required for connection to serve a query reduces drastically. The connection dumps the payload it got for you, and then you parse it while the connection is serving another query. It’s especially useful when working with connection pools. For pools, you have to return the connections back as soon as possible as their number is limited. Spawning a fresh connection is expensive as it runs a new process on the server side.

Second, lazy parsing reduces the odds of getting an exception in the middle of reading the data from a connection. This is a weird situation when you got the data partially, and then an exception popped up. Without capturing it and rethrowing later, the connection might hang into a broken state.

To sum up — I’m not a big fan of laziness in Clojure but this time, the idea seems to be the right choice. Let’s see how it goes.