PG2 release 0.1.2: more performance, benchmarks, part 3

Table of Content

Introduction
Test1. Reading a single-column random query
Test2. Reading a multi-column, complex random query
Test3. Reading random JSON
Test 4. Connection pools
Test 5. Forcibly evaluate each row after reading
Test 6. Benchmarking the HTTP server with various -c values
Summary

Introduction

The PG2 library version 0.1.2 is out. One of its features is a significant performance boost when processing SELECT queries. The more fields and rows you have in a result, the faster is the processing. Here is a chart that measures a query with a single column:

No difference between the previous release of PG and the new one. But with nine fields, the average execution time is less now:

Briefly, PG2 0.1.2 allows you to fetch the data 7-8 times faster than Next.JDBC does. But before we proceed with other charts and numbers, let me explain how the new processing algorithm works.

When you send a query to Postgres, it responds with a series of messages. The leading message called RowDescription describes the structure of the result. When parsed, it can be presented as an array of maps; each map holds the name of the column, its type code (so called OID), the code of a table where this field comes from, and more. One should save this message somewhere as it’s used later.

[{:name "id"
  :oid 23   ;; type int4
  :format 0 ;; text format
  ...}
 {:name "email"
  :oid 25   ;; type text
  :format 0 ;; text format
  ...}
 ]

Then, Postgres sends you a series of DataRow messages. Each DataRow message is a long array of bytes which holds concatenated values. Before each value, there are four bytes (int32) representing the lengths of the subsequent chunk of bytes to parse:

[<N of cols><len1><value1><len2><value2>, ...]

(By the way, if <lenX> is -1 than the value is NULL, and there is no bytes for <valueX>.)

The question is, how to deal with all this to make your Postgres client faster. Obviously you can parse DataRow messages into maps one by one as you’re getting them from a TCP socket. This is what PG2 did before. The problem with this approach is that it slows down reading data from the network. Parsing bytes is a really CPU-heavy task. While you’re parsing a row, you’re not reading the next one.

On one hand, you fairly parse data and have a complete map at the end. On another hand, it slows down the process of consuming the response.

There is an opposite approach: you read the data as fast as you can with no (or minimum) parsing. Then, when you’ve done with reading, you parse what you got. The speed of reading the socket increases. As a result, the time window required to execute a query shrinks. The connection might serve more queries than before because now each request takes less time to be finished — in terms of the server, of course.

To process the result in Clojure, you need to parse it first. But now you have not plain maps but byte arrays with some metadata about the structure of the result. To prevent users from manual parsing, each row pretends to be a Clojure map that, when certain methods are triggered, parses the underlying bytes.

This is what the recent version of PG2 does under the hood. Before, it blindly parsed all the data on the fly. It was pretty fast: 2 or 3 times faster than using Next.JDBC. But with lazy parsing described above, it’s even faster: 7 to 9 times in favour of PG2. It affects only SELECT... and ...RETURNING queries. The new algorithm has nothing in common with inserting or deleting the data.

Now, benchmarks and timings. Most of the technical details like the environment, code snippets and SQL queries were covered in the previous post. Please refer to it for details.

Test1. Reading a single-column random query

This benchmark reads 50.000 randomly generated rows with a single integer column. The code from this benchmark hasn’t changed since the previous post. Having one column only in a row doesn’t make any difference with version of PG2 0.1.1:

Timings:

Platform	PG2 0.1.1	JDBC.Next	PG2 0.1.2
core i5 2 GHz Quad-Core 16G	43.026307	127.677926	44.36297
core i9 2,4 GHz 8-Core 32G	39.551719	83.932103	27.672117
arm m1 10 cores 32G	18.517718	49.340986	14.670815

Test2. Reading a multi-column, complex random query

This test reads 50.000 randomly generated rows with 9 columns of different types: numbers, strings, and timestamps. This is where the lazy algorithm beats the old one:

Timings:

Platform	PG2 0.1.1	JDBC.Next	PG2 0.1.2
core i5 2 GHz Quad-Core 16G	273.866142	579.59995	80.352246
core i9 2,4 GHz 8-Core 32G	270.262248	447.326861	54.34384
arm m1 10 cores 32G	117.241426	206.371502	30.444798

Test3. Reading random JSON

In this test, we select 50.000 randomly generated JSON objects. The absolute numbers are higher yet the ratio between the timings is the same:

Timings:

Platform	PG2 0.1.1	JDBC.Next	PG2 0.1.2
core i5 2 GHz Quad-Core 16G	227.067587	498.160381	64.590117
core i9 2,4 GHz 8-Core 32G	230.741915	413.006218	52.063469
arm m1 10 cores 32G	105.99894	188.33619	30.75433

Test 4. Connection pools

This benchmark doesn’t bring any new information. It just confirms that the lazy algorithm works with a connection pool.

Timings:

Platform	PG2 0.1.1	JDBC.Next	PG2 0.1.2
core i5 2 GHz Quad-Core 16G	334.488016	445.280716	59.709788
core i9 2,4 GHz 8-Core 32G	163.220533	243.250634	56.457382
arm m1 10 cores 32G	112.234569	154.656431	31.408932

Now that we clearly see that lazy parsing is useful, it’s still a half of the picture. We measure the time spent on reading the rows from the network and wrapping each into a pseudo-map class. But how much does it take to parse the rows? There are some dedicated benchmarks for that.

Test 5. Forcibly evaluate each row after reading

This test is similar to the previous one where I select randomly generated values. The difference is, now we associate an extra key to each row after reading. Assoc-ing a key to a lazy map triggers the process of parsing all the underlying bytes and composing a real Clojure map.

The query is really tough in terms of parsing: 60 timestamp fields :–)

select
   now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
  ,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
  ,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
  ,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
  ,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
  ,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
from
  generate_series(1,50000) as s(x)

JDBC code:

(with-title "next.JDBC select many fields WITH ASSOC"
  (with-open [conn (jdbc/get-connection
                    jdbc-config)]
    (quick-bench
     (let [rows
           (jdbc/execute! conn
                          [QUERY_SELECT_RANDOM_MANY_FIELDS]
                          {:as rs/as-unqualified-maps})]
       (doseq [row rows]
         (assoc row :extra 42))))))

PG2 code:

(with-title "pg select many fields WITH ASSOC"
  (pg/with-connection [conn pg-config]
    (quick-bench
      (let [rows
            (pg/execute conn
                        QUERY_SELECT_RANDOM_MANY_FIELDS)]
        (doseq [row rows]
          (assoc row :extra 42))))))

The result:

Timings:

Platform	PG2 0.1.1	JDBC.Next	PG2 0.1.2
core i5 2 GHz Quad-Core 16G	1161.563	2540.014	1502.913
core i9 2,4 GHz 8-Core 32G	1330.753	953.199066	1473.492
arm m1 10 cores 32G	547.914391	448.419085	619.253412

The chart shows that PG2 is a bit slower when it comes to parsing lazy maps. I believe there is a way to improve it in the future.

Anyway, we will get the clearest score if we measure the RPS of an HTTP server that sends the data from the database in JSON. Such a server was described in the previous post. We measure it again with the new release of PG2.

Test 6. Benchmarking the HTTP server with various -c values

1 concurrent request

ab -n 1000 -c 1 -l http://127.0.0.1:18080/

The chart and metrics (here and below: more is better):

Platform	PG2 0.1.1	JDBC.Next	PG2 0.1.2
core i5 2 GHz Quad-Core 16G	225.22	116.24	202.88
core i9 2,4 GHz 8-Core 32G	322.93	179.68	279.36
arm m1 10 cores 32G	428.41	286.04	269.4

PG2 didn’t show great results on the M1 machine: it’s even slower than JDBC for just one concurrent request. But when it comes to many parallel requests, it’s as fast as PG 0.1.1 again.

16 concurrent requests

ab -n 1000 -c 16 -l http://127.0.0.1:18080/

The results:

Platform	PG2 0.1.1	JDBC.Next	PG2 0.1.2
core i5 2 GHz Quad-Core 16G	968.51	555.55	915.93
core i9 2,4 GHz 8-Core 32G	1909.19	1304	1688.72
arm m1 10 cores 32G	2999.06	1902.31	3026

PG2 version 0.1.2 is slightly slower than 0.1.1 because parsing lazy maps takes CPU resources.

64 concurrent requests

ab -n 1000 -c 64 -l http://127.0.0.1:18080/

The results:

Platform	PG2 0.1.1	JDBC.Next	PG2 0.1.2
core i5 2 GHz Quad-Core 16G	1025.62	543.66	945.51
core i9 2,4 GHz 8-Core 32G	2025.94	1113.81	1806.29
arm m1 10 cores 32G	3348.75	1909.23	3205.83

Both 0.1.1 and 0.1.2 are almost even in terms of performance.

Summary

Introducing lazy maps and lazy result parsing was an interesting experiment. Although it slightly slows down processing the result, it brings some benefits. First, the time required for connection to serve a query reduces drastically. The connection dumps the payload it got for you, and then you parse it while the connection is serving another query. It’s especially useful when working with connection pools. For pools, you have to return the connections back as soon as possible as their number is limited. Spawning a fresh connection is expensive as it runs a new process on the server side.

Second, lazy parsing reduces the odds of getting an exception in the middle of reading the data from a connection. This is a weird situation when you got the data partially, and then an exception popped up. Without capturing it and rethrowing later, the connection might hang into a broken state.

To sum up — I’m not a big fan of laziness in Clojure but this time, the idea seems to be the right choice. Let’s see how it goes.

← Gzip

PG2 release 0.1.3: Next.JDBC-compatible API →