TL;DR: https://github.com/igrishaev/pg2

Introduction

During the last year, I was working on PG, a JDBC-free PostgreSQL driver in pure Clojure. By purity I mean there is only a TCP socket reading and writing bytes according to the official PostgreSQL protocol, and nothing more.

It was fun: the very idea of implementing something low-level in Clojure was a challenge. I used a series of tricks to squeeze out performance: mutable collections instead of Clojure’s persistent ones, special macros to traverse collections using Iterator and .forEach, and so on. In the end, my driver was about 1.5 times slower than Next.JDBC, and I still think there is room for further improvement.
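To give the flavour of those tricks, here is a minimal, hypothetical illustration (not PG’s actual code): pushing each element of a Java list through .forEach with a reified Consumer, bypassing Clojure’s seq machinery.

;; hypothetical example: traverse a java.util.List without creating seqs
(defn for-each! [^java.util.List items f]
  (.forEach items
            (reify java.util.function.Consumer
              (accept [_ item]
                (f item)))))

(for-each! (java.util.ArrayList. [1 2 3]) println)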

One may ask what the point is of making a driver from scratch in 2024. The reason is that I still miss plenty of Postgres features when working with it from Clojure. Namely:

  • no built-in JSON support: in every project, I have to extend some protocols with PGobject and encode/decode JSON manually;

  • no COPY support: in Postgres, COPY IN/OUT is one of the best features I can remember, yet there is no built-in CSV or binary encoding of rows, for example;

  • poor time support: selecting a timestamp returns an instance of java.sql.Timestamp, which is based on java.util.Date, a mutable class whose API was largely deprecated back in Java 1.1;

  • poor array support: only a limited set of types, no multidimensional arrays, etc.;

  • no built-in connection pool.

Today I deprecated the PG project in favour of PG2, its successor and my second attempt to make a JDBC-free driver for Postgres. This time around, it’s written completely in Java with a thin Clojure layer. I’ve run some benchmarks, and it looks like PG2 is 2-3 times faster than Next.JDBC. There is no documentation yet, only integration tests that cover plenty of cases (I borrowed the old tests and improved them). Although documentation is badly needed, I could not wait to announce PG2 and share the benchmarks.

How the benchmarks were made

I’ve got three Mac machines with Core i5, Core i9, and ARM M1 processors. Each has PostgreSQL installed with all settings left at their defaults. The JVM is version 21, although version 16 is the minimum required. The benchmarks are made with Criterium version 0.4.6.

There are two groups of benchmarks. In the first one, I measure the query/execute functions using the Criterium framework. In the second, I run a simple HTTP server built with Ring + Jetty + JSON and measure requests per second with Apache Benchmark (ab). For each test, I show the source code with a chart and comments.

The source code of benchmarks can be found in the repository.

Test 1. A single-column, series-based SELECT query with 50,000 rows

The query:

select x from generate_series(1,50000) as s(x)

Next.JDBC code:

(with-title "next.JDBC simple value select"
  (with-open [conn (jdbc/get-connection
                    jdbc-config)]
    (quick-bench
     (jdbc/execute! conn
                    [QUERY_SELECT_RANDOM_SIMPLE]
                    {:as rs/as-unqualified-maps}))))

PG2 code:

(with-title "pg simple select"
  (pg/with-connection [conn pg-config]
    (quick-bench
     (pg/execute conn
                 QUERY_SELECT_RANDOM_SIMPLE))))

Note that, in both cases, the connection is opened outside the quick-bench macro, so the cost of establishing a connection is not included in what quick-bench measures.

This was the first benchmark I ever ran for this project, and I found the results promising:

Two times faster on i9, and three times faster on i5 and ARM M1! Isn’t that great? Getting a 2-3x performance boost when querying a database would be amazing for web apps, because they spend most of their time reading data.

By the way, when measuring pure SELECT queries, PG2 is even faster than bare JDBC. Here is the code for low-level DB access using the official PostgreSQL Java driver:

(with-title "pure JDBC simple select"
  (let [^java.sql.Connection conn
        (DriverManager/getConnection JDBC-URL USER USER)
        ^PreparedStatement stmt
        (.prepareStatement conn QUERY_SELECT_RANDOM_SIMPLE)]
    (quick-bench
     (let [^ResultSet rs (.executeQuery stmt)
           ^ArrayList l (new ArrayList 50000)]
       (while (.next rs)
         (let [^HashMap m (new HashMap)]
           (.put m "x" (.getString rs "x"))
           (.add l m)))))))

On my i5 machine, the average time is 50 ms vs 43 ms in favour of PG2. My clumsy Java-style code is faster than the official Postgres driver!

Test 2. A complex, multi-column, randomly generated SELECT query

Let’s try a more complex query with integer, float, numeric, and various date/time types. The number of rows is 50,000:

select
  x::int4                  as int4,
  x::int8                  as int8,
  x::numeric               as numeric,
  x::text || 'foobar'      as line,
  x > 100500               as bool,
  now()                    as ts,
  now()::date              as date,
  now()::time              as time,
  null                     as nil
from
  generate_series(1,50000) as s(x)

Timings in milliseconds:

Again, PG2 was two times faster than Next.JDBC.

Test 3. Querying randomly generated JSON

The query produces 50k rows with a single column; each value is a JSON object holding random floats:

select row_to_json(row(1, random(), 2, random()))
from generate_series(1,50000)

One important note: PG2 carries its own JSON Java module based on Jsonista, so there is no need to import a third-party JSON library or extend any protocols. For Next.JDBC, I grabbed the code specified in the official documentation: it extends certain protocols and handles the PGobject class with encode/decode logic from jsonista.
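For reference, that Next.JDBC boilerplate looks roughly like the sketch below. It follows the pattern from the official docs; the ->pgobject / <-pgobject helper names are illustrative, not taken from the benchmark suite.

(ns demo.json
  (:require [jsonista.core :as json]
            [next.jdbc.prepare :as prepare]
            [next.jdbc.result-set :as rs])
  (:import (org.postgresql.util PGobject)
           (java.sql PreparedStatement)))

;; wrap a Clojure value into a PGobject of type json
(defn ->pgobject [x]
  (doto (PGobject.)
    (.setType "json")
    (.setValue (json/write-value-as-string x))))

;; unwrap a PGobject, parsing json/jsonb payloads into Clojure data
(defn <-pgobject [^PGobject v]
  (if (#{"json" "jsonb"} (.getType v))
    (json/read-value (.getValue v) json/keyword-keys-object-mapper)
    v))

;; every Clojure map passed as a query parameter is sent as json
(extend-protocol prepare/SettableParameter
  clojure.lang.IPersistentMap
  (set-parameter [m ^PreparedStatement s i]
    (.setObject s i (->pgobject m))))

;; every PGobject read from a result set is parsed back
(extend-protocol rs/ReadableColumn
  PGobject
  (read-column-by-label [v _] (<-pgobject v))
  (read-column-by-index [v _2 _3] (<-pgobject v)))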

Timings:

Although the numbers are higher than before, PG2 is still two times faster.

Test 4. Inserting a single row with no transaction

Let’s prepare a table for further insertions: three fields, one of which is a timestamp, and all of them are declared not null.

create table if not exists test1 (
  id integer not null,
  name text not null,
  created_at timestamp not null
)

Insertion query:

insert into test1(id, name, created_at) values (?, ?, ?)
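Note that the ? placeholders are the JDBC flavour. PG2 speaks Postgres-native dollar parameters, so the PG2 version of the query (the QUERY_INSERT_PG constant) should look like this:

insert into test1(id, name, created_at) values ($1, $2, $3)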

PG2 benchmark code:

(with-title "pg insert values"
  (pg/with-connection [conn pg-config]
    (pg/with-statement [stmt
                        conn
                        QUERY_INSERT_PG]
      (quick-bench
       (let [x (rand-int 10000)]
         (pg/execute-statement conn
                               stmt
                               {:params [x,
                                         (format "name%s" x)
                                         (LocalDateTime/now)]}))))))

Next.JDBC benchmark code:

(with-title "next.JDBC insert values in TRANSACTION"
  (with-open [conn (jdbc/get-connection
                    jdbc-config)]
    (quick-bench
     (let [x (rand-int 10000)]
       (jdbc/with-transaction [tx conn]
         (jdbc/execute! tx
                        [QUERY_INSERT_JDBC
                         x,
                         (format "name%s" x)
                         (LocalDateTime/now)]))))))

This is a situation where I could not beat JDBC. The timings are equal across all the platforms:

One interesting detail: I had to use the pg/with-statement macro in the PG2 code. It creates a prepared statement that is reused later on, so the insertion query is not parsed every time it is executed. I suspect JDBC does the same implicitly under the hood with some sort of statement cache.

Test 5. Inserting a row in a transaction

In both cases, the code is the same as in Test 4 with one small change: the insertion call is wrapped in the (jdbc/with-transaction [tx conn] ...) and (pg/with-tx [conn] ...) macros, which issue BEGIN … COMMIT commands before and after the INSERT statement.
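For clarity, here is a sketch of the PG2 variant, assembled from the Test 4 code and the pg/with-tx macro (not copied verbatim from the benchmark suite):

(with-title "pg insert values in TRANSACTION"
  (pg/with-connection [conn pg-config]
    (pg/with-statement [stmt conn QUERY_INSERT_PG]
      (quick-bench
       (let [x (rand-int 10000)]
         ;; BEGIN ... COMMIT around the insert
         (pg/with-tx [conn]
           (pg/execute-statement conn
                                 stmt
                                 {:params [x
                                           (format "name%s" x)
                                           (LocalDateTime/now)]})))))))

In this benchmark, PG2 is slightly faster: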

Test 6. COPY IN a large CSV file from disk

Let’s proceed with COPY operations. First, we generate a CSV file with 1 million rows:

(let [rows
      (for [x (range 0 1000000)]
        [x
         (str "name" x)
         (LocalDateTime/now)])]
  (with-open [writer (-> SAMPLE_CSV
                         io/file
                         io/writer)]
    (csv/write-csv writer rows)))

Now let’s import this file using both libraries. PG2 code:

(with-title "PG COPY in from a stream"
  (pg/with-connection [conn pg-config]
    (quick-bench
     (pg/copy-in conn
                 QUERY_IN_STREAM
                 (-> SAMPLE_CSV io/file io/input-stream)))))

Next.JDBC doesn’t have a wrapper for copying data. Here is low-level code that uses the CopyManager class:

(with-title "JDBC COPY in from a stream"
  (with-open [conn (jdbc/get-connection
                    jdbc-config)]
    (quick-bench
     (let [copy
           (new CopyManager conn)]

       (.copyIn copy
                ^String QUERY_IN_STREAM
                ^InputStream (-> SAMPLE_CSV io/file io/input-stream))))))

The timings are pretty even:

Test 7. COPY IN a collection of rows in CSV format

First, let’s discuss this case. Imagine you have a collection of rows you’d like to COPY into a table. You cannot do that with JDBC: there is no function or class that takes a collection of rows and streams it as a series of CopyData messages. You have to encode the rows into a CSV file (or an in-memory buffer) first and then stream the result into the DB. This is slow, it may involve the disk, and sometimes it leads to weird CSV formatting. Some developers don’t use a CSV library at all and just str the values, joining them with commas. That works for plain types like numbers and strings, but JSON, arrays, or strings with quotes always ruin it, and it always happens when you’re in a rush.
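A quick illustration of why naive joining breaks (the row below is made up):

(require '[clojure.string :as str])

;; one value contains a comma, so the "CSV" gains an extra column
(str/join "," [1 "O'Hara, Jr." "{\"a\": 1}"])
;; => "1,O'Hara, Jr.,{\"a\": 1}"
;; a CSV parser now sees four fields instead of three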

So, this benchmark does different things depending on the library being tested. For PG2, it measures an invocation of the pg/copy-in-rows function as follows:

(with-title "PG COPY in from rows CSV"
  (pg/with-connection [conn pg-config]
    (let [rows (generate-rows)]
      (quick-bench
       (pg/copy-in-rows conn
                        QUERY_IN_STREAM
                        rows)))))

For JDBC, it turns the rows into a CSV input stream and then sends it to the database using CopyManager:

(defn rows->csv-input-stream ^InputStream [rows]
  (let [out (new ByteArrayOutputStream)]
    (with-open [writer (io/writer out)]
      (csv/write-csv writer rows))
    (-> out .toByteArray io/input-stream)))

  (with-title "JDBC COPY in from rows CSV"
    (with-open [conn (jdbc/get-connection
                      jdbc-config)]
      (let [rows (generate-rows)]
        (quick-bench
         (let [input-stream
               (rows->csv-input-stream rows)
               copy
               (new CopyManager conn)]
           (.copyIn copy
                    ^String QUERY_IN_STREAM
                    ^InputStream input-stream))))))

PG2 is two times faster again, and there is no need to dump the rows into a CSV buffer.

Test 8. COPY IN a collection of rows in binary format

Most developers don’t know that PostgreSQL supports a binary format for transferring data. It’s more complex in terms of development but more robust. The good news is that PG2 supports binary mode for COPY IN! This benchmark measures the time required to insert randomly generated rows in binary mode. Here is the code:

(with-title "PG COPY in from rows BIN"
  (pg/with-connection [conn pg-config]
    (let [rows (generate-rows)]
      (quick-bench
       (pg/copy-in-rows conn
                        QUERY_IN_STREAM_BIN
                        rows
                        {:copy-bin? true
                         :oids [oid/int4 oid/text oid/timestamp]})))))

Since JDBC cannot encode data into the binary format on the fly, I took the CSV numbers from the previous benchmark for comparison:

One can see that, on top of PG2 already being faster than JDBC, binary mode also beats CSV performance. Thus, whenever you can, prefer binary COPY operations.

Test 9. COPY OUT a table into an OutputStream

This test measures the COPY … TO STDOUT … command. In both cases, I used nullOutputStream so the benchmark does not depend on the disk. Code for PG2:

(with-title "PG COPY out"
  (pg/with-connection [conn pg-config]
    (quick-bench
     (pg/copy-out conn
                  QUERY_OUT_STREAM
                  (OutputStream/nullOutputStream)))))

Code for Next.JDBC:

(with-title "JDBC COPY out"
  (with-open [conn (jdbc/get-connection
                    jdbc-config)]
    (quick-bench
     (let [copy
           (new CopyManager conn)]
       (.copyOut copy
                 ^String QUERY_OUT_STREAM
                 ^OutputStream (OutputStream/nullOutputStream))))))

Timings:

I was a bit surprised to see such a significant difference in performance on the Core i5. I don’t know the reason, but a 2.5x performance boost for dumping data is great. On the other platforms, the timings are even.

Test 10. Measuring connection pools

Benchmarking a single connection is not enough; it’s also important to know how much time it takes to borrow a connection from a pool, use it, and release it afterwards.

Next.JDBC has no built-in pool, so I used HikariCP. Here is the code:

(with-title "JDBC pool"
  (with-open [^HikariDataSource datasource
              (cp/make-datasource cp-options)]
    (quick-bench
     (with-open [conn (jdbc/get-connection datasource)]
       :nothing))))

PG2 code:

(with-title "PG pool"
  (pool/with-pool [pool pg-config]
    (quick-bench
     (pool/with-connection [conn pool]
       :nothing))))

In both cases, the pools are opened before entering the quick-bench macro. Timings:

As a result, borrowing a connection from the PG2 pool is faster.

In the next post, I’m going to share HTTP benchmarks with a simple Ring application.

(to be continued)