PG2 release 0.1.2: more performance, benchmarks, part 3
Table of Content
- Introduction
- Test1. Reading a single-column random query
- Test2. Reading a multi-column, complex random query
- Test3. Reading random JSON
- Test 4. Connection pools
- Test 5. Forcibly evaluate each row after reading
- Test 6. Benchmarking the HTTP server with various -c values
- Summary
Introduction
The PG2 library version 0.1.2 is out. One of its features is a significant performance boost when processing SELECT queries. The more fields and rows you have in a result, the faster is the processing. Here is a chart that measures a query with a single column:
No difference between the previous release of PG and the new one. But with nine fields, the average execution time is less now:
Briefly, PG2 0.1.2 allows you to fetch the data 7-8 times faster than Next.JDBC does. But before we proceed with other charts and numbers, let me explain how the new processing algorithm works.
When you send a query to Postgres, it responds with a series of messages. The
leading message called RowDescription
describes the structure of the
result. When parsed, it can be presented as an array of maps; each map holds the
name of the column, its type code (so called OID), the code of a table where
this field comes from, and more. One should save this message somewhere as it’s
used later.
[{:name "id"
:oid 23 ;; type int4
:format 0 ;; text format
...}
{:name "email"
:oid 25 ;; type text
:format 0 ;; text format
...}
]
Then, Postgres sends you a series of DataRow
messages. Each DataRow
message
is a long array of bytes which holds concatenated values. Before each value,
there are four bytes (int32) representing the lengths of the subsequent chunk of
bytes to parse:
[<N of cols><len1><value1><len2><value2>, ...]
(By the way, if <lenX>
is -1 than the value is NULL, and there is no bytes for
<valueX>
.)
The question is, how to deal with all this to make your Postgres client
faster. Obviously you can parse DataRow
messages into maps one by one as
you’re getting them from a TCP socket. This is what PG2 did before. The problem
with this approach is that it slows down reading data from the network. Parsing
bytes is a really CPU-heavy task. While you’re parsing a row, you’re not reading
the next one.
On one hand, you fairly parse data and have a complete map at the end. On another hand, it slows down the process of consuming the response.
There is an opposite approach: you read the data as fast as you can with no (or minimum) parsing. Then, when you’ve done with reading, you parse what you got. The speed of reading the socket increases. As a result, the time window required to execute a query shrinks. The connection might serve more queries than before because now each request takes less time to be finished — in terms of the server, of course.
To process the result in Clojure, you need to parse it first. But now you have not plain maps but byte arrays with some metadata about the structure of the result. To prevent users from manual parsing, each row pretends to be a Clojure map that, when certain methods are triggered, parses the underlying bytes.
This is what the recent version of PG2 does under the hood. Before, it blindly
parsed all the data on the fly. It was pretty fast: 2 or 3 times faster than
using Next.JDBC. But with lazy parsing described above, it’s even faster: 7 to 9
times in favour of PG2. It affects only SELECT...
and ...RETURNING
queries. The new algorithm has nothing in common with inserting or deleting the
data.
Now, benchmarks and timings. Most of the technical details like the environment, code snippets and SQL queries were covered in the previous post. Please refer to it for details.
Test1. Reading a single-column random query
This benchmark reads 50.000 randomly generated rows with a single integer column. The code from this benchmark hasn’t changed since the previous post. Having one column only in a row doesn’t make any difference with version of PG2 0.1.1:
Timings:
Platform | PG2 0.1.1 | JDBC.Next | PG2 0.1.2 |
---|---|---|---|
core i5 2 GHz Quad-Core 16G | 43.026307 | 127.677926 | 44.36297 |
core i9 2,4 GHz 8-Core 32G | 39.551719 | 83.932103 | 27.672117 |
arm m1 10 cores 32G | 18.517718 | 49.340986 | 14.670815 |
Test2. Reading a multi-column, complex random query
This test reads 50.000 randomly generated rows with 9 columns of different types: numbers, strings, and timestamps. This is where the lazy algorithm beats the old one:
Timings:
Platform | PG2 0.1.1 | JDBC.Next | PG2 0.1.2 |
---|---|---|---|
core i5 2 GHz Quad-Core 16G | 273.866142 | 579.59995 | 80.352246 |
core i9 2,4 GHz 8-Core 32G | 270.262248 | 447.326861 | 54.34384 |
arm m1 10 cores 32G | 117.241426 | 206.371502 | 30.444798 |
Test3. Reading random JSON
In this test, we select 50.000 randomly generated JSON objects. The absolute numbers are higher yet the ratio between the timings is the same:
Timings:
Platform | PG2 0.1.1 | JDBC.Next | PG2 0.1.2 |
---|---|---|---|
core i5 2 GHz Quad-Core 16G | 227.067587 | 498.160381 | 64.590117 |
core i9 2,4 GHz 8-Core 32G | 230.741915 | 413.006218 | 52.063469 |
arm m1 10 cores 32G | 105.99894 | 188.33619 | 30.75433 |
Test 4. Connection pools
This benchmark doesn’t bring any new information. It just confirms that the lazy algorithm works with a connection pool.
Timings:
Platform | PG2 0.1.1 | JDBC.Next | PG2 0.1.2 |
---|---|---|---|
core i5 2 GHz Quad-Core 16G | 334.488016 | 445.280716 | 59.709788 |
core i9 2,4 GHz 8-Core 32G | 163.220533 | 243.250634 | 56.457382 |
arm m1 10 cores 32G | 112.234569 | 154.656431 | 31.408932 |
Now that we clearly see that lazy parsing is useful, it’s still a half of the picture. We measure the time spent on reading the rows from the network and wrapping each into a pseudo-map class. But how much does it take to parse the rows? There are some dedicated benchmarks for that.
Test 5. Forcibly evaluate each row after reading
This test is similar to the previous one where I select randomly generated values. The difference is, now we associate an extra key to each row after reading. Assoc-ing a key to a lazy map triggers the process of parsing all the underlying bytes and composing a real Clojure map.
The query is really tough in terms of parsing: 60 timestamp fields :–)
select
now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
,now(),now(),now(),now(),now(),now(),now(),now(),now(),now()
from
generate_series(1,50000) as s(x)
JDBC code:
(with-title "next.JDBC select many fields WITH ASSOC"
(with-open [conn (jdbc/get-connection
jdbc-config)]
(quick-bench
(let [rows
(jdbc/execute! conn
[QUERY_SELECT_RANDOM_MANY_FIELDS]
{:as rs/as-unqualified-maps})]
(doseq [row rows]
(assoc row :extra 42))))))
PG2 code:
(with-title "pg select many fields WITH ASSOC"
(pg/with-connection [conn pg-config]
(quick-bench
(let [rows
(pg/execute conn
QUERY_SELECT_RANDOM_MANY_FIELDS)]
(doseq [row rows]
(assoc row :extra 42))))))
The result:
Timings:
Platform | PG2 0.1.1 | JDBC.Next | PG2 0.1.2 |
---|---|---|---|
core i5 2 GHz Quad-Core 16G | 1161.563 | 2540.014 | 1502.913 |
core i9 2,4 GHz 8-Core 32G | 1330.753 | 953.199066 | 1473.492 |
arm m1 10 cores 32G | 547.914391 | 448.419085 | 619.253412 |
The chart shows that PG2 is a bit slower when it comes to parsing lazy maps. I believe there is a way to improve it in the future.
Anyway, we will get the clearest score if we measure the RPS of an HTTP server that sends the data from the database in JSON. Such a server was described in the previous post. We measure it again with the new release of PG2.
Test 6. Benchmarking the HTTP server with various -c values
1 concurrent request
ab -n 1000 -c 1 -l http://127.0.0.1:18080/
The chart and metrics (here and below: more is better):
Platform | PG2 0.1.1 | JDBC.Next | PG2 0.1.2 |
---|---|---|---|
core i5 2 GHz Quad-Core 16G | 225.22 | 116.24 | 202.88 |
core i9 2,4 GHz 8-Core 32G | 322.93 | 179.68 | 279.36 |
arm m1 10 cores 32G | 428.41 | 286.04 | 269.4 |
PG2 didn’t show great results on the M1 machine: it’s even slower than JDBC for just one concurrent request. But when it comes to many parallel requests, it’s as fast as PG 0.1.1 again.
16 concurrent requests
ab -n 1000 -c 16 -l http://127.0.0.1:18080/
The results:
Platform | PG2 0.1.1 | JDBC.Next | PG2 0.1.2 |
---|---|---|---|
core i5 2 GHz Quad-Core 16G | 968.51 | 555.55 | 915.93 |
core i9 2,4 GHz 8-Core 32G | 1909.19 | 1304 | 1688.72 |
arm m1 10 cores 32G | 2999.06 | 1902.31 | 3026 |
PG2 version 0.1.2 is slightly slower than 0.1.1 because parsing lazy maps takes CPU resources.
64 concurrent requests
ab -n 1000 -c 64 -l http://127.0.0.1:18080/
The results:
Platform | PG2 0.1.1 | JDBC.Next | PG2 0.1.2 |
---|---|---|---|
core i5 2 GHz Quad-Core 16G | 1025.62 | 543.66 | 945.51 |
core i9 2,4 GHz 8-Core 32G | 2025.94 | 1113.81 | 1806.29 |
arm m1 10 cores 32G | 3348.75 | 1909.23 | 3205.83 |
Both 0.1.1 and 0.1.2 are almost even in terms of performance.
Summary
Introducing lazy maps and lazy result parsing was an interesting experiment. Although it slightly slows down processing the result, it brings some benefits. First, the time required for connection to serve a query reduces drastically. The connection dumps the payload it got for you, and then you parse it while the connection is serving another query. It’s especially useful when working with connection pools. For pools, you have to return the connections back as soon as possible as their number is limited. Spawning a fresh connection is expensive as it runs a new process on the server side.
Second, lazy parsing reduces the odds of getting an exception in the middle of reading the data from a connection. This is a weird situation when you got the data partially, and then an exception popped up. Without capturing it and rethrowing later, the connection might hang into a broken state.
To sum up — I’m not a big fan of laziness in Clojure but this time, the idea seems to be the right choice. Let’s see how it goes.
Нашли ошибку? Выделите мышкой и нажмите Ctrl/⌘+Enter
Viktor, 26th Jul 2024, link
Promising! A faster alternative to JDBC would be huge. What are the units on the different charts?
Ivan Grishaev, 26th Jul 2024, link
Victor, these are either milliseconds or nanoseconds, see the title of charts (ms or ns).