Ivan Grishaev's blog

PG2 release 0.1.9: arrays

Apr 4, 2024 clojure, programming, postgres, array

The latest 0.1.9 release of PG2 supports Postgres arrays.

In JDBC, arrays have always been a pain. Every time you’re about to pass an array to the database and read it back, you’ve got to wrap your data in various Java classes, extend protocols, and multimethods. In Postgres, the array type is quite powerful yet underestimated due to poor support of drivers. This is one more reason for running this project: to bring easy access to Postgres arrays.

PG2 tries its best to provide seamless connection between Clojure vectors and Postgres arrays. When reading an array, you get a Clojure vector. And vice versa: to pass an array object into a query, just submit a vector.

PG2 supports arrays of any type: not only primitives like numbers and strings but uuid, numeric, timestamp(tz), json(b), and more as well.

Arrays might have more than one dimension. Nothing prevents you from having a 3D array of integers like cube::int[][][], and it becomes a nested vector when fetched by PG2.

A technical note: PG2 supports both encoding and decoding of arrays in both text and binary modes.

Here is a short demo session. Let’s prepare a table with an array of strings:

(pg/query conn "create table arr_demo_1 (id serial, text_arr text[])")

Insert a simple item:

(pg/execute conn
            "insert into arr_demo_1 (text_arr) values ($1)"
            {:params [["one" "two" "three"]]})

In arrays, some elements might be NULL:

(pg/execute conn
            "insert into arr_demo_1 (text_arr) values ($1)"
            {:params [["foo" nil "bar"]]})

Now let’s check what we’ve got so far:

(pg/query conn "select * from arr_demo_1")

[{:id 1 :text_arr ["one" "two" "three"]}
 {:id 2 :text_arr ["foo" nil "bar"]}]

Postgres supports plenty of operators for arrays. Say, the && one checks if there is at least one common element on both sides. Here is how we find those records that have either “tree”, “four”, or “five”:

(pg/execute conn
            "select * from arr_demo_1 where text_arr && $1"
            {:params [["three" "four" "five"]]})

[{:text_arr ["one" "two" "three"], :id 1}]

Another useful operator is @> that checks if the left array contains all elements from the right array:

(pg/execute conn
            "select * from arr_demo_1 where text_arr @> $1"
            {:params [["foo" "bar"]]})

[{:text_arr ["foo" nil "bar"], :id 2}]

Let’s proceed with numeric two-dimensional arrays. They’re widely used in math, statistics, graphics, and similar areas:

(pg/query conn "create table arr_demo_2 (id serial, matrix bigint[][])")

Here is how you insert a matrix:

(pg/execute conn
            "insert into arr_demo_2 (matrix) values ($1)"
            {:params [[[[1 2] [3 4] [5 6]]
                       [[6 5] [4 3] [2 1]]]]})

{:inserted 1}

Pay attention: each number can be NULL but you cannot have NULL for an entire sub-array. This will trigger an error response from Postgres.

Reading the matrix back:

(pg/query conn "select * from arr_demo_2")

[{:id 1 :matrix [[[1 2] [3 4] [5 6]]
                 [[6 5] [4 3] [2 1]]]}]

A crazy example: let’s have a three dimension array of timestamps with a time zone. No idea how it can be used but still:

(pg/query conn "create table arr_demo_3 (id serial, matrix timestamp[][][])")

(def -matrix
  [[[[(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]]
    [[(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]]
    [[(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]]]
   [[[(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]]
    [[(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]]
    [[(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]]]
   [[[(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]]
    [[(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]]
    [[(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]
     [(OffsetDateTime/now) (OffsetDateTime/now) (OffsetDateTime/now)]]]])

(pg/execute conn
            "insert into arr_demo_3 (matrix) values ($1)"
            {:params [-matrix]})

Now read it back:

(pg/query conn "select * from arr_demo_3")

[{:matrix
  [... truncated
   [[[#object[java.time.LocalDateTime 0x5ed6e62b "2024-04-01T18:32:48.272169"]
      #object[java.time.LocalDateTime 0xb9d6851 "2024-04-01T18:32:48.272197"]
      #object[java.time.LocalDateTime 0x6e35ed84 "2024-04-01T18:32:48.272207"]]
     ...
     [#object[java.time.LocalDateTime 0x7319d217 "2024-04-01T18:32:48.272236"]
      #object[java.time.LocalDateTime 0x6153154d "2024-04-01T18:32:48.272241"]
      #object[java.time.LocalDateTime 0x2e4ffd44 "2024-04-01T18:32:48.272247"]]]
    ...
    [[#object[java.time.LocalDateTime 0x32c6e526 "2024-04-01T18:32:48.272405"]
      #object[java.time.LocalDateTime 0x496a5bc6 "2024-04-01T18:32:48.272418"]
      #object[java.time.LocalDateTime 0x283531ee "2024-04-01T18:32:48.272426"]]
     ...
     [#object[java.time.LocalDateTime 0x677b3def "2024-04-01T18:32:48.272459"]
      #object[java.time.LocalDateTime 0x46d5039f "2024-04-01T18:32:48.272467"]
      #object[java.time.LocalDateTime 0x3d0b906 "2024-04-01T18:32:48.272475"]]]]],
  :id 1}]

You can have an array of JSON(b) objects, too:

(pg/query conn "create table arr_demo_4 (id serial, json_arr jsonb[])")

Inserting an array of three maps:

  (pg/execute conn
              "insert into arr_demo_4 (json_arr) values ($1)"
              {:params [[{:foo 1} {:bar 2} {:test [1 2 3]}]]})

Elements might be everything that can be JSON-encoded: numbers, strings, boolean, etc. The only tricky case is a vector. To not break the algorithm that traverses the matrix, wrap a vector element with pg/json-wrap:

(pg/execute conn
            "insert into arr_demo_4 (json_arr) values ($1)"
            {:params [[42 nil {:some "object"} (pg/json-wrap [1 2 3])]]})

;; Signals that the [1 2 3] is not a nested array but an element.

Now read it back:

(pg/query conn "select * from arr_demo_4")

[{:id 1, :json_arr [42 nil {:some "object"} [1 2 3]]}]

Microsoft Teams

Mar 30, 2024 software, teams

Худшая программа, с которой мне приходится работать — это Microsoft Teams, и вот почему.

Teams — это Слак семилетней давности: тормозной и глючный. Если сегодняшний Слак еще более-менее, то после него Тимс — словно сидишь под водой: каждое действие на долю секунды медленней.

Удивляет, что в Микрософте сделали быстрый редактор VS Code, но не осилили месаджер. По-моему, ребят из Teams надо запереть в комнате с командой VS Code, чтобы те передали опыт. Странно, что никому не приходит это в голову.

За короткую жизнь Teams сменил несколько приложений. Сначала был Teams Classic. Потом вышел Teams for School and Work. Теперь вышел Teams New. Это говорит о том, что первое приложение сделано настолько плохо, что проще выпустить новое, чем исправлять старое. Этого никто не скрывает: первый Teams был сделан тяп-ляп, чтобы не дать Слаке занять весь рынок.

Read more →
Смысл жизни

Mar 30, 2024 life, meaning

Опасаюсь, что после сорока лет я начну искать смысл жизни. Знаете, бывает: живет человек, вроде бы все хорошо, а потом раз — на столе Евангелие, Будда и все такое. Начинаются брожения и поиски бога.

Но пока мне 38, и вроде бы ничего не предвещает беды. Я по-прежнему уверен, что бога нет, души нет, мир случаен. Вселенная конечна. Любовь — гормоны и инстинкты.

Философы — балаболы. Ни у одного я не видел крупицы смысла. А если крупицы и есть, то искать их среди 600 страниц — так себе удовольствие.

У Ницше не понял ни абзаца. Читал Гегеля — ощущения, словно вода сквозь пальцы. Слова понятны, смысла не вижу даже отдаленно.

Самый клевый чел — Иван Павлов. Он один сделал больше, чем все Гегели, Вольтеры и Канты вместе взятые.

Читал изыскания Толстого: долгий трактат о том, как он искал бога. Каким-то образом, много раз воздвигнув и разрушив всякие доводы, он построил модель мира, в которой прожил остаток дней. Мне как стороннему наблюдателю это кажется странным. Да, построил свою модель. Это примерно как написать свою версию популярной библиотеки: интересно, стимулирует мозг, не сидишь без дела. Главное — процесс.

Я не вижу проблем в том, что нет смысла. Зачем смысл? Все, кто утверждали, что нашли его, фактически обманули себя. Подобно Толстому, построили свою модель, которая со стороны смотрится нелепо. Я не хочу жить в модели.

Нам навязывают смысл еще на этапе сада и школы. И во взрослой жизни найдутся те, кто подскажут: смысл такой-то, нужно делать то и это. А ведь самое лучшее — жить без смысла. Делать то, что хочется прямо сейчас. И не бояться это признать: да, моя жизнь бессмысленна. Но зато интересна!
Смысл песен

Mar 29, 2024 life, music, meaning

Хороши те песни, где кроме хорошей мелодии присутствует смысл. Например, Анна Герман:

Нужно только выучиться ждать,

Нужно быть спокойным и упрямым,

Чтоб порой от жизни получать

Радости скупые телеграммы.

В четырех строках — жизнь человека. Другой пример, Pink Floyd, Time:

Tired of lying in the sunshine, staying home to watch the rain

You are young and life is long, and there is time to kill today

And then one day you find ten years have got behind you

No one told you when to run, you missed the starting gun

Опять, в четырех строчках — жизнь человека.

А если взять условный Linkin Park, то их текст окажется набором бессмысленных фраз. Я устал от этой лжи, мне нужны ответы, ты стал тем, к чему я стремился, давление ломает меня и так далее.

Это не отменяет того, то старые песни у них задорные. Просто любой задор со временем выветривается, а смысл остается.
DSL

Mar 29, 2024 programming, dsl

У программистов на функциональных языках есть болезнь — выдумывать DSL. Не важно, какую задачу они решают, главное — навертеть что-то похожее на язык и назвать его DSL. Потом поехать на конференцию и выступить, чтобы адепты растаскивали толк по телеграм-каналам.

Почему-то все забывают, что в аббревиатуре DSL буква D означает домен. Язык, специфичный для своего домена. Вопрос: есть ли у вас домен, под который вы пытаетесь подогнать язык? Как правило, нет.

Главное свойство домена в том, что он ортогонален другим доменам. Рассмотрим три вещи: Perl, HTML и SQL. Каждая технология занимает свою нишу. Они не пересекаются, а взаимно дополняют друг друга. Поэтому у каждой технологии — свой язык.

Другой пример: язык команд Redis, XML/XSLT, язык R. Все три — разные сущности, пересечения нет, везде свой язык.

То, что программисты называют DSL — это либо данные, по которым бегает фреймворк, либо макросы, чтобы код был короче. Оба подхода хороши до определенной черты, пока проблем не станет больше, чем пользы. Но называть их DSL — делать себе слишком много чести.

Бывает, макросы становятся такими сложными, что их и вправду можно назвать языком. Возможно, в этом случае их проще удалить. В одном из проектов я выкинул библиотеку meander и заменил на линейный код. Божечки, как это было хорошо! Никакой черной магии, все открыто.

Вот почему слово “DSL” пробуждает во мне плохие эмоции. Вам не нужен DSL. Прогуляйтесь до парка, вернитесь и сделайте задачу без DSL.
Бинарный дамп

Mar 29, 2024 programming, binary, dump, json, nippy

Простое правило: никогда не пользуйтесь бинарными дампами. Если нужно сбросить данные на диск, используйте JSON, EDN, YAML, что угодно. Но не нужно брать библиотеку, которая сериализует любой объект в байтики — это плохо кончится.

Десять лет назад я работал в Wargaming, и для очереди задач там использовали pickle. Кто не знает, это бинарный дамп объектов PyObject. И пока был только Питон, все работало. Но потом в качестве эксперимента завезли Эрланг, и приехали: он не мог читать эти дампы. Ребятам пришлось писать парсер, который с горем пополам вытягивал оттуда данные. Потом пришлось мигрировать сообщеньки с pickle на json, не спать неделю, мониторить и все такое.

В мире Clоjure есть похожая поделка — nippy. Это сброс чего угодно в байты с последующим чтением. Ужасная библиотека с интерфейсом а-ля “чужой для хищников”. Самое главное — с ней везде проблемы. Кастомные классы не читаются без приседаний. Ошибки игнорируются, вместо них мапы “тут что-то не то”. Стоит внедрить, как найдется потребитель с Питоном или Lua, для которых реализаций нет.

Потерял два дня, пытаясь заставить работать nippy в Граале. Не работает. Черная магия десериализации валится со страшными трейсами.

Хочется впасть в отчаяние: почему люди не учатся? Почему я ходил по этим граблям 10 лет назад, и меня заставили снова? Если хочется бинарей, возьми протобуф: стандарт с реализацией под все платформы. Зачем брать поделку от “опен-сорс ламы” (так автор себя называет)?

Все это мне не понятно.

PG2 release 0.1.6: rich JSON capabilities

Mar 20, 2024 clojure, programming, postgres, sql, json

PG2 version 0.1.6 is out, and it ships various improvements to JSON(b) handling.

Table of Content

Basic usage
Json Wrapper
Custom Object Mapper
Utility pg.json namespace
- Reading JSON
- Writing JSON
Ring HTTP middleware

Postgres is amazing when dealing with JSON. There hardly can be a database that serves it better. Unfortunately, Postgres clients never respect the JSON feature, which is horrible. Take JDBC, for example: when querying a JSON(b) value, you’ll get a dull PGObject which should be decoded manually. The same applies to insertion: one cannot just pass a Clojure map or a vector. It should be packed into the PGObject as well.

Of course, this can be automated by extending certain protocols. But it’s still slow as it’s done on Clojure level (not Java), and it forces you to copy the same code across projects.

Fortunately, PG2 supports JSON out from the box. If you query a JSON value, you’ll get its Clojure counter-part: a map, a vector, etc. To insert a JSON value to a table, you pass either a Clojure map or a vector. No additional steps are required.

PG2 relies on jsonista library to handle JSON. At the moment of writing, this is the fastest JSON library for Clojure. Jsonista uses a concept of object mappers: objects holding custom rules to encode and decode values. You can compose your own mapper with custom rules and pass it into the connection config.

Basic usage

Let’s prepare a connection and a test table with a jsonb column:

(def config
  {:host "127.0.0.1"
   :port 10140
   :user "test"
   :password "test"
   :dbname "test"})

(def conn
  (jdbc/get-connection config))

(pg/query conn "create table test_json (
  id serial primary key,
  data jsonb not null
)")

Now insert a row:

(pg/execute conn
            "insert into test_json (data) values ($1)"
            {:params [{:some {:nested {:json 42}}}]})

No need to encode a map manually nor wrap it into a sort of PGObject. Let’s fetch the new row by id:

(pg/execute conn
            "select * from test_json where id = $1"
            {:params [1]
             :first? true})

{:id 1 :data {:some {:nested {:json 42}}}}

Again, the JSON data returns as a Clojure map with no wrappers.

When using JSON with HoneySQL though, some circs are still needed. Namely, you have to wrap a value with [:lift ...] as follows:

(pgh/insert-one conn
                :test_json
                {:data [:lift {:another {:json {:value [1 2 3]}}}]})

{:id 2, :data {:another {:json {:value [1 2 3]}}}}

Without the [:lift ...] tag, HoneySQL will treat the value as a nested SQL map and try to render it as a string, which will fail of course or lead to a SQL injection.

Another way is to use HoneySQL parameters conception:

(pgh/insert-one conn
                :test_json
                {:data [:param :data]}
                {:honey {:params {:data {:some [:json {:map [1 2 3]}]}}}})

For details, see the “HoneySQL Integration” section.

PG2 supports not only Clojure maps but vectors, sets, and lists. Here is an example with with a vector:

(pg/execute conn
            "insert into test_json (data) values ($1)"
            {:params [[:some :vector [:nested :vector]]]})

{:id 3, :data ["some" "vector" ["nested" "vector"]]}

Json Wrapper

In rare cases you might store a string or a number in a JSON field. Say, 123 is a valid JSON value but it’s treated as a number. To tell Postgres it’s a JSON indeed, wrap the value with pg/json-wrap:

(pgh/insert-one conn
                :test_json
                {:data (pg/json-wrap 42)})

{:id 4, :data 42}

The wrapper is especially useful to store a “null” JSON value: not the standard NULL but "null" which, when parsed, becomes nil. For this, pass (pg/json-wrap nil) as follows:

(pgh/insert-one conn
                :test_json
                {:data (pg/json-wrap nil)})

{:id 5, :data nil} ;; "null" in the database

Custom Object Mapper

One great thing about Jsonista is a conception of mapper objects. A mapper is a set of rules how to encode and decode data. Jsonista provides a way to build a custom mapper. Once built, it can be passed to a connection config so the JSON data is written and read back in a special way.

Let’s assume you’re going to tag JSON sub-parts to track their types. For example, if encoding a keyword :foo, you’ll get a vector of ["!kw", "foo"]. When decoding that vector, by the "!kw" string, the mapper understands it a keyword and coerces "foo" to :foo.

Here is how you create a mapper with Jsonista:

(ns ...
  (:import
   clojure.lang.Keyword
   clojure.lang.PersistentHashSet)
  (:require
    [jsonista.core :as j]
    [jsonista.tagged :as jt]))

(def tagged-mapper
  (j/object-mapper
   {:encode-key-fn true
    :decode-key-fn true
    :modules
    [(jt/module
      {:handlers
       {Keyword {:tag "!kw"
                 :encode jt/encode-keyword
                 :decode keyword}
        PersistentHashSet {:tag "!set"
                           :encode jt/encode-collection
                           :decode set}}})]}))

The object-mapper function accepts even more options but we skip them for now.

Now that you have a mapper, pass it into a config:

(def config
  {:host "127.0.0.1"
   :port 10140
   :user "test"
   :password "test"
   :dbname "test"
   :object-mapper tagged-mapper})

(def conn
  (jdbc/get-connection config))

All the JSON operations made by this connection will use the passed object mapper. Let’s insert a set of keywords:

(pg/execute conn
            "insert into test_json (data) values ($1)"
            {:params [{:object #{:foo :bar :baz}}]})

When read back, the JSON value is not a vector of strings any longer but a set of keywords:

(pg/execute conn "select * from test_json")

[{:id 1, :data {:object #{:baz :bar :foo}}}]

To peek a raw JSON value, select it as a plain text and print (just to avoid escaping quotes):

(printl (pg/execute conn "select data::text json_raw from test_json where id = 10"))

;; [{:json_raw {"object": ["!set", [["!kw", "baz"], ["!kw", "bar"], ["!kw", "foo"]]]}}]

If you read that row using another connection with a default object mapper, the data is returned without expanding tags.

Utility pg.json namespace

PG2 provides an utility namespace for JSON encoding and decoding. You can use it for files, HTTP API, etc. If you already have PG2 in the project, there is no need to plug in Cheshire or another JSON library. The namespace is pg.json:

(ns ...
  (:require
   [pg.json :as json]))

Reading JSON

The read-string function reads a value from a JSON string:

(json/read-string "[1, 2, 3]")

[1 2 3]

The first argument might be an object mapper:

(json/read-string tagged-mapper "[\"!kw\", \"hello\"]")

:hello

The functions read-stream and read-reader act the same but accept either an InputStream or a Reader object:

(let [in (-> "[1, 2, 3]" .getBytes io/input-stream)]
  (json/read-stream tagged-mapper in))

(let [in (-> "[1, 2, 3]" .getBytes io/reader)]
  (json/read-reader tagged-mapper in))

Writing JSON

The write-string function dumps an value into a JSON string:

(json/write-string {:test [:hello 1 true]})

;; "{\"test\":[\"hello\",1,true]}"

The first argument might be a custom object mapper. Let’s reuse our tagger mapper:

(json/write-string tagged-mapper {:test [:hello 1 true]})

;; "{\"test\":[[\"!kw\",\"hello\"],1,true]}"

The functions write-stream and write-writer act the same. The only difference is, they accept either an OutputStream or Writer objects. The first argument might be a mapper as well:

(let [out (new ByteArrayOutputStream)]
  (json/write-stream tagged-mapper {:foo [:a :b :c]} out))

(let [out (new StringWriter)]
  (json/write-writer tagged-mapper {:foo [:a :b :c]} out))

Ring HTTP middleware

PG2 provides an HTTP Ring middleware for JSON. It acts like wrap-json-request and wrap-json-response middleware from the ring-json library. Comparing to it, the PG2 stuff has the following advantages:

it’s faster because of Jsonista, whereas Ring-json relies on Cheshire;
it wraps both request and response at once with a shortcut;
it supports custom object mappers.

Imagine you have a Ring handler that reads JSON body and returns a JSON map. Something like this:

(defn api-handler [request]
  (let [user-id (-> request :data :user_id)
        user (get-user-by-id user-id)]
    {:status 200
     :body {:user user}}))

Here is how you wrap it:

(ns ...
  (:require
   [pg.ring.json :refer [wrap-json
                         wrap-json-response
                         wrap-json-request]]))

(def app
  (-> api-handler
      (wrap-this-foo)
      (wrap-json <opt>)
      (wrap-that-bar)))

Above, the wrap-json wrapper is a combination of wrap-json-request and wrap-json-response. You can apply them both explicitly:

(def app
  (-> api-handler
      (wrap-this-foo)
      (wrap-json-request <opt>)
      (wrap-json-response <opt>)
      (wrap-that-bar)))

All the three wrap-json... middleware accept a handler to wrap and a map of options. Here is the options supported:

Name	Direction	Description
`:object-mapper`	request, response	An custom instance of `ObjectMapper`
`:slot`	request	A field to `assoc` the parsed JSON data (1)
`:malformed-response`	request	A ring response returned when payload cannot be parsed (2)

Notes:

The default slot name is :json. Please avoid using :body or :params to prevent overriding existing request fields. This is especially important for :body! Often, you need the origin input stream to calculate an MD5 or SHA-256 hash-sum of the payload. If you overwrite the :body field, you cannot do that.
The default malformed response is something like 400 “Malformed JSON” (plain text).

A full example:

(def json-opt
  {:slot :data
   :object-mapper tagged-mapper ;; see above
   :malformed-response {:status 404
                        :body "<h1>Bad JSON</h1>"
                        :headers {"content-type" "text/html"}}})

(def app
  (-> api-handler
      (wrap-this-foo)
      (wrap-json json-opt)
      (wrap-that-bar)))

PG2 release 0.1.5: Migrations

Mar 14, 2024 clojure, programming, postgres, sql, migrations

PG2 version 0.1.5 ships its own migration engine through the pg2-migration package. Like Migratus or Ragtime, it allows to grow the database schema continuously, track changes and apply them with care.

Table of Content

Installation
Concepts
Naming
SQL
No Code-Driven Migrations
Migration Resources
Migration Table
CLI Interface
Config
Commands
- Create
- List
- Migrate
- Rollback
Lein examples
Deps.edn examples
API Interface
Conflicts

Installation

;; lein
[com.github.igrishaev/pg2-migration "0.1.5]

;; deps
com.github.igrishaev/pg2-migration {:mvn/version "0.1.5"}

Concepts

Migrations are SQL files that are applied to the database in certain order. A migration has an id and a direction: next/up or prev/down. Usually it’s split on two files called <id>.up.sql and <id>.down.sql holding SQL commands. Say, the -up file creates a table with an index, and the -down one drops the index first, and then the table.

Migrations might have a slug: a short human-friendly text describing changes. For example, in a file called 002.create-users-table.up.sql, the slug is “Create users table”.

Naming

In PG2, the migration framework looks for files matching the following pattern:

<id>.<slug>.<direction>.sql

where:

id is a Long number, for example 12345 (a counter), or 20240311 (date precision), or 20240311235959 (date & time precision);
slug is an optional word or group of words joined with - or _, for example create-users-table-and-index or remove_some_view. When rendered, both - and _ are replaced with spaces, and the phrase is capitalized.
direction is either prev/down or next/up. Internally, down and up are transformed to prev and next because these two have the same amount of characters and files look better.

Examples:

001-create-users.next.sql
012-next-only-migration.up.sql
153-add-some-table.next.sql

Above, the leading zeroes in ids are used for better alignment only. Infernally they are transferred into 1, 12 and 153 Long numbers. Thus, 001, 01 and 1 become the same id 1 after parsing.

Each id has at most two directions: prev/down and next/up. On bootstrap, the engine checks it to prevent weird behaviour. The table below shows there are two rows which, after parsing, have the same (id, direction) pair. The bootstrap step will end up with an exception saying which files duplicate each other.

Filename	Parsed
`001-some-trivial-slug.next.sql`	(1, next)
`001-some-simple-slug.next.sql`	(1, next)

A migration might have only one direction, e.g. next/up or prev/down file only.

When parsing, the registry is ignored meaning that both 001-Create-Users.NEXT.sql and 001-CREATE-USERS.next.SQL files produce the same map.

SQL

The files hold SQL expressions to be evaluated by the engine. Here is the content of the 001-create-users.next.sql file:

create table IF NOT EXISTS test_users (
  id serial primary key,
  name text not null
);

BEGIN;

insert into test_users (name) values ('Ivan');
insert into test_users (name) values ('Huan');
insert into test_users (name) values ('Juan');

COMMIT;

Pay attention to the following points.

A single file might have as many SQL expressions as you want. There is no need to separate them with magic comments like --;; as Migratus requires. The whole file is executed in a single query. Use the standard semicolon at the end of each expression.
There is no a hidden transaction management. Transactions are up to you: they are explicit! Above, we wrap tree INSERT queries into a single transaction. You can use save-points, rollbacks, or whatever you want. Note that not all expressions can be in a transaction. Say, the CREATE TABLE one cannot and thus is out from the transaction scope.

For granular transaction control, split your complex changes on two or three files named like this:

# direct parts
001-huge-update-step-1.next.sql
002-huge-update-step-2.next.sql
003-huge-update-step-3.next.sql

# backward counterparts
003-huge-update-step-3.prev.sql
002-huge-update-step-2.prev.sql
001-huge-update-step-1.prev.sql

No Code-Driven Migrations

At the moment, neither .edn nor .clj migrations are supported. This is by design because personally I’m highly against mixing SQL and Clojure. Every time I see an EDN transaction, I get angry. Mixing these two for database management is the worst idea one can come up with. If you’re thinking about migrating a database with Clojure, please close you laptop and have a walk to the nearest park.

Migration Resources

Migration files are stored in project resources. The default search path is migrations. Thus, their physical location is resources/migrations. The engine scans the migrations resource for children files. Files from nested directories are also taken into account. The engine supports Jar resources when running the code from an uberjar.

The resource path can be overridden with settings.

Migration Table

All the applied migrations are tracked in a database table called migrations by default. The engine saves the id and the slug or a migration applied as well as the current timestamp of the event. The timestamp field has a time zone. Here is the structure of the table:

CREATE TABLE IF NOT EXISTS migrations (
  id BIGINT PRIMARY KEY,
  slug TEXT,
  created_at timestamp with time zone not null default current_timestamp
)

Every time you apply a migration, a new record is inserted into the table. On rollback, a corresponding migration is deleted.

You can override the name of the table in settings (see below).

CLI Interface

The migration engine is controlled with both API and CLI interface. Let’s review CLI first.

The pg.migration.cli namespaces acts like the main entry point. It accepts general options, a command, and command-specific options:

<global options> <command> <command options>

General options are:

-c, --config CONNFIG     migration.config.edn      Path to the .edn config file
-p, --port PORT          5432                      Port number
-h, --host HOST          localhost                 Host name
-u, --user USER          The current USER env var  User
-w, --password PASSWORD  <empty string>            Password
-d, --database DATABASE  The current USER env var  Database
    --table TABLE        :migrations               Migrations table
    --path PATH          migrations                Migrations path

Most of the options have default values. Both user and database names come from the USER environment variable. The password is an empty string by default. For local trusted connections, the password might not be required.

The list of the commands:

Name	Meaning
create	Create a pair of blank up & down migration files
help	Print a help message
list	Show all the migrations and their status (applied or not)
migrate	Migrate forward (everything, next only, or up to a certain ID)
rollback	Rollback (the current one, everything, or down to a certain ID)

Each command has its own sub-options which we will describe below.

Here is how you review the migrations:

<lein or deps preamble> \
    -h 127.0.0.1 \
    -p 10150 \
    -u test \
    -w test \
    -d test \
    --table migrations_test \
    --path migrations \
    list

|    ID | Applied? | Slug
| ----- | -------- | --------
|     1 | true     | create users
|     2 | false    | create profiles
|     3 | false    | next only migration
|     4 | false    | prev only migration
|     5 | false    | add some table

Every command has its own arguments and help message. For example, to review the create command, run:

lein with-profile +migrations run -m pg.migration.cli -c config.example.edn create --help

Syntax:
      --id ID             The id of the migration (auto-generated if not set)
      --slug SLUG         Optional slug (e.g. 'create-users-table')
      --help       false  Show help message

Config

Passing -u, -h, and other arguments all the time is inconvenient. The engine can read them at once from a config file. The default config location is migration.config.edn. Override the path to the config using the -c parameter:

<lein/deps> -c config.edn list

The config file has the following structure:

{:host "127.0.0.1"
 :port 10150
 :user "test"
 :password #env PG_PASSWORD
 :database "test"
 :migrations-table :migrations_test
 :migrations-path "migrations"}

The :migrations-table field must be a keyword because it takes place in a HoneySQL map.

The :migrations-path field is a string referencing a resource with migrations.

Pay attention to the #env tag. The engine uses custom readers when loading a config. The tag reads the actual value from an environment variable. Thus, the database password won’t be exposed to everyone. When the variable is not set, an exception is thrown.

Commands

Create

The create command makes a pair of two blank migration files. If not set, the id is generated automatically using the YYYYmmddHHMMSS pattern.

lein with-profile +migration run -m pg.migration.cli \
  -c config.example.edn \
  create

ls -l migrations

20240312074156.next.sql
20240312074156.prev.sql

You can also provide a custom id and a slug as well:

lein with-profile +migration run -m pg.migration.cli \
  -c config.example.edn \
  create \
  --id 100500 \
  --slug 'some huge changes in tables'

ll migrations

100500.some-huge-changes-in-tables.next.sql
100500.some-huge-changes-in-tables.prev.sql
20240312074156.next.sql
20240312074156.prev.sql

List

The list command renders all the migrations and their status: whether they are applied or not.

lein with-profile +migration run -m pg.migration.cli -c config.example.edn list

|    ID | Applied? | Slug
| ----- | -------- | --------
|     1 | true     | create users
|     2 | true     | create profiles
|     3 | true     | next only migration
|     4 | false    | prev only migration
|     5 | false    | add some table

Migrate

The migrate command applies migrations to the database. By default, all the pending migrations are processed. You can change this behaviour using these flags:

... migrate --help

Syntax:
      --all           Migrate all the pending migrations
      --one           Migrate next a single pending migration
      --to ID         Migrate next to certain migration
      --help   false  Show help message

With the --one flag set, only one next migration will be applied. If --to parameter is set, only migrations up to this given ID are processed. Examples:

... migrate           # all migrations
... migrate --all     # all migrations
... migrate --one     # next only
... migrate --to 123  # all that <= 123

Rollback

The rollback command reverses changes in the database and removes corresponding records from the migration table. By default, only the current migration is rolled back. Syntax:

... rollback --help

Syntax:
      --all           Rollback all the previous migrations
      --one           Rollback to the previous migration
      --to ID         Rollback to certain migration
      --help   false  Show help message

The --one argument is the default behaviour. When --all is passed, all the backward migrations are processed. To rollback to a certain migration, pass --to ID. Examples:

... rollback               # current only
... rollback --one         # current only
... rollback --to 20240515 # down to 20240515
... rollback --all         # down to the very beginning

Lein examples

Lein preamble looks usually something like this:

> lein run -m pg.migration.cli <ARGS>

The pg2-migration library must be in dependencies. Since migrations are managed aside from the main application, they’re put into a separate profile, for example:

:profiles
{:migrations
 {:main pg.migration.cli
  :resource-paths ["path/to/resources"]
  :dependencies
  [[com.github.igrishaev/pg2-core ...]]}}

Above, the migrations profile has the dependency and the :main attribute. Now run lein run with migration arguments:

> lein with-profile +migrations run -c migration.config.edn migrate --to 100500

Deps.edn examples

Here is an example of an alias in deps.edn that prints pending migrations:

{:aliases
 {:migrations-list
  {:extra-deps
   {com.github.igrishaev/pg2-migration {:mvn/version "..."}}
   :extra-paths
   ["test/resources"]
   :main-opts
   ["-m" "pg.migration.cli"
    "-h" "127.0.0.1"
    "-p" "10150"
    "-u" "test"
    "-w" "test"
    "-d" "test"
    "--table" "migrations_test"
    "--path" "migrations"
    "list"]}}}

Run it as follows:

> clj -M:migrations-list

You can shorten it by using the config file. Move all the parameters into the migration.config.edn file, and keep only a command with its sub-arguments in the :main-opts vector:

{:aliases
 {:migrations-migrate
  {:extra-deps
   {com.github.igrishaev/pg2-migration {:mvn/version "..."}}
   :extra-paths
   ["test/resources"]
   :main-opts ["migrate" "--all"]}}}

To migrate:

> clj -M:migrations-migrate

API Interface

There is a way to manage migrations through code. The pg.migration.core namespace provides basic functions to list, create, migrate, and rollback migrations.

To migrate, call one of the following functions: migrate-to, migrate-all, and migrate-one. All of them accept a config map:

(ns demo
  (:require
   [pg.migration.core :as mig]))

(def CONFIG
  {:host "127.0.0.1"
   :port 5432
   :user "test"
   :password "secret"
   :database "test"
   :migrations-table :test_migrations
   :migrations-path "migrations"})

;; migrate all pinding migrations
(mig/migrate-all CONFIG)

;; migrate only one next migration
(mig/migrate-one CONFIG)

;; migrate to a certain migration
(mig/migrate-to CONFIG 20240313)

The same applies to rollback:

;; rollback all previously applied migrations
(mig/rollback-all CONFIG)

;; rollback the current migration
(mig/migrate-one CONFIG)

;; rollback to the given migration
(mig/rollback-to CONFIG 20230228)

The read-disk-migrations function reads migrations from disk. It returns a sorted map without information about whether migrations have been applied:

(mig/read-disk-migrations "migrations")

{1
 {:id 1
  :slug "create users"
  :url-prev #object[java.net.URL "file:/.../migrations/001-create-users.prev.sql"]
  :url-next #object[java.net.URL "file:/.../migrations/001-create-users.next.sql"]}
 2
 {:id 2
  :slug "create profiles"
  :url-prev #object[java.net.URL "file:/.../migrations/foobar/002-create-profiles.prev.sql"]
  :url-next #object[java.net.URL "file:/.../migrations/foobar/002-create-profiles.next.sql"]}
 ...}

The make-scope function accepts a config map and returns a scope map. The scope map knows everything about the state of migrations, namely: which of them have been applied, what is the current migration, the table name, the resource path, and more.

The function create-migration-files creates and returns a pair of empty SQL files. By default, the id is generated from the current date & time, and the slug is missing:

(create-migration-files "migrations")

[#object[java.io.File "migrations/20240313120122.prev.sql"]
 #object[java.io.File "migrations/20240313120122.next.sql"]]

Pass id and slug in options if needed:

(create-migration-files "migrations" {:id 12345 :slug "Hello migration"})

[#object[java.io.File "migrations/12345.hello-migration.prev.sql"]
 #object[java.io.File "migrations/12345.hello-migration.next.sql"]]

Conflicts

On bootstrap, the engine checks migrations for conflicts. A conflict is a situation when a migration with less id has been applied before a migration with greater id. Usually it happens when two developers create migrations in parallel and merge them in a wrong order. For example:

the latest migration id is 20240312;
developer A makes a new branch and creates a migration 20240315;
the next day, developer B opens a new branch with a migration 20240316;
dev B merges the branch, now we have 20240312, then 20240316;
dev A merges the branch, and we have 20240312, 20240316, 20240315.

When you try to apply migration 20240315, the engine will check if 20240316 has already been applied. If yes, an exception pops up saying which migration cause the problem (in our case, these are 20240316 and 20240315). To recover from the conflict, rename 20240315 to 20240317.

In other words: this is a conflict:

id        applied?
20240312  true
20240315  false
20240316  true  ;; applied before 20240315

And this is a solution:

id        applied?
20240312  true
20240316  true
20240317  false ;; 20240315 renamed to 20240317

Вдруг изменится

Mar 10, 2024 updates, life

Иногда мы видим замечания коллег: вдруг в будущем эта штука изменится? Давай исправим сейчас, чтобы не возиться потом.

Благие намерения, однако: коллеге кажется, а переделывать вам. Он написал комментарий, а вам переписывать экран кода.

В таких случаях попросите коллегу подтвердить свою гипотезу. Пусть вспомнит примеры из библиотек, проектов, случаи из практики. Если найдется подтверждение, то исправьте, иначе — нет.

Важно понимать, что время, потраченное сейчас, и время, возможно, потраченное в будущем — это разные вещи. Первое мы не нагоним никогда, второе может никогда не наступить.

Иногда коллега прикрывается тем, что печется о будущем фирмы. Разговор можно повернуть так, что он блокирует задачу и ведет ведет нас не в то будущее. Это помогает.

Наконец, за умной гипотезой может быть банальное говноедство — емкое слово, чтобы описать ситуацию (и человека). Но даже в этом случае мы, специалисты, должны хранить лицо и отвечать со своего уровня.

Звучит хорошо, но как бы этому следовать!
Правда

Mar 10, 2024 life, truth

Есть фееричное по своей глупости выражение: на правду не обижаются. Это верно: обижаются не на правду, а на бестактность, грубость, фамильярность, непрошенное мнение и все то, что сопутствует правде.

Если собеседник обижается на правду, стоит подумать, как лучше ее донести и вообще — стоит ли ее доносить.

Table of Content

Basic usage

Json Wrapper

Custom Object Mapper

Utility pg.json namespace

Reading JSON

Writing JSON

Ring HTTP middleware

Table of Content

Installation

Concepts

Naming

SQL

No Code-Driven Migrations

Migration Resources

Migration Table

CLI Interface

Config

Commands

Create

List

Migrate

Rollback

Lein examples

Deps.edn examples

API Interface

Conflicts

← Страница 11 из 84 →