Migration from Postgres to Datomic
Recently, I migrated my Clojure-driven pet project from PostgreSQL to Datomic. This is Queryfeed, a web application that fetches data from social networks. I’ve been running it for several years, treating it as a playground for experiments. Long ago it was written in Python; later I ported it to Clojure.
That was a great experience: I had just finished reading the “Clojure for the Brave and True” book and was full of desire to apply the new knowledge to something practical rather than solving Euler problems in vain.
This time, I’ve made another effort and switched the database backend to Datomic. Datomic is a modern, fact-driven database developed at Cognitect to be used in conjunction with Clojure. It really differs from classical RDBMSs such as MySQL or PostgreSQL. For a long time I had been wondering whether I should try it; meanwhile, more and more Clojure/conj talks about it were being published on YouTube. At my work, we use a vast PostgreSQL database and the code base is closely tied to it, so there is no option to perform a switch over a weekend. So I decided to port my pet project to Datomic in my spare time.
Surely, before doing this, I googled for a while and was really surprised at how little information I found on the Internet. There were just three posts, and they did not cover the subject in detail. So I decided to share my experience here. Maybe it will help somebody with their migration duties.
Of course, I cannot guarantee the steps described below will meet your requirements as well. Each database is different, so it’s impossible to develop a universal tool that could handle every case. But at least you may borrow some of these steps.
Table of Contents
- Introduction
- Dump Postgres database
- Adding Datomic into your project
- Loading the data into Datomic
- Update the code
- HTML templates
- Remove JDBC/Postgres
- Update unit tests
- Infrastructure (final touches)
- Conclusion
Introduction
Before we begin, let’s talk about the reasons to switch to Datomic. That question cannot be answered in just one or two points. Before Datomic, I had been working with PostgreSQL for several years and consider it great software. There is hardly a task that Postgres cannot deal with. Here are just some of its strengths:
- streaming replication, smart sharding;
- geo-spatial data, PostGIS;
- full-text search, trigrams;
- JSON(b) data structures;
- typed arrays;
- recursive queries;
- and tons of other benefits.
So if Postgres is really so great, why switch at all? From my point of view, Datomic brings the following benefits to a project:
- Datomic is simple. In fact, it has only two operations: read (querying) and write (transaction).
- It supports joins as well. Once you have a reference, it can be resolved into a nested map. References may be recursive. In PostgreSQL or any other RDBMS, you always get a flat result with possibly duplicated rows, and the ORM logic that parses such a raw SQL response might be too complicated to understand.
- Datomic was developed in the same terms as Clojure was. These are simplicity, immutability and declarative style. Datomic shares Clojure’s values.
- It accumulates changes through time like Git or any other version control system. With Datomic, you may always roll back in time to get the history of an order or collect audit logs (see the sketch right after this list).
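To illustrate that last point, here is a minimal sketch of a time-travel query. The attribute name and the connection conn are assumptions for the example, not part of Queryfeed:

(require '[datomic.api :as d])

;; Query the database as it was at a given instant (assumptions:
;; a live connection conn and a :user/name attribute exist).
(let [db-yesterday (d/as-of (d/db conn) #inst "2017-07-03")]
  (d/q '[:find ?name :where [?e :user/name ?name]]
       db-yesterday))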
Let’s highlight some general steps we should pass through to complete the migration. These are:
- dump your Postgres data;
- add Datomic into your project;
- load the data into Datomic;
- rewrite the code that operates on data;
- rewrite your HTML templates;
- update or add unit tests;
- remove JDBC/Postgres from your project;
- set up infrastructure (backups, console, etc.)
As you see, it is not as simple as one might think, even for a small project. In my case, migrating Queryfeed took about a week working in the evenings. That includes:
- two days to read the whole Datomic documentation;
- one day to migrate the data;
- two days to fix the business logic code and templates;
- two days to deploy everything to the server.
Regarding the documentation, I highly recommend you read it first before doing anything. Please do not rely on random Stack Overflow snippets. Datomic is completely different from classical SQL databases, so your long-term Postgres or MySQL experience won’t help here.
A quick tip: since it can be difficult to read lots of text from a screen, I just send any page I wish to read to my Kindle using the official Amazon extension for Chrome. The page appears on my Kindle within a minute and I read it there.
Once you’ve finished with the docs, feel free to move on to the next step: dumping your PostgreSQL data.
Dump Postgres database
Exporting your data into a set of files won’t be that difficult, I believe. I may guess your project has a projectname.db module that handles most of the database stuff. It should have the clojure.java.jdbc module imported and a *db* or db-spec variable declared. Your goal is, for every table you have in the database, to run a query like select * from <table_name> against it and save the result into a file.
What file format to use depends on your own preferences, but I highly recommend the standard EDN format rather than JSON, CSV or whatever. The main point in favor of EDN is that it handles extended data types such as dates and UUIDs. In my case, every table has at least one date field, for example created_at, that is not null and is set to the current time automatically. When using JSON or YAML, the dates would be just strings, so you would need to write extra code to restore a native java.util.Date instance from a string. The same goes for unique identifiers, UUIDs. In addition, since EDN files represent native Clojure data structures, you don’t need to add the org.clojure/data.json dependency to your project. Everything can be done with out-of-the-box functions. The next snippet dumps all the users from your Postgres database into a users.edn file:
(require '[clojure.java.jdbc :as jdbc])

(def db-spec {... your JDBC spec map...})

(def query (partial jdbc/query db-spec))

(spit "users.edn" (with-out-str (-> "select * from users" query prn)))
And that’s it! With only one line of code, you’ve just dumped the whole table into a file. Repeat it several times, substituting the name of the *.edn file and the table. If you have many tables, wrap it in a function:
(defn dump [table]
  (spit (format "%s.edn" table)
        (with-out-str (-> (format "select * from %s" table)
                          query
                          prn))))
Then run it against a vector of table names, not a set, since the order is important: for example, if the orders table has a foreign key referencing users, the users table should be dumped and loaded first. A sketch of such a loop is shown below.
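A minimal sketch of that loop; the table names here are just examples:

;; The order matters: referenced tables (users) come before the
;; tables that point to them (orders).
(doseq [table ["users" "orders" "payments"]]
  (dump table))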
To check whether your dump is correct, try to restore it from a file as follows:
(-> "users.edn" slurp read-string first)
Again, it is that simple to perform such things in Clojure. Within one line of code, you have read the file, restored the Clojure data from it and taken the first map from the list. In the REPL, you should see something like:
{:id 1
 :name "Ivan"
 :email "test@test.com"
 ;; ... other fields
 }
That means the dump step went well.
Adding Datomic into your project
I won’t dwell on this step for long since it is well covered in the official documentation. Briefly, you need to:
- register on Datomic site, it is free;
- set up your GPG credentials;
- add Datomic repository and the library into your project;
- (optional) if you use a Postgres-driven backend for Datomic, create a new Postgres database using the SQL scripts from the sql folder of the Datomic distribution. Then run a transactor.
Here is a brief example of my setup:
;; project.clj
(defproject my-project "0.x.x"
  ...
  :repositories {"my.datomic.com" {:url "https://my.datomic.com/repo"
                                   :creds :gpg}}
  :dependencies [...
                 [com.datomic/datomic-pro "0.9.5561.50"]
                 ...]
  ...)
Run lein deps to download the library. You will probably be prompted to enter your GPG key.
A quick try in REPL:
(require '[datomic.api :as d])
(d/create-database "datomic:mem://test-db")
(def conn (d/connect "datomic:mem://test-db"))
Loading the data into Datomic
In this step, we will load the previously dumped data into your Datomic installation.
First, we need to prepare the schema before loading the data. A schema is a collection of attributes. Each attribute by itself is a small piece of information; for example, a :user/name attribute keeps a string value and indicates a user’s name.
An entity is a set of attributes linked together by a system identifier. Thinking in RDBMS terms, an attribute is a DB column whereas an entity is a row of a table. This really distinguishes Datomic from schema-less databases such as MongoDB. In Mongo, every document may have any structure you wish, even within the same collection. In Datomic, you cannot write a string value into a number attribute or a boolean into a date. One note: an entity may own an arbitrary number of attributes.
For example, in Postgres, if you did not set a default value for a not-null column, you just cannot skip it when inserting a row. In Datomic, you may submit as many or as few attributes as you want when performing a transaction. Imagine we have a user model with ten attributes: a name, an email, etc. When creating a user, I may pass only a name and there won’t be an error. So pay attention that you submit all the required attributes; a small illustration follows.
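A minimal sketch of that pitfall (assuming a live connection conn and hypothetical :user/name and :user/email attributes):

;; Both transactions succeed: Datomic does not enforce "required"
;; attributes, so the second map simply lacks :user/email.
@(d/transact conn [{:user/name "Ivan" :user/email "test@test.com"}])
@(d/transact conn [{:user/name "Anonymous"}])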
A Datomic schema is represented by native Clojure data structures: maps, keywords and vectors. That’s why schemas are stored in EDN files as well. A typical initial schema for a fresh Datomic installation may look as follows:
[
 ;; Enums
 {:db/ident :user.gender/male}
 {:db/ident :user.gender/female}
 {:db/ident :user.source/twitter}
 {:db/ident :user.source/facebook}

 ;; Users
 {:db/ident :user/pg-id
  :db/valueType :db.type/long
  :db/cardinality :db.cardinality/one
  :db/unique :db.unique/identity}

 {:db/ident :user/source
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/one
  :db/isComponent true}

 {:db/ident :user/source-id
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one}

 ...
]
The first four entries are special attributes that serve as enum values. I will discuss them in more detail later.
Again, check the official documentation that describes schema usage.
Now that we have prepared a schema, let’s add some boilerplate code to our db namespace:
(ns project.db
  (:require [clojure.java.io :as io]
            [datomic.api :as d]))

;; in-memory database for test purposes
(def db-uri "datomic:mem://test-db")

;; global Datomic connection wrapped in an atom
(def *conn (atom nil))

;; a function to initiate the global state
(defn init-db []
  (d/create-database db-uri)
  (reset! *conn (d/connect db-uri)))

;; reads an EDN file located in the `resources` folder
(defn read-edn
  [filename]
  (-> filename
      io/resource
      slurp
      read-string))

;; reads a schema from an EDN file and transacts it
(defn load-schema
  [filename]
  @(d/transact @*conn (read-edn filename)))
I hope the comments make the meaning of the code clear. I just declared a database URI, a global connection, a function to connect to the DB and two helper functions.
The first function just reads an EDN file and returns a data structure. Since our files are stored in the resources folder, there is an io/resource call in the threading chain. The second function also reads a file, but then performs a Datomic transaction, passing the data as a schema.
The db-uri variable is a URL-like string. Currently, we use in-memory storage for test purposes. I really doubt you can load the data directly into SQL-driven storage without errors, so let’s just practice for a while. Later, when the import step is ready, we will switch the db-uri variable to a production-ready URL.
With the code above, we are ready to load the schema. I put my initial schema into the file resources/schema/0001-init.edn, so I may load it as follows:
(init-db)
(load-schema "schema/0001-init.edn")
Now that we have a schema, let’s load the previously saved Postgres data. We need to add more boilerplate code. Unfortunately, there cannot be a universal function that maps your Postgres fields into Datomic attributes. The functions that convert your data might look a bit ugly, but they serve a one-time purpose only, so please don’t mind.
For each EDN file that contains data of a specific table, we should:
- read a proper file, get a list of maps;
- convert each PostgreSQL map into Datomic map;
- perform Datomic transaction passing a vector of Datomic maps.
Below is an example of my pg-user-to-datomic function that accepts a Postgres-driven map and turns it into a set of Datomic attributes:
;; requires (:import (java.net URI) (java.util Date UUID)) in the ns form
(defn pg-user-to-datomic
  [{:keys [email
           first_name
           timezone
           source_url
           locale
           name
           access_token
           access_secret
           source
           token
           status
           id
           access_expires
           last_name
           gender
           source_id
           is_subscribed
           created_at]}]
  {:user/pg-id          id
   :user/email          (or email "")
   :user/first-name     (or first_name "")
   :user/timezone       (or timezone "")
   :user/source-url     (URI. source_url)
   :user/locale         (or locale "")
   :user/name           (or name "")
   :user/access-token   (or access_token "")
   :user/access-secret  (or access_secret "")
   :user/source         (case source
                          "facebook" :user.source/facebook
                          "twitter"  :user.source/twitter)
   :user/source-id      source_id
   :user/token          (UUID/fromString token)
   :user/status         (case status
                          "normal" :user.status/normal
                          "pro"    :user.status/pro)
   :user/access-expires (or access_expires 0)
   :user/last-name      (or last_name "")
   :user/gender         (case gender
                          "male"   :user.gender/male
                          "female" :user.gender/female)
   :user/is-subscribed  (or is_subscribed false)
   :user/created-at     (or created_at (Date.))})
Yes, it looks a bit ugly and annoying, but you have to write something like this for every table you have.
Here is the code to load a table into Datomic:
(->> "users.edn" slurp read-string (map pg-user-to-datomic) transact!)
Before we go further, let’s discuss some important notes on importing the data.
Avoid nils
Datomic does not support nil values for attributes. When you do not have a value for an attribute, you should either skip it or pass an empty value: a zero, an empty string, etc. That’s why most of the expressions above are wrapped in (or ... ""). An alternative sketch is shown below.
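Another option, if you prefer skipping missing attributes over padding them with empty values, is a small helper that drops nil entries before transacting (a sketch, not part of the original migration code):

;; Drops map entries whose value is nil so they are simply absent
;; from the transaction instead of causing an error.
(defn remove-nils [m]
  (into {} (remove (comp nil? val) m)))

;; usage sketch: (transact! (map remove-nils datomic-maps))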
Shrink your tables
Migrating to the new datastore backend is a good chance to refactor your
schema. For those who has spent years working with relational database it is not
a secret that typical SQL applications suffer from lots of tables. In SQL, it is
not enough to keep just “entities” tables: users, orders, etc. Often, you need
to associate a product with colors, a blog post with tags or a user with
permissions. That leads to product_colors
, post_tags
and other bridge
tables. You join them in a query to “go through” from a user to their orders,
for example.
Datomic is free from bridge tables. It supports reference attributes that are linked to any other entity. In addition, each attribute may carry multiple values. For example, if we want to link a blog post with a set of tags, we’d rather declare the following schema:
[
 ;; Tag
 {:db/ident :tag/name
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db/unique :db.unique/identity}

 ;; Post
 {:db/ident :post/title
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one}

 {:db/ident :post/text
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one}

 {:db/ident :post/tags
  :db/valueType :db.type/ref
  :db/cardinality :db.cardinality/many}
]
In Postgres, you would need a post_tags bridge table with post_id and tag_id foreign keys. In Datomic, you simply pass a vector of references in the :post/tags field when creating a post, as the sketch below shows.
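Here is a hypothetical transaction built on the schema above; the string tempids and the upsert on the unique :tag/name attribute are assumptions for the example:

@(d/transact @*conn
  [{:db/id "tag-clojure" :tag/name "clojure"}
   {:db/id "tag-datomic" :tag/name "datomic"}
   {:post/title "Moving to Datomic"
    :post/text  "..."
    ;; the post references both tags via their string tempids
    :post/tags  ["tag-clojure" "tag-datomic"]}])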
Migrating to Datomic is a great chance to get rid of those tables.
Use enums
Both Postgres and Datomic support enum types. An enum type is a set of values; an instance of an enum type may hold only one of those values.
In Postgres, I use enum types a lot. They are fast, reliable and provide strong consistency for your data. For example, if you have an order with possible “new”, “pending” and “paid” states, please don’t use the varchar type for that: sooner or later you will write something wrong there, for example mix up the case or make a typo. So you’d better declare the schema as follows:
create type order_state as enum (
  'order_state/new',
  'order_state/pending',
  'order_state/paid'
);

create table orders (
  id serial primary key,
  state order_state not null default 'order_state/new'::order_state,
  ...
);
Now you cannot submit an unknown state for an order.
Although Postgres enums are great, the JDBC library makes our life a bit more difficult by forcing us to wrap enum values in a PGobject when querying or inserting data. For example, to submit a new state for an order, you cannot just pass the string "order_state/paid": you’ll get an error saying you are trying to submit a string for an order_state column. So you have to wrap your string into a special object:
;; requires (:import org.postgresql.util.PGobject) in the ns form
(defn get-pg-obj [type value]
  (doto (PGobject.)
    (.setType type)
    (.setValue value)))

(def get-order-state
  (partial get-pg-obj "order_state"))

;; now, composing parameters for a query
{:order_id 42
 :state (get-order-state "order_state/paid")}
Another disadvantage is the inconsistency between select and insert queries. When you just read the data, you get the enum value as a string. But when you pass an enum as a parameter, you still need to wrap it in a PGobject. That is a bit annoying.
Datomic also has nice support for enums, and there is no special syntax for them. Enums are just attributes that do not have values, only names. I have already shown them above:
[
{:db/ident :user.gender/male}
{:db/ident :user.gender/female}
{:db/ident :user.source/twitter}
{:db/ident :user.source/facebook}
]
Later, you may reference an enum value by passing just the keyword :user.source/twitter. It’s simple, fast and keeps your database consistent. A query sketch follows.
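Filtering by an enum value in a query is equally straightforward; a small sketch, using the attribute names from the schema above and the global connection:

;; Find all users whose source is the Twitter enum value.
(d/q '[:find [?e ...]
       :where [?e :user/source :user.source/twitter]]
     (d/db @*conn))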
JSON data
Personally, I try to avoid using JSON in Postgres as long as possible. Adding JSON fields everywhere turns your Postgres installation into MongoDB. It becomes quite easy to make a mistake or corrupt the data and fall into a situation where one half of your JSON data has a particular key and the other half does not.
Sometimes you really do need to keep JSON in your DB. A good example might be PayPal Instant Payment Notifications (IPNs). These are HTTP requests that PayPal sends to your server when a customer buys something. An IPN’s body carries about 30 fields and its structure may vary depending on the transaction type. Splitting that data into separate fields and storing all of them across separate columns would be a mess. A better solution is to extract only the most essential ones (date, email, sum, order number) and write the rest of the data into a jsonb column. Then, whenever you need to fetch any additional information from an IPN, for example a tax sum, you may query it as well:
select
  (data->>'tax_sum')::numeric as tax
from
  ipn
where
  order_number = '123456';
In Datomic, there is no JSON type for attributes. I’m not sure I made the right decision, but I just put that JSON data into a string attribute. Sure, there is no way to access separate fields in a Datalog query or apply rules to them. But at least I can restore the data when selecting a single entity:
;; requires [cheshire.core :as json] in the ns form

;; local helper to parse JSON, keywordizing the keys
(defn parse-json [value]
  (json/parse-string value true))

(defn get-last-ipn [user-id]
  (let [query '[:find [(pull ?ipn [*]) ...]
                :in $ ?user
                :where
                [?ipn :ipn/user ?user]]
        result (d/q query (d/db @*conn) user-id)]
    (when (not-empty result)
      (let [item (last (sort-by :ipn/emitted-at result))]
        (update item :ipn/data parse-json)))))
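For completeness, the write side might look like this (a sketch; the :ipn/* attribute names and the Cheshire dependency are assumptions):

;; Serialize the raw IPN map into a string before storing it
;; in the :ipn/data text attribute.
(defn ipn->tx-map [user-eid ipn-map]
  {:ipn/user       user-eid
   :ipn/emitted-at (java.util.Date.)
   :ipn/data       (json/generate-string ipn-map)})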
Foreign keys
In RDBS, a typical table has auto-incremental id
field that marks a unique
number of that row. When you need to refer to another table, an order or a
user’s profile, you declare a foreign key that just keeps a value for those
id. Since they are auto-generated, you should never bother on their real values,
but only consistency.
In Datomic, there are no auto-incremented values. When you import your data, it’s important to handle foreign keys (or references, in Datomic terms) properly. During the import, we populate the :<entity>/pg-id attribute that holds the legacy Postgres value. Once you import a table with foreign keys, you may resolve a reference as follows:
{... ;; other order fields
:order/user [:user/pg-id user_id]}
A reference may be passed as a lookup ref: a vector of two elements, where the first is a unique attribute name and the second is its value.
For new entities created in production after the migration to Datomic, you do not need to submit the .../pg-id value. You may either delete (retract) it once the migration process has finished, or just keep it in the database as a marker of legacy data.
Update the code
This step is probably the most boring one. You need to scan the whole project and fix those fragments where you access data from the database.
Since it is good practice to prefix attributes with a namespace, the most common change would be attribute renaming, I believe:
;; before
(println (:name user))
;; after
(println (:user/name user))
You will face fewer problems if you organize special functions that wrap the underlying logic: good examples might be get-user-by-id, get-orders-by-user-id and so on. A sketch of such a wrapper follows.
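A minimal sketch of such a wrapper, assuming :user/pg-id is the unique attribute declared earlier and *conn is the global connection:

;; A lookup ref [:user/pg-id pg-id] addresses the entity by its
;; legacy Postgres id; pull returns the whole entity map.
(defn get-user-by-id [pg-id]
  (d/pull (d/db @*conn) '[*] [:user/pg-id pg-id]))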
If you use the HugSQL or YeSQL Clojure libraries, then you already have such functions, created dynamically from *.sql files. That is much better than having naked SQL everywhere, and porting such a project to Datomic will be much easier.
HTML templates
Another dull step that cannot be automated is to scan your Selmer templates (if you have them in your project, of course) and update those fragments where you touch entities’ attributes. For example:
<!-- before -->
<p>{{ user.first_name }} {{ user.last_name }}</p>

<!-- after -->
<p>{{ user.user/first-name }} {{ user.user/last-name }}</p>
You may access nested entities as well. Imagine a user has a reference to their social profile:
<p>{{ user.user/profile.profile/name }}</p> ;; "twitter", "facebook" etc
Datomic encourages us to use enums whose values are just keywords. Sometimes, you need to implement a case...then pattern in your Selmer template and render different content depending on an enum value. This may be a bit tricky since Selmer does not support keyword literals. In the example above, a user has a :user/source attribute that references an enum with the possible values :user.source/twitter or :user.source/facebook. Here is how I figured out switching on them:
{% ifequal request.user.user/source.db/ident|str ":user.source/twitter" %}
<a href="https://twitter.com/{{ user.user/username }}">Twitter page</a>
{% endifequal %}
{% ifequal request.user.user/source.db/ident|str ":user.source/facebook" %}
<a href="{{ user.user/profile-url }}">Facebook page</a>
{% endifequal %}
In the example above, we have to turn the keyword into a string using the |str filter to compare both values as strings.
To find all the variables and operators in your Selmer templates, just grep the templates folder for the {{ or {% literals.
Remove JDBC/Postgres
Now that your project is Datomic-powered and does not need JDBC drivers anymore, you may either remove them from the project or at least demote them to the dev dependencies needed only for development purposes.
Scan your project, grepping it for the jdbc and postgres terms to find those namespaces that still use the legacy DB backend. Remove any that are still present. Open your root project.clj file and remove the jdbc and postgresql packages from the :dependencies vector. Ensure you can still run and build the application and the unit tests.
Update unit tests
Datomic is great in the respect that you may use the in-memory backend when running tests. That makes them pass much faster and without needing to set up a Postgres installation on your machine.
I believe your project is able to detect whether it is in dev, test or prod mode. If it’s not, take a look at the Luminus framework, which handles this quite well. For each type of environment, you specify its own database URL; for tests, it will be in-memory storage (see the sketch below).
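One possible way to wire this up (a sketch; the environment variable name is an assumption, not part of Luminus):

;; Fall back to in-memory storage unless a real URI is provided,
;; e.g. by the production environment.
(def db-uri
  (or (System/getenv "DATOMIC_URI")
      "datomic:mem://test-db"))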
Using the standard clojure.test namespace, you wrap each test in a fixture that does the following steps:
- creates a new database in memory and connects to it;
- runs all the schemas against it (migrations);
- populates it with predefined test data (users, orders, etc; also known as “fixtures”);
- runs the test itself;
- drops the database and disconnects from it.
These steps should be run for each test. In that case, we can guarantee that every test has its own environment and does not depend on other tests. It’s good practice for each test to get a fresh installation untouched by previous tests.
Some preparation steps are:
(ns your-project.test.users
  (:require [clojure.test :refer :all]
            [your-project.db :as db]))

(defn migrate
  "Loads all the migrations."
  []
  (doseq [file ["schema/0001-init.edn"
                "schema/0002-user-updated.edn"]]
    (db/load-schema file)))

(defn load-fixtures
  "Loads all the fixtures."
  []
  (db/load-schema "fixtures/test-data.edn"))

(defn test-fixture [f]
  (db/init)      ;; this function reads the config,
                 ;; creates the DB and populates
                 ;; the global Datomic connection
  (migrate)
  (load-fixtures)
  (f)            ;; the test is run here
  (db/delete)    ;; deletes the database
  (db/stop))     ;; stops the connection

(use-fixtures
  :each
  test-fixture)
Now you may write your tests:
(deftest user-may-login
...)
(deftest user-proceed-checkout
...)
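As a concrete, purely hypothetical example, assuming the fixture file creates a user named "Ivan" with :user/pg-id 1 and your db namespace exposes the get-user-by-id wrapper sketched earlier:

(deftest fixtures-are-loaded
  (let [user (db/get-user-by-id 1)]
    (is (= "Ivan" (:user/name user)))))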
For every test, you will have a database running with all the migrations and test data loaded.
If you still do not have any tests in your project, I urge you to add them soon. Without tests, you cannot be sure you do not break anything when changing the code.
Infrastructure (final touches)
In the final section, I will highlight several important points that relate to the server management.
Setting up production Postgres-driven backend
Running an in-memory Datomic database is fun since it really costs nothing. In production, you had better set up a more reliable backend. Datomic supports a Postgres storage backend out of the box. To prepare the database, run the following SQL scripts:
sudo su postgres # switch to postgres user
cd /path/to/datomic/bin/sql
psql < postgres-user.sql
psql < postgres-db.sql
psql datomic < postgres-table.sql
The scripts above create a user datomic with the password datomic, then a database datomic owned by that user. The last script creates a special table to keep Datomic blocks. Please do not forget to change the standard datomic password to something more complicated.
Running the transactor
The official documentation describes how to run a transactor, which is needed by the peer library when you use non-memory storage. I’m not going to retell it here. Instead, I will share a bit of config to run it automatically as a standard Linux service.
Create a file named datomic.conf in your my-project/conf directory and put a symlink to it into the /etc/init.d/ folder. Add the following lines to the file:
description "Datomic transactor"
start on runlevel startup
stop on runlevel shutdown
respawn
setuid <your user here>
setgid <your group here>
chdir /path/to/datomic
script
exec bin/transactor sql-project.properties
end script
Here, /path/to/datomic is the directory where the unzipped Datomic installation is located, and sql-project.properties is a transactor configuration file where you should specify your Datomic license key (sent to your email).
Now that you have put a symlink, try the following commands:
sudo start datomic
status datomic
# datomic start/running, process 5281
sudo stop datomic
Console
Most RDBMSs have UI applications to manage the data. Datomic comes with a built-in console that runs as a web application. Within this console, you can examine the schema and perform queries and transactions.
The following template runs a console:
/path/to/datomic/bin/console -p 8088 <some-alias> <datomic-url-without-db>
In my example, the command is:
$(DATOMIC_HOME)/bin/console -p 8888 datomic \
"datomic:sql://?jdbc:postgresql://localhost:5432/datomic?user=xxxxx&password=xxxxx"
Opening a browser at http://your-domain:8888/browser will show you the dashboard.
Some security issues are worth mentioning here. The console does not support any login/password authentication, so it is quite unsafe to run it on a production server as-is. Implement at least some of the following steps:
- Proxy the console with Nginx. It must not be reachable by itself.
- Limit access by a list of IPs. These may be your office or your home only.
- Allow only secure SSL connections, no plain HTTP. Let’s Encrypt would be a great choice (see my recent post).
- Add basic/digest authentication to your Nginx config.
To run the console as a service, create another console.conf file in the /etc/init.d/ directory. Use the datomic.conf file as a template and substitute the primary command with the one shown above. Now you can run the console only when you really need it:
sudo start console
Backups
Making backups regularly is highly important. The Datomic installation carries a special utility to take care of it, so you won’t need to make backups manually by running pg_dump against the Postgres backend. Datomic provides a high-level backup algorithm that runs in several threads. In addition, it supports the AWS S3 service as a destination.
A typical backup command looks as follows:
/path/to/datomic/bin/datomic -Xmx4g -Xms4g backup-db <datomic-url> <destination>
To access AWS, you need to export both the AWS_ACCESS_KEY_ID and AWS_SECRET_KEY variables first, or prepend the command with them. In my case, the full command looks something like:
AWS_ACCESS_KEY_ID=xxxxxx AWS_SECRET_KEY=xxxxxxx \
/path/to/datomic/bin/datomic -Xmx4g -Xms4g backup-db \
  "datomic:sql://xxxxxxxx?jdbc:postgresql://localhost:5432/datomic?user=xxxxxx&password=xxxxxxx" \
  s3://secret-bucket/datomic/2017/07/04
The date part at the end is substituted automatically using the $(shell date +\%Y/\%m/\%d) expression in a Makefile, or using the following in bash:
date_path=`date +\%Y/\%m/\%d` # 2017/07/04
Add that command into your crontab file to make backups regularly.
Backups as a way to deploy the data
The good news is that a backup’s structure does not depend on the backend type. No matter whether you dump the in-memory storage or a Postgres cluster, the backup can be restored anywhere. That gives us the possibility to migrate the data on a local machine, make a backup and then restore it into the production database.
Once you have finished migrating your data, launch the backup command described above; the backup goes to S3. Then, on the server, run the restore command:
AWS_ACCESS_KEY_ID=xxxxx AWS_SECRET_KEY=xxxxx \
/path/to/datomic/bin/datomic -Xmx4g -Xms4g restore-db \
  s3://secret-bucket/datomic/2017/07/04 \
  "datomic:sql://xxxxxxx?jdbc:postgresql://localhost:5432/datomic?user=xxxx&password=xxxxx"
When everything is done without mistakes, the server will pick up the new data.
Conclusion
After spending about a week moving from Postgres to Datomic, I can say it was really worth it. Although Datomic does not support most of the smart Postgres features like geo-spatial data or JSON structures, it is much closer to Clojure after all. Since it was made by the same authors, Datomic looks like a continuation of Clojure. And that is a huge benefit that may outweigh the disadvantages mentioned above.
Surfing the Internet, I found the following links that may also be helpful:
- A Minor Victory with Datomic
- Datomic Setup
- Using Datomic? Here’s How to Move from PostgreSQL to DynamoDB Local
I hope you enjoyed reading this material. You are welcome to share your thoughts in the comments section.
Conan, 13th Jul 2017
Great article! I'll be referring to this if we decide to migrate. It's worth pointing out that Datomic is not entirely free from "bridge" or "join" tables. It has ref value types which are equivalent to foreign keys in Postgres, but these kinds of tables serve a different purpose, which is to allow metadata to be added to a relationship.
Say I'm keeping info about how people make tea. Some people do the add-teabag step first, some people do the add-sugar step, and some crazy people even add the milk before the water. So I have Person entities and Step entities, but to store the order in which each person performs the steps, we create a join entity which has refs to the other two (just like a join table in SQL), but also contains an index for the ordering.
In this way we can store attributes about the relationship itself.
Ivan Grishaev, 13th Jul 2017
Well, in your example you really do need an extra entity to keep the order. It might be named Recipe, to track the order of the steps. Then a user will reference his/her own recipe.
Roman, 28th May 2024
Hi Ivan. Thank you for the article. I’m worried about creating/deleting the db with use-fixtures and the :each flag. It seems inefficient with lots of tests.
What do you think might be a better configuration when I use Integrant to start a system and use-fixtures with the :once flag? I want to avoid starting/stopping the system for :each test.
Ivan Grishaev, 29th May 2024
Hi Roman! Yes, starting and stopping the system in each test is slow. Thus, we usually bootstrap the system using the :once keyword. But you still need to clean the temporary data out of the database so each test doesn’t affect the others. Just add a second :each fixture that truncates all the known tables, and that will be enough.
Roman, 1st Jun 2024
Ok, thanks a lot. But I’m not sure what you mean by “truncate” in Datomic. I see three options right now:
Can you recommend anything?
Ivan Grishaev, 1st Jun 2024
Roman,
I think that excision is not an option, as it’s really slow. The better choice would be to run your tests against the in-memory version of Datomic, so that at the end the entire DB just gets wiped.
Yes, your fixtures will spend some time applying the schema and filling the database with test data, but on the other hand, it guarantees isolation of tests.
If you use not an in-memory db but a physical backend like Postgres or MySQL, you can just delete the database. Or, for each test, you can generate a random database and then clean them up once a day.
Roman, 2nd Jun 2024
Thanks so much for the helpful and practical ideas