• Five years of the blog

    The reason I haven't written anything in a while is that I got hooked on the Highload Cup contest. For two weeks I couldn't think about anything else, constantly trying to improve my solution. And then I remembered that the blog has an anniversary: the first post was published five years ago, on August 8, 2012.

    I mean a naive note titled “Рассылка смс в Питоне” (“Sending SMS in Python”), where I explain in simple terms how to send SMS through the API of a now-defunct service. In those days I worked at the Chita Energosbyt, writing 1C, Delphi and Python, while Lisps and FP were wet dreams.

    Five years is quite a span by Internet standards. Many folks I used to read eventually gave up on their blogs and let their domains expire. But I enjoy it: the blog has become a real hobby that I hope will last a lifetime.

    Over that time I have written 320 posts. Not that many, but I'm not chasing quantity. Some notes, originally conceived as a couple of paragraphs, grew into several screens.

    At some point I caught myself abusing cheap tricks: complaining about services, mocking those who disagree with me. It works, but not in the direction I want. Now I try to write only about topics that matter: programming, education, career, books.

    It was unpleasant to watch blogs I loved degenerate into commentary on political news. I decided to cut that topic off as well.

    I decided to move toward an English-speaking audience. In Russia, Clojure is written by three and a half anonymous people, so there is no point writing about it in Russian.

    Technical notes in English sometimes get a strong response. After I published the article on migrating from Postgres to Datomic, 9000 English-speaking visitors came by on the very first day.

    After a few successful articles about Clojure, I was added to the Planet Clojure aggregator. If a post contains the word “Clojure” in English, the article gets into the aggregator. So when I don't want that to happen, I write in Russian.

    My plans include adding tags, categories and other goodies to the site. Luckily, the blog is static, runs on a Ruby engine, the sources are on GitHub, and I know whom to copy-paste from. For now, here is simply a list of posts worth attention:

  • Conceptual languages

    This is a short note on what I think about programming languages.

    Briefly, I consider some languages to be conceptual. In a conceptual language, every part obeys some global idea. This idea shapes the design of the language, the way it behaves and develops. It affects libraries, the community and the ecosystem.

    Take Lisp, for example. The idea is that storing code as data brings incredible possibilities: we may process code as data or turn data into code. No other language offers something like that; only Lisp does.
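
    To make the idea concrete, here is a tiny sketch in Clojure (the Lisp used elsewhere in this blog): a piece of code is an ordinary list that we may build and evaluate like any other data.

    ;; a list is plain data...
    (def expr (list + 1 2))
    
    ;; ...and at the same time it is code we may evaluate
    (eval expr) ;; => 3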

    In Erlang, every task is solved with a cascade of lightweight processes called actors. Two actors may communicate directly even when running on different servers. Actors do not map to Unix threads; they are managed by OTP rather than the operating system.

    With Haskell, you get a quite strong and flexible type system. Types are the most important part of a typical Haskell program. Once it compiles, it will work for sure.

    Clojure, a modern Lisp dialect, provides immutable data structures and powerful abstractions for concurrency.
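
    For instance, “modifying” a Clojure map returns a new value, while the original stays intact:

    (def user {:name "Ivan"})
    
    (assoc user :email "test@test.com") ;; => {:name "Ivan", :email "test@test.com"}
    
    user ;; => {:name "Ivan"}, the original map is untouched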

    Non-conceptual languages, by contrast, do not have such a global approach. They try to implement as many features as possible to satisfy every domain: OOP, anonymous functions (lambdas), lazy evaluation, etc. As a result, we get everything and nothing: each part of such a language is not as powerful as its analogue from the languages I mentioned above.

    Take Python, for example. Although it has such basic functional building blocks as map, reduce and lambda, programming in a functional style with it is a mess.

    Every part of classical Javascript is just ugly.

    Java, a language with a static type system, lets you pass null instead of any object and end up with a NullPointerException.

    Although conceptual languages are not perfect, they seem easier for me to learn because they rest on common rules that cannot be broken. Say, in Haskell, there is simply no plain null value. In Clojure, you cannot modify a dictionary in place, and so on.

    They cannot be replaced with other languages. Really, what would you replace Lisp or Erlang with? There are no alternatives to them.

    I believe the future belongs to conceptual languages. To develop AI or distributed systems, we need something more sensible than yet another language with classes and syntactic sugar.

    I'm not sure it will be Lisp itself, but it will be something that borrows most of its features.

  • Weekly links #28

  • Weekly links #27

    I've just returned from EuroClojure, and it was amazing! Thanks to everybody who made it happen.

    Here are some interesting links I found on Medium:

  • Berlin. Urban details

    I attended EuroClojure in Berlin and want to talk about the city. I'm not much of a traveler: I don't travel often and I'm not particularly observant. Reading Lebedev or Varlamov, for a long time I couldn't understand why they endlessly write about trash bins and sidewalks. So what if their paving is better; you can walk on ours too!

    But this time the urban environment really got to me. I walked around and was blown away by how well the city is organized. Even more, I wondered how I had never seen the difference between a bad urban environment and a good one before. Apparently, something woke up in me after thirty.

    First of all, the sidewalks are laid magnificently. It may be tiles, massive slabs, cobblestone or small pebbles, but you can walk on them and admire the pattern endlessly. Every piece fits like Lego, everything follows the pattern. When the tiles are not square (or are laid at an angle), the edge tiles are not cut but manufactured in a special shape to join the neighboring surfaces.

    Next, there is no exposed soil anywhere. Varlamov and Lebedev have written a hundred times about the harm of open flower beds. Any patch of soil the plants need is, first, sunk below sidewalk level and, second, covered with a mesh or sealed with pebbles. Back home, dirt and sand spill from sidewalks and flower beds all year round, and street cleaners scrape it back onto the beds.

    What shocked me was the absence of curbs. The sidewalk is completely seamless. The size and pattern of the tiles change, cobblestone alternates with slabs, but the level never jumps. At crosswalks, at driveway exits, next to building entrances: no curbs anywhere. I walked eight blocks in one direction and did not meet a single one on my way.

    All the buildings are flush with the ground. I don't know what the mystery is, but in Russia there are no doors without a porch. The entrance to any shop, office or apartment block starts with steps, with a ramp at a 60-degree angle next to them. Why can't the foundation pit be dug one meter deeper? Which standard says a door must be a meter above the ground?

    This is a business center:

    And this is a residential entrance:

    If it's not clear what harm curbs do, borrow a wheelchair. Sit in it at home and try to get to the store. Or put a child in an ordinary stroller, groceries in the lower basket, and off you go through an underground passage; I'd like to see that.

    The photo below shows that sometimes there really is a height difference between the sidewalk and a building. But look how smooth the transition is! A rise of 20 centimeters is spread over a length of 15 meters, a slope of about one degree. And how beautiful it turned out.

    This is the case when a wheelchair user can actually roll up, and a person with a weak heart can climb without their pulse racing.

    Now, trash bins. Maybe Lebedev bit me or something, but every bin hides a hundred important details; now I get it. In the district where I stayed, the bins are identical, with the same dimensions and color. Visual recognition kicks in: to find a bin, you just scan the surroundings, and an orange spot is bound to flash somewhere. From any point, at least one bin is visible. Not once did I need one and find none around.

    Next, a bin does not stand on the ground but is mounted on a post at about a meter's height. This lets you spot it from a distance or in a crowd. The opening is not like a bucket's; it is smaller and angled. There is intent here too: rain won't pour into such an opening, and you can't plug the bin with a bag of garbage, as lazy jerks sometimes do back home.

    You can't throw anything into such a bin; you have to walk up and put it in. This sharply reduces the urge to hurl cigarette butts or bottles to show off. Here is how it goes back home: office plankton smoke on a high porch, a bucket somewhere below. Everyone tosses their butt; it's a kind of sport. Around the bin there is a chi-squared distribution of butts and spit.

    But try throwing when the bin is at chest level. You might hit someone and get into trouble.

    The bins are very neat and clean, painted orange. At first I mistook them for mailboxes. Bins never stand next to benches; at least ten meters always separate them. An example with a bus stop:

    Strange people, these Germans; why don't they want to sit next to a trash bin, as is customary back home? A girl on your right, a garbage bucket on your left. So convenient, everything close by.

    In the middle of crosswalks there are safety islands where a pedestrian can wait out the red light without risking being hit. The islands are fenced with curbs. That rare case when curbs are actually needed.

    Manholes are mostly square. Storm drain grates are designed so that a bike or stroller wheel cannot get caught. The water slots run perpendicular to the traffic direction in that spot. In high-traffic places they just make round holes.

    Bike lanes and bike parking are everywhere; everyone rides, young and old, even pregnant women and pensioners. I have no statistics, but I suppose this sharply reduces road traffic. For example, 15 bicycles are parked on a small patch near a store. Now imagine each rider came by car. How much space would that take? Ten times more, plus the traffic.

    Traffic jams are laughable compared to ours. 26 kilometers across the city from the center to the airport: clear, with stops only at traffic lights. On a weekday afternoon, Carl.

    Many women go braless. Not universally, of course, but more than back home. Tattoos are often visible. These are not freaks with piercings, just small patterns or lettering. People dress however they like, sometimes provocatively, and nobody cares.

    And that's just me on trash bins and sidewalks. There are also parks, squares, courtyards and a million other things. You could stroll for days soaking it all in. In terms of urban environment, Berlin is top-notch.

  • Educational startups

    Today, educational startups are everywhere, and their number grows daily. Hardly a week goes by without a new HTML/CSS/Javascript educational site appearing on the Internet.

    Sorry if I offend someone, but I don't believe in remote education through special sites that offer video lessons. More precisely, there might be some effect, of course, but a much poorer one than standard education with people and books.

    About ten years ago, the Internet was full of paid video courses on DVDs. These were home-made lessons recorded by students covering the basics of PHP and HTML. Today's startups remind me of those DVDs. The only difference is that you no longer need to order a disk; you watch the lessons online right after paying.

    I'm not running an educational site. I also don't want to harm anyone's reputation or business. Let's just discuss some points that seem important to me.

    Let's switch to a browser right now and google the phrase “educational startup”. For me, the first link is “Are Education Startups The New Dot Com?” That article links to another one titled “Edtech Is The Next Fintech”. Read them both; they echo my concerns.

    Below them goes a bunch of list-style posts titled “N best educational startups”:

    And so on. That was only the first two pages. Let's check AngelList then:

    14,615 COMPANIES    2,646 INVESTORS    19,127 FOLLOWERS    2,675 JOBS
    

    14,615 companies. There are about 250 countries in the world; I'll take 200, accounting for wars or a lack of development in some of them. Dividing the number of educational startups by the number of countries gives 14615 / 200 ≈ 73 per country.

    Don't you think it's a bit more than we need?

    Look, we have about 3 search engines to find anything on the Internet. There are probably 2-5 mobile operators in your country. We've got one Wikipedia. But there are 14,615 educational companies that want to make money on the education market.

    I really doubt they are about real education

    In fact, there is one goal a startup tries to reach: to get users to buy lessons by making them think it could really raise their level. Site developers add competitive mechanics to manipulate users: leaderboards, a progress scale decorated as a route with milestones and a crown or a cup at the end. Every startup adds a chat room so users can share their success and feel proud.

    I've seen several educational sites and must confess they don't bring any revolutionary ideas. Yes, some of them provide a clever environment with an IDE right in the browser. But the truth is there are only two ways of education that really work: people and books.

    Sometimes people ask me how to improve their experience. Usually they have spent some time solving educational tasks but cannot start a real project. I always answer the same: join a project where professionals work. In a team of highly qualified people, you will grow quickly.

    Reading books helps you systematize the fragments of random knowledge you've picked up from Twitter, StackOverflow, Wired and so on. Everything that educational sites try to sell you has already been published in books. Really, there are tons of books about PHP, HTML, Java, Python or whatever. You may borrow them for free.

    On the Internet, you can buy any used book for a few dollars. Lessons are usually sold by subscription: in a month, you lose your access unless you pay again. The book will be yours forever.

    Yes, reading a book is a bit more difficult than watching a video. It's hard and boring. You even need to interpret code in your head. There are no widgets and chats. Yet it really works.

    A book is really important today since our knowledge is not well organized. We pick up random fragments, missing important details. I would compare a book to an asphalt paver that moves slowly but smooths out all the roughness and holes in your mind.

    People and books are the only way to learn

    Recently, I finished reading the “Web Development with Clojure” book. Although I thought I knew Clojure pretty well, the book turned out to be a discovery for me. I picked up plenty of hints and techniques that I'm willing to try in the future.

    Education in the broad sense is a hard process. Educational startups make you think it has become easy. That is wrong.

    Once I finished reading my first Clojure book, I could solve any primitive task like sorting or finding the min/max element in a list using recursion. But it took a huge effort to write an HTTP server that manages a database connection, renders templates, writes logs, calls the Twitter API and so on.

    The reason was that all the lessons and tutorials miss the special knowledge of how to manage complexity, how to connect the parts of your application together.

    A computer is not the best tool for education. It's a great tool but also an entertainment center. Ideally, you should turn off all the messengers, close YouTube, Twitter… On a laptop, lots of things may interrupt you. When tired, your brain can always find a pretext to switch to something fun.

    IT education is not only about coding; it also includes negotiation. Sometimes you might be 100% right but unable to deliver your ideas. You may offend your customer with an unsuitable manner of speaking. Only people who work close to you can correct such mistakes, not online lessons. By the way, I have never seen a lesson that covers any of the things mentioned above.

    Of course, I do not blame startups for spreading across the Internet. The reason it happens is a lack of professionals. This is the market's reaction. Did you want more programmers? Here they are. Young people know that IT companies are eager to hire. Often, watching just a few videos is enough to get a PHP/WordPress job paid in US dollars.

    Universities are not in charge of your further employment. It is common to finish your final year and still be unable to find a job because the industry changes so fast. Nor will the government support you: education seems to be the last thing it is interested in. Today, my country wages two wars, plays geopolitics and destroys imported cheese while the level of education declines year after year.

    Educational sites might be a hope. But they devalue the real meaning of education.

    The process of self-education is hard. Only you are interested in it, not startups

    OK, I'm about to finish, and I've got a question. I've heard the most powerful language is Lisp. At least Alan Kay said so (the guy who invented OOP). So did Stallman, Dijkstra and other great programmers. Do you know any educational site that offers Lisp lessons?

    I googled a bit trying to find one. Nothing on Coursera, the best-known lesson hub. Nothing on Netology. At least Hexlet has a SICP section where they retell it in Russian (see my note on books).

    No, they won't teach you the most powerful language. So what is the final goal then? Do we need more Python or Java programmers? We already have lots of them, yet we still suffer from weird interfaces and buggy applications. It doesn't work that way.

    Let me summarize.

    1. I'm not against anyone's business. I even believe some people who watched those videos really made progress in their careers. Maybe they could not find proper books, or their brains respond well to that way of learning.
    2. But I'd like to name things properly. Educational startups are businesses first of all. The goal of a business is to make money, not to make you smarter. The longer you keep your monthly subscription active, the better the startup does.
    3. The education market is definitely overheated. It looks like a bubble.
    4. Startups tell you education is easy and fun. It's not.
    5. People and books help a lot. Online lessons: well, yes, but much less.
  • Weekly links #26

  • Migration from Postgres to Datomic

    Recently, I migrated my Clojure-driven pet project from PostgreSQL to Datomic. This is Queryfeed, a web application for fetching data from social networks. I've been running it for several years, treating it as a playground for experiments. Long ago, it was written in Python; then I ported it to Clojure.

    That was a great experience: I had just finished reading the “Clojure for the Brave and True” book and was full of desire to apply the new knowledge to something practical rather than solving Euler problems in vain.

    This time, I made another effort and switched the database backend to Datomic. Datomic is a modern, fact-driven database developed at Cognitect to be used in conjunction with Clojure. It really differs from classical RDBMSs such as MySQL or PostgreSQL. For a long time, I had been wondering whether I should try it, while more and more Clojure/conj talks were being published on YouTube. At my job, we use a vast PostgreSQL database, and the code base is tightly coupled to it; there is no option to perform such a switch over a weekend. So I decided to port my pet project to Datomic in my spare time.

    Before doing this, I naturally googled for a while and was really surprised by how little information I found on the Internet: just three posts that did not cover the subject in detail. So I decided to share my experience here. Maybe it will help somebody with their own migration.

    Of course, I cannot guarantee the steps described below will meet your requirements exactly. Every database is different, so it is impossible to build a universal tool that handles all the cases. But at least you may borrow some of these steps.

    Introduction

    Before we begin, let's talk about the reasons to switch to Datomic. That question cannot be answered in just one or two points. Before Datomic, I had been working with PostgreSQL for several years, and I consider it great software. There is hardly a task that Postgres cannot deal with. Here are just some of its features:

    • streaming replication, smart sharding;
    • geo-spatial data, PostGIS;
    • full-text search, trigrams;
    • JSON(b) data structures;
    • typed arrays;
    • recursive queries;
    • and tons of other benefits.

    So if Postgres is really so great, why switch at all? From my point of view, Datomic brings the following benefits to a project:

    1. Datomic is simple. In fact, it has only two operations: read (querying) and write (transactions).
    2. It supports joins as well. Once you have a reference, it can be resolved into a nested map; references may be recursive. In PostgreSQL or any other RDBMS, you always get a flat result with possibly duplicated rows, and the ORM logic that parses a raw SQL response can be too complicated to understand.
    3. Datomic was built on the same principles as Clojure: simplicity, immutability and a declarative style. Datomic shares Clojure's values.
    4. It accumulates changes over time like Git or any other version control system. With Datomic, you can always go back in time to get the history of an order or to collect audit logs.

    Let’s highlight some general steps we should pass through to complete the migration. These are:

    • dump your Postgres data;
    • add Datomic to your project;
    • load the data into Datomic;
    • rewrite the code that operates on the data;
    • rewrite your HTML templates;
    • update or add unit tests;
    • remove JDBC/Postgres from your project;
    • set up the infrastructure (backups, console, etc.).

    As you can see, it is not as simple as one might think, even for a small project. In my case, migrating Queryfeed took about a week of working nights. That includes:

    • two days to read the whole Datomic documentation;
    • one day to migrate the data;
    • two days to fix the business logic code and templates;
    • two days to deploy everything to the server.

    Regarding the documentation, I highly recommend reading it before doing anything. Please do not rely on random Stack Overflow snippets. Datomic is completely different from classical SQL databases, so your long-term Postgres or MySQL experience won't carry over.

    A quick tip here: since it can be difficult to read lots of text from a screen, I download any page I wish to read to my Kindle using the official Amazon extension for Chrome. The page appears on my Kindle in a minute, and I read it there.

    Once you've finished with the docs, feel free to move on to the next step: dumping your PostgreSQL data.

    Dump Postgres database

    Exporting your data into a set of files won't be too difficult, I believe. I would guess your project has a projectname.db module that handles most of the database stuff. It should have the clojure.java.jdbc module imported and a *db* or db-spec variable declared. Your goal is, for every table in the database, to run a query like select * from <table_name> against it and save the result into a file.

    Which file format to use depends on your preferences, but I highly recommend standard EDN files rather than JSON, CSV or whatever. The main point in favor of EDN is that it handles extended data types such as dates and UUIDs. In my case, every table has at least one date field, for example created_at, which is not null and is set to the current time automatically. With JSON or YAML, dates would be just strings, so you would need extra code to restore a native java.util.Date instance from a string. The same goes for unique identifiers (UUIDs).

    In addition, since EDN files represent native Clojure data structures, you don't need to add the org.clojure/data.json dependency to your project. Everything can be done with out-of-the-box functions. The next snippet dumps all the users from your Postgres database into a users.edn file:

    
    (require '[clojure.java.jdbc :as jdbc])
    
    (def db-spec {... your JDBC spec map...})
    
    ;; a shortcut for querying the database
    (def query (partial jdbc/query db-spec))
    
    ;; dump the whole users table into an EDN file
    (spit "users.edn" (with-out-str (-> "select * from users" query prn)))
    
    

    And that's it! With only one line of code, you've dumped the whole table into a file. Repeat it several times, substituting the name of the *.edn file and the table. If you have many tables, wrap it in a function:

    ;; dumps a single table into a <table>.edn file
    (defn dump [table]
      (spit (format "%s.edn" table)
        (with-out-str (-> (format "select * from %s" table)
                          query
                          prn))))
    

    Then run it over a vector of table names, not a set, since order matters. For example, if orders reference users through a foreign key, the users table should be dumped (and later loaded) first.
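
    For instance, a minimal sketch (the table names here are made up):

    ;; dump the tables in dependency order: referenced tables first
    (doseq [table ["users" "orders" "payments"]]
      (dump table))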

    To check whether your dump is correct, try to restore it from a file as follows:

    (-> "users.edn" slurp read-string first)
    

    Again, it is remarkably simple to do such things in Clojure. Within one line of code, you have read the file, restored the Clojure data from it and taken the first map from the list. In the REPL, you should see something like:

    {:id 1
     :name "Ivan"
     :email "test@test.com"
     ... other fields
     }
    

    That means the dump step went well.

    Adding Datomic into your project

    I won't dwell on this step for long since it is covered well in the official documentation. Briefly, you need to:

    1. register on the Datomic site; it is free;
    2. set up your GPG credentials;
    3. add the Datomic repository and the library to your project;
    4. (optional) if you use the Postgres-driven backend for Datomic, create a new Postgres database using the SQL scripts from the sql folder, then run a transactor.

    Here is a brief example of my setup:

    ;; project.clj
    (defproject my-project "0.x.x"
      ...
      :repositories {"my.datomic.com" {:url "https://my.datomic.com/repo"
                                       :creds :gpg}}
    
      :dependencies [...
                     [com.datomic/datomic-pro "0.9.5561.50"]
                     ...]
      ...)
    

    Run lein deps to download the library. You will probably be prompted for your GPG key.

    A quick try in the REPL:

    (require '[datomic.api :as d])
    ;; for the in-memory storage, the database must be created first
    (d/create-database "datomic:mem://test-db")
    (def conn (d/connect "datomic:mem://test-db"))
    

    Loading the data into Datomic

    In this step, we will load the previously dumped data into your Datomic installation.

    First, we need to prepare the schema before loading the data. A schema is a collection of attributes. Each attribute by itself is a small piece of information; for example, a :user/name attribute keeps a string value and represents a user's name.

    An entity is a set of attributes linked together by a system identifier. In RDBMS terms, an attribute is a column whereas an entity is a table row. That distinguishes Datomic from schema-less databases such as MongoDB, where every entity may have any structure you wish, even within the same collection. In Datomic, you cannot write a string value into a number attribute or a boolean into a date. One note: an entity may own an arbitrary number of attributes.

    For example, in Postgres, if a column is not null and has no default value, you simply cannot skip it when inserting a row. In Datomic, you may submit as few or as many attributes as you want in a transaction. Imagine a user model with ten attributes: a name, an email, etc. When creating a user, I may pass only a name, and there won't be any error. So pay attention that you submit all the required attributes yourself.
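
    To illustrate, here is a minimal sketch (it assumes a global connection *conn like the one declared below, and a schema that declares a :user/name attribute): Datomic happily accepts a partial entity.

    ;; only one attribute out of ten is submitted; no error is raised
    @(d/transact @*conn [{:user/name "Ivan"}])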

    A Datomic schema is represented by native Clojure data structures: maps, keywords and vectors. That's why it is stored in EDN files as well. A typical initial schema for a fresh Datomic installation may look as follows:

    [
     ;; Enums
     {:db/ident :user.gender/male}
     {:db/ident :user.gender/female}
    
     {:db/ident :user.source/twitter}
     {:db/ident :user.source/facebook}
    
     ;; Users
    
     {:db/ident       :user/pg-id
      :db/valueType   :db.type/long
      :db/cardinality :db.cardinality/one
      :db/unique      :db.unique/identity}
    
     {:db/ident       :user/source
      :db/valueType   :db.type/ref
      :db/cardinality :db.cardinality/one
      :db/isComponent true}
    
     {:db/ident       :user/source-id
      :db/valueType   :db.type/string
      :db/cardinality :db.cardinality/one}
     ...
    ]
    

    The first four entries are special attributes that serve as enum values. I will say more about them later.

    Again, check the official documentation that describes schema usage.

    Now that we have prepared a schema, let's add some boilerplate code to our db namespace:

    (ns project.db
      (:require [clojure.java.io :as io]
                [datomic.api :as d]))
    
    ;; in-memory database for test purposes
    (def db-uri "datomic:mem://test-db")
    
    ;; global Datomic connection wrapped in atom
    (def *conn (atom nil))
    
    ;; A function to initiate the global state
    (defn init-db []
      (d/create-database db-uri)
      (reset! *conn (d/connect db-uri)))
    
    ;; reads an EDN file located in `resources` folder
    (defn read-edn
      [filename]
      (-> filename
          io/resource
          slurp
          read-string))
    
    ;; reads and loads a schema from EDN file
    (defn load-schema
      [filename]
      @(d/transact @*conn (read-edn filename)))
    

    I hope the comments explain the code well enough. I declared a database URL, a global connection, a function to connect to the DB and two helper functions.

    The first function just reads an EDN file and returns a data structure. Since our files are stored in the resources folder, there is an io/resource wrapper in the threading chain.

    The second function reads a file too but also performs a Datomic transaction, passing the data as a schema.

    The db-uri variable is a URL-like string. Currently, we use in-memory storage for test purposes. I really doubt you can load the data directly into SQL-driven storage without errors, so let's practice for a while. Later, once the import step is ready, we will just switch the db-uri variable to a production-ready URL.
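
    For reference, a SQL-backed URL looks something like this (the database name and credentials below are placeholders):

    (def db-uri
      "datomic:sql://my-db?jdbc:postgresql://localhost:5432/datomic?user=datomic&password=datomic")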

    With the code above, we are ready to load the schema. I put my initial schema into the file resources/schema/0001-init.edn, so I can load it as follows:

    (init-db)
    (load-schema "schema/0001-init.edn")
    

    Now that we have a schema, let's load the previously saved Postgres data. We need to add more boilerplate code. Unfortunately, there cannot be a universal function that maps your Postgres fields to Datomic attributes. The functions that convert your data might look a bit ugly, but they are single-purpose, so don't mind that.

    For each EDN file that contains data of a specific table, we should:

    1. read the proper file and get a list of maps;
    2. convert each PostgreSQL map into a Datomic map;
    3. perform a Datomic transaction, passing a vector of Datomic maps.

    Below is an example of my pg-user-to-datomic function that accepts a Postgres-driven map and turns it into a set of Datomic attributes:

    ;; note: this requires (:import (java.net URI) (java.util Date UUID)) in the ns form
    (defn pg-user-to-datomic
      [{:keys [email
               first_name
               timezone
               source_url
               locale
               name
               access_token
               access_secret
               source
               token
               status
               id
               access_expires
               last_name
               gender
               source_id
               is_subscribed
               created_at]}]
    
      {:user/pg-id id
       :user/email (or email "")
       :user/first-name (or first_name "")
       :user/timezone (or timezone "")
       :user/source-url (URI. source_url)
       :user/locale (or locale "")
       :user/name (or name "")
       :user/access-token (or access_token "")
       :user/access-secret (or access_secret "")
    
       :user/source (case source
                           "facebook" :user.source/facebook
                           "twitter" :user.source/twitter)
    
       :user/source-id source_id
    
       :user/token (UUID/fromString token)
       :user/status (case status
                      "normal" :user.status/normal
                      "pro" :user.status/pro)
    
       :user/access-expires (or access_expires 0)
       :user/last-name (or last_name "")
       :user/gender (case gender
                      "male" :user.gender/male
                      "female" :user.gender/female)
    
       :user/is-subscribed (or is_subscribed false)
       :user/created-at (or created_at (Date.))})
    

    Yes, it looks a bit ugly and annoying, but you have to write something like this for every table you have.

    Here is the code to load a table into Datomic:

    (->> "users.edn" slurp read-string (mapv pg-user-to-datomic) (d/transact @*conn) deref)
    

    Before we go further, let’s discuss some important notes on importing the data.

    Avoid nils

    Datomic does not support nil values for attributes. When you do not have a value for an attribute, you should either skip it or pass an empty value: a zero, an empty string, etc. That's why most of the fields in the function above are wrapped in (or ... "") expressions.
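
    Alternatively, instead of inventing empty placeholder values, you may drop nil fields from a map before transacting it. A hedged sketch:

    ;; remove the map entries whose values are nil so those attributes are skipped
    (defn remove-nils [m]
      (into {} (filter (fn [[_ v]] (some? v)) m)))
    
    ;; (remove-nils {:user/name "Ivan" :user/email nil}) => {:user/name "Ivan"}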

    Shrink your tables

    Migrating to a new datastore backend is a good chance to refactor your schema. For those who have spent years working with relational databases, it is no secret that typical SQL applications suffer from a multitude of tables. In SQL, it is not enough to keep just the “entity” tables: users, orders, etc. Often you need to associate a product with colors, a blog post with tags or a user with permissions. That leads to product_colors, post_tags and other bridge tables. You join them in a query to “walk” from a user to their orders, for example.

    Datomic is free of bridge tables. It supports reference attributes that link to any other entity. In addition, an attribute may carry multiple values. For example, to link a blog post with a set of tags, we would declare the following schema:

    [
     ;; Tag
    
     {:db/ident       :tag/name
      :db/valueType   :db.type/string
      :db/cardinality :db.cardinality/one
      :db/unique      :db.unique/identity}
    
     ;; Post
    
     {:db/ident       :post/title
      :db/valueType   :db.type/string
      :db/cardinality :db.cardinality/one}
    
     {:db/ident       :post/text
      :db/valueType   :db.type/string
      :db/cardinality :db.cardinality/one}
    
     {:db/ident       :post/tags
      :db/valueType   :db.type/ref
      :db/cardinality :db.cardinality/many}
    ]
    

    In Postgres, you would need a post_tags bridge table with post_id and tag_id foreign keys. In Datomic, you simply pass a vector of references in the :post/tags field when creating a post.
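
    For example, a sketch of such a transaction (the values are made up; since :tag/name is unique, lookup refs resolve the tags, assuming the tag entities already exist):

    @(d/transact @*conn
       [{:post/title "Hello"
         :post/text  "My first post"
         :post/tags  [[:tag/name "clojure"] [:tag/name "datomic"]]}])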

    Migrating to Datomic is a great chance to get rid of those tables.

    Use enums

    Both Postgres and Datomic support enum types. An enum type is a set of values; an instance of an enum type may hold only one of them.

    In Postgres, I use enum types a lot. They are fast, reliable and keep your data strongly consistent. For example, if an order has possible “new”, “pending” and “paid” states, please don't use a varchar type for that: sooner or later you will write something wrong there, for example mix up the case or make a typo. You'd better declare the schema as follows:

    create type order_state as enum (
      'order_state/new',
      'order_state/pending',
      'order_state/paid'
    );
    
    create table orders (
      id serial primary key,
      state order_state not null default 'order_state/new'::order_state,
      ...
    );
    

    Now you cannot submit an unknown state for an order.

    Although Postgres enums are great, the JDBC library makes our life a bit more difficult by forcing us to wrap enum values in a PGobject when querying or inserting data. For example, to submit a new state for an order, you cannot just pass the string "order_state/paid": you'll get an error saying you are trying to submit a string into an order_state column. So you have to wrap your string in a special object:

    ;; note: this requires (:import (org.postgresql.util PGobject)) in the ns form
    (defn get-pg-obj [type value]
      (doto (PGobject.)
        (.setType type)
        (.setValue value)))
    
    (def get-order-state
      (partial get-pg-obj "order_state"))
    
    ;; now, composing parameters for a query
    {:order_id 42
     :state (get-order-state "order_state/paid")}
    

    Another disadvantage is the inconsistency between select and insert queries. When you read data, you get an enum value as a string, but when you pass an enum as a parameter, you still need to wrap it in a PGobject. That is a bit annoying.

    Datomic also has nice support for enums. There is no special syntax for them: enums are just attributes that have no values, only names. I have already shown them above:

    [
     {:db/ident :user.gender/male}
     {:db/ident :user.gender/female}
    
     {:db/ident :user.source/twitter}
     {:db/ident :user.source/facebook}
    ]
    

    Later, you may reference an enum value by passing just the keyword :user.source/twitter. It's simple, fast and keeps your database consistent.
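
    For instance, a transaction that references an enum value looks like this (the entity below is hypothetical):

    @(d/transact @*conn
       [{:user/pg-id  42
         :user/source :user.source/twitter}])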

    JSON data

    Personally, I try to avoid using JSON in Postgres as long as possible. Adding JSON fields everywhere turns your Postgres installation into MongoDB. It becomes quite easy to make a mistake or corrupt the data and end up in a situation where one half of your JSON data has a particular key and the other half does not.

    Sometimes you really need to keep JSON in your DB. A good example might be PayPal Instant Payment Notifications (IPN). These are HTTP requests that PayPal sends to your server when a customer buys something. An IPN body carries about 30 fields, and its structure varies depending on the transaction type. Splitting that data into separate columns would be a mess. A solution is to extract only the most important fields (date, email, sum, order number) and write the rest into a jsonb column. Then, whenever you need additional information from an IPN, for example the tax sum, you may query it directly:

    select
      (data->>'tax_sum')::numeric as tax
    from
      ipn
    where
      order_number = '123456';
    

    In Datomic, there is no JSON type for attributes. I'm not sure I made the right decision, but I just put that JSON data into a string attribute. Sure, there is no way to access individual fields in a Datalog query or apply rules to them. But at least I can restore the data when selecting a single entity:

    ;; local helper to parse JSON with keywordized keys
    ;; (assumes Cheshire is required as [cheshire.core :as json])
    (defn parse-json [value]
      (json/parse-string value true))
    
    (defn get-last-ipn [user-id]
      (let [query '[:find [(pull ?ipn [*]) ...]
                    :in $ ?user
                    :where
                    [?ipn :ipn/user ?user]]
    
            result (d/q query (d/db @*conn) user-id)]
    
        (when (not-empty result)
          (let [item (last (sort-by :ipn/emitted-at result))]
            (update item :ipn/data parse-json)))))
    

    Foreign keys

    In an RDBMS, a typical table has an auto-incremented id field that holds the unique number of a row. When you need to refer to another table, an order or a user's profile, you declare a foreign key that simply keeps the value of that id. Since the ids are auto-generated, you should never care about their actual values, only about consistency.

    In Datomic, there are no auto-incremented values. When you import your data, it's important to handle foreign keys (references, in Datomic terms) properly. During the import, we populate a :<entity>/pg-id field that holds the legacy Postgres id. When importing a table with foreign keys, you may resolve a reference as follows:

    {... ;; other order fields
     :order/user [:user/pg-id user_id]}
    

    A reference may be represented as a lookup ref: a vector of two elements where the first is a unique attribute name and the second is its value.

    For new entities created in production after the migration to Datomic, you do not need to submit the .../pg-id value. You may either delete (retract) it once the migration has finished or keep it in the database as a marker of legacy data.
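
    If you decide to retract it, here is a hedged sketch of such a transaction (the id 42 is made up):

    ;; retract the legacy attribute from a single user entity
    @(d/transact @*conn
       [[:db/retract [:user/pg-id 42] :user/pg-id 42]])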

    Update the code

    This step is the most boring one, I believe. You need to scan the whole project and fix the fragments where you access data from the database.

    Since it is good practice to prefix attributes with a namespace, the most common change will be attribute renaming:

    ;; before
    (println (:name user))
    
    ;; after
    (println (:user/name user))
    

    You will face fewer problems if you organize dedicated functions that wrap the underlying logic: get-user-by-id, get-orders-by-user-id and so on.
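
    For example, a minimal sketch of such a wrapper (it relies on :user/pg-id being unique, as declared in the schema above):

    ;; fetch the whole user entity by its legacy Postgres id
    (defn get-user-by-id [pg-id]
      (d/pull (d/db @*conn) '[*] [:user/pg-id pg-id]))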

    If you use the HugSQL or YeSQL Clojure libraries, then you already have such functions generated dynamically from *.sql files. That is much better than having raw SQL everywhere, and porting such a project to Datomic will be much easier.

    HTML templates

    Another dull step that cannot be automated is scanning your Selmer templates (if your project has them, of course) and updating the fragments that touch entity attributes. For example:

    
    ;; before
    <p>{{ user.first_name}} {{ user.last_name}}</p>
    
    ;; after
    <p>{{ user.user/first-name}} {{ user.user/last-name}}</p>
    
    

    You may access nested entities as well. Imagine a user has a reference to their social profile:

    
    <p>{{ user.user/profile.profile/name }}</p> ;; "twitter", "facebook" etc
    
    

    Datomic encourages us to use enums whose values are just keywords. Sometimes you need a case ... then pattern in a Selmer template to render content depending on an enum value. This may be a bit tricky since Selmer does not support keyword literals. In the example above, a user has a :user/source attribute that references an enum with the possible values :user.source/twitter or :user.source/facebook. Here is how I figured out switching on them:

    
    {% ifequal request.user.user/source.db/ident|str ":user.source/twitter" %}
      <a href="https://twitter.com/{{ user.user/username }}">Twitter page</a>
    {% endifequal %}
    {% ifequal request.user.user/source.db/ident|str ":user.source/facebook" %}
      <a href="{{ user.user/profile-url }}">Facebook page</a>
    {% endifequal %}
    
    

    In the example above, we turn the keyword into a string using the |str filter so both values are compared as strings.

    To find all the Selmer variables and operators, just grep your templates folder for the {{ or {% literals.
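
    For instance (the templates path is a guess; adjust it to your project layout):

    grep -rn -e '{{' -e '{%' resources/templates/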

    Remove JDBC/Postgres

    Now that your project is Datomic-powered and does not need JDBC drivers anymore, you may either remove them from the project or at least demote them to dev dependencies used only for development purposes.

    Scan your project, grepping for the jdbc and postgres terms, to find namespaces that still use the legacy DB backend, and remove any that are still present. Open your root project.clj file and remove the jdbc and postgresql packages from the :dependencies vector. Ensure you can still build and run the application and its unit tests.

    Update unit tests

    Datomic is great in that you may use the in-memory backend when running tests. That makes them pass much faster and removes the need for a Postgres installation on your machine.

    I believe your project can detect whether it is in dev, test or prod mode. If it cannot, take a look at the Luminus framework; it handles this quite well. For each environment, you specify its own database URL. For tests, it will be the in-memory storage.
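
    A minimal sketch of such a setup (the APP_ENV variable name is made up; Luminus does this with profiles and config files instead):

    ;; pick the database URL depending on the environment
    (def db-uri
      (case (System/getenv "APP_ENV")
        "prod" "datomic:sql://my-db?jdbc:postgresql://localhost:5432/datomic?user=datomic&password=secret"
        ;; dev and test fall back to the in-memory storage
        "datomic:mem://test-db"))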

    Using the standard clojure.test namespace, you wrap each test with a fixture that does the following steps:

    1. creates a new database in memory and connects to it;
    2. runs all the schemas against it (migrations);
    3. populates it with predefined test data (users, orders, etc.; also known as “fixtures”);
    4. runs the test itself;
    5. drops the database and disconnects from it.

    These steps should run for each test. That way we can guarantee that every test has its own environment and does not depend on other tests. It's good practice for a test to get a fresh installation untouched by previous tests.

    Some preparation steps are:

    (ns your-project.test.users
      (:require [clojure.test :refer :all]
                [your-project.db :as db]))
    
    (defn migrate
      "Loads all the migrations"
      []
      (doseq [file ["schema/0001-init.edn"
                    "schema/0002-user-updated.edn"]]
        (db/load-schema file)))
    
    (defn load-fixtures
      "Loads all the fixtures"
      []
      (db/load-schema "fixtures/test-data.edn"))
    
    (defn test-fixture [f]
      (db/init) ;; this function reads the config,
                ;; creates the DB and populates
                ;; the global Datomic connection
    
      (migrate)
      (load-fixtures)
      (f)         ;; the test is run here
      (db/delete) ;; deletes a database
      (db/stop))  ;; stops the connection
    
    (use-fixtures
      :each
      test-fixture)
    
    

    Now you may write your tests:

    
    (deftest user-may-login
      ...)
    
    (deftest user-proceed-checkout
      ...)
    

    For every test, you will have a database running with all the migrations and test data loaded.

    If you still do not have any tests in your project, I urge you to add them soon. Without tests, you cannot be sure you do not break anything when changing the code.

    Infrastructure (final touches)

    In this final section, I will highlight several important points that relate to server management.

    Setting up a production Postgres-driven backend

    Running an in-memory Datomic database is fun since it really costs nothing. In production, you'd better set up a more reliable backend. Datomic supports a Postgres storage system out of the box. To prepare the database, run the following SQL scripts:

    sudo su postgres # switch to postgres user
    cd /path/to/datomic/bin/sql
    psql < postgres-user.sql
    psql < postgres-db.sql
    psql datomic < postgres-table.sql
    

    The scripts above create a user datomic with the password datomic, then the database datomic with the owner datomic. The last script creates a special table to keep Datomic blocks.

    Please do not forget to change the standard datomic password to something more complicated.

    Running the transactor

    The official documentation describes how to run a transactor, which the peer library needs when you use non-memory storage. I'm not going to retell it here. Instead, I will share a bit of config to run it automatically as a system service (the config below uses Upstart syntax, the init daemon on older Ubuntu systems).

    Create a file named datomic.conf in your my-project/conf directory and symlink it into the /etc/init/ folder. Add the following lines to it:

    description "Datomic transactor"
    
    start on startup
    stop on shutdown
    
    respawn
    
    setuid <your user here>
    setgid <your group here>
    
    chdir /path/to/datomic
    
    script
        exec bin/transactor sql-project.properties
    end script
    

    Here, /path/to/datomic is the directory with the unzipped Datomic installation, and sql-project.properties is a transactor configuration file where you should specify the Datomic license key sent to your email.

    Now that you have put a symlink, try the following commands:

    sudo start datomic
    
    status datomic
    # datomic start/running, process 5281
    
    sudo stop datomic
    

    Console

    Most RDBMSs have UI applications to manage the data. Datomic comes with a built-in console that runs as a web application. Within the console, you can examine the schema and perform queries and transactions.

    The following template runs the console:

    /path/to/datomic/bin/console -p 8088 <some-alias> <datomic-url-without-db>
    

    In my example, the command is:

    
    $DATOMIC_HOME/bin/console -p 8888 datomic \
    "datomic:sql://?jdbc:postgresql://localhost:5432/datomic?user=xxxxx&password=xxxxx"
    

    Opening a browser at http://your-domain:8888/browser will show you a dashboard.

    A few security concerns should be mentioned here. The console does not support any login/password authentication, so it is quite unsafe to run it on a production server as-is. Implement at least some of the following measures:

    1. Proxy the console with Nginx. It must not be reachable directly.
    2. Limit access to a list of IPs, for example your office or home only.
    3. Allow only secure SSL connections, no plain HTTP. Let's Encrypt would be a great choice (see my recent post).
    4. Add basic/digest authentication to your Nginx config.

    To run the console as a service, create another file, console.conf, in the /etc/init/ directory. Use the datomic.conf file as a template, substituting the primary command with the one shown above. Now you can run the console only when you really need it:

    sudo start console
    

    Backups

    Making backups regularly is highly important. The Datomic installation carries a special utility to take care of it, so you won't need to make backups manually by running pg_dump against the Postgres backend. Datomic provides a high-level backup algorithm that runs in several threads. In addition, it supports AWS S3 as a destination.

    A typical backup command looks as follows:

    /path/to/datomic/bin/datomic -Xmx4g -Xms4g backup-db <datomic-url> <destination>
    

    To access AWS, you need to export both the AWS_ACCESS_KEY_ID and AWS_SECRET_KEY variables first or prefix the command with them. In my case, the full command looks something like:

    AWS_ACCESS_KEY_ID=xxxxxx AWS_SECRET_KEY=xxxxxxx \
    /path/to/datomic/bin/datomic -Xmx4g -Xms4g backup-db \
    "datomic:sql://xxxxxxxx?jdbc:postgresql://localhost:5432/datomic?user=xxxxxx&password=xxxxxxx" \
    s3://secret-bucket/datomic/2017/07/04
    

    The date part at the end is substituted automatically using the $(shell date +\%Y/\%m/\%d) expression in a Makefile, or with the following in bash:

    date_path=$(date +%Y/%m/%d) # 2017/07/04
    

    Add that command to your crontab file to make backups regularly (note that inside crontab, % characters must be escaped as \%).

    Backups as a way to deploy the data

    The good news is that the backup structure does not depend on the backend type. Whether you dump an in-memory storage or a Postgres cluster, the backup can be restored anywhere. That gives us the possibility to migrate the data on a local machine, make a backup and then restore it into the production database.

    Once you have finished migrating your data, launch the backup command described above; the backup goes to S3. On the server, run the restore command:

    AWS_ACCESS_KEY_ID=xxxxx AWS_SECRET_KEY=xxxxx \
    /path/to/datomic/bin/datomic -Xmx4g -Xms4g restore-db \
    s3://secret-bucket/datomic/2017/07/04 \
    "datomic:sql://xxxxxxx?jdbc:postgresql://localhost:5432/datomic?user=xxxx&password=xxxxx"
    

    If everything went without mistakes, the server will pick up the new data.

    Conclusion

    After spending about a week moving from Postgres to Datomic, I can say it is really worth it. Although Datomic does not support most of Postgres's smart features, like geospatial data or JSON structures, it is much closer to Clojure after all. Since it was made by the same authors, Datomic looks like a continuation of Clojure. And that is a huge benefit that may outweigh the disadvantages mentioned above.

    Surfing the Internet, I found the following links that may also be helpful:

    I hope you enjoyed reading this material. You are welcome to share your thoughts in the comment section.

  • Let's Encrypt

    I've just tried the Let's Encrypt service and can say it works like a charm! I am really impressed by its simplicity and robustness. It works as promised, within a few lines of shell. That's how good software should be made.

    Let's Encrypt is an SSL certificate authority that issues short-term SSL certificates. A typical certificate expires in 90 days, and then you request a new one.

    Why use Let's Encrypt in particular? Some other SSL providers also offer free certificates; just google “free SSL cert”. The main reason is that Let's Encrypt is totally automated. You don't even need to open their site: the whole setup can be done in a bash session in five minutes.

    Here is a quick example of setting up an SSL certificate on the outdated Ubuntu 12.04 LTS:

    1. Download the certbot script. Certbot is open source software that communicates with the Let's Encrypt service via the secure ACME protocol:

      wget https://dl.eff.org/certbot-auto
      chmod a+x certbot-auto
      
    2. Back up your Nginx config by copying your *.conf files from /etc/nginx/conf.d/ somewhere. Then run:

      sudo /path/to/certbot-auto --nginx
      

      This command will ask you several questions; in most cases, the default choice is enough. It scans your current Nginx config and makes the required changes. Finally, you will be prompted to submit your email. Please enter a real one since it requires confirmation. In a minute, check your inbox and follow the secret link to confirm your account.

    3. Reload the Nginx service with something like

      sudo service nginx restart
      

      Open your site in Chrome, go to the Developer console, the “Security” tab, then “View certificate” below the green label:

      SSL green label

      First, all the labels should be green, not red or orange. Second, the “Let's Encrypt” authority should appear in the certificate's details:

      SSL issued by

    4. You have now gone through the main steps, but it would be great to set up automatic renewal for your certificate. Add the following to your crontab config:

      0 */12 * * * /path/to/certbot-auto renew --no-self-upgrade
      

      This job tries to renew the certificate twice a day, as the official guide recommends.

    To find out more, please examine the Certbot documentation. It has a nice setup wizard with step-by-step instructions for all operating systems. You may also automate Let's Encrypt not with a bash script but from your favorite language; see the “Client options” page for existing libraries.

    Finally, I urge you to enable SSL for your project right now if you haven't done so yet. Nowadays, there is no excuse for sending your clients' data unencrypted. Please respect your visitors. Setting up SSL has never been as easy as it is with Let's Encrypt.

  • Weekly links #25

    I'm writing to you from Turkey. Hang in there, only one dress left!
