JSam is a lightweight, zero-deps JSON parser and writer. Named after Jetstream Sam.

  • Small: only 14 Java files with no extra libraries;
  • Not the fastest one, but pretty good (see the chart below);
  • Has its own features, e.g. reading and writing multiple values;
  • Flexible and extendable.

Installation

Requires Java 17 or later. Add a new dependency:

;; lein
[com.github.igrishaev/jsam "0.1.0"]

;; deps
com.github.igrishaev/jsam {:mvn/version "0.1.0"}

Import the library:

(ns org.some.project
  (:require
    [jsam.core :as jsam]))

Reading

To read a string:

(jsam/read-string "[42.3e-3, 123, \"hello\", true, false, null, {\"some\": \"map\"}]")

[0.0423 123 "hello" true false nil {:some "map"}]

To read from any kind of source: a file, a URL, a socket, an input stream, a reader, etc.:

(jsam/read "data.json") ;; a file named data.json
(jsam/read (io/input-stream ...))
(jsam/read (io/reader ...))

Both functions accept an optional map of settings:

(jsam/read-string "..." {...})
(jsam/read (io/file ...) {...})

Here is a table of options that affect reading:

| Option | Default | Comment |
|---|---|---|
| :read-buf-size | 8k | Size of the read buffer |
| :temp-buf-scale-factor | 2 | Scale factor for the inner temp buffer |
| :temp-buf-size | 255 | Initial size of the inner temp buffer |
| :parser-charset | UTF-8 | Must be an instance of Charset |
| :arr-supplier | jsam.core/sup-arr-clj | An object to collect array values |
| :obj-supplier | jsam.core/sup-obj-clj | An object to collect key-value pairs |
| :bigdec? | false | Use BigDecimal when parsing numbers |
| :fn-key | keyword | A function to process keys |

If you want keys to stay strings, and large numbers to be parsed with BigDecimal to avoid infinite values, pass the following:

(jsam/read-string "..." {:fn-key identity :bigdec? true})
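Similarly, the :parser-charset option takes a Charset instance and matters when reading from binary sources. Here is a minimal sketch, assuming a hypothetical UTF-16-encoded file and the io alias from the examples above:

;; assumes [clojure.java.io :as io] is required, as in the earlier examples
(with-open [in (io/input-stream "legacy-utf16.json")]
  (jsam/read in {:parser-charset java.nio.charset.StandardCharsets/UTF_16}))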

We will discuss suppliers a bit later.

Writing

To dump data into a string, use write-string:

(jsam/write-string {:hello "test" :a [1 nil 3 42.123]})

"{\"hello\":\"test\",\"a\":[1,null,3,42.123]}"

To write into a destination, which might be a file, an output stream, a writer, etc., use write:

(jsam/write "data2.json" {:hello "test" :a [1 nil 3 42.123]})

;; or

(jsam/write (io/file ...) {...})

;; or

(with-open [writer (io/writer ...)]
  (jsam/write writer {...}))

Both functions accept a map of options for writing:

| Option | Default | Comment |
|---|---|---|
| :writer-charset | UTF-8 | Must be an instance of Charset |
| :pretty? | false | Use indents and line breaks |
| :pretty-indent | 2 | Indent growth for each level |
| :multi-separator | \n | How to split multiple values |
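For example, :writer-charset accepts any java.nio.charset.Charset instance. A minimal sketch that writes a UTF-16 file (the file name is arbitrary); reading it back would require the matching :parser-charset:

(jsam/write "data-utf16.json"
            {:hello "world"}
            {:writer-charset java.nio.charset.StandardCharsets/UTF_16})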

This is how you pretty-print data:

(jsam/write "data3.json"
            {:hello "test" :a [1 {:foo [1 [42] 3]} 3 42.123]}
            {:pretty? true
             :pretty-indent 4})

This is what you’ll get (it may need some further adjustment):

{
    "hello": "test",
    "a": [
        1,
        {
            "foo": [
                1,
                [
                    42
                ],
                3
            ]
        },
        3,
        42.123
    ]
}

Handling Multiple Values

When you have 10,000,000 rows of data to dump into JSON, the regular approach is not developer friendly: it leads to a single array of 10M items that you have to read into memory at once. Only a few libraries provide facilities to read arrays lazily.

It’s much better to dump rows one by one into a stream and then read them one by one without saturating memory. Here is how you do it:

(jsam/write-multi "data4.json"
                  (for [x (range 0 3)]
                    {:x x}))

The second argument is a collection that might be lazy as well. The content of the file is:

{"x":0}
{"x":1}
{"x":2}

Now read it back:

(doseq [item (jsam/read-multi "data4.json")]
  (println item))

;; {:x 0}
;; {:x 1}
;; {:x 2}
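The :multi-separator option from the writing table applies here as well. Below is a sketch that separates values with a blank line instead of a single newline; data6.json is an arbitrary name, and the exact placement of the separator is my assumption rather than documented behavior:

(jsam/write-multi "data6.json"
                  (for [x (range 0 3)]
                    {:x x})
                  {:multi-separator "\n\n"})

;; data6.json (presumably):
;; {"x":0}
;;
;; {"x":1}
;;
;; {"x":2}

Since only whitespace ends up between the values, read-multi should still be able to read such a file back.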

The read-multi function returns a lazy iterable object, meaning it won’t read everything at once. Also, both write-multi and read-multi are pretty-print friendly:

;; write
(jsam/write-multi "data5.json"
                  (for [x (range 0 3)]
                    {:x [x x x]})
                  {:pretty? true})

;; read
(doseq [item (jsam/read-multi "data5.json")]
  (println item))

;; {:x [0 0 0]}
;; {:x [1 1 1]}
;; {:x [2 2 2]}

The content of the data5.json file:

{
  "x": [
    0,
    0,
    0
  ]
}
{
  "x": [
    1,
    1,
    1
  ]
}
{
  "x": [
    2,
    2,
    2
  ]
}
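Because read-multi is lazy, you can also aggregate a large file without holding all of it in memory. Here is a minimal sketch that sums the :x fields of the data4.json file written above; the reduce itself is arbitrary and only illustrates streaming consumption:

(reduce
 (fn [acc {:keys [x]}]
   (+ acc x))
 0
 (jsam/read-multi "data4.json"))

;; 3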

Type Mapping and Extending

This chapter covers how to control type mapping between Clojure and JSON realms.

Writing is driven by a protocol named jsam.core/IJSON with a single encoding method:

(defprotocol IJSON
  (-encode [this writer]))

The default mapping is the following:

| Clojure | JSON | Comment |
|---|---|---|
| nil | null | |
| String | string | |
| Boolean | bool | |
| Number | number | |
| Ratio | string | e.g. (/ 3 2) -> "3/2" |
| Atom | any | gets deref-ed |
| Ref | any | gets deref-ed |
| List | array | lazy seqs as well |
| Map | object | keys coerced to strings |
| Keyword | string | leading : is trimmed |
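A quick example based on the table above (a sketch; the exact key order of the resulting string may differ):

(jsam/write-string {:ratio (/ 3 2)
                    :state (atom 42)
                    :kw :hello})

;; "{\"ratio\":\"3/2\",\"state\":42,\"kw\":\"hello\"}"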

Anything else gets encoded as a string using the .toString invocation under the hood:

(extend-protocol IJSON
  ...
  Object
  (-encode [this ^JsonWriter writer]
    (.writeString writer (str this)))
  ...)

Here is how you override encoding. Imagine you have a special type SneakyType:

(deftype SneakyType [a b c]

  ;; some protocols...

  jsam/IJSON
  (-encode [this writer]
    (jsam/-encode ["I used to be a SneakyType" a b c] writer)))

Test it:

(let [data1 {:foo (new SneakyType :a "b" 42)}
      string (jsam/write-string data1)]
  (jsam/read-string string))

;; {:foo ["I used to be a SneakyType" "a" "b" 42]}

When reading data, there is a way to specify how array and object values get collected. The :arr-supplier and :obj-supplier options accept a Supplier instance whose get method returns an instance of the IArrayBuilder or IObjectBuilder interface. Each interface knows how to add a value to a collection and how to finalize it.

The default implementations build Clojure persistent collections like PersistentVector and PersistentHashMap. There are also a couple of Java-specific suppliers that build ArrayList and HashMap, respectively. Here is how you use them:

(jsam/read-string "[1, 2, 3]"
                  {:arr-supplier jsam/sup-arr-java})

;; [1 2 3]
;; java.util.ArrayList

(jsam/read-string "{\"test\": 42}"
                  {:obj-supplier jsam/sup-obj-java})

;; {:test 42}
;; java.util.HashMap

Here are some crazy examples that let you modify data while building collections. For an array:

(let [arr-supplier
      (reify java.util.function.Supplier
        (get [this]
          (let [state (atom [])]
            (reify org.jsam.IArrayBuilder
              (conj [this el]
                (swap! state clojure.core/conj (* el 10)))
              (build [this]
                @state)))))]

  (jsam/read-string "[1, 2, 3]"
                    {:arr-supplier arr-supplier}))

;; [10 20 30]

And for an object:

(let [obj-supplier
      (jsam/supplier
        (let [state (atom {})]
          (reify org.jsam.IObjectBuilder
            (assoc [this k v]
              (swap! state clojure.core/assoc k (* v 10)))
            (build [this]
              @state))))]

  (jsam/read-string "{\"test\": 1}"
                    {:obj-supplier obj-supplier}))

;; {:test 10}

Benchmarks

Jsam doesn’t try to squeeze out as much performance as possible; tuning JSON reading and writing is pretty challenging. But so far, the library is not as bad as you might think! It’s about two times slower than Jsonista and slightly slower than Cheshire. At the same time, it’s several times faster than data.json, which is written in pure Clojure and thus much slower.

The chart below shows my measurements of reading a 100 MB JSON file and then dumping the parsed data back into a string. It’s pretty clear that Jsam is neither the best nor the worst in this competition. I’ll leave the question of performance for further work.

Measured on a MacBook M3 Pro with 36 GB of RAM.

Another benchmark was made by Eugene Pakhomov; the numbers below are mean times. Reading:

| Size | jsam | data.json | cheshire | jsonista | jsoniter | charred |
|---|---|---|---|---|---|---|
| 10 b | 182 ns | 302 ns | 800 ns | 230 ns | 101 ns | 485 ns |
| 100 b | 827 ns | 1 µs | 2 µs | 1 µs | 504 ns | 1 µs |
| 1 kb | 5 µs | 8 µs | 9 µs | 6 µs | 3 µs | 5 µs |
| 10 kb | 58 µs | 108 µs | 102 µs | 58 µs | 36 µs | 59 µs |
| 100 kb | 573 µs | 1 ms | 968 µs | 596 µs | 379 µs | 561 µs |

Writing:

| Size | jsam | data.json | cheshire | jsonista | jsoniter | charred |
|---|---|---|---|---|---|---|
| 10 b | 229 ns | 491 ns | 895 ns | 185 ns | 2 µs | 326 ns |
| 100 b | 2 µs | 3 µs | 2 µs | 540 ns | 3 µs | 351 ns |
| 1 kb | 14 µs | 14 µs | 8 µs | 3 µs | 8 µs | 88 ns |
| 10 kb | 192 µs | 165 µs | 85 µs | 29 µs | 96 µs | 10 µs |
| 100 kb | 2 ms | 2 ms | 827 µs | 325 µs | 881 µs | 88 µs |

Measured on an i7-9700K.

On Tests

One might wonder how this library was tested. Although considered a simple format, JSON has plenty of surprises. Jsam has three sets of tests, namely:

  • basic cases written by me;
  • a large test suite borrowed from the Charred library (many thanks to Chris Nuernberger, who allowed me to use his code);
  • an extra set of generative tests borrowed from the official clojure.data.json library developed by the Clojure team.

These three, I believe, cover most of the cases. Should you face any weird behavior, please let me know.