Reference types: separating identities and values

January 20, 2014

I’ve been learning Clojure on and off for about a year (focusing especially hard on it these past three months). One key discovery was how great a job it does separating identities and values and why that’s important. The philosophy is a useful one no matter what language you’re using.

A value is an unchanging fact, like the date “January 20, 2014”. An identity is a reference that can change, like “today”. As I write this post, “today” refers to January 20, 2014. But by the time you read this, “today” may refer to a different day entirely, while “January 20, 2014” will always mean the same thing.

In most languages, collection types like arrays and hashmaps conflate value and identity. Take JavaScript for example. If you ask whether a number is equal to itself, of course it is:

> 42 == 42
true

But an array, not so much:

> [1,2,3] == [1,2,3]
false

There’s no theoretical reason [1,2,3] can’t be treated as a value and compared as a value, just like 42. However, for efficiency reasons, JavaScript arrays can be modified in place. So it becomes important to compare by identity and say, “this [1,2,3] is different from that [1,2,3]”, because if you change one of them, you don’t want the other one to also change.

In essence, the array in a language like JavaScript is an abstraction over a place in memory, not a value.

Since this is also how arrays work in Java, Python, Ruby, C, C++, Perl, PHP, Go, and most other languages you or I could think to name, it’s easy to take it for granted. After all, the implementation of this kind of array is simple. Semantically, however, a mutable array is somewhat confused. You have this identity that, at any particular point in time, holds a value… but there’s no way to isolate the “value” part.

With Clojure’s carefully designed data structures, we get collections that are truly values:

user=> (= [1 2 3] [1 2 3])
true

That = is truly comparing the values of the two collections, not their location in memory… something that takes a for-loop to do in JavaScript. And if you pass the value [1 2 3] into a function, when the function returns, the value will still be [1 2 3]. It can’t be changed. You can only obtain a new value that reflects the change, much like you obtain a new number, 43, that reflects adding one to 42 (but 42 is still 42).

Unlike the linked lists used by default in other Lisps (and Haskell), this persistent vector structure supports efficient random access, and hashmaps are available with the same properties.

Bringing identities back

If we had only these immutable values, we might have a theoretically interesting language, but it wouldn’t be great for getting much done. Real programs have side effects and state — otherwise all they’d do is heat up your computer. We need a way to say “today”, or “list of currently active users”, or “list of open database connections”, or “current state of the Sokoban puzzle”. We need to keep track of each of those identities’ values.

This is where reference types come in. Clojure has several reference types, but the general idea is the same: reintroduce state in a deliberate, explicit way.

The most basic type is the atom. It’s basically a mutable container for an immutable value.

(def a (atom [1 2 3]))    ; create an atom

@a                        ; dereference it. yields [1 2 3]

(def first-value-of-a @a) ; save the atom's current value

(reset! a [4 5 6])        ; change what the atom points to

@a                        ; [4 5 6]

first-value-of-a          ; still [1 2 3]

So, did we just reinvent a mutable array in a roundabout fashion? Not exactly. We’ve isolated the mutability in one place. Instead of being able to change any element of the vector at any time, code can only change the whole atom (to point to a new value), or nothing at all. And it’s trivial to take a snapshot. By dereferencing the atom we get a value that will never change on us.

The world in an atom

One neat trick this enables is to store all your state in a single atom, with a nested data structure inside. You can imagine for, say, a web-based game of tic-tac-toe, an atom whose mid-game value might be:

user=> @game-state
{:board [[ :X nil nil]
         [ :X  :O  :O]
         [nil nil  :X]]
 :turn :O
 :history [[[nil nil nil]
            [nil nil nil]
            [nil nil nil]]
           [[ :X nil nil]
            [nil nil nil]
            [nil nil nil]]
           [[ :X nil nil]
            [nil  :O nil]
            [nil nil nil]]
           [[ :X nil nil]
            [nil  :O nil]
            [nil nil  :X]]
           [[ :X nil nil]
            [nil  :O  :O]
            [nil nil  :X]]]}

To play a new move, you update the state by pointing it to a new state value, where

:board is modified by replacing the appropriate cell with the previous value of :turn
:turn is modified by replacing :O with :X or vice versa
:history is appended the previous value of :board (to allow the players to undo a move)

The update can be written as a pure function:

(defn make-move
    "Return an updated state value that reflects the current player
    moving at row x, column y."
    [prev-state x y]
    (let [{:keys [board turn history]} prev-state]

      (assoc prev-state
             :board (assoc-in board [x y] turn)
             :turn (if (= turn :O) :X :O)
             :history (conj history board))))

I’ve omitted error checking to keep it short. Don’t worry about the details here. The important part is that this is trivial to unit-test because while it operates on a state value, there’s no implicit state that you have to mock or otherwise deal with.

Then, separate from your logic — possibly in a click handler — you can use the above function to update the actual state:

(swap! game-state make-move x y)

Rendering? Simply write a function that takes a game state value, and returns a description of how the UI should look. I’ve used Clojure in my examples, but this is close to how the JavaScript library React works — and it turns out it’s blazing fast. However, use Clojure’s data structures with React, and your app becomes even faster.

Thinking carefully about identities helps you isolate mutation, yielding programs that are easier to understand and test — in any language. It also brings big wins when you deal with concurrency, as I’ll discuss in my next post, focusing on the ways atoms are accessed and updated.

Next post: Atoms by experiment.

You can follow me on Mastodon or Bluesky, or this blog via RSS.

Original text and images (not attributed to others) on this page are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.