Playing with refs

February 9, 2014

I’ve written before about Clojure’s atoms and how they work. Atoms provide atomic, consistent, and (if you only update them with swap!) isolated access to a single identity.

Sometimes, though, we need multiple identities we can deal with in a synchronized way. That’s where refs come into play. In this post, I’ll provide a broken example you can copy-paste into a REPL, then show how refs fix it. Finally, I’ll go into a few neat add-on features refs offer.

Transferring money

A classic example of where you would need refs is a bank transfer. If we represent two bank accounts as atoms, and transfer between them, we’re going to have problems. Let’s try it and see.

(def account-a (atom 1000))
(def account-b (atom 1000))

To make it more obvious that this code is buggy, our transfer function will have a built-in delay. But basically its job is to check that the sending account has a high enough balance to cover the transfer, then debit the sender and credit the receiver.

(defn transfer-money [from to amount]
  (when (>= @from amount)
    (swap! from - amount)
    (Thread/sleep (rand-int 15))  ; To make effects more noticeable.
    (swap! to + amount)))

To test this code we can make a bunch of random transactions:

(defn make-random-transactions [acc1 acc2 n]
  (dotimes [_ n]
    (let [[from to] (shuffle [acc1 acc2])]
      (transfer-money from to (rand-int @from)))))

Without threads

If we only run the random transactions in one thread, we’re fine:

user=> (make-random-transactions account-a account-b 1000)
nil
user=> @account-a
1616
user=> @account-b
384
user=> (+ @account-a @account-b)
2000

The individual numbers will vary for you, obviously, but the sum should still be 2000, and neither account should be negative. That’s because we didn’t use multiple threads. (For this reason, ClojureScript, which targets a single-threaded environment, lacks refs. It doesn’t need them.)

With threads

Here’s some test code to run random transactions in 10 threads simultaneously. The meat of it is in the test-threaded-transfers function, but I have broken out reset-accounts and sum-accounts so they can be redefined when we switch to using refs.

(defn reset-accounts []
  (reset! account-a 1000)
  (reset! account-b 1000))

(defn sum-accounts [& accts]
  (reduce + (map deref accts)))

(defn test-threaded-transfers []
  (reset-accounts)

  (let [futures
        (doall
          (for [_ (range 10)]
            (future (make-random-transactions account-a account-b
                                              1000))))]
    (while (not (every? future-done? futures))
      (println "Sum of accounts:" (sum-accounts account-a account-b))
      (Thread/sleep 1000))
    (println "Final sum:" (sum-accounts account-a account-b))))

Every second, this will print out the sum of the two accounts. The sum of the two accounts should always be 2000, but as you’ll see if you run test-threaded-transfers, the sum jumps all over the place:

user=> (test-threaded-transfers)
Sum of accounts: 54
Sum of accounts: 35
Sum of accounts: 964
Sum of accounts: 240
Sum of accounts: 1097
Sum of accounts: -683
Sum of accounts: 449
Sum of accounts: 1634
Final sum: 2000
nil

In a second, we’ll fix that using refs.

Refs: setting and altering

Refs are created like atoms:

user=> (def r (ref 400))
#'user/r

They’re modified using ref-set, which works like reset!, and alter, which works like swap!. But there’s a catch. All modifications to a ref must be wrapped in the (dosync ...) macro.

user=> (alter r + 100)

IllegalStateException No transaction running
  clojure.lang.LockingTransaction.getEx
  (LockingTransaction.java:208)

user=> (dosync (alter r + 100))
500

dosync creates a transaction. Inside a transaction, Clojure keeps a log of all refs that were read or modified, and if one of them is altered before the transaction completes, the whole transaction will be retried. In addition, none of the in-transaction changes are visible to other threads until the transaction completes (“commits”). Then all of them are.

Refs are dereferenced using @, just like atoms. As with atoms, @r is reader sugar for (deref r), and deref can be spelled out if you’re using a higher-order function (e.g., (map deref list-of-refs)).

Transferring money with refs

Let’s redefine our accounts to be refs and rewrite transfer-money accordingly:

(def account-a (ref 1000))
(def account-b (ref 1000))

(defn transfer-money [from to amount]
  (dosync
    (when (>= @from amount)
      (alter from - amount)
      (Thread/sleep (rand-int 15))
      (alter to + amount))))

We’ll also rewrite reset-accounts from before.

(defn reset-accounts []
  (dosync
    (ref-set account-a 1000)
    (ref-set account-b 1000)))

With sum-accounts, there’s only one minor change, and that is to wrap the function body from before in a (dosync ...). This ensures we dereference all the refs at the same instant in time.

(defn sum-accounts [& accts]
  (dosync
    (reduce + (map deref accts))))

There’s no need to change test-threaded-transfers. If you’re copy-pasting all this into a REPL, it will still work.

Now:

user=> (test-threaded-transfers)
Sum of accounts: 2000
Sum of accounts: 2000
Sum of accounts: 2000
[ ... repeated many more times ... ]
Final sum: 2000

Much better!

So, what changed here? By wrapping our modifications in a dosync, we’ve given Clojure enough information to know that the changes to from and to are connected, and should happen — from the view of other threads — at the same instant in time.

However, you’ll notice the transfers now take much longer to complete. Let’s see if we can speed them up by replacing alter with commute.

Commuting instead of altering

commute is a special version of alter for when your update function is commutative. Suppose you want to alter a thing and also increment a counter:

(dosync
  (alter thing-a expensive-function-1 ...)
  (commute counter inc))

; in another thread:

(dosync
  (alter thing-b expensive-function-2 ...)
  (commute counter inc))

These transactions alter completely different things. thing-a for the first, thing-b for the second. No conflict there. But because they both increment the same counter, if you used alter one transaction would “lose” and have to be retried in its entirety. Such a waste.

The only thing we really need to do over in case of conflict is increment the counter. The increment is cheap and can be done independently.

And that’s exactly how commute works. If the commuted ref is changed by another thread, commute simply re-runs the update function with the ref’s new value at commit time. No need to repeat the expensive changes to thing-a and thing-b.

We can apply this to our bank transfer example, but we have to be careful. Using commute on the addition is fine. Using it on the subtraction would be a bug. Can you see why?

(defn transfer-money [from to amount]
  (dosync
    (when (>= @from amount)
      (alter from - amount)
      (Thread/sleep (rand-int 15))
      (commute to + amount))))

On my computer, this version actually makes (test-threaded-transfers) take slightly longer — 44 vs. 41 seconds. So this may not be the greatest example to try commute on.

Validators

A bank balance should never be negative. We can express that restriction in code using a validator:

(defn valid-account-balance? [bal]
  (>= bal 0))

(set-validator! account-a valid-account-balance?)
(set-validator! account-b valid-account-balance?)

Now if we try to set an account balance to a negative number, an exception is thrown and the ref’s value is unchanged:

user=> (dosync (ref-set account-a 1000))
1000
user=> (dosync (ref-set account-a -1))

IllegalStateException Invalid reference state
  clojure.lang.ARef.validate (ARef.java:33)
user=> @account-a
1000

You can use validators with atoms too.

Breaking commute?

With this validator in place, we should now be equipped to answer the question I asked above, as to why commute can’t be used when we subtract from a balance. Try it:

(defn transfer-money [from to amount]
  (dosync
    (when (>= @from amount)
      (Thread/sleep (rand-int 150))
      (commute from - amount)  ;; BAD
      (Thread/sleep (rand-int 15))
      (commute to + amount))))

We’ll modify our make-random-transactions function slightly:

(defn make-random-transactions [acc1 acc2 n]
  (dotimes [_ n]
    (let [[from to] (shuffle [acc1 acc2])]
      (transfer-money from to (rand-int 500)))))

I changed (rand-int @from) to (rand-int 500). So this code will attempt to make transfers that are in excess of the sending account’s balance, and only the when clause in transfer-money prevents that from happening.

Confession time:

I was not able to get the above code, or any variations I tried, to actually break. No matter what I did, it kept working. On my computer, running (test-threaded-transfers) at a REPL with the above definitions still works fine.

But here’s how I hoped it would break, and why I believe it’s broken in theory. One of the accounts would go negative, making validation fail, like so:

@account-a is 20
Thread 1 begins a transaction: (transfer-money account-a account-b 20)
Thread 1 verifies (> @account-a 20), and decides to proceed with the transaction
Thread 2 begins a transaction: (transfer-money account-a account-b 10)
Thread 2 verifies (> @account-a 10), and decides to proceed with the transaction
Thread 1 subtracts 20 from account A, adds 20 to account B
Thread 1 commits. Globally, @account-a is 0
Thread 2 subtracts 10 from account A, adds 10 to account B. Inside of the transaction, thread 2 thinks @account-a is now 10
Thread 2 tries to commit. The subtraction from account-a is repeated. @account-a is now -10. Validation fails.

Despite many tries, I haven’t gotten this scenario to actually happen. But it might! Maybe with more threads, or on a different computer, or if you ran the test a million times. I promise the above code is bad and you better not write it if you have real money at stake.

Or maybe I goofed and there’s some reason, unknown to me, why my scenario will never happen. I’ll update this post if I find out.

Why no bang?

As a total side note, you might wonder why the functions to modify refs don’t have a bang at the end. swap! and reset! have the bang, alter and ref-set don’t.

It turns out the official meaning of a bang in Clojure is “this function is not safe to use in a transaction”. It doesn’t mean the function has a side effect, like in Scheme. It means if you call it in a dosync something bad will happen.

Every function that modifies a ref has to happen in a dosync, by definition, so they don’t have the bang.

Conclusion

Refs implement software transactional memory. This essentially means they provide similar guarantees to databases. Transactions on refs are atomic, consistent, and isolated — the A, C, and I in ACID.

Atoms can provide the same guarantees as long as you store all relevant state in a single atom. But refs generalize atoms, allowing you to express multiple identities in a more natural way, and allowing more concurrency: if you modify different, unrelated bits of state, you don’t get any unnecessary retries.

For further reading on refs, I recommend Neale Swinnerton’s post Clojure STM: What, Why, How?, which goes more into the nitty-gritty of how they’re implemented.

Thanks to Lindsey Kuper for inspiring this post and the previous “Atoms by experiment”. Lindsey urged me to take a more experimental, demonstrative approach after I showed her a very dry, early draft post which was basically my notes from watching a Rich Hickey talk. Not that there is anything wrong with Rich’s talks.

I don’t have any immediate plans to write about agents or vars, so this is the end for now. Thanks for reading!

You can follow me on Mastodon or Bluesky, or this blog via RSS.

Original text and images (not attributed to others) on this page are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.