Implementing immutable vectors in JavaScript

March 24, 2014

Update: The library described in this post has now been publicly released.

This weekend I took a stab at implementing immutable vectors in JavaScript. Immutable vectors are arrays guaranteed never to be changed. Instead, you “change” them by obtaining a new copy with the change made:

// Create an immutable vector
var v1 = new ImmutableVector(1, 2, 3);
console.log(v1.toArray()); // [ 1, 2, 3 ]

// "Change" it by pushing 4 to the end
var v2 = v1.push(4);
console.log(v2.toArray()); // [ 1, 2, 3, 4 ]

// The original is still unchanged:
console.log(v1.toArray()); // [ 1, 2, 3 ]

It’s like dealing with numbers. You can update a numeric variable if you want: x = x + 1. But you can also compute something and assign the result to a different variable: y = x + 1. In that case, x stays the same. With immutable vectors, update methods like .push(val) and .set(index, val) work just like +.

You may have seen these in Clojure, where they’re the default array-like data structure, or in Scala. Clojure calls them “persistent vectors”, but that confuses everyone, so let’s go with Scala’s choice of “immutable”.

My approach was:

Write a bunch of tests using Mocha. General pattern: Do an operation, assert that return value is correct, assert that vector passed in remains unchanged.
Implement an “immutable vector” that just wraps an array and copies the whole thing when you change anything.
Tests pass! But it’s unusably slow.
“Wait, how slow is it really?” Write a few benchmarks (with Matcha).
Reread Jean Niklas L’orange’s post about Clojure’s implementation a couple times.
Based on that, write a “vector trie” implementation with a branching factor of 2, gradually making tests pass.
Fix all the bugs the tests didn’t catch (but that broke the benchmark). Add more tests.

On a 10,000-element vector, the naive, copy-everything approach performed like this:

82K changes/sec (to an existing element)
20K appends/sec

A vector trie with a branching factor of 2 sped things up nicely:

630K changes/sec
540K appends/sec

Then, simply setting the branching factor to 32 brought us to a 24-70x speedup over the naive approach:

2.0M changes/sec
1.3M appends/sec

I tried every power of two from 2 to 128. 128-way tries were slightly faster on in-place changes but worse on appends. Other than that, 32 won.

Bit shifting

When I implemented the last part of the Understanding Clojure’s persistent vector article, things got a little disappointing. “Bit partitioning” takes advantage of the fact that my branching factors are all powers of 2 by using bit shifting and bit masks to look up elements — rather than division and modular arithmetic. It’s really just a micro-optimization, and it didn’t help much:

2.0M changes/sec
1.4M appends/sec

The conventional wisdom on JavaScript is not to bother with bit operations since all numbers are floating-point anyway, but I thought V8 would be smart enough to know these numbers only take on integer values and optimize accordingly. Apparently not. That, or division and modulo weren’t the bottleneck to begin with.

Still to do

This is a hack I’m not ready to show off yet. If you dig a little you can find it on my GitHub page, but I’m not going to link directly.

The worst part is I cheated on implementing .slice() (not benchmarked). It simply copies the whole array. Clojure’s subvec is O(1), so there is a way to achieve that, but I know that will take longer to figure out. Maybe store some kind of a view on the original vector, with an offset. I’m worried about performance degrading when you attempt to operate on the vector slice.

Also, .pop() simply calls .slice(), so removing elements from the end of the array will be horrendously inefficient.

Finally, I’d like to beef up the benchmarking, testing random-access and sequential reads and perhaps finding a way to simulate realistic use in an application.

Why I’m doing this

It might seem silly to go up against Mori, a highly performant implementation of immutable collections which can be used from JavaScript. But Mori is a straight port of the standard lib from ClojureScript, and using it in JS feels weird. A solid, pure-JS implementation could feel truly native and be easier for JavaScript programmers to contribute to and use.

I’ve gone with JavaScript Array-like method names (push, pop, slice, as well as get and set), and I’m still not sure whether that’s a good idea. Pro: It’s familiar. Con: Since the semantics are different from pushing to or popping from an array, those names could be confusing. I’d appreciate feedback on this.

You can follow me on Mastodon or Bluesky, or this blog via RSS.

Original text and images (not attributed to others) on this page are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.