Implementing immutable vectors in JavaScript
March 24, 2014
Update: The library described in this post has now been publicly released.
This weekend I took a stab at implementing immutable vectors in JavaScript. Immutable vectors are arrays guaranteed never to be changed. Instead, you “change” them by obtaining a new copy with the change made:
// Create an immutable vector
var v1 = new ImmutableVector(1, 2, 3);
console.log(v1.toArray()); // [ 1, 2, 3 ]
// "Change" it by pushing 4 to the end
var v2 = v1.push(4);
console.log(v2.toArray()); // [ 1, 2, 3, 4 ]
// The original is still unchanged:
console.log(v1.toArray()); // [ 1, 2, 3 ]
It’s like dealing with numbers. You can update a numeric variable if
you want: x = x + 1
. But you can also compute something and assign
the result to a different variable: y = x + 1
. In that case, x
stays the same. With immutable vectors, update methods like
.push(val)
and .set(index, val)
work just like +
.
You may have seen these in Clojure, where they’re the default array-like data structure, or in Scala. Clojure calls them “persistent vectors”, but that confuses everyone, so let’s go with Scala’s choice of “immutable”.
My approach was:
- Write a bunch of tests using Mocha. General pattern: Do an operation, assert that return value is correct, assert that vector passed in remains unchanged.
- Implement an “immutable vector” that just wraps an array and copies the whole thing when you change anything.
- Tests pass! But it’s unusably slow.
- “Wait, how slow is it really?” Write a few benchmarks (with Matcha).
- Reread Jean Niklas L’orange’s post about Clojure’s implementation a couple times.
- Based on that, write a “vector trie” implementation with a branching factor of 2, gradually making tests pass.
- Fix all the bugs the tests didn’t catch (but that broke the benchmark). Add more tests.
On a 10,000-element vector, the naive, copy-everything approach performed like this:
- 82K changes/sec (to an existing element)
- 20K appends/sec
A vector trie with a branching factor of 2 sped things up nicely:
- 630K changes/sec
- 540K appends/sec
Then, simply setting the branching factor to 32 brought us to a 24-70x speedup over the naive approach:
- 2.0M changes/sec
- 1.3M appends/sec
I tried every power of two from 2 to 128. 128-way tries were slightly faster on in-place changes but worse on appends. Other than that, 32 won.
Bit shifting
When I implemented the last part of the Understanding Clojure’s persistent vector article, things got a little disappointing. “Bit partitioning” takes advantage of the fact that my branching factors are all powers of 2 by using bit shifting and bit masks to look up elements — rather than division and modular arithmetic. It’s really just a micro-optimization, and it didn’t help much:
- 2.0M changes/sec
- 1.4M appends/sec
The conventional wisdom on JavaScript is not to bother with bit operations since all numbers are floating-point anyway, but I thought V8 would be smart enough to know these numbers only take on integer values and optimize accordingly. Apparently not. That, or division and modulo weren’t the bottleneck to begin with.
Still to do
This is a hack I’m not ready to show off yet. If you dig a little you can find it on my GitHub page, but I’m not going to link directly.
The worst part is I cheated on implementing .slice()
(not
benchmarked). It simply copies the whole array. Clojure’s subvec
is
O(1), so
there is a way to achieve that, but I know that will take longer to
figure out. Maybe store some kind of a view on the original vector,
with an offset. I’m worried about performance degrading when you
attempt to operate on the vector slice.
Also, .pop()
simply calls .slice()
, so removing elements from the
end of the array will be horrendously inefficient.
Finally, I’d like to beef up the benchmarking, testing random-access and sequential reads and perhaps finding a way to simulate realistic use in an application.
Why I’m doing this
It might seem silly to go up against Mori, a highly performant implementation of immutable collections which can be used from JavaScript. But Mori is a straight port of the standard lib from ClojureScript, and using it in JS feels weird. A solid, pure-JS implementation could feel truly native and be easier for JavaScript programmers to contribute to and use.
I’ve gone with JavaScript Array-like method names (push
, pop
,
slice
, as well as get
and set
), and I’m still not sure whether
that’s a good idea. Pro: It’s familiar. Con: Since the semantics are
different from pushing to or popping from an array, those names could
be confusing. I’d appreciate feedback on this.