One of my more practical projects is Cobalt, a Java implementation of the Lua 5.1 virtual machine. I’m often working on it, improving the code base or bringing it closer to PUC Lua. One recent improvement I’ve made is with string concatenation.
Before we get into the weeds, let’s talk about how strings are implemented in Lua (both PUC1 and Cobalt). While there’s some small optimisations2, in essence strings are just a byte array. This makes concatenation very easy - you just make a new array and copy the two across:
Apologies for the Java, but it seemed the easiest to get the point across. Now, it’s not hard to see that concatenating two strings of length and takes time.
This naive approach is a little problematic when performing multiple concatenations though -
a .. b .. c would first compute
b .. c3, and then
a .. $temp. Thankfully Lua isn’t that silly - it detects these chains of concatenations and builds the final string in one sweep.
However, it’s not quite as rosy when working with loops. For instance, let’s write a function which converts a table to a string:
Lua can’t see that all these concatenations can be merged into one, and so this code ends up being running in - each iteration of the loop needs to allocate a new string of size . Doing this with a table of size takes 13 seconds on my machine, takes 111 seconds.
The common solution to this is simple - append these substrings to a table, and get use
table.concat to build a final result at the end. This is standard practice in Java or C#, and is incredibly effective - the same table takes a little over a second to
The insight here is that we don’t actually need to allocate a full byte array right now. We’ll need to at some point, when we come to operate on the string, but that’s only the final result. Instead, let’s represent our intermediate string as a Rope.
Ropes are effectively just a binary tree, with the leaves containing the strings which make up the full rope. As concatenation is just building a new tree from two smaller ones, it’s simple. And, more importantly, fast - it runs in .
Now, building an byte array from a rope still takes time, but it (generally) only needs to happen once for the final string, rather than every iteration of the loop.
After applying this optimisation to Cobalt, our above examples now run in under a second. The code is now linear, so tables of size are still relatively speedy - taking 19 seconds. That said, there’s still room for improvement -
table.concat is still faster, albeit no longer quite as dramatically.