What is Ruby String.length time complexity?
I’ve been asking myself this question: is Ruby String.length
like C strlen
which is O(n)
? In order to find it I’ve decided to dive into the
Ruby MRI source code (which is written in C).
Here’s the source code of ruby String.length
(at the time of writing this article):
VALUE
rb_str_length(VALUE str)
{
return LONG2NUM(str_strlen(str, NULL));
}
As you can see it a calls str_strlen
and then calls LONG2NUM
on the results, what LONG2NUM
does
is just to convert a C long int
into a ruby Numeric
class.
At this point let’s see what str_strlen
does:
static long str_strlen(VALUE str, rb_encoding *enc) { const char *p, *e; int cr; if (single_byte_optimizable(str)) return RSTRING_LEN(str); if (!enc) enc = STR_ENC_GET(str); p = RSTRING_PTR(str); e = RSTRING_END(str); cr = ENC_CODERANGE(str); if (cr == ENC_CODERANGE_UNKNOWN) { long n = rb_enc_strlen_cr(p, e, enc, &cr); if (cr) ENC_CODERANGE_SET(str, cr); return n; } else { return enc_strlen(p, e, enc, cr); } }
What you should focus on is just the bold text, which is what usually runs (except if the string is encoded in a
multi-byte encoding). If you take a look all is does is to call RSTRING_LEN(str)
:
#define RSTRING_LEN(string) RSTRING(string)->len
The above is a C macro which just fetches len from a RString
struct. In fact every Ruby string is
stored in a RString
struct, which contains meaningful informations such as the current length of the
String:
struct RString {
struct RBasic basic;
union {
struct {
long len;
char *ptr;
union {
long capa;
VALUE shared;
} aux;
} heap;
char ary[RSTRING_EMBED_LEN_MAX + 1];
} as;
};
Now that you’ve seen how it works I think you already know the answer: the time complexity to read a
String length in ruby is O(1)
constant time, unless we are saving a string in a multi byte
encoding which cannot benefit from the single_byte_optimizable
feature. In that case we have a
performance degradation to O(n)
.
Below some benchmarks which shows this:
require 'benchmark/ips'
optimizable = "this is a string" * 10_000
not_optimizable = "this is â string" * 10_000
Benchmark.ips do |x|
x.report("optimizable") { optimizable.length }
x.report("not optimizable") { not_optimizable.length }
x.compare!
end
result:
Calculating -------------------------------------
optimizable 268.816k i/100ms
not optimizable 7.570k i/100ms
-------------------------------------------------
optimizable 28.189M (± 4.8%) i/s - 140.322M
not optimizable 76.836k (± 5.8%) i/s - 386.070k
Comparison:
optimizable: 28188582.0 i/s
not optimizable: 76836.2 i/s - 366.87x slower
As you can see just by using a 160000 length string the optimized O(1)
version is 366 times faster,
the longer is the string compared the faster will be the optimized version due to the different time
complexity.