Performance where not expected, or gsub vs. substring
If I were asked which is faster, a plain substring operation or a regular expression, I would answer without hesitation that the substring is.
That was the assumption I brought to a simple task: stripping slashes from the beginning and end of a string. Here’s the code:
path = path[1..-1] if path[0, 1] == '/'
path = path[0..-2] if path[-1, 1] == '/'
I would naturally assume that the block above would be faster than path.gsub!(/^\/|\/$/, ''). But just in case, let’s benchmark to be sure.
require 'benchmark'
original = '/hello/somewhat/long/path/here/'
max = 1_000_000
# Substring version
puts Benchmark.measure {
  1.upto(max) do
    path = original
    path = path[1..-1] if path[0, 1] == '/'
    path = path[0..-2] if path[-1, 1] == '/'
  end
}
# Regular expression version
puts Benchmark.measure {
  1.upto(max) do
    path = original
    path.gsub!(/^\/|\/$/, '')
  end
}
# prints out
4.212000 0.000000 4.212000 ( 4.270000)
2.418000 0.000000 2.418000 ( 2.435000)
I don’t understand why, but the gsub version takes only about 57% of the substring version’s time, which makes it roughly 1.7 times faster. I find it hard to believe that a few extra Ruby statements introduce so much overhead that they end up slower than the entire regular expression engine. Does anyone have an explanation for this?
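Before reading much into the timings, it helps to confirm that the two approaches really do the same job. The sketch below wraps each one in a method and runs both on the benchmark string. The method names are hypothetical and are not part of the code above. The regex variant uses the non-mutating gsub so that the input string is left untouched.

```ruby
# Hypothetical helper names, introduced only for this comparison.

# Substring approach: each slice returns a new String.
def strip_slashes_substring(path)
  path = path[1..-1] if path[0, 1] == '/'
  path = path[0..-2] if path[-1, 1] == '/'
  path
end

# Regex approach, non-mutating: gsub returns a new String.
def strip_slashes_regex(path)
  path.gsub(/^\/|\/$/, '')
end

sample = '/hello/somewhat/long/path/here/'
puts strip_slashes_substring(sample)
puts strip_slashes_regex(sample)
puts sample # the input still has both slashes
```

For single-line paths like this one the two methods agree. In Ruby, ^ and $ anchor at line boundaries rather than at the ends of the string, so the results can differ for input that contains newlines; \A and \z would anchor to the whole string.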
2 comments.
gsub! updates the string in place (including original), while the regular string operations create a new copy.
Use gsub to compare apples with apples.
I got the following result:
# prints out
5.547000 0.000000 5.547000 ( 5.406000)
4.937000 0.016000 4.953000 ( 5.031000)
@@ -14,7 +14,7 @@ puts Benchmark.measure {
puts Benchmark.measure {
1.upto(max) do
path = original
- path.gsub!(/^\/|\/$/, '')
+ path = path.gsub(/^\/|\/$/, '')
end
}
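The aliasing is easy to confirm in isolation. This is a minimal sketch rather than part of the benchmark. It uses String.new only so that the string is mutable regardless of any frozen-literal settings, which is an assumption about the environment.

```ruby
# String.new returns a mutable string even where literals are frozen.
original = String.new('/hello/somewhat/long/path/here/')

path = original            # no copy: both names refer to the same String object
puts path.equal?(original) # true

path.gsub!(/^\/|\/$/, '')  # mutates that shared object in place
puts original              # hello/somewhat/long/path/here

# On the next pass there is nothing left to strip; gsub! returns nil
# when it performs no substitution.
p original.gsub!(/^\/|\/$/, '') # nil

# Slicing, by contrast, always hands back a different object.
fresh = String.new('/hello/')
puts fresh[1..-1].equal?(fresh) # false
```

So in the original loop only the first of the million iterations actually strips anything; the rest run gsub! against a string that no longer has leading or trailing slashes.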
@M
Good point…
I’m trying to recall my C++ days and how strings work there. I know a string is a pointer to null-terminated memory. I’m curious whether it’s actually possible to trim a string without having to reallocate the memory. Technically you could move the pointer a bit forward and move the null terminator a bit closer, but wouldn’t that result in a memory leak?
On the other hand, if a second pointer is kept that points inside the actual string… this is getting pretty complicated, and it assumes that the regex engine is smart enough to recognize that the string hasn’t grown in length and to operate within the same memory block.