SameShirtEveryDay.com

Personal blog of the one called Alex Gorbatchev, from Toronto, Canada.

Performance where not expected, or gsub vs. substring

Posted on September 29th, 2007 by Alex Gorbatchev. In Ruby. 2 comments!

If I were asked which is faster, a plain substring operation or a regular expression, I would answer without hesitation that the substring operation is.

That was the assumption I brought to a simple task: stripping a slash from the beginning and end of a string. Here’s the code:

path = path[1..-1] if path[0, 1] == '/'
path = path[0..-2] if path[-1, 1] == '/'

I would naturally assume that the block above would be faster than path.gsub!(/^\/|\/$/, ''). But just in case, let’s benchmark to be sure.

require 'benchmark'

original = '/hello/somewhat/long/path/here/'
max = 1_000_000

puts Benchmark.measure {
  1.upto(max) do
    path = original
    path = path[1..-1] if path[0, 1] == '/'
    path = path[0..-2] if path[-1, 1] == '/'
  end
}

puts Benchmark.measure {
  1.upto(max) do
    path = original
    path.gsub!(/^\/|\/$/, '')
  end
}

# prints out
  4.212000   0.000000   4.212000 (  4.270000)
  2.418000   0.000000   2.418000 (  2.435000)

I don’t understand why, but the gsub! version takes only about 57% of the time the substring version does (2.418 vs. 4.212 seconds of CPU time). I find it hard to believe that a few extra Ruby statements introduce so much overhead that they end up slower than an entire regular expression engine. Does anyone have an explanation for this?


2 comments.

  1. M From

    gsub! updates the string in place (including original, since path refers to the same object), while the regular string operations create a new copy.

    Use gsub to compare apples with apples.

    I got the following result:

    # prints out
    5.547000 0.000000 5.547000 ( 5.406000)
    4.937000 0.016000 4.953000 ( 5.031000)

    @@ -14,7 +14,7 @@ puts Benchmark.measure {
    puts Benchmark.measure {
    1.upto(max) do
    path = original
    - path.gsub!(/^\/|\/$/, '')
    + path = path.gsub(/^\/|\/$/, '')
    end
    }

  2. Alex Gorbatchev

    @M

    Good point…

    I’m trying to recall my C++ days and how strings work there. As I remember it, a C-style string is a pointer to a null-terminated block of memory. I’m curious whether it’s actually possible to trim a string without reallocating the memory. Technically you could move the pointer a bit forward and move the null terminator a bit closer, but wouldn’t that result in a memory leak?

    On the other hand, if a second pointer is kept which points inside the actual string… this is getting pretty complicated, and it assumes that the regex engine is smart enough to recognize that the string hasn’t grown in length and to operate within the same memory block.
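
The aliasing described in the first comment can be seen in a few lines. This is only a minimal sketch, not part of the original benchmark: the variable names same_object, first_result and second_result are made up for illustration, and .dup is added only so the string is guaranteed to be mutable. It suggests that in the original loop only the first iteration actually strips anything; after that, original no longer has slashes and gsub! has nothing to replace.

```ruby
# Illustrative sketch only; the names below are not from the original post.
original = '/hello/somewhat/long/path/here/'.dup # .dup guarantees a mutable string

path = original                        # no copy: both names refer to the same String object
same_object = path.equal?(original)    # identity check, not content equality

first_result  = path.gsub!(/^\/|\/$/, '') # strips both slashes in place
second_result = path.gsub!(/^\/|\/$/, '') # nothing left to strip, so gsub! returns nil

puts same_object            # true
puts original               # hello/somewhat/long/path/here
puts second_result.inspect  # nil
```

To give every iteration a fresh string with slashes, the loop would need something like path = original.dup, or the non-destructive gsub shown in the comment above.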
