Performance of Ruby Array, Set, and Hash
While reviewing some Ruby code, I noticed the data were all in Sets rather than Arrays. I asked the original author why they used Sets instead of Arrays. The answer was "because sets are faster than arrays."
The Ruby documentation implies this as well:
"Set implements a collection of unordered values with no duplicates. This is a hybrid of Array's intuitive inter-operation facilities and Hash's fast lookup."
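The "fast lookup" in that description is about membership tests (`include?`), not about walking every element. Here is a small sketch of that difference. The sizes and variable names are made up for illustration and are not from the code I was reviewing, and the timings will vary by machine, so none are shown.

```ruby
require 'set'
require 'benchmark'

# Hypothetical sizes chosen for illustration only.
n = 50_000
lookups = 500

values = (0...n).to_a
as_array = values
as_set = values.to_set

# Look up values near the end, the worst case for a linear scan of an array.
targets = Array.new(lookups) { |k| n - 1 - k }

# Array#include? scans element by element; Set#include? does a hash lookup.
array_time = Benchmark.realtime { targets.each { |t| as_array.include?(t) } }
set_time   = Benchmark.realtime { targets.each { |t| as_set.include?(t) } }

puts "Array include?: #{array_time}"
puts "Set include?:   #{set_time}"
```

If the program mostly checks whether a value is present, this is where a Set should pay off. If it mostly iterates, the lookup advantage does not come into play.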
I rewrote the code to use arrays, and found the whole program was faster with arrays. Odd.
I decided to test it out with a quick test program. It adds 10 million numbers to a Set, an Array, and a Hash. Then it walks each structure and simply squares every value, so the work per element is trivial and the timing reflects the structure rather than the CPU.
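require 'set'

testarray = []
testset = Set[]
testhash = {}

# Fill each structure with 10 million numbers.
10000000.times do |i|
  testarray.push(i)
  testset.add(i)
  # Note: :i is a symbol literal, not the block variable i, so every pass
  # overwrites the same key and the hash ends up holding a single entry.
  testhash[:i] = i
end

# Time a full pass over each structure, squaring every value.
array_start = Time.now
testarray.each do |num|
  num*num
end
array_finish = Time.now

set_start = Time.now
testset.each do |num|
  num*num
end
set_finish = Time.now

hash_start = Time.now
testhash.each do |k,v|
  v*v
end
hash_finish = Time.now

puts "Array timing: #{array_finish - array_start}
Set timing: #{set_finish - set_start}
Hash timing: #{hash_finish - hash_start}"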
The results, in seconds, averaged over 10 runs each:
Array timing: 0.273671
Set timing: 0.314005
Hash timing: 2.0e-06
The hash value is 0.0000020 seconds if you don't like the scientific notation. I'm timing with the wall clock (Time.now) for the start and finish, which is not as accurate as CPU time, but it's good enough for the experiment. These were run on a 2019 MacBook Pro.
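For this experiment, sets are slower than arrays to iterate. The hash number is not comparable, though. The fill loop stores every value under the single symbol key :i, so the hash holds one entry and its timing covers one iteration rather than 10 million. That result does not show a hash to be faster than an array or a set. In looking to re-write the code, I'd stay with arrays unless the program needs fast lookups.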
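For a like-for-like comparison, all three structures need to hold the same number of elements. Here is a rough sketch of the same test with the hash keyed by the integer `i` instead of the symbol `:i`, and with Ruby's standard Benchmark module doing the timing so that CPU time and wall-clock time are both reported. The smaller element count and the variable names are assumptions for illustration. I have not rerun the numbers above with it, so no results are shown.

```ruby
require 'set'
require 'benchmark'

# Smaller than the 10 million used above so the sketch runs quickly (assumption).
n = 1_000_000

test_array = []
test_set = Set.new
test_hash = {}

n.times do |i|
  test_array.push(i)
  test_set.add(i)
  test_hash[i] = i # integer key, so the hash ends up with n entries
end

# Benchmark.measure returns a Benchmark::Tms with CPU and wall-clock times.
array_tms = Benchmark.measure { test_array.each { |num| num * num } }
set_tms   = Benchmark.measure { test_set.each { |num| num * num } }
hash_tms  = Benchmark.measure { test_hash.each { |_k, v| v * v } }

puts "Array timing: cpu=#{array_tms.total} real=#{array_tms.real}"
puts "Set timing:   cpu=#{set_tms.total} real=#{set_tms.real}"
puts "Hash timing:  cpu=#{hash_tms.total} real=#{hash_tms.real}"
```

Checking `test_hash.size` before timing is a cheap way to confirm the three structures really are the same size.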