With Bare Metal: Benchmarking Ruby with GCC (4.6, 4.7, 4.8, 4.9) and Clang (3.3, 3.4, 3.5)

Recently, Peter Wilmott presented his benchmark results of Ruby compiled with different versions and optimization levels of GCC and Clang. However, the benchmarks were run on an Amazon EC2 instance, where CPU allocation may not stay consistent over the course of the whole benchmark suite. I therefore suspected that the results might be somewhat inaccurate, so I reran all the benchmarks on a SoftLayer Bare Metal Server:

System Type        Component
Operating System   Ubuntu14.04-64 Minimal for Bare Metal
RAM                4x4GB Kingston 4GB DDR3 1Rx8
Processor          3GHz Intel Xeon-IvyBridge (E5-2690-V2-DecaCore)
Motherboard        SuperMicro X9DRI-LN4F+_R1.2A
Power Supply       SuperMicro PWS-605P-1H

Building on Peter’s work, I re-implemented the scripts with the following constraints:

  • Ruby 2.2.0 compiled from Ruby’s 2_2 branch.
  • No CFLAGS were set, so optimization defaults to “-O3”. I chose this since it is what most people will have installed.
  • Instead of running https://github.com/acangiano/ruby-benchmark-suite, I ran the benchmark suite available in Ruby’s repository.
  • Repeat count for Ruby’s benchmark suite was set to 5.

Unlike Peter, I chose not to collate the data using the points system, as it unfairly amplifies small differences. Instead, I followed Peter’s updated version, where the percentage deviation from the baseline is presented for each benchmark type. The percentage deviation is calculated by taking the current value and dividing it by the average value across all the compilers.
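To make the calculation concrete, here is a minimal Ruby sketch. The timings are made-up numbers rather than results from this benchmark run, and `percentage_deviation` is a hypothetical helper name. I am also assuming that the ratio to the average is reported as a signed percentage, where a negative value means faster than average:

```ruby
# Hypothetical timings in seconds for a single benchmark type, keyed by compiler.
# These numbers are illustrative only and are not taken from the actual results.
timings = {
  "gcc-4.6"   => 1.90,
  "gcc-4.9"   => 2.00,
  "clang-3.5" => 2.10
}

# Divide each compiler's time by the average across all compilers, then
# express the ratio as a signed percentage (assumption: negative = faster).
def percentage_deviation(timings)
  mean = timings.values.sum / timings.size.to_f
  timings.transform_values { |time| ((time / mean) - 1.0) * 100.0 }
end

deviations = percentage_deviation(timings)
deviations.each do |compiler, pct|
  puts format("%-10s %+.2f%%", compiler, pct)
end
```

Because every compiler is compared against the same average, a compiler that is slightly faster on one benchmark only moves by a small percentage, which is why this avoids the amplification that the points system suffers from.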

Percentage faster/slower (Lower is better)

The graph is best viewed by only comparing two compilers at a time.

These are the things I noticed:

  • Benchmark results for each compiler are pretty similar if you compare the shape of the graphs.
  • Performance does deviate between Clang and GCC, but that could also be noise. Certain benchmark types deviate a lot more (vm_ensure, vm_mutex1).
  • Ruby compiled with GCC 4.6 performs better overall.
  • ¯\_(ツ)_/¯ I can’t reach a decent conclusion about the overall performance. Feel free to let me know how you might interpret the data in the comments below.

As I couldn’t reach a conclusion with Ruby’s benchmarks, I decided to dig deeper and run the compiled Rubies against Discourse’s benchmark script.

Discourse Benchmarks (Lower is better)

Update: I included all percentiles, as I realized that the 50th percentile is pretty stable even across different Ruby releases.
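For readers unfamiliar with percentiles, here is a rough Ruby sketch of how they can be computed from a list of response times. It uses the nearest-rank method, which is my own assumption and not necessarily what Discourse's benchmark script does. The `percentile` helper and the sample response times are hypothetical:

```ruby
# Nearest-rank percentile: the smallest sample such that at least pct percent
# of all samples are less than or equal to it. (Assumed method, for illustration.)
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Hypothetical response times in milliseconds, not real benchmark data.
response_times = [48, 50, 51, 52, 53, 55, 58, 70, 95, 180]

[50, 75, 90, 99].each do |pct|
  puts "#{pct}th percentile: #{percentile(response_times, pct)} ms"
end
```

In this made-up sample the 50th percentile sits in the tightly packed middle of the data, while the higher percentiles are pulled around by a few slow requests. That is one reason the median alone can hide differences that show up further out.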

As you can see from the graph above, your Rails application is probably not going to be affected much by the compiler that was used to compile Ruby. While the compiler might not matter, the version of Ruby that you’re using does. If you’re interested in the long-term performance of Ruby, head over to RubyBench.org and find out how each Ruby release is performing.

To replicate the results, follow the instructions in my GitHub repository.

Thoughts, comments, mistakes? Let me know below 🙂