Compilers and Performance

Some Background

For those who may not be familiar with it, COSBI (Comprehensive Open Source Benchmark Initiative) is Van Smith’s effort to wrest control of benchmark design from corporate interests and place it in the hands of the user community (please refer to Van’s Hardware Journal for more information). While this is certainly a laudable goal, it is also fraught with potential problems. Thus, when Van’s Hardware Journal published some preliminary results of the Quick CPU Test (here and here), a question that had been nagging me for quite some time was brought to the fore: just how much of an effect do different compilers have on the performance of a program, particularly across platforms?

The issue for me is that I need to have some kind of frame of reference in order to really understand something. Without a foundation, or basis, on which to compare things, I am left wondering what it all means. For example, if a manufacturer tells me that their new electric car will run for 6 hours on a single charge, I don’t really know what that means for me. Is that good? How does it compare to my gasoline powered car? I can answer the question by thinking about how much time I spend driving my car each week, and how many times I fill up with gas. This gives me a common frame of reference, and now I can feel comfortable in evaluating what ‘6 hours on a charge’ really means in terms of cost and convenience.

With my very limited experience with Windows programming and cross-platform compilers, I found myself wondering how to interpret the results. Those with a lot of experience likely have a good ‘gut feel’ for such things, but asking around produced some differences of opinion. Some said Delphi is as good as any C compiler, and others said it is not. Some suggested that certain compilers would produce better performing code on specific platforms than others.

I am not comfortable with assertions that cannot be verified with some hard facts, and this seemed to be one of those situations. Many people have opinions, but nobody seemed to be able to back them up with solid evidence… so I continued to press for answers.

I did find a few references on the web on Delphi vs. C, and there seemed to be little difference in the resulting executables when the programmer had optimized the code properly (see this page for an example), but the question of whether certain compilers favor one architecture over another remained open. Thanks to Brian Neal (see this post in Ace’s Hardware Forum), this second question could be answered, at least to some degree.

On the website referenced in Brian’s post, there are executables generated from a variety of compilers, all solving the same problem with essentially the same algorithm. For those interested in seeing whether these are really optimally implemented, the source is provided for each, but that wasn’t really my interest. I just wanted to see whether each compiler’s executable would keep the same standing relative to the others on every platform.

Test Setup, Results and Conclusion

What I did was to set up three systems using as many of the same components as possible:

  • Matrox G550
  • Matrox 5.12.01.1200 driver
  • 1024×768, 16-bit color
  • IBM 75GXP 45GB HDD
  • Windows 2000 SP2

The platform specific components were:

  • AOpen AK77Pro(A)-133, Duron 1.2GHz (133MHz x 9), Crucial Technology PC2100 DDR, CL2
  • AOpen AX34-U, PIII 1.2GHz (CuMine, 133MHz x 9), Crucial Technology PC133 SDRAM, CL2
  • AOpen AX4BS, P4 1.2GHz (Willamette, 100MHz x 12), Crucial Technology PC133 SDRAM, CL2

The point of making these systems as similar as possible was not to identify whether one platform performed better than another on a per-clock basis, but simply to make things as ‘equal’ as possible. I did run each executable on an Athlon XP (at 1.2GHz), and the scores were almost identical to the Duron results, so I didn’t bother to duplicate the entire set of results. I also might have used a DDR or DRDRAM based P4, but again, my intention here was not to compare the platforms, but the compilers. It just so happens that I had this P4 system set up and ready to perform the tests (as part of another benchmark analysis I am performing).

I then ran 6 of the executables provided using the external timer program available from the same site. The parameter I used for all runs was to generate 10,000 digits of pi. The executables I chose were from the following compilers:

  • Borland C++ Builder 6 (bcbcpp.exe)
  • Borland Delphi 6 Update 2 (delphipi.exe)
  • Metrowerks CodeWarrior 6 (cwc.exe and cwcpp.exe)
  • Microsoft Visual Studio .NET (vsc.exe and vscpp.exe)
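The external timing step can be sketched roughly as follows. This is only an illustration in Python, not the timer program from the site: the function name `time_once` is hypothetical, reporting in milliseconds is an assumption (the article does not state the timer's units), and the command-line form the pi executables expect for the digit count is not given here, so a stand-in command is used.

```python
import subprocess
import sys
import time


def time_once(argv):
    """Run one command to completion and return its wall-clock time in ms.

    Hypothetical helper: it measures from the outside, the way an external
    timer does, so process start-up cost is included in the figure.
    """
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) * 1000.0


if __name__ == "__main__":
    # Stand-in workload so the sketch runs anywhere. On the test systems this
    # would be one of the benchmark executables plus its digit-count argument,
    # whose exact form is an unknown here.
    demo_command = [sys.executable, "-c", "sum(range(100000))"]
    print(f"{time_once(demo_command):.1f} ms")
```

Timing from outside the process keeps the measurement identical for every executable, at the cost of including start-up overhead in each score.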

I ran each one 10 times to determine the accuracy of the timer, threw out the highest and lowest scores, and averaged the 8 remaining scores. Generally, these 8 scores were pretty close. I also included the complete table of results at the bottom of this article.
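The averaging step is simple enough to check by hand. A minimal sketch, assuming a hypothetical helper named `trimmed_mean`, applied to the ten Duron scores for bcbcpp.exe listed in the complete table of results:

```python
def trimmed_mean(samples):
    """Drop the single lowest and single highest score, then average the rest."""
    if len(samples) < 3:
        raise ValueError("need at least three samples to trim both ends")
    kept = sorted(samples)[1:-1]
    return sum(kept) / len(kept)


# The ten Duron scores for bcbcpp.exe from the complete table of results.
duron_bcbcpp = [19658, 19658, 19658, 19668, 19668,
                19668, 19669, 19669, 19688, 19728]

print(trimmed_mean(duron_bcbcpp))  # 19668.25, reported as 19668
```

Dropping one score from each end keeps a single unusually slow or fast run from shifting the average.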

Here are the results:

         bcbcpp.exe  cwc.exe  cwcpp.exe  DelphiPi.exe  vsc.exe  vscpp.exe
PIII          17316    21347      23448         23062    14525      15633
Duron         19668    24281      23260         28044    18075      17870
P4            35209    42283      40949         37123    27199      33328

Several things seem evident to me…

  • Every executable except one (cwcpp.exe, which was marginally faster on the Duron) performed better on the PIII than on the other two platforms. The question is whether this is a compiler issue, or whether the programmer used a coding style that favors the PIII over the others.
  • The Delphi scores are generally worse than the Microsoft C and Borland C results, but this may also be due to the programmer being more familiar with optimizing C code than Delphi. One of the links earlier in this article seems to show that good programming style is more important than which compiler is used, at least between Microsoft C and Borland Delphi. It would be interesting to see some similar tests between various C compilers.
  • The Metrowerks compiler seems to produce code that the P4 really doesn’t like at all. The question, of course, is whether this is inherent in the compiler, or whether there were some differences in flag settings when making the executables.
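  • The P4 scores are much worse for all compilers using this code. What seems interesting is that Delphi has the smallest relative spread across the three platforms and the smallest relative slowdown from PIII to P4 (about 61%), yet it has the largest absolute gap between PIII and K7 (the Duron). In relative terms, vsc.exe has the slightly larger PIII-to-K7 gap (about 24%, versus about 22% for Delphi).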
  • The Microsoft compiler seems to produce the best performing code for all platforms, and in most cases it is significantly faster. This may explain why Microsoft’s compiler seems to be the one used by the majority of commercial developers. Though Intel’s compiler is regarded as the best in this regard, it also seems to have many problems; however, it has recently been suggested that version 6 is much better. In the meantime, there seem to be a few complaints that Microsoft’s compilers are getting a little worse, performance-wise.
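The cross-platform comparisons in the points above are easier to see when each executable's scores are normalized to its own PIII result. A minimal sketch using the averaged scores from the results table; the helper name `relative_to` is hypothetical, and the reading that lower scores are better is an assumption based on how the results are interpreted in this article.

```python
# Averaged scores from the results table (assumed: lower is better).
results = {
    "bcbcpp.exe":   {"PIII": 17316, "Duron": 19668, "P4": 35209},
    "cwc.exe":      {"PIII": 21347, "Duron": 24281, "P4": 42283},
    "cwcpp.exe":    {"PIII": 23448, "Duron": 23260, "P4": 40949},
    "DelphiPi.exe": {"PIII": 23062, "Duron": 28044, "P4": 37123},
    "vsc.exe":      {"PIII": 14525, "Duron": 18075, "P4": 27199},
    "vscpp.exe":    {"PIII": 15633, "Duron": 17870, "P4": 33328},
}


def relative_to(table, baseline):
    """Divide every score by the same executable's score on `baseline`."""
    return {
        exe: {cpu: score / scores[baseline] for cpu, score in scores.items()}
        for exe, scores in table.items()
    }


ratios = relative_to(results, "PIII")
for exe, row in ratios.items():
    # A ratio above 1.00 means the executable took longer than on the PIII.
    print(f"{exe:13s} Duron x{row['Duron']:.2f}   P4 x{row['P4']:.2f}")
```

Ratios compare each executable only against itself, so they separate the question of how well a compiler's output carries over to another processor from the question of which compiler is fastest overall.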

As is the case in many intellectual endeavors, answers seem to beget more questions.

With only one example, it is difficult to come to any definite conclusions about the relative performance of the 6 executables produced by these 4 compilers. I wonder how many would be interested in participating in an ‘optimization’ contest to prove their programming skills, and to promote their favorite compiler? It does appear that a good programmer can wring about the same performance from a Delphi program as from one written in C, though we still don’t know how such optimizations affect the cross-platform performance. Another interesting contest, perhaps?

As for COSBI, only time will tell whether the initiative will result in its intended goal, but the question in my mind of whether Delphi is good enough to create a reasonable benchmark seems to have been answered. Also, while I have seen a few other compiler comparisons, all of them ignore the cross-platform issue. This wasn’t an issue even two years ago, but it certainly is today.

Complete Table of Results

Results sorted from lowest to highest (not in the order they were actually generated)

         bcbcpp.exe  cwc.exe  cwcpp.exe  DelphiPi.exe  vsc.exe  vscpp.exe
Duron         19658    24275      23253         28021    18056      17865
              19658    24275      23253         28030    18065      17865
              19658    24275      23253         28031    18066      17866
              19668    24275      23253         28040    18075      17866
              19668    24285      23263         28040    18076      17866
              19668    24285      23263         28041    18076      17866
              19669    24285      23264         28051    18076      17875
              19669    24285      23264         28060    18076      17876
              19688    24285      23264         28060    18086      17876
              19728    24315      23274         28071    18086      17915

P4            35172    42250      40875         37109    27187      33313
              35188    42250      40891         37109    27187      33313
              35203    42281      40922         37110    27187      33328
              35203    42281      40938         37110    27188      33328
              35203    42281      40953         37125    27188      33328
              35218    42282      40953         37125    27188      33328
              35219    42297      40968         37125    27203      33328
              35219    42297      40984         37125    27219      33329
              35219    42297      40985         37156    27234      33344
              35265    42312      41015         37156    27235      33359

PIII          17305    21321      23433         23053    14520      15623
              17315    21321      23434         23053    14520      15623
              17315    21321      23443         23053    14521      15632
              17315    21330      23443         23053    14521      15632
              17315    21341      23444         23063    14521      15632
              17315    21361      23444         23063    14521      15633
              17315    21361      23444         23064    14521      15633
              17315    21371      23444         23073    14521      15633
              17325    21371      23484         23073    14551      15643
              17405    21381      23504         23103    14561      15673

