This discussion revolves around an email from a technically astute ExtremeTech reader. I'm including it as part of my weekly column rather than as a standard ExtremeTech story, mostly to get people thinking about the issue. Keep in mind that it comes from a single source, discussing a single game. But we thought it was interesting enough to open up to wider discussion.
Early last week, we received an email from Igor Levicki, commenting about Jason Cross's feature article, Real Gaming Challenge: Intel vs. AMD. Levicki wasn't disputing Jason's conclusion—that AMD beats Intel by wide margins in gaming tests. But he apparently decided to dig a little deeper. Here's what he did, in his own words:
It intrigued me why Intel CPUs have inferior performance in some games and in others they are on par with AMD.
Therefore, I have reverse-engineered Battlefield 2 game executable and come to the following conclusions:
1. It was compiled using Visual Studio 2003 C++ compiler.
2. It was compiled in blended mode almost without any optimizations.
We headed over to Microsoft's MSDN web site and obtained this little tidbit about blended mode:
"When no /Gx option is specified, the compiler defaults to /GB, "blended" optimization mode. In both the 2002 and 2003 releases of Visual C++ .NET, /GB is equivalent to /G6, which is said to optimize code for the Intel Pentium Pro, Pentium II, and Pentium III."
But Microsoft recommends that developers use /G7 when building code for Pentium 4 and AMD Athlon systems. Again, here's more from the MSDN web site on the topic:
"The performance improvement achieved by compiling an application with /G7 varies, but when comparing to code generated by Visual C++ .NET 2002, it's not unusual to see 5-10 percent reduction in execution time for typical programs, and even 10-15 percent for programs that contain a lot of floating-point code. The range of improvement can vary greatly, and in some cases users will see over 20 percent improvement when compiling with /G7 and running on the latest generation processors. Using /G7 does not mean that the compiler will produce code that only runs on the Intel Pentium 4 and AMD Athlon processors. Code compiled with /G7 will continue to run on older generations of these processors, although there might be some minor performance penalty. In addition, we've observed some cases where compiling with /G7 produces code that runs slower on the AMD Athlon."
That last caveat is a little unclear. Microsoft's reference to "AMD Athlon" may mean the older line of 32-bit Athlon CPUs (the K7 generation: the Athlon XP and earlier). Current 90nm Athlon 64s fully support Intel's SSE, SSE2, and SSE3 instructions.
The MSDN document linked above goes on to suggest that the /G7 switch will produce sequences that may have more instructions, but run more efficiently on the Pentium 4 by avoiding high-latency instructions such as IMUL.
Levicki continues:
... it would be logical to at least compile the game code with /G6 and /arch:SSE switches. That, however, is not the case. I have checked it and the code uses only FPU, which is known to work slower on Pentium 4s. Moreover it uses pretty inefficient integer code too. Even /G6 would help a lot by enabling the compiler to generate conditional moves instead of many conditional branches, which are known to penalize NetBurst architecture so much.
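To make those two points concrete, here is a minimal C++ sketch of the source-level patterns involved. The function names are hypothetical and none of this comes from the Battlefield 2 executable. The first pair contrasts a branch with a branch-free select, which is the kind of result a conditional move (CMOV) delivers in a single instruction. The second pair contrasts a multiply with the longer shift-and-add sequence a compiler might substitute to avoid IMUL. What a given compiler actually emits depends on its version and switches, so treat the comments as likely outcomes rather than guarantees.

```cpp
#include <cstdint>

// Branchy form: when the compiler may not assume CMOV is available,
// this is typically a compare followed by a conditional jump.
std::int32_t max_branchy(std::int32_t a, std::int32_t b) {
    if (a > b) {
        return a;
    }
    return b;
}

// Branch-free form written out by hand: the mask is all ones when
// a > b and all zeros otherwise, so exactly one operand survives.
// A CMOV instruction achieves the same selection without a jump.
std::int32_t max_branchless(std::int32_t a, std::int32_t b) {
    const std::int32_t mask = -static_cast<std::int32_t>(a > b);
    return (a & mask) | (b & ~mask);
}

// Straightforward multiply: may compile to a single IMUL.
std::uint32_t times10_mul(std::uint32_t x) {
    return x * 10u;
}

// Same result as x*8 + x*2: more instructions, but only shifts and
// an add, which is the trade-off the MSDN text describes for /G7.
std::uint32_t times10_shift_add(std::uint32_t x) {
    return (x << 3) + (x << 1);
}
```

In practice nobody would hand-write the second form of either pair; the point of the compiler switches is that the compiler makes these substitutions when it knows which CPU it is targeting.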
Levicki goes on to speculate about the reasons game developers might do this, and leans towards conspiracy theories about pushing people to buy faster systems. Me, I tend to believe it's more laziness, coupled with extremely tight game-development schedules. Once we skip past this, Levicki returns to some technical advice:
Why not using at least SSE instead of FPU code? It is easy. They don't even have to spend time optimizing by hand. They only have to flip a switch to make the difference (or to kill it, depending on your viewpoint). They don't even have to use Intel compiler: Visual C++ will do for that basic step.
Why not, indeed?
Going back to Visual C++ for a moment, Microsoft's online docs suggest that the /arch:SSE and /arch:SSE2 switches let the compiler generate SSE/SSE2 instructions automatically. Unrolling loops and other P4-specific optimizations might penalize the Athlon 64, though it's hard to know without actually trying it, but using SSE/SSE2 by itself shouldn't adversely affect AMD. Even Fred Weber, AMD's former chief technology officer, acknowledged that SIMD was the way to go with floating point as we move into the future.
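As a rough illustration of what "flipping a switch" means here, the sketch below shows ordinary floating-point code whose source does not change at all between an x87 build and an SSE build. The helper names `fp_codegen_target` and `dot` are hypothetical. `_M_IX86_FP` is a predefined macro in Visual C++ that reflects the /arch setting on 32-bit x86 builds; `__SSE__` and `__SSE2__` are the equivalents in GCC-style compilers. Whether the loop ends up faster depends on the compiler and CPU, so this is a way to check what was requested, not a benchmark.

```cpp
#include <cstddef>
#include <string>

// Reports which floating-point instruction set the compiler was told
// to target, based on predefined macros. Returns "unknown" when none
// of the macros checked here are defined for the current target.
std::string fp_codegen_target() {
#if defined(_M_IX86_FP)
  #if _M_IX86_FP >= 2
    return "SSE2";   // Visual C++ with /arch:SSE2 or higher
  #elif _M_IX86_FP == 1
    return "SSE";    // Visual C++ with /arch:SSE
  #else
    return "x87";    // Visual C++ with no /arch switch
  #endif
#elif defined(__SSE2__)
    return "SSE2";
#elif defined(__SSE__)
    return "SSE";
#else
    return "unknown";
#endif
}

// Plain scalar dot product. The source is identical in every build;
// only the compiler switch decides whether the multiply and add are
// issued as x87 FPU instructions or as SSE instructions.
float dot(const float* a, const float* b, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

A build that ships with `fp_codegen_target()` returning "x87" is the situation Levicki describes: code like `dot` running entirely on the FPU even though the CPU supports SSE.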
Let's assume for a moment that Igor is correct in his technical analysis. In discussions with game developers over the past few years, I've learned that they tend to be pretty wary of automatic optimizations generated by simple use of compiler switches. Sometimes a large software build will break when certain automatic optimizations are turned on. Some of this is likely institutional memory, as compilers have improved over the years. Some of it is likely laziness coupled with tight schedules, as alluded to above. If you're a game developer in constant 80-hour-a-week crunch mode, experimenting with compiler switches is probably the last thing on your mind.
Still, it's an interesting thought. And the issue may not simply reside in the game code itself. Many game developers use third-party libraries and game engines, including physics engines, AI engines, audio processing libraries and more. If that's the case, then optimizing the core game code may not have as large an impact as it might seem.
So I'd like to hear from the game and middleware developers, if you're reading this. Are your games optimized for all CPU architectures? Do you use automatic optimizations, or do you avoid them? If you do avoid using compiler optimization switches, let us know why. Inquiring minds would like to know.
This Week on ExtremeTech
This week, we've got a review of ATI's midrange X1600 GPU going up, although finding one in the retail channel may be a problem. Also up is Jim Lynch's write-up on the new forum software. Victor Loh covers an innovative new CPU cooler, as well as plunging into his first motherboard review. We also have a review of ATI's latest All-in-Wonder card.