It's been a while since my last entry, so I'll quickly update on what I've done.
Right now, basic runtime detection for Windows, x86 Linux and PPC is done. I've changed the original plan for that quite significantly. I now have a base class with the inline bool HasMMX() type functions and the bool hasMMX; type vars. The base class is a template, so I can pass the correct platform-specific class to it when creating an instance. Another class then acts as the access point for the outside world, with its own Has*() functions (which call the equivalent function in the base class).
When a check for one instruction set is done, checks for all of them are done and a bitmask is returned. The flag for the requested instruction set is then read from this result.
So a check for MMX on Windows would do something like this:
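A minimal sketch of that flow, with the assumptions flagged: the feature bit values, the FakeWindowsDetector class (which returns a fixed mask instead of actually querying the CPU) and the CpuFeatures wrapper name are all hypothetical stand-ins, not the project's real code.

```cpp
#include <cstdint>

// Hypothetical feature bits; the real values are not given in this post.
enum Feature : std::uint32_t {
    FeatureMMX  = 1u << 0,
    FeatureSSE  = 1u << 1,
    FeatureSSE2 = 1u << 2
};

// Stand-in for the platform-specific class (Windows here).
// A real one would use CPUID or an OS call; this returns a fixed
// mask so the sketch runs anywhere.
struct FakeWindowsDetector {
    static std::uint32_t detectAll() { return FeatureMMX | FeatureSSE; }
};

// Base class: one platform check fills in every has* variable.
template <typename PlatformDetector>
class CpuFeaturesBase {
public:
    CpuFeaturesBase() {
        const std::uint32_t mask = PlatformDetector::detectAll();
        hasMMX  = (mask & FeatureMMX)  != 0;
        hasSSE  = (mask & FeatureSSE)  != 0;
        hasSSE2 = (mask & FeatureSSE2) != 0;
    }
    inline bool HasMMX()  const { return hasMMX; }
    inline bool HasSSE()  const { return hasSSE; }
    inline bool HasSSE2() const { return hasSSE2; }

private:
    bool hasMMX;
    bool hasSSE;
    bool hasSSE2;
};

// Access point for the outside world; forwards to the base class.
class CpuFeatures {
public:
    bool HasMMX()  const { return impl.HasMMX(); }
    bool HasSSE()  const { return impl.HasSSE(); }
    bool HasSSE2() const { return impl.HasSSE2(); }

private:
    CpuFeaturesBase<FakeWindowsDetector> impl;
};
```

Calling code only ever sees CpuFeatures, so supporting another platform means writing one more detector class and choosing it at build time.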
I think this is quite a nice solution. It allows us to easily add new checks in the future.
While writing some configure checks for xmmintrin.h and __m128 I ran into a problem. GCC requires -msse to be enabled before I can access the builtin intrinsic functions. However, -msse also tells the compiler that it may use SSE instructions in the rest of the code it generates, not just for floating point :) To quote from the GCC manual:
"These options will enable GCC to use these extended instructions in generated code, even without -mfpmath=sse. Applications which perform runtime CPU detection must compile separate files for each supported architecture, using the appropriate flags. In particular, the file containing the CPU detection code should be compiled without these options."
To me, this is not a great option. I'm not sure why the GCC devs decided to force compiler optimizations upon us if we want to use intrinsics at all, but that's the way it is... maybe. I'm going to experiment with defining whatever the xmmintrin.h header requires to be defined; maybe that will work. If not, we'll have to try what the manual suggests and compile each file which uses intrinsics with the required flags. The third option is to say "screw this" and write my own versions of the intrinsics using asm. I'd still need to use the builtin stuff for x86_64, but that's okay because -msse and crew are enabled by default on that platform. My hope is that I can trick the headers into thinking all is good without giving the compiler an 'okay' to optimize.
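For the manual's per-file route, here is a rough sketch of what one intrinsics file could look like. It assumes the __SSE__ macro, which GCC predefines when -msse is in effect, as the guard, and add4 and builtWithSSE are hypothetical names. It does not get around the flag; it only keeps the file buildable whether or not the flag is passed.

```cpp
// Only pull in the intrinsics header when the compiler says SSE is on.
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for four floats.
void add4(const float* a, const float* b, float* out) {
#if defined(__SSE__)
    // SSE path: unaligned loads, one packed add, unaligned store.
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#else
    // Plain C++ fallback when the file is built without -msse.
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
#endif
}

// Reports which path this translation unit was compiled with.
bool builtWithSSE() {
#if defined(__SSE__)
    return true;
#else
    return false;
#endif
}
```

The file containing the runtime detection code would be built without the flag, as the manual says, and would only call into a file like this one after the check succeeds.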
Once a solution for this is done, I need to work out a code path for using these optimizations. Right now I'm favouring either templates along with my own functions, or a function like blah(SIMDcode, C++Code, arg1, arg2, argn); I haven't decided yet. Obviously I need to keep the overhead and code duplication down to a minimum. More on this later.
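To make the blah(...) idea a bit more concrete, here is one way it might look. This is a sketch under assumptions, not a decision: dispatch, sumScalar and sumFourWide are hypothetical names, the availability flag is passed in by hand rather than coming from the detection classes, and sumFourWide is plain C++ standing in for a real SIMD routine. It also uses variadic templates, which need a C++11 compiler.

```cpp
#include <utility>

// Calls simdPath when SIMD is available, scalarPath otherwise.
// Both callables must accept the same arguments and return the same type.
template <typename SimdFn, typename ScalarFn, typename... Args>
auto dispatch(bool simdAvailable, SimdFn simdPath, ScalarFn scalarPath,
              Args&&... args)
    -> decltype(scalarPath(std::forward<Args>(args)...)) {
    if (simdAvailable) {
        return simdPath(std::forward<Args>(args)...);
    }
    return scalarPath(std::forward<Args>(args)...);
}

// Plain C++ path.
int sumScalar(const int* v, int n) {
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += v[i];
    }
    return total;
}

// Stand-in for the SIMD path: four accumulators at a time, then the tail.
int sumFourWide(const int* v, int n) {
    int lanes[4] = {0, 0, 0, 0};
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        for (int k = 0; k < 4; ++k) {
            lanes[k] += v[i + k];
        }
    }
    int total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) {
        total += v[i];
    }
    return total;
}
```

A call would then read dispatch(cpuHasSSE, sumFourWide, sumScalar, data, count). The branch costs one predictable test per call, so this only makes sense around reasonably large chunks of work.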