It's been a busy two weeks since my last entry here. With moving out of uni accom back home, my VM holding half my SoC stuff failing for whatever reason and other random things keeping me busy I've had little coding time to get anything worth committing done. However, I've had plenty of thinking time and progress has been made. Right now, I'm updating my branch from trunk and installing MinGW here so I can get some more testing done. I'm going to quickly lay out my plan and what progress I've made in each area:
It seems that -msse etc. is required to use SIMD instructions on gcc. This is a bit of an inconvenience, but not a serious issue. The way I've decided to handle the problem is to force users (that's you) to put their SIMD code in a separate cpp file to c++ code, as suggested by the gcc docs. Then, all SIMD code will have to be compiled with those compiler flags. What I need to do is to make this as painless as possible, so I'm going to run configure checks (AX_CHECK_COMPILER_FLAGS()) to see if the flags are supported by the compiler, then save the results in COMPILER.CFLAGS.SIMD or something of the kind and/or COMPILER.HAS.SSE = "yes/no" etc. Next, I add something which allows me to specify in a jam file the compiler flags (those ones I saved) which will be applied to a specific cpp. It might be an idea to put SIMD code in a subdir to the main folder I think. That way we know that *.cpp will all be SIMD, which means we don't really need to worry about specific files, we can apply to everything (so to the whole Jamfile). Of course, we need to be able to #define out any code which isn't supported by the compiler too (or do this in the Jamfile). Any suggestions on how to refine this idea are welcome of course!
The next area I'm working in is that fairly important code path selector. I've decided on using a function which takes in the functions, arguments, types and selects the correct route to take. It looks like this:
CallSIMDVersion (Arrow) ReturnType, ArgumentTypes (Arrow) (SIMDInstructionSetEnum, SIMDFunction, C++Function, Arguments);
This works quite nicely, but it does have some limitations which I'm working towards removing. Right now, it only supports one SIMD path and a c++ fallback. It needs to be able to take several possible SIMD paths and a fallback (MMX, SSE3, AltiVec, C++ for example).
I chose this method mainly for it's lack of overhead and simplicity. Only a single call to check for capabilities is done (the results are cached), most internal functions to the check are inlined so I have few function calls and only one check per SIMD/c++ function call is made. A small benchmark I ran showed an overhead which was too small to be measured (<1us). The SIMD code itself ran 8x faster than the C++ which is a good sign. :) I'll probably commit that test as part of the simdtest app.
Also, I've started to define some CS types to be used for SIMD work. AltiVec and SSE use different methods of declaring the __m128 (SSE :)) type, but the CS version needs to be as platform independent as possible so the user can just 'use' it and not worry about maintaining compatibility. I'll add more details on this when I've written them. :)
Finally (as far as I can think), more testing is needed. As I said earlier, I'm installing MinGW to see if my code compiles fine there. Hopefully I won't run into many problems.
If anyone could point out how to have arrow brackets without this thing spitting errors at me, that'd be cool :P
It's been a while since my last entry, so I'll quickly update on what I've done.
Right now, basic runtime detection for Windows, x86 linux and PPC are done. I've changed quite significantly the original plan for that, now I have a base class with the inline bool HasMMX() type functions and the bool hasMMX; type vars. I've used a template on that, so I can pass the correct platform specific class to it when creating an object instance of it, then I use another class as an access point for the outside world which has it's own Has*() functions (which call the specific equivalent in the base class).
When a check for one instruction set is done, checks for all of them are done and a bitmask is returned. Then the correct instruction is fetched from this result.
So a check for MMX on windows would do something like this:
I think this is quite a nice solution. It allows us to easily add new checks in the future.
While writing some configure checks for xmmintrin.h and __m128 I ran into a problematic problem. GCC requires -msse to be enabled for me to access builtin intrinsic functions. However, -msse also tells the compiler to optimize non-floating point code with sse instructions :) To quote from the GCC manual:
"These options will enable GCC to use these extended instructions in generated code, even without -mfpmath=sse. Applications which perform runtime CPU detection must compile separate files for each supported architecture, using the appropriate flags. In particular, the file containing the CPU detection code should be compiled without these options."
To me, this is not a great option. I'm not sure why the GCC devs decided to force compiler optimizations upon us if we want to use intrinsics at all, but that's the way it is... maybe. I'm going to experiment on defining what the xmmintrin.h header requires to be defined.. maybe that will work. If not then we'll have to try what the manual suggests, making each file which uses intrinsics compile with the required flags. The third option is to say "screw this" and write my own versions of the intrinsics using asm. I'll still need to use the builtin stuff for x86_64, but that's okay because -msse and crew are defined by default on that platform. My hope is that I can trick the headers that all is good without giving the compiler an 'okay' to optimize.
Once a solution for this is done, I need to work out a code path for using these optimizations. Right now I'm favouring either using templates along with my own functions, or having a function like blah(SIMDcode, C++Code, arg1, arg2, argn); I haven't decided. Obviously I need to keep the overhead and code duplication down to a minimum. More on this later.
|<< <||Current||> >>|