
Xordan's blog


17:08:46 - Final(?) syntax for cross-thread messaging.

Categories: Thread Communication, 337 words

Now that I've got something (as far as I've tested) working, I'll explain what the final system is turning out to be like.

Firstly, two reference files:
*checkout*/crystal/CS/branches/soc2008/threadcomm/apps/tests/threadtest/threadtest.h
*checkout*/crystal/CS/branches/soc2008/threadcomm/apps/tests/threadtest/threadtest.cpp

Start off in the header file.

Here you will see an interface, iThreadTest, and its implementation, csThreadTest.

iThreadTest looks like a normal interface. Nothing new there.

You will see that csThreadTest implements ThreadedCallable. This is important: it allows the object instance of the class to be stored inside an event message. This message is what is sent to the thread manager and queued as a job to be executed by a thread.

In csThreadTest the functions you wish to be executed in other threads are declared with the THREADED_CALLABLE_DECX macro. You pass the class, the function name, the argument types and the argument names separately. This macro then does two things:
1) It creates a function called functionnameTC(). This is the name your function will really have.
2) It creates a function called functionname, to satisfy the implementation of the interface. This function creates a thread event based on the arguments passed to it and hands that thread event message to the thread manager for processing. It stores the arguments by copying them onto the heap and keeping pointers to those copies in an array.
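As a rough illustration of those two generated functions, here is a minimal sketch in plain standard C++. This is an illustrative reconstruction, not the real macro output: ThreadTest, LoadFile and the lambda-based queue are all stand-ins I made up for the example.

```cpp
#include <functional>
#include <queue>
#include <string>

// Hypothetical, simplified sketch of what a THREADED_CALLABLE_DECX-style
// macro might generate for a method LoadFile(const char* path).
struct ThreadTest
{
  // Stand-in for the thread manager's job queue.
  std::queue<std::function<void ()>> jobQueue;
  std::string lastLoaded;

  // 1) The 'TC' function: the body you actually wrote. This is what a
  //    worker thread eventually executes.
  void LoadFileTC (std::string path)
  {
    lastLoaded = path;
  }

  // 2) The interface-satisfying wrapper: it copies the argument (the
  //    lambda capture stands in for the heap-allocated args array) and
  //    queues the call instead of running it directly.
  void LoadFile (const char* path)
  {
    std::string copy (path);  // copy the argument so it outlives the caller
    jobQueue.push ([this, copy] () { LoadFileTC (copy); });
  }
};
```

The caller just invokes LoadFile() normally; the real work only happens later, when a worker thread pops the job off the queue and runs it.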

Now look at the .cpp file.

The functions are implemented using the THREADED_CALLABLE_IMP macro. You again pass the class and function name, but this time you pass each argument's type and name together, just as you would when declaring a normal function.
This macro simply creates the function header for class::functionnameTC().

Go to the bottom where the main function is. See that the functions are called normally via the interface.

I will provide a more detailed look at how it works behind the scenes shortly.

218 feedbacks


05:23:57 - Small update

Categories: Thread Communication, 211 words

This is going to be a quick update on what's happening now.

I'm investigating and trying out better ways to pass the data between threads; I'll write something more detailed once I've found something good.

On the csparser side of things I've started writing the threaded loader. I'm writing it in a separate file from the current one, along with a new loader context which is also in its own file. I'm going to be writing it 'from scratch with some copy/paste': basically, I'll take anything useful from the current loader (bottom node parsing and loading) and give it some refactoring, then rewrite everything else which I don't think is usable.

My reasoning for doing this is that the current loader is quite tightly coupled: it has multiple ways of doing the same thing, functions which do more than their names suggest (and more than they should do, see the last point), and lots of functions whose names or return types I want to change. I'm also hoping I can speed things up and make everything a little nicer on the eye while I'm at it. I won't have to worry about API breakage or altering the semantics of methods when I'm fixing resource sharing conflicts either.

93 feedbacks


12:36:09 - Aim and Progress

Categories: Thread Communication, 584 words

For this first entry I'll quickly go over what I'm aiming to achieve and then fill you in on what I've got done so far.

The idea is to provide an easy to use method to run member functions inside threads.
There's got to be some central management providing a queue, access methods and the framework to create the right number of threads.
There's got to be some base class to inherit from, to provide the methods needed to interact with the management.

As a test case I'm working with csLoader, with the goal of giving CS some multi-threaded loading capabilities.

I've committed my test ideas to supplement this entry, so you can see exactly what I'm experimenting with at the moment.
You can see this here:


In these files I've provided a singleton based thread manager. This is pretty small and simple, it combines processor detection with a ThreadedJobQueue, giving a globally accessible threadsafe queue with 'cpu/core count' number of worker threads. The only publicly visible method to interact with this queue is to enqueue jobs to it. Each 'job' is in the form of a ThreadEvent.


Here I have three things:
A macro for queueing an event.
The ThreadedCallable class which is to be inherited by classes which are to be threaded.
The ThreadEvent class which is the 'job' package containing all the information needed for the method call.

The macro simply creates a new ThreadEvent (filling it with the given information) and adds it to the global thread queue.

The ThreadedCallable class contains two methods:
GetThreadedCallable() - This simply queries the ThreadedCallable type.
RunMethod() - This needs to be implemented by the inheriting class. When this method is executed it should find the correct method to run based on the methodIndex and extract that method's arguments from the args array.

The ThreadEvent class contains the object whose method should be called, the methodIndex which identifies the method to be called, and the args array, which contains a copy of all the arguments to that method. When a thread pops this off the queue, it executes the Run() method, which in turn calls RunMethod() on the appropriate object, which then calls the desired method, passing the contents of args to it.
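That dispatch chain can be sketched in a few lines of C++. Everything here (the Loader class, the void* args array, the switch on methodIndex) is an illustrative guess at the shape of the system, not the actual branch code.

```cpp
#include <string>
#include <vector>

// Base class for anything whose methods can be run on worker threads.
struct ThreadedCallable
{
  virtual ~ThreadedCallable () {}
  // Implemented by the inheriting class: dispatch on methodIndex and
  // extract that method's arguments from the args array.
  virtual void RunMethod (int methodIndex,
                          const std::vector<void*>& args) = 0;
};

// The 'job' package: everything needed to make the method call later.
struct ThreadEvent
{
  ThreadedCallable* object;  // whose method should be called
  int methodIndex;           // which method to call
  std::vector<void*> args;   // pointers to heap copies of the arguments

  // Executed by a worker thread after popping the event off the queue.
  void Run () { object->RunMethod (methodIndex, args); }
};

// A toy 'threaded loader' showing how RunMethod unpacks the arguments.
struct Loader : public ThreadedCallable
{
  std::string lastMap;
  int lastFlags = 0;

  void LoadMapTC (const std::string& map, int flags)
  { lastMap = map; lastFlags = flags; }

  void RunMethod (int methodIndex,
                  const std::vector<void*>& args) override
  {
    switch (methodIndex)
    {
      case 0:  // LoadMap(const std::string&, int)
        LoadMapTC (*static_cast<std::string*> (args[0]),
                   *static_cast<int*> (args[1]));
        break;
    }
  }
};
```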

So at the moment this is a simple callback system. I've tested it out with two methods of csLoader and it works fairly well. There are currently issues with calling multiple methods of csLoader at the same time (in some cases), which I'll deal with in future once I've finalised the system. I'm not ecstatic with how it works right now, it doesn't seem as developer-friendly as it could be (or as fast). But it's a possible implementation that I'd be content with if nothing better is found.

On the csLoader side of things, you can see that I've added on a threaded loader object. I'm unsure if this is the best way to go about it, or if I should make it an SCF query-able object (so people can do csQueryRegistry for iThreadedLoader, instead of querying for iLoader and calling GetThreadedLoader() as it is now). The way I do it now is kind of nice because technically they are the same system, the threaded bit is just a small extension. I'll need some feedback on the 'correct' way to do this.

2 feedbacks


06:27:15 - Delays

Categories: Optimisation Framework, 639 words

It's been a busy two weeks since my last entry here. With moving out of uni accom back home, my VM holding half my SoC stuff failing for whatever reason and other random things keeping me busy I've had little coding time to get anything worth committing done. However, I've had plenty of thinking time and progress has been made. Right now, I'm updating my branch from trunk and installing MinGW here so I can get some more testing done. I'm going to quickly lay out my plan and what progress I've made in each area:

It seems that -msse etc. is required to use SIMD instructions on gcc. This is a bit of an inconvenience, but not a serious issue. The way I've decided to handle the problem is to force users (that's you) to put their SIMD code in a separate .cpp file from their other C++ code, as suggested by the gcc docs. All SIMD code will then have to be compiled with those compiler flags.

What I need to do is make this as painless as possible, so I'm going to run configure checks (AX_CHECK_COMPILER_FLAGS()) to see if the flags are supported by the compiler, then save the results in COMPILER.CFLAGS.SIMD or something of the kind, and/or COMPILER.HAS.SSE = "yes/no" etc. Next, I'll add something which allows me to specify in a Jam file the compiler flags (the ones I saved) to be applied to a specific .cpp file. It might be an idea to put the SIMD code in a subdirectory of the main folder: that way we know every .cpp there will be SIMD code, which means we don't need to worry about specific files and can apply the flags to everything (so to the whole Jamfile). Of course, we also need to be able to #define out any code which isn't supported by the compiler (or do this in the Jamfile). Any suggestions on how to refine this idea are welcome, of course!

The next area I'm working on is that fairly important code path selector. I've decided on a function which takes the candidate functions, their argument types and the arguments, and selects the correct route to take. It looks like this:

CallSIMDVersion<ReturnType, ArgumentTypes> (SIMDInstructionSetEnum, SIMDFunction, C++Function, Arguments);

This works quite nicely, but it does have some limitations which I'm working to remove. Right now it only supports one SIMD path and a C++ fallback; it needs to be able to take several possible SIMD paths plus a fallback (MMX, SSE3, AltiVec, C++, for example).

I chose this method mainly for its low overhead and simplicity. Only a single call to check for capabilities is made (the results are cached), most of the check's internal functions are inlined so there are few function calls, and only one check is made per SIMD/C++ function call. A small benchmark I ran showed an overhead too small to be measured (<1us). The SIMD code itself ran 8x faster than the C++, which is a good sign. :) I'll probably commit that test as part of the simdtest app.

Also, I've started to define some CS types to be used for SIMD work. AltiVec and SSE use different methods of declaring the __m128 (SSE :)) type, but the CS version needs to be as platform independent as possible so the user can just 'use' it and not worry about maintaining compatibility. I'll add more details on this when I've written them. :)

Finally (as far as I can think), more testing is needed. As I said earlier, I'm installing MinGW to see if my code compiles fine there. Hopefully I won't run into many problems.

If anyone could point out how to have arrow brackets without this thing spitting errors at me, that'd be cool :P

2 feedbacks


12:11:49 - Changes and a problematic problem.

Categories: Optimisation Framework, 479 words

It's been a while since my last entry, so I'll quickly update on what I've done.

Right now, basic runtime detection for Windows, x86 Linux and PPC is done. I've changed the original plan for this quite significantly: I now have a base class with the inline bool HasMMX() style functions and the bool hasMMX; style variables. I've templated it so I can pass the correct platform-specific class to it when creating an object instance, and I use another class as an access point for the outside world, which has its own Has*() functions (each calling the specific equivalent in the base class).

When a check for one instruction set is done, checks for all of them are done and a bitmask is returned. The answer for the requested instruction set is then fetched from this result.

So a check for MMX on windows would do something like this:
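(The code snippet seems to have gone missing from the original post here. What follows is a hedged reconstruction of the bitmask scheme just described: the constant names are invented, and the probe results are passed in as booleans to keep the example platform-neutral; on Windows each probe would come from something like IsProcessorFeaturePresent().)

```cpp
#include <cstdint>

// Invented capability bits for the sketch.
enum : std::uint32_t
{
  CAP_MMX  = 1u << 0,
  CAP_SSE  = 1u << 1,
  CAP_SSE2 = 1u << 2
};

// One call checks every instruction set and packs the answers into a
// bitmask, which would then be cached...
inline std::uint32_t CheckSupportedInstructions (bool mmx, bool sse, bool sse2)
{
  std::uint32_t mask = 0;
  if (mmx)  mask |= CAP_MMX;
  if (sse)  mask |= CAP_SSE;
  if (sse2) mask |= CAP_SSE2;
  return mask;
}

// ...and each Has*() query just fetches its bit from the cached result.
inline bool HasMMX (std::uint32_t mask) { return (mask & CAP_MMX) != 0; }
```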

I think this is quite a nice solution. It allows us to easily add new checks in the future.

While writing some configure checks for xmmintrin.h and __m128 I ran into a problematic problem. GCC requires -msse to be enabled before I can access the builtin intrinsic functions. However, -msse also tells the compiler to optimize non-floating-point code with SSE instructions :) To quote from the GCC manual:

"These options will enable GCC to use these extended instructions in generated code, even without -mfpmath=sse. Applications which perform runtime CPU detection must compile separate files for each supported architecture, using the appropriate flags. In particular, the file containing the CPU detection code should be compiled without these options."

To me, this is not a great option. I'm not sure why the GCC devs decided to force compiler optimizations upon us if we want to use intrinsics at all, but that's the way it is... maybe. I'm going to experiment with defining whatever the xmmintrin.h header requires to be defined... maybe that will work. If not then we'll have to try what the manual suggests, compiling each file which uses intrinsics with the required flags. The third option is to say "screw this" and write my own versions of the intrinsics using asm. I'll still need to use the builtin stuff for x86_64, but that's okay because -msse and crew are enabled by default on that platform. My hope is that I can trick the headers into thinking all is good without giving the compiler an 'okay' to optimize.

Once a solution for this is in place, I need to work out a code path for using these optimizations. Right now I'm favouring either using templates along with my own functions, or having a function like blah(SIMDcode, C++Code, arg1, arg2, argn); I haven't decided. Obviously I need to keep the overhead and code duplication down to a minimum. More on this later.

53 feedbacks


22:02:44 - Overview of project

Categories: Optimisation Framework, 525 words

Hey, my name is Mike Gist (aka Xordan). I'm currently a first-year student studying Computing at Imperial College London. I'll be working on the optimisation framework project for the next few months, keeping note of my progress here, so let me explain a bit about what I'll be doing.

At the moment there are various optimisations that could be done using SIMD instructions, but there is no way to properly detect support and use the correct code path. Obviously an Athlon XP won't be able to use SSE3 instructions, and a PPC processor will only be able to use AltiVec... assuming it supports it. My job is to add runtime and compile-time detection of the supported instruction sets (both for the processor and the OS), add a method for the correct code path to be selected and used, and then make use of this in various places in the existing CS code.

Within the scope of this project I'll be concentrating on MMX, SSE, SSE2, SSE3 and AltiVec. Later I will expand and include SSSE3 and SSE4, but those aren't priorities for now.

Currently there is a class called csProcessorCapability which contains some MMX detection code. I plan to rewrite all of this, but keep the name :) There are different ways of detecting supported instructions (asm or inbuilt compiler functions). VC has a function called IsProcessorFeaturePresent() which makes detection very simple. For gcc it's a bit harder; we will need to use cpuid to get the info. We also know the minimum instruction set supported by an architecture, so we can shortcut some checks (amd64 processors all support everything from MMX through to SSE2, for example).
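(Editor's aside: on present-day GCC and Clang the cpuid dance can be wrapped by the compiler itself via __builtin_cpu_supports(); that builtin did not exist when this was written, so this is an alternative sketch rather than what the code above could use. The Detect* names are invented for the example.)

```cpp
// Runtime detection sketch using the GCC/Clang builtin, available on x86
// targets only; other platforms fall back to 'not supported'.
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
inline bool DetectMMX ()
{
  __builtin_cpu_init ();                       // populate the CPU model data
  return __builtin_cpu_supports ("mmx") != 0;  // query one feature bit
}
inline bool DetectSSE2 ()
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("sse2") != 0;
}
#else
inline bool DetectMMX ()  { return false; }  // non-x86: no MMX/SSE at all
inline bool DetectSSE2 () { return false; }
#endif
```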

The csProcessorCapability class will look something like this:

class csProcessorCapability
{
public:
  csProcessorCapability () {}
  ~csProcessorCapability () {}

  static inline bool HasMMX () {}
  static inline bool HasSSE () {}
  static inline bool HasSSE2 () {}
  static inline bool HasSSE3 () {}
  static inline bool HasAltiVec () {}

private:
  static inline void Initialise () {}
  static inline void CheckSupportedInstructions (bool MMX, bool SSE, bool SSE2, bool SSE3, bool altiVec) {}

  static bool isInitialised;
  static bool supportsMMX;
  static bool supportsSSE;
  static bool supportsSSE2;
  static bool supportsSSE3;
  static bool supportsAltiVec;
};

The CheckSupportedInstructions function is run once by the Initialise() function and contains all the detection code. If it's already known that an instruction set is or isn't supported, that answer is passed in and the check is skipped. I'll go into more detail as I progress.

Compile time checks will be for __m128 support (checking for xmmintrin.h), so we know if the compiler actually supports this stuff :) No compiler support, no extra cpu juice. Some other things will be used, like target arch, which will allow us to rule out some instruction sets. I'll add an entry just on this when I get to it. First thing is getting that class written. I'm giving myself two weeks to finish the detection, then I'll move on to making use of it. I'm hoping to get a good chunk of that done by the mid-term evaluation, leaving me about a month to finish it and optimise some areas (to be chosen) of code. I'll detail that a bit more very soon.

834 feedbacks
