I’m working away on the new test and benchmark application.
I need to support creating processes, to test position-independent data structures.
That means I need to pin processes to particular logical cores.
Know what?
That’s what’s written on the sign that points the way into hell.
Let me put this bluntly: Windows has no API to set process affinity beyond the first processor group, which has a maximum of 64 logical cores.
You read that right.
So if you have, say, 128 cores, and Windows has split these into two 64-core groups, you can only set process affinity to cores 0 to 63.
You can set thread affinity to any core, but this is not the same as process affinity, and is less performant. Still, it looks like this is the best you can do.
It’s problematic to do this remotely (from another process). To do so you’d need to call CreateRemoteThreadEx(). In my case, I’m spawning new processes and I want them to quit when the benchmark work is done, so I need to co-ordinate between the main thread (which begins when the process is spawned) and the thread created by CreateRemoteThreadEx(), which will be created at some point after the main thread starts. It’s hard for the main thread to wait on something which hasn’t yet been created. I could busy-wait on a global variable, but this is stomach-twistingly bad. I don’t want to write code like this.
You can set thread affinity from within the process itself by calling SetThreadGroupAffinity(). Obviously, to use this the child process has to be told which logical core in which processor group to use. I’m already passing some information to the child process through the command line (shared memory name and length in bytes), so I’ll have to add this.
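A minimal sketch of that in-process approach, for illustration only. The struct and function names here (pin_target, pin_target_parse, pin_target_mask, pin_current_thread) are hypothetical, as is the idea of passing the group and core as two extra decimal arguments. Only SetThreadGroupAffinity(), GetCurrentThread() and GROUP_AFFINITY are real Windows names, and the sketch assumes a 64-bit build, where the affinity mask is 64 bits wide.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical container for the two extra values passed on the command line. */
struct pin_target
{
  unsigned int group;  /* processor group number */
  unsigned int core;   /* logical core index within that group, 0 to 63 */
};

/* Parse the group and core from two decimal argument strings.
   Returns 1 on success, 0 if either string is malformed or out of range. */
int pin_target_parse( char const *group_arg, char const *core_arg, struct pin_target *pt )
{
  char *end;
  unsigned long g, c;

  g = strtoul( group_arg, &end, 10 );
  if( end == group_arg || *end != '\0' || g > 0xFFFF )
    return 0;

  c = strtoul( core_arg, &end, 10 );
  if( end == core_arg || *end != '\0' || c > 63 )
    return 0;

  pt->group = (unsigned int) g;
  pt->core = (unsigned int) c;
  return 1;
}

/* Build the single-bit affinity mask for one logical core within its group. */
uint64_t pin_target_mask( struct pin_target const *pt )
{
  return (uint64_t) 1 << pt->core;
}

/* Pin the calling thread to the target core.
   Returns 1 on success, 0 on failure (always 0 on non-Windows builds). */
int pin_current_thread( struct pin_target const *pt )
{
#ifdef _WIN32
  GROUP_AFFINITY ga;

  memset( &ga, 0, sizeof(ga) );  /* the reserved fields must be zero */
  ga.Group = (WORD) pt->group;
  ga.Mask = (KAFFINITY) pin_target_mask( pt );  /* assumes a 64-bit build */

  return SetThreadGroupAffinity( GetCurrentThread(), &ga, NULL ) != 0;
#else
  (void) pt;
  return 0;
#endif
}
```

The child would call pin_target_parse() on the two extra arguments at the top of main() and then pin_current_thread() before doing any benchmark work. This pins only the calling thread, which is exactly the limitation complained about above.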
It’s still not what I actually want. I want to set process affinity, from the parent process.
Windows thread/process affinity APIs are Civil Service quality - and I don’t mean the British Civil Service. I mean the Egyptian Civil Service.
Next step, finding out how bad it is under Linux. It’ll be bad, but it won’t be as bad, even if it’s just by not having processor groups, which are the worst single concept I’ve encountered since MS-DOS was designed with a 640 KB RAM limit.
Tzo!
Coded all day.
Have the new test and benchmark app to the point it compiles.
Still need to do some key work, but it’s an important step.
Importantly, I realised I’d made a huge blunder all along in test and benchmark - both of them use liblfds data structures, the list in particular.
I can’t do that, because liblfds only offers a data structure if your system offers the atomic operations it needs; so you might not have the list.
In fact, the test and benchmark code needs to use single threaded data structures throughout.
This means I need to put some of the single-threaded data structure (stds) library data structures in the test and benchmark library.
I also need to introduce versioning on the stds code in liblfds, so multiple releases can be compiled in the same project.
I finish my current contract work on Tuesday, and I’ll be taking a few years off, so the next release will come reasonably soon - a few months, tops.
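One way this kind of versioning can work, as a rough sketch only: give every public name a release prefix, so two releases never collide at link time. The prefixes stds_r1_ and stds_r2_, the struct layouts and the functions below are all hypothetical and are not the actual stds API.

```c
#include <stddef.h>

/* Hypothetical: a single-threaded list element as shipped in one release. */
struct stds_r1_list_element
{
  struct stds_r1_list_element *next;
  void *value;
};

/* Hypothetical: a later release changes the layout by adding a key. */
struct stds_r2_list_element
{
  struct stds_r2_list_element *next;
  void *key;
  void *value;
};

/* Push an element onto the front of a release-1 list; returns the new head. */
struct stds_r1_list_element *stds_r1_list_push( struct stds_r1_list_element *head,
                                                struct stds_r1_list_element *e )
{
  e->next = head;
  return e;
}

/* Same operation for release 2; the distinct prefix avoids a name clash. */
struct stds_r2_list_element *stds_r2_list_push( struct stds_r2_list_element *head,
                                                struct stds_r2_list_element *e )
{
  e->next = head;
  return e;
}

/* Count the elements in a release-1 list. */
size_t stds_r1_list_count( struct stds_r1_list_element const *head )
{
  size_t n = 0;

  for( ; head != NULL ; head = head->next )
    n++;

  return n;
}

/* Count the elements in a release-2 list. */
size_t stds_r2_list_count( struct stds_r2_list_element const *head )
{
  size_t n = 0;

  for( ; head != NULL ; head = head->next )
    n++;

  return n;
}
```

Because every type and function carries its release prefix, both sets of definitions can sit in the same translation unit, or be linked into the same binary, without conflict.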