Finally had a full day coding.
Got the new topology code working.
Have realised that the benchmark and the test both actually need to be libraries, with all the functionality operated by what is ideally one function call, and with a trivial wrapper binary provided for command line use on platforms which support a command line.
The reason for this of course is that many platforms do not have command lines, or stdout (or malloc) and so all the functionality needs to be in a library, with a porting abstraction layer.
Benchmark, got to now figure out how to generate thread sets, and then what to do about NUMA mallocs in tests.
Test, I gotta take a look at it and see what’s involved in moving it to a library.
I’ve refactored the test code into a library, plus a thin wrapper command line executable.
The library offers the test functionality in a platform independent way - that is, in a Standard Library independent way, not using stdout, malloc, etc, so it can easily be run on embedded platforms, in kernels etc.
The overall work is done, I’ve got two tests running, now I need to bring over all the tests to the new standard (they’re passed a big block of memory which they use for everything they need, whereas before, they internally called malloc).
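To make the "big block of memory" idea concrete, here is a minimal sketch of how a test might carve allocations out of a caller-supplied block instead of calling malloc. Every name in it (`test_memory`, `test_memory_alloc` and so on) is made up for illustration and is not the actual liblfds test code; it also assumes `uintptr_t` is available and that alignments are powers of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; a sketch, not the liblfds test API. */
struct test_memory
{
  unsigned char *base;  /* start of the caller-supplied block */
  size_t size;          /* total bytes in the block */
  size_t used;          /* bytes handed out so far */
};

void test_memory_init( struct test_memory *tm, void *block, size_t size )
{
  tm->base = block;
  tm->size = size;
  tm->used = 0;
}

/* Returns 'bytes' bytes aligned to 'align' (assumed to be a power of two),
   or NULL if the block cannot satisfy the request. */
void *test_memory_alloc( struct test_memory *tm, size_t bytes, size_t align )
{
  uintptr_t start = (uintptr_t) tm->base + tm->used;
  uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t) (align - 1);
  size_t padding = (size_t) (aligned - start);
  size_t remaining = tm->size - tm->used;

  if( padding > remaining || bytes > remaining - padding )
    return NULL;

  tm->used += padding + bytes;
  return (void *) aligned;
}

/* A test rewinds the block when it finishes, so the next test can reuse it. */
void test_memory_reset( struct test_memory *tm )
{
  tm->used = 0;
}
```

The caller would hand in, say, a static array or a region obtained however the platform allows, and each test would call `test_memory_alloc` where it previously called malloc.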
Once I’ve done that, then it’s back to the benchmark app.
With vital assistance from Sebastiano Vigna, of PRNG shootout fame;
http://xorshift.di.unimi.it/
I have rewritten the liblfds PRNG code to be lock-free.
The existing code provided two PRNGs - a xorshift1024* (with a spinlock!) to provide high quality seed values for a xorshift32 or xorshift64* (depending on 32/64 bit platform), where the latter ran on single-threaded PRNG states.
What exists now is a single PRNG, with a 32 bit and 64 bit version (the SplitMix PRNG), which is lock-free, and so can be used by any number of threads in parallel, blah, the usual - and, critically, is now in keeping with the rest of the library and can in fact be offered as a lock-free ‘data structure’, along with all the rest.
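For anyone wondering why SplitMix lends itself to this: its state is just a counter which is advanced by a fixed constant, and the output is a mix of the counter value, so the whole state update can be a single atomic add. Below is a minimal sketch of the 64 bit version using C11 atomics. It is an assumption about the general shape, not the liblfds code (which has its own porting layer for atomics), and the names `prng_state`, `prng_init` and `prng_next` are invented here. The constants are those of the public SplitMix64 generator.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical names; a sketch of the technique, not the liblfds API. */
struct prng_state
{
  _Atomic uint64_t counter;
};

#define SPLITMIX64_GAMMA UINT64_C(0x9E3779B97F4A7C15)

void prng_init( struct prng_state *ps, uint64_t seed )
{
  atomic_init( &ps->counter, seed );
}

uint64_t prng_next( struct prng_state *ps )
{
  /* One atomic add advances the shared state; every caller gets a
     different counter value, so no lock is needed. */
  uint64_t z = atomic_fetch_add_explicit( &ps->counter, SPLITMIX64_GAMMA,
                                          memory_order_relaxed )
               + SPLITMIX64_GAMMA;

  /* The mixing step works on a local copy only. */
  z = (z ^ (z >> 30)) * UINT64_C(0xBF58476D1CE4E5B9);
  z = (z ^ (z >> 27)) * UINT64_C(0x94D049BB133111EB);

  return z ^ (z >> 31);
}
```

Whether the add is truly lock-free depends on the platform having a native 64 bit atomic add (or an equivalent built from CAS or LL/SC).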
I can however think of only two reasons for using a lock-free PRNG.
First, convenience. Per-thread state requires, well, per-thread state. A single PRNG state which is thread-safe removes that problem.
Second, is if seed generation is a problem. If there is no source of seed entropy, then one seed must be provided per thread. If the number of threads is variable, then… well, then you need to have two PRNGs, one to generate seeds from a single original seed, and another to run from those seeds - which brings you back to needing a spinlock on the original PRNG.
Oouuf.
That was quite a bit of work, and I thought it would be.
I’ve refactored the test application so that it no longer uses the standard library or performs memory allocation.
It still needs threads, though.
The test app is now almost completely a library, with a convenience command line wrapper provided.
The test suite is performed by calling basically one function, and you get back the results of which tests passed and which failed (and why).
All you have to do as the caller is pass in a block of memory (for the tests) and implement the threads abstraction layer, which is already done for Linux user-mode and Windows user-mode. You can also pass in callbacks for text output if you want to see results as they come. I think I’ve got code for Windows kernel-mode. I’ll need to see about Linux kernel-mode.
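Roughly, the shape of such an interface could look like the sketch below. To be clear, all the names (`test_pal`, `run_tests`, `test_report` and the rest) are hypothetical and chosen for illustration; this is not the real liblfds test API, just one way of arranging a block of memory, a threads abstraction layer and an optional text callback behind a single call.

```c
#include <stddef.h>

/* All names are hypothetical; this is not the real liblfds test API. */
enum test_result { TEST_PASSED, TEST_FAILED };

/* Porting abstraction layer, implemented by the caller per platform. */
struct test_pal
{
  int  (*thread_start)( void **handle, void (*fn)(void *), void *arg );
  void (*thread_wait)( void *handle );
  void (*print)( const char *text );  /* optional, may be NULL */
};

struct test_case
{
  const char *name;
  enum test_result (*run)( struct test_pal *pal, void *memory, size_t memory_size );
};

struct test_report
{
  size_t passed;
  size_t failed;
  const char *first_failure;  /* name of the first failed test, or NULL */
};

/* Stand-in test: a real one would build data structures in 'memory'
   and start threads through 'pal'. */
enum test_result example_test_memory( struct test_pal *pal, void *memory, size_t memory_size )
{
  (void) pal;
  return (memory != NULL && memory_size >= 64) ? TEST_PASSED : TEST_FAILED;
}

/* The single entry point: runs every test and fills in the report. */
void run_tests( const struct test_case *tests, size_t count, struct test_pal *pal,
                void *memory, size_t memory_size, struct test_report *report )
{
  size_t i;

  report->passed = 0;
  report->failed = 0;
  report->first_failure = NULL;

  for( i = 0 ; i < count ; i++ )
  {
    enum test_result r = tests[i].run( pal, memory, memory_size );

    if( r == TEST_PASSED )
      report->passed++;
    else
    {
      report->failed++;
      if( report->first_failure == NULL )
        report->first_failure = tests[i].name;
    }

    /* Results are reported as they come, if the caller asked for text output. */
    if( pal->print != NULL )
    {
      pal->print( tests[i].name );
      pal->print( r == TEST_PASSED ? ": passed\n" : ": failed\n" );
    }
  }
}
```

The command line wrapper then reduces to allocating the block, filling in the `test_pal` function pointers for the platform, calling `run_tests` and printing the report.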
There is still one fundamental shortcoming though. One use case for the library is on embedded single-threaded cores. The use case is that the lock-free data structure instances are safe to use both inside and outside of interrupt handlers. The test application has no way of running in that environment since it uses threads.
I hate to do this, because I want to support platforms, but the impact of supporting Visual Studio is massively retarding the rate at which I write software in general for the library - that’s the real problem.
I am strongly motivated by finishing work. So for example right now I’m working on the benchmark app. It’ll be fantastic and unbelievably useful and I can’t wait to get it done - that excites me, and so that motivates me.
Problem is, when I get it done, it’s not done. See, I need to release it. Now, for Linux, release and test is trivial. It takes minutes. Adjust the makefiles, build, run.
It could not be a more different story under Windows, when using Visual Studio.
See, there are two problems. First, I have Windows 7 - not licensed, but it doesn’t expire; but I only have a 90 day trial version of Windows 8. Every 90 days, when I want to use MSVC 2013, which requires Windows 8, I have to install Windows 8 from scratch, and install MSVC 2013 from scratch.
I then also need to install the kernel development kit from scratch.
Going back to Windows 7, it doesn’t expire, but I only have a trial version of MSVC 2012. That does expire, so I have to install that from scratch, and the kernel development kit from scratch - making VMs for each of these installs, since you can’t co-install these various versions of MSVC.
So; I want to make a release. Linux? run a batch file. Builds all builds, runs all test suites. One command, then wait for a bit. Done.
Windows? copy the Windows 7 VM twice (once for 2012 without WDK, once for with), make a new VM for Windows 8, install Windows 8 from scratch, install MSVC2013 from scratch, clone the VM, install the WDK on the clone.
That’s just the beginning, though. All I’ve done so far is get working tools.
At this point I need to make any new solution files which are required.
There are currently three solution files per combination of MS tools. So that’s three for MSVC2012, three for MSVC2012+WDK, three for MSVC2013, three more for MSVC2013+WDK (Intel) and three more for MSVC2013+WDK (ARM).
I need to add another two, so that’s another two for each platform.
Setting up a new solution file is an ordeal. It takes hours and hours and hours - literally - of adjusting hundreds and hundreds of settings, click after click after click after click. The mouse is not a suitable interface for this kind of work.
It is unbelievably tedious and fiddly and this itself makes it hard to get the solution files right. It’s like trying to line up a thousand different pins, all of which are inside a haystack.
It takes a minute, tops, under Linux, with a makefile. I just change the filenames in the makefile.
I think I have found a way to clone and rename solution files, but I’m not 100% sure it’s legit.
Furthermore, for the kernel solutions, I don’t actually know if I’ve set them up correctly. I can find no documentation on doing so - everything seems to rely on you following the wizards, but the wizards make no sense to me in what they come out with. I want a LIB or DLL - I can’t see how to get it - but I know how to get it manually for non-kernel builds. I’ve tried to do the same for the kernel builds. Is it right? of course not.
So the problem is this - I am motivated by finishing. Right now, my motivation is ruined by knowing I have this appalling, awful, hideous lumphammer of work ahead of me. No matter what I do now, I’m not finished - I’m miles from finished.
Because of this, I’m just not getting much done.
So what I’m thinking to do is keep supporting MS, but only with the command line compilers under Windows 7 and gnumake. MS have stopped providing command line compilers, so I’ll be using old software - but liblfds is platform independent. All it really needs is threads, atomic ops and a bare C compiler - so that shouldn’t be an issue.
This also means I’ll keep supporting WDK 7.1, the obsolete WDK for Windows XP, as this is command line based. It’s crude as hell, but it basically has something like a makefile, so I configure it as I would under Linux - I adjust a file by typing, and it’s easy.
I am thinking that now and then I will construct the solution files for MSVC and release them - i.e. port a build to MSVC by providing the build files - but I’m not going to do this for every release, as I have done till now.
The benchmark code is coming along nicely.
I’ve added a btree benchmark - lock-free beats everything else by 5x to 10x, ATM, although I think I can improve the non-lock-free performance by 2x.
The queue benchmark is also back up.
I was able to perform an experiment I’ve had in my mind for a bit - there is a CAS at the end of enqueue, which updates the enqueue pointer to point at the newly enqueued element. It’s “optional”, in that the dequeue will move the enqueue pointer if it’s out of place. So I took it out and re-ran the benchmark.
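For readers who don't have the queue in their heads, here is a simplified sketch of where that CAS sits, in the style of a Michael and Scott queue written with C11 atomics. It is an illustration under heavy assumptions and not the liblfds queue: there is no ABA protection and no safe memory reclamation (nodes must not be reused while the queue is live), nodes are supplied by the caller, and the names and the `swing_tail` flag are invented so the experiment can be switched on and off.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Simplified sketch with hypothetical names; no ABA counters, no reclamation. */
struct node
{
  _Atomic(struct node *) next;
  int value;
};

struct queue
{
  _Atomic(struct node *) head;  /* dequeue pointer, points at a dummy node */
  _Atomic(struct node *) tail;  /* enqueue pointer */
};

void queue_init( struct queue *q, struct node *dummy )
{
  atomic_init( &dummy->next, NULL );
  atomic_init( &q->head, dummy );
  atomic_init( &q->tail, dummy );
}

void queue_enqueue( struct queue *q, struct node *n, int value, int swing_tail )
{
  struct node *tail, *next;

  n->value = value;
  atomic_store( &n->next, NULL );

  for(;;)
  {
    tail = atomic_load( &q->tail );
    next = atomic_load( &tail->next );

    if( next != NULL )
    {
      /* Enqueue pointer is lagging; help it forward and retry. */
      atomic_compare_exchange_strong( &q->tail, &tail, next );
      continue;
    }

    /* Link the new node after the current last node. */
    if( atomic_compare_exchange_strong( &tail->next, &next, n ) )
      break;
  }

  /* The "optional" CAS: point the enqueue pointer at the new node. */
  if( swing_tail )
    atomic_compare_exchange_strong( &q->tail, &tail, n );
}

/* Returns 1 and writes *value on success, 0 if the queue is empty. */
int queue_dequeue( struct queue *q, int *value )
{
  struct node *head, *tail, *next;

  for(;;)
  {
    head = atomic_load( &q->head );
    tail = atomic_load( &q->tail );
    next = atomic_load( &head->next );

    if( next == NULL )
      return 0;

    if( head == tail )
    {
      /* Enqueue pointer is out of place; dequeue moves it. */
      atomic_compare_exchange_strong( &q->tail, &tail, next );
      continue;
    }

    *value = next->value;

    if( atomic_compare_exchange_strong( &q->head, &head, next ) )
      return 1;
  }
}
```

With `swing_tail` set to 0 the queue still behaves correctly, since the next enqueue or dequeue to find the pointer lagging will advance it; the difference is only in who pays for that work.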
Performance dropped by about 20%.
Benchmark app coming along nicely :-)