Well wadda ya know - I’ve actually done some real actual more work today - about six or so hours. I’ve been working on the first real SMR test, making it work and getting a feel for how the new SMR API is to use. Found a design flaw in how SMR data structure cleanup was arranged and fixed it.
The basic use principles of the new SMR API seem to simpify out to something like; think only about your current thread, take it as read that anything you put into SMR won’t come out till you call SMR release processing, and remember that any given call to SMR release processing may release zero elements, but that sooner or later, calling SMR release processing will release elements, so you can often find the way to write code is to have your operation in a loop, which exits when the operation is successful, where in the loop you call SMR release processing if the operation failed.
So, the freelist rapid push/pop test, using SMR, revealed an SMR bug, which in turn revealed the changes I made to the SMR function for advancing the generation counter, such that that function became multi-threaded (rather than being a critical section bounded by CAS flag) were broken.
What I’ve come up with now instead is the idea that there is a “setting” phase, where threads set their SMR state flags, and then when a thread sees that the generation counter can be advanced, we then have a “clearing” phase, where threads calling the generation-advance function act not to check to see if the generation counter can be advanced, but rather they check to see if all the per-thread flags have been cleared, so that we’re then in the correct state for threads to start the work again of indicating the SMR state so we can again see if the generation counter can be advanced.
The whole point of all this is to ensure any single thread entering the check function can make forward progress - before, this was not the case. I think the design is sound, although I’ve not yet implemented, because…
…I’ve realised in the course of this there is a design flaw, and I’ve not yet resolved it. The handle idle threads, I have it so that when a thread enters a lock-free section, it sets a flag - “LOCK_FREE_IN_PROGRESS” - and lowers that flag on exit. Design flaw is that the code only uses a store barrier, so there’s no guaranteee that flag is visible to other threads (who have issued a load barrier) until an atomic operation is performed - which typically occurs in the act of performing the lock-free work being done, i.e. after we’ve read in and are using sensitive memory addresses.
Any real SMR has to support idle threads, so I have to think of a solution.
Two steps forward, one step backwards, the mantra of all lock-free design work :-)
Made the new SMR compile.
There’s an outstanding design issue, which is how to handle new threads, since they will not have the correct generation count.
I know basically what I’m going to do - there will be a flag, one of the bit in the generation count / flags word, which indicates a new thread, and it’ll get special handling.
So, I wanted to write down the new SMR design.
First, an SMR state is instantiated. As usual, you can have as many as you want, they’re independent. Each thread which is using an SMR state has its own SMR thread state. That SMR thread state is associated with a single SMR state (by a function call).
Threads typically use lock-free data structure APIs, to perform their lock-free work which is using SMR. When such an API function is called, the thread passes in its SMR thread state (this contains a pointer to the SMR state it is associated with).
When a data structure begins some lock-free work (popping from a freelist, etc) it calls a macro to enter a lock-free section, and when it’s done, it calls another macro to leave the lock-free section. These macros set state in the SMR thread state. As such, we know when a thread is in a lock-free section (the entry flag is raised, the exit flag is not) and we know if the thread has been idle since we last looked at it (both flags have not changed).
The SMR API functions which perform the SMR work are manually called by the user; they are not called behind the scenes.
There are two SMR API functions for performing work. The first tried to advance the generation counter (more on this below); the second releases all releasable allocations for the calling thread.
The SMR state maintains a generation counter. This begins at 0 and wraps (unsigned int; we do the necessary math to handle when we wrap).
When an allocation is submitted to SMR (which is to say, has exited a data structure and now we need to know no threads are pointing to it), the current generation counter is recorded.
When the user calls the function to try to advance the generation counter, the function examines the SMR thread state for every thread associated with that SMR state.
If since the last examiniation every thread has either been idle, or has exited one or more lock-free sections, then we know that every allocation with an earlier generation count cannot be pointed at by any thread - as such, we advance the generation counter.
When the user has a thread call the other SMR API function, it scans the allocations submitted by the current thread to SMR and if their generation count is older than the current count, releases them.
That’s the broad overview, and it hasn’t changed.
What’s new though is that I’ve made changes to make the first SMR API functon (advancing the generation counter). This function needs to allow foward progress for any thread calling it - i.e. no matter what other threads are in this function, no matter what state it is in, when a new thread comes to it, it needs to be able to complete the operation and advance the generation counter.
To achieve this the function now has two stages; setting and clearing. In the first stage, setting, the work being done is scanning the SMR thread states to see if they have all been idle or exited a lock-free section. When this is found to be so, the generation counter is then advanced, and the function moves to the second stage, clearing, where it revisits the SMR thread states and updates the information they contain (lowers the flag for having exited a lock-free section, etc).
Part of implementing this means that the generation count and the flags are now in the same word, so they can be atomically set together; for when in the settings phase we check an SMR thread state, we need to be sure we are checking the flags for the expected (and so correct) generation in the SMR thread state.