Michael - Hazard Pointers; Safe Memory Reclaimation for Lock-Free Objects
Algorithm Description
There are two fundamental difficulties involved in lock-free algorithms. The first is the ABA problem, which was originally solved by using pointer-counter pairs, where that solution has the drawback of requiring contigious double-word compare-and-swap. The second is that it is often the case that an algorithm will need to dereference a pointer when that pointer could be pointing to memory which has been deallocated, where the solution is to use a freelist to provide elements to the algorithm, where the drawback is that the number of elements the algorithm can use is provided at initialization time and all those elements are pre-allocated (e.g. free() cannot be used, so you must preallocate).
Safe Memory Reclaimation is the general term for providing a solution to the second problem. There are a couple of different techniques for achieving safe memory reclaimation; hazard pointers is one such technique.
Note that solving the second problem inherently provides a solution to the first problem. This needs explanation.
Consider a singly linked list which does not use pointer-counter pairs and so is subject to ABA. A thread comes to insert a new element, as the new head; this operation is exactly the same as pushing onto the top of a Treiber stack (the simple lock-free stack using CAS). The thread sets the next pointer of the new element to the next pointer of the head element; the thread then checks in the compare-and-swap that the head element is still the same (and in doing so makes the new element the head element). The problem comes when after setting the next pointer of the head element, another thread deletes several elements from the head of the list (returning them to the freelist) but then creates new elements at the head, where the freelist has provided the same memory addresses for the new elements, so that head ends up with the same address as before, but with a different next pointer.
However, if we are using safe memory reclaimation, things are different.
First, we are not using a freelist, but rather, malloc() and free(). When a thread deletes elements from the list, it will free them, not return them to a freelist. (It will not free them immediately; but only when it knows it is safe to do so - how it knows is what this explanation is about). We could however still fall to ABA, since the implementation of malloc() can, when the thread allocates new elements to insert at the head of the list, return the same memory addresses as before (perhaps using an internal freelist for performance reasons).
Second, and this is the important bit, the free operation on deleted elements can only occur when we know that no other threads have pointers to them - and in this case, the first thread will have a pointer to the original next element of head. As such, malloc cannot return previously in-use addresses, because they have not yet been freed.
We know when no thread retains a pointer to an element we want to free because we keep track, on a per-thread basis of the pointers threads are using to manipulate pointers in the data structure; and when we come to free, we check it's safe.
To be more exact, what happens is that when a thread comes to delete an element from a data structure, that element is placed into a delete candidate list (each thread has its own delete candidate list). This means that although threads may have pointers to this memory, no new pointers can come to point at this memory; now all we need is somehow to know when no threads continue to hold pointers to this memory.
There is a single global data structure, where hazard pointers are stored. As such, we have a set of pointers which are the full set of pointers that could be pointing at elements in delete candidate lists (no pointers in the data structure itself can be, since the element in question has been removed from the data structure).
When there are enough elements in its delete candidate list (so we don't have too much overhead), a thread then makes a hash out of all the hazard pointers and checks each delete candidate against that list. If the delete candidate isn't found, no thread is continuing to point at that element (and its not in the data structure any more), so we're free to free() it!
We see two very important benefits here; first, we can use malloc() and free(). We're not limited to the up-front allocations of freelists. Secondly, we only need single-word compare-and-swap, since ABA counters are no longer required. This permits implementation on platforms (SPARC, IA64, etc) which lack contigious double-word compare-and-swap.
Implementation Considerations
The main question seems to be the design of the global data structure which holds the hazard pointers.
Limitations
Freelists
Freelists remove the need to in-line memory allocation, by allowing pre-allocation and storing the pre-allocated elements in a list. As such, when a thread is done with an element obtained from a freelist, it will wish to push the element back to the freelist, rather than freeing the element.
What this means is that the delete candidate list actually is a push candidate list. This means that in the hazard pointer implementation, rather than always calling free on delete candidate list element which is known to be safe, we actually need a user provided function.
It also means that for a freelist to provide n elements, we actually need (n + delete candidate list purge threshold * number of threads) - which is to say, since each thread can 'buffer' up to the number of elements it needs in its delete candidate list, before purging that list, the freelist can end up buffering up a lot of elements. To ensure n elements in the freelist, you need n elements plus the maximum size of the delete candidate list per thread.
Singly Linked Lists
Element re-use is only safe when the element passes through the delete candidate list. If, for example, we had a singly linked list, ABA can happen if we merely unlink elements and then relink them to the list. Imagine we have a thread which is about to link a new element as the new head of the list. The thread sets the next pointer of the new element to the next pointer of the current head, then does CAS on the head with the new element; if the head is still the same, the new element becomes the head.
However, if immediately before the CAS, another thread unlinks a number of elements from the head, then relinks them, but in a different order, although with the head remaining the same, then we see the next pointer of head has changed, but the first threads CAS will still succeed.
Safe memory reclaimation didn't save us, because the unlinked elements didn't go through the delete candidate list. (Indeed, it would really be an unlinking candidates list, since we wouldn't be freeing the elements, but relinking them to the list. This wouldn't be practical, since it would mean an arbitrary length wait between the unlink and relink, where we wait for enough elements to fill the unlink list that it's efficient to process it and also for all other threads to no longer have pointers at the unlinked elements).
So the singly linked list only really supports adding and removing elements, not moving them around. If you want to move them, you need to remove them, duplicate them and then add the duplicate to the new location.