Memory Barriers (part 2)
Introduction
After reading the first article, the question which usually arises after a couple of days is something like this - "how does the load barrier help? okay, it's issued, but anything could happen immediately after that but before the atomic operation is issued, so surely we're back where we started?"
The Short Answer
So long as the load barrier occurs after the atomic operation which honoured the store barrier, then although of course anything can happen instantly after the load barrier is issued, we will however still know we have seen everything which occured up to the point of the atomic operation which honoured the store barrier.
What you then have to keep in mind is that implicit in all code which uses lock-free operations is a control of the order in which events occur, so that we ensure this ordering of load and store barriers is honoured. Consider a lock-free linked-list; for a thread to be able to read an element from the list, an element must be in the list - otherwise, how else can the reading thread even know of the existence of that element? So we see that inherent in the linked-list code is control of the order of events - an element must be added to the list before it can be read while in the list.
So if we consider an element, first it must be added, so the store barrier is issued (to ensure the void pointer of user data reaches memory before our adjustments to the linked list pointers) and the atomic operation necessary to add the element occurs; and then, when we come to read, a load barrier is issued before we try to read the element - so when we read, we know we are seeing all elements which have been added to the list prior to the issuing of the load barrier. Of course, more elements could be added the instant after the load barrier is issued - but we won't see these elements (or, rather, we're not guaranteed to see them - we might get lucky and they might end up being visible anyway) - but that's okay; we will see everything up to the point the load barrier was issued - and that's enough because in general, when programming, from the users point of view, the operations performed by a function are considered to have fully occurred by the time the function returns. So when a user adds a new element, he expects the element to be added by the time the linked list "add new element" function returns. Similarly, when reading the list, the user expects the function to see everything that has happened up to the point the "read element" function is called. These expectations are honoured; if the "add" functions have returned, their effects will be seen by any "read" function which occurs afterwards.
If however an "add" function is called after the load barrier is issued, then the "add" function must be being called after the "read" function has been called, and so it's okay if the results of that "add" function are not seen.
The memory modified by atomic operations does not need barriers, as atomic operations always read and write directly from memory (or at least out to the cache coherency protocol, which amounts to the say thing). They really know the truth, always.
It is the same for all lock-free data structures. A key part of their design always is to ensure the order of events is correct, so that data is always correctly read.