r6:Function:abstraction dcas

From liblfds.org
Revision as of 14:07, 4 January 2015 by Admin (talk | contribs) (1 revision imported)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Source Files

/src/abstraction/abstraction_dcas.c
/inc/liblfds.h

Prototype

INLINE unsigned char abstraction_dcas( volatile atom_t *destination, atom_t *exchange, atom_t *compare );

Parameters

volatile atom_t *destination

Address of the destination.

atom_t *exchange

Address of the value of the placed into destination if destination and compare are equal.

atom_t *compare

Address of the value to compare to destination; the value pointed to by compare must be over-written, after the compare, with the original value of destination.

Return Value

The return value is 0 if the swap did not occur, 1 if the swap occurred.

Notes

The original value of destination must be placed back into compare.

To solve the ABA problem, liblfds uses paired pointer-counters. This is why a contigious double-word compare-and-swap is required, rather than just a normal one-word compare-and-swap. Accordingly, the arguments to this function are pointers to a double-word (e.g. two contigious pointers, where the second is being used as a counter) and this function needs to perform a double-word compare-and-swap on those double-word pointers.

The prototype syntax could indicate that the arguments are in fact pointers to arrays. I have chosen not to do this since C pointer to array notion is clumsy and there is almost no benefit in using it. There is one issue which arises as a result of this choice; if you are to implement abstraction_dcas hand crafting assembly, remember that your argument is an array of two elements. This can matter when for example indicating to the compiler in the assembly syntax which memory locations are clobbered. It is not enough just to say *destination is clobbered; you also need to specify *(destination+1). This would automatically be taken care of it the prototype indicated pointer to array.

In principle, it is entirely reasonable to port liblfds to platforms which provide load-link/conditional-store, for here, although only single-word compare-and-swap is available, it is entirely adequate, since the stronger atomic semantics of load-link/conditional-store mean the counters are not required. On such a platform, the arguments will still be paired pointer-counters, but only the pointer (the first word) needs to be compared and swapped. This is of course inefficient, particularly in cache useage, but where I do not have yet available to me a load-link/condition-store platform to which to port, I can't really modify the code to avoid this overhead.

I have however not yet done such a port so there is no documentation or example abstraction layer and I'm sure that making such a port will lead to not insignificant changes in the code (mainly I think to get rid of the counter on such platforms).

Examples

This function is probably the most troublesome to implement since a simple-to-use platform provided function may not exist. Windows does provide a function for this, both for 32 bit CPUs and 64 bit CPUs (_InterlockedCompareExchange64 and _InterlockedCompareExchange128, respectively) but GCC does not.

Implementation under Windows is as such straightforward, but implementation on GCC for x86 and x64 requires handcrafting inline assembly to call the appropriate machine code instruction, CMPXCH8B or CMPXCH16B, respectively.

32-bit Windows (user-mode and kernel-mode) on x86 or x64

For 32 bit Windows, there is a compiler instrinsic _InterlockedCompareExchange64, which has the following prototype;

__int64 _InterlockedCompareExchange64( __int64 volatile * Destination, __int64 Exchange, __int64 Comperand );

Since this is a compiler instrinsic, it can be used in both user-mode and kernel-mode. As such, the implementation of abstraction_dcas on 32 bit Windows (user-mode or kernel-mode) looks like this;

#if (!defined _WIN64 && defined _WIN32 && defined _MSC_VER)

  /* TRD : 32 bit Windows on any CPU with the Microsoft C compiler

           (!defined _WIN64 && defined _WIN32)  indicates 32 bit Windows
           _MSC_VER                             indicates Microsoft C compiler
  */

  INLINE unsigned char abstraction_dcas( volatile atom_t *destination, atom_t *exchange, atom_t *compare )
  {
    __int64
      original_compare;

    *(__int64 *) &original_compare = *(__int64 *) compare;

    *(__int64 *) compare = _InterlockedCompareExchange64( (volatile __int64 *) destination, *(__int64 *) exchange, *(__int64 *) compare );

    return( *(__int64 *) compare == *(__int64 *) &original_compare );
  }

#endif

This implementation was chosen as an example because it highlights a particular issue; that the abstraction function is expected to return 1 if the compare occurred and 0 if it did not and also to return the original value of the destination by writing it to the value pointed at by compare.

This is a little awkward with _InterlockedCompareExchange64, because this function doesn't directly tell you if the swap occurred or not. What it does is return the original value of destination. The caller then needs to check this original value of destination with his compare value. If they match, the caller can then know, a priori, that the swap must have occurred.

GCC on x64

There is no direct support for a contigious double-word compare-and-swap on GCC. As such, we must hand craft inline assembly to call the appropriate machine code instruction.

As such, the implementation of abstraction_dcas on GCC on x64 looks like this;

#if (defined __x86_64__ && __GNUC__)

  /* TRD : any OS on x64 with GCC

           __x86_64__  indicates x64
           __GNUC__    indicates GCC
  */

  INLINE unsigned char abstraction_dcas( volatile atom_t *destination, atom_t *exchange, atom_t *compare )
  {
    unsigned char
      cas_result;

    __asm__ __volatile__
    (
      "lock;"           // make cmpxchg16b atomic
      "cmpxchg16b %0;"  // cmpxchg16b sets ZF on success
      "setz       %3;"  // if ZF set, set cas_result to 1

      // output
      : "+m" (*(volatile atom_t (*)[2]) destination), "+a" (*compare), "+d" (*(compare+1)), "=q" (cas_result)

      // input
      : "b" (*exchange), "c" (*(exchange+1))

      // clobbered
      : "cc"
    );

    return( cas_result );
  }

#endif

The code used to create inline assembly is of course entirely platform dependent. Note the cast of *destination to *(volatile atom_t (*)[2]) destination, as per comments in the Notes section, due to destination in fact pointing to an array of two elements.

See Also