Queue (bounded, many producer, many consumer)

From liblfds.org
Jump to navigation Jump to search

Source Files

└───liblfds710
    ├───inc
    │   └───liblfds710
    │           lfds710_queue_bmm.h
    └───src
        └───lfds710_queue
                lfds710_queue_bmm_cleanup.c
                lfds710_queue_bmm_dequeue.c
                lfds710_queue_bmm_enqueue.c
                lfds710_queue_bmm_init.c
                lfds710_queue_bmm_internal.h
                lfds710_queue_bmm_query.c

Enums

enum lfds710_queue_bmm_query;

Opaque Structures

struct lfds710_queue_bmm_element;
struct lfds710_queue_bmm_state;

Macros

#define LFDS710_QUEUE_BMM_GET_USER_STATE_FROM_STATE( queue_bmm_state )

Prototypes

void lfds710_queue_init_valid_on_current_logical_core( struct lfds710_queue_bmm_state *qumms,
                                                       struct lfds710_queue_bmm_element *element_array,
                                                       lfds710_pal_uint_t number_elements,
                                                       void *user_state );

void lfds710_queue_bmm_cleanup( struct lfds710_queue_bmm_state *qbmms,
                                void (*element_dequeue_callback)(struct lfds710_queue_umm_state *qumms,
                                                                 void *key,
                                                                 void *value) );

void lfds710_queue_bmm_enqueue( struct lfds710_queue_bmm_state *qbmms, void *key, void *value );

int lfds710_queue_bmm_dequeue( struct lfds710_queue_bmm_state *qbmms, void **key, void **value );

void lfds710_queue_bmm_query( struct lfds710_queue_bmm_state *qbmms,
                              enum lfds710_queue_umm_query query_type,
                              void *query_input,
                              void *query_output );

Overview

This data structure implements a bounded, many-producer, many-consumer queue and internally implements exponential backoff to help deal with high load and so improve scalability.

The implementation performs no allocations. The user is responsible for all allocations (and deallocations), where these allocations are passed into the API functions, which then use them. As such, allocations can be on the stack, on the heap, or as can sometimes be the the case in embedded systems, allocated with fixed addresses at compile time from a fixed global store. Allocations can also be shared memory, but in this case, the virtual addresses used must be the same in all processes.

General usage is that the user calls lfds710_queue_bmm_init_valid_on_current_logical_core, passing in an array of struct lfds710_queue_bmm_elements which form the store for the queue, to initialize a struct lfds710_queue_bmm_state, and then calls lfds710_queue_bmm_enqueue and lfds710_queue_bmm_dequeue to enqueue and dequeue struct lfds710_queue_bmm_elements.

A queue element provides the ability to store a key and a value, both of which are of type void *. The key is not used in any way by the queue (and of course the value is neither), rather, it is available as a convenience for the user, for situations where data is being transferred between different types of data structures, where some of these data structures do support a meaningful key.

The state and element structures are both public, present in the lfds710_queue_bmm.h header file, so that users can embed them in their own structures (and where necessary pass them to sizeof).

Lock-free Specific Behaviour

The number of elements passed to the init function must be a positive integer power of two, i.e. two, four, eight, sixteen, etc.

The state initialization function, lfds710_queue_bmm_init_valid_on_current_logical_core, as the same suggests, initializes the state structure but that initialization is only valid on the current logical core. For the initialization to be valid on other logical cores (i.e. other threads where they are running on other logical cores) those other threads need to call the long-windedly named macro LFDS710_MISC_MAKE_VALID_ON_CURRENT_LOGICAL_CORE_INITS_COMPLETED_BEFORE_NOW_ON_ANY_OTHER_LOGICAL_CORE, which will do that which its name suggests.

In general, in liblfds, the data structures are written such that by the time one of their functions returns, the operation performed by that function (enqueue, push, pop, etc) is both complete and visible to all other logical cores in the system. However, this data structure is not like this; when its functions return, they guarantee the operation has occurred and is visible to the current logical core, but they do not guarantee that the operation is yet visible to all other logical cores in the system, and in fact there is no actual guarantee that it will ever be visible to other logical cores in the system, although in practise it's a bit like breathing - there's no actual guarantee you'll take your next breath, but usually, you always do :-)

In the spirit of Frankie Goes to Hollywood, these data structures are known as 'relaxed', because they just don't do it :-)

As such, it can be for example that, starting with a full queue, one thread fully empties the queue, but other threads for some brief time (tens of microseconds, typically) will still see the queue as full.

This can make writing code using this API awkward, but the benefit is performance. All relaxed data structures go like blazes.

It must be understood though that this is not actually a lock-free data structure. The term lock-free is actually a specific technical term, with a particular meaning, but this is not how the term is used generally in the computing community - there, rather, it simply means "not using locking mechaisms, such as mutexes and spinlocks".

The specific technical meaning is that when multiple threads are performing operations an instance of the data structure, any given single thread will be able to complete whatever operation it issues, regardless of the state of the other threads - for example, if they have been idled by the operating system. Lock-free actually means any given thread cannot be locked by other threads.

If we consider say the freelist, this is lock-free; even if we had a million threads, and they were in every possible variation of performing an operation on the freelist, and they were idled, then any thread which continues working (or any new thread) would be able to perform push or pop operations just fine.

This however is not true of this bounded queue. There is in fact a very brief window during every enqueue and dequeue operation - a few instructions wide - where if the thread is idled, no other thread can progress.

This is unfortunate - it is in fact an inherent property of using an array as the backing store. All array-backed data structures display this problem. However, as mentioned, the window is extremely brief and as such, in practise, it is not an issue.

White Paper

The code is written from scratch for liblfds, but the idea which has been implemented comes from Dmitry Vyukov, from his 1024cores.net site.

License

Dmitry Vyukov granted email permission for the idea to be used and implemented in liblfds, under the MIT license.

Example

#include <stdio.h>
#include <string.h>
#include "liblfds710.h"

struct test_data
{
  char
    name[64];
};

int main()
{
  struct lfds710_queue_bmm_element
    qbmme[8]; // TRD : must be a positive integer power of 2 (2, 4, 8, 16, etc)

  struct lfds710_queue_bmm_state
    qbmms;

  struct test_data
    td,
    *temp_td;

  lfds710_queue_bmm_init_valid_on_current_logical_core( &qbmms, qbmme, 8, NULL );

  strcpy( td.name, "Madge The Skutter" );

  lfds710_queue_bmm_enqueue( *qbmms, NULL, &td );

  lfds710_queue_bmm_dequeue( &qbmms, NULL, &temp_td );

  printf( "skutter name = %s\n", temp_td->name );

  lfds710_queue_bmm_cleanup( &qbmms, NULL );

  return( EXIT_SUCCESS );
}

See Also