Difference between pages "r7.1.1:Usage Guide (libbenchmark)" and "r7.1.1:Usage Guide (liblfds)"

Latest revision as of 09:44, 23 February 2019

Introduction

This page describes how to use the liblfds library, and then covers the novel and peculiar issues which originate in the lock-free nature of the data structures in the library.

Wherever possible such issues have been hidden from the user, but some simply cannot be hidden, and so the user has to be aware of them.

Library Initialization and Cleanup

No library initialization or cleanup is required.

Usage

To use liblfds, include the header file liblfds711.h and link as normal to the library in your build.

Novel and Peculiar Issues

Memory Allocation

The liblfds library performs no memory allocation or deallocation. Accordingly, there are no new and delete functions, but rather init and cleanup.

The user is responsible for all allocation and all deallocation. As such, allocations can be from the heap or from the stack, and from user-mode or from the kernel; the library just uses what you give it, and neither knows nor cares whether it has been given virtual or physical addresses. Allocations can be in shared memory, but note the virtual memory ranges must be the same in all processes, because liblfds uses addresses directly rather than offsets. Being able to use shared memory is particularly important on Windows, which lacks a high performance cross-process lock; the data structures in liblfds, when used with shared memory, provide a process and thread safe cross-process communication channel (but they do not provide synchronization, so the library cannot signal the reader as to when to read).
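A minimal sketch of what this caller-owns-the-memory convention looks like in practice. The names example_counter_state, example_counter_init and example_counter_cleanup are hypothetical, chosen only for illustration; they are not part of the liblfds API, and the sketch uses the Standard Library (malloc, assert) purely for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical state structure; the caller decides where it lives. */
struct example_counter_state
{
  unsigned long value;
  void *user_state;
};

/* init only writes into memory the caller already owns; nothing is allocated. */
void example_counter_init( struct example_counter_state *cs, void *user_state )
{
  assert( cs != NULL );
  cs->value = 0;
  cs->user_state = user_state;
}

/* cleanup only resets the state; releasing the memory is the caller's job. */
void example_counter_cleanup( struct example_counter_state *cs )
{
  assert( cs != NULL );
  cs->value = 0;
  cs->user_state = NULL;
}

/* The same init and cleanup calls work for stack and heap storage alike.
   Returns 1 on success, 0 if the heap allocation failed. */
int example_use_stack_and_heap( void )
{
  struct example_counter_state on_stack;
  struct example_counter_state *on_heap = malloc( sizeof *on_heap );

  if( on_heap == NULL )
    return 0;

  example_counter_init( &on_stack, NULL );
  example_counter_init( on_heap, NULL );

  example_counter_cleanup( on_heap );
  example_counter_cleanup( &on_stack );

  free( on_heap );   /* the caller allocated it, so the caller frees it */
  return 1;
}
```

Because init and cleanup never allocate or free, the same pair of calls would serve equally for a state placed in a shared memory mapping or in kernel memory.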

Memory Deallocation

Any data structure element which has at any time been present in a lock-free data structure must not be passed to free until the data structure in question is no longer in use and has had its cleanup function called.

As such, typical usage is for data structure elements to be supplied from (and returned to) a lock-free freelist.

There is a single exception to this, which is the unbounded, many producer, many consumer queue. It is safe to deallocate elements which have emerged from this data structure.
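For the general case, the life cycle of an element (taken from a pool, used, handed back for reuse rather than freed) can be sketched as below. This is a deliberately simple, single-threaded illustration and is NOT lock-free; all names (example_pool, example_element and so on) are hypothetical and are not the liblfds freelist API.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_POOL_SIZE 4

struct example_element
{
  struct example_element *next;
  int payload;
};

struct example_pool
{
  struct example_element *top;
};

/* Thread every element of a caller-supplied array onto the pool. */
void example_pool_init( struct example_pool *p, struct example_element *store, size_t count )
{
  size_t i;

  assert( p != NULL && store != NULL );

  p->top = NULL;

  for( i = 0 ; i < count ; i++ )
  {
    store[i].next = p->top;
    p->top = &store[i];
  }
}

/* Take an element for use in a data structure; NULL when the pool is empty. */
struct example_element *example_pool_pop( struct example_pool *p )
{
  struct example_element *e = p->top;

  if( e != NULL )
    p->top = e->next;

  return e;
}

/* Hand an element back for reuse instead of freeing it. */
void example_pool_push( struct example_pool *p, struct example_element *e )
{
  e->next = p->top;
  p->top = e;
}

/* Usage: an element handed back to the pool is the next one handed out again.
   Returns 1 when the element was reused. */
int example_pool_roundtrip( void )
{
  static struct example_element store[EXAMPLE_POOL_SIZE];
  struct example_pool pool;
  struct example_element *first, *second;

  example_pool_init( &pool, store, EXAMPLE_POOL_SIZE );

  first = example_pool_pop( &pool );
  example_pool_push( &pool, first );
  second = example_pool_pop( &pool );

  return first != NULL && first == second;
}
```

The backing array is released (or goes out of scope) only once every data structure that has held its elements has been cleaned up, which is the rule stated above.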

Data Structure Initialization

Passing a data structure state to its init function initializes that state, but that initialization is valid only on the logical core upon which it occurs.

The macro LFDS711_MISC_MAKE_VALID_ON_CURRENT_LOGICAL_CORE_INITS_COMPLETED_BEFORE_NOW_ON_ANY_OTHER_LOGICAL_CORE is used to make the initialization valid on other logical cores; it makes the initialization valid upon, and only upon, the logical core which calls the macro.

Expected use is that one thread will initialize a data structure and pass a pointer to its state to the other threads, each of which will then call LFDS711_MISC_MAKE_VALID_ON_CURRENT_LOGICAL_CORE_INITS_COMPLETED_BEFORE_NOW_ON_ANY_OTHER_LOGICAL_CORE before using the data structure.
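The pattern of "initialize on one core, then make valid on each other core" is analogous to release/acquire publication in C11 atomics. The sketch below shows that analogy only; it is not how liblfds implements its macro, liblfds does not depend on stdatomic.h, and every name in it (example_state, example_init_and_publish, example_make_usable_here) is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct example_state
{
  int ready_value;
};

/* Shared slot through which the pointer to the initialized state is passed. */
static _Atomic(struct example_state *) example_published = NULL;

/* Initializing thread: write the state, then publish the pointer with
   release ordering so the writes above cannot be reordered after it. */
void example_init_and_publish( struct example_state *s )
{
  assert( s != NULL );
  s->ready_value = 42;
  atomic_store_explicit( &example_published, s, memory_order_release );
}

/* Every other thread: an acquire load before first use, playing the role
   of the "make valid on the current logical core" step. */
struct example_state *example_make_usable_here( void )
{
  return atomic_load_explicit( &example_published, memory_order_acquire );
}

/* Single-threaded walk through both steps; returns 1 when the published
   state is visible with its initialized contents. */
int example_publish_demo( void )
{
  static struct example_state s;
  struct example_state *p;

  example_init_and_publish( &s );
  p = example_make_usable_here();

  return p == &s && p->ready_value == 42;
}
```

In a real program example_init_and_publish would run on one thread and example_make_usable_here on each of the others, once per thread, before that thread touches the state.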

The Standard Library and lfds711_pal_uint_t

The liblfds library is intended for both 32 and 64 bit platforms. As such, there is a need for an unsigned type which is 32 bits long on a 32 bit platform and 64 bits long on a 64 bit platform. The Standard Library is not used, so we can't turn to it for a solution; and in C89 there wasn't really such a type anyway (size_t did in fact behave in this way on Windows and Linux, but semantically size_t means something else, and so it behaves this way only coincidentally).

As such, the liblfds platform abstraction layer typedefs lfds711_pal_uint_t (and a signed equivalent, lfds711_pal_int_t). This is set to be an unsigned integer of the natural length for the platform, i.e. the length of a processor register: 32 bits on a 32 bit CPU and 64 bits on a 64 bit CPU.
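A minimal sketch of how a porting layer might select such a type from compiler-predefined macros. The name example_pal_uint_t is hypothetical, the set of platforms covered is an assumption made for illustration, and the check assumes that pointer width equals register width (true on common 32 and 64 bit targets, but not universally). The headers are included only for the self-check and are not needed by the typedef itself.

```c
#include <assert.h>
#include <limits.h>

#if defined( _WIN64 )
  typedef unsigned __int64 example_pal_uint_t;   /* 64 bit Windows: long is only 32 bits */
#elif defined( __LP64__ )
  typedef unsigned long int example_pal_uint_t;  /* LP64 platforms, such as 64 bit Linux */
#else
  typedef unsigned int example_pal_uint_t;       /* assumed: remaining targets are 32 bit */
#endif

/* Sanity check a port might run once: the type should be as wide as a pointer. */
void example_pal_uint_check( void )
{
  assert( sizeof(example_pal_uint_t) == sizeof(void *) );
}

/* Width of the selected type in bits. */
unsigned int example_pal_uint_bits( void )
{
  return (unsigned int) ( sizeof(example_pal_uint_t) * CHAR_BIT );
}
```

Selecting the type per platform, rather than borrowing size_t, keeps the meaning explicit: the type is defined to be register-width, instead of merely happening to be.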

Exclusive Reservation Granule (ARM, POWERPC)

On ARM and POWERPC there is a define, LFDS711_PAL_ATOMIC_ISOLATION_IN_BYTES, in the liblfds header file lfds711_porting_abstraction_layer_processor.h, which SHOULD BE SET CORRECTLY. The value in the header file is necessarily the worst-case (longest) value, which in the case of ARM is 2048 bytes, and this has the effect of making many of the library's structures HUGE.

There are two approaches in hardware, in processors, to implementing atomic operations. The first approach is compare-and-swap (CAS), the second approach is load-linked/store-conditional (LL/SC).

Atomic operations involve writing a variable to memory. CAS implements atomicity by locking the cache-line containing the variable. LL/SC implements atomicity by loading the variable into a register, where anything can be done with it, but watching the memory location it comes from until the variable is stored again; if in the meantime another write occurred to that memory location, the store aborts.
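From C, both hardware approaches are typically reached through the same compare-exchange call, which the compiler lowers to a CAS instruction or to an LL/SC loop depending on the target. The sketch below uses C11 stdatomic.h for illustration (liblfds itself does not rely on it), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Atomically add delta to *target using a compare-exchange retry loop.
   Returns the value that was stored. */
unsigned long example_atomic_add( _Atomic unsigned long *target, unsigned long delta )
{
  unsigned long observed;

  assert( target != NULL );

  observed = atomic_load_explicit( target, memory_order_relaxed );

  /* On failure the call writes the current value into observed, so the loop
     simply retries with fresh data. The weak form is allowed to fail
     spuriously, which is how an aborted store-conditional appears from C. */
  while( !atomic_compare_exchange_weak_explicit( target, &observed, observed + delta,
                                                 memory_order_acq_rel, memory_order_relaxed ) )
    ;

  return observed + delta;
}

/* Usage: start at 5, add 3, then add 2; returns the final value. */
unsigned long example_atomic_add_demo( void )
{
  _Atomic unsigned long counter = 5;

  example_atomic_add( &counter, 3 );

  return example_atomic_add( &counter, 2 );
}
```

The retry loop is the part that matters on LL/SC platforms: every write landing in the watched region between the load and the store costs another trip around the loop.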

The granularity of the 'watching' varies a great deal. On some platforms, such as MIPS, only one 'watcher' is available per logical processor. On some platforms, such as ARM and POWERPC, memory is conceptually divided up into pages (which are known as "Exclusive Reservation Granules", "ERG" for short) and if a write occurs to the page which contains the target variable, then the store fails.

On ARM, Exclusive Reservation Granules range in size from 8 to 2048 bytes (always a power of two, though - 8, 16, 32, 64, etc), depending on implementation.

Obviously, for liblfds to always work, the header file has to use the 2048 byte value. However, in all the liblfds structures, each variable which is the target of atomic operations has to be in its own granule, so naturally the larger the granule size, the larger the structure. Some structures have a number of such variables, and so those structures can become very large.
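The effect on structure size can be seen in a small sketch. It uses the C11 _Alignas keyword for self-containment (liblfds uses its own platform abstraction macros instead), and all names are hypothetical. With two isolated variables, the structure occupies 4096 bytes at the worst-case granule size and 128 bytes with a 64 byte granule.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_ISOLATION_WORST_CASE 2048   /* largest ERG permitted on ARM */
#define EXAMPLE_ISOLATION_TUNED        64   /* e.g. an ERG found by measurement */

/* Each atomically accessed variable starts its own granule. */
struct example_state_worst_case
{
  _Alignas( EXAMPLE_ISOLATION_WORST_CASE ) unsigned long head;
  _Alignas( EXAMPLE_ISOLATION_WORST_CASE ) unsigned long tail;
};

struct example_state_tuned
{
  _Alignas( EXAMPLE_ISOLATION_TUNED ) unsigned long head;
  _Alignas( EXAMPLE_ISOLATION_TUNED ) unsigned long tail;
};

/* Returns 1 when two member offsets are at least 'isolation' bytes apart,
   i.e. the members cannot share a granule of that size. */
int example_fields_isolated( size_t first_offset, size_t second_offset, size_t isolation )
{
  assert( isolation > 0 && second_offset >= first_offset );

  return ( second_offset - first_offset ) >= isolation;
}
```

A structure with more atomically accessed members grows in proportion, which is why setting the isolation value to the real ERG length matters so much.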

This then leads to the question of finding out or determining the ERG length.

The ERG length can be obtained from the processor, where it is stored in a register, but on ARM this is only possible when the processor is in supervisor mode, so liblfds cannot access this information. The documentation for any given system should, somewhere deeply buried, indicate the ERG length.

There is however another way. The libtest library offers a function libtest_misc_determine_erg which attempts to empirically determine the ERG length. It runs an LL operation on one logical core, then on every other logical core touches memory just inside the largest possible ERG size, and then tries the SC operation; this is repeated with progressively smaller ERG sizes until the operation fails, which indicates the ERG size.

This function can only work on systems which have more than one physical processor (multiple logical processors in one physical processor is not enough). This is because ARM implements per-processor 'local' watchers, which are typically much more relaxed than the 'global' (system-wide, i.e. multiple physical processors) watchers and normally do not throw an error even if a write occurs inside the ERG - i.e. with the local watcher only, it's not possible to make the LL/SC fail, so the code cannot work out the length.

The test binary has an argument, "-e", which runs a test using this function, like so:

test -e

The output will look like this (plus a bunch of fixed explanatory text):

 ERG length in bytes : Number successful LL/SC ops
 =================================================
 4 bytes : 0
 8 bytes : 0
 16 bytes : 0
 32 bytes : 0
 64 bytes : 1024
 128 bytes : 1023
 256 bytes : 1023
 512 bytes : 1024
 1024 bytes : 1024
 2048 bytes : 12

The ERG size is on the left, the number of successful LL/SC ops on the right. Each size is tested 1024 times. The smallest size with 1024, or almost 1024, successful operations is the ERG size. (LL/SC ops can fail naturally due to system activity, so it's expected that sometimes one or two LL/SC operations will fail by themselves and so the total value will be slightly lower.)

We see here the ERG size for this platform is 64 bytes, which is correct - this is a Cortex A7 in a Raspberry Pi 2 Model B.

(That there are 12 successful ops for the 2048 byte size is not in fact understood. There is a 4 byte ERG test as a sanity check, because if it passes, then it is clear the test is not working properly).
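The rule for reading the table (the smallest size with 1024, or almost 1024, successful operations) can be written down as a short sketch. The function names and the tolerance of 8 failures are assumptions made for illustration; the test binary does not necessarily apply any such threshold itself.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_OPS_PER_SIZE 1024
#define EXAMPLE_TOLERANCE       8   /* assumed allowance for naturally failing ops */

/* Returns the smallest size whose success count is within the tolerance of
   the number of attempts, or 0 if no size qualifies. The sizes array must be
   in ascending order. */
unsigned int example_pick_erg( const unsigned int *sizes, const unsigned int *successes, size_t count )
{
  size_t i;

  assert( sizes != NULL && successes != NULL );

  for( i = 0 ; i < count ; i++ )
    if( successes[i] + EXAMPLE_TOLERANCE >= EXAMPLE_OPS_PER_SIZE )
      return sizes[i];

  return 0;
}

/* Applies the rule to the sample output shown above. */
unsigned int example_pick_erg_from_sample( void )
{
  static const unsigned int sizes[]     = { 4, 8, 16, 32,   64,  128,  256,  512, 1024, 2048 };
  static const unsigned int successes[] = { 0, 0,  0,  0, 1024, 1023, 1023, 1024, 1024,   12 };

  return example_pick_erg( sizes, successes, sizeof sizes / sizeof sizes[0] );
}
```

Applied to the sample output, the rule selects 64 bytes. A result of 4 would indicate that the sanity check had passed and so that the test itself was not working.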

@@ Line 1: / Line 1: @@
-{{DISPLAYTITLE:Usage Guide (libbenchmark)}}
+{{DISPLAYTITLE:Usage Guide (liblfds)}}
 ==Introduction==
-This page describes how to use the ''libbenchmark'' library.
+This page describes how to use the ''liblfds'' library, and then covers the novel and peculiar issues which originate in the lock-free nature of the data structures in the library.
-The library implements a great deal of functionality, almost all of which is used and only used by the ''libbenchmark'' library itself.  From the point of view of an external caller to its API, there are only a few functions; a couple to init and run the entire benchmark suite, a few more to handle the results from a benchmark run and one or two miscellanous functions.
+Wherever possible such issues have been hidden from the user, but there are some which simply cannot be hidden and as such the user has to be aware of them.
+==Library Initialization and Cleanup==
+No library initialization or cleanup is required.
 ==Usage==
-To use ''libbenchmark'', include the header file ''libbenchmark.h'' and link as normal to the library in your build.
+To use ''liblfds'', include the header file ''liblfds711.h'' and link as normal to the library in your build.
+==Novel and Peculiar Issues==
+===Memory Allocation===
+The ''liblfds'' library performs no memory allocation or deallocation.  Accordingly, there are no ''new'' and ''delete'' functions, but rather ''init'' and ''cleanup''.
+The user is responsible for all allocation and all deallocation.  As such, allocations can be from the heap or from the stack, or from user-mode or from the kernel; the library itself just uses what you give it, and doesn't know about and so does not differentiate between virtual or physical addresses.  Allocations can be shared memory, but note the virtual memory ranges must be the same in all processes - ''liblfds'' uses addresses directly, rather than using offsets.  Being able to use shared memory is particularly important for Windows, which lacks a high performance cross-process lock; the data structures in ''liblfds'' when used with shared memory provide a process and thread safe cross-process communication channel (but they do not provide synchronization, so the reader cannot be signalled, by the library, as to ''when'' to read).
+===Memory Deallocation===
+Any data structure element which has at any time been present in a lock-free data structure can never be passed to ''free'' until the data structure in question is no longer in use and has had its ''cleanup'' function called.
+As such, typical usage is for data structure elements to be supplied from (and returned to) a lock-free freelist.
+There is a single exception to this, which is the unbounded, many producer, many consumer queue.  It ''is'' safe to deallocate elements which have emerged from this data structure.
+===Data Structure Initialization===
+Passing a data structure state to its ''init'' function initializes that state but that initialization is and is only valid for the logical core upon which it occurs.
+The macro ''[[r7.1.1:LFDS711_MISC_MAKE_VALID_ON_CURRENT_LOGICAL_CORE_INITS_COMPLETED_BEFORE_NOW_ON_ANY_OTHER_LOGICAL_CORE|LFDS711_MISC_MAKE_VALID_ON_CURRENT_LOGICAL_CORE_INITS_COMPLETED_BEFORE_NOW_ON_ANY_OTHER_LOGICAL_CORE]]'' is used to make the initialization valid on other logical cores and it will make the initialization valid upon and only upon the logical core which calls the macro.
+Expected use is that a thread will initialize a data structure, pass a pointer to its state to other threads, all of whom will then call ''LFDS711_MISC_MAKE_VALID_ON_CURRENT_LOGICAL_CORE_INITS_COMPLETED_BEFORE_NOW_ON_ANY_OTHER_LOGICAL_CORE''.
+===The Standard Library and ''lfds711_pal_uint_t''===
+The ''liblfds'' library is intended for both 32 and 64 bit platforms.  As such, there is a need for an unsigned type which is 32 bits long on a 32 bit platform, and 64 bits long on a 64 bit platform - but remember that the Standard Library is not used, so we can't turn to it for a solution (and also that for C89, there wasn't really such a type anyway - ''size_t'' did in fact behave in this way on Windows and Linux, but semantically ''size_t'' means something else, and so it is only co-incidentally behaving in this way).
+As such, ''liblfds'' in the platform abstraction layer typedefs ''lfds711_pal_uint_t'' (and a signed equivalent, ''lfds711_pal_int_t'').  This is set to be an unsigned integer which is the natural length for the platform, i.e. the length of the processor register, 32 bits on a 32 bit CPU and 64 bits on a 64 bit CPU.
+===Exclusive Reservation Granule (ARM, POWERPC)===
+On ARM and POWERPC there is a define in the ''liblfds'' header file ''lfds711_porting_abstraction_layer_processor.h'', ''LFDS711_PAL_ATOMIC_ISOLATION_IN_BYTES'' which '''SHOULD BE SET CORRECTLY''', as the value in the header file is necessarily the worst-case (longest) value, which in the case of ARM is 2048 bytes - which has the effect of making many of the library's structures '''HUGE'''.
+There are two approaches in hardware, in processors, to implementing atomic operations.  The first approach is ''compare-and-swap'' (CAS), the second approach is ''load-linked/store-conditional'' (LL/SC).
+Atomic operations involve writing a variable to memory.  CAS implements atomicity by locking the cache-line containing the variable.  LL/SC implements atomicity by loading the variable into a register, where anything can be done with it, but watching the memory location it comes from until the variable is stored again; if in the meantime another write occurred to that memory location, the store aborts.
+The granularity of the 'watching' varies a great deal.  On some platforms, such as MIPS, only one 'watcher' is available per logical processor.  On some platforms, such as ARM and POWERPC, memory is conceptually divided up into pages (which are known as "Exclusive Reservation Granules", "ERG" for short) and if a write occurs to the page which contains the target variable, then the store fails.
+On ARM, Exclusive Reservation Granules range in size from 8 to 2048 bytes (always a power of two, though - 8, 16, 32, 64, etc), depending on implementation.
-==Dependencies==
+Obviously, for ''liblfds'' to always work, the header file has to use the 2048 byte value - but where in all the ''liblfds'' structures variables which are the target of atomic operations have to be in their own granule, then naturally the larger the granule size, the larger the structure.  Some structures have a number of variables which are the target of atomic operations, and so those structures can become ''very'' large.
-The ''libbenchmark'' libraries depends on the ''libshared'' library and the ''liblfds711'' library.
-==Source Files==
+This then leads to the question of finding out or determining the ERG length.
- └── test_and_benchmark
-     └── libbenchmark
-         ├── inc
-         │   └── libbenchmark
-         │       ├── libbenchmark_benchmarksuite.h
-         │       ├── libbenchmark_enums.h
-         │       ├── libbenchmark_gnuplot.h
-         │       ├── libbenchmark_results.h
-         │       ├── libbenchmark_topology.h
-         │       └── libbenchmark_topology_node.h
-         └── src
-             ├── libbenchmark_benchmarksuite
-             │   ├── libbenchmark_benchmarksuite_cleanup.c
-             │   ├── libbenchmark_benchmarksuite_gnuplot.c
-             │   ├── libbenchmark_benchmarksuite_init.c
-             │   ├── libbenchmark_benchmarksuite_internal.h
-             │   └── libbenchmark_benchmarksuite_run.c
-             ├── libbenchmark_results
-             │   ├── libbenchmark_results_cleanup.c
-             │   ├── libbenchmark_results_get_result.c
-             │   ├── libbenchmark_results_init.c
-             │   └── libbenchmark_results_internal.h
-             ├── libbenchmark_topology
-             │   ├── libbenchmark_topology_cleanup.c
-             │   ├── libbenchmark_topology_init.c
-             │   ├── libbenchmark_topology_internal.h
-             │   ├── libbenchmark_topology_numa.c
-             │   └── libbenchmark_topology_lpsets.c
-             └── libbenchmark_topology_node
-                 ├── libbenchmark_topology_node_cleanup.c
-                 └── libbenchmark_topology_node_init.c
-This is a small subset of the full set of files, and shows only those files used by the publically exposed APIs.
+The ERG length can be obtained from the processor, it's stored in a register, but on ARM this is only possible when the processor is in supervisor mode; so ''liblfds'' cannot access this information.  The documentation for any given system should, somewhere deeply buried, indicate the ERG length.
-==Defines==
+There is however another way.  The ''libtest'' library offers a function ''libtest_misc_determine_erg'' which attempts to empirically determine the ERG length, by running on one logical core an LL operation, then on every other logical core touching memory just inside the largest possible ERG size and then trying the SC operation, repeating this with progressively smaller ERG sizes, until the operation fails, which indicates the ERG size.
-  #define [[r7.1.1:define LIBBENCHMARK_BENCHMARKSUITE_OPTION_DURATION|LIBBENCHMARK_BENCHMARKSUITE_OPTION_DURATION]]
-==Enums==
+This function can only work on systems which have more than one ''physical'' processor (multiple logical processors in one physical processor is not enough).  This is because ARM implements per-processor 'local' watchers, which are typically much more relaxed than the 'global' (system-wide, i.e. multiple physical processors) watchers, and which normally do not throw an error even if a write occurs inside the ERG - i.e. with the local watcher only, it's not possible to make the LL/SC fail, so the code cannot work out the length.
- enum [[r7.1.1:enum libbenchmark_benchmark_id|libbenchmark_benchmark_id]];
- enum [[r7.1.1:enum libbenchmark_datastructure_id|libbenchmark_datastructure_id]];
-  enum [[r7.1.1:enum libbenchmark_lock_id|libbenchmark_lock_id]];
- enum [[r7.1.1:enum libbenchmark_topology_node_type|libbenchmark_topology_node_type]];
- enum [[r7.1.1:enum libbenchmark_topology_numa_mode|libbenchmark_topology_numa_mode]];
-==Opaque Structures==
+The ''test'' binary has an argument, "-e", which runs a test using this function, like so;
- struct [[r7.1.1:struct lfds711_list_aso_element|lfds711_list_aso_element]];
- struct [[r7.1.1:struct lfds711_list_aso_state|lfds711_list_aso_state]];
- struct [[r7.1.1:struct lfds711_list_asu_state|lfds711_list_asu_state]];
- struct [[r7.1.1:struct libbenchmark_gnuplot_options|libbenchmark_gnuplot_options]];
- struct [[r7.1.1:struct libbenchmark_results_state|libbenchmark_results_state]];
- struct [[r7.1.1:struct libbenchmark_benchmarksuite_state|libbenchmark_benchmarksuite_state]];
- struct [[r7.1.1:struct libbenchmark_topology_state|libbenchmark_topology_state]];
- struct [[r7.1.1:struct libbenchmark_topology_node_state|libbenchmark_topology_node_state]];
- struct [[r7.1.1:struct libshared_memory_state|libshared_memory_state]];
-==Macros==
+  test -e
-  #define [[r7.1.1:macro LIBBENCHMARK_TOPOLOGY_NODE_SET_TYPE|LIBBENCHMARK_TOPOLOGY_NODE_SET_TYPE]]( tns, node_type )
- #define [[r7.1.1:macro LIBBENCHMARK_TOPOLOGY_NODE_SET_LOGICAL_PROCESSOR_NUMBER|LIBBENCHMARK_TOPOLOGY_NODE_SET_LOGICAL_PROCESSOR_NUMBER]]( tns, processor_number )
- #define [[r7.1.1:macro LIBBENCHMARK_TOPOLOGY_NODE_SET_WINDOWS_GROUP_NUMBER|LIBBENCHMARK_TOPOLOGY_NODE_SET_WINDOWS_GROUP_NUMBER]]( tns, win_group_number )
- #define [[r7.1.1:macro LIBBENCHMARK_TOPOLOGY_NODE_UNSET_WINDOWS_GROUP_NUMBER|LIBBENCHMARK_TOPOLOGY_NODE_UNSET_WINDOWS_GROUP_NUMBER]]( tns )
- #define [[r7.1.1:macro LIBBENCHMARK_GNUPLOT_OPTIONS_INIT|LIBBENCHMARK_GNUPLOT_OPTIONS_INIT]]( gpo )
- #define [[r7.1.1:macro LIBBENCHMARK_GNUPLOT_OPTIONS_SET_Y_AXIS_SCALE_TYPE_LOGARITHMIC|LIBBENCHMARK_GNUPLOT_OPTIONS_SET_Y_AXIS_SCALE_TYPE_LOGARITHMIC]]( gpo )
- #define [[r7.1.1:macro LIBBENCHMARK_GNUPLOT_OPTIONS_SET_WIDTH_IN_PIXELS|LIBBENCHMARK_GNUPLOT_OPTIONS_SET_WIDTH_IN_PIXELS]]( gpo, wip )
- #define [[r7.1.1:macro LIBBENCHMARK_GNUPLOT_OPTIONS_SET_HEIGHT_IN_PIXELS|LIBBENCHMARK_GNUPLOT_OPTIONS_SET_HEIGHT_IN_PIXELS]]( gpo, wip )
-==Prototypes==
+The output will look like this (plus a bunch of fixed explanatory text);
- int [[r7.1.1:function libbenchmark_topology_init|libbenchmark_topology_init]]( struct libbenchmark_topology_state *ts,
-                                 struct libshared_memory_state *ms );
- void [[r7.1.1:function libbenchmark_topology_cleanup|libbenchmark_topology_cleanup]]( struct libbenchmark_topology_state *ts );
- void [[r7.1.1:function libbenchmark_topology_generate_deduplicated_logical_processor_sets|libbenchmark_topology_generate_deduplicated_logical_processor_sets]]( struct libbenchmark_topology_state *ts,
-                                                                          struct libshared_memory_state *ms,
-                                                                          struct lfds711_list_asu_state *lp_sets );
- void [[r7.1.1:function libbenchmark_topology_generate_numa_modes_list|libbenchmark_topology_generate_numa_modes_list]]( struct libbenchmark_topology_state *ts,
-                                                      enum libbenchmark_topology_numa_mode numa_mode,
-                                                      struct libshared_memory_state *ms,
-                                                      struct lfds711_list_asu_state *numa_modes_list );
- void [[r7.1.1:function libbenchmark_topology_node_init|libbenchmark_topology_node_init]]( struct libbenchmark_topology_node_state *tns );
- void [[r7.1.1:function libbenchmark_topology_node_cleanup|libbenchmark_topology_node_cleanup]]( struct libbenchmark_topology_node_state *tns,
-                                          void (*element_cleanup_callback)(struct lfds711_list_aso_state *lasos,
-                                                                           struct lfds711_list_aso_element *lasoe) );
- void [[r7.1.1:function libbenchmark_results_init|libbenchmark_results_init]]( struct libbenchmark_results_state *rs,
-                                 struct libshared_memory_state *ms );
- void [[r7.1.1:function libbenchmark_results_cleanup|libbenchmark_results_cleanup]]( struct libbenchmark_results_state *rs );
- int [[r7.1.1:function libbenchmark_results_get_result|libbenchmark_results_get_result]]( struct libbenchmark_results_state *rs,
-                                      enum libbenchmark_datastructure_id datastructure_id,
-                                      enum libbenchmark_benchmark_id benchmark_id,
-                                      enum libbenchmark_lock_id lock_id,
-                                      enum libbenchmark_topology_numa_mode numa_mode,
-                                      struct lfds711_list_aso_state *lpset,
-                                      struct libbenchmark_topology_node_state *tns,
-                                      lfds711_pal_uint_t *result );
- void [[r7.1.1:function libbenchmark_benchmarksuite_init|libbenchmark_benchmarksuite_init]]( struct libbenchmark_benchmarksuite_state *bss,
-                                        struct libbenchmark_topology_state *ts,
-                                        struct libshared_memory_state *ms,
-                                        enum libbenchmark_topology_numa_mode numa_mode,
-                                        lfds711_pal_uint_t options_bitmask,
-                                        lfds711_pal_uint_t benchmark_duration_in_seconds );
- void [[r7.1.1:function libbenchmark_benchmarksuite_cleanup|libbenchmark_benchmarksuite_cleanup]]( struct libbenchmark_benchmarksuite_state *bss );
- void [[r7.1.1:function libbenchmark_benchmarksuite_run|libbenchmark_benchmarksuite_run]]( struct libbenchmark_benchmarksuite_state *bss,
-                                       struct libbenchmark_results_state *rs );
- void [[r7.1.1:function libbenchmark_benchmarksuite_get_list_of_gnuplot_strings|libbenchmark_benchmarksuite_get_list_of_gnuplot_strings]]( struct libbenchmark_benchmarksuite_state *bss,
-                                                               struct libbenchmark_results_state *rs,
-                                                               char *gnuplot_system_string,
-                                                               struct libbenchmark_gnuplot_options *gpo,
-                                                               struct lfds711_list_asu_state *list_of_gnuplot_strings );
-==Overview==
+  ERG length in bytes : Number successful LL/SC ops
-It has become apparent from creating this page that the API interface to benchmarking is too complicated.  The API to run the benchmarks and get hold of the results is actually simple enough - two init functions (one for the benchmark and one for the results) and then a function to run the benchmarks.  The problem come when offering APIs to query the results.
+  =================================================
+  4 bytes : 0
+  8 bytes : 0
+  16 bytes : 0
+  32 bytes : 0
+  64 bytes : 1024
+  128 bytes : 1023
+  256 bytes : 1023
+  512 bytes : 1024
+  1024 bytes : 1024
+  2048 bytes : 12
-There are many data structures, each of which can have many benchmarks, where each benchmark is run both for the ''liblfds'' lock-free version and then for the full range of system provided locking mechanisms, where each run of the benchmark for a given lock type runs over many combinations of logical cores, where for every combination a result is stored for each logical processor; and all of this on a NUMA system with more than one NUMA node is run twice, once NUMA aware and once not, to show how much difference this makes to performance.
+The ERG size is on the left, the number of successful LL/SC ops on the right.  Each size is tested 1024 times.  The smallest size with 1024, or almost 1024, operations, is the ERG size.  (LL/SC ops can fail naturally due to system activity, so it's expected that sometimes one or two LL/SC operations will fail by themselves and so the total value will be slightly lower).
-As such, values for all of these parameters must be specified to the function which queries the result state - and this exposes a great deal of complexity and functionality, as can be seen above.
+We see here the ERG size for this platform is 64 bytes, which is correct - this is a Cortex A7 in a Raspberry Pi 2 Model B.
-It will take a day to document this library, and the effort to do so is futile, because it will still be too complex to use.  As such, the effort will go instead into simplifying the API.  However, this should not delay the release of 7.1.1, so the functions in this library will for now go undocumented.
+(That there are 12 successful ops for the 2048 byte size is not in fact understood.  There is a 4 byte ERG test as a sanity check, because if it passes, then it is clear the test is not working properly).
 ==See Also==
-* [[r7.1.1:Usage Guide (benchmarking)|Usage Guide (benchmarking)]]
+* [[r7.1.1:Release_7.1.1_Documentation|Release 7.1.1 Documentation]]