function libbenchmark_pal_populate_topology

From liblfds.org
Revision as of 20:16, 17 February 2017 by Admin (talk | contribs) (1 revision imported)

Source Files

└───test_and_benchmark
    └───libbenchmark
        ├───inc
        │   └───libbenchmark
        │           libbenchmark_porting_abstraction_layer.h
        └───src
            └───libbenchmark_porting_abstraction_layer
                    libbenchmark_porting_abstraction_layer_populate_topology.c

Enums

enum libbenchmark_topology_node_cache_type;

Opaque Structures

struct libbenchmark_topology_node_state;

Prototype

 int libbenchmark_porting_abstraction_layer_populate_topology( struct libbenchmark_topology_state *ts,
                                                               struct libshared_memory_state *ms );

Parameters

struct libbenchmark_topology_state *ts

A pointer to an initialized struct libbenchmark_topology_state.

struct libshared_memory_state *ms

A pointer to an initialized and populated struct libshared_memory_state. This function is not NUMA aware, so when an allocation occurs, it is simply taken from the memory block with the most free space.

Return Value

Returns 1 on success, 0 on failure.

Helper Functions

void libbenchmark_misc_pal_helper_new_topology_node( struct libbenchmark_topology_node_state **tns,
                                                     struct libshared_memory_state *ms );

void libbenchmark_misc_pal_helper_add_logical_processor_to_topology_node( struct libbenchmark_topology_node_state *tns,
                                                                          struct libshared_memory_state *ms,
                                                                          lfds711_pal_uint_t logical_processor_number,
                                                                          enum flag windows_processor_group_inuse_flag,
                                                                          lfds711_pal_uint_t windows_processor_group_number );

void libbenchmark_misc_pal_helper_add_system_node_to_topology_tree( struct libbenchmark_topology_state *ts,
                                                                    struct libbenchmark_topology_node_state *tns );

void libbenchmark_misc_pal_helper_add_numa_node_to_topology_tree( struct libbenchmark_topology_state *ts,
                                                                  struct libbenchmark_topology_node_state *tns,
                                                                  lfds711_pal_uint_t numa_node_id );

void libbenchmark_misc_pal_helper_add_socket_node_to_topology_tree( struct libbenchmark_topology_state *ts,
                                                                    struct libbenchmark_topology_node_state *tns );

void libbenchmark_misc_pal_helper_add_physical_processor_node_to_topology_tree( struct libbenchmark_topology_state *ts,
                                                                                struct libbenchmark_topology_node_state *tns );

void libbenchmark_misc_pal_helper_add_cache_node_to_topology_tree( struct libbenchmark_topology_state *ts,
                                                                   struct libbenchmark_topology_node_state *tns,
                                                                   lfds711_pal_uint_t level,
                                                                   enum libbenchmark_topology_node_cache_type type );

void libbenchmark_misc_pal_helper_add_logical_processor_node_to_topology_tree( struct libbenchmark_topology_state *ts,
                                                                               lfds711_pal_uint_t logical_processor_number,
                                                                               enum flag windows_processor_group_inuse_flag,
                                                                               lfds711_pal_uint_t windows_processor_group_number );

Example

int libbenchmark_porting_abstraction_layer_populate_topology( struct libbenchmark_topology_state *ts,
                                                              struct libshared_memory_state *ms )
{
  BOOL brv;
  DWORD slpi_length = 0, number_slpi, loop;
  enum libbenchmark_topology_node_cache_type
    processor_cache_type_to_libbenchmark_topology_node_cache_type[3] = 
    {
      LIBBENCHMARK_TOPOLOGY_NODE_CACHE_TYPE_UNIFIED, LIBBENCHMARK_TOPOLOGY_NODE_CACHE_TYPE_INSTRUCTION, LIBBENCHMARK_TOPOLOGY_NODE_CACHE_TYPE_DATA
    };
  int rv = 1;
  struct libbenchmark_topology_node_state *tns;
  SYSTEM_LOGICAL_PROCESSOR_INFORMATION *slpi = NULL;
  ULONG_PTR mask;

  LFDS711_PAL_ASSERT( ts != NULL );
  LFDS711_PAL_ASSERT( ms != NULL );

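  /* TRD : obtain information from the OS
           the first call passes a NULL buffer and a zero length, so it fails,
           but sets slpi_length to the required buffer size in bytes
           the second call then fills the allocated buffer
  */
  brv = GetLogicalProcessorInformation( slpi, &slpi_length );
  slpi = libshared_memory_alloc_from_most_free_space_node( ms, slpi_length, sizeof(lfds711_pal_uint_t) );
  brv = GetLogicalProcessorInformation( slpi, &slpi_length );
  number_slpi = slpi_length / sizeof(SYSTEM_LOGICAL_PROCESSOR_INFORMATION);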

  /* TRD : we loop twice over the topology information
           first time we form up the system node
           and add that
           second time, we do everything else
  */

  libbenchmark_misc_pal_helper_new_topology_node( &tns, ms );

  for( loop = 0 ; loop < number_slpi ; loop++ )
    if( (slpi+loop)->Relationship == RelationNumaNode )
      internal_populate_logical_processor_array_from_bitmask( ms, tns, (lfds711_pal_uint_t) (slpi+loop)->ProcessorMask );

  libbenchmark_misc_pal_helper_add_system_node_to_topology_tree( ts, tns );

  for( loop = 0 ; loop < number_slpi ; loop++ )
  {
    if( (slpi+loop)->Relationship == RelationNumaNode )
    {
      libbenchmark_misc_pal_helper_new_topology_node( &tns, ms );
      internal_populate_logical_processor_array_from_bitmask( ms, tns, (lfds711_pal_uint_t) ((slpi+loop)->ProcessorMask) );
      libbenchmark_misc_pal_helper_add_numa_node_to_topology_tree( ts, tns, (lfds711_pal_uint_t) (slpi+loop)->NumaNode.NodeNumber );

      // TRD : add each LP as an individual LP node
      for( mask = 1 ; mask != 0 ; mask <<= 1 )
        if( ((slpi+loop)->ProcessorMask & mask) == mask )
          libbenchmark_misc_pal_helper_add_logical_processor_node_to_topology_tree( ts, ms, (lfds711_pal_uint_t) ((slpi+loop)->ProcessorMask & mask), LOWERED, 0 );
    }

    if( (slpi+loop)->Relationship == RelationProcessorPackage )
    {
      libbenchmark_misc_pal_helper_new_topology_node( &tns, ms );
      internal_populate_logical_processor_array_from_bitmask( ms, tns, (lfds711_pal_uint_t) ((slpi+loop)->ProcessorMask) );
      libbenchmark_misc_pal_helper_add_socket_node_to_topology_tree( ts, tns );
    }

    if( (slpi+loop)->Relationship == RelationProcessorCore )
    {
      libbenchmark_misc_pal_helper_new_topology_node( &tns, ms );
      internal_populate_logical_processor_array_from_bitmask( ms, tns, (lfds711_pal_uint_t) ((slpi+loop)->ProcessorMask) );
      libbenchmark_misc_pal_helper_add_physical_processor_node_to_topology_tree( ts, tns );
    }

    if( (slpi+loop)->Relationship == RelationCache )
    {
      if( (slpi+loop)->Cache.Type == CacheUnified or (slpi+loop)->Cache.Type == CacheInstruction or (slpi+loop)->Cache.Type == CacheData )
      {
        libbenchmark_misc_pal_helper_new_topology_node( &tns, ms );
        internal_populate_logical_processor_array_from_bitmask( ms, tns, (lfds711_pal_uint_t) (slpi+loop)->ProcessorMask );
        libbenchmark_misc_pal_helper_add_cache_node_to_topology_tree( ts, tns, (lfds711_pal_uint_t) (slpi+loop)->Cache.Level, processor_cache_type_to_libbenchmark_topology_node_cache_type[(slpi+loop)->Cache.Type] );
      }
    }
  }

  return rv;
}
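
The example above relies on internal_populate_logical_processor_array_from_bitmask, whose body is not shown on this page. As a rough sketch of the underlying idea only (turning a processor bitmask into a list of logical processor numbers), the following standalone function may help. The name bitmask_to_lp_numbers, its signature, and the use of uint64_t are assumptions made for illustration and are not part of liblfds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of liblfds.
   Writes the bit index of every set bit in mask into lp_numbers,
   lowest bit first, and returns how many entries were written.
   Stops early if capacity is reached. */
size_t bitmask_to_lp_numbers( uint64_t mask, unsigned int *lp_numbers, size_t capacity )
{
  size_t count = 0;
  unsigned int bit;

  assert( lp_numbers != NULL );

  for( bit = 0 ; bit < 64 && count < capacity ; bit++ )
    if( (mask >> bit) & 1u )
      lp_numbers[count++] = bit;

  return count;
}
```

For a mask of 0x15 (bits 0, 2 and 4 set), this writes 0, 2, 4 into the array and returns 3. In a real abstraction layer, each number obtained this way would presumably be handed to libbenchmark_misc_pal_helper_add_logical_processor_to_topology_node.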

Notes

This is the most complicated abstraction function in all of the abstraction layers. It is used to obtain processor/memory topology. Internally, libbenchmark represents topology in a binary tree (the topology tree), where each node in the tree is a topology entity (a topology node), such as a socket, or a NUMA node, or a logical processor. This abstraction function populates the tree with nodes.

The purpose of this information is that the performance of a benchmark naturally varies depending on how many, and which, logical processors are running the benchmark, and there is a wide range of different combinations of logical processors which are of interest, as they reveal performance characteristics. The most obvious example is scaling. If there are, say, eight logical processors, it is interesting to run the benchmark on one processor, then on two, then on three, and so on. Another example is hyperthreaded systems, where it is interesting to run the benchmark on every logical processor in the system, and then on every physical core, i.e. on only one of the two hyperthreads of each core.
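
To make these two kinds of combination concrete, here is a minimal sketch that works on plain bitmasks, where each bit stands for one logical processor. It does not show how libbenchmark itself generates its sets; the function names lowest_n_lps and one_lp_per_physical_core, and the bitmask representation, are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT part of liblfds.
   Scaling: returns a mask holding the n lowest-numbered logical
   processors of system_mask (or all of them, if there are fewer than n). */
uint64_t lowest_n_lps( uint64_t system_mask, unsigned int n )
{
  uint64_t result = 0;

  while( n-- > 0 && system_mask != 0 )
  {
    uint64_t lowest = system_mask & (~system_mask + 1);   /* isolate the lowest set bit */
    result |= lowest;
    system_mask &= ~lowest;
  }

  return result;
}

/* Hypothetical helper, NOT part of liblfds.
   Hyperthreading: given one mask per physical core, returns a mask
   holding only the lowest-numbered logical processor of each core. */
uint64_t one_lp_per_physical_core( const uint64_t *core_masks, size_t number_cores )
{
  uint64_t result = 0;
  size_t loop;

  assert( core_masks != NULL );

  for( loop = 0 ; loop < number_cores ; loop++ )
    result |= core_masks[loop] & (~core_masks[loop] + 1);

  return result;
}
```

With eight logical processors (mask 0xFF), lowest_n_lps yields 0x01, 0x03, 0x07 and so on as n grows. With four two-way hyperthreaded cores described by the masks 0x03, 0x0C, 0x30 and 0xC0, one_lp_per_physical_core yields 0x55. Both need exactly the per-core and whole-system information that the topology tree provides.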

The benchmark code generates a wide range of interesting combinations of logical processor sets, but to do this, it needs processor/memory topology information.

Internally, a node is a struct which has basically two fields. The first indicates the type of topology entity represented by the node; the second is a list of the logical processors "belonging" to that node. For example, if the node is a socket, the list is that of all logical processors found in that socket. When a node is inserted into the tree, its list of logical processors must be complete, or the node is inserted into the wrong place in the tree. In fact, there is no functionality available to the user to do so - but it is an important point and helps to convey in this documentation how the tree works.

A set of helper functions is provided which (hopefully!) acts to mask detail and so simplify implementation.

First, libbenchmark_misc_pal_helper_new_topology_node is called to obtain a new, initialized topology node.

Second, libbenchmark_misc_pal_helper_add_logical_processor_to_topology_node is called, using the topology node, as many times as necessary to add the full list of "belonging" logical processors to the topology node.

Finally, one of the libbenchmark_misc_pal_helper_add_*_node_to_topology_tree functions is called, as appropriate to the node type, to insert the topology node into the tree.
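
The three steps above can be illustrated with a toy model. This is a sketch only: every name beginning with toy_ is hypothetical, the "tree" is a flat array rather than the binary tree libbenchmark uses, and nothing here reflects the real layout of struct libbenchmark_topology_node_state, which is opaque. It is meant to show the order of operations, and that a node's list of logical processors is completed before the node is inserted.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_LPS    64
#define TOY_MAX_NODES  16

/* All toy_ names are hypothetical stand-ins, NOT the liblfds API. */
enum toy_node_type
{
  TOY_NODE_SYSTEM, TOY_NODE_NUMA, TOY_NODE_SOCKET,
  TOY_NODE_PHYSICAL_PROCESSOR, TOY_NODE_CACHE, TOY_NODE_LOGICAL_PROCESSOR
};

struct toy_node
{
  enum toy_node_type type;           /* which topology entity this is */
  unsigned int lps[TOY_MAX_LPS];     /* the "belonging" logical processors */
  size_t number_lps;
};

struct toy_tree
{
  struct toy_node nodes[TOY_MAX_NODES];   /* flat array standing in for the tree */
  size_t number_nodes;
};

/* Step one: obtain a new, empty node. */
void toy_new_node( struct toy_node *tn )
{
  assert( tn != NULL );
  tn->type = TOY_NODE_SYSTEM;
  tn->number_lps = 0;
}

/* Step two: called once per "belonging" logical processor. */
void toy_add_lp_to_node( struct toy_node *tn, unsigned int lp )
{
  assert( tn != NULL );
  assert( tn->number_lps < TOY_MAX_LPS );
  tn->lps[tn->number_lps++] = lp;
}

/* Step three: the node type is fixed and the node is copied into the
   container; later changes to *tn do not affect the stored copy. */
void toy_add_node_to_tree( struct toy_tree *tt, struct toy_node *tn, enum toy_node_type type )
{
  assert( tt != NULL );
  assert( tn != NULL );
  assert( tt->number_nodes < TOY_MAX_NODES );
  tn->type = type;
  tt->nodes[tt->number_nodes++] = *tn;
}
```

A socket holding logical processors 0 to 3 would be built by calling toy_new_node once, toy_add_lp_to_node four times, and then toy_add_node_to_tree with TOY_NODE_SOCKET. In the real helper API, the node type is selected by which libbenchmark_misc_pal_helper_add_*_node_to_topology_tree function is called, rather than by an argument.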

In the event a system lacks an API to enumerate processor/memory topology, the helper API functions can simply be called manually, to create a fixed topology tree (this is for example what is done in the Windows kernel abstraction prior to Windows 7).

Finally, note that the helper API for adding a logical processor itself to the tree (as opposed to adding it to the list of "belonging" processors for a node) is slightly different to the other helper functions. This is because a logical processor node has no list of "belonging" logical processors. As such, the calls to libbenchmark_misc_pal_helper_new_topology_node and libbenchmark_misc_pal_helper_add_logical_processor_to_topology_node are unnecessary; the user simply calls libbenchmark_misc_pal_helper_add_logical_processor_node_to_topology_tree.

The node types are self-explanatory except for the "system" node. There is one, and only one, system node, and it contains every logical processor. The minimum valid topology tree is a single system node and one logical processor.

See Also