Segfault in libbenchmark in liblfds 7.1.0

  1. 7 months ago

    For some reason (perhaps a good one), LIBBENCHMARK_TOPOLOGY_NODE_GET_NUMA_ID returns NULL on my workstation (Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz), which causes libbenchmark to segfault on a NULL pointer dereference.

    The following patch fixes this, or at least the benchmark no longer segfaults and can run. I don't know enough about liblfds yet to tell whether the benchmark values are affected.

    --- libbenchmark_threadset_init.c.orginal	2016-08-15 14:16:45.706318564 +0200
    +++ libbenchmark_threadset_init.c	2016-08-15 14:20:03.982017273 +0200
    @@ -72,9 +72,13 @@
           {
             pns = libshared_memory_alloc_from_specific_node( ms, LIBBENCHMARK_TOPOLOGY_NODE_GET_NUMA_ID(*tns_numa_node), sizeof(struct libbenchmark_threadset_per_numa_state), LFDS710_PAL_ATOMIC_ISOLATION_IN_BYTES );
             pns->numa_node_id = LIBBENCHMARK_TOPOLOGY_NODE_GET_NUMA_ID( *tns_numa_node );
    -        pns->users_per_numa_state = NULL;
    -        LFDS710_LIST_ASU_SET_VALUE_IN_ELEMENT( pns->lasue, pns );
    -        lfds710_list_asu_insert_at_start( &tsets->list_of_per_numa_states, &pns->lasue );
    +
    +        if ( pns->nume_node_id != NULL )
    +        {
    +          pns->users_per_numa_state = NULL;
    +          LFDS710_LIST_ASU_SET_VALUE_IN_ELEMENT( pns->lasue, pns );
    +          lfds710_list_asu_insert_at_start( &tsets->list_of_per_numa_states, &pns->lasue );
    +        }
           }
         }
     
  2. admin

    15 Aug 2016 Administrator

    Hej!

    Thank you very much for reporting the problem. I'm brain-deep in debugging the guts of SMR at this very instant, and then it's bed - I'll look closely into the problem first thing in the morning.

    The only comment I have offhand is that LIBBENCHMARK_TOPOLOGY_NODE_GET_NUMA_ID() returns an integer, so it can rightfully return 0 - but that should never lead to a core dump.

  3. admin

    16 Aug 2016 Administrator

    Could I ask you to paste the output from:

    benchmark -t

    That will display the processor topology. However, I wonder whether this also crashes for you?

    While I'm asking for stuff :-) is there any chance of an SSH login on your machine? If I can debug in place, that's by far the quickest way to solve things. I can send you a key.

  4. I'm terribly sorry, but I completely messed up the patch I sent in the first post. I had originally fixed the problem on another day without saving the change, and then tried to recreate the patch to send it here without actually testing it (the deadly sin), so it was wrong. The problem I actually see is that the memory allocation fails, so the pns pointer is used in a NULL pointer dereference.

    The correct patch to fix this is:

    --- libbenchmark_threadset_init.c.orginal	2016-08-15 14:16:45.706318564 +0200
    +++ libbenchmark_threadset_init.c	2016-08-18 10:48:05.802346033 +0200
    @@ -71,10 +71,14 @@
           if( lasue == NULL )
           {
             pns = libshared_memory_alloc_from_specific_node( ms, LIBBENCHMARK_TOPOLOGY_NODE_GET_NUMA_ID(*tns_numa_node), sizeof(struct libbenchmark_threadset_per_numa_state), LFDS710_PAL_ATOMIC_ISOLATION_IN_BYTES );
    -        pns->numa_node_id = LIBBENCHMARK_TOPOLOGY_NODE_GET_NUMA_ID( *tns_numa_node );
    -        pns->users_per_numa_state = NULL;
    -        LFDS710_LIST_ASU_SET_VALUE_IN_ELEMENT( pns->lasue, pns );
    -        lfds710_list_asu_insert_at_start( &tsets->list_of_per_numa_states, &pns->lasue );
    +
    +        if ( pns != NULL )
    +        {
    +          pns->numa_node_id = LIBBENCHMARK_TOPOLOGY_NODE_GET_NUMA_ID( *tns_numa_node );
    +          pns->users_per_numa_state = NULL;
    +          LFDS710_LIST_ASU_SET_VALUE_IN_ELEMENT( pns->lasue, pns );
    +          lfds710_list_asu_insert_at_start( &tsets->list_of_per_numa_states, &pns->lasue );
    +        }
           }
         }
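
    The guard in the patch above can be shown in isolation. This is only a minimal sketch, not liblfds code: struct per_numa_state, alloc_from_node() and per_numa_state_new() are hypothetical stand-ins for the real structure and for libshared_memory_alloc_from_specific_node(), and the simulate_failure flag exists purely to exercise the failure path.

    ```c
    #include <stddef.h>
    #include <stdlib.h>

    /* Hypothetical stand-in for struct libbenchmark_threadset_per_numa_state. */
    struct per_numa_state
    {
      int numa_node_id;
      void *users_per_numa_state;
    };

    /* Hypothetical allocator: like the real call, it can return NULL.
       numa_node_id is ignored here; simulate_failure forces the NULL path. */
    static void *alloc_from_node( int numa_node_id, size_t size, int simulate_failure )
    {
      (void) numa_node_id;
      return simulate_failure ? NULL : malloc( size );
    }

    /* Returns an initialised state, or NULL if the allocation failed. */
    static struct per_numa_state *per_numa_state_new( int numa_node_id, int simulate_failure )
    {
      struct per_numa_state *pns = alloc_from_node( numa_node_id, sizeof *pns, simulate_failure );

      /* Check the pointer before the first dereference. */
      if( pns == NULL )
        return NULL;

      pns->numa_node_id = numa_node_id;
      pns->users_per_numa_state = NULL;
      return pns;
    }
    ```

    Note that the check is on the pointer, not on the node ID: a NUMA node ID of 0 is a valid value, so only a NULL pns signals failure.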
  5. admin

    20 Aug 2016 Administrator
    Edited 7 months ago by admin

    For any future readers of this thread (mythical beasts :-) the problem turns out to be that I completely messed up the benchmark initialization code when running on non-NUMA machines (i.e. Linux without libnuma). What the code is doing is in principle correct, but it does not make the correct memory allocation call for SMP. I am now inspecting the test framework, because something is seriously amiss.

    The solution is to install libnuma (yum or apt-get "libnuma-devel", name varies a bit by platform) and to compile the "gcc_gnumake_hosted_liblfds710_liblfds700_libnuma" build of the benchmark binary.

    This makes a NUMA binary (which is fine even on SMP systems - they're the same as single-NUMA-node systems) and the benchmark works.

    The next release, 7.2.0, is due soon, and will have this *highly* embarrassing problem fixed.
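
    For readers wondering what a "correct memory allocation call for SMP" might look like, here is a rough sketch of one possible shape. It is an assumption for illustration, not the liblfds code and not necessarily how 7.2.0 fixes the problem: USE_LIBNUMA is a hypothetical build flag, and alloc_prefer_node() and free_prefer_node() are hypothetical helpers. The libnuma calls used (numa_available, numa_alloc_onnode, numa_free) are the standard ones from numa.h.

    ```c
    #include <stddef.h>
    #include <stdlib.h>

    #ifdef USE_LIBNUMA      /* hypothetical build flag, not a liblfds macro */
      #include <numa.h>     /* requires linking with -lnuma */
    #endif

    /* Hypothetical helper: allocate on the given NUMA node when libnuma is
       compiled in and usable, otherwise fall back to a plain malloc(). */
    static void *alloc_prefer_node( int numa_node_id, size_t size )
    {
    #ifdef USE_LIBNUMA
      if( numa_available() != -1 )
        return numa_alloc_onnode( size, numa_node_id );
    #endif
      (void) numa_node_id;
      return malloc( size );
    }

    /* Matching release: numa_free() needs the allocation size, free() does not. */
    static void free_prefer_node( void *memory, size_t size )
    {
    #ifdef USE_LIBNUMA
      if( numa_available() != -1 )
      {
        numa_free( memory, size );
        return;
      }
    #endif
      (void) size;
      free( memory );
    }
    ```

    Built without the flag, the helpers behave like malloc() and free(), which is the SMP case; either way the caller still has to check the returned pointer for NULL.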

 
