Known issues release 6

From liblfds.org
Revision as of 14:07, 4 January 2015 by Admin (talk | contribs) (1 revision imported)

Current

2. The release note doesn't mention the new() function API changes.

3. The test and benchmark program offers an iteration argument for running the tests multiple times. This is bugged: the program will crash on the second iteration. The problem is the use of static variables in the tests, which are not reset to 0 between iterations and so cause memory exceptions (it worked on x64!).
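The failure mode in issue 3 can be shown in miniature. The sketch below is not taken from the liblfds test suite: `run_test_buggy`, `run_test_fixed` and `TOY_ELEMENT_COUNT` are hypothetical names invented for illustration. It shows a function-local static keeping its value between calls, so that a second run starts from stale state, and one way of avoiding that by resetting the variable at the start of every run.

```c
#include <stddef.h>

#define TOY_ELEMENT_COUNT 4   /* hypothetical size, for illustration only */

/* Hypothetical test body with a static cursor that is never reset.
   Returns the number of slots written during this run. */
size_t run_test_buggy( int *buffer )
{
  static size_t next_index = 0;   /* initialised once, not once per run */
  size_t written = 0;

  while( next_index < TOY_ELEMENT_COUNT )
  {
    buffer[next_index++] = 1;
    written++;
  }

  return written;
}

/* Same body, but the static is explicitly reset at the start of each run. */
size_t run_test_fixed( int *buffer )
{
  static size_t next_index;
  size_t written = 0;

  next_index = 0;                 /* explicit reset on every run */

  while( next_index < TOY_ELEMENT_COUNT )
  {
    buffer[next_index++] = 1;
    written++;
  }

  return written;
}
```

Calling `run_test_buggy()` twice does the full work only the first time; the second call sees the leftover cursor. In this toy the stale state merely skips the work, whereas in the real tests it led to memory exceptions.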

4. liblfds does not in fact support IA64. I had thought there was a compiler environment bug of some kind which affected this build, but it turns out to be a mistake on my part; IA64 does not support contiguous double-word compare-and-swap.

5. In the slist test (/src/test/test_slist.c) there is a missing check for NULL on line 444, which is part of the core of the traverser thread, a test thread which simply constantly loops over a list. The code looks like this;

if( !(iteration % stss->iteration_modulo) )
{
  slist_get_next( se, &se );
  count++;
}

It should look like this;

if( se != NULL && !(iteration % stss->iteration_modulo) )
{
  slist_get_next( se, &se );
  count++;
}

The result of this is a race condition. Some slist tests work by generating list elements on the fly while a traverser thread is running. If, however, the traverser thread begins to run before any elements have been generated, the missing check for NULL permits slist_get_next() to execute on a NULL element, which causes an assert.

The following tests are affected;

  • 4. one head writer and one list traverser per CPU
  • 5. make one element, then one after writer and one list traverser per CPU
  • 12. one head writer, one after writer, one traverser and one 25% deleter-traverser per CPU

6. The freelist popping test is bugged. A variable, count, is declared as unsigned int when in fact it needs to be atom_t. The consequence is that on 64-bit platforms where unsigned int is 32 bit and pointers are 64 bit, the freelist test will report missing elements.

The problem code is in /test/src/test_freelist.c, line 32.

The code looks like this;

unsigned int
  loop,
  cpu_count,
  count;

It should look like this;

unsigned int
  loop,
  cpu_count;

atom_t
  count;
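Why the width of count matters can be sketched as follows. This is an illustration under stated assumptions, not the liblfds code: `toy_atom_t` models atom_t as a pointer-width unsigned integer (here `uintptr_t`), and `toy_query_count` is a hypothetical function that writes a count through a `void *`, which is one plausible way for a mismatch between unsigned int and atom_t to go unnoticed by the compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: atom_t behaves like a pointer-width unsigned integer. */
typedef uintptr_t toy_atom_t;

/* Hypothetical query: writes sizeof(toy_atom_t) bytes through a void
   pointer, so the compiler cannot check the type of the destination. */
void toy_query_count( void *query_output )
{
  *(toy_atom_t *) query_output = (toy_atom_t) 1000000;
}

/* Correct caller: the destination is exactly as wide as the write. */
toy_atom_t read_count_correctly( void )
{
  toy_atom_t count = 0;

  toy_query_count( &count );

  return count;
}

/* Returns 1 on platforms where an unsigned int destination would be too
   small for the write above (typical 64-bit platforms), 0 otherwise. */
int unsigned_int_is_too_narrow( void )
{
  return sizeof(unsigned int) < sizeof(toy_atom_t);
}
```

Where `unsigned_int_is_too_narrow()` returns 1, passing the address of an unsigned int to a function like `toy_query_count()` would write past the end of the variable, so the value read back cannot be trusted.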

7. The freelist pushing test is bugged. The test allocates (one million / number of cores) freelist elements per thread. With integer division, on a (for example) 24 core machine, this gives (1,000,000 / 24) = 41,666 elements per thread. However, after the test has run, there is a check that there are *exactly* 1,000,000 elements now present in the destination freelist. This fails, because 41,666 * 24 = 999,984. The error seen is that the freelist is missing elements.
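The arithmetic in issue 7 can be checked directly. The sketch below uses hypothetical helper names (`total_with_truncation`, `elements_for_thread`, `total_with_remainder_spread`) and shows one possible remedy, handing the remainder out one element at a time to the first few threads; it is not necessarily the fix adopted in liblfds.

```c
#include <stddef.h>

/* Total actually allocated when every thread gets total / thread_count. */
size_t total_with_truncation( size_t total, size_t thread_count )
{
  return (total / thread_count) * thread_count;
}

/* Per-thread share when the first (total % thread_count) threads each
   take one extra element. */
size_t elements_for_thread( size_t total, size_t thread_count, size_t thread_index )
{
  size_t base = total / thread_count;
  size_t remainder = total % thread_count;

  return base + (thread_index < remainder ? 1 : 0);
}

/* Sum of the per-thread shares; always equals total. */
size_t total_with_remainder_spread( size_t total, size_t thread_count )
{
  size_t sum = 0;
  size_t thread_index;

  for( thread_index = 0 ; thread_index < thread_count ; thread_index++ )
    sum += elements_for_thread( total, thread_count, thread_index );

  return sum;
}
```

For 1,000,000 elements and 24 threads the remainder is 16, so threads 0 to 15 would take 41,667 elements and threads 16 to 23 would take 41,666. An alternative is to leave the allocation alone and check for (elements per thread * number of threads) instead of exactly 1,000,000.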

8. The library does not use memory barriers correctly. The main upshot is that the queue can be corrupted: enqueuing and dequeuing each operate in different ways depending on the current internal state of the queue, and a missing barrier means the queue can be misled about that state and so attempt to perform the wrong action. How likely this is to happen is not clear. The window is a few instructions in length (i.e. short), it can only happen when multiple cores are using one queue, and it becomes more likely the busier the queue is (both in the number of cores using the queue and in the frequency of enqueuing and dequeuing). On my test systems, which include a 16 core, 2 NUMA node machine, I have yet to see errors even with all cores busy-using the queue. That may mean the tests are no good, at least for this problem, although they certainly detected enough errors during development and testing.
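The general idea of a barrier ordering a data write before the write that publishes it can be sketched with C11 atomics. This is a swapped-in technique for illustration only: liblfds release 6 does not use `<stdatomic.h>`, and `toy_node`, `toy_publish` and `toy_consume` are hypothetical names, not part of the library or its queue.

```c
#include <stdatomic.h>
#include <stddef.h>

struct toy_node
{
  int payload;
};

/* Shared pointer through which a writer hands a node to readers. */
static _Atomic(struct toy_node *) published = NULL;

/* Writer: fill in the node, then publish it. The release store keeps the
   payload write from being reordered after the pointer write. */
void toy_publish( struct toy_node *node, int value )
{
  node->payload = value;
  atomic_store_explicit( &published, node, memory_order_release );
}

/* Reader: the acquire load pairs with the release store, so a reader that
   sees the pointer also sees the payload written before it.
   Returns 1 and sets *value_out if a node was published, 0 otherwise. */
int toy_consume( int *value_out )
{
  struct toy_node *node = atomic_load_explicit( &published, memory_order_acquire );

  if( node == NULL )
    return 0;

  *value_out = node->payload;

  return 1;
}
```

Without the release and acquire ordering, a reader on another core could observe the pointer before the payload, which is the same kind of misreading of state that issue 8 describes for the queue.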

Revoked

1. slist is broken. About 10% of the test program runs for slist will assert on a NULL pointer. (This may be due to the pointer declaration problem, which has been fixed; further testing may result in this known issue being revoked).

Having fixed the pointer declaration problem, I've run 50 iterations on ARM with no problems. I declare this problem fixed.

History

  • Carried #1 over from release 5 (29th December 2009)
  • Added #2 (29th December 2009)
  • Added #3 (31st December 2009)
  • Revoked #1 (31st December 2009)
  • Added #4 (16th Jan 2010)
  • Added #5 (26th Jan 2010)
  • Added #6 (12th Nov 2010)
  • Added #7 (23rd Oct 2011)
  • Added #8 (12th Nov 2012)