The virtual memory addresses being used in the various processes have to be identical.
The data structures are not internally using offsets - they use pointers. I have thought about using offsets, but it's work which has had less importance than other work, in part since mmap() with a specific virtual memory address does work.
Call mmap() with a first argument which specifies a particular virtual memory address.
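As a rough illustration of the above, here is a minimal sketch of asking mmap() for a specific address and checking that it was actually granted. The helper names map_at_address() and demo_map() are hypothetical, and the anonymous shared mapping is only an assumption to keep the example self-contained; a real multi-process setup would map a shared file or shared memory object at an address agreed between the processes.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: map `size` bytes at exactly `wanted`.
   Without MAP_FIXED the first argument is only a hint, so the result
   must be compared against the requested address. */
void *map_at_address(void *wanted, size_t size)
{
    void *got = mmap(wanted, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (got == MAP_FAILED)
        return NULL;
    if (got != wanted) {
        /* The kernel placed the mapping elsewhere; pointers stored in
           the shared data would be wrong, so give up. */
        munmap(got, size);
        return NULL;
    }
    return got;
}

/* Hypothetical demo: returns 0 if the address was granted and usable,
   1 if the address was not available, -1 on an outright failure. */
int demo_map(void)
{
    size_t size = 4096; /* assumed size; mmap rounds up to whole pages */

    /* Find an address which is known to be free: let the kernel pick
       one, then release it and ask for that same address again. */
    void *probe = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (probe == MAP_FAILED)
        return -1;
    munmap(probe, size);

    int *shared = map_at_address(probe, size);
    if (shared == NULL)
        return 1;

    shared[0] = 42;
    int ok = (shared[0] == 42);
    munmap(shared, size);
    return ok ? 0 : -1;
}
```

MAP_FIXED would force the placement, but it silently replaces anything already mapped at that address, which is why this sketch passes the address as a hint and verifies the result instead.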
I'm just on my way home now, and I'll look at the code and so on properly once I'm here.
One thing I would say now though, from just having run my eyes over the code, is that the BMM queue is a *relaxed* queue. By this word I mean to say that although once an enqueue call has returned, an element definitely will be enqueued and in the expected order, it does *not* mean that element will yet be seen by *dequeuers in other threads*.
So - in theory - you could have one thread sitting there enqueuing all day long, and another thread which is dequeuing never returning anything. (This obviously doesn't happen in practice - it's an extreme example - but much shorter term versions of it are happening for every enqueue; it takes a certain time for information to propagate, and there are no guarantees from the hardware about how long this will take.)
What it means is that you must *always* check the return value from the dequeue() function. This is because no matter how many elements you may have enqueued, they may not yet be visible to the dequeue thread.
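To make the point about the return value concrete, here is a minimal single-threaded sketch. The toy_queue type and the toy_enqueue(), toy_dequeue() and drain_visible() functions are hypothetical stand-ins and are NOT the liblfds API; the only thing being illustrated is that the consumer treats "dequeue returned 0" as "nothing visible yet" and never assumes an element is there just because it was enqueued.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bounded queue; NOT the liblfds API. */
struct toy_queue {
    int items[8];
    size_t head, tail;
};

/* Returns 1 if the value was enqueued, 0 if the queue is full. */
int toy_enqueue(struct toy_queue *q, int value)
{
    if (q->tail - q->head == 8)
        return 0;
    q->items[q->tail++ % 8] = value;
    return 1;
}

/* Returns 1 and writes *out if an element was available, 0 otherwise. */
int toy_dequeue(struct toy_queue *q, int *out)
{
    if (q->head == q->tail)
        return 0;
    *out = q->items[q->head++ % 8];
    return 1;
}

/* Drain whatever can be dequeued right now. The loop is driven by the
   return value, not by a count of how many elements were enqueued. */
size_t drain_visible(struct toy_queue *q, long *sum)
{
    size_t count = 0;
    int value;
    while (toy_dequeue(q, &value)) {
        *sum += value;
        count++;
    }
    return count;
}
```

With a real relaxed queue, a dequeue thread would call something like drain_visible() again later, since elements enqueued by another thread may become visible after a failed dequeue.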
I may be wrong, but I think the problem you're running into with the queue is that when initializing, all of the store for queue elements is allocated in one single block of memory, but then when cleaning up, the cleanup function presents you with *each element* in turn.
If you then try to free each element, it will crash, because what should actually be happening is that there is only a *single* call to free, which is passed the pointer to the single block of memory which was allocated when the queue was initialized.
The cleanup function allows you, if you need to, to perform cleanup work *related to whatever it is you were doing* - so, for example, say we were using the queue to buffer network packets or something like that, and when the application shuts down, we want to handle all outstanding enqueued elements. In this case, the cleanup function lets us process the data in those elements.
To actually free the memory of the queue elements, you keep track of the pointer to the single allocation holding all the queue elements and free that after the queue cleanup function has returned.
So in your case, once the queue cleanup has returned, you free "synth_commands_queue_element".
Note however the queue only calls the cleanup function *for enqueue elements which are still in the queue*. It does NOT call the cleanup function with EVERY element which was given to the queue (when the queue was initialized).
So if you have a malloc per queue element, do not call free() on usd->synth in the cleanup function; if you did, it would only be called on those queue elements which were still enqueued at the time the queue was cleaned up. Instead, after the queue cleanup function has returned, iterate over the elements in the single allocation of all elements and call free() on usd->synth *there*, for all of them.
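The teardown order described above can be sketched roughly as follows. Everything here is hypothetical (struct element, toy_cleanup(), process_leftover(), teardown_demo()) and none of it is the liblfds API; the "enqueued" flag simply models the fact that the cleanup callback is only invoked for elements still in the queue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element: one per-element malloc (like usd->synth)
   plus a flag modelling "still in the queue at cleanup time". */
struct element {
    char *payload;
    int enqueued;
};

/* Toy cleanup: calls the callback ONLY for still-enqueued elements.
   Returns how many elements were visited. */
size_t toy_cleanup(struct element *block, size_t count,
                   void (*callback)(struct element *))
{
    size_t visited = 0;
    for (size_t i = 0; i < count; i++) {
        if (block[i].enqueued) {
            if (callback != NULL)
                callback(&block[i]);
            block[i].enqueued = 0;
            visited++;
        }
    }
    return visited;
}

/* Handle outstanding data here; do NOT free anything here. */
void process_leftover(struct element *e)
{
    (void)e;
}

/* Returns the number of elements the cleanup callback saw, or -1. */
int teardown_demo(void)
{
    size_t count = 4;

    /* One single allocation holds every queue element. */
    struct element *block = calloc(count, sizeof *block);
    if (block == NULL)
        return -1;

    for (size_t i = 0; i < count; i++)
        block[i].payload = malloc(16);

    /* Pretend only two of the four are still enqueued at shutdown. */
    block[1].enqueued = 1;
    block[3].enqueued = 1;

    size_t visited = toy_cleanup(block, count, process_leftover);

    /* Per-element frees: walk the whole block, not just what the
       cleanup callback was shown. */
    for (size_t i = 0; i < count; i++)
        free(block[i].payload);

    /* Exactly one free for the single block of elements. */
    free(block);

    return (int)visited;
}
```

In this sketch the callback sees two elements, while the loop after it frees all four payloads, which is the reason for doing the per-element frees outside the callback.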
Thank you for your kind words about the library :-)
I'm glad, but I'm still mortified. Sample code which doesn't work is a cardinal sin. Thank you for posting about it, so it became known.
I'm very sorry, aperez; this is an embarrassing documentation fault on my part.
The number of elements given to the bounded/many/many queue must be a positive integer power of 2, i.e. 2, 4, 8, 16, etc. (In fact, the demo code is wrong - and that's a *really* bad mistake; it happened because there was so much documentation to write. Of course, any code not run is broken, and this is no exception. The library code itself is run extensively, by the test programme; the library is *not* treated as the example code was in this case.)
There is an assert for this, and that is probably what is causing the crash. The init function itself is utterly simple - it just writes initial values into the state structure.
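For anyone wanting to check the element count before calling the init function, a minimal sketch of such a check is below. The function name is_valid_element_count() is hypothetical and is not part of the library; it just applies the usual bit trick, and it rejects 1 on the assumption that "positive integer power of 2" starts at 2, as in the list above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library.
   A power of two has exactly one bit set, so n & (n - 1) is zero.
   Values below 2 are rejected (assumption: 2 is the smallest valid count). */
int is_valid_element_count(size_t n)
{
    return n >= 2 && (n & (n - 1)) == 0;
}
```

So a count of 16 passes, while 10 or 12 would trip an assert of this kind.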
I have read over the docs for the queue and I can find NOWHERE where this requirement is stated. This surprises me very deeply - but there it is.
I don't have time to fix it this very instant - I have to leave in a minute - but I will correct it (describing the requirement, and fixing the sample code) tonight.
In fact, what needs to happen is all the sample code is made part of the test suite, so it is always being run.
Something I recently realised is that you can only offer work you can automate. Anything non-automated is simply too much work to keep doing manually.
The library has never been compiled and tested on Android, because that platform is not (yet) available.
However, the library is actually almost completely independent of the platform. The only interaction between the library and the platform is the abstraction layer, which abstracts atomic instruction functionality and the like. If the abstraction layer is incorrect for the platform, it is possible for the library to crash.
Another way forward here is for me to duplicate your build environment. What platform are you using to execute the Android binary?
You can email as well, if you wish, "admin at liblfds dot org".
For any future readers of this thread (mythical beasts :-) the problem turns out to be that I completely messed up the benchmark initialization code when running on non-NUMA machines (i.e. Linux without libnuma). What the code is doing is in principle correct, but it does not make the correct memory allocation call for SMP. I am now inspecting the test framework, because something is seriously amiss.
The solution is to install libnuma (yum or apt-get "libnuma-devel", name varies a bit by platform) and to compile the "gcc_gnumake_hosted_liblfds710_liblfds700_libnuma" build of the benchmark binary.
This makes a NUMA binary (which is fine even on SMP systems - they're the same as single NUMA node systems) and the benchmark works.
The next release, 7.2.0, is due soon, and will have this *highly* embarrassing problem fixed.
Could I ask you to paste the output from;
That will display the processor topology. However, I wonder if this also is crashing for you?
While I'm asking for stuff :-) is there any chance of an SSH login on your machine? If I can debug in place, that's by far the quickest way to solve things. I can send you a key.
Thank you very much for reporting the problem. I'm brain-deep in debugging the guts of SMR at this very instant, and then it's bed - I'll look closely into the problem first thing in the morning.
The only comment I have offhand is that LIBBENCHMARK_TOPOLOGY_NODE_GET_NUMA_ID() returns an integer, so it can rightfully return 0 - but that should never lead to a core dump.
I'm almost ready to release 7.1.0 (i.e. no per-thread state).
Hopefully one week.