Hi,

Is anyone using the librados AIO APIs? I'm seeing a problem where the rados_aio_wait_for_complete() call sometimes blocks for a long time before it eventually returns without error.

More info on my setup:
I am using Ceph 14.2.4 and writing 8 MB objects.

I run my AIO program on 24 nodes at the same time, each node writing different data of about 2 GB (split into 8 MB objects).
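
For scale, a rough count of the work per node, assuming "2G" means 2 GiB and 8 MiB objects, written in the batches of 12 used in the code further down:

```latex
\frac{2\,\text{GiB}}{8\,\text{MiB}} = \frac{2048\,\text{MiB}}{8\,\text{MiB}} = 256 \ \text{objects per node},
\qquad 24 \times 256 = 6144 \ \text{objects in total},
\qquad \left\lceil \tfrac{256}{12} \right\rceil = 22 \ \text{batches per node } (21 \times 12 + 4 = 256).
```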

Normally all of them complete in about 10 minutes, but often one or more nodes take considerably longer. When I look at one of those, I mostly see that the I/O requests have been submitted and the program is waiting at:

#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00002aaaaad0c8fa in rados_aio_wait_for_complete () from /cgv/geovation/2/test/ceph/lib/librados.so.2

It then eventually completes, with no error from the rados_aio_wait_for_complete() call.

The (pseudo)code looks like this:

        while (data remains to be written) {
            size_t aio_ops_count = 0;
            rados_completion_t aio_comp[12];

            // Submit up to 12 writes (the last batch may have fewer).
            for (size_t j = 0; j < 12 && data remains to be written; ++j) {
                int err = rados_aio_create_completion(NULL, NULL, NULL, &aio_comp[j]);
                if (err < 0) {
                    cerr << "rados_aio_create_completion: " << strerror(-err) << endl;
                    return 1;
                }

                string obj_ = getobjectid();

                err = rados_aio_write_full(io, obj_.c_str(), aio_comp[j], read_buf[j], bytes);
                if (err < 0) {
                    cerr << "rados_aio_write_full: " << strerror(-err) << endl;
                    rados_aio_release(aio_comp[j]);  // don't leak the unused completion
                    return 1;
                }

                ++aio_ops_count;
            }

            // Wait for every submitted write, in submission order.
            for (size_t j = 0; j < aio_ops_count; ++j) {
                rados_aio_wait_for_complete(aio_comp[j]);  // Considerably longer delay here at times ??
                int err = rados_aio_get_return_value(aio_comp[j]);
                rados_aio_release(aio_comp[j]);  // release before any early return
                if (err < 0) {
                    cerr << "rados_aio_get_return_value: " << strerror(-err) << endl;
                    return 1;
                }
            }
        }
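
One way to narrow this down might be to time each wait individually, so the straggling write (and its object name) can be logged. The sketch below does not use librados at all: fake_aio_write() is a hypothetical stand-in that "completes" after a fixed delay via std::async, and std::future::wait()/get() play the roles of rados_aio_wait_for_complete()/rados_aio_get_return_value(). It only illustrates the measurement, and that with a fixed batch waited on in submission order, a single slow write holds up the whole batch before the next one is submitted.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <thread>
#include <vector>

// Hypothetical stand-in for rados_aio_write_full(): the "write" runs on
// another thread and simply sleeps for delay_ms before succeeding.
std::future<int> fake_aio_write(int delay_ms) {
    return std::async(std::launch::async, [delay_ms] {
        std::this_thread::sleep_for(std::chrono::milliseconds(delay_ms));
        return 0;  // 0 mimics a successful rados_aio_get_return_value()
    });
}

struct BatchTiming {
    std::size_t slowest_index;  // op whose individual wait was the longest
    long long slowest_wait_ms;  // length of that wait
    long long total_ms;         // wall-clock time for the whole batch
};

// Submit all ops of one batch, then wait on them in submission order,
// timing each wait separately (mirrors the second loop in the pseudocode).
BatchTiming run_batch(const std::vector<int>& delays_ms) {
    assert(!delays_ms.empty());
    using Clock = std::chrono::steady_clock;
    auto to_ms = [](Clock::duration d) -> long long {
        return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
    };

    const auto batch_start = Clock::now();
    std::vector<std::future<int>> ops;
    for (int d : delays_ms) {
        ops.push_back(fake_aio_write(d));
    }

    BatchTiming t{0, -1, 0};
    for (std::size_t j = 0; j < ops.size(); ++j) {
        const auto wait_start = Clock::now();
        ops[j].wait();  // plays the role of rados_aio_wait_for_complete()
        const long long waited = to_ms(Clock::now() - wait_start);
        if (waited > t.slowest_wait_ms) {
            t.slowest_wait_ms = waited;
            t.slowest_index = j;
        }
        int rc = ops[j].get();  // plays the role of rados_aio_get_return_value()
        assert(rc == 0);
        (void)rc;
    }
    t.total_ms = to_ms(Clock::now() - batch_start);
    return t;
}
```

For example, run_batch({10, 10, 300, 10}) should report index 2 as the slowest wait, and the batch takes at least 300 ms even though the other three writes finish almost immediately. In the real program, the same timing around rados_aio_wait_for_complete() would show which object was slow.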

I ran it under Valgrind and saw no issues, and I also read the data back and verified its checksums to confirm there is no corruption. So everything appears to "work" as expected, apart from the occasional long delays.
I'm wondering whether anyone else using the AIO APIs to write objects has experienced similar problems.

--
Regards,
Ponnuvel P