On Fri, Mar 01, 2024 at 08:18:24PM +0000, Donald Jennings wrote:
> Hello all,
>
> We are using Ceph as the storage backend for some cloud research that
> involves offloading functions to storage nodes to benefit from
> near-storage processing. We do this with rados_exec, calling a class
> method on the object, which then executes the function locally.
> However, we keep running into an issue where rados_exec fails with
> EIO: the request never reaches the storage node and the method is
> never called.
>
> While debugging this, I noticed that if I re-put the same object
> under a different key, it works (provided it lands on a different
> OSD). It appears that the OSD cannot serve a rados_exec request.

What's the simplest offload function you can reproduce the problem with,
and can you share that?
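
For reference, a minimal reproducer could look something like the sketch below. It is only a sketch under assumptions: it uses the Python rados bindings, it assumes the stock "hello" object class with its say_hello method is loaded on the OSDs, and the config path, pool name and object key are placeholders to replace with your own. describe_rc and try_exec are hypothetical helper names, not part of librados. I am also assuming the bindings raise rados.Error on failure and may expose an errno attribute on it, so that part is handled defensively.

```python
import errno
import os


def describe_rc(rc: int) -> str:
    """Turn a librados-style return code (negative errno) into readable text."""
    if rc >= 0:
        return f"ok ({rc})"
    code = -rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({rc}): {os.strerror(code)}"


def try_exec(conffile, pool, key, cls="hello", method="say_hello", payload=b""):
    """Call one object-class method and report how the call ended."""
    # Imported lazily so describe_rc() stays usable without the bindings.
    import rados

    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        ioctx = cluster.open_ioctx(pool)
        try:
            ret, out = ioctx.execute(key, cls, method, payload)
            return describe_rc(ret), out
        except rados.Error as exc:
            # Assumption: failures surface as rados.Error, possibly with .errno.
            rc = getattr(exc, "errno", None)
            return (describe_rc(-abs(rc)) if rc else repr(exc)), b""
        finally:
            ioctx.close()
    finally:
        cluster.shutdown()


if __name__ == "__main__":
    # Placeholder values; substitute your own config, pool and key.
    print(try_exec("/etc/ceph/ceph.conf", "mypool", "myobject"))
```

Running this against a key that works and a key that fails, with the same class and method, would show whether the failure follows the object or the OSD.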

> This bug happens under a few conditions:
> 1. If we invoke the function before uploading it.
> 2. Non-deterministically, when the OSD is under load.
>
> I cannot seem to debug it for the life of me, and the only thing I
> have to go on is that the OSDs cannot serve requests. I have tried
> removing the object from the pool and putting it back with the same
> key, and it does exactly the same thing.

My initial read of this is that the content of the object is breaking
your function?

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation President & Treasurer
E-Mail : robbat2(a)gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136