Hi everyone:
I asked about this on the #ceph IRC channel, but didn't get much traction (and, as an
aside, the advertised host for the channel's logs is inaccessible to me...).
I have a new Ceph cluster presenting an erasure-coded pool.
Current configuration is 8 nodes, each hosting a mon, and one OSD per HDD (20 x 10TB HGST
spinning disks per node) for a total of 160 OSDs in the cluster.
The cluster deploys fine with ceph-ansible (as Nautilus or Mimic), and ceph health is
always reported as OK (other than the warning for not having associated an
application with the current test pool).
rados bench maxes out the bandwidth of the network interface when I try it with 4MB
objects.
However, attempting a more "real-world" test with
rados -p ecpool --striper put obj 120MBfile
causes the transfer to fail with "Operation not supported (95)".
Inspection reveals that 3 stripe chunks get created - the first being the expected size,
and the second and third being only a few KB in size.
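For context, this is my mental model of the striper's RAID-0-style layout, which would predict equal-sized chunks rather than the tiny second and third ones I'm seeing. The stripe parameters below are made-up example values, not the pool's actual defaults:

```python
# Sketch of the RAID-0-style layout I believe libradosstriper uses.
# The stripe parameters passed in are illustrative, NOT real defaults.

def object_sizes(total_bytes, stripe_unit, stripe_count, object_size):
    """Return {object_index: bytes} for a file striped RAID-0 style.

    Objects are grouped into "object sets" of stripe_count objects;
    within a set, data is written one stripe_unit at a time, round-robin
    across the set's objects, until the set is full.
    """
    sizes = {}
    set_bytes = object_size * stripe_count  # capacity of one object set
    pos = 0
    while pos < total_bytes:
        obj_set, off = divmod(pos, set_bytes)
        unit_idx, unit_off = divmod(off, stripe_unit)
        obj = obj_set * stripe_count + unit_idx % stripe_count
        n = min(stripe_unit - unit_off, total_bytes - pos)
        sizes[obj] = sizes.get(obj, 0) + n
        pos += n
    return sizes

MiB = 1 << 20
# e.g. a 10 MiB file, 1 MiB stripe unit, 2-wide stripe, 4 MiB objects:
print(object_sizes(10 * MiB, 1 * MiB, 2, 4 * MiB))
# objects 0 and 1 hold 4 MiB each; objects 2 and 3 hold 1 MiB each
```

Whatever the real parameters are, every chunk except the last in each object set should come out the same size, which is why the few-KB second and third chunks look wrong to me.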
Object metadata from rados -p ecpool --striper stat obj is inconsistent with the sum of
the on-disk sizes of the chunks.
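In case it helps, these are roughly the commands I used to inspect the chunks (pool and object names as above; the chunk naming and striper.* xattr names are my recollection of what libradosstriper writes, so treat those as assumptions):

rados -p ecpool ls | grep '^obj'
rados -p ecpool stat obj.0000000000000000
rados -p ecpool listxattr obj.0000000000000000
rados -p ecpool getxattr obj.0000000000000000 striper.size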
Can you advise how to diagnose what's breaking here?
Thanks
Sam Skipsey
University of Glasgow