On Thu, Sep 10, 2020 at 10:19 AM shubjero <shubjero(a)gmail.com> wrote:
Hi Casey,
I was never setting rgw_max_chunk_size in my ceph.conf so it must have
been the default? Funnily enough, I don't even see this configuration
parameter in the documentation at
https://docs.ceph.com/docs/nautilus/radosgw/config-ref/ .
Armed with your information, I tried setting the following in my
ceph.conf (values confirmed below via the admin socket):
root@ceph-1:~# ceph --admin-daemon
/var/run/ceph/ceph-client.rgw.ceph-1.28726.94406486979736.asok config
show | egrep "rgw_max_chunk_size|rgw_put_obj_min|rgw_obj_stripe_size"
"rgw_max_chunk_size": "67108864",
"rgw_obj_stripe_size": "67108864",
"rgw_put_obj_min_window_size": "67108864",
And with this configuration I was able to upload with large part sizes
(2GB) using the aws client without error.
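For reference, those overrides amount to something like the following in
ceph.conf (the section name here is only an example based on the gateway
name in the admin socket path above; adjust to your own deployment):

[client.rgw.ceph-1]
rgw_max_chunk_size = 67108864
rgw_obj_stripe_size = 67108864
rgw_put_obj_min_window_size = 67108864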
are you sure there's a benefit to using such large part sizes? a
smaller part size should allow the client to stream more uploads at a
time. it also makes recovery much cheaper; if a 2GB PUT request times
out, the client will retry and send the entire 2GB again. with a
smaller part size, the server can commit this data more frequently and
limit the amount of bandwidth wasted on retries
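as a sketch, the client-side knob is the same aws cli 'config' stanza
quoted further down in this thread; something like this keeps parts at a
more modest size (the 64MB value here is only an example):

s3 =
  multipart_chunksize = 64MB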
Do you know if there is any expected performance improvement with
larger chunk/stripe/window sizes? As I said previously our use case is
dealing with very large genomic files being uploaded and downloaded
(average is probably 100GB per file).
rgw_max_chunk_size specifies how much data we'll send in a single osd
request. rgw_obj_stripe_size specifies how much data we'll write to a
single rados object before creating a new stripe/object.
rgw_put_obj_min_window_size specifies how much object data we'll
buffer in memory as we stream chunks out to their osds
i don't think we saw any benefit from chunk sizes over 4M, but you're
welcome to experiment and measure that in your environment. generally
you want rgw_obj_stripe_size == rgw_max_chunk_size so that each of
your writes goes to a different rados object; if, for example, your
stripe size was 2x the chunk size, we would write two chunks to each
rados object - but the osd has to apply those writes sequentially, so
you lose some parallelism that way
regarding rgw_put_obj_min_window_size, the number of parallel writes
we can do is equal to (rgw_put_obj_min_window_size /
rgw_max_chunk_size). in a default configuration, this is 16M/4M = 4.
you can experiment with a larger multiplier here, but do take overall
memory usage into account! If rgw_max_concurrent_requests is 1024 and
all of those are large PUT requests, then we'd use up to
(rgw_max_concurrent_requests * rgw_put_obj_min_window_size) or 16G of
memory
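to make that concrete, a quick back-of-the-envelope in shell with the
default values (plain arithmetic, not rgw code):

window=$((16 * 1024 * 1024))   # rgw_put_obj_min_window_size, 16M default
chunk=$((4 * 1024 * 1024))     # rgw_max_chunk_size, 4M default
echo "parallel writes per PUT: $((window / chunk))"                      # -> 4
echo "worst-case buffering: $((1024 * window / 1024 / 1024 / 1024)) GiB" # 1024 concurrent requests -> 16 GiB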
in general, i think the default tunings should perform well here. if
you have a lot of memory to work with on rgw nodes, you can experiment
with larger values of rgw_put_obj_min_window_size
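one way to experiment without editing ceph.conf is to inject a value
through the admin socket, e.g. (the socket path is just an example, and
i'm not sure the option takes effect without a restart, so treat this as
a sketch):

ceph --admin-daemon /var/run/ceph/ceph-client.rgw.<name>.asok config set rgw_put_obj_min_window_size 33554432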
On Wed, Sep 9, 2020 at 11:29 AM Casey Bodley <cbodley(a)redhat.com> wrote:
What is your rgw_max_chunk_size? It looks like you'll get these
EDEADLK errors when rgw_max_chunk_size > rgw_put_obj_min_window_size,
because we try to write in units of chunk size but the window is too
small to write a single chunk.
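as an aside, EDEADLK is errno 35 on linux, which lines up with the
"err_no=35 ... resorting to 500" lines in the log excerpt further down.
to check both values on a running gateway, something like this against
the rgw admin socket works (socket path will differ per host):

ceph --admin-daemon /var/run/ceph/ceph-client.rgw.<name>.asok config show | egrep "rgw_max_chunk_size|rgw_put_obj_min_window_size"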
On Wed, Sep 9, 2020 at 8:51 AM shubjero <shubjero(a)gmail.com> wrote:
Will do Matt
On Tue, Sep 8, 2020 at 5:36 PM Matt Benjamin <mbenjami(a)redhat.com> wrote:
thanks, Shubjero
Would you consider creating a ceph tracker issue for this?
regards,
Matt
On Tue, Sep 8, 2020 at 4:13 PM shubjero <shubjero(a)gmail.com> wrote:
>
> I had been looking into this issue all day and during testing found
> that a specific configuration option we had been setting for years was
> the culprit. Not setting this value and letting it fall back to the
> default seems to have fixed our issue with multipart uploads.
>
> If you are curious, the configuration option is rgw_obj_stripe_size
> which was being set to 67108864 bytes (64MiB). The default is 4194304
> bytes (4MiB). This is a documented option
> (https://docs.ceph.com/docs/nautilus/radosgw/config-ref/) and from my
> testing it seems like using anything but the default (I only tried
> larger values) breaks multipart uploads.
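>
> for reference, the offending setting amounts to this single line in our
> ceph.conf (shown here just for illustration):
>
> rgw_obj_stripe_size = 67108864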
>
> On Tue, Sep 8, 2020 at 12:12 PM shubjero <shubjero(a)gmail.com> wrote:
> >
> > Hey all,
> >
> > I'm creating a new post for this issue as we've narrowed the problem
> > down to a partsize limitation on multipart upload. We have discovered
> > that in our production Nautilus (14.2.11) cluster and our lab Nautilus
> > (14.2.10) cluster that multipart uploads with a configured part size
> > of greater than 16777216 bytes (16MiB) will return a status 500 /
> > internal server error from radosgw.
> >
> > So far I have increased the following rgw settings/values that looked
> > suspect, without any success/improvement with part sizes.
> > Such as:
> > "rgw_get_obj_window_size": "16777216",
> > "rgw_put_obj_min_window_size": "16777216",
> >
> > I am trying to determine whether this is because of a conservative
> > default setting somewhere that I don't know about, or whether this is
> > perhaps a bug.
> >
> > I would appreciate it if someone on Nautilus with rgw could also test
> > / provide feedback. It's very easy to reproduce; configuring your
> > part size with aws2cli requires you to put the following in your aws
> > 'config':
> > s3 =
> >   multipart_chunksize = 32MB
> >
> > rgw server logs during a failed multipart upload (32MB chunk/part size):
> > 2020-09-08 15:59:36.054 7f2d32fa6700 1 ====== starting new request
> > req=0x55953dc36930 =====
> > 2020-09-08 15:59:36.082 7f2d32fa6700 -1 res_query() failed
> > 2020-09-08 15:59:36.138 7f2d32fa6700 1 ====== req done
> > req=0x55953dc36930 op status=0 http_status=200 latency=0.0839988s
> > ======
> > 2020-09-08 16:00:07.285 7f2d3dfbc700 1 ====== starting new request
> > req=0x55953dc36930 =====
> > 2020-09-08 16:00:07.285 7f2d3dfbc700 -1 res_query() failed
> > 2020-09-08 16:00:07.353 7f2d00741700 1 ====== starting new request
> > req=0x55954dd5e930 =====
> > 2020-09-08 16:00:07.357 7f2d00741700 -1 res_query() failed
> > 2020-09-08 16:00:07.413 7f2cc56cb700 1 ====== starting new request
> > req=0x55953dc02930 =====
> > 2020-09-08 16:00:07.417 7f2cc56cb700 -1 res_query() failed
> > 2020-09-08 16:00:07.473 7f2cb26a5700 1 ====== starting new request
> > req=0x5595426f6930 =====
> > 2020-09-08 16:00:07.473 7f2cb26a5700 -1 res_query() failed
> > 2020-09-08 16:00:09.465 7f2d3dfbc700 0 WARNING: set_req_state_err
> > err_no=35 resorting to 500
> > 2020-09-08 16:00:09.465 7f2d3dfbc700 1 ====== req done
> > req=0x55953dc36930 op status=-35 http_status=500 latency=2.17997s
> > ======
> > 2020-09-08 16:00:09.549 7f2d00741700 0 WARNING: set_req_state_err
> > err_no=35 resorting to 500
> > 2020-09-08 16:00:09.549 7f2d00741700 1 ====== req done
> > req=0x55954dd5e930 op status=-35 http_status=500 latency=2.19597s
> > ======
> > 2020-09-08 16:00:09.605 7f2cc56cb700 0 WARNING: set_req_state_err
> > err_no=35 resorting to 500
> > 2020-09-08 16:00:09.609 7f2cc56cb700 1 ====== req done
> > req=0x55953dc02930 op status=-35 http_status=500 latency=2.19597s
> > ======
> > 2020-09-08 16:00:09.641 7f2cb26a5700 0 WARNING: set_req_state_err
> > err_no=35 resorting to 500
> > 2020-09-08 16:00:09.641 7f2cb26a5700 1 ====== req done
> > req=0x5595426f6930 op status=-35 http_status=500 latency=2.16797s
> > ======
> >
> > awscli client side output during a failed multipart upload:
> > root@jump:~# aws --no-verify-ssl --endpoint-url
> > http://lab-object.cancercollaboratory.org:7480 s3 cp 4GBfile
> > s3://troubleshooting
> > upload failed: ./4GBfile to s3://troubleshooting/4GBfile An error
> > occurred (UnknownError) when calling the UploadPart operation (reached
> > max retries: 2): Unknown
> >
> > Thanks,
> >
> > Jared Baker
> > Cloud Architect for the Cancer Genome Collaboratory
> > Ontario Institute for Cancer Research
--
Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103
http://www.redhat.com/en/technologies/storage
tel. 734-821-5101
fax. 734-769-8938
cel. 734-216-5309
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io