You could try increasing osd_recovery_max_active
(setting it will
override the osd_recovery_max_active_hdd default of 3) and
osd_recovery_max_single_start (default 1) to recover more of the small
objects concurrently.
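For reference, a sketch of how those knobs could be raised at runtime via the config database (the values here are illustrative, not recommendations; note that `osd_recovery_max_active` itself defaults to 0 and defers to the `_hdd`/`_ssd` variants):

```shell
# Illustrative only: raise recovery concurrency for HDD OSDs cluster-wide.
# Defaults: osd_recovery_max_active_hdd = 3, osd_recovery_max_single_start = 1.
ceph config set osd osd_recovery_max_active_hdd 5
ceph config set osd osd_recovery_max_single_start 2

# Check what a running OSD actually picked up:
ceph config show osd.0 | grep -E 'osd_recovery_max_(active|single)'
```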
True, but that would increase the workload on the HDDs which can (and
probably will) reduce performance.
Small objects are always a problem precisely because they are small. This
seems to apply to any object store and/or filesystem.
I always say: nothing is free in this world.
Meaning: writing a lot of small files might be easier (cheaper) on the
developer's side, but it's more difficult (expensive) on the storage side.
Each object in RADOS carries a certain amount of overhead, and that results
in slower backfills/recovery. You will notice this especially on HDDs.
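A back-of-envelope count shows why the per-object overhead dominates at small sizes (the object sizes here are hypothetical averages, purely to illustrate the ratio):

```shell
#!/bin/sh
# Hypothetical: RADOS objects to recover 1 TiB of data at a 64 KiB
# average object size versus a 4 MiB average object size. Every object
# pays its own fixed recovery cost (metadata, txn, seeks on HDD).
TIB=$((1024 * 1024 * 1024 * 1024))
small=$((TIB / (64 * 1024)))        # objects at 64 KiB each
large=$((TIB / (4 * 1024 * 1024)))  # objects at 4 MiB each
echo "64 KiB objects per TiB: $small"   # 16777216
echo "4 MiB objects per TiB:  $large"   # 262144
echo "ratio: $((small / large))x more per-object operations"  # 64x
```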
Yes, reducing the amount of objects by making them larger helps, but
that shifts the problem to the user of RGW.
Short: Larger objects will increase your recovery performance.
Wido
-Sam
On Sun, May 2, 2021 at 8:27 PM Prasad Krishnan
<prasad.krishnan(a)flipkart.com> wrote:
Dear Ceph Developers,
On our Ceph S3 storage clusters we have found that recovery/backfill on the
cluster with S3/RADOS object sizes between 10KB and 100KB takes much longer
compared to our other cluster, whose S3 object sizes are usually a few tens of MB (with
"rgw_obj_stripe_size" of 4MB, so RADOS objects would be 4MB or smaller).
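Roughly speaking (ignoring RGW's head-object layout and multipart details), the RADOS object count per S3 object is a ceiling division by the stripe size. For a hypothetical 40 MiB upload:

```shell
#!/bin/sh
# Hypothetical 40 MiB S3 object striped at rgw_obj_stripe_size = 4 MiB.
# Ceiling division approximates the RADOS object count; the real layout
# also includes a head object, so treat this as a rough sketch only.
obj_size=$((40 * 1024 * 1024))
stripe=$((4 * 1024 * 1024))
count=$(( (obj_size + stripe - 1) / stripe ))
echo "approx RADOS objects: $count"  # 10
```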
We're exploring ways to improve the recovery speed by keeping the following factors
constant (since tweaking them would lead to other issues):
- Type of media: this would be HDD, as moving all data to SSD would be
prohibitively expensive.
- "osd_max_backfills": we do not want to increase this, as it leads to blocked
requests and interferes with client I/O. We suspect that the disk RPS gets
saturated if increased.
- PG count: increasing this would lead to more memory usage beyond what's
available with the OSDs.
I came across the same question posted on this forum a few years back, but it
seems to have no answers. Refer to this and this.
Can the community help me understand what is theoretically causing this slowness? Is the
overhead in recovering each RADOS object (grabbing a lock on the PG, txn overhead) so high
that any increase in its number would decrease the recovery throughput?
Should I just tweak our workloads to not generate small S3/RADOS objects so that
the MTTR would improve for our cluster?
Thanks,
Prasad Krishnan
_______________________________________________
Dev mailing list -- dev(a)ceph.io
To unsubscribe send an email to dev-leave(a)ceph.io