Hello Ceph users,
We are experiencing an issue with Ceph 14.2.9 and the RGW Beast frontend. We are seeing it
on two separate clusters.
Over a few weeks, qlen and qactive climb and do not return to zero. At some
point performance starts to degrade and we need to restart the services. We are
reading the queue numbers from perfcounters_dump. In objecter_requests we aren't
seeing any requests (apart from very briefly).
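For anyone who wants to watch the same counters, here is a minimal sketch of how they could be pulled from the admin socket. It assumes the `ceph` CLI is on the PATH and that qlen and qactive appear together in one section of the perf dump; the section name is not assumed, the code just scans for it. The function names are made up for illustration.

```python
import json
import subprocess


def fetch_perf_dump(asok_path):
    """Run `ceph --admin-daemon <socket> perf dump` and parse the JSON output.

    asok_path is the RGW admin socket; its location depends on the deployment.
    """
    out = subprocess.check_output(
        ["ceph", "--admin-daemon", asok_path, "perf", "dump"]
    )
    return json.loads(out)


def queue_counters(perf_dump):
    """Return {section: {"qlen": n, "qactive": n}} for sections exposing both."""
    found = {}
    for section, counters in perf_dump.items():
        # Skip anything that is not a counter section or lacks the queue counters.
        if isinstance(counters, dict) and "qlen" in counters and "qactive" in counters:
            found[section] = {
                "qlen": counters["qlen"],
                "qactive": counters["qactive"],
            }
    return found
```

Calling queue_counters(fetch_perf_dump(path)) at intervals while the gateway is idle would show whether the values settle back to zero; the socket path has to be filled in for the local install.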
We can reproduce the issue by using S3 Browser with concurrent downloads set to 100.
After downloading ~1000 files, the queue length has increased by 2-5 and
never returns to zero. Subsequent bulk downloads increase qlen further.
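A rough scripted stand-in for the reproduction, in case it helps others try it without S3 Browser. This is only a sketch: `fetch` is a hypothetical callable supplied by the caller (for example a wrapper around an S3 client's GET for one key), and it does not model whatever S3 Browser does on the wire. `leaked` compares two idle snapshots of the counters taken before and after the run.

```python
from concurrent.futures import ThreadPoolExecutor


def bulk_download(keys, fetch, concurrency=100):
    """Call fetch(key) for every key with up to `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() waits for every download and re-raises any worker exception.
        return list(pool.map(fetch, keys))


def leaked(before, after):
    """Return the counters that grew between two idle snapshots, with the increase."""
    return {
        name: after[name] - before[name]
        for name in before
        if after[name] > before[name]
    }
```

A run would take a snapshot of qlen/qactive, call bulk_download over ~1000 keys, wait for the gateway to go idle, take a second snapshot, and pass both to leaked. A non-empty result matches what we see.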
We have the following tunables set:
rgw_bucket_index_max_aio 128
rgw_dns_name <fqdn##>
rgw_frontends beast ssl_port=443 ssl_certificate=<CERT##>
rgw_max_chunk_size 4194304
rgw_num_rados_handles 16
rgw_thread_pool_size 500
Has anyone seen this, or does anyone have ideas on how to debug it further?
Is any additional tuning suggested? We have 350TB of S3 data.
Glen