It looks like your radosgw is linked against a different version of librados. In
the backtrace, the topmost useful frame begins:
librados::v14_2_0
when it should be v15_2_0, like the ceph::buffer::v15_2_0 in the same frame.
Is there an old librados library lying around that didn't get cleaned up somehow?
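A quick way to check is to see which librados the dynamic linker actually
resolves for the radosgw binary, and which librados packages are installed.
Something like this (paths assume a stock Ubuntu package install; adjust as
needed):

  # which librados shared object does radosgw resolve at runtime?
  ldd $(which radosgw) | grep librados
  # which librados packages are installed, and at what versions?
  dpkg -l | grep -i librados

A v14_2_0 symbol would correspond to a Nautilus-era (14.2.x) librados.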
Daniel
On 1/28/21 7:27 AM, Andrei Mikhailovsky wrote:
Hello,
I am experiencing very frequent crashes of the radosgw service, multiple times
every hour; over the last 12 hours we've had 35 crashes. Has anyone seen
similar behaviour with the radosgw service on the Octopus release? More info below:
The radosgw service runs on two Ubuntu servers. I have upgraded the OS on one of the
servers to Ubuntu 20.04 with the latest updates; the second server is still running Ubuntu
18.04. Both instances crash occasionally, but the one running on Ubuntu 20.04
seems to crash far more often. The Ceph cluster itself is pretty old, initially
set up around 2013, and has been updated regularly with every major release.
Currently, I've got Octopus 15.2.8 running on all osd, mon, mgr and radosgw servers.
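For reference, the running daemon versions can be cross-checked with:

  # per-daemon-type summary of running versions (mon, mgr, osd, rgw, ...)
  ceph versions

which should report 15.2.8 across the board here.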
Crash Backtrace:
ceph crash info 2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8 | less
{
"backtrace": [
"(()+0x46210) [0x7f815a49a210]",
"(gsignal()+0xcb) [0x7f815a49a18b]",
"(abort()+0x12b) [0x7f815a479859]",
"(()+0x9e951) [0x7f8150ee9951]",
"(()+0xaa47c) [0x7f8150ef547c]",
"(()+0xaa4e7) [0x7f8150ef54e7]",
"(()+0xaa799) [0x7f8150ef5799]",
"(()+0x344ba) [0x7f815a1404ba]",
"(()+0x71e04) [0x7f815a17de04]",
"(librados::v14_2_0::IoCtx::nobjects_begin(librados::v14_2_0::ObjectCursor
const&, ceph::buffer::v15_2_0::list const&)+0x5d) [0x7f815a18c7bd]",
"(RGWSI_RADOS::Pool::List::init(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
RGWAccessListFilter*)+0x115) [0x7f815b0d9935]",
"(RGWSI_SysObj_Core::pool_list_objects_init(rgw_pool const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&, std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
RGWSI_SysObj::Pool::ListCtx*)+0x255) [0x7f815abd7035]",
"(RGWSI_MetaBackend_SObj::list_init(RGWSI_MetaBackend::Context*,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&)+0x206) [0x7f815b0ccfe6]",
"(RGWMetadataHandler_GenericMetaBE::list_keys_init(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&, void**)+0x41)
[0x7f815ad23201]",
"(RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char,
std::char_traits<char>, std::allocator<char> > const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&, void**)+0x71) [0x7f815ad254d1]",
"(AsyncMetadataList::_send_request()+0x9b) [0x7f815b13c70b]",
"(RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x25)
[0x7f815ae60f25]",
"(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*,
ThreadPool::TPHandle&)+0x11) [0x7f815ae69401]",
"(ThreadPool::worker(ThreadPool::WorkThread*)+0x5bb) [0x7f81517b072b]",
"(ThreadPool::WorkThread::entry()+0x15) [0x7f81517b17f5]",
"(()+0x9609) [0x7f815130d609]",
"(clone()+0x43) [0x7f815a576293]"
],
"ceph_version": "15.2.8",
"crash_id":
"2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8",
"entity_name": "client.radosgw1.gateway",
"os_id": "ubuntu",
"os_name": "Ubuntu",
"os_version": "20.04.1 LTS (Focal Fossa)",
"os_version_id": "20.04",
"process_name": "radosgw",
"stack_sig":
"347474f09a756104ac2bb99d80e0c1fba3e9dc6f26e4ef68fe55946c103b274a",
"timestamp": "2021-01-28T11:36:48.912771Z",
"utsname_hostname": "arh-ibstorage1-ib",
"utsname_machine": "x86_64",
"utsname_release": "5.4.0-64-generic",
"utsname_sysname": "Linux",
"utsname_version": "#72-Ubuntu SMP Fri Jan 15 10:27:54 UTC 2021"
}
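To see whether all 35 crashes share this same stack, the stack_sig of each
recorded crash can be tallied; a rough sketch, assuming the default tabular
output of "ceph crash ls" (crash ID in the first column, header on the first
line) and that crash info prints "stack_sig" and its value on one line:

  # count distinct stack signatures across all recorded crashes
  for id in $(ceph crash ls | awk 'NR>1 {print $1}'); do
      ceph crash info "$id" | grep stack_sig
  done | sort | uniq -c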
radosgw.log file (file names were redacted):
-25> 2021-01-28T11:36:48.794+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010:
176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-u115134.JPG
HTTP/1.1" 400 460 - -
-24> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== starting new request
req=0x7f80437f5780 =====
-23> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s initializing for trans_id
= tx000000000000000001431-006012a1d0-31197b5c-default
-22> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s getting op 1
-21> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj verifying
requester
-20> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj normalizing
buckets and tenants
-19> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj init
permissions
-18> 2021-01-28T11:36:48.814+0000 7f80437fe700 0 req 5169 0s NOTICE: invalid dest
placement: default-placement/REDUCED_REDUNDANCY
-17> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 op->ERRORHANDLER: err_no=-22
new_err_no=-22
-16> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj op status=0
-15> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj http
status=400
-14> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== req done req=0x7f80437f5780 op
status=0 http_status=400 latency=0s ======
-13> 2021-01-28T11:36:48.822+0000 7f80437fe700 1 civetweb: 0x7f814c0cf9e8:
176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT
/<file_name>-d20201223-u115132.JPG HTTP/1.1" 400 460 - -
-12> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== starting new request
req=0x7f8043ff6780 =====
-11> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s initializing for trans_id
= tx000000000000000001432-006012a1d0-31197b5c-default
-10> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s getting op 1
-9> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj verifying
requester
-8> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj normalizing
buckets and tenants
-7> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj init
permissions
-6> 2021-01-28T11:36:48.878+0000 7f8043fff700 0 req 5170 0s NOTICE: invalid dest
placement: default-placement/REDUCED_REDUNDANCY
-5> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 op->ERRORHANDLER: err_no=-22
new_err_no=-22
-4> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj op status=0
-3> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj http
status=400
-2> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== req done req=0x7f8043ff6780 op
status=0 http_status=400 latency=0s ======
-1> 2021-01-28T11:36:48.886+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010:
176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT
/<file_name>-223-u115136.JPG HTTP/1.1" 400 460 - -
0> 2021-01-28T11:36:48.910+0000 7f8128ff9700 -1 *** Caught signal (Aborted) **
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 deferred set uid:gid to 64045:64045
(ceph:ceph)
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 ceph version 15.2.8
(bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process radosgw, pid 30417
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework: civetweb
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework conf key: port, val: 443s
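(Side note: every PUT in the window before the crash fails with "invalid dest
placement: default-placement/REDUCED_REDUNDANCY", i.e. the client is requesting
the S3 storage class REDUCED_REDUNDANCY, which does not appear to be defined
under the default-placement target. The placement targets and storage classes
the gateway actually knows about can be inspected with something like:

  # placement targets and their storage classes at the zonegroup level
  radosgw-admin zonegroup get
  # per-zone mapping of storage classes to data pools
  radosgw-admin zone get

That would explain the 400 responses, though not necessarily the crash itself.)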
Could someone help me troubleshoot and fix the issue?
Thanks
Andrei