Hi Daniel,
Thanks for your reply. I've checked the package versions on that server, and all
Ceph-related packages on it are from version 15.2.8:
ii  librados2                      15.2.8-1focal  amd64  RADOS distributed object store client library
ii  libradosstriper1               15.2.8-1focal  amd64  RADOS striping interface
ii  python3-rados                  15.2.8-1focal  amd64  Python 3 libraries for the Ceph librados library
ii  radosgw                        15.2.8-1focal  amd64  REST gateway for RADOS distributed object store
ii  librbd1                        15.2.8-1focal  amd64  RADOS block device client library
ii  python3-rbd                    15.2.8-1focal  amd64  Python 3 libraries for the Ceph librbd library
ii  ceph                           15.2.8-1focal  amd64  distributed storage and file system
ii  ceph-base                      15.2.8-1focal  amd64  common ceph daemon libraries and management tools
ii  ceph-common                    15.2.8-1focal  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-fuse                      15.2.8-1focal  amd64  FUSE-based client for the Ceph distributed file system
ii  ceph-mds                       15.2.8-1focal  amd64  metadata server for the ceph distributed file system
ii  ceph-mgr                       15.2.8-1focal  amd64  manager for the ceph distributed storage system
ii  ceph-mgr-cephadm               15.2.8-1focal  all    cephadm orchestrator module for ceph-mgr
ii  ceph-mgr-dashboard             15.2.8-1focal  all    dashboard module for ceph-mgr
ii  ceph-mgr-diskprediction-cloud  15.2.8-1focal  all    diskprediction-cloud module for ceph-mgr
ii  ceph-mgr-diskprediction-local  15.2.8-1focal  all    diskprediction-local module for ceph-mgr
ii  ceph-mgr-k8sevents             15.2.8-1focal  all    kubernetes events module for ceph-mgr
ii  ceph-mgr-modules-core          15.2.8-1focal  all    ceph manager modules which are always enabled
ii  ceph-mgr-rook                  15.2.8-1focal  all    rook module for ceph-mgr
ii  ceph-mon                       15.2.8-1focal  amd64  monitor server for the ceph storage system
ii  ceph-osd                       15.2.8-1focal  amd64  OSD server for the ceph storage system
ii  cephadm                        15.2.8-1focal  amd64  cephadm utility to bootstrap ceph daemons with systemd and containers
ii  libcephfs2                     15.2.8-1focal  amd64  Ceph distributed file system client library
ii  python3-ceph                   15.2.8-1focal  amd64  Meta-package for python libraries for the Ceph libraries
ii  python3-ceph-argparse          15.2.8-1focal  all    Python 3 utility libraries for Ceph CLI
ii  python3-ceph-common            15.2.8-1focal  all    Python 3 utility libraries for Ceph
ii  python3-cephfs                 15.2.8-1focal  amd64  Python 3 libraries for the Ceph libcephfs library
As this is a brand-new 20.04 server, I do not see how an older version could have
got onto it.
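In case it helps, this is roughly how I went looking for a leftover copy — a minimal sketch, assuming a stock Ubuntu install (the library paths and `radosgw` being on $PATH are assumptions):

```shell
# Sketch: look for stale librados copies and check which one the radosgw
# binary actually resolves at runtime. Paths are assumptions for a stock
# Ubuntu/focal install; adjust as needed.

# Every librados the dynamic linker cache knows about:
ldconfig -p | grep librados || true

# Stray copies outside the cache (old builds, /usr/local, etc.):
find /usr/lib /usr/local/lib -name 'librados.so*' 2>/dev/null || true

# The copy the binary actually loads — the one that matters:
RGW="$(command -v radosgw || true)"
if [ -n "$RGW" ]; then
    ldd "$RGW" | grep librados
fi
```

All three came back showing only the 15.2.8 library.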
Andrei
----- Original Message -----
From: "Daniel Gryniewicz"
<dang(a)redhat.com>
To: "ceph-users" <ceph-users(a)ceph.io>
Sent: Thursday, 28 January, 2021 14:06:16
Subject: [ceph-users] Re: radosgw process crashes multiple times an hour
> It looks like your radosgw is using a different version of librados. In
> the backtrace, the top useful line begins:
>
> librados::v14_2_0
>
> when it should be v15.2.0, like the ceph::buffer in the same line.
>
> Is there an old librados lying around that didn't get cleaned up somehow?
>
> Daniel
>
>
>
> On 1/28/21 7:27 AM, Andrei Mikhailovsky wrote:
>> Hello,
>>
>> I am experiencing very frequent crashes of the radosgw service. It happens
>> multiple times every hour; as an example, over the last 12 hours we've had 35
>> crashes. Has anyone experienced similar behaviour of the radosgw service in the
>> Octopus release? More info below:
>>
>> Radosgw service is running on two Ubuntu servers. I have tried upgrading OS on
>> one of the servers to Ubuntu 20.04 with latest updates. The second server is
>> still running Ubuntu 18.04. Both services crash occasionally, but the service
>> which is running on Ubuntu 20.04 crashes far more often it seems. The ceph
>> cluster itself is pretty old and was initially setup around 2013. The cluster
>> was updated pretty regularly with every major release. Currently, I've got
>> Octopus 15.2.8 running on all osd, mon, mgr and radosgw servers.
>>
>> Crash Backtrace:
>>
>> ceph crash info 2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8 | less
>> {
>>     "backtrace": [
>>         "(()+0x46210) [0x7f815a49a210]",
>>         "(gsignal()+0xcb) [0x7f815a49a18b]",
>>         "(abort()+0x12b) [0x7f815a479859]",
>>         "(()+0x9e951) [0x7f8150ee9951]",
>>         "(()+0xaa47c) [0x7f8150ef547c]",
>>         "(()+0xaa4e7) [0x7f8150ef54e7]",
>>         "(()+0xaa799) [0x7f8150ef5799]",
>>         "(()+0x344ba) [0x7f815a1404ba]",
>>         "(()+0x71e04) [0x7f815a17de04]",
>>         "(librados::v14_2_0::IoCtx::nobjects_begin(librados::v14_2_0::ObjectCursor const&, ceph::buffer::v15_2_0::list const&)+0x5d) [0x7f815a18c7bd]",
>>         "(RGWSI_RADOS::Pool::List::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWAccessListFilter*)+0x115) [0x7f815b0d9935]",
>>         "(RGWSI_SysObj_Core::pool_list_objects_init(rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWSI_SysObj::Pool::ListCtx*)+0x255) [0x7f815abd7035]",
>>         "(RGWSI_MetaBackend_SObj::list_init(RGWSI_MetaBackend::Context*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x206) [0x7f815b0ccfe6]",
>>         "(RGWMetadataHandler_GenericMetaBE::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x41) [0x7f815ad23201]",
>>         "(RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x71) [0x7f815ad254d1]",
>>         "(AsyncMetadataList::_send_request()+0x9b) [0x7f815b13c70b]",
>>         "(RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x25) [0x7f815ae60f25]",
>>         "(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x11) [0x7f815ae69401]",
>>         "(ThreadPool::worker(ThreadPool::WorkThread*)+0x5bb) [0x7f81517b072b]",
>>         "(ThreadPool::WorkThread::entry()+0x15) [0x7f81517b17f5]",
>>         "(()+0x9609) [0x7f815130d609]",
>>         "(clone()+0x43) [0x7f815a576293]"
>>     ],
>>     "ceph_version": "15.2.8",
>>     "crash_id": "2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8",
>>     "entity_name": "client.radosgw1.gateway",
>>     "os_id": "ubuntu",
>>     "os_name": "Ubuntu",
>>     "os_version": "20.04.1 LTS (Focal Fossa)",
>>     "os_version_id": "20.04",
>>     "process_name": "radosgw",
>>     "stack_sig": "347474f09a756104ac2bb99d80e0c1fba3e9dc6f26e4ef68fe55946c103b274a",
>>     "timestamp": "2021-01-28T11:36:48.912771Z",
>>     "utsname_hostname": "arh-ibstorage1-ib",
>>     "utsname_machine": "x86_64",
>>     "utsname_release": "5.4.0-64-generic",
>>     "utsname_sysname": "Linux",
>>     "utsname_version": "#72-Ubuntu SMP Fri Jan 15 10:27:54 UTC 2021"
>> }
>>
>>
>>
>>
>>
>> radosgw.log file (file names were redacted):
>>
>>
>> -25> 2021-01-28T11:36:48.794+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-u115134.JPG HTTP/1.1" 400 460 - -
>> -24> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== starting new request req=0x7f80437f5780 =====
>> -23> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s initializing for trans_id = tx000000000000000001431-006012a1d0-31197b5c-default
>> -22> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s getting op 1
>> -21> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj verifying requester
>> -20> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj normalizing buckets and tenants
>> -19> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj init permissions
>> -18> 2021-01-28T11:36:48.814+0000 7f80437fe700 0 req 5169 0s NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY
>> -17> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 op->ERRORHANDLER: err_no=-22 new_err_no=-22
>> -16> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj op status=0
>> -15> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj http status=400
>> -14> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== req done req=0x7f80437f5780 op status=0 http_status=400 latency=0s ======
>> -13> 2021-01-28T11:36:48.822+0000 7f80437fe700 1 civetweb: 0x7f814c0cf9e8: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-d20201223-u115132.JPG HTTP/1.1" 400 460 - -
>> -12> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== starting new request req=0x7f8043ff6780 =====
>> -11> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s initializing for trans_id = tx000000000000000001432-006012a1d0-31197b5c-default
>> -10> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s getting op 1
>> -9> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj verifying requester
>> -8> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj normalizing buckets and tenants
>> -7> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj init permissions
>> -6> 2021-01-28T11:36:48.878+0000 7f8043fff700 0 req 5170 0s NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY
>> -5> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 op->ERRORHANDLER: err_no=-22 new_err_no=-22
>> -4> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj op status=0
>> -3> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj http status=400
>> -2> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== req done req=0x7f8043ff6780 op status=0 http_status=400 latency=0s ======
>>
>> -1> 2021-01-28T11:36:48.886+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-223-u115136.JPG HTTP/1.1" 400 460 - -
>> 0> 2021-01-28T11:36:48.910+0000 7f8128ff9700 -1 *** Caught signal (Aborted) **
>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 deferred set uid:gid to 64045:64045 (ceph:ceph)
>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process radosgw, pid 30417
>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework: civetweb
>> 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework conf key: port, val: 443s
>>
>>
>> Could someone help me troubleshoot and fix the issue?
>>
>> Thanks
>> Andrei
>>
>> _______________________________________________
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>>