Hi Daniel,
Thanks for your reply. I've checked the package versions on that server, and
all Ceph-related packages on it are at version 15.2.8:
ii  librados2                      15.2.8-1focal  amd64  RADOS distributed object store client library
ii  libradosstriper1               15.2.8-1focal  amd64  RADOS striping interface
ii  python3-rados                  15.2.8-1focal  amd64  Python 3 libraries for the Ceph librados library
ii  radosgw                        15.2.8-1focal  amd64  REST gateway for RADOS distributed object store
ii  librbd1                        15.2.8-1focal  amd64  RADOS block device client library
ii  python3-rbd                    15.2.8-1focal  amd64  Python 3 libraries for the Ceph librbd library
ii  ceph                           15.2.8-1focal  amd64  distributed storage and file system
ii  ceph-base                      15.2.8-1focal  amd64  common ceph daemon libraries and management tools
ii  ceph-common                    15.2.8-1focal  amd64  common utilities to mount and interact with a ceph storage cluster
ii  ceph-fuse                      15.2.8-1focal  amd64  FUSE-based client for the Ceph distributed file system
ii  ceph-mds                       15.2.8-1focal  amd64  metadata server for the ceph distributed file system
ii  ceph-mgr                       15.2.8-1focal  amd64  manager for the ceph distributed storage system
ii  ceph-mgr-cephadm               15.2.8-1focal  all    cephadm orchestrator module for ceph-mgr
ii  ceph-mgr-dashboard             15.2.8-1focal  all    dashboard module for ceph-mgr
ii  ceph-mgr-diskprediction-cloud  15.2.8-1focal  all    diskprediction-cloud module for ceph-mgr
ii  ceph-mgr-diskprediction-local  15.2.8-1focal  all    diskprediction-local module for ceph-mgr
ii  ceph-mgr-k8sevents             15.2.8-1focal  all    kubernetes events module for ceph-mgr
ii  ceph-mgr-modules-core          15.2.8-1focal  all    ceph manager modules which are always enabled
ii  ceph-mgr-rook                  15.2.8-1focal  all    rook module for ceph-mgr
ii  ceph-mon                       15.2.8-1focal  amd64  monitor server for the ceph storage system
ii  ceph-osd                       15.2.8-1focal  amd64  OSD server for the ceph storage system
ii  cephadm                        15.2.8-1focal  amd64  cephadm utility to bootstrap ceph daemons with systemd and containers
ii  libcephfs2                     15.2.8-1focal  amd64  Ceph distributed file system client library
ii  python3-ceph                   15.2.8-1focal  amd64  Meta-package for python libraries for the Ceph libraries
ii  python3-ceph-argparse          15.2.8-1focal  all    Python 3 utility libraries for Ceph CLI
ii  python3-ceph-common            15.2.8-1focal  all    Python 3 utility libraries for Ceph
ii  python3-cephfs                 15.2.8-1focal  amd64  Python 3 libraries for the Ceph libcephfs library
As this is a brand-new Ubuntu 20.04 server, I do not see how the older
version could have got onto it.
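To double-check which librados the running radosgw actually loads, these
commands should help (a quick sketch; paths assume the stock Ubuntu
packages and that radosgw is currently running):

# library the binary is linked against
ldd /usr/bin/radosgw | grep librados
# what the dynamic linker cache knows about
ldconfig -p | grep librados
# library actually mapped into the live process
grep librados /proc/$(pidof radosgw)/maps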
Andrei
----- Original Message -----
From: "Daniel Gryniewicz"
<dang(a)redhat.com>
To: "ceph-users" <ceph-users(a)ceph.io>
Sent: Thursday, 28 January, 2021 14:06:16
Subject: [ceph-users] Re: radosgw process crashes multiple times an hour
It looks like your radosgw is using a different version of librados. In
the backtrace, the top useful line begins:
librados::v14_2_0
when it should be v15_2_0, like the ceph::buffer in the same line.
Is there an old librados lying around that didn't get cleaned up somehow?
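A quick way to look for stray copies would be something like this (the
directories are just the usual suspects, adjust as needed):

# any librados shared objects on disk, wherever they ended up
find /usr/lib /usr/local/lib /opt -name 'librados.so*' 2>/dev/null
# which package (if any) owns each copy
dpkg -S librados.so.2

Anything not owned by the librados2 package would be a candidate.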
Daniel
On 1/28/21 7:27 AM, Andrei Mikhailovsky wrote:
Hello,
I am experiencing very frequent crashes of the radosgw service; it happens
multiple times every hour. As an example, over the last 12 hours we've had 35
crashes. Has anyone seen similar behaviour with the radosgw service in the
Octopus release? More info below:
The radosgw service is running on two Ubuntu servers. I have tried upgrading
the OS on one of them to Ubuntu 20.04 with the latest updates; the second
server is still running Ubuntu 18.04. Both services crash occasionally, but
the one running on Ubuntu 20.04 seems to crash far more often. The Ceph
cluster itself is pretty old and was initially set up around 2013. The
cluster has been updated regularly with every major release. Currently, I've
got Octopus 15.2.8 running on all osd, mon, mgr and radosgw servers.
Crash Backtrace:
ceph crash info 2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8 | less
{
    "backtrace": [
        "(()+0x46210) [0x7f815a49a210]",
        "(gsignal()+0xcb) [0x7f815a49a18b]",
        "(abort()+0x12b) [0x7f815a479859]",
        "(()+0x9e951) [0x7f8150ee9951]",
        "(()+0xaa47c) [0x7f8150ef547c]",
        "(()+0xaa4e7) [0x7f8150ef54e7]",
        "(()+0xaa799) [0x7f8150ef5799]",
        "(()+0x344ba) [0x7f815a1404ba]",
        "(()+0x71e04) [0x7f815a17de04]",
        "(librados::v14_2_0::IoCtx::nobjects_begin(librados::v14_2_0::ObjectCursor const&, ceph::buffer::v15_2_0::list const&)+0x5d) [0x7f815a18c7bd]",
        "(RGWSI_RADOS::Pool::List::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWAccessListFilter*)+0x115) [0x7f815b0d9935]",
        "(RGWSI_SysObj_Core::pool_list_objects_init(rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWSI_SysObj::Pool::ListCtx*)+0x255) [0x7f815abd7035]",
        "(RGWSI_MetaBackend_SObj::list_init(RGWSI_MetaBackend::Context*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x206) [0x7f815b0ccfe6]",
        "(RGWMetadataHandler_GenericMetaBE::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x41) [0x7f815ad23201]",
        "(RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x71) [0x7f815ad254d1]",
        "(AsyncMetadataList::_send_request()+0x9b) [0x7f815b13c70b]",
        "(RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x25) [0x7f815ae60f25]",
        "(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x11) [0x7f815ae69401]",
        "(ThreadPool::worker(ThreadPool::WorkThread*)+0x5bb) [0x7f81517b072b]",
        "(ThreadPool::WorkThread::entry()+0x15) [0x7f81517b17f5]",
        "(()+0x9609) [0x7f815130d609]",
        "(clone()+0x43) [0x7f815a576293]"
    ],
    "ceph_version": "15.2.8",
    "crash_id": "2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8",
    "entity_name": "client.radosgw1.gateway",
    "os_id": "ubuntu",
    "os_name": "Ubuntu",
    "os_version": "20.04.1 LTS (Focal Fossa)",
    "os_version_id": "20.04",
    "process_name": "radosgw",
    "stack_sig": "347474f09a756104ac2bb99d80e0c1fba3e9dc6f26e4ef68fe55946c103b274a",
    "timestamp": "2021-01-28T11:36:48.912771Z",
    "utsname_hostname": "arh-ibstorage1-ib",
    "utsname_machine": "x86_64",
    "utsname_release": "5.4.0-64-generic",
    "utsname_sysname": "Linux",
    "utsname_version": "#72-Ubuntu SMP Fri Jan 15 10:27:54 UTC 2021"
}
radosgw.log file (file names were redacted):
-25> 2021-01-28T11:36:48.794+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-u115134.JPG HTTP/1.1" 400 460 - -
-24> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== starting new request req=0x7f80437f5780 =====
-23> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s initializing for trans_id = tx000000000000000001431-006012a1d0-31197b5c-default
-22> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s getting op 1
-21> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj verifying requester
-20> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj normalizing buckets and tenants
-19> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj init permissions
-18> 2021-01-28T11:36:48.814+0000 7f80437fe700 0 req 5169 0s NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY
-17> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 op->ERRORHANDLER: err_no=-22 new_err_no=-22
-16> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj op status=0
-15> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj http status=400
-14> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== req done req=0x7f80437f5780 op status=0 http_status=400 latency=0s ======
-13> 2021-01-28T11:36:48.822+0000 7f80437fe700 1 civetweb: 0x7f814c0cf9e8: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-d20201223-u115132.JPG HTTP/1.1" 400 460 - -
-12> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== starting new request req=0x7f8043ff6780 =====
-11> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s initializing for trans_id = tx000000000000000001432-006012a1d0-31197b5c-default
-10> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s getting op 1
-9> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj verifying requester
-8> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj normalizing buckets and tenants
-7> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj init permissions
-6> 2021-01-28T11:36:48.878+0000 7f8043fff700 0 req 5170 0s NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY
-5> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 op->ERRORHANDLER: err_no=-22 new_err_no=-22
-4> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj op status=0
-3> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj http status=400
-2> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== req done req=0x7f8043ff6780 op status=0 http_status=400 latency=0s ======
-1> 2021-01-28T11:36:48.886+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-223-u115136.JPG HTTP/1.1" 400 460 - -
0> 2021-01-28T11:36:48.910+0000 7f8128ff9700 -1 *** Caught signal (Aborted) **
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 deferred set uid:gid to 64045:64045 (ceph:ceph)
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process radosgw, pid 30417
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework: civetweb
2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework conf key: port, val: 443s
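One more observation from the log: each crash is preceded by the same NOTICE
about an invalid destination placement (default-placement/REDUCED_REDUNDANCY),
which looks like clients requesting a storage class that is not defined in
the zone. In case that turns out to be related, this is a sketch of how the
storage class could be inspected and, if desired, defined (the zonegroup/zone
names and the data pool below are assumptions based on a default setup):

# show the placement targets and their storage classes
radosgw-admin zonegroup get
# define the storage class in the zonegroup and zone
radosgw-admin zonegroup placement add --rgw-zonegroup default --placement-id default-placement --storage-class REDUCED_REDUNDANCY
radosgw-admin zone placement add --rgw-zone default --placement-id default-placement --storage-class REDUCED_REDUNDANCY --data-pool default.rgw.buckets.data
# commit the change (multisite configurations only)
radosgw-admin period update --commit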
Could someone help me troubleshoot and fix the issue?
Thanks
Andrei
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io