Dear Ceph folks,
Could someone with more Ceph expertise help out?
I am running a small Ceph cluster of 3 nodes on Luminous 12.2.12. On one node with a weaker
CPU, I have to reweight the OSDs to keep them alive. If the number of PGs on one of these OSDs
exceeds roughly 40, that OSD restarts continuously (it goes down) and never comes back up
successfully. I checked the CPU usage, and it is actually not overloaded.
What could be the root problem?
Best regards,
Samuel
***********************************************************************************************************************************************
flags = delete
-42> 2020-03-17 17:13:53.374486 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbf69ccd:::rbd_data.128dd6b8b4567.0000000000003e96:head have 217'1230 flags =
delete tried to add 217'1230 flags = delete
-41> 2020-03-17 17:13:53.374502 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbf72b1e:::rbd_data.128f46b8b4567.00000000000005e4:head have 217'1103 flags =
delete tried to add 217'1103 flags = delete
-40> 2020-03-17 17:13:53.374518 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbf91ecf:::rbd_data.8f3b66b8b4567.000000000000038e:head have 3093'11873 flags =
delete tried to add 3093'11873 flags = delete
-39> 2020-03-17 17:13:53.374533 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbf9b36e:::rbd_data.2a0286b8b4567.0000000000000176:head have 1619'2253 flags =
delete tried to add 1619'2253 flags = delete
-38> 2020-03-17 17:13:53.374549 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbf9dcdd:::rbd_data.128f46b8b4567.00000000000019b8:head have 217'1145 flags =
delete tried to add 217'1145 flags = delete
-37> 2020-03-17 17:13:53.374565 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbfa7411:::rbd_data.73a8f6b8b4567.0000000000000231:head have 1934'8269 flags =
delete tried to add 1934'8269 flags = delete
-36> 2020-03-17 17:13:53.374581 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbfc02e9:::rbd_data.128dd6b8b4567.00000000000002d0:head have 217'1111 flags =
delete tried to add 217'1111 flags = delete
-35> 2020-03-17 17:13:53.374597 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbfccec4:::rbd_data.128f46b8b4567.0000000000000ea6:head have 217'1125 flags =
delete tried to add 217'1125 flags = delete
-34> 2020-03-17 17:13:53.374612 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbfcf45a:::rbd_data.74a826b8b4567.000000000000004a:head have 1934'10159 flags =
none tried to add 1934'10159 flags = none
-33> 2020-03-17 17:13:53.374628 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbfeb6f9:::rbd_data.8d7246b8b4567.00000000000042a5:head have 4114'12725 flags =
none tried to add 4114'12725 flags = none
-32> 2020-03-17 17:13:53.374643 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbfeb9ce:::rbd_data.6e6fa6b8b4567.0000000000000093:head have 1934'10160 flags =
none tried to add 1934'10160 flags = none
-31> 2020-03-17 17:13:53.374659 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbff1dd0:::rbd_data.128dd6b8b4567.0000000000004332:head have 217'1240 flags =
delete tried to add 217'1240 flags = delete
-30> 2020-03-17 17:13:53.374675 201a0afdb40 0 0x201d403edd0 6.d3 unexpected need
for 6:cbffd7b5:::rbd_data.128dd6b8b4567.0000000000004330:head have 217'1239 flags =
delete tried to add 217'1239 flags = delete
-29> 2020-03-17 17:13:54.738477 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18c0b4c0:::rbd_data.9177446e87ccd.0000000000000b1d:head have 4711'802 flags =
none tried to add 4711'802 flags = none
-28> 2020-03-17 17:13:54.738516 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18c185d7:::rbd_data.589572ae8944a.0000000000000c3a:head have 1805'339 flags =
delete tried to add 1805'339 flags = delete
-27> 2020-03-17 17:13:54.738535 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18c1b3ab:::rbd_data.8922074b0dc51.0000000000003ea1:head have 3099'603 flags =
delete tried to add 3099'603 flags = delete
-26> 2020-03-17 17:13:54.738552 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18c28113:::rbd_data.9117a327b23c6.00000000000002bd:head have 3122'646 flags =
none tried to add 3122'646 flags = none
-25> 2020-03-17 17:13:54.738568 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18c2926a:::rbd_data.589572ae8944a.000000000000022d:head have 1805'337 flags =
delete tried to add 1805'337 flags = delete
-24> 2020-03-17 17:13:54.738584 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18c2bb8b:::rbd_data.5f15f625558ec.000000000000027f:head have 1913'383 flags =
none tried to add 1913'383 flags = none
-23> 2020-03-17 17:13:54.738600 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18c3ca0e:::rbd_data.8a738625558ec.0000000000005ca0:head have 3298'654 flags =
none tried to add 3298'654 flags = none
-22> 2020-03-17 17:13:54.738615 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18c4892e:::rbd_data.8ff1766334873.0000000000009ea0:head have 3092'523 flags =
delete tried to add 3092'523 flags = delete
-21> 2020-03-17 17:13:54.738631 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18c4ce18:::rbd_data.9177446e87ccd.0000000000000d58:head have 4750'803 flags =
none tried to add 4750'803 flags = none
-20> 2020-03-17 17:13:54.738647 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18c675ec:::rbd_data.8ff1766334873.00000000000018a2:head have 3092'468 flags =
delete tried to add 3092'468 flags = delete
-19> 2020-03-17 17:13:54.738662 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18cbb5d5:::rbd_data.9117a327b23c6.00000000000028a6:head have 3126'652 flags =
none tried to add 3126'652 flags = none
-18> 2020-03-17 17:13:54.738678 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18d2a04f:::rbd_data.110706b8b4567.0000000000000662:head have 1601'112 flags =
none tried to add 1601'112 flags = none
-17> 2020-03-17 17:13:54.738693 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18d424df:::rbd_data.9177446e87ccd.0000000000001ca7:head have 3122'649 flags =
none tried to add 3122'649 flags = none
-16> 2020-03-17 17:13:54.738708 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18d43cd1:::rbd_data.589572ae8944a.0000000000003a12:head have 1805'350 flags =
delete tried to add 1805'350 flags = delete
-15> 2020-03-17 17:13:54.738724 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18d91ef4:::rbd_data.589572ae8944a.0000000000004d34:head have 1805'354 flags =
delete tried to add 1805'354 flags = delete
-14> 2020-03-17 17:13:54.738739 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18e2953a:::rbd_data.5f15f625558ec.00000000000089d0:head have 1913'384 flags =
none tried to add 1913'384 flags = none
-13> 2020-03-17 17:13:54.738755 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18e514a7:::rbd_data.8f17c327b23c6.00000000000040a3:head have 2962'410 flags =
delete tried to add 2962'410 flags = delete
-12> 2020-03-17 17:13:54.738770 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18ea2e7b:::rbd_data.9177446e87ccd.000000000000095c:head have 4711'801 flags =
none tried to add 4711'801 flags = none
-11> 2020-03-17 17:13:54.738785 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18ed4244:::rbd_data.8ff1766334873.00000000000006a2:head have 3092'456 flags =
delete tried to add 3092'456 flags = delete
-10> 2020-03-17 17:13:54.738801 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18f0ccd9:::rbd_data.589572ae8944a.00000000000045e7:head have 1805'353 flags =
delete tried to add 1805'353 flags = delete
-9> 2020-03-17 17:13:54.738816 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18f69445:::rbd_data.589572ae8944a.0000000000000229:head have 1805'336 flags =
delete tried to add 1805'336 flags = delete
-8> 2020-03-17 17:13:54.738833 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18f88400:::rbd_data.589572ae8944a.0000000000004e31:head have 1805'355 flags =
delete tried to add 1805'355 flags = delete
-7> 2020-03-17 17:13:54.738848 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18f8e666:::rbd_data.9177446e87ccd.0000000000000387:head have 4416'800 flags =
none tried to add 4416'800 flags = none
-6> 2020-03-17 17:13:54.738864 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18fae8ee:::rbd_data.9177446e87ccd.0000000000005aa6:head have 3330'657 flags =
none tried to add 3330'657 flags = none
-5> 2020-03-17 17:13:54.738880 201a12fdb40 0 0x201d4123390 4.318 unexpected need
for 4:18fdcb08:::rbd_data.110706b8b4567.0000000000001401:head have 1601'113 flags =
none tried to add 1601'113 flags = none
-4> 2020-03-17 17:14:19.091232 2000f81db40 0 -- 192.168.230.122:6812/171483
>> 192.168.230.11:0/1871093011 conn(0x20014cce840 :6812
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging
authorizer
-3> 2020-03-17 17:14:19.106328 2000ec51b40 0 -- 192.168.230.122:6812/171483
>> 192.168.230.12:0/4148513923 conn(0x20014109640 :6812
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging
authorizer
-2> 2020-03-17 17:14:19.155177 2000e3ddb40 0 -- 192.168.230.122:6812/171483
>> 192.168.230.13:0/768399090 conn(0x200141ac4d0 :6812
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging
authorizer
-1> 2020-03-17 17:14:36.138777 2000f81db40 0 -- 192.168.230.122:6812/171483
>> 192.168.230.202:0/3091162658 conn(0x200147d1f90 :6812
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging
authorizer
0> 2020-03-17 17:15:06.927121 201a6afdb40 -1 *** Caught signal (Bus error) **
in thread 201a6afdb40 thread_name:tp_osd_tp
ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
1: (()+0x145882c) [0x2000245882c]
2: (()+0x19890) [0x2000d281890]
3: (BlueStore::ExtentMap::reshard(KeyValueDB*,
std::shared_ptr<KeyValueDB::TransactionImpl>)+0x2df0) [0x2000229da60]
4: (BlueStore::_txc_write_nodes(BlueStore::TransContext*,
std::shared_ptr<KeyValueDB::TransactionImpl>)+0x218) [0x2000229f888]
5: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction>
>&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x71c)
[0x200022c7a6c]
6: (ObjectStore::queue_transaction(ObjectStore::Sequencer*,
ObjectStore::Transaction&&, Context*, Context*, Context*,
boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x240) [0x20001c19ee0]
7: (PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&,
boost::intrusive_ptr<OpRequest>)+0x90) [0x20001e871b0]
8: (ReplicatedBackend::_do_pull_response(boost::intrusive_ptr<OpRequest>)+0x650)
[0x2000203e5f0]
9: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x3a4)
[0x200020440c4]
10: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x94)
[0x20001ecea74]
11: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x814) [0x20001de1384]
12: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x614) [0x20001b817d4]
13: (PGQueueable::RunVis::operator()(boost::intrusive_ptr<OpRequest>
const&)+0xb8) [0x20001f98968]
14: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x1c24)
[0x20001bb5fd4]
15: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0xab4) [0x200024d60a4]
16: (ShardedThreadPool::WorkThreadSharded::entry()+0x28) [0x200024da278]
17: (Thread::entry_wrapper()+0xec) [0x20002769b4c]
18: (Thread::_entry_func(void*)+0x20) [0x20002769ba0]
19: (()+0x80fc) [0x2000d2700fc]
20: (()+0x119854) [0x2000bed1854]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
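
To make the log above easier to scan, here is a minimal Python sketch that tallies the "unexpected need" entries per PG and per flag value. It is not a Ceph tool: the function name `count_unexpected_need` and the regular expression are assumptions inferred only from the shape of the lines in this excerpt, so they may not match other Ceph versions or log levels.

```python
import re
from collections import Counter

# Hypothetical pattern, inferred from the excerpt above (not from the Ceph source):
#   "<pgid> unexpected need for <object> have <version> flags = <flag>"
NEED_RE = re.compile(
    r"(\d+\.[0-9a-f]+) unexpected need for \S+ have \S+ flags = (\w+)"
)


def count_unexpected_need(log_text):
    """Return a Counter keyed by (pgid, flag) for 'unexpected need' entries."""
    # The entries are wrapped across several lines in the mail, so collapse
    # every run of whitespace into a single space before matching.
    flat = " ".join(log_text.split())
    return Counter(NEED_RE.findall(flat))
```

Calling `count_unexpected_need(open(path).read())` on a saved copy of the OSD log (with `path` being wherever that copy lives) gives one count per `(pgid, flag)` pair, for example how many entries for PG 6.d3 carry `delete` versus `none`.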
huxiaoyu(a)horebdata.cn