On multiple clusters we are seeing the mgr hang frequently when the balancer is enabled. The balancer seems to get caught in some kind of infinite loop that chews up all of the mgr's CPU, which in turn causes problems for other modules like prometheus (we don't have the devicehealth module enabled yet).
I've been able to reproduce the issue doing an offline balance as well using the osdmaptool:
osdmaptool --debug-osd 10 osd.map --upmap balance-upmaps.sh --upmap-pool default.rgw.buckets.data --upmap-max 100
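(For reference, osd.map above is just the cluster's current map, exported the usual way with something like:
ceph osd getmap -o osd.map
so the loop reproduces entirely offline, without the mgr involved.)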
It seems to loop over the same group of ~7,000 PGs over and over again, like this, without finding any new upmaps that can be added:
2019-11-19 16:39:11.131518 7f85a156f300 10 trying 24.d91
2019-11-19 16:39:11.138035 7f85a156f300 10 trying 24.2e3c
2019-11-19 16:39:11.144162 7f85a156f300 10 trying 24.176b
2019-11-19 16:39:11.149671 7f85a156f300 10 trying 24.ac6
2019-11-19 16:39:11.155115 7f85a156f300 10 trying 24.2cb2
2019-11-19 16:39:11.160508 7f85a156f300 10 trying 24.129c
2019-11-19 16:39:11.166287 7f85a156f300 10 trying 24.181f
2019-11-19 16:39:11.171737 7f85a156f300 10 trying 24.3cb1
2019-11-19 16:39:11.177260 7f85a156f300 10 24.2177 already has pg_upmap_items [368,271]
2019-11-19 16:39:11.177268 7f85a156f300 10 trying 24.2177
2019-11-19 16:39:11.182590 7f85a156f300 10 trying 24.a4
2019-11-19 16:39:11.188053 7f85a156f300 10 trying 24.2583
2019-11-19 16:39:11.193545 7f85a156f300 10 24.93e already has pg_upmap_items [80,27]
2019-11-19 16:39:11.193553 7f85a156f300 10 trying 24.93e
2019-11-19 16:39:11.198858 7f85a156f300 10 trying 24.e67
2019-11-19 16:39:11.204224 7f85a156f300 10 trying 24.16d9
2019-11-19 16:39:11.209844 7f85a156f300 10 trying 24.11dc
2019-11-19 16:39:11.215303 7f85a156f300 10 trying 24.1f3d
2019-11-19 16:39:11.221074 7f85a156f300 10 trying 24.2a57
While this cluster is running Luminous (12.2.12), I've reproduced the loop using the same osdmap on Nautilus (14.2.4). Is there somewhere I can privately upload the osdmap for someone to troubleshoot the problem?
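(If ceph-post-file is still the preferred way to get files to the developers privately, I'm happy to use that, e.g. something like:
ceph-post-file -d "osdmap reproducing balancer upmap loop" osd.map
otherwise just point me at the right place.)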
Thanks,
Bryan
Hi everyone,
I'm looking for some advice on diagnosing an OSD issue.
We have a Mimic cluster, not very full, with Bluestore OSDs.
We recently had to bring the cluster down to allow power testing in the host datacentre, and when we brought things up again, 1 OSD daemon would not start.
The log shows (cut to useful context):
-314> 2019-11-21 15:55:15.561 7efdc049dd80 4 rocksdb: Options.ttl: 0
-314> 2019-11-21 15:55:15.563 7efdc049dd80 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/rocksdb/db/version_set.cc:3362] Recovered from manifest file:db/MANIFEST-000127 succeeded,manifest_file_number is 127, next_file_number is 264, last_sequence is 21956004, log_number is 0,prev_log_number is 0,max_column_family is 0,deleted_log_number is 123
-314> 2019-11-21 15:55:15.563 7efdc049dd80 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/rocksdb/db/version_set.cc:3370] Column family [default] (ID 0), log number is 255
-314> 2019-11-21 15:55:15.563 7efdc049dd80 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1574351715564768, "job": 1, "event": "recovery_started", "log_files": [252, 255]}
-314> 2019-11-21 15:55:15.563 7efdc049dd80 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/rocksdb/db/db_impl_open.cc:551] Recovering log #252 mode 0
-314> 2019-11-21 15:55:16.722 7efdc049dd80 4 rocksdb: [/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/rocksdb/db/db_impl_open.cc:551] Recovering log #255 mode 0
-314> 2019-11-21 15:55:17.885 7efdc049dd80 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/os/bluestore/KernelDevice.cc: In function 'virtual int KernelDevice::read(uint64_t, uint64_t, ceph::bufferlist*, IOContext*, bool)' thread 7efdc049dd80 time 2019-11-21 15:55:17.870632
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/os/bluestore/KernelDevice.cc: 825: FAILED assert((uint64_t)r == len)
ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x7efdb788036b]
2: (()+0x26e4f7) [0x7efdb78804f7]
3: (KernelDevice::read(unsigned long, unsigned long, ceph::buffer::list*, IOContext*, bool)+0x4b4) [0x5619ab313144]
4: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x3c2) [0x5619ab2d59a2]
5: (BlueRocksSequentialFile::Read(unsigned long, rocksdb::Slice*, char*)+0x34) [0x5619ab2f88f4]
6: (rocksdb::SequentialFileReader::Read(unsigned long, rocksdb::Slice*, char*)+0x6b) [0x5619ab4e541b]
7: (rocksdb::log::Reader::ReadMore(unsigned long*, int*)+0xd8) [0x5619ab3f3148]
8: (rocksdb::log::Reader::ReadPhysicalRecord(rocksdb::Slice*, unsigned long*)+0x70) [0x5619ab3f3240]
9: (rocksdb::log::Reader::ReadRecord(rocksdb::Slice*, std::string*, rocksdb::WALRecoveryMode)+0x12b) [0x5619ab3f351b]
10: (rocksdb::DBImpl::RecoverLogFiles(std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long*, bool)+0xea2) [0x5619ab3a3bf2]
11: (rocksdb::DBImpl::Recover(std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, bool, bool, bool)+0xa59) [0x5619ab3a54e9]
12: (rocksdb::DBImpl::Open(rocksdb::DBOptions const&, std::string const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**, bool)+0x689) [0x5619ab3a6299]
13: (rocksdb::DB::Open(rocksdb::DBOptions const&, std::string const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const&, std::vector<rocksdb::ColumnFamilyHandle*, std::allocator<rocksdb::ColumnFamilyHandle*> >*, rocksdb::DB**)+0x22) [0x5619ab3a7ac2]
14: (RocksDBStore::do_open(std::ostream&, bool, std::vector<KeyValueDB::ColumnFamily, std::allocator<KeyValueDB::ColumnFamily> > const*)+0x164e) [0x5619ab27a43e]
15: (BlueStore::_open_db(bool, bool)+0xd6a) [0x5619ab205f9a]
16: (BlueStore::_mount(bool, bool)+0x4d1) [0x5619ab237071]
17: (OSD::init()+0x28f) [0x5619aaddeedf]
18: (main()+0x23a3) [0x5619aacbd7a3]
19: (__libc_start_main()+0xf5) [0x7efdb33f2505]
The disk behind this OSD is very new and hasn't been stressed much, so I am not convinced this is a disk failure.
Is this a known bug in Mimic? (It's hard to find a similar report in the bug tracker.) How should I go about diagnosing this?
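So far I have only glanced at SMART. My guess at a next step (please correct me if this is the wrong direction) would be something like:
smartctl -a /dev/sdX                                          # check the backing disk for read/media errors
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-NN    # with the OSD stopped
but I'd rather not poke at it blindly.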
Sam
Hi everyone,
We're pleased to announce that the next Cephalocon will be March 3-5 in
Seoul, South Korea!
https://ceph.com/cephalocon/seoul-2020/
The CFP for the conference is now open:
https://linuxfoundation.smapply.io/prog/cephalocon_2020
Main conference: March 4-5
Developer summit: March 3
Mark your calendars, and get your talk proposals in! The CFP will close
in early December in order to get a final schedule published in early
January.
In addition to the two day conference, we will also have a developer
summit on March 3 to take advantage of having so many developers in the
same place at the same time. The developer sessions will include video
conferencing so that remote developers will also be able to participate.
A sponsorship prospectus will be available Real Soon Now.
We hope you can join us!
Hello - We recently upgraded to Luminous 12.2.11. Since then we are seeing
scrub errors, on the object storage pool only, on a daily basis. After a
repair they clear, but they come back the next day once the PG is scrubbed
again.
Is there any known issue with scrub errors in the 12.2.11 version?
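For context, the daily repair is just the standard "ceph pg repair <pgid>"; I assume dumping the inconsistencies first, e.g.:
rados list-inconsistent-obj <pgid> --format=json-pretty
would help pin down whether it's always the same objects, if that's useful to anyone looking at this.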
Thanks
Swami
I've upgraded 7 of our clusters to Nautilus (14.2.4) and noticed that on some of the clusters (3 out of 7) the OSDs aren't using msgr2 at all. Here's the output for osd.0 on 2 clusters of each type:
### Cluster 1 (v1 only):
# ceph osd find 0 | jq -r '.addrs'
{
  "addrvec": [
    {
      "type": "v1",
      "addr": "10.26.0.33:6809",
      "nonce": 4185021
    }
  ]
}
### Cluster 2 (v1 only):
# ceph osd find 0 | jq -r '.addrs'
{
  "addrvec": [
    {
      "type": "v1",
      "addr": "10.197.0.243:6801",
      "nonce": 3802140
    }
  ]
}
### Cluster 3 (v1 & v2):
# ceph osd find 0 | jq -r '.addrs'
{
  "addrvec": [
    {
      "type": "v2",
      "addr": "10.32.0.36:6802",
      "nonce": 3167
    },
    {
      "type": "v1",
      "addr": "10.32.0.36:6804",
      "nonce": 3167
    }
  ]
}
### Cluster 4 (v1 & v2):
# ceph osd find 0 | jq -r '.addrs'
{
  "addrvec": [
    {
      "type": "v2",
      "addr": "10.36.0.12:6820",
      "nonce": 3150
    },
    {
      "type": "v1",
      "addr": "10.36.0.12:6827",
      "nonce": 3150
    }
  ]
}
All of the mon nodes have the same msgr settings of:
# ceph daemon mon.$(hostname -s) config show | grep msgr
"mon_warn_on_msgr2_not_enabled": "true",
"ms_bind_msgr1": "true",
"ms_bind_msgr2": "true",
"ms_msgr2_encrypt_messages": "false",
"ms_msgr2_sign_messages": "false",
What could be causing this? All of the clusters are listening on port 3300 for v2 and 6789 for v1. I can even connect to port 3300 on the mon nodes from the OSD nodes.
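For completeness, this is roughly how I've been checking the mon side on each cluster (I believe msgr2 was enabled via "ceph mon enable-msgr2" as part of the upgrades):
ceph mon dump | grep -E 'v[12]:'
ceph versions
The mons listen on 3300 everywhere; it's only the OSD addrvecs that differ between clusters.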
Thanks,
Bryan
Three OSDs, holding the 3 replicas of a PG here, are only half-starting,
and hence that single PG is stuck as "stale+active+clean".
All of them died of a suicide timeout while walking over a huge omap (pool
7, 'default.rgw.buckets.index') and will not bring PG 7.b back online
again.
From the logs, they try to start normally, do a bit of leveldb recovery,
replay the journal, and then say nothing more.
2019-11-19 15:15:46.967543 7fe644fad840 0 set uid:gid to 167:167 (ceph:ceph)
2019-11-19 15:15:46.967600 7fe644fad840 0 ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374), process ceph-osd, pid 5149
2019-11-19 15:15:47.026065 7fe644fad840 0 pidfile_write: ignore empty --pid-file
2019-11-19 15:15:47.078291 7fe644fad840 0 filestore(/var/lib/ceph/osd/ceph-22) backend xfs (magic 0x58465342)
2019-11-19 15:15:47.079317 7fe644fad840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-22) detect_features: FIEMAP ioctl is disabled via 'filestore fiemap' config option
2019-11-19 15:15:47.079331 7fe644fad840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-22) detect_features: SEEK_DATA/SEEK_HOLE is disabled via 'filestore seek data hole' config option
2019-11-19 15:15:47.079352 7fe644fad840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-22) detect_features: splice is supported
2019-11-19 15:15:47.080287 7fe644fad840 0 genericfilestorebackend(/var/lib/ceph/osd/ceph-22) detect_features: syncfs(2) syscall fully supported (by glibc and kernel)
2019-11-19 15:15:47.080529 7fe644fad840 0 xfsfilestorebackend(/var/lib/ceph/osd/ceph-22) detect_feature: extsize is disabled by conf
2019-11-19 15:15:47.095819 7fe644fad840 1 leveldb: Recovering log #2731809
2019-11-19 15:15:47.119792 7fe644fad840 1 leveldb: Level-0 table #2731812: started
2019-11-19 15:15:47.132107 7fe644fad840 1 leveldb: Level-0 table #2731812: 140642 bytes OK
2019-11-19 15:15:47.143782 7fe644fad840 1 leveldb: Delete type=0 #2731809
2019-11-19 15:15:47.147198 7fe644fad840 1 leveldb: Delete type=3 #2731792
2019-11-19 15:15:47.159339 7fe644fad840 0 filestore(/var/lib/ceph/osd/ceph-22) mount: enabling WRITEAHEAD journal mode: checkpoint is not enabled
2019-11-19 15:15:47.243262 7fe644fad840 1 journal _open /var/lib/ceph/osd/ceph-22/journal fd 18: 21472739328 bytes, block size 4096 bytes, directio = 1, aio = 1
At this point they consume a ton of CPU, systemd thinks all is fine, and
this has been going on for some 5 hours.
ceph -s thinks they are down. I can't talk to the OSDs remotely from a
mon, but ceph daemon on the OSD hosts works normally, except that I can't
do anything from there other than fetch config or perf numbers.
Strace shows they all keep looping over the same sequence:
machine1:
stat("/var/lib/ceph/osd/ceph-270/current/7.b_head/DIR_B/DIR_4",
{st_mode=S_IFDIR|0755, st_size=24576, ...}) = 0
stat("/var/lib/ceph/osd/ceph-270/current/7.b_head/DIR_B/DIR_4/DIR_D",
0x7fffd7c98080) = -1 ENOENT (No such file or directory)
stat("/var/lib/ceph/osd/ceph-270/current/7.b_head/DIR_B/DIR_4/\\.dir.31716e6b-28c9-42e6-81ed-d27e3b714a9c.47687923.1711__head_6D57DD4B__7",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("/var/lib/ceph/osd/ceph-270/current/7.b_head", {st_mode=S_IFDIR|0755,
st_size=8192, ...}) = 0
stat("/var/lib/ceph/osd/ceph-270/current/7.b_head/DIR_B",
{st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
stat("/var/lib/ceph/osd/ceph-270/current/7.b_head/DIR_B/DIR_4",
{st_mode=S_IFDIR|0755, st_size=24576, ...}) = 0
stat("/var/lib/ceph/osd/ceph-270/current/7.b_head/DIR_B/DIR_4/DIR_D",
0x7fffd7c98080) = -1 ENOENT (No such file or directory)
stat("/var/lib/ceph/osd/ceph-270/current/7.b_head/DIR_B/DIR_4/\\.dir.31716e6b-28c9-42e6-81ed-d27e3b714a9c.47687923.1711__head_6D57DD4B__7",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
machine2:
stat("/var/lib/ceph/osd/ceph-243/current/7.b_head/DIR_B/DIR_4",
{st_mode=S_IFDIR|0755, st_size=24576, ...}) = 0
stat("/var/lib/ceph/osd/ceph-243/current/7.b_head/DIR_B/DIR_4/DIR_D",
0x7ffe0b664240) = -1 ENOENT (No such file or directory)
stat("/var/lib/ceph/osd/ceph-243/current/7.b_head/DIR_B/DIR_4/\\.dir.31716e6b-28c9-42e6-81ed-d27e3b714a9c.47687923.1711__head_6D57DD4B__7",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("/var/lib/ceph/osd/ceph-243/current/7.b_head", {st_mode=S_IFDIR|0755,
st_size=8192, ...}) = 0
stat("/var/lib/ceph/osd/ceph-243/current/7.b_head/DIR_B",
{st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
stat("/var/lib/ceph/osd/ceph-243/current/7.b_head/DIR_B/DIR_4",
{st_mode=S_IFDIR|0755, st_size=24576, ...}) = 0
stat("/var/lib/ceph/osd/ceph-243/current/7.b_head/DIR_B/DIR_4/DIR_D",
0x7ffe0b664240) = -1 ENOENT (No such file or directory)
stat("/var/lib/ceph/osd/ceph-243/current/7.b_head/DIR_B/DIR_4/\\.dir.31716e6b-28c9-42e6-81ed-d27e3b714a9c.47687923.1711__head_6D57DD4B__7",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
machine3:
stat("/var/lib/ceph/osd/ceph-22/current/7.b_head/DIR_B/DIR_4",
{st_mode=S_IFDIR|0755, st_size=24576, ...}) = 0
stat("/var/lib/ceph/osd/ceph-22/current/7.b_head/DIR_B/DIR_4/DIR_D",
0x7ffc63518650) = -1 ENOENT (No such file or directory)
stat("/var/lib/ceph/osd/ceph-22/current/7.b_head/DIR_B/DIR_4/\\.dir.31716e6b-28c9-42e6-81ed-d27e3b714a9c.47687923.1711__head_6D57DD4B__7",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
stat("/var/lib/ceph/osd/ceph-22/current/7.b_head", {st_mode=S_IFDIR|0755,
st_size=8192, ...}) = 0
stat("/var/lib/ceph/osd/ceph-22/current/7.b_head/DIR_B",
{st_mode=S_IFDIR|0755, st_size=8192, ...}) = 0
stat("/var/lib/ceph/osd/ceph-22/current/7.b_head/DIR_B/DIR_4",
{st_mode=S_IFDIR|0755, st_size=24576, ...}) = 0
stat("/var/lib/ceph/osd/ceph-22/current/7.b_head/DIR_B/DIR_4/DIR_D",
0x7ffc63518650) = -1 ENOENT (No such file or directory)
stat("/var/lib/ceph/osd/ceph-22/current/7.b_head/DIR_B/DIR_4/\\.dir.31716e6b-28c9-42e6-81ed-d27e3b714a9c.47687923.1711__head_6D57DD4B__7",
{st_mode=S_IFREG|0644, st_size=0, ...}) = 0
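One thing I'm tentatively considering (not applied yet) is raising the OSD suicide timeouts so the daemons get a chance to finish walking the huge omap instead of being killed, e.g. in ceph.conf:
[osd]
osd_op_thread_suicide_timeout = 2000
filestore_op_thread_suicide_timeout = 2000
but I'm not certain which timeout is actually firing here, so I'd rather hear from people who have been through this first.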
Help wanted.
--
May the most significant bit of your life be positive.
Hi,
I have a small but impactful error in my crush rules.
For unknown reasons the rule uses osd rather than host as the failure
domain, so some nodes hold all three copies of a PG instead of the copies
being spread over three different nodes.
We noticed this when rebooting a node and a PG became stale.
My crush rule:
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -2,
            "item_name": "default~hdd"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
},
The type should of course be host. I want to change this and have the PGs moved so that everything is placed as it should be.
How can I best proceed in correcting this? I would also like to throttle the remapping of the data so Ceph itself won't become unavailable while the data is redistributed.
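My tentative plan (please correct me if this is the wrong approach) is to create a new rule with host as the failure domain, throttle backfill, and then switch the pools over one at a time, roughly (the rule name is just an example):
ceph osd crush rule create-replicated replicated_host default host hdd
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
ceph osd pool set <pool> crush_rule replicated_host
Does that sound sane, or is it better to edit the existing rule in a decompiled crushmap?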
We are running on Mimic (13.2.6), and this environment has been installed freshly as Mimic while using ceph-ansible.
Current ceph -s output:
  cluster:
    id:     <fsid>
    health: HEALTH_OK
  services:
    mon: 3 daemons, quorum mon01,mon02,mon03
    mgr: mon01(active), standbys: mon02, mon03
    mds: cephfs-2/2/2 up {0=mon03=up:active,1=mon01=up:active}, 1 up:standby
    osd: 502 osds: 502 up, 502 in
  data:
    pools:   18 pools, 8192 pgs
    objects: 28.74 M objects, 100 TiB
    usage:   331 TiB used, 2.3 PiB / 2.6 PiB avail
    pgs:     8192 active+clean
Cheers,
Maarten van Ingen
| Systems Expert | Distributed Data Processing | SURFsara | Science Park 140 | 1098 XG Amsterdam |
| T +31 (0) 20 800 1300 | maarten.vaningen(a)surfsara.nl | https://surfsara.nl |
We are ISO 27001 certified and meet the high requirements for information security.