Hi Stefan,
Unfortunately, it doesn't start.
The failed OSD (osd.0) is located on gedaopl02:
[root@gedasvl02 ~]# ceph osd tree
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.43658 root default
-7 0.21829 host gedaopl01
2 ssd 0.21829 osd.2 up 1.00000 1.00000
-3 0 host gedaopl02
-5 0.21829 host gedaopl03
3 ssd 0.21829 osd.3 up 1.00000 1.00000
0 0 osd.0 down 0 1.00000
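What also looks odd to me is that osd.0 has weight 0 and the gedaopl02 bucket is empty. I guess that once the OSD starts again, I'd also have to put it back into the crush map, maybe something like this (just guessing, the weight is copied from osd.2/osd.3):

[root@gedasvl02 ~]# ceph osd crush set osd.0 0.21829 root=default host=gedaopl02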
[root@gedaopl02 ~]# systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● ceph-d0920c36-2368-11eb-a5de-005056b703af@mgr.gedaopl02.pijxbm.service loaded failed failed Ceph mgr.gedaopl02.pijxbm for d0920c36-2368-11eb-a5de-005056b703af
● ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service loaded failed failed Ceph osd.0 for d0920c36-2368-11eb-a5de-005056b703af
● ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.1.service loaded failed failed Ceph osd.1 for d0920c36-2368-11eb-a5de-005056b703af
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
3 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
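For what it's worth, systemctl status on the failed unit should at least show the exit code, so I'll also check that:

[root@gedaopl02 ~]# systemctl status ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service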
I can start the service, but it fails again after a minute or so. Maybe I'm looking at the wrong log file, but it's empty:
[root@gedaopl02 ~]# tail -f /var/log/ceph/d0920c36-2368-11eb-a5de-005056b703af/ceph-osd.0.log
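Since cephadm runs the daemons in containers, I assume their output goes to journald rather than to that file, so these are probably the right places to look (unit name taken from the systemctl output above):

[root@gedaopl02 ~]# journalctl -u ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
[root@gedaopl02 ~]# cephadm logs --name osd.0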
Yesterday, when I deleted the failed OSD and recreated it, there were lots of messages in the log file:
https://pastebin.com/5hH27pdR
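For reference, I removed and recreated the OSD roughly like this (from memory, and /dev/sdX stands in for the actual device):

[root@gedasvl02 ~]# ceph osd out 0
[root@gedasvl02 ~]# ceph osd purge 0 --yes-i-really-mean-it
[root@gedasvl02 ~]# ceph orch daemon add osd gedaopl02:/dev/sdX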
Cheers,
Oliver
On 2020-12-01 09:22, Stefan Kooman wrote:
> On 2020-11-30 15:55, Oliver Weinmann wrote:
>
>> I have another error "pgs undersized", maybe this is also causing trouble?
> This is a result of the loss of one OSD and the PGs located on it. As
> you only have 2 OSDs left, the cluster cannot recover onto a third OSD
> (assuming defaults here). The cluster will heal itself as soon as the
> third OSD is back online.
>
> Can you start the OSD? If not, can you provide logs of the failing OSD?
>
> Gr. Stefan