Hi Stefan,
Unfortunately, it doesn't start.
The failed OSD (osd.0) is located on gedaopl02:
[root@gedasvl02 ~]# ceph osd tree
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.43658 root default
-7 0.21829 host gedaopl01
2 ssd 0.21829 osd.2 up 1.00000 1.00000
-3 0 host gedaopl02
-5 0.21829 host gedaopl03
3 ssd 0.21829 osd.3 up 1.00000 1.00000
0 0 osd.0 down 0 1.00000
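What also looks odd to me is that osd.0 has weight 0 and the gedaopl02 bucket is empty. I guess that once the OSD starts again, I'd also have to put it back into the crush map, maybe something like this (just guessing, the weight is copied from osd.2/osd.3):

[root@gedasvl02 ~]# ceph osd crush set osd.0 0.21829 root=default host=gedaopl02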
[root@gedaopl02 ~]# systemctl --failed
UNIT LOAD ACTIVE SUB DESCRIPTION
● ceph-d0920c36-2368-11eb-a5de-005056b703af@mgr.gedaopl02.pijxbm.service loaded failed failed Ceph mgr.gedaopl02.pijxbm for d0920c36-2368-11eb-a5de-005056b703af
● ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service loaded failed failed Ceph osd.0 for d0920c36-2368-11eb-a5de-005056b703af
● ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.1.service loaded failed failed Ceph osd.1 for d0920c36-2368-11eb-a5de-005056b703af
LOAD = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB = The low-level unit activation state, values depend on unit type.
3 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.
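For what it's worth, systemctl status on the failed unit should at least show the exit code, so I'll also check that:

[root@gedaopl02 ~]# systemctl status ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service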
I can start the service, but it fails again after a minute or so. Maybe I'm looking at the wrong log file, but it's empty:
[root@gedaopl02 ~]# tail -f /var/log/ceph/d0920c36-2368-11eb-a5de-005056b703af/ceph-osd.0.log
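Since cephadm runs the daemons in containers, I assume their output goes to journald rather than to that file, so these are probably the right places to look (unit name taken from the systemctl output above):

[root@gedaopl02 ~]# journalctl -u ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
[root@gedaopl02 ~]# cephadm logs --name osd.0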
Yesterday, when I deleted the failed OSD and recreated it, there were lots of messages in the log file:
https://pastebin.com/5hH27pdR
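For reference, I removed and recreated the OSD roughly like this (from memory, and /dev/sdX stands in for the actual device):

[root@gedasvl02 ~]# ceph osd out 0
[root@gedasvl02 ~]# ceph osd purge 0 --yes-i-really-mean-it
[root@gedasvl02 ~]# ceph orch daemon add osd gedaopl02:/dev/sdX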
Cheers,
Oliver
On 2020-12-01 09:22, Stefan Kooman wrote:
> On 2020-11-30 15:55, Oliver Weinmann wrote:
>
>> I have another error "pgs undersized", maybe this is also causing trouble?
> This is a result of the loss of one OSD and the PGs located on it. As
> you only have 2 OSDs left, the cluster cannot recover onto a third OSD
> (assuming defaults here). The cluster will heal itself as soon as the
> third OSD is back online.
>
> Can you start the OSD? If not, can you provide logs of the failing OSD?
>
> Gr. Stefan