Looks like the journal SSD is broken. If it's still readable but not
writable, then you can run
ceph-osd --id ... --flush-journal
and replace the disk after doing so.
You can then just point the symlinks in
/var/lib/ceph/osd/ceph-*/journal to the new journal device and run
ceph-osd --id ... --mkjournal
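Assuming the journal is still readable, the whole sequence might look
roughly like the sketch below; "0" for the OSD id and the by-partuuid
path are placeholders, not values from this thread, and the stop/start
commands depend on your init system:

```shell
# Sketch of the readable-journal replacement path. ID and the
# by-partuuid path are placeholders -- substitute your own values.
ID=0
systemctl stop ceph-osd@$ID           # or: service ceph stop osd.$ID
ceph-osd --id $ID --flush-journal     # drain the journal to the data disk

# ...physically replace the SSD and partition it, then repoint the symlink:
ln -sfn /dev/disk/by-partuuid/<new-part-uuid> \
    /var/lib/ceph/osd/ceph-$ID/journal

ceph-osd --id $ID --mkjournal         # initialize the new journal
systemctl start ceph-osd@$ID
```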
If the journal is no longer readable, the safe option is to
completely re-create the OSDs after replacing the journal disk. (The
unsafe way is to simply skip the --flush-journal step; not
recommended.)
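For the unreadable-journal case, re-creating an OSD on a Hammer-era
cluster roughly follows the standard removal plus ceph-disk flow. A
sketch, where osd.0, /dev/sdc (data HDD), and /dev/sdb (new journal
SSD) are hypothetical placeholders:

```shell
# Hypothetical example: ID, /dev/sdc and /dev/sdb are placeholders
# for your own OSD ids and devices.
ID=0
ceph osd out $ID                 # let data drain off the OSD
systemctl stop ceph-osd@$ID
ceph osd crush remove osd.$ID    # remove it from the CRUSH map
ceph auth del osd.$ID            # delete its auth key
ceph osd rm $ID                  # remove it from the cluster

# Re-create the OSD with its journal on the replaced SSD:
ceph-disk prepare /dev/sdc /dev/sdb
ceph-disk activate /dev/sdc1
```

Expect recovery traffic while the re-created OSD backfills.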
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at
https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Mon, Sep 30, 2019 at 3:51 AM 展荣臻(信泰) <zhanrzh_xt(a)teamsun.com.cn> wrote:
>
> > > Hi all,
> > > we use OpenStack + Ceph (Hammer) in production
> >
> > Hammer is soooooo 2015.
> >
> > > There are 22 osds on a host and 11 osds share one ssd for osd journal.
> >
> > I can’t imagine a scenario in which this strategy makes sense; the
> > documentation and books are quite clear on why this is a bad idea.
> > Assuming that your OSDs are HDDs and the journal devices are SATA SSDs,
> > the journals are going to be a bottleneck, and you’re going to wear
> > through them quickly. If you have a read-mostly workload, colocating
> > them would be safer.
>
> Oh, I was wrong; we use SAS SSDs.
>
> > I also suspect that something is amiss with your CRUSH topology that is
> > preventing recovery, and/or you actually have multiple overlapping
> > failures.
> >
>
> My crushmap is at
> https://github.com/rongzhen-zhan/myfile/blob/master/crushmap
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io