New subject: Ceph osd will not start.

28 May 2021

Peter,

We're seeing the same issues as you are.  We have 2 new hosts Intel(R)
Xeon(R) Gold 6248R CPU @ 3.00GHz w/ 48 cores, 384GB RAM, and 60x 10TB SED
drives and we have tried both 15.2.13 and 16.2.4

Cephadm does NOT properly deploy and activate OSDs on Ubuntu 20.04.2 with
Docker.

Seems to be a bug in Cephadm and a product regression, as we have 4 near
identical nodes on Centos running Nautilus (240 x 10TB SED drives) and had
no problems.

FWIW we had no luck yet with one-by-one OSD daemon additions through ceph
orch either.  We also reproduced the issue easily in a virtual lab using
small virtual disks on a single ceph VM with 1 mon.

We are now looking into whether we can get past this with a manual buildout.

If you, or anyone, has hit the same stumbling block and gotten past it, I
would really appreciate some guidance.

Thanks,
Marco

On Thu, May 27, 2021 at 2:23 PM Peter Childs &lt;pchilds(a)bcs.org&gt; wrote:

...
  In the end it looks like I might be able to get the
node up to about 30
 odds before it stops creating any more.

 Or more it formats the disks but freezes up starting the daemons.

 I suspect I'm missing somthing I can tune to get it working better.

 If I could see any error messages that might help, but I'm yet to spit
 anything.

 Peter.

 On Wed, 26 May 2021, 10:57 Eugen Block, &lt;eblock(a)nde.ag&gt; wrote:

   If I add
the osd daemons one at a time with

 ceph orch daemon add osd drywood12:/dev/sda

 It does actually work, 
 Great!

  I suspect what's happening is when my rule
for creating osds run and
 creates them all-at-once it ties the orch it overloads cephadm and it  can't
  cope. 
 It's possible, I guess.

  I suspect what I might need to do at least to
work around the issue is  set
  "limit:" and bring it up until it stops
working. 
 It's worth a try, yes, although the docs state you should try to avoid
 it, it's possible that it doesn't work properly, in that case create a
 bug report. ;-)

  I did work out how to get ceph-volume to nearly
work manually.

 cephadm shell
 ceph auth get client.bootstrap-osd -o
 /var/lib/ceph/bootstrap-osd/ceph.keyring
 ceph-volume lvm create --data /dev/sda --dmcrypt

 but given I've now got "add osd" to work, I suspect I just need to fine
 tune my osd creation rules, so it does not try and create too many osds  on
  the same node at the same time. 
 I agree, no need to do it manually if there is an automated way,
 especially if you're trying to bring up dozens of OSDs.

 Zitat von Peter Childs &lt;pchilds(a)bcs.org&gt;rg>:

 > After a bit of messing around. I managed to get it somewhat working.
 >
  If I add the osd daemons one at a time with

 ceph orch daemon add osd drywood12:/dev/sda

 It does actually work,  >
  I suspect what's happening is when my rule
for creating osds run and
 creates them all-at-once it ties the orch it overloads cephadm and it  can't
  cope.  >
 > service_type: osd
 > service_name: osd.drywood-disks
 > placement:
 >   host_pattern: 'drywood*'
 > spec:
 >   data_devices:
 >     size: "7TB:"
 >   objectstore: bluestore
 >
  I suspect what I might need to do at least to
work around the issue is  set
  "limit:" and bring it up until it stops
working.  >
  I did work out how to get ceph-volume to nearly
work manually.

 cephadm shell
 ceph auth get client.bootstrap-osd -o
 /var/lib/ceph/bootstrap-osd/ceph.keyring
 ceph-volume lvm create --data /dev/sda --dmcrypt

 but given I've now got "add osd" to work, I suspect I just need to fine
 tune my osd creation rules, so it does not try and create too many osds  on
  the same node at the same time.  >
 >
 >
 > On Wed, 26 May 2021 at 08:25, Eugen Block &lt;eblock(a)nde.ag&gt; wrote:
 >
 >> Hi,
 >>
 >> I believe your current issue is due to a missing keyring for
 >> client.bootstrap-osd on the OSD node. But even after fixing that
 >> you'll probably still won't be able to deploy an OSD manually with
 >> ceph-volume because 'ceph-volume activate' is not supported with
 >> cephadm [1]. I just tried that in a virtual environment, it fails when
 >> activating the systemd-unit:
 >>
 >> ---snip---
 >> [2021-05-26 06:47:16,677][ceph_volume.process][INFO  ] Running
 >> command: /usr/bin/systemctl enable
 >> ceph-volume@lvm-8-1a8fc8ae-8f4c-4f91-b044-d5636bb52456
 >> [2021-05-26 06:47:16,692][ceph_volume.process][INFO  ] stderr Failed
 >> to connect to bus: No such file or directory
 >> [2021-05-26 06:47:16,693][ceph_volume.devices.lvm.create][ERROR ] lvm
 >> activate was unable to complete, while creating the OSD
 >> Traceback (most recent call last):
 >>    File
 >> "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py",
 >> line 32, in create
 >>      Activate([]).activate(args)
 >>    File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py",
 >> line 16, in is_root
 >>      return func(*a, **kw)
 >>    File
 >> 
"/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py",
  >> line
 >> 294, in activate
 >>      activate_bluestore(lvs, args.no_systemd)
 >>    File
 >> 
"/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py",
  > line
> 214, in activate_bluestore
>      systemctl.enable_volume(osd_id, osd_fsid, 'lvm')
>    File
> "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py",
> line 82, in enable_volume
>      return enable(volume_unit % (device_type, id_, fsid))
>    File
> "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py",
> line 22, in enable
>      process.run(['systemctl', 'enable', unit])
>    File "/usr/lib/python3.6/site-packages/ceph_volume/process.py",
> line 153, in run
>      raise RuntimeError(msg)
> RuntimeError: command returned non-zero exit status: 1
> [2021-05-26 06:47:16,694][ceph_volume.devices.lvm.create][INFO  ] will
> rollback OSD ID creation
> [2021-05-26 06:47:16,697][ceph_volume.process][INFO  ] Running
> command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.8
> --yes-i-really-mean-it
> [2021-05-26 06:47:17,597][ceph_volume.process][INFO  ] stderr purged  osd.8
 > ---snip---
>
> There's a workaround described in [2] that's not really an option for
> dozens of OSDs. I think your best approach is to bring cephadm to
> activate the OSDs for you.
> You wrote you didn't find any helpful error messages, but did cephadm
> even try to deploy OSDs? What does your osd spec file look like? Did
> you explicitly run 'ceph orch apply osd -i specfile.yml'? This should
> trigger cephadm and you should see at least some output like this:
>
> Mai 26 08:21:48 pacific1 conmon[31446]: 2021-05-26T06:21:48.466+0000
> 7effc15ff700  0 log_channel(cephadm) log [INF] : Applying service
> osd.ssd-hdd-mix on host pacific2...
> Mai 26 08:21:49 pacific1 conmon[31009]: cephadm
> 2021-05-26T06:21:48.469611+0000 mgr.pacific1.whndiw (mgr.14166) 1646 :
> cephadm [INF] Applying service osd.ssd-hdd-mix on host pacific2...
>
> Regards,
> Eugen
>
> [1] https://tracker.ceph.com/issues/49159
> [2] https://tracker.ceph.com/issues/46691
>
>
> Zitat von Peter Childs &lt;pchilds(a)bcs.org&gt;rg>:
>
> > Not sure what I'm doing wrong, I suspect its the way I'm running
> > ceph-volume.
> >
> > root@drywood12:~# cephadm ceph-volume lvm create --data /dev/sda
> --dmcrypt
> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
> > Using recent ceph image ceph/ceph@sha256
> > :54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool  --gen-print-key
 > > /usr/bin/docker: Running command:
/usr/bin/ceph-authtool  --gen-print-key
 > > /usr/bin/docker: -->  RuntimeError:
No valid ceph configuration file  was
 > > loaded.
> > Traceback (most recent call last):
> >   File "/usr/sbin/cephadm", line 8029, in <module>
> >     main()
> >   File "/usr/sbin/cephadm", line 8017, in main
> >     r = ctx.func(ctx)
> >   File "/usr/sbin/cephadm", line 1678, in _infer_fsid
> >     return func(ctx)
> >   File "/usr/sbin/cephadm", line 1738, in _infer_image
> >     return func(ctx)
> >   File "/usr/sbin/cephadm", line 4514, in command_ceph_volume
> >     out, err, code = call_throws(ctx, c.run_cmd(),  verbosity=verbosity)
 >> >   File "/usr/sbin/cephadm", line 1464, in call_throws
 >> >     raise RuntimeError('Failed command: %s' % '
'.join(command))
 >> > RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
 >> > --net=host --entrypoint /usr/sbin/ceph-volume --privileged
 >> --group-add=disk
 >> > --init -e CONTAINER_IMAGE=ceph/ceph@sha256 
:54e95ae1e11404157d7b329d0t
  > >
> > root@drywood12:~# cephadm shell
> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
> > Inferring config
> >  /var/lib/ceph/1518c8e0-bbe4-11eb-9772-001e67dc85ea/mon.drywood12/config
 > > Using recent ceph image
ceph/ceph@sha256
> > :54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
> > root@drywood12:/# ceph-volume lvm create --data /dev/sda --dmcrypt
> > Running command: /usr/bin/ceph-authtool --gen-print-key
> > Running command: /usr/bin/ceph-authtool --gen-print-key
> > Running command: /usr/bin/ceph --cluster ceph --name 
client.bootstrap-osd
 >> > --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
 >> > 70054a5c-c176-463a-a0ac-b44c5db0987c
 >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable 
to
  > find
> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such  file
 >> or
 >> > directory
 >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1
 >> > AuthRegistry(0x7fdef405b378) no keyring found at
 >> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable 
to
  > find
> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such  file
 >> or
 >> > directory
 >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1
 >> > AuthRegistry(0x7fdef405ef20) no keyring found at
 >> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable 
to
  > find
> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such  file
 > or
> > directory
> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1
> > AuthRegistry(0x7fdef8f0bea0) no keyring found at
> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef2d9d700 -1 
monclient(hunting):
 > > handle_auth_bad_method server
allowed_methods [2] but i only support  [1]
 > >  stderr: 2021-05-25T07:46:18.188+0000
7fdef259c700 -1  monclient(hunting):
 > > handle_auth_bad_method server
allowed_methods [2] but i only support  [1]
 > >  stderr: 2021-05-25T07:46:18.188+0000
7fdef1d9b700 -1  monclient(hunting):
 > > handle_auth_bad_method server
allowed_methods [2] but i only support  [1]
 > >  stderr: 2021-05-25T07:46:18.188+0000
7fdef8f0d700 -1 monclient:
> > authenticate NOTE: no keyring found; disabled cephx authentication
> >  stderr: [errno 13] RADOS permission denied (error connecting to the
> > cluster)
> > -->  RuntimeError: Unable to create a new OSD id
> > root@drywood12:/# lsblk /dev/sda
> > NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
> > sda    8:0    0  7.3T  0 disk
> >
> > As far as I can see cephadm gets a little further than this as the  disks
 >> > have lvm volumes on them just the osd's daemons are not created or
 >> started.
 >> > So maybe I'm invoking ceph-volume incorrectly.
 >> >
 >> >
 >> > On Tue, 25 May 2021 at 06:57, Peter Childs &lt;pchilds(a)bcs.org&gt; wrote:
 >> >
 >> >>
 >> >>
 >> >> On Mon, 24 May 2021, 21:08 Marc, &lt;Marc(a)f1-outsourcing.eu&gt; wrote:
 >> >>
 >> >>> >
 >> >>> > I'm attempting to use cephadm and Pacific, currently on
debian
 >> buster,
 >> >>> > mostly because centos7 ain't supported any more and
cenotos8  ain't
  >
>>> > support
> >>> > by some of my hardware.
> >>>
> >>> Who says centos7 is not supported any more? Afaik centos7/el7 is 
being
 >> >>> supported till its EOL 2024. By then maybe a good alternative for
 >> >>> el8/stream has surfaced.
 >> >>>
 >> >>
 >> >> Not supported by ceph Pacific, it's our os of choice otherwise.
 >> >>
 >> >> My testing says the version available of podman, docker and 
python3,
  do
 > >> not work with Pacific.
> >>
> >> Given I've needed to upgrade docker on buster can we please have a
 list
 > of
> >> versions that work with cephadm, maybe even have cephadm say no, 
please
 >> >> upgrade unless your running the right version or better.
 >> >>
 >> >>
 >> >>
 >> >>> > Anyway I have a few nodes with 59x 7.2TB disks but for some
 reason
  > the
> >>> > osd
> >>> > daemons don't start, the disks get formatted and the osd are
 created
 > but
> >>> > the daemons never come up.
> >>>
> >>> what if you try with
> >>> ceph-volume lvm create --data /dev/sdi --dmcrypt ?
> >>>
> >>
> >> I'll have a go.
> >>
> >>
> >>> > They are probably the wrong spec for ceph (48gb of memory and
 only 4
 > >>> > cores)
> >>>
> >>> You can always start with just configuring a few disks per node. 
That
 >> >>> should always work.
 >> >>>
 >> >>
 >> >> That was my thought too.
 >> >>
 >> >> Thanks
 >> >>
 >> >> Peter
 >> >>
 >> >>
 >> >>> > but I was expecting them to start and be either dirt slow or
 crash
  >
>>> > later,
> >>> > anyway I've got upto 30 of them, so I was hoping on getting at
 least
 > get
> >>> > 6PB of raw storage out of them.
> >>> >
> >>> > As yet I've not spotted any helpful error messages.
> >>> >
> >>> _______________________________________________
> >>> ceph-users mailing list -- ceph-users(a)ceph.io
> >>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
> >>>
> >>
> > _______________________________________________
> > ceph-users mailing list -- ceph-users(a)ceph.io
> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
> 

 _______________________________________________
 ceph-users mailing list -- ceph-users(a)ceph.io
 To unsubscribe send an email to ceph-users-leave(a)ceph.io
  _______________________________________________
 ceph-users mailing list -- ceph-users(a)ceph.io
 To unsubscribe send an email to ceph-users-leave(a)ceph.io

Fwd: Re: Ceph osd will not start.