In the end it looks like I might be able to get the node up to about 30
osds before it stops creating any more.
Or rather, it formats the disks but freezes up when starting the daemons.
I suspect I'm missing something I can tune to get it working better.
If I could see any error messages, that might help, but I've yet to spot
anything.
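
The places I know to look for clues are something like the following
(osd.0 below is just a placeholder for whichever daemon is stuck):

  ceph log last cephadm        # recent cephadm/orchestrator log entries
  cephadm logs --name osd.0    # that daemon's journal on the host itself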
Peter.
On Wed, 26 May 2021, 10:57 Eugen Block, <eblock(a)nde.ag> wrote:
> If I add the osd daemons one at a time with
>   ceph orch daemon add osd drywood12:/dev/sda
> it does actually work.
Great!
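
If you end up adding them one at a time, a small shell loop keeps it
manageable (the device range is assumed, an untested sketch):

  for dev in /dev/sd{a..l}; do
      ceph orch daemon add osd drywood12:$dev
  done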
> I suspect what's happening is when my rule for creating osds runs and
> creates them all at once, it overloads cephadm and it can't cope.
It's possible, I guess.
> I suspect what I might need to do, at least to work around the issue, is
> set "limit:" and bring it up until it stops working.
It's worth a try, yes, although the docs state you should try to avoid
it. It's possible that it doesn't work properly; in that case, create a
bug report. ;-)
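
For example, adding a limit to the data_devices filter of your spec
(quoted below) might look like this; the value 6 is just an arbitrary
starting point, and I haven't verified the behaviour myself:

  service_type: osd
  service_name: osd.drywood-disks
  placement:
    host_pattern: 'drywood*'
  spec:
    data_devices:
      size: "7TB:"
      limit: 6
    objectstore: bluestore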
> I did work out how to get ceph-volume to nearly work manually.
>   cephadm shell
>   ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
>   ceph-volume lvm create --data /dev/sda --dmcrypt
> but given I've now got "add osd" to work, I suspect I just need to
> fine-tune my osd creation rules so they do not try to create too many
> osds on the same node at the same time.
I agree, no need to do it manually if there is an automated way,
especially if you're trying to bring up dozens of OSDs.
Zitat von Peter Childs <pchilds(a)bcs.org>:
> After a bit of messing around, I managed to get it somewhat working.
>
> If I add the osd daemons one at a time with
>   ceph orch daemon add osd drywood12:/dev/sda
> it does actually work.
>
> I suspect what's happening is when my rule for creating osds runs and
> creates them all at once, it overloads cephadm and it can't cope.
>
> service_type: osd
> service_name: osd.drywood-disks
> placement:
>   host_pattern: 'drywood*'
> spec:
>   data_devices:
>     size: "7TB:"
>   objectstore: bluestore
>
> I suspect what I might need to do, at least to work around the issue, is
> set "limit:" and bring it up until it stops working.
>
> I did work out how to get ceph-volume to nearly work manually.
>   cephadm shell
>   ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
>   ceph-volume lvm create --data /dev/sda --dmcrypt
> but given I've now got "add osd" to work, I suspect I just need to
> fine-tune my osd creation rules so they do not try to create too many
> osds on the same node at the same time.
>
>
>
> On Wed, 26 May 2021 at 08:25, Eugen Block <eblock(a)nde.ag> wrote:
>
>> Hi,
>>
>> I believe your current issue is due to a missing keyring for
>> client.bootstrap-osd on the OSD node. But even after fixing that
>> you probably still won't be able to deploy an OSD manually with
>> ceph-volume because 'ceph-volume activate' is not supported with
>> cephadm [1]. I just tried that in a virtual environment, it fails when
>> activating the systemd-unit:
>>
>> ---snip---
>> [2021-05-26 06:47:16,677][ceph_volume.process][INFO ] Running
>> command: /usr/bin/systemctl enable
>> ceph-volume@lvm-8-1a8fc8ae-8f4c-4f91-b044-d5636bb52456
>> [2021-05-26 06:47:16,692][ceph_volume.process][INFO ] stderr Failed
>> to connect to bus: No such file or directory
>> [2021-05-26 06:47:16,693][ceph_volume.devices.lvm.create][ERROR ] lvm
>> activate was unable to complete, while creating the OSD
>> Traceback (most recent call last):
>>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 32, in create
>>     Activate([]).activate(args)
>>   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
>>     return func(*a, **kw)
>>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 294, in activate
>>     activate_bluestore(lvs, args.no_systemd)
>>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 214, in activate_bluestore
>>     systemctl.enable_volume(osd_id, osd_fsid, 'lvm')
>>   File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 82, in enable_volume
>>     return enable(volume_unit % (device_type, id_, fsid))
>>   File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 22, in enable
>>     process.run(['systemctl', 'enable', unit])
>>   File "/usr/lib/python3.6/site-packages/ceph_volume/process.py", line 153, in run
>>     raise RuntimeError(msg)
>> RuntimeError: command returned non-zero exit status: 1
>> [2021-05-26 06:47:16,694][ceph_volume.devices.lvm.create][INFO ] will
>> rollback OSD ID creation
>> [2021-05-26 06:47:16,697][ceph_volume.process][INFO ] Running
>> command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
>> --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.8
>> --yes-i-really-mean-it
>> [2021-05-26 06:47:17,597][ceph_volume.process][INFO ] stderr purged osd.8
>> ---snip---
>>
>> There's a workaround described in [2], but that's not really an option
>> for dozens of OSDs. I think your best approach is to get cephadm to
>> activate the OSDs for you.
>> You wrote you didn't find any helpful error messages, but did cephadm
>> even try to deploy OSDs? What does your osd spec file look like? Did
>> you explicitly run 'ceph orch apply osd -i specfile.yml'? This should
>> trigger cephadm and you should see at least some output like this:
>>
>> Mai 26 08:21:48 pacific1 conmon[31446]: 2021-05-26T06:21:48.466+0000
>> 7effc15ff700 0 log_channel(cephadm) log [INF] : Applying service
>> osd.ssd-hdd-mix on host pacific2...
>> Mai 26 08:21:49 pacific1 conmon[31009]: cephadm
>> 2021-05-26T06:21:48.469611+0000 mgr.pacific1.whndiw (mgr.14166) 1646 :
>> cephadm [INF] Applying service osd.ssd-hdd-mix on host pacific2...
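>>
>> To check what a spec would do before applying it, there is also a dry
>> run (a sketch, adjust the file name to yours):
>>
>>   ceph orch apply -i specfile.yml --dry-run
>>   ceph log last cephadm
>>
>> The first command previews what cephadm would deploy, the second shows
>> the recent orchestrator log entries.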
>>
>> Regards,
>> Eugen
>>
>> [1] https://tracker.ceph.com/issues/49159
>> [2] https://tracker.ceph.com/issues/46691
>
>
> Zitat von Peter Childs <pchilds(a)bcs.org>:
>
> > Not sure what I'm doing wrong, I suspect it's the way I'm running
> > ceph-volume.
> >
> > root@drywood12:~# cephadm ceph-volume lvm create --data /dev/sda --dmcrypt
> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
> > Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
> > /usr/bin/docker: --> RuntimeError: No valid ceph configuration file was loaded.
> > Traceback (most recent call last):
> >   File "/usr/sbin/cephadm", line 8029, in <module>
> >     main()
> >   File "/usr/sbin/cephadm", line 8017, in main
> >     r = ctx.func(ctx)
> >   File "/usr/sbin/cephadm", line 1678, in _infer_fsid
> >     return func(ctx)
> >   File "/usr/sbin/cephadm", line 1738, in _infer_image
> >     return func(ctx)
> >   File "/usr/sbin/cephadm", line 4514, in command_ceph_volume
> >     out, err, code = call_throws(ctx, c.run_cmd(), verbosity=verbosity)
> >   File "/usr/sbin/cephadm", line 1464, in call_throws
> >     raise RuntimeError('Failed command: %s' % ' '.join(command))
> > RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host
> > --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk
> > --init -e CONTAINER_IMAGE=ceph/ceph@sha256:54e95ae1e11404157d7b329d0t
> >
> > root@drywood12:~# cephadm shell
> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
> > Inferring config /var/lib/ceph/1518c8e0-bbe4-11eb-9772-001e67dc85ea/mon.drywood12/config
> > Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
> > root@drywood12:/# ceph-volume lvm create --data /dev/sda --dmcrypt
> > Running command: /usr/bin/ceph-authtool --gen-print-key
> > Running command: /usr/bin/ceph-authtool --gen-print-key
> > Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
> > --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
> > 70054a5c-c176-463a-a0ac-b44c5db0987c
> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find
> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or
> > directory
> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1
> > AuthRegistry(0x7fdef405b378) no keyring found at
> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find
> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or
> > directory
> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1
> > AuthRegistry(0x7fdef405ef20) no keyring found at
> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find
> > a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or
> > directory
> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1
> > AuthRegistry(0x7fdef8f0bea0) no keyring found at
> > /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
> > stderr: 2021-05-25T07:46:18.188+0000 7fdef2d9d700 -1 monclient(hunting):
> > handle_auth_bad_method server allowed_methods [2] but i only support [1]
> > stderr: 2021-05-25T07:46:18.188+0000 7fdef259c700 -1 monclient(hunting):
> > handle_auth_bad_method server allowed_methods [2] but i only support [1]
> > stderr: 2021-05-25T07:46:18.188+0000 7fdef1d9b700 -1 monclient(hunting):
> > handle_auth_bad_method server allowed_methods [2] but i only support [1]
> > stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 monclient:
> > authenticate NOTE: no keyring found; disabled cephx authentication
> > stderr: [errno 13] RADOS permission denied (error connecting to the
> > cluster)
> > --> RuntimeError: Unable to create a new OSD id
> > root@drywood12:/# lsblk /dev/sda
> > NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
> > sda 8:0 0 7.3T 0 disk
> >
> > As far as I can see cephadm gets a little further than this, as the disks
> > have lvm volumes on them; it's just that the osd daemons are not created
> > or started. So maybe I'm invoking ceph-volume incorrectly.
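> >
> > In case anyone retries this: I believe the leftover LVs can be listed
> > and wiped before another attempt, along these lines (device name
> > assumed):
> >
> >   cephadm ceph-volume lvm list
> >   cephadm ceph-volume lvm zap /dev/sda --destroy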
> >
> >
> > On Tue, 25 May 2021 at 06:57, Peter Childs <pchilds(a)bcs.org> wrote:
> >
> >>
> >>
> >> On Mon, 24 May 2021, 21:08 Marc, <Marc(a)f1-outsourcing.eu> wrote:
> >>
> >>> >
> >>> > I'm attempting to use cephadm and Pacific, currently on debian buster,
> >>> > mostly because centos7 ain't supported any more and centos8 ain't
> >>> > supported by some of my hardware.
> >>>
> >>> Who says centos7 is not supported any more? Afaik centos7/el7 is being
> >>> supported till its EOL 2024. By then maybe a good alternative for
> >>> el8/stream has surfaced.
> >>>
> >>
> >> Not supported by ceph Pacific, it's our os of choice otherwise.
> >>
> >> My testing says the versions of podman, docker and python3 available
> >> do not work with Pacific.
> >>
> >> Given I've needed to upgrade docker on buster, can we please have a list
> >> of versions that work with cephadm, and maybe even have cephadm say "no,
> >> please upgrade" unless you're running the right version or better.
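> >>
> >> For what it's worth, I believe "cephadm check-host" does some of this
> >> validation (container runtime, systemd, time sync), though as far as I
> >> can tell it doesn't enforce minimum versions:
> >>
> >>   cephadm check-host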
> >>
> >>
> >>
> >>> > Anyway I have a few nodes with 59x 7.2TB disks but for some reason the
> >>> > osd daemons don't start; the disks get formatted and the osds are
> >>> > created but the daemons never come up.
> >>>
> >>> what if you try with
> >>> ceph-volume lvm create --data /dev/sdi --dmcrypt ?
> >>>
> >>
> >> I'll have a go.
> >>
> >>
> >>> > They are probably the wrong spec for ceph (48gb of memory and only 4
> >>> > cores)
> >>>
> >>> You can always start with just configuring a few disks per node. That
> >>> should always work.
> >>>
> >>
> >> That was my thought too.
> >>
> >> Thanks
> >>
> >> Peter
> >>
> >>
> >>> > but I was expecting them to start and be either dirt slow or crash
> >>> > later. Anyway I've got up to 30 of them, so I was hoping to get at
> >>> > least 6PB of raw storage out of them.
> >>> >
> >>> > As yet I've not spotted any helpful error messages.
> >>> >
> >>>
> >>
>
>
>
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io