ormandj/ceph:v16.2.4-mgrfix <-- pushed to Docker Hub.
Try bootstrap with: --image "docker.io/ormandj/ceph:v16.2.4-mgrfix" if
you want to give it a shot, or you can set CEPHADM_IMAGE. We think
these should both work with any cephadm command, even if the
documentation doesn't make it clear.
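
For example (a rough sketch; substitute your own monitor IP):

  cephadm --image "docker.io/ormandj/ceph:v16.2.4-mgrfix" bootstrap --mon-ip <mon-ip>

or, via the environment variable:

  CEPHADM_IMAGE="docker.io/ormandj/ceph:v16.2.4-mgrfix" cephadm bootstrap --mon-ip <mon-ip>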
On Tue, Jun 1, 2021 at 2:30 AM David Orman <ormandj(a)corenode.com> wrote:
>
> I do not believe it was in 16.2.4. I will build another patched
> version of the image tomorrow based on that version. I do agree, I
> feel this breaks new deploys as well as existing ones, and hope a
> point release will come soon that includes the fix.
>
> On May 31, 2021, at 15:33, Marco Pizzolo <marcopizzolo(a)gmail.com> wrote:
>
>
> David,
>
> What I can confirm is that if this fix is already in 16.2.4 and
> 15.2.13, then there's another issue resulting in the same situation,
> as it continues to happen in the latest available images.
> We are going to try and see if we can install a 15.2.x release and
> subsequently upgrade using a fixed image. We were not finding a good
> way to bootstrap directly with a custom image, but maybe we missed
> something; the cephadm bootstrap command didn't seem to support an
> image path.
>
> Thanks for your help thus far. I'll update later today or tomorrow
> when we get the chance to go the upgrade route.
>
> It seems tragic that when an all-stopping, immediately reproducible
> issue such as this occurs, adopters are left to flounder for so long.
> Ceph has had a tremendously positive impact for us since we began
> using it in Luminous/Mimic, but situations such as this are hard to
> look past. It's really unfortunate, as our existing production
> clusters have been rock solid thus far, but this does shake one's
> confidence, and I would wager that I'm not alone.
>
> Marco
>
>
>
>
>
>
> On Mon, May 31, 2021 at 3:57 PM David Orman <ormandj(a)corenode.com> wrote:
>>
>> Does the image we built fix the problem for you? That's how we worked
>> around it. Unfortunately, it even bites you with fewer OSDs if you have
>> DB/WAL on other devices, we have 24 rotational drives/OSDs, but split
>> DB/WAL onto multiple NVMEs. We're hoping the remoto fix (since it's
>> merged upstream and pushed) will land in the next point release of
>> 16.x (and it sounds like 15.x), since this is a blocking issue without
>> using patched containers. I guess testing isn't done against clusters
>> with these kinds of configurations, as we can replicate it on any of
>> our dev/test clusters with this type of drive configuration. We
>> weren't able to upgrade any clusters/deploy new hosts on any clusters,
>> so it caused quite an issue until we figured out the problem and
>> resolved it.
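>>
>> For reference, the sort of drive-group spec that exercises this path
>> (a sketch only; the filters here are illustrative, not our exact
>> spec):
>>
>> service_type: osd
>> service_id: hdd-osds-nvme-db
>> placement:
>>   host_pattern: '*'
>> spec:
>>   data_devices:
>>     rotational: 1
>>   db_devices:
>>     rotational: 0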
>>
>> If you want to build your own images, this is the simple Dockerfile we
>> used to get beyond this issue:
>>
>> $ cat Dockerfile
>> FROM docker.io/ceph/ceph:v16.2.3
>> COPY process.py /lib/python3.6/site-packages/remoto/process.py
>>
>> The process.py is the patched version we submitted here:
>>
>> https://github.com/alfredodeza/remoto/pull/63/commits/6f98078a1479de1f246f9…
>> (merged upstream).
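>>
>> Building and pushing it is then just (the tag/registry here are
>> placeholders, not our real ones):
>>
>> $ docker build -t <registry>/ceph:v16.2.3-mgrfix .
>> $ docker push <registry>/ceph:v16.2.3-mgrfix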
>>
>> Hope this helps,
>> David
>>
>> On Mon, May 31, 2021 at 11:43 AM Marco Pizzolo <marcopizzolo(a)gmail.com> wrote:
>> >
>> > Unfortunately Ceph 16.2.4 is still not working for us. We continue
>> > to have issues where the 26th OSD is not fully created and started.
>> > We've confirmed that we do get the flock as described in:
>> >
>> >
>> > https://tracker.ceph.com/issues/50526
>> >
>> > -----
>> >
>> > I have verified in our labs a way to easily reproduce the problem:
>> >
>> > 0. Please stop the cephadm orchestrator:
>> >
>> > On your bootstrap node:
>> >
>> > # cephadm shell
>> > # ceph mgr module disable cephadm
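>> >
>> > (When finished, it can be re-enabled with: # ceph mgr module enable cephadm)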
>> >
>> > 1. On one of the hosts where you want to create OSDs and where you
>> > have a large number of devices:
>> >
>> > See if you have a "cephadm" file lock, for example:
>> >
>> > # lslocks | grep cephadm
>> > python3 1098782 FLOCK 0B WRITE 0 0 0 /run/cephadm/9fa2b396-adb5-11eb-a2d3-bc97e17cf960.lock
>> >
>> > If that is the case, just kill the process to start from a "clean"
>> > situation.
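>> >
>> > A one-liner for that (a sketch, assuming util-linux's lslocks):
>> >
>> > # kill $(lslocks -n -o PID,PATH | awk '/cephadm/ {print $1}')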
>> >
>> > 2. Go to the folder: /var/lib/ceph/<your_ceph_cluster_fsid>
>> >
>> > There you will find a file called "cephadm.xxxxxxxxxxxxxx".
>> >
>> > execute:
>> >
>> > # python3 cephadm.xxxxxxxxxxxxxx ceph-volume inventory
>> >
>> > 3. If the problem is present in your cephadm file, the command will
>> > block and you will see a cephadm file lock again.
>> >
>> > 4. If the modification is not present, change your
>> > cephadm.xxxxxxxxxx file to include the modification I made (it just
>> > removes the verbosity parameter in the call_throws call):
>> >
>> >
>> > https://github.com/ceph/ceph/blob/2f4dc3147712f1991242ef0d059690b5fa3d8463/…
>> >
>> > Go to step 1 to clean the file lock and try again... with the
>> > modification in place it should work.
>> >
>> > -----
>> >
>> > For us, it takes a few seconds but then the manual execution does
>> > come back and there are no file locks; however, we remain unable to
>> > add any further OSDs.
>> >
>> > Furthermore, this is happening as part of a new Pacific cluster
>> > creation, post-bootstrap, adding one OSD daemon at a time and
>> > allowing each OSD to be created, set in, and brought up.
>> >
>> > How is everyone else managing to get past this, or are we the only
>> > ones (aside from David) using >25 OSDs per host?
>> >
>> > Our luck has been the same with 15.2.13 and 16.2.4, using both
>> > Docker and Podman on Ubuntu 20.04.2.
>> >
>> > Thanks,
>> > Marco
>> >
>> >
>> >
>> > On Sun, May 30, 2021 at 7:33 AM Peter Childs <pchilds(a)bcs.org> wrote:
>> >>
>> >> I've actually managed to get a little further with my problem.
>> >>
>> >> As I've said before, these servers have a slightly unusual config.
>> >>
>> >> 63 drives and only 48G of memory.
>> >>
>> >> Once I create about 15-20 OSDs, it continues to format the disks
>> >> but won't actually create the containers or start any services.
>> >>
>> >> Worse than that, on reboot the disks disappear; they don't stop
>> >> working, they're just not detected by Linux, which makes me think
>> >> I'm hitting some kernel limit.
>> >>
>> >> At this point I'm going to cut my losses, give up, and use the
>> >> smaller but slightly more powerful 30-drive systems I have (with
>> >> 256G memory), maybe transplanting the larger disks if I need more
>> >> capacity.
>> >>
>> >> Peter
>> >>
>> >> On Sat, 29 May 2021, 23:19 Marco Pizzolo, <marcopizzolo(a)gmail.com> wrote:
>> >>>
>> >>> Thanks, David.
>> >>> We will investigate the bugs as per your suggestion, and then will
>> >>> look to test with the custom image.
>> >>>
>> >>> Appreciate it.
>> >>>
>> >>> On Sat, May 29, 2021, 4:11 PM David Orman <ormandj(a)corenode.com> wrote:
>> >>>>
>> >>>> You may be running into the same issue we ran into (make sure to
>> >>>> read the first issue, there are a few mingled in there), for
>> >>>> which we submitted a patch:
>> >>>>
>> >>>> https://tracker.ceph.com/issues/50526
>> >>>> https://github.com/alfredodeza/remoto/issues/62
>> >>>>
>> >>>> If you're brave (YMMV, test first non-prod), we pushed an image
>> >>>> with the issue we encountered fixed as per above here:
>> >>>> https://hub.docker.com/repository/docker/ormandj/ceph/tags?page=1 .
>> >>>> We 'upgraded' to this when we encountered the mgr hanging on us
>> >>>> after updating Ceph to v16 and experiencing this issue, using:
>> >>>> "ceph orch upgrade start --image docker.io/ormandj/ceph:v16.2.3-mgrfix".
>> >>>> I've not tried to bootstrap a new cluster with a custom image,
>> >>>> and I don't know when 16.2.4 will be released with this change
>> >>>> (hopefully) integrated, as remoto accepted the patch upstream.
>> >>>>
>> >>>> I'm not sure if this is your exact issue; see the bug reports and
>> >>>> check whether you see the lock and the behavior matches. If so,
>> >>>> it may help you out. The only change in that image is that patch
>> >>>> to remoto being overlaid on the default 16.2.3 image.
>> >>>>
>> >>>> On Fri, May 28, 2021 at 1:15 PM Marco Pizzolo <marcopizzolo(a)gmail.com> wrote:
>> >>>> >
>> >>>> > Peter,
>> >>>> >
>> >>>> > We're seeing the same issues as you are. We have 2 new hosts
>> >>>> > (Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz w/ 48 cores, 384GB
>> >>>> > RAM, and 60x 10TB SED drives) and we have tried both 15.2.13
>> >>>> > and 16.2.4.
>> >>>> >
>> >>>> > Cephadm does NOT properly deploy and activate OSDs on Ubuntu
>> >>>> > 20.04.2 with Docker.
>> >>>> >
>> >>>> > This seems to be a bug in cephadm and a product regression, as
>> >>>> > we have 4 near-identical nodes on CentOS running Nautilus (240
>> >>>> > x 10TB SED drives) and had no problems.
>> >>>> >
>> >>>> > FWIW we had no luck yet with one-by-one OSD daemon additions
>> >>>> > through ceph orch either. We also reproduced the issue easily
>> >>>> > in a virtual lab using small virtual disks on a single Ceph VM
>> >>>> > with 1 mon.
>> >>>> >
>> >>>> > We are now looking into whether we can get past this with a
>> >>>> > manual buildout.
>> >>>> >
>> >>>> > If you, or anyone, has hit the same stumbling block and gotten
>> >>>> > past it, I would really appreciate some guidance.
>> >>>> >
>> >>>> > Thanks,
>> >>>> > Marco
>> >>>> >
>> >>>> > On Thu, May 27, 2021 at 2:23 PM Peter Childs <pchilds(a)bcs.org> wrote:
>> >>>> >
>> >>>> > > In the end it looks like I might be able to get the node up
>> >>>> > > to about 30 OSDs before it stops creating any more.
>> >>>> > >
>> >>>> > > Or rather, it formats the disks but freezes up starting the
>> >>>> > > daemons.
>> >>>> > >
>> >>>> > > I suspect I'm missing something I can tune to get it working
>> >>>> > > better.
>> >>>> > >
>> >>>> > > If I could see any error messages that might help, but I'm
>> >>>> > > yet to spot anything.
>> >>>> > >
>> >>>> > > Peter.
>> >>>> > >
>> >>>> > > On Wed, 26 May 2021, 10:57 Eugen Block, <eblock(a)nde.ag> wrote:
>> >>>> > >
>> >>>> > > > > If I add the osd daemons one at a time with
>> >>>> > > > >
>> >>>> > > > > ceph orch daemon add osd drywood12:/dev/sda
>> >>>> > > > >
>> >>>> > > > > It does actually work,
>> >>>> > > >
>> >>>> > > > Great!
>> >>>> > > >
>> >>>> > > > > I suspect what's happening is when my rule for creating
>> >>>> > > > > osds runs and creates them all at once, it ties up the
>> >>>> > > > > orchestrator; it overloads cephadm and it can't cope.
>> >>>> > > >
>> >>>> > > > It's possible, I guess.
>> >>>> > > >
>> >>>> > > > > I suspect what I might need to do, at least to work
>> >>>> > > > > around the issue, is set "limit:" and bring it up until
>> >>>> > > > > it stops working.
>> >>>> > > >
>> >>>> > > > It's worth a try, yes, although the docs state you should
>> >>>> > > > try to avoid it; it's possible that it doesn't work
>> >>>> > > > properly, in that case create a bug report. ;-)
>> >>>> > > >
>> >>>> > > > > I did work out how to get ceph-volume to nearly work
>> >>>> > > > > manually.
>> >>>> > > > >
>> >>>> > > > > cephadm shell
>> >>>> > > > > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
>> >>>> > > > > ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >
>> >>>> > > > > but given I've now got "add osd" to work, I suspect I
>> >>>> > > > > just need to fine-tune my osd creation rules, so it does
>> >>>> > > > > not try and create too many osds on the same node at the
>> >>>> > > > > same time.
>> >>>> > > >
>> >>>> > > > I agree, no need to do it manually if there is an
>> >>>> > > > automated way, especially if you're trying to bring up
>> >>>> > > > dozens of OSDs.
>> >>>> > > >
>> >>>> > > >
>> >>>> > > > Zitat von Peter Childs <pchilds(a)bcs.org>:
>> >>>> > > >
>> >>>> > > > > After a bit of messing around, I managed to get it
>> >>>> > > > > somewhat working.
>> >>>> > > > >
>> >>>> > > > > If I add the osd daemons one at a time with
>> >>>> > > > >
>> >>>> > > > > ceph orch daemon add osd drywood12:/dev/sda
>> >>>> > > > >
>> >>>> > > > > It does actually work,
>> >>>> > > > >
>> >>>> > > > > I suspect what's happening is when my rule for creating
>> >>>> > > > > osds runs and creates them all at once, it ties up the
>> >>>> > > > > orchestrator; it overloads cephadm and it can't cope.
>> >>>> > > > >
>> >>>> > > > > service_type: osd
>> >>>> > > > > service_name: osd.drywood-disks
>> >>>> > > > > placement:
>> >>>> > > > > host_pattern: 'drywood*'
>> >>>> > > > > spec:
>> >>>> > > > > data_devices:
>> >>>> > > > > size: "7TB:"
>> >>>> > > > > objectstore: bluestore
>> >>>> > > > >
>> >>>> > > > > I suspect what I might need to do, at least to work
>> >>>> > > > > around the issue, is set "limit:" and bring it up until
>> >>>> > > > > it stops working.
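>> >>>> > > > >
>> >>>> > > > > For example (a sketch; the right value for "limit:" is a
>> >>>> > > > > guess I'd have to tune):
>> >>>> > > > >
>> >>>> > > > > service_type: osd
>> >>>> > > > > service_name: osd.drywood-disks
>> >>>> > > > > placement:
>> >>>> > > > >   host_pattern: 'drywood*'
>> >>>> > > > > spec:
>> >>>> > > > >   data_devices:
>> >>>> > > > >     size: "7TB:"
>> >>>> > > > >     limit: 10
>> >>>> > > > >   objectstore: bluestore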
>> >>>> > > > >
>> >>>> > > > > I did work out how to get ceph-volume to nearly work
>> >>>> > > > > manually.
>> >>>> > > > >
>> >>>> > > > > cephadm shell
>> >>>> > > > > ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring
>> >>>> > > > > ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >
>> >>>> > > > > but given I've now got "add osd" to work, I suspect I
>> >>>> > > > > just need to fine-tune my osd creation rules, so it does
>> >>>> > > > > not try and create too many osds on the same node at the
>> >>>> > > > > same time.
>> >>>> > > > >
>> >>>> > > > >
>> >>>> > > > >
>> >>>> > > > > On Wed, 26 May 2021 at 08:25, Eugen Block <eblock(a)nde.ag> wrote:
>> >>>> > > > >
>> >>>> > > > >> Hi,
>> >>>> > > > >>
>> >>>> > > > >> I believe your current issue is due to a missing
>> >>>> > > > >> keyring for client.bootstrap-osd on the OSD node. But
>> >>>> > > > >> even after fixing that you probably still won't be able
>> >>>> > > > >> to deploy an OSD manually with ceph-volume, because
>> >>>> > > > >> 'ceph-volume activate' is not supported with cephadm
>> >>>> > > > >> [1]. I just tried that in a virtual environment; it
>> >>>> > > > >> fails when activating the systemd unit:
>> >>>> > > > >>
>> >>>> > > > >> ---snip---
>> >>>> > > > >> [2021-05-26 06:47:16,677][ceph_volume.process][INFO  ] Running command: /usr/bin/systemctl enable ceph-volume@lvm-8-1a8fc8ae-8f4c-4f91-b044-d5636bb52456
>> >>>> > > > >> [2021-05-26 06:47:16,692][ceph_volume.process][INFO  ] stderr Failed to connect to bus: No such file or directory
>> >>>> > > > >> [2021-05-26 06:47:16,693][ceph_volume.devices.lvm.create][ERROR ] lvm activate was unable to complete, while creating the OSD
>> >>>> > > > >> Traceback (most recent call last):
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/create.py", line 32, in create
>> >>>> > > > >>     Activate([]).activate(args)
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 16, in is_root
>> >>>> > > > >>     return func(*a, **kw)
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 294, in activate
>> >>>> > > > >>     activate_bluestore(lvs, args.no_systemd)
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/devices/lvm/activate.py", line 214, in activate_bluestore
>> >>>> > > > >>     systemctl.enable_volume(osd_id, osd_fsid, 'lvm')
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 82, in enable_volume
>> >>>> > > > >>     return enable(volume_unit % (device_type, id_, fsid))
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/systemd/systemctl.py", line 22, in enable
>> >>>> > > > >>     process.run(['systemctl', 'enable', unit])
>> >>>> > > > >>   File "/usr/lib/python3.6/site-packages/ceph_volume/process.py", line 153, in run
>> >>>> > > > >>     raise RuntimeError(msg)
>> >>>> > > > >> RuntimeError: command returned non-zero exit status: 1
>> >>>> > > > >> [2021-05-26 06:47:16,694][ceph_volume.devices.lvm.create][INFO  ] will rollback OSD ID creation
>> >>>> > > > >> [2021-05-26 06:47:16,697][ceph_volume.process][INFO  ] Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.8 --yes-i-really-mean-it
>> >>>> > > > >> [2021-05-26 06:47:17,597][ceph_volume.process][INFO  ] stderr purged osd.8
>> >>>> > > > >> ---snip---
>> >>>> > > > >>
>> >>>> > > > >> There's a workaround described in [2], but that's not
>> >>>> > > > >> really an option for dozens of OSDs. I think your best
>> >>>> > > > >> approach is to get cephadm to activate the OSDs for you.
>> >>>> > > > >> You wrote you didn't find any helpful error messages,
>> >>>> > > > >> but did cephadm even try to deploy OSDs? What does your
>> >>>> > > > >> osd spec file look like? Did you explicitly run 'ceph
>> >>>> > > > >> orch apply osd -i specfile.yml'? This should trigger
>> >>>> > > > >> cephadm and you should see at least some output like
>> >>>> > > > >> this:
>> >>>> > > > >>
>> >>>> > > > >> Mai 26 08:21:48 pacific1 conmon[31446]: 2021-05-26T06:21:48.466+0000 7effc15ff700  0 log_channel(cephadm) log [INF] : Applying service osd.ssd-hdd-mix on host pacific2...
>> >>>> > > > >> Mai 26 08:21:49 pacific1 conmon[31009]: cephadm 2021-05-26T06:21:48.469611+0000 mgr.pacific1.whndiw (mgr.14166) 1646 : cephadm [INF] Applying service osd.ssd-hdd-mix on host pacific2...
>> >>>> > > > >>
>> >>>> > > > >> Regards,
>> >>>> > > > >> Eugen
>> >>>> > > > >>
>> >>>> > > > >> [1] https://tracker.ceph.com/issues/49159
>> >>>> > > > >> [2] https://tracker.ceph.com/issues/46691
>> >>>> > > > >>
>> >>>> > > > >>
>> >>>> > > > >> Zitat von Peter Childs <pchilds(a)bcs.org>:
>> >>>> > > > >>
>> >>>> > > > >> > Not sure what I'm doing wrong; I suspect it's the
>> >>>> > > > >> > way I'm running ceph-volume.
>> >>>> > > > >> >
>> >>>> > > > >> > root@drywood12:~# cephadm ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
>> >>>> > > > >> > Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
>> >>>> > > > >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > /usr/bin/docker: Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > /usr/bin/docker: --> RuntimeError: No valid ceph configuration file was loaded.
>> >>>> > > > >> > Traceback (most recent call last):
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 8029, in <module>
>> >>>> > > > >> >     main()
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 8017, in main
>> >>>> > > > >> >     r = ctx.func(ctx)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 1678, in _infer_fsid
>> >>>> > > > >> >     return func(ctx)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 1738, in _infer_image
>> >>>> > > > >> >     return func(ctx)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 4514, in command_ceph_volume
>> >>>> > > > >> >     out, err, code = call_throws(ctx, c.run_cmd(), verbosity=verbosity)
>> >>>> > > > >> >   File "/usr/sbin/cephadm", line 1464, in call_throws
>> >>>> > > > >> >     raise RuntimeError('Failed command: %s' % ' '.join(command))
>> >>>> > > > >> > RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=ceph/ceph@sha256:54e95ae1e11404157d7b329d0t
>> >>>> > > > >> >
>> >>>> > > > >> > root@drywood12:~# cephadm shell
>> >>>> > > > >> > Inferring fsid 1518c8e0-bbe4-11eb-9772-001e67dc85ea
>> >>>> > > > >> > Inferring config /var/lib/ceph/1518c8e0-bbe4-11eb-9772-001e67dc85ea/mon.drywood12/config
>> >>>> > > > >> > Using recent ceph image ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949
>> >>>> > > > >> > root@drywood12:/# ceph-volume lvm create --data /dev/sda --dmcrypt
>> >>>> > > > >> > Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > Running command: /usr/bin/ceph-authtool --gen-print-key
>> >>>> > > > >> > Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 70054a5c-c176-463a-a0ac-b44c5db0987c
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef405b378) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef405ef20) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 auth: unable to find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such file or directory
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 AuthRegistry(0x7fdef8f0bea0) no keyring found at /var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef2d9d700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef259c700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef1d9b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
>> >>>> > > > >> >  stderr: 2021-05-25T07:46:18.188+0000 7fdef8f0d700 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
>> >>>> > > > >> >  stderr: [errno 13] RADOS permission denied (error connecting to the cluster)
>> >>>> > > > >> > --> RuntimeError: Unable to create a new OSD id
>> >>>> > > > >> > root@drywood12:/# lsblk /dev/sda
>> >>>> > > > >> > NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
>> >>>> > > > >> > sda    8:0   0  7.3T  0 disk
>> >>>> > > > >> >
>> >>>> > > > >> > As far as I can see, cephadm gets a little further
>> >>>> > > > >> > than this, as the disks have LVM volumes on them;
>> >>>> > > > >> > just the OSD daemons are not created or started. So
>> >>>> > > > >> > maybe I'm invoking ceph-volume incorrectly.
>> >>>> > > > >> >
>> >>>> > > > >> >
>> >>>> > > > >> > On Tue, 25 May 2021 at 06:57, Peter Childs <pchilds(a)bcs.org> wrote:
>> >>>> > > > >> >
>> >>>> > > > >> >>
>> >>>> > > > >> >>
>> >>>> > > > >> >> On Mon, 24 May 2021, 21:08 Marc, <Marc(a)f1-outsourcing.eu> wrote:
>> >>>> > > > >> >>
>> >>>> > > > >> >>> >
>> >>>> > > > >> >>> > I'm attempting to use cephadm and Pacific,
>> >>>> > > > >> >>> > currently on Debian Buster, mostly because
>> >>>> > > > >> >>> > centos7 ain't supported any more and centos8
>> >>>> > > > >> >>> > ain't supported by some of my hardware.
>> >>>> > > > >> >>>
>> >>>> > > > >> >>> Who says centos7 is not supported any more? Afaik
>> >>>> > > > >> >>> centos7/el7 is being supported till its EOL in
>> >>>> > > > >> >>> 2024. By then maybe a good alternative for
>> >>>> > > > >> >>> el8/stream will have surfaced.
>> >>>> > > > >> >>>
>> >>>> > > > >> >>
>> >>>> > > > >> >> Not supported by Ceph Pacific; it's our OS of
>> >>>> > > > >> >> choice otherwise.
>> >>>> > > > >> >>
>> >>>> > > > >> >> My testing says the available versions of podman,
>> >>>> > > > >> >> docker and python3 do not work with Pacific.
>> >>>> > > > >> >>
>> >>>> > > > >> >> Given I've needed to upgrade docker on Buster, can
>> >>>> > > > >> >> we please have a list of versions that work with
>> >>>> > > > >> >> cephadm, and maybe even have cephadm say no, please
>> >>>> > > > >> >> upgrade, unless you're running the right version or
>> >>>> > > > >> >> better.
>> >>>> > > > >> >>
>> >>>> > > > >> >>
>> >>>> > > > >> >>
>> >>>> > > > >> >>> > Anyway, I have a few nodes with 59x 7.2TB disks,
>> >>>> > > > >> >>> > but for some reason the osd daemons don't start;
>> >>>> > > > >> >>> > the disks get formatted and the osds are created,
>> >>>> > > > >> >>> > but the daemons never come up.
>> >>>> > > > >> >>>
>> >>>> > > > >> >>> What if you try with
>> >>>> > > > >> >>> ceph-volume lvm create --data /dev/sdi --dmcrypt ?
>> >>>> > > > >> >>>
>> >>>> > > > >> >>
>> >>>> > > > >> >> I'll have a go.
>> >>>> > > > >> >>
>> >>>> > > > >> >>
>> >>>> > > > >> >>> > They are probably the wrong spec for ceph (48GB
>> >>>> > > > >> >>> > of memory and only 4 cores)
>> >>>> > > > >> >>>
>> >>>> > > > >> >>> You can always start with just configuring a few
>> >>>> > > > >> >>> disks per node. That should always work.
>> >>>> > > > >> >>>
>> >>>> > > > >> >>
>> >>>> > > > >> >> That was my thought too.
>> >>>> > > > >> >>
>> >>>> > > > >> >> Thanks
>> >>>> > > > >> >>
>> >>>> > > > >> >> Peter
>> >>>> > > > >> >>
>> >>>> > > > >> >>
>> >>>> > > > >> >>> > but I was expecting them to start and be either
>> >>>> > > > >> >>> > dirt slow or crash later; anyway, I've got up to
>> >>>> > > > >> >>> > 30 of them, so I was hoping to get at least 6PB
>> >>>> > > > >> >>> > of raw storage out of them.
>> >>>> > > > >> >>> >
>> >>>> > > > >> >>> > As yet I've not spotted any helpful error
>> >>>> > > > >> >>> > messages.
>> >>>> > > > >> >>> >