Thanks for the replies folks.
This one is resolved. I wish I could tell you what I changed to fix
it, but there were several undocumented changes to the deployment script
I'm using whilst I was distracted by something else. Tearing down and
redeploying today no longer shows this particular issue.
I do have a new problem, though a less concerning one. I'll start a new
thread.
On Tue, 8 Jun 2021 at 12:48, Robert W. Eckert <rob(a)rob.eckert.name> wrote:
When I had issues with the monitors, the cause was access permissions on
the monitor folder
under /var/lib/ceph/<guid of ceph installation>/mon.<servername>/store.db.
Make sure it is owned by the ceph user.
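A minimal sketch of that ownership check, in Python. The function name is
made up for illustration, and it assumes a ceph user exists on the host;
if it does not (for example on a container-only host), pass a numeric uid
instead of a name.

```python
import pwd
from pathlib import Path


def files_not_owned_by(root, owner):
    """Return root and every path below it whose owner differs from `owner`.

    `owner` is a user name such as "ceph" (assumed to exist on this host)
    or a numeric uid.
    """
    uid = owner if isinstance(owner, int) else pwd.getpwnam(owner).pw_uid
    root = Path(root)
    # lstat() so that symlinks are checked themselves, not their targets.
    return [p for p in [root, *root.rglob("*")] if p.lstat().st_uid != uid]
```

Calling files_not_owned_by() on the store.db path above with "ceph" should
return an empty list when the ownership is right; anything it returns is a
candidate for chown.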
My issues originated from a hardware problem: the memory needed 1.3 V, but
the motherboard was only reading 1.2 V. (The memory had the issue: the
firmware said 1.2 V required, the sticker on the side said 1.3 V.) So I
had a script that copied the store across and fixed the permissions.
The other thing that helped a lot, compared to the crash logs, was to edit
the unit.run file and remove the --rm parameter from the command. That
lets you see the podman logs using podman logs <container>, which were a
bit more detailed.
When you do this, you will need to restore the --rm parameter afterwards
and clean up the 'cid' and 'pid' files:
/run/ceph-<guid>@mon.<server>.service-cid
and /run/ceph-<guid>@mon.<server>.service-pid
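The edit-and-restore step above could be scripted roughly as follows. This
is only a sketch: the helper names and the ".orig" backup suffix are made
up for illustration, it assumes the flag appears in unit.run as a
standalone --rm token, and it assumes the "(a)" in the archived paths
stands for "@".

```python
import re
from pathlib import Path


def strip_rm_flag(unit_run_text):
    """Remove standalone `--rm` tokens; flags such as `--rmi` are untouched."""
    return re.sub(r"(?<!\S)--rm(?!\S)[ \t]*", "", unit_run_text)


def remove_rm_with_backup(unit_run_path):
    """Rewrite unit.run without `--rm`, keeping the original beside it."""
    path = Path(unit_run_path)
    original = path.read_text()
    backup = path.with_name(path.name + ".orig")  # hypothetical backup name
    backup.write_text(original)
    path.write_text(strip_rm_flag(original))
    return backup


def runtime_files(guid, server, run_dir="/run"):
    """Build the cid and pid file paths that need cleaning up afterwards."""
    unit = f"ceph-{guid}@mon.{server}.service"
    return [f"{run_dir}/{unit}-cid", f"{run_dir}/{unit}-pid"]
```

Restoring is then a matter of copying the ".orig" file back over unit.run
and deleting the two paths from runtime_files() once the container has
stopped.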
My reference is Red Hat Enterprise Linux 8, so things may be a bit
different on Ubuntu.
If you get a message about the store.db files being off, it's easiest to
stop the working node, copy its store.db files over, set the user/group
ownership to ceph, and start things up.
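A rough Python sketch of the copy-and-chown part, under some assumptions:
the working node's store.db is already reachable as a local path (getting
it across the network is not shown), the mons involved are stopped first,
and the function name and ".bak" suffix are made up for illustration.

```python
import shutil
from pathlib import Path


def replace_store(src, dst, user=None, group=None):
    """Copy the store directory src to dst, then optionally set ownership.

    An existing dst is moved aside to "<name>.bak" rather than deleted;
    the rename raises OSError if a non-empty backup is already there.
    """
    src, dst = Path(src), Path(dst)
    if dst.exists():
        dst.rename(dst.with_name(dst.name + ".bak"))
    shutil.copytree(src, dst)
    if user is not None or group is not None:
        # Changing ownership to another user normally requires root.
        for path in [dst, *dst.rglob("*")]:
            shutil.chown(path, user=user, group=group)
    return dst
```

Run as root, replace_store(src, dst, user="ceph", group="ceph") would cover
the ownership step as well, assuming that user and group exist on the host.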
Rob
-----Original Message-----
From: Phil Merricks <seffyroff(a)gmail.com>
Sent: Tuesday, June 8, 2021 3:18 PM
To: ceph-users <ceph-users(a)ceph.io>
Subject: [ceph-users] Mon crash when client mounts CephFS
Hey folks,
I have deployed a 3 node dev cluster using cephadm. Deployment went
smoothly and all seems well.
However, if I try to mount a CephFS from a client node, 2 of the 3 mons
crash. I've begun picking through the logs, but so far, other than seeing
the crash in the log itself, it's unclear what the cause of the crash is.
Here's a log: <https://termbin.com/isaz>. You can see the crash occurring
around the line that begins with "Jun 08 18:56:04 okcomputer
podman[790987]:"
I would welcome any advice on either what the cause may be, or how I can
advance the analysis of what's wrong.
Best regards
Phil