The OpenFileTable objects are safe to delete while the MDS is offline anyway; the RADOS object names are mds*_openfiles*.
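For anyone hunting for those objects in a metadata-pool listing, a small hypothetical filter might look like the sketch below (the trailing `.N` chunk suffix in the regex is an assumption based on the mds*_openfiles* pattern above, not something confirmed in this thread):

```python
import re

# Matches OpenFileTable objects such as mds0_openfiles.0 or mds1_openfiles.3;
# the ".N" suffix is an assumption based on the mds*_openfiles* pattern.
OPENFILES_RE = re.compile(r"^mds\d+_openfiles(\.\d+)?$")

def openfiles_objects(pool_listing):
    """Filter a metadata-pool listing down to the OpenFileTable objects."""
    return [name for name in pool_listing if OPENFILES_RE.match(name)]

# Example listing as produced by `rados -p <metadata-pool> ls`:
listing = ["100.00000000", "mds0_openfiles.0", "mds0_openfiles.1", "mds_snaptable"]
print(openfiles_objects(listing))  # → ['mds0_openfiles.0', 'mds0_openfiles.1']
```

With ALL MDS daemons stopped, the matched objects could then be removed one by one with `rados rm`.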
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at
croit GmbH
Freseniusstr. 31h
81247 München
Tel: +49 89 1896585 90
On Fri, May 1, 2020 at 9:04 PM Marco Pizzolo <marcopizzolo(a)gmail.com> wrote:
Also seeing errors such as this:
[2020-05-01 13:15:20,970][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:20,970][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:20,974][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:20,989][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:20,989][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:20,998][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:21,014][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:21,014][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:21,019][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.9 with osd_fsid 32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:21,035][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:21,035][systemd][WARNING] failed activating OSD, retries left: 11
[2020-05-01 13:15:25,972][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:25,994][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:26,020][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:26,040][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:26,388][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.1 with osd_fsid 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:26,389][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:26,391][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:26,402][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,403][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:26,403][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,404][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:26,404][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,405][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:26,411][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.9 with osd_fsid 32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:26,424][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:26,424][systemd][WARNING] failed activating OSD, retries left: 10
[2020-05-01 13:15:31,408][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 1-0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:31,408][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 5-4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:31,409][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 13-dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:31,429][ceph_volume.process][INFO ] Running command: /usr/sbin/ceph-volume lvm trigger 9-32f4a716-f26e-4579-a074-5d6452c22e34
[2020-05-01 13:15:31,743][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.5 with osd_fsid 4eaf2baa-60f2-4045-8964-6152608c742a
[2020-05-01 13:15:31,750][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.13 with osd_fsid dd49cd80-418e-4a8c-8ebf-a33d339663ff
[2020-05-01 13:15:31,752][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:31,752][systemd][WARNING] failed activating OSD, retries left: 9
[2020-05-01 13:15:31,754][ceph_volume.process][INFO ] stderr --> RuntimeError: could not find osd.1 with osd_fsid 0f0e6dd7-9dd8-4b48-beaa-084f55f73b32
[2020-05-01 13:15:31,761][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:31,762][systemd][WARNING] failed activating OSD, retries left: 9
[2020-05-01 13:15:31,764][systemd][WARNING] command returned non-zero exit status: 1
[2020-05-01 13:15:31,765][systemd][WARNING] failed activating OSD, retries left: 9
On Fri, May 1, 2020 at 2:23 PM Marco Pizzolo <marcopizzolo(a)gmail.com> wrote:
Hi Ashley,
Thanks for your response. Nothing that I can think of would have happened. We are using max_mds = 1. We have 4 MDS daemons in total, so we used to have 3 on standby. Within minutes they all crash.
On Fri, May 1, 2020 at 2:21 PM Ashley Merrick <singapore(a)amerrick.co.uk> wrote:
> Quickly checking the code that calls that assert
>
>
> if (version > omap_version) {
>   omap_version = version;
>   omap_num_objs = num_objs;
>   omap_num_items.resize(omap_num_objs);
>   journal_state = jstate;
> } else if (version == omap_version) {
>   ceph_assert(omap_num_objs == num_objs);
>   if (jstate > journal_state)
>     journal_state = jstate;
> }
> }
>
>
> I'm not a dev, so not sure if this will help, but it seems to mean the MDS
> thinks it is behind on omaps, or too far ahead.
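The failure mode in the quoted check can be illustrated with a rough Python model (hypothetical names mirroring the C++ for readability; this is a sketch, not Ceph code):

```python
# Hypothetical model of the OpenFileTable version check quoted above.
class OpenFileTableState:
    def __init__(self):
        self.omap_version = 0
        self.omap_num_objs = 0
        self.journal_state = 0

    def apply(self, version, num_objs, jstate):
        if version > self.omap_version:
            # A strictly newer snapshot replaces the tracked state outright.
            self.omap_version = version
            self.omap_num_objs = num_objs
            self.journal_state = jstate
        elif version == self.omap_version:
            # At the same version, the object count must agree exactly --
            # this is where ceph_assert(omap_num_objs == num_objs) fires.
            assert self.omap_num_objs == num_objs, "omap_num_objs mismatch"
            if jstate > self.journal_state:
                self.journal_state = jstate

state = OpenFileTableState()
state.apply(5, 3, 1)       # accepted: version 5 is newer than 0
try:
    state.apply(5, 4, 1)   # same version, different object count
except AssertionError:
    print("assert fired: omap_num_objs mismatch at same omap_version")
```

In other words, the assert is a consistency check: two views of the same omap_version that disagree on the number of objects should be impossible, which is why hitting it suggests the on-disk OpenFileTable state is out of sync with what the MDS expects.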
>
>
> Anything happened recently? Just running a single MDS?
>
>
> Hopefully someone else may see this and shine some light on what could be
> causing it.
>
>
>
> ---- On Sat, 02 May 2020 02:10:58 +0800 marcopizzolo(a)gmail.com wrote ----
>
>
> Hello,
>
> Hoping you can help me.
>
> Ceph had been largely problem free for us for the better part of a year.
> We have a high file count in a single CephFS filesystem, and are seeing
> this error in the logs:
>
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/mds/OpenFileTable.cc: 777: FAILED ceph_assert(omap_num_objs == num_objs)
>
> The issue seemed to occur this morning, and restarting the MDS as well as
> rebooting the servers doesn't correct the problem. Not really sure where to
> look next, as the MDS daemons crash.
>
> Appreciate any help you can provide.
>
> Marco
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io