Patrick,

The slow ops fix was introduced recently by this PR: https://github.com/ceph/ceph/pull/30285

So far I haven't been able to reproduce the issue that you hit. Could you please let me
know the "--subset" option you used for your run (pdonnell-2019-09-14_22:39:31-fs-master-distro-basic-smithi)
on the fs suite?

Just to clarify: before the fix, even if there were legitimate slow ops on one or more OSDs,
they were not reported in the "ceph -s" output. The above fix only addressed that problem,
so that any slow ops that are reported now show up in the "ceph -s" output.

Since the tests parse the cluster log for slow ops, it doesn't seem that the above fix
introduced this issue. What needs investigation is why the slow ops are being reported with a
timestamp of 0.000000. It would therefore help if you could share the "--subset" option that
you used for your run.
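
For reference, here is a rough sketch of the kind of log check I mean. It is only an
illustration, not the actual QA code: the function name is hypothetical, the match strings
("Health check failed", "slow ops") are taken from this thread rather than from a verified
log format, and the sample lines are made up.

```python
import re

# Assumption: these phrases come from this thread, not from a verified
# Ceph cluster log format; adjust them to match the real log.
HEALTH_FAILED = "Health check failed"
SLOW_OPS = re.compile(r"slow ops", re.IGNORECASE)


def find_slow_op_warnings(lines):
    """Return the lines that look like slow-ops health check failures."""
    return [
        line.rstrip("\n")
        for line in lines
        if HEALTH_FAILED in line and SLOW_OPS.search(line)
    ]


# Made-up sample lines for illustration only; not real Ceph output.
sample = [
    "mon.a: Health check failed: 1 slow ops (hypothetical sample line)\n",
    "mon.a: Health check cleared (hypothetical sample line)\n",
    "osd.0: unrelated message that mentions slow ops\n",
]

matches = find_slow_op_warnings(sample)
for line in matches:
    print(line)
```

A check like this only looks at what was written to the cluster log, so it behaves the
same whether or not "ceph -s" displays the slow ops.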

Thanks,
-Sridhar



On Thu, Sep 26, 2019 at 3:08 AM Gregory Farnum <gfarnum@redhat.com> wrote:
On Tue, Sep 24, 2019 at 10:32 AM Casey Bodley <cbodley@redhat.com> wrote:
>
> The tests are detecting these failures by grepping the cluster log. It
> looks like ceph-mon is responsible for writing these 'Health check
> failed' warnings there, so I'm not sure that ceph-mgr bug is involved
> here - but maybe someone more familiar with ceph-mgr could say for sure?

I believe the slow op warnings are one of the things that the monitor
sources from the manager as one of the collating services it performs.
-Greg


>
> On 9/23/19 3:21 AM, Dan van der Ster wrote:
> > Since mimic, OSD slow ops have not been displayed by the cluster
> > health [1] -- this was fixed recently:
> >
> > https://github.com/ceph/ceph/pull/30285/commits/02cc60f6935a5005aa461da183c6c4332503be83
> >
> > -- Dan
> >
> > [1] original report: https://tracker.ceph.com/issues/40993
> >
> > On Mon, Sep 23, 2019 at 8:48 AM Patrick Donnelly <pdonnell@redhat.com> wrote:
> >> https://tracker.ceph.com/issues/41834
> >>
> >> This is broadly affecting Ceph QA. Hoping this mail will get the
> >> notice of the person whose changes maybe broke it.
> >>
> >> --
> >> Patrick Donnelly, Ph.D.
> >> He / Him / His
> >> Senior Software Engineer
> >> Red Hat Sunnyvale, CA
> >> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
> >> _______________________________________________
> >> Dev mailing list -- dev@ceph.io
> >> To unsubscribe send an email to dev-leave@ceph.io


--

Sridhar Seshasayee

Principal Software Engineer

Red Hat