-----Original message-----
From: Yan, Zheng <ukernel(a)gmail.com>
Sent: Wed 06-11-2019 14:16
Subject: Re: [ceph-users] mds crash loop
To: Karsten Nielsen <karsten(a)foo-bar.dk>;
CC: ceph-users(a)ceph.io;
> On Wed, Nov 6, 2019 at 4:42 PM Karsten Nielsen <karsten(a)foo-bar.dk> wrote:
> >
> > -----Original message-----
> > From: Yan, Zheng <ukernel(a)gmail.com>
> > Sent: Wed 06-11-2019 08:15
> > Subject: Re: [ceph-users] mds crash loop
> > To: Karsten Nielsen <karsten(a)foo-bar.dk>;
> > CC: ceph-users(a)ceph.io;
> > > On Tue, Nov 5, 2019 at 5:29 PM Karsten Nielsen <karsten(a)foo-bar.dk> wrote:
> > > >
> > > > Hi,
> > > >
> > > > Last week I upgraded my ceph cluster from luminous to mimic 13.2.6.
> > > > It was running fine for a while, but yesterday my mds went into a
> > > > crash loop.
> > > >
> > > > I have 1 active and 1 standby mds for my cephfs, both of which are
> > > > running the same crash loop.
> > > > I am running ceph based on https://hub.docker.com/r/ceph/daemon
> > > > version v3.2.7-stable-3.2-minic-centos-7-x86_64 with an etcd kv store.
> > > >
> > > > Log details are: https://paste.debian.net/1113943/
> > > >
> > >
> > > please try again with debug_mds=20. Thanks
> > >
> > > Yan, Zheng
> >
> > Yes, I have set that and had to move to pastebin.com, as paste.debian.net
> > apparently only supports 150k.
> >
> >
> > https://pastebin.com/Gv7c5h54
> >
>
> Looks like the on-disk root inode is corrupted. Have you encountered
> anything unusual during the upgrade?
>
> Please run 'rados -p <cephfs metadata pool> stat 1.00000000.inode' and
> check whether the object was modified before or after the 'luminous ->
> 13.2.6' upgrade.
The file was modified before the upgrade.
> To fix the corrupted object, run 'cephfs-data-scan init --force-init',
> then restart the mds. After the mds becomes active, run 'ceph daemon
> mds.x scrub_path / force repair'.
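For reference, the full sequence described above would look roughly like this
(a sketch only: 'cephfs_metadata' and 'mds.a' are placeholders for the actual
metadata pool and mds name in this cluster):

  # check when the root inode object was last modified
  rados -p cephfs_metadata stat 1.00000000.inode

  # re-initialise the corrupted root inode metadata
  cephfs-data-scan init --force-init

  # restart the mds; once it is active again, repair the tree from the root
  ceph daemon mds.a scrub_path / force repair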
>
>
> > - Karsten
> >
> > >
> > > > Thanks for any hints
> > > > - Karsten
> > >
> > >
>
>
Hi,
I recently upgraded my 3-node cluster to Proxmox 6 / Debian 10 and
recreated my Ceph cluster with a new release (14.2.4, BlueStore),
basically hoping to gain some I/O speed.
The installation went flawlessly and reading is faster than before
(~80 MB/s); however, the write speed is still really slow (~3.5 MB/s).
I wonder if I can do anything to speed things up?
My hardware is as follows:
3 nodes, each with a Supermicro X8DTT-HIBQF mainboard,
2 OSDs per node (2 TB SATA hard disks, WDC WD2000F9YZ-0),
interconnected via InfiniBand (40 Gbit/s).
The network should be reasonably fast; I measure ~16 Gbit/s with iperf,
so this seems fine.
I use Ceph for RBD only, so my measurement is simply a very basic "dd"
read and write test inside a virtual machine (Debian 8), like the
following:
read:
dd if=/dev/vdb | pv | dd of=/dev/null
-> 80 MB/s
write:
dd if=/dev/zero | pv | dd of=/dev/vdb
-> 3.5 MB/s
When I do the same in the virtual machine on a disk that is on NFS
storage, I get about 30 MB/s for reading and writing.
If I disable the write cache on all OSD disks via "hdparm -W 0
/dev/sdX", I gain a little performance; the write speed is then 4.3 MB/s.
Thanks to help from the list, I plan to install a second Ceph cluster
which is SSD-based (Samsung PM1725b) and should be much faster; however,
I still wonder whether there is any way to speed up my hard-disk-based
cluster?
Thank you in advance for any help,
Best Regards,
Hermann
--
hermann(a)qwer.tk
PGP/GPG: 299893C7 (on keyservers)
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io
Well, even after restarting the MGR service, the relevant log is still
full of these error messages:
2019-11-06 17:46:22.363 7f81ffdcc700 0 auth: could not find secret_id=3865
2019-11-06 17:46:22.363 7f81ffdcc700 0 cephx: verify_authorizer could
not get service secret for service mgr secret_id=3865
As you can see, the secret_id changes.
However, I have no idea which service this secret_id belongs to.
In my opinion, these errors are preventing the MGR from doing its job:
bringing the cluster back to a healthy state.
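Two generic checks that sometimes narrow down cephx "could not find secret_id"
messages (a suggestion only, not a diagnosis; stale rotating service keys are
often down to clock skew or mixed daemon versions):

  # check that the monitors agree on the time (rotating cephx keys are time-based)
  ceph time-sync-status

  # make sure all daemons are running the same release
  ceph versions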
On 06.11.2019 at 17:41, Mac Wynkoop wrote:
> I actually just had some unresponsive mgr daemons. If it happens again,
> I'll check whether it's the same error. Restarting them fixed the issue.
> Mac Wynkoop
>
>
>
>
> On Wed, Nov 6, 2019 at 8:43 AM Thomas Schneider <74cmonty(a)gmail.com> wrote:
>
> Hi,
>
> does anybody get these error messages in the MGR log?
> 2019-11-06 15:41:44.765 7f10db740700 0 auth: could not find
> secret_id=3863
> 2019-11-06 15:41:44.765 7f10db740700 0 cephx: verify_authorizer could
> not get service secret for service mgr secret_id=3863
>
>
> THX
>
> On 06.11.2019 at 10:43, Oliver Freyermuth wrote:
> > Hi all,
> >
> > interestingly, now that the third mon has been missing for almost a week
> > (those planned interventions always take longer than expected...),
> > we are getting mgr failovers (but without crashes).
> >
> > In the mgr log, I find:
> >
> > 2019-11-06 07:50:05.409 7fce8a0dc700 0 client.0 ms_handle_reset on
> > v2:10.160.16.1:6800/618072
> > ...
> > ... the normal churning ...
> > ...
> > 2019-11-06 07:52:44.113 7fce8a0dc700 -1 mgr handle_mgr_map I was
> > active but no longer am
> > 2019-11-06 07:52:44.113 7fce8a0dc700 1 mgr respawn e:
> > '/usr/bin/ceph-mgr'
> >
> > In the mon log, I see:
> > ...
> > 2019-11-06 07:44:11.565 7f1f44453700 4 rocksdb:
> > [db/db_impl_files.cc:356] [JOB 225] Try to delete WAL files size
> > 10830909, prev total WAL file size 10839895, number of live WAL
> files 2.
> >
> > 2019-11-06 07:44:11.565 7f1f3a43f700 4 rocksdb:
> > [db/db_impl_compaction_flush.cc:1403] [default] Manual compaction
> > starting
> > 2019-11-06 07:44:11.565 7f1f44c54700 4 rocksdb: (Original Log Time
> > 2019/11/06-07:44:11.565802) [db/db_impl_compaction_flush.cc:2374]
> > [default] Manual compaction from level-0 to level-6 from 'mgrstat ..
> > 'mgrstat; will stop at (end)
> > ...
> > 2019-11-06 07:50:36.734 7f1f3a43f700 4 rocksdb:
> > [db/db_impl_compaction_flush.cc:1403] [default] Manual compaction
> > starting
> > 2019-11-06 07:52:27.046 7f1f4144d700 0 log_channel(cluster) log
> [INF]
> > : Manager daemon mon001 is unresponsive, replacing it with standby
> > daemon mon002
> > ...
> >
> > There's a lot of compaction going on (probably due to the prolonged
> > HEALTH_WARN state, so not really unexpected), so I wonder whether the
> > actual cause for flagging the mgr as "unresponsive" is the heavy
> > compaction on the mons.
> > It will be interesting to see what happens when we finally have the
> > third mon back and the cluster becomes healthy again...
> >
> > Has anybody seen something similar after running for a week or more
> > with Nautilus on old and slow hardware?
> >
> > Cheers,
> > Oliver
> >
> > On 02.11.19 at 18:20, Oliver Freyermuth wrote:
> >> Dear Sage,
> >>
> >> good news - it happened again, with debug logs!
> >> There's nothing obvious to my eye, it's uploaded as:
> >> 0b2d0c09-46f3-4126-aa27-e2d2e8572741
> >> It seems the failure happened roughly in parallel with me trying to
> >> access the dashboard. It must have happened within the last ~5-10
> >> minutes of the log.
> >>
> >> I'll now go back to "stable operation", in case you need anything
> >> else, just let me know.
> >>
> >> Cheers and all the best,
> >> Oliver
> >>
> >> On 02.11.19 at 17:38, Oliver Freyermuth wrote:
> >>> Dear Sage,
> >>>
> >>> at least for the simple case:
> >>> ceph device get-health-metrics osd.11
> >>> => mgr crashes (but in that case, it crashes fully, i.e. the process
> >>> is gone)
> >>> I have now uploaded a verbose log as:
> >>> ceph-post-file: e3bd60ad-cbce-4308-8b07-7ebe7998572e
> >>>
> >>> One potential cause of this (and maybe the other issues) might be that
> >>> some of our OSDs sit behind non-JBOD controllers and are therefore built
> >>> as one RAID 0 per disk, so a simple smartctl on the device will not work
> >>> (-d megaraid,<number> would be needed instead; see the sketch below).
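For illustration only (the device name and array index below are made up;
adjust them to the actual controller layout), a manual SMART query against
such a disk would look roughly like:

  # list the devices smartctl can see, including the -d type it would use
  smartctl --scan

  # example: first physical disk behind a MegaRAID controller exposed as /dev/sda
  smartctl -a -d megaraid,0 /dev/sda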
> >>>
> >>> Now I have both mgrs active again, debug logging on, and device health
> >>> metrics on again, and am waiting for them to go silent again. Let's hope
> >>> the issue reappears before the disks fill up with logs ;-).
> >>>
> >>> Cheers,
> >>> Oliver
> >>>
> >>> On 02.11.19 at 02:56, Sage Weil wrote:
> >>>> On Sat, 2 Nov 2019, Oliver Freyermuth wrote:
> >>>>> Dear Cephers,
> >>>>>
> >>>>> interestingly, after:
> >>>>> ceph device monitoring off
> >>>>> the mgrs seem to be stable now - the active one still went silent
> >>>>> a few minutes later, but the standby took over and was stable, and
> >>>>> after restarting the broken one, it has been stable for an hour, too.
> >>>>> So a restart of the mgr is probably needed after disabling device
> >>>>> monitoring to get things stable again.
> >>>>>
> >>>>> So it seems to be caused by a problem with the device health
> >>>>> metrics. In case this is a red herring and the mgrs become unstable
> >>>>> again in the next days,
> >>>>> I'll let you know.
> >>>>
> >>>> If this seems to stabilize things, and you can tolerate inducing the
> >>>> failure again, reproducing the problem with mgr logs cranked up
> >>>> (debug_mgr = 20, debug_ms = 1) would probably give us a good idea of
> >>>> why the mgr is hanging. Let us know!
> >>>>
> >>>> Thanks,
> >>>> sage
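For anyone following along, one way to raise those settings at runtime (a
sketch; 'mgr.mon001' is just the daemon name used elsewhere in this thread,
and the overrides should be reverted once the logs are captured):

  # cluster-wide for all mgr daemons, via the config database (Nautilus)
  ceph config set mgr debug_mgr 20
  ceph config set mgr debug_ms 1

  # or on a single running daemon via its admin socket (run on that mgr's host)
  ceph daemon mgr.mon001 config set debug_mgr 20
  ceph daemon mgr.mon001 config set debug_ms 1

  # afterwards, drop the overrides again
  ceph config rm mgr debug_mgr
  ceph config rm mgr debug_ms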
> >>>>
> >>>> >
> >>>>> Cheers,
> >>>>> Oliver
> >>>>>
> >>>>> On 01.11.19 at 23:09, Oliver Freyermuth wrote:
> >>>>>> Dear Cephers,
> >>>>>>
> >>>>>> this is a 14.2.4 cluster with device health metrics enabled - for
> >>>>>> about a day now, all mgr daemons have been going "silent" on me
> >>>>>> after a few hours, i.e. "ceph -s" shows:
> >>>>>>
> >>>>>> cluster:
> >>>>>> id: 269cf2b2-7e7c-4ceb-bd1b-a33d915ceee9
> >>>>>> health: HEALTH_WARN
> >>>>>> no active mgr
> >>>>>> 1/3 mons down, quorum mon001,mon002
> >>>>>> services:
> >>>>>> mon: 3 daemons, quorum mon001,mon002 (age 57m), out
> >>>>>> of quorum: mon003
> >>>>>> mgr: no daemons active (since 56m)
> >>>>>> ...
> >>>>>> (the third mon has a planned outage and will come back in a few
> >>>>>> days)
> >>>>>>
> >>>>>> Checking the logs of the mgr daemons, I find some "reset"
> >>>>>> messages at the time when it goes "silent", first for the first mgr:
> >>>>>>
> >>>>>> 2019-11-01 21:34:40.286 7f2df6a6b700 0
> log_channel(cluster) log
> >>>>>> [DBG] : pgmap v1798: 1585 pgs: 1585 active+clean; 1.1 TiB data,
> >>>>>> 2.3 TiB used, 136 TiB / 138 TiB avail
> >>>>>> 2019-11-01 21:34:41.458 7f2e0d59b700 0 client.0
> ms_handle_reset
> >>>>>> on v2:10.160.16.1:6800/401248
> >>>>>> 2019-11-01 21:34:42.287 7f2df6a6b700 0
> log_channel(cluster) log
> >>>>>> [DBG] : pgmap v1799: 1585 pgs: 1585 active+clean; 1.1 TiB data,
> >>>>>> 2.3 TiB used, 136 TiB / 138 TiB avail
> >>>>>>
> >>>>>> and a bit later, on the standby mgr:
> >>>>>>
> >>>>>> 2019-11-01 22:18:14.892 7f7bcc8ae700 0
> log_channel(cluster) log
> >>>>>> [DBG] : pgmap v1798: 1585 pgs: 166 active+clean+snaptrim, 858
> >>>>>> active+clean+snaptrim_wait, 561 active+clean; 1.1 TiB data, 2.3
> >>>>>> TiB used, 136 TiB / 138 TiB avail
> >>>>>> 2019-11-01 22:18:16.022 7f7be9e72700 0 client.0
> ms_handle_reset
> >>>>>> on v2:10.160.16.2:6800/352196
> >>>>>> 2019-11-01 22:18:16.893 7f7bcc8ae700 0
> log_channel(cluster) log
> >>>>>> [DBG] : pgmap v1799: 1585 pgs: 166 active+clean+snaptrim, 858
> >>>>>> active+clean+snaptrim_wait, 561 active+clean; 1.1 TiB data, 2.3
> >>>>>> TiB used, 136 TiB / 138 TiB avail
> >>>>>>
> >>>>>> Interestingly, the dashboard still works, but presents outdated
> >>>>>> information, showing for example zero I/O going on.
> >>>>>> I believe this started to happen mainly after the third mon went
> >>>>>> into the known downtime, but I am not fully sure this was the
> >>>>>> trigger, since the cluster is still growing.
> >>>>>> It may also have been the addition of 24 more OSDs.
> >>>>>>
> >>>>>>
> >>>>>> I also find other messages in the mgr logs which seem
> >>>>>> problematic, but I am not sure they are related:
> >>>>>> ------------------------------
> >>>>>> 2019-11-01 21:17:09.849 7f2df4266700 0 mgr[devicehealth] Error
> >>>>>> reading OMAP: [errno 22] Failed to operate read op for oid
> >>>>>> Traceback (most recent call last):
> >>>>>> File "/usr/share/ceph/mgr/devicehealth/module.py", line 396,
> >>>>>> in put_device_metrics
> >>>>>> ioctx.operate_read_op(op, devid)
> >>>>>> File "rados.pyx", line 516, in
> >>>>>> rados.requires.wrapper.validate_func
> >>>>>>
> (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUIL
> >>>>>> D/ceph-14.2.4/build/src/pybind/rados/pyrex/rados.c:4721)
> >>>>>> File "rados.pyx", line 3474, in rados.Ioctx.operate_read_op
> >>>>>>
> (/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.4/rpm/el7/BUILD/ceph-14.2.4/build/src/pybind/rados/pyrex/rados.c:36554)
> >>>>>> InvalidArgumentError: [errno 22] Failed to operate read op
> for oid
> >>>>>> ------------------------------
> >>>>>> or:
> >>>>>> ------------------------------
> >>>>>> 2019-11-01 21:33:53.977 7f7bd38bc700 0 mgr[devicehealth]
> Fail to
> >>>>>> parse JSON result from daemon osd.51 ()
> >>>>>> 2019-11-01 21:33:53.978 7f7bd38bc700 0 mgr[devicehealth]
> Fail to
> >>>>>> parse JSON result from daemon osd.52 ()
> >>>>>> 2019-11-01 21:33:53.979 7f7bd38bc700 0 mgr[devicehealth]
> Fail to
> >>>>>> parse JSON result from daemon osd.53 ()
> >>>>>> ------------------------------
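A hedged way to look at what the devicehealth module actually stored (the pool
name below, device_health_metrics, is the module's default in Nautilus; the
object names in it are device ids chosen by the module, not OSD numbers):

  # list the objects the devicehealth module wrote, one per device id
  rados -p device_health_metrics ls

  # dump the per-device OMAP entries for one of them (<devid> is a placeholder)
  rados -p device_health_metrics listomapvals <devid>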
> >>>>>>
> >>>>>> The reason why I am cautious about the health metrics is that I
> >>>>>> observed a crash when trying to query them:
> >>>>>> ------------------------------
> >>>>>> 2019-11-01 20:21:23.661 7fa46314a700 0 log_channel(audit) log
> >>>>>> [DBG] : from='client.174136 -' entity='client.admin'
> >>>>>> cmd=[{"prefix": "device get-health-metrics", "devid": "osd.11",
> >>>>>> "target": ["mgr", ""]}]: dispatch
> >>>>>> 2019-11-01 20:21:23.661 7fa46394b700 0 mgr[devicehealth]
> >>>>>> handle_command
> >>>>>> 2019-11-01 20:21:23.663 7fa46394b700 -1 *** Caught signal
> >>>>>> (Segmentation fault) **
> >>>>>> in thread 7fa46394b700 thread_name:mgr-fin
> >>>>>>
> >>>>>> ceph version 14.2.4
> (75f4de193b3ea58512f204623e6c5a16e6c1e1ba)
> >>>>>> nautilus (stable)
> >>>>>> 1: (()+0xf5f0) [0x7fa488cee5f0]
> >>>>>> 2: (PyEval_EvalFrameEx()+0x1a9) [0x7fa48aeb50f9]
> >>>>>> 3: (PyEval_EvalFrameEx()+0x67bd) [0x7fa48aebb70d]
> >>>>>> 4: (PyEval_EvalFrameEx()+0x67bd) [0x7fa48aebb70d]
> >>>>>> 5: (PyEval_EvalFrameEx()+0x67bd) [0x7fa48aebb70d]
> >>>>>> 6: (PyEval_EvalCodeEx()+0x7ed) [0x7fa48aebe08d]
> >>>>>> 7: (()+0x709c8) [0x7fa48ae479c8]
> >>>>>> 8: (PyObject_Call()+0x43) [0x7fa48ae22ab3]
> >>>>>> 9: (()+0x5aaa5) [0x7fa48ae31aa5]
> >>>>>> 10: (PyObject_Call()+0x43) [0x7fa48ae22ab3]
> >>>>>> 11: (()+0x4bb95) [0x7fa48ae22b95]
> >>>>>> 12: (PyObject_CallMethod()+0xbb) [0x7fa48ae22ecb]
> >>>>>> 13: (ActivePyModule::handle_command(std::map<std::string,
> >>>>>> boost::variant<std::string, bool, long, double,
> >>>>>> std::vector<std::string, std::allocator<std::string> >,
> >>>>>> std::vector<long, std::allocator<long> >, std::vector<double,
> >>>>>> std::allocator<double> > >, std::less<void>,
> >>>>>> std::allocator<std::pair<std::string const,
> >>>>>> boost::variant<std::string, bool, long, double,
> >>>>>> std::vector<std::string, std::allocator<std::string> >,
> >>>>>> std::vector<long, std::allocator<long> >, std::vector<double,
> >>>>>> std::allocator<double> > > > > > const&,
> >>>>>> ceph::buffer::v14_2_0::list const&,
> std::basic_stringstream<char,
> >>>>>> std::char_traits<char>, std::allocator<char> >*,
> >>>>>> std::basic_stringstream<char, std::char_traits<char>,
> >>>>>> std::allocator<char> >*)+0x20e) [0x55c3c1fefc5e]
> >>>>>> 14: (()+0x16c23d) [0x55c3c204023d]
> >>>>>> 15: (FunctionContext::finish(int)+0x2c) [0x55c3c2001eac]
> >>>>>> 16: (Context::complete(int)+0x9) [0x55c3c1ffe659]
> >>>>>> 17: (Finisher::finisher_thread_entry()+0x156)
> [0x7fa48b439cc6]
> >>>>>> 18: (()+0x7e65) [0x7fa488ce6e65]
> >>>>>> 19: (clone()+0x6d) [0x7fa48799488d]
> >>>>>> NOTE: a copy of the executable, or `objdump -rdS
> <executable>`
> >>>>>> is needed to interpret this.
> >>>>>> ------------------------------
> >>>>>>
> >>>>>> I have issued:
> >>>>>> ceph device monitoring off
> >>>>>> for now and will keep waiting to see if the mgrs go silent again.
> >>>>>> If there are any better ideas or this issue is known, let me know.
> >>>>>>
> >>>>>> Cheers,
> >>>>>> Oliver
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> >
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
-----Original message-----
From: Yan, Zheng <ukernel(a)gmail.com>
Sent: Wed 06-11-2019 08:15
Subject: Re: [ceph-users] mds crash loop
To: Karsten Nielsen <karsten(a)foo-bar.dk>;
CC: ceph-users(a)ceph.io;
> On Tue, Nov 5, 2019 at 5:29 PM Karsten Nielsen <karsten(a)foo-bar.dk> wrote:
> >
> > Hi,
> >
> > Last week I upgraded my ceph cluster from luminous to mimic 13.2.6.
> > It was running fine for a while, but yesterday my mds went into a crash loop.
> >
> > I have 1 active and 1 standby mds for my cephfs, both of which are running
> > the same crash loop.
> > I am running ceph based on https://hub.docker.com/r/ceph/daemon version
> > v3.2.7-stable-3.2-minic-centos-7-x86_64 with an etcd kv store.
> >
> > Log details are: https://paste.debian.net/1113943/
> >
>
> please try again with debug_mds=20. Thanks
>
> Yan, Zheng
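For completeness, in Mimic that debug level can be raised either cluster-wide
through the config database or on the running daemon; a sketch, with 'mds.a'
as a placeholder for the actual mds name:

  # cluster-wide for all mds daemons
  ceph config set mds debug_mds 20

  # or on one running daemon via its admin socket
  ceph daemon mds.a config set debug_mds 20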
Yes, I have set that and had to move to pastebin.com, as paste.debian.net apparently only supports 150k.
https://pastebin.com/Gv7c5h54
- Karsten
>
> > Thanks for any hints
> > - Karsten
>
>
Hi,
Last week I upgraded my ceph cluster from luminous to mimic 13.2.6.
It was running fine for a while, but yesterday my mds went into a crash loop.
I have 1 active and 1 standby mds for my cephfs, both of which are running the same crash loop.
I am running ceph based on https://hub.docker.com/r/ceph/daemon version v3.2.7-stable-3.2-minic-centos-7-x86_64 with an etcd kv store.
Log details are: https://paste.debian.net/1113943/
Thanks for any hints
- Karsten
Hi,
I'm trying Ceph for the first time and am using the repository below:
deb https://download.ceph.com/debian-nautilus/ stretch main
But it seems that this repository only has the ceph-deploy package,
not the rest of Ceph.
Why is that? How can I get all of the updated Nautilus packages?
Regards,
Rodrigo Severo
pgs: 14.377% pgs not active
3749681/537818808 objects misplaced (0.697%)
810 active+clean
156 down
124 active+remapped+backfilling
1 active+remapped+backfill_toofull
1 down+inconsistent
When looking at the down PGs, all disks are online.
41.3db 53775 0 0 0 401643186092 0
0 3044 down 6m 161222'303144 162913:4630171
[32,96,128,115,86,129,113,124,57,109]p32
[32,96,128,115,86,129,113,124,57,109]p32 2019-11-03
Any way to see why the PG is down?
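A hedged starting point (the PG id is the one from the listing above; jq is
only used for readability): query the PG itself and look at recovery_state,
which usually names the OSDs or events it is blocked on.

  # ask the PG why it is down; "recovery_state" lists what it is waiting for
  ceph pg 41.3db query | jq '.recovery_state'

  # summary of down/inconsistent PGs and any blocking OSDs
  ceph health detail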
Hi All,
I'm using ceph-14.2.4 and testing on a FIPS-enabled cluster. Downloading
objects works, but Ceph raises a segmentation fault while uploading.
Please help me here, and please describe the debugging steps so that I can
reproduce this in a development environment.
Thanks,
Amit G