Hi list,

After the nodes ran out of memory and were rebooted, we are no longer able to restart the ceph-osd@x services. (Details about the setup are at the end.)

I am trying to start one manually so we can see the error, but all I get is a series of crash dumps - below is the output from just one of the OSDs that is not starting. Any idea how to get past this?
[root@ceph001 ~]# /usr/bin/ceph-osd --debug_osd 10 -f --cluster ceph --id 83 --setuser ceph --setgroup ceph  > /tmp/dump 2>&1
starting osd.83 at - osd_data /var/lib/ceph/osd/ceph-83 /var/lib/ceph/osd/ceph-83/journal
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 2aaaaaaf5540 time 2019-10-01 14:19:49.494368
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % stripe_size == 0)
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x2aaaaaf3d36b]
 2: (()+0x26e4f7) [0x2aaaaaf3d4f7]
 3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d]
 4: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a]
 5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100]
 6: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb]
 7: (OSD::load_pgs()+0x4a9) [0x555555917e39]
 8: (OSD::init()+0xc99) [0x5555559238e9]
 9: (main()+0x23a3) [0x5555558017a3]
 10: (__libc_start_main()+0xf5) [0x2aaab77de495]
 11: (()+0x385900) [0x5555558d9900]

*** Caught signal (Aborted) **
 in thread 2aaaaaaf5540 thread_name:ceph-osd
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (()+0xf5d0) [0x2aaab69765d0]
 2: (gsignal()+0x37) [0x2aaab77f22c7]
 3: (abort()+0x148) [0x2aaab77f39b8]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x248) [0x2aaaaaf3d468]
 5: (()+0x26e4f7) [0x2aaaaaf3d4f7]
 6: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d]
 7: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a]
 8: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100]
 9: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb]
 10: (OSD::load_pgs()+0x4a9) [0x555555917e39]
 11: (OSD::init()+0xc99) [0x5555559238e9]
 12: (main()+0x23a3) [0x5555558017a3]
 13: (__libc_start_main()+0xf5) [0x2aaab77de495]
 14: (()+0x385900) [0x5555558d9900]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
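
For context, the check that fails at ECUtil.h:34 lives in the stripe_info_t constructor. If I read the 13.2.6 source correctly, ECBackend constructs it with the erasure profile's data-chunk count (k) as stripe_size and the pool's stripe_width, so it aborts whenever stripe_width is not an exact multiple of k. Roughly paraphrased (the comments are my reading, not the original source):

    // src/osd/ECUtil.h (mimic 13.2.6), paraphrased
    #include <cassert>
    #include <cstdint>

    class stripe_info_t {
      const uint64_t stripe_width;  // pg_pool_t::stripe_width from the OSDMap
      const uint64_t chunk_size;    // bytes per data chunk
    public:
      stripe_info_t(uint64_t stripe_size, uint64_t stripe_width)
        : stripe_width(stripe_width),
          chunk_size(stripe_width / stripe_size) {
        assert(stripe_width % stripe_size == 0);  // <-- the assert that fires
      }
    };

Unless I misunderstand this, the OSDs are reading pool metadata whose stripe_width is no longer a multiple of the profile's data-chunk count, which makes me wonder whether the locally stored OSDMap was damaged when the nodes went OOM, rather than anything in the EC profile we configured.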

Environment:
[root@ceph001 ~]# uname -r
3.10.0-957.27.2.el7.x86_64
[root@ceph001 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
[root@ceph001 ~]# rpm -qa | grep -i ceph
cm-config-ceph-release-mimic-8.2-73_cm8.2.noarch
ceph-13.2.6-0.el7.x86_64
ceph-selinux-13.2.6-0.el7.x86_64
ceph-base-13.2.6-0.el7.x86_64
ceph-osd-13.2.6-0.el7.x86_64
cm-config-ceph-radosgw-systemd-8.2-6_cm8.2.noarch
libcephfs2-13.2.6-0.el7.x86_64
ceph-common-13.2.6-0.el7.x86_64
ceph-mgr-13.2.6-0.el7.x86_64
cm-config-ceph-systemd-8.2-12_cm8.2.noarch
ceph-mon-13.2.6-0.el7.x86_64
python-cephfs-13.2.6-0.el7.x86_64
ceph-mds-13.2.6-0.el7.x86_64

Ceph osd tree:
ID  CLASS WEIGHT    TYPE NAME        STATUS REWEIGHT PRI-AFF
 -1       785.95801 root default                            
 -5       261.98599     host ceph001                        
  1   hdd   7.27699         osd.1        up  1.00000 1.00000
  3   hdd   7.27699         osd.3      down  1.00000 1.00000
  6   hdd   7.27699         osd.6      down  1.00000 1.00000
  9   hdd   7.27699         osd.9      down        0 1.00000
 12   hdd   7.27699         osd.12     down  1.00000 1.00000
 15   hdd   7.27699         osd.15       up  1.00000 1.00000
 18   hdd   7.27699         osd.18     down  1.00000 1.00000
 21   hdd   7.27699         osd.21     down  1.00000 1.00000
 24   hdd   7.27699         osd.24       up  1.00000 1.00000
 27   hdd   7.27699         osd.27     down  1.00000 1.00000
 30   hdd   7.27699         osd.30     down  1.00000 1.00000
 35   hdd   7.27699         osd.35     down  1.00000 1.00000
 37   hdd   7.27699         osd.37     down  1.00000 1.00000
 40   hdd   7.27699         osd.40     down  1.00000 1.00000
 44   hdd   7.27699         osd.44     down  1.00000 1.00000
 47   hdd   7.27699         osd.47       up  1.00000 1.00000
 50   hdd   7.27699         osd.50       up  1.00000 1.00000
 53   hdd   7.27699         osd.53     down  1.00000 1.00000
 56   hdd   7.27699         osd.56     down  1.00000 1.00000
 59   hdd   7.27699         osd.59       up  1.00000 1.00000
 62   hdd   7.27699         osd.62     down        0 1.00000
 65   hdd   7.27699         osd.65     down  1.00000 1.00000
 68   hdd   7.27699         osd.68     down  1.00000 1.00000
 71   hdd   7.27699         osd.71     down  1.00000 1.00000
 74   hdd   7.27699         osd.74     down  1.00000 1.00000
 77   hdd   7.27699         osd.77       up  1.00000 1.00000
 80   hdd   7.27699         osd.80     down  1.00000 1.00000
 83   hdd   7.27699         osd.83       up  1.00000 1.00000
 86   hdd   7.27699         osd.86     down  1.00000 1.00000
 88   hdd   7.27699         osd.88     down  1.00000 1.00000
 91   hdd   7.27699         osd.91     down  1.00000 1.00000
 94   hdd   7.27699         osd.94     down  1.00000 1.00000
 97   hdd   7.27699         osd.97     down  1.00000 1.00000
100   hdd   7.27699         osd.100    down        0 1.00000
103   hdd   7.27699         osd.103    down  1.00000 1.00000
106   hdd   7.27699         osd.106      up  1.00000 1.00000
 -3       261.98599     host ceph002                        
  0   hdd   7.27699         osd.0      down        0 1.00000
  4   hdd   7.27699         osd.4        up  1.00000 1.00000
  7   hdd   7.27699         osd.7        up  1.00000 1.00000
 11   hdd   7.27699         osd.11     down  1.00000 1.00000
 13   hdd   7.27699         osd.13       up  1.00000 1.00000
 16   hdd   7.27699         osd.16     down  1.00000 1.00000
 19   hdd   7.27699         osd.19     down        0 1.00000
 23   hdd   7.27699         osd.23       up  1.00000 1.00000
 26   hdd   7.27699         osd.26     down        0 1.00000
 29   hdd   7.27699         osd.29     down        0 1.00000
 32   hdd   7.27699         osd.32     down        0 1.00000
 33   hdd   7.27699         osd.33     down        0 1.00000
 36   hdd   7.27699         osd.36     down        0 1.00000
 39   hdd   7.27699         osd.39     down  1.00000 1.00000
 43   hdd   7.27699         osd.43       up  1.00000 1.00000
 46   hdd   7.27699         osd.46       up  1.00000 1.00000
 49   hdd   7.27699         osd.49     down  1.00000 1.00000
 52   hdd   7.27699         osd.52     down  1.00000 1.00000
 55   hdd   7.27699         osd.55     down        0 1.00000
 58   hdd   7.27699         osd.58       up  1.00000 1.00000
 61   hdd   7.27699         osd.61     down  1.00000 1.00000
 64   hdd   7.27699         osd.64     down  1.00000 1.00000
 67   hdd   7.27699         osd.67       up  1.00000 1.00000
 70   hdd   7.27699         osd.70     down  1.00000 1.00000
 73   hdd   7.27699         osd.73     down  1.00000 1.00000
 76   hdd   7.27699         osd.76       up  1.00000 1.00000
 78   hdd   7.27699         osd.78     down  1.00000 1.00000
 81   hdd   7.27699         osd.81     down  1.00000 1.00000
 84   hdd   7.27699         osd.84     down        0 1.00000
 87   hdd   7.27699         osd.87     down  1.00000 1.00000
 90   hdd   7.27699         osd.90     down        0 1.00000
 93   hdd   7.27699         osd.93     down  1.00000 1.00000
 96   hdd   7.27699         osd.96     down        0 1.00000
 99   hdd   7.27699         osd.99     down        0 1.00000
102   hdd   7.27699         osd.102    down        0 1.00000
105   hdd   7.27699         osd.105      up  1.00000 1.00000
 -7       261.98599     host ceph003                        
  2   hdd   7.27699         osd.2        up  1.00000 1.00000
  5   hdd   7.27699         osd.5      down  1.00000 1.00000
  8   hdd   7.27699         osd.8        up  1.00000 1.00000
 10   hdd   7.27699         osd.10     down        0 1.00000
 14   hdd   7.27699         osd.14     down        0 1.00000
 17   hdd   7.27699         osd.17       up  1.00000 1.00000
 20   hdd   7.27699         osd.20     down        0 1.00000
 22   hdd   7.27699         osd.22     down        0 1.00000
 25   hdd   7.27699         osd.25       up  1.00000 1.00000
 28   hdd   7.27699         osd.28       up  1.00000 1.00000
 31   hdd   7.27699         osd.31     down        0 1.00000
 34   hdd   7.27699         osd.34     down        0 1.00000
 38   hdd   7.27699         osd.38     down        0 1.00000
 41   hdd   7.27699         osd.41     down  1.00000 1.00000
 42   hdd   7.27699         osd.42     down        0 1.00000
 45   hdd   7.27699         osd.45       up  1.00000 1.00000
 48   hdd   7.27699         osd.48       up  1.00000 1.00000
 51   hdd   7.27699         osd.51     down  1.00000 1.00000
 54   hdd   7.27699         osd.54       up  1.00000 1.00000
 57   hdd   7.27699         osd.57     down  1.00000 1.00000
 60   hdd   7.27699         osd.60     down  1.00000 1.00000
 63   hdd   7.27699         osd.63       up  1.00000 1.00000
 66   hdd   7.27699         osd.66     down  1.00000 1.00000
 69   hdd   7.27699         osd.69       up  1.00000 1.00000
 72   hdd   7.27699         osd.72       up  1.00000 1.00000
 75   hdd   7.27699         osd.75     down  1.00000 1.00000
 79   hdd   7.27699         osd.79       up  1.00000 1.00000
 82   hdd   7.27699         osd.82     down  1.00000 1.00000
 85   hdd   7.27699         osd.85     down  1.00000 1.00000
 89   hdd   7.27699         osd.89     down        0 1.00000
 92   hdd   7.27699         osd.92     down  1.00000 1.00000
 95   hdd   7.27699         osd.95     down        0 1.00000
 98   hdd   7.27699         osd.98     down        0 1.00000
101   hdd   7.27699         osd.101    down  1.00000 1.00000
104   hdd   7.27699         osd.104    down        0 1.00000
107   hdd   7.27699         osd.107      up  1.00000 1.00000

Ceph status:
[root@ceph001 ~]# ceph status  
  cluster:
    id:     54052e72-6835-410e-88a9-af4ac17a8113
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            48 osds down
            Reduced data availability: 2053 pgs inactive, 2043 pgs down, 7 pgs peering, 3 pgs incomplete, 126 pgs stale
            Degraded data redundancy: 18473/27200783 objects degraded (0.068%), 106 pgs degraded, 103 pgs undersized
            too many PGs per OSD (258 > max 250)
 
  services:
    mon: 3 daemons, quorum filler001,filler002,bezavrdat-master01
    mgr: bezavrdat-master01(active), standbys: filler002, filler001
    mds: cephfs-1/1/1 up  {0=filler002=up:replay}, 1 up:standby
    osd: 108 osds: 32 up, 80 in; 16 remapped pgs

  data:
    pools:   2 pools, 2176 pgs
    objects: 2.73 M objects, 1.7 TiB
    usage:   2.3 TiB used, 580 TiB / 582 TiB avail
    pgs:     94.347% pgs not active
             18473/27200783 objects degraded (0.068%)
             1951 down
             79   active+undersized+degraded
             76   stale+down
             23   stale+active+undersized+degraded
             14   down+remapped
             14   stale+active+clean
             6    stale+peering
             3    active+clean
             3    stale+active+recovery_wait+degraded
             2    incomplete
             2    stale+down+remapped
             1    stale+incomplete
             1    stale+remapped+peering
             1    active+recovering+undersized+degraded+remapped

Thank you in advance!

Regards,


Andrea Del Monaco
HPC Consultant – Big Data & Security
M: +31 612031174
Burgemeester Rijnderslaan 30 – 1185 MC Amstelveen – The Netherlands
atos.net
