October 2019 - ceph-users

by Frank Schilder

I'm running a cepf fs with an 8+2 EC data pool. Disks are on 10 hosts and failure domain is host. Version is mimic 13.2.2. Today I added a few OSDs to one of the hosts and observed that a lot of PGs became inactive even though 9 out of 10 hosts were up all the time. After getting the 10th host and all disks up, I still ended up with a large amount of undersized PGs and degraded objects, which I don't understand as no OSD was removed. Here some details about the steps taken on the host with new disks, main questions at the end: - shut down OSDs (systemctl stop docker) - reboot host (this is necessary due to OS deployment via warewulf) Devices got renamed and not all disks came back up (4 OSDs remained down). This is expected, I need to re-deploy the containers to adjust for device name changes. Around this point PGs started peering and some failed waiting for 1 of the down OSDs. I don't understand why they didn't just remain active with 9 out of 10 disks. Until this moment of some OSDs coming up, all PGs were active. With min_size=9 I would expect all PGs to remain active with no changes to 9 out of the 10 hosts. - redeploy docker containers - all disks/OSDs come up, including the 4 OSDs from above - inactive PGs complete peering and become active - now I have a los of degraded Objects and undersized PGs even though not a single OSD was removed I don't understand why I have degraded objects. I should just have misplaced objects: HEALTH_ERR 22995992/145698909 objects misplaced (15.783%) Degraded data redundancy: 5213734/145698909 objects degraded (3.578%), 208 pgs degraded, 208 pgs undersized Degraded data redundancy (low space): 169 pgs backfill_toofull Note: The backfill_toofull with low utilization (usage: 38 TiB used, 1.5 PiB / 1.5 PiB avail) is a known issue in ceph (https://tracker.ceph.com/issues/39555) Also, I should be able to do whatever with 1 out of 10 hosts without loosing data access. What could be the problem here? Questions summary: Why does peering not succeed to keep all PGs active with 9 out of 10 OSDs up and in? Why do undersized PGs arise even though all OSDs are up? Why do degraded objects arise even though no OSD was removed? Thanks! ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

4 years, 7 months

2
2
0 0

OSD down when deleting CephFS files/leveldb compaction

by Robert LeBlanc

I have one or two more stability issues I'm trying to solve in a cluster that I inherited that I just can't seem to figure out. One issue may be the cause for the other. This is a Jewel 10.2.11 cluster with ~760 - 10TB HDDs and 5GB journals on SSD. When a large number of files are deleted from CephFS (and possibly when leveldb compacts), the OSD will stop responding to heartbeats and get marked down, then come back and start recovery and then other OSDs will have the same issue until client load on the cluster eases up then it settles down. Is there a way to have leveldb compact more frequently or cause it to come up for air more frequently and respond to heartbeats and process some IO? I thought splitting PGs would help, but we are still seeing the problem (previously ~20 PGs per OSD to now ~150). I still have some space on the SSDs that I can double, almost triple the journal, but not sure if that will help in this situation. The other issue I'm seeing is that some IO just gets stuck when the OSDs are getting marked down and coming back through the cluster. Thanks, Robert LeBlanc ---------------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

4 years, 7 months

2
2
0 0

Re: one read/write, many read only

by iwesley＠mail.de

There is a lock for object exists. If the file was not writing close, other ones can read only. Regards > Am Oct 2, 2019 - 12:09 AM schrieb khaled.atteya(a)gmail.com: > > > Hi, > > Is it possible to do this scenario : > If one open a file first , he will get read/write permissions and other will get read-only permission if they open the file after the first one. > > Thanks >

4 years, 7 months

1
0
0 0

Re: RAM recommendation with large OSDs?

by Paul Emmerich

The problem with lots of OSDs per node is that this usually means you have too few nodes. It's perfectly fine to run 60 OSDs per node if you got a total of 1000 OSDs or so. But I've seen too many setups with 3-5 nodes where each node runs 60 OSDs which makes no sense (and usually isn't even cheaper than more nodes, especially once you consider the lost opportunity for running erasure coding). The usual backup cluster we are seeing is in the single-digit petabyte range with about 12 to 24 disks per server running ~8+3 erasure coding. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 On Wed, Oct 2, 2019 at 12:53 AM Darrell Enns <darrelle(a)knowledge.ca> wrote: > > Thanks Paul. I was speaking more about total OSDs and RAM, rather than a single node. However, I am considering building a cluster with a large OSD/node count. This would be for archival use, with reduced performance and availability requirements. What issues would you anticipate with a large OSD/node count? Is the concern just the large rebalance if a node fails and takes out a large portion of the OSDs at once? > > -----Original Message----- > From: Paul Emmerich <paul.emmerich(a)croit.io> > Sent: Tuesday, October 01, 2019 3:00 PM > To: Darrell Enns <darrelle(a)knowledge.ca> > Cc: ceph-users(a)ceph.io > Subject: Re: [ceph-users] RAM recommendation with large OSDs? > > On Tue, Oct 1, 2019 at 6:12 PM Darrell Enns <darrelle(a)knowledge.ca> wrote: > > > > The standard advice is “1GB RAM per 1TB of OSD”. Does this actually still hold with large OSDs on bluestore? > > No > > > Can it be reasonably reduced with tuning? > > Yes > > > > From the docs, it looks like bluestore should target the “osd_memory_target” value by default. This is a fixed value (4GB by default), which does not depend on OSD size. So shouldn’t the advice really by “4GB per OSD”, rather than “1GB per TB”? Would it also be reasonable to reduce osd_memory_target for further RAM savings? > > Yes > > > For example, suppose we have 90 12TB OSD drives: > > Please don't put 90 drives in one node, that's not a good idea in 99.9% of the use cases. > > > > > “1GB per TB” rule: 1080GB RAM > > “4GB per OSD” rule: 360GB RAM > > “2GB per OSD” (osd_memory_target reduced to 2GB): 180GB RAM > > > > > > > > Those are some massively different RAM values. Perhaps the old advice was for filestore? Or there is something to consider beyond the bluestore memory target? What about when using very dense nodes (for example, 60 12TB OSDs on a single node)? > > Keep in mind that it's only a target value, it will use more during recovery if you set a low value. > We usually set a target of 3 GB per OSD and recommend 4 GB of RAM per OSD. > > RAM saving trick: use fewer PGs than recommended. > > > Paul > > > > -- > Paul Emmerich > > Looking for help with your Ceph cluster? Contact us at https://croit.io > > croit GmbH > Freseniusstr. 31h > 81247 München > www.croit.io > Tel: +49 89 1896585 90 > > > _______________________________________________ > > ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an > > email to ceph-users-leave(a)ceph.io

4 years, 7 months

1
0
0 0

Re: RAM recommendation with large OSDs?

by Paul Emmerich

On Tue, Oct 1, 2019 at 6:12 PM Darrell Enns <darrelle(a)knowledge.ca> wrote: > > The standard advice is “1GB RAM per 1TB of OSD”. Does this actually still hold with large OSDs on bluestore? No > Can it be reasonably reduced with tuning? Yes > From the docs, it looks like bluestore should target the “osd_memory_target” value by default. This is a fixed value (4GB by default), which does not depend on OSD size. So shouldn’t the advice really by “4GB per OSD”, rather than “1GB per TB”? Would it also be reasonable to reduce osd_memory_target for further RAM savings? Yes > For example, suppose we have 90 12TB OSD drives: Please don't put 90 drives in one node, that's not a good idea in 99.9% of the use cases. > > “1GB per TB” rule: 1080GB RAM > “4GB per OSD” rule: 360GB RAM > “2GB per OSD” (osd_memory_target reduced to 2GB): 180GB RAM > > > > Those are some massively different RAM values. Perhaps the old advice was for filestore? Or there is something to consider beyond the bluestore memory target? What about when using very dense nodes (for example, 60 12TB OSDs on a single node)? Keep in mind that it's only a target value, it will use more during recovery if you set a low value. We usually set a target of 3 GB per OSD and recommend 4 GB of RAM per OSD. RAM saving trick: use fewer PGs than recommended. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

4 years, 7 months

2
1
0 0

one read/write, many read only

by khaled atteya

Hi, Is it possible to do this scenario : If one open a file first , he will get read/write permissions and other will get read-only permission if they open the file after the first one. Thanks

4 years, 7 months

2
1
0 0

Doubt about ceph-iscsi and Vmware

by Gesiel Galvão Bernardes

Hi, I'm testing Ceph with Vmware, using Ceph-iscsi gateway. I reading documentation* and have doubts some points: - If I understanded, in general terms, for each VMFS datastore in VMware will match the an RBD image. (consequently in an RBD image I will possible have many VMWare disks). Its correct? - In documentation is this: "gwcli requires a pool with the name rbd, so it can store metadata like the iSCSI configuration". In part 4 of "Configuration", have: "Add a RBD image with the name disk_1 in the pool rbd". In this part, the use of "rbd" pool is a example and I could use any pool for storage of image, or the pool should be "rbd"? Resuming: gwcli require "rbd" pool for metadata and I could use any pool for image, or i will use just "rbd pool" for storage image and metadata? - How much memory ceph-iscsi use? Which is a good number of RAM? Regards Gesiel * https://docs.ceph.com/docs/master/rbd/iscsi-target-cli/

4 years, 7 months

4
5
0 0

Nautilus pg autoscale, data lost?

by Raymond Berg Hansen

Hi. I am new to ceph but have set it up on my homelab and started using it. It seemed very good intil I desided to try pg autoscale. After enabling autoscale to 3 of my pools, autoscale tried(?) to reduce the number of PGs and the pools are now unaccessible. I have tried to turn it off again, but no luck! Please help. ceph status: https://pastebin.com/88qNivJi (do not know why it lists 4 pools, I have 3. Maybe one of the pools I created after and deleted are in limbo?) ceph osd pool ls detail: https://pastebin.com/HZLz6yHL ceph health detail: https://pastebin.com/Kqd2YMtm

4 years, 7 months

5
9
1 0

ceph-osd@n crash dumps

by Del Monaco, Andrea

Hi list, After the nodes ran OOM and after reboot, we are not able to restart the ceph-osd@x services anymore. (Details about the setup at the end). I am trying to do this manually, so we can see the error but all i see is several crash dumps - this is just one of the OSDs which is not starting. Any idea how to get past this?? [root@ceph001 ~]# /usr/bin/ceph-osd --debug_osd 10 -f --cluster ceph --id 83 --setuser ceph --setgroup ceph > /tmp/dump 2>&1 starting osd.83 at - osd_data /var/lib/ceph/osd/ceph-83 /var/lib/ceph/osd/ceph-83/journal /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 2aaaaaaf5540 time 2019-10-01 14:19:49.494368 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % stripe_size == 0) ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x2aaaaaf3d36b] 2: (()+0x26e4f7) [0x2aaaaaf3d4f7] 3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d] 4: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a] 5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100] 6: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb] 7: (OSD::load_pgs()+0x4a9) [0x555555917e39] 8: (OSD::init()+0xc99) [0x5555559238e9] 9: (main()+0x23a3) [0x5555558017a3] 10: (__libc_start_main()+0xf5) [0x2aaab77de495] 11: (()+0x385900) [0x5555558d9900] 2019-10-01 14:19:49.500 2aaaaaaf5540 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 2aaaaaaf5540 time 2019-10-01 14:19:49.494368 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % stripe_size == 0) ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x2aaaaaf3d36b] 2: (()+0x26e4f7) [0x2aaaaaf3d4f7] 3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d] 4: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a] 5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100] 6: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb] 7: (OSD::load_pgs()+0x4a9) [0x555555917e39] 8: (OSD::init()+0xc99) [0x5555559238e9] 9: (main()+0x23a3) [0x5555558017a3] 10: (__libc_start_main()+0xf5) [0x2aaab77de495] 11: (()+0x385900) [0x5555558d9900] *** Caught signal (Aborted) ** in thread 2aaaaaaf5540 thread_name:ceph-osd ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable) 1: (()+0xf5d0) [0x2aaab69765d0] 2: (gsignal()+0x37) [0x2aaab77f22c7] 3: (abort()+0x148) [0x2aaab77f39b8] 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x248) [0x2aaaaaf3d468] 5: (()+0x26e4f7) [0x2aaaaaf3d4f7] 6: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d] 7: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a] 8: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100] 9: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb] 10: (OSD::load_pgs()+0x4a9) [0x555555917e39] 11: (OSD::init()+0xc99) [0x5555559238e9] 12: (main()+0x23a3) [0x5555558017a3] 13: (__libc_start_main()+0xf5) [0x2aaab77de495] 14: (()+0x385900) [0x5555558d9900] 2019-10-01 14:19:49.509 2aaaaaaf5540 -1 *** Caught signal (Aborted) ** in thread 2aaaaaaf5540 thread_name:ceph-osd ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable) 1: (()+0xf5d0) [0x2aaab69765d0] 2: (gsignal()+0x37) [0x2aaab77f22c7] 3: (abort()+0x148) [0x2aaab77f39b8] 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x248) [0x2aaaaaf3d468] 5: (()+0x26e4f7) [0x2aaaaaf3d4f7] 6: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d] 7: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a] 8: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100] 9: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb] 10: (OSD::load_pgs()+0x4a9) [0x555555917e39] 11: (OSD::init()+0xc99) [0x5555559238e9] 12: (main()+0x23a3) [0x5555558017a3] 13: (__libc_start_main()+0xf5) [0x2aaab77de495] 14: (()+0x385900) [0x5555558d9900] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. -693> 2019-10-01 14:19:49.500 2aaaaaaf5540 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/ x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 2aaaaaaf5540 time 2019-10-01 14:19:49.494368 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % stripe_size == 0) ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x2aaaaaf3d36b] 2: (()+0x26e4f7) [0x2aaaaaf3d4f7] 3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d] 4: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a] 5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100] 6: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb] 7: (OSD::load_pgs()+0x4a9) [0x555555917e39] 8: (OSD::init()+0xc99) [0x5555559238e9] 9: (main()+0x23a3) [0x5555558017a3] 10: (__libc_start_main()+0xf5) [0x2aaab77de495] 11: (()+0x385900) [0x5555558d9900] -693> 2019-10-01 14:19:49.509 2aaaaaaf5540 -1 *** Caught signal (Aborted) ** in thread 2aaaaaaf5540 thread_name:ceph-osd ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable) 1: (()+0xf5d0) [0x2aaab69765d0] 2: (gsignal()+0x37) [0x2aaab77f22c7] 3: (abort()+0x148) [0x2aaab77f39b8] 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x248) [0x2aaaaaf3d468] 5: (()+0x26e4f7) [0x2aaaaaf3d4f7] 6: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d] 7: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a] 8: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100] 9: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb] 10: (OSD::load_pgs()+0x4a9) [0x555555917e39] 11: (OSD::init()+0xc99) [0x5555559238e9] 12: (main()+0x23a3) [0x5555558017a3] 13: (__libc_start_main()+0xf5) [0x2aaab77de495] 14: (()+0x385900) [0x5555558d9900] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. -693> 2019-10-01 14:19:49.500 2aaaaaaf5540 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 2aaaaaaf5540 time 2019-10-01 14:19:49.494368 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % stripe_size == 0) ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable) 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x2aaaaaf3d36b] 2: (()+0x26e4f7) [0x2aaaaaf3d4f7] 3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d] 4: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a] 5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100] 6: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb] 7: (OSD::load_pgs()+0x4a9) [0x555555917e39] 8: (OSD::init()+0xc99) [0x5555559238e9] 9: (main()+0x23a3) [0x5555558017a3] 10: (__libc_start_main()+0xf5) [0x2aaab77de495] 11: (()+0x385900) [0x5555558d9900] -693> 2019-10-01 14:19:49.509 2aaaaaaf5540 -1 *** Caught signal (Aborted) ** in thread 2aaaaaaf5540 thread_name:ceph-osd ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable) 1: (()+0xf5d0) [0x2aaab69765d0] 2: (gsignal()+0x37) [0x2aaab77f22c7] 3: (abort()+0x148) [0x2aaab77f39b8] 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x248) [0x2aaaaaf3d468] 5: (()+0x26e4f7) [0x2aaaaaf3d4f7] 6: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d] 7: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a] 8: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100] 9: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb] 10: (OSD::load_pgs()+0x4a9) [0x555555917e39] 11: (OSD::init()+0xc99) [0x5555559238e9] 12: (main()+0x23a3) [0x5555558017a3] 13: (__libc_start_main()+0xf5) [0x2aaab77de495] 14: (()+0x385900) [0x5555558d9900] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. Environment: [root@ceph001 ~]# uname -r 3.10.0-957.27.2.el7.x86_64 [root@ceph001 ~]# cat /etc/redhat-release CentOS Linux release 7.6.1810 (Core) [root@ceph001 ~]# rpm -qa | grep -i ceph cm-config-ceph-release-mimic-8.2-73_cm8.2.noarch ceph-13.2.6-0.el7.x86_64 ceph-selinux-13.2.6-0.el7.x86_64 ceph-base-13.2.6-0.el7.x86_64 ceph-osd-13.2.6-0.el7.x86_64 cm-config-ceph-radosgw-systemd-8.2-6_cm8.2.noarch libcephfs2-13.2.6-0.el7.x86_64 ceph-common-13.2.6-0.el7.x86_64 ceph-mgr-13.2.6-0.el7.x86_64 cm-config-ceph-systemd-8.2-12_cm8.2.noarch ceph-mon-13.2.6-0.el7.x86_64 python-cephfs-13.2.6-0.el7.x86_64 ceph-mds-13.2.6-0.el7.x86_64 ceph osd tree: ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 785.95801 root default -5 261.98599 host ceph001 1 hdd 7.27699 osd.1 up 1.00000 1.00000 3 hdd 7.27699 osd.3 down 1.00000 1.00000 6 hdd 7.27699 osd.6 down 1.00000 1.00000 9 hdd 7.27699 osd.9 down 0 1.00000 12 hdd 7.27699 osd.12 down 1.00000 1.00000 15 hdd 7.27699 osd.15 up 1.00000 1.00000 18 hdd 7.27699 osd.18 down 1.00000 1.00000 21 hdd 7.27699 osd.21 down 1.00000 1.00000 24 hdd 7.27699 osd.24 up 1.00000 1.00000 27 hdd 7.27699 osd.27 down 1.00000 1.00000 30 hdd 7.27699 osd.30 down 1.00000 1.00000 35 hdd 7.27699 osd.35 down 1.00000 1.00000 37 hdd 7.27699 osd.37 down 1.00000 1.00000 40 hdd 7.27699 osd.40 down 1.00000 1.00000 44 hdd 7.27699 osd.44 down 1.00000 1.00000 47 hdd 7.27699 osd.47 up 1.00000 1.00000 50 hdd 7.27699 osd.50 up 1.00000 1.00000 53 hdd 7.27699 osd.53 down 1.00000 1.00000 56 hdd 7.27699 osd.56 down 1.00000 1.00000 59 hdd 7.27699 osd.59 up 1.00000 1.00000 62 hdd 7.27699 osd.62 down 0 1.00000 65 hdd 7.27699 osd.65 down 1.00000 1.00000 68 hdd 7.27699 osd.68 down 1.00000 1.00000 71 hdd 7.27699 osd.71 down 1.00000 1.00000 74 hdd 7.27699 osd.74 down 1.00000 1.00000 77 hdd 7.27699 osd.77 up 1.00000 1.00000 80 hdd 7.27699 osd.80 down 1.00000 1.00000 83 hdd 7.27699 osd.83 up 1.00000 1.00000 86 hdd 7.27699 osd.86 down 1.00000 1.00000 88 hdd 7.27699 osd.88 down 1.00000 1.00000 91 hdd 7.27699 osd.91 down 1.00000 1.00000 94 hdd 7.27699 osd.94 down 1.00000 1.00000 97 hdd 7.27699 osd.97 down 1.00000 1.00000 100 hdd 7.27699 osd.100 down 0 1.00000 103 hdd 7.27699 osd.103 down 1.00000 1.00000 106 hdd 7.27699 osd.106 up 1.00000 1.00000 -3 261.98599 host ceph002 0 hdd 7.27699 osd.0 down 0 1.00000 4 hdd 7.27699 osd.4 up 1.00000 1.00000 7 hdd 7.27699 osd.7 up 1.00000 1.00000 11 hdd 7.27699 osd.11 down 1.00000 1.00000 13 hdd 7.27699 osd.13 up 1.00000 1.00000 16 hdd 7.27699 osd.16 down 1.00000 1.00000 19 hdd 7.27699 osd.19 down 0 1.00000 23 hdd 7.27699 osd.23 up 1.00000 1.00000 26 hdd 7.27699 osd.26 down 0 1.00000 29 hdd 7.27699 osd.29 down 0 1.00000 32 hdd 7.27699 osd.32 down 0 1.00000 33 hdd 7.27699 osd.33 down 0 1.00000 36 hdd 7.27699 osd.36 down 0 1.00000 39 hdd 7.27699 osd.39 down 1.00000 1.00000 43 hdd 7.27699 osd.43 up 1.00000 1.00000 46 hdd 7.27699 osd.46 up 1.00000 1.00000 49 hdd 7.27699 osd.49 down 1.00000 1.00000 52 hdd 7.27699 osd.52 down 1.00000 1.00000 55 hdd 7.27699 osd.55 down 0 1.00000 58 hdd 7.27699 osd.58 up 1.00000 1.00000 61 hdd 7.27699 osd.61 down 1.00000 1.00000 64 hdd 7.27699 osd.64 down 1.00000 1.00000 67 hdd 7.27699 osd.67 up 1.00000 1.00000 70 hdd 7.27699 osd.70 down 1.00000 1.00000 73 hdd 7.27699 osd.73 down 1.00000 1.00000 76 hdd 7.27699 osd.76 up 1.00000 1.00000 78 hdd 7.27699 osd.78 down 1.00000 1.00000 81 hdd 7.27699 osd.81 down 1.00000 1.00000 84 hdd 7.27699 osd.84 down 0 1.00000 87 hdd 7.27699 osd.87 down 1.00000 1.00000 90 hdd 7.27699 osd.90 down 0 1.00000 93 hdd 7.27699 osd.93 down 1.00000 1.00000 96 hdd 7.27699 osd.96 down 0 1.00000 99 hdd 7.27699 osd.99 down 0 1.00000 102 hdd 7.27699 osd.102 down 0 1.00000 105 hdd 7.27699 osd.105 up 1.00000 1.00000 -7 261.98599 host ceph003 2 hdd 7.27699 osd.2 up 1.00000 1.00000 5 hdd 7.27699 osd.5 down 1.00000 1.00000 8 hdd 7.27699 osd.8 up 1.00000 1.00000 10 hdd 7.27699 osd.10 down 0 1.00000 14 hdd 7.27699 osd.14 down 0 1.00000 17 hdd 7.27699 osd.17 up 1.00000 1.00000 20 hdd 7.27699 osd.20 down 0 1.00000 22 hdd 7.27699 osd.22 down 0 1.00000 25 hdd 7.27699 osd.25 up 1.00000 1.00000 28 hdd 7.27699 osd.28 up 1.00000 1.00000 31 hdd 7.27699 osd.31 down 0 1.00000 34 hdd 7.27699 osd.34 down 0 1.00000 38 hdd 7.27699 osd.38 down 0 1.00000 41 hdd 7.27699 osd.41 down 1.00000 1.00000 42 hdd 7.27699 osd.42 down 0 1.00000 45 hdd 7.27699 osd.45 up 1.00000 1.00000 48 hdd 7.27699 osd.48 up 1.00000 1.00000 51 hdd 7.27699 osd.51 down 1.00000 1.00000 54 hdd 7.27699 osd.54 up 1.00000 1.00000 57 hdd 7.27699 osd.57 down 1.00000 1.00000 60 hdd 7.27699 osd.60 down 1.00000 1.00000 63 hdd 7.27699 osd.63 up 1.00000 1.00000 66 hdd 7.27699 osd.66 down 1.00000 1.00000 69 hdd 7.27699 osd.69 up 1.00000 1.00000 72 hdd 7.27699 osd.72 up 1.00000 1.00000 75 hdd 7.27699 osd.75 down 1.00000 1.00000 79 hdd 7.27699 osd.79 up 1.00000 1.00000 82 hdd 7.27699 osd.82 down 1.00000 1.00000 85 hdd 7.27699 osd.85 down 1.00000 1.00000 89 hdd 7.27699 osd.89 down 0 1.00000 92 hdd 7.27699 osd.92 down 1.00000 1.00000 95 hdd 7.27699 osd.95 down 0 1.00000 98 hdd 7.27699 osd.98 down 0 1.00000 101 hdd 7.27699 osd.101 down 1.00000 1.00000 104 hdd 7.27699 osd.104 down 0 1.00000 107 hdd 7.27699 osd.107 up 1.00000 1.00000 Ceph status; [root@ceph001 ~]# ceph status cluster: id: 54052e72-6835-410e-88a9-af4ac17a8113 health: HEALTH_WARN 1 filesystem is degraded 1 MDSs report slow metadata IOs 48 osds down Reduced data availability: 2053 pgs inactive, 2043 pgs down, 7 pgs peering, 3 pgs incomplete, 126 pgs stale Degraded data redundancy: 18473/27200783 objects degraded (0.068%), 106 pgs degraded, 103 pgs undersized too many PGs per OSD (258 > max 250) services: mon: 3 daemons, quorum filler001,filler002,bezavrdat-master01 mgr: bezavrdat-master01(active), standbys: filler002, filler001 mds: cephfs-1/1/1 up {0=filler002=up:replay}, 1 up:standby osd: 108 osds: 32 up, 80 in; 16 remapped pgs data: pools: 2 pools, 2176 pgs objects: 2.73 M objects, 1.7 TiB usage: 2.3 TiB used, 580 TiB / 582 TiB avail pgs: 94.347% pgs not active 18473/27200783 objects degraded (0.068%) 1951 down 79 active+undersized+degraded 76 stale+down 23 stale+active+undersized+degraded 14 down+remapped 14 stale+active+clean 6 stale+peering 3 active+clean 3 stale+active+recovery_wait+degraded 2 incomplete 2 stale+down+remapped 1 stale+incomplete 1 stale+remapped+peering 1 active+recovering+undersized+degraded+remapped Thank you in advance! Regards, [Atos logo] Andrea Del Monaco HPC Consultant – Big Data & Security M: +31 612031174 Burgemeester Rijnderslaan 30 – 1185 MC Amstelveen – The Netherlands atos.net<https://atos.net/> [LinkedIn icon]<https://www.linkedin.com/company/1259/> [Twitter icon] <https://twitter.com/atos> [Facebook icon] <https://www.facebook.com/Atos/> [Youtube icon] <https://www.youtube.com/user/Atos> This e-mail and the documents attached are confidential and intended solely for the addressee; it may also be privileged. If you receive this e-mail in error, please notify the sender immediately and destroy it. As its integrity cannot be secured on the Internet, Atos’ liability cannot be triggered for the message content. Although the sender endeavours to maintain a computer virus-free network, the sender does not warrant that this transmission is virus-free and will not be liable for any damages resulting from any virus transmitted. On all offers and agreements under which Atos Nederland B.V. supplies goods and/or services of whatever nature, the Terms of Delivery from Atos Nederland B.V. exclusively apply. The Terms of Delivery shall be promptly submitted to you on your request.

4 years, 7 months

1
1
0 0

moving EC pool from HDD to SSD without downtime

by Frank Schilder

I need to move a 6+2 EC pool from HDDs to SSDs while storage must remain accessible. All SSDs and HDDs are within the same failure domains. The crush rule in question is rule sr-rbd-data-one { id 5 type erasure min_size 3 max_size 8 step set_chooseleaf_tries 50 step set_choose_tries 1000 step take ServerRoom class hdd step chooseleaf indep 0 type host step emit } and I would be inclined just to change the entry "step take ServerRoom class hdd" to "step take ServerRoom class ssd" and wait for the dust to settle. However, this will almost certainly lead to all PGs being undersized and inaccessible as all objects are in the wrong place. I noticed that this is not an issue with PGs created by replicated rules as they can contain more OSDs than the replication factor while objects are moved. The same does not apply to EC rules. I suspect this is due to the setting "max_size 8", which does not allow for more than 6+2=8 OSDs being a member of a PG. What is the correct way to do what I need to do? Can I just set "max_size 16" and go? Will this work with EC rules? If not, what are my options? Thanks! ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

4 years, 7 months

2
2
0 0

2024

2023

2022

2021

2020

2019

ceph-users October 2019