January 2023 - ceph-users

Image corrupt after restoring snapshot via Proxmox

by Roel van Meer

Hi, We have had a situation three times where rbd images seem to be corrupt after restoring a snapshot, and I'm looking for advice on how to investigate this. We're running Proxmox 7 with Ceph Octopus (Proxmox build, 15.2.17-pve1). Every time the problem has happened, it has happened after these actions were done with the VM: (Yesterday) - VM stopped - Snapshot created - VM started - VM stopped - Snapshot restored - VM started (OK) - Nightly backup with vzdump to Proxmox Backup Server (Today) - VM stopped - Snapshot restored - VM does not start On previous occasions we tried to find a solution and when we couldn't, we restored the VM from backup, which solved the problem. Now this happened to a test system, so we've left the situation as is and maybe get to the root cause. Some observations: * We're using krbd * The PBS backups don't allow file restore if the backup was made from a "broken" image * After mapping the current image, it doesn't seem to contain a partition table There's a thread on the Proxmox forum about this issues as well[1]. If anyone could give some advice about how to proceed from here, I'd be very grateful. Best regards, Roel PS: An upgrade to Pacific has already been planned. [1] https://forum.proxmox.com/threads/vm-disks-corrupt-after-reverting-to-snaps… -- Wij zijn ISO 27001 gecertificeerd 1A First Alternative BV T: +31 (0)88 0016405 W: https://www.1afa.com

1 year, 3 months

2
1
0 0

Problems with autoscaler (overlapping roots) after changing the pool class

by Massimo Sgaravatto

Dear all I have just changed the crush rule for all the replicated pools in the following way: ceph osd crush rule create-replicated replicated_hdd default host hdd ceph osd pool set <poolname> crush_rule replicated_hdd See also this [*] thread Before applying this change, these pools were all using the replicated_ruleset rule where the class is not specified. I am noticing now a problem with the autoscaler: "ceph osd pool autoscale-status" doesn't report any output and the mgr log complains about overlapping roots: [pg_autoscaler ERROR root] pool xyz has overlapping roots: {-18, -1} Indeed: # ceph osd crush tree --show-shadow ID CLASS WEIGHT TYPE NAME -18 hdd 1329.26501 root default~hdd -17 hdd 329.14154 rack Rack11-PianoAlto~hdd -15 hdd 54.56085 host ceph-osd-04~hdd 30 hdd 5.45609 osd.30 31 hdd 5.45609 osd.31 ... ... -1 1329.26501 root default -7 329.14154 rack Rack11-PianoAlto -8 54.56085 host ceph-osd-04 30 hdd 5.45609 osd.30 31 hdd 5.45609 osd.31 ... I have already read about this behavior but I have no clear ideas how to fix the problem. I read somewhere that the problem happens when there are rules that force some pools to only use one class and there are also pools which does not make any distinction between device classes All the replicated pools are using the replicated_hdd pool but I also have some EC pools which are using a profile where the class is not specified. As far I understand, I can't force these pools to use only the hdd class: according to the doc I can't change this profile specifying the hdd class (or at least the change wouldn't be applied to the existing EC pools) Any suggestions ? The crush map is available at https://cernbox.cern.ch/s/gIyjbQbmoTFHCrr, if you want to have a look Many thanks, Massimo [*] https://www.mail-archive.com/ceph-users@ceph.io/msg18534.html

1 year, 3 months

2
2
0 0

ceph cluster iops low

by petersun＠raksmart.com

I have my ceph IOPS very low with over 48 SSD backed on NVMs for DB/WAL on four physical servers. The whole cluster has only about 20K IO total. Looks the IOs are suppressed over bottleneck somewhere. Dstat shows a lots csw and interrupts over 150K, while I am using FIO bench 4K 128QD test. I check SSD throughput only about 40M at 250 ios each. Network are total 20G and not full of traffic. CPU are around 50% idle on 2*E5 2950v2 each node. Is it normal to get that high and how to reduce it? where else could be the bottleneck?

1 year, 3 months

3
3
0 0

Re: 16.2.11 pacific QE validation status

by Yuri Weinstein

Happy New Year all! This release remains to be in "progress"/"on hold" status as we are sorting all infrastructure-related issues. Unless I hear objections, I suggest doing a full rebase/retest QE cycle (adding PRs merged lately) since it's taking much longer than anticipated when sepia is back online. Objections? Thx YuriW On Thu, Dec 15, 2022 at 9:14 AM Yuri Weinstein <yweinste(a)redhat.com> wrote: > > Details of this release are summarized here: > > https://tracker.ceph.com/issues/58257#note-1 > Release Notes - TBD > > Seeking approvals for: > > rados - Neha (https://github.com/ceph/ceph/pull/49431 is still being > tested and will be merged soon) > rook - Sébastien Han > cephadm - Adam > dashboard - Ernesto > rgw - Casey (rwg will be rerun on the latest SHA1) > rbd - Ilya, Deepika > krbd - Ilya, Deepika > fs - Venky, Patrick > upgrade/nautilus-x (pacific) - Neha, Laura > upgrade/octopus-x (pacific) - Neha, Laura > upgrade/pacific-p2p - Neha - Neha, Laura > powercycle - Brad > ceph-volume - Guillaume, Adam K > > Thx > YuriW

1 year, 3 months

12
21
0 0

Integrating openstack/swift to ceph cluster

by Michel Niyoyita

Hello team,, I have deployed ceph pacific cluster using ceph-ansible running on ubuntu 20.04 which have 3 OSD hosts and 3 mons on each OSD host we have 20 osd . I am integrating swift in the cluster but I fail to find the policy and upload objects in the container . I have deployed rgwloadbalancer on top below is my ceph.conf configuration and rgw logs. Kindly help if I am missing configs in my ceph.conf and advise. [client] rbd_default_features = 1 [client.rgw.ceph-osd1] rgw_dns_name = ceph-osd1 [client.rgw.ceph-osd1.rgw0] host = ceph-osd1 keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-osd1.rgw0/keyring log file = /var/log/ceph/ceph-rgw-ceph-osd1.rgw0.log rgw frontends = beast endpoint=10.10.13.13:8080 rgw thread pool size = 512 rgw_dns_name = ceph-osd1 rgw_frontends = "beast port=8080" rgw_enable_usage_log = true rgw_thread_pool_size = 512 rgw_keystone_api_version = 3 rgw_keystone_url = http://10.10.13.31:5000 rgw_keystone_admin_user = admin rgw_keystone_admin_password = WyB9v1GPAqtxsySrCjZpa1L2U2JmYVF6zviP6bKk rgw_keystone_admin_domain = default rgw_keystone_admin_project = admin rgw_keystone_accepted_roles = admin,Member,_member_,member rgw_keystone_verify_ssl = false rgw_s3_auth_use_keystone = true [client.rgw.ceph-osd2] rgw_dns_name = ceph-osd2 [client.rgw.ceph-osd2.rgw0] host = ceph-osd2 keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-osd2.rgw0/keyring log file = /var/log/ceph/ceph-rgw-ceph-osd2.rgw0.log rgw frontends = beast endpoint=10.10.13.14:8080 rgw thread pool size = 512 rgw_dns_name = ceph-osd2 rgw_frontends = "beast port=8080" rgw_enable_usage_log = true rgw_thread_pool_size = 512 rgw_keystone_api_version = 3 rgw_keystone_url = http://10.10.13.31:5000 rgw_keystone_admin_user = admin rgw_keystone_admin_password = WyB9v1GPAqtxsySrCjZpa1L2U2JmYVF6zviP6bKk rgw_keystone_admin_domain = default rgw_keystone_admin_project = admin rgw_keystone_accepted_roles = admin,Member,_member_,member rgw_keystone_verify_ssl = false rgw_s3_auth_use_keystone = true [client.rgw.ceph-osd3] rgw_dns_name = ceph-osd3 [client.rgw.ceph-osd3.rgw0] host = ceph-osd3 keyring = /var/lib/ceph/radosgw/ceph-rgw.ceph-osd3.rgw0/keyring log file = /var/log/ceph/ceph-rgw-ceph-osd3.rgw0.log rgw frontends = beast endpoint=10.10.13.15:8080 rgw thread pool size = 512 rgw_dns_name = ceph-osd3 rgw_frontends = "beast port=8080" rgw_enable_usage_log = true rgw_thread_pool_size = 512 rgw_keystone_api_version = 3 rgw_keystone_url = http://10.10.13.31:5000 rgw_keystone_admin_user = admin rgw_keystone_admin_password = WyB9v1GPAqtxsySrCjZpa1L2U2JmYVF6zviP6bKk rgw_keystone_admin_domain = default rgw_keystone_admin_project = admin rgw_keystone_accepted_roles = admin,Member,_member_,member rgw_keystone_verify_ssl = false rgw_s3_auth_use_keystone = true # Please do not change this file directly since it is managed by Ansible and will be overwritten [global] auth_client_required = cephx auth_cluster_required = cephx auth_service_required = cephx cluster network = 10.10.13.0/24 fsid = c8ffac77-6e6e-479a-92be-08f9b46453e7 mon host = [v2:10.10.13.10:3300,v1:10.10.13.10:6789],[v2:10.10.13.11:3300 ,v1:10.10.13.11:6789],[v2:10.10.13.12:3300,v1:10.10.13.12:6789] mon initial members = ceph-mon1,ceph-mon2,ceph-mon3 mon_allow_pool_delete = True mon_max_pg_per_osd = 400 osd pool default crush rule = -1 osd_pool_default_min_size = 2 osd_pool_default_size = 3 public network = 0.0.0.0/0 2023-01-24T15:06:33.035+0000 7f0b4fe7f700 1 beast: 0x7f0d1055c6e0: 10.10.13.15 - anonymous [24/Jan/2023:15:06:33.035 +0000] "HEAD / HTTP/1.0" 200 5 - - - latency=0.000000000s 2023-01-24T15:06:33.855+0000 7f0b50e81700 1 ====== starting new request req=0x7f0d1055c6e0 ===== 2023-01-24T15:06:33.855+0000 7f0b50e81700 0 ERROR: client_io->complete_request() returned Connection reset by peer 2023-01-24T15:06:33.855+0000 7f0b50e81700 1 ====== req done req=0x7f0d1055c6e0 op status=0 http_status=200 latency=0.000000000s ====== 2023-01-24T15:06:33.855+0000 7f0b50e81700 1 beast: 0x7f0d1055c6e0: 10.10.13.13 - anonymous [24/Jan/2023:15:06:33.855 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000000000s 2023-01-24T15:06:33.907+0000 7f0b41e63700 1 ====== starting new request req=0x7f0d1055c6e0 ===== 2023-01-24T15:06:33.907+0000 7f0b42664700 0 ERROR: client_io->complete_request() returned Connection reset by peer 2023-01-24T15:06:33.907+0000 7f0b42664700 1 ====== req done req=0x7f0d1055c6e0 op status=0 http_status=200 latency=0.000000000s ====== 2023-01-24T15:06:33.907+0000 7f0b42664700 1 beast: 0x7f0d1055c6e0: 10.10.13.14 - anonymous [24/Jan/2023:15:06:33.907 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000000000s 2023-01-24T15:06:35.039+0000 7f0b37e4f700 1 ====== starting new request req=0x7f0d1055c6e0 ===== 2023-01-24T15:06:35.039+0000 7f0b37e4f700 1 ====== req done req=0x7f0d1055c6e0 op status=0 http_status=200 latency=0.000000000s ====== 2023-01-24T15:06:35.039+0000 7f0b37e4f700 1 beast: 0x7f0d1055c6e0: 10.10.13.15 - anonymous [24/Jan/2023:15:06:35.039 +0000] "HEAD / HTTP/1.0" 200 5 - - - latency=0.000000000s 2023-01-24T15:06:35.855+0000 7f0b32644700 1 ====== starting new request req=0x7f0d1055c6e0 ===== 2023-01-24T15:06:35.855+0000 7f0b32644700 0 ERROR: client_io->complete_request() returned Connection reset by peer 2023-01-24T15:06:35.855+0000 7f0b32644700 1 ====== req done req=0x7f0d1055c6e0 op status=0 http_status=200 latency=0.000000000s ====== 2023-01-24T15:06:35.855+0000 7f0b32644700 1 beast: 0x7f0d1055c6e0: 10.10.13.13 - anonymous [24/Jan/2023:15:06:35.855 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000000000s 2023-01-24T15:06:35.911+0000 7f0b31642700 1 ====== starting new request req=0x7f0d1055c6e0 ===== 2023-01-24T15:06:35.911+0000 7f0b31642700 0 ERROR: client_io->complete_request() returned Connection reset by peer 2023-01-24T15:06:35.911+0000 7f0b31642700 1 ====== req done req=0x7f0d1055c6e0 op status=0 http_status=200 latency=0.000000000s ====== 2023-01-24T15:06:35.911+0000 7f0b31642700 1 beast: 0x7f0d1055c6e0: 10.10.13.14 - anonymous [24/Jan/2023:15:06:35.911 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000000000s 2023-01-24T15:06:37.039+0000 7f0b2fe3f700 1 ====== starting new request req=0x7f0d1055c6e0 ===== 2023-01-24T15:06:37.043+0000 7f0b2fe3f700 1 ====== req done req=0x7f0d1055c6e0 op status=0 http_status=200 latency=0.003999876s ====== 2023-01-24T15:06:37.043+0000 7f0b2fe3f700 1 beast: 0x7f0d1055c6e0: 10.10.13.15 - anonymous [24/Jan/2023:15:06:37.039 +0000] "HEAD / HTTP/1.0" 200 5 - - - latency=0.003999876s 2023-01-24T15:06:37.859+0000 7f0b28630700 1 ====== starting new request req=0x7f0d1055c6e0 ===== 2023-01-24T15:06:37.859+0000 7f0b28630700 0 ERROR: client_io->complete_request() returned Connection reset by peer 2023-01-24T15:06:37.859+0000 7f0b28630700 1 ====== req done req=0x7f0d1055c6e0 op status=0 http_status=200 latency=0.000000000s ====== 2023-01-24T15:06:37.859+0000 7f0b28630700 1 beast: 0x7f0d1055c6e0: 10.10.13.13 - anonymous [24/Jan/2023:15:06:37.859 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000000000s 2023-01-24T15:06:37.911+0000 7f0b1be17700 1 ====== starting new request req=0x7f0d1055c6e0 ===== 2023-01-24T15:06:37.911+0000 7f0b1be17700 1 ====== req done req=0x7f0d1055c6e0 op status=0 http_status=200 latency=0.000000000s ====== 2023-01-24T15:06:37.911+0000 7f0b1be17700 1 beast: 0x7f0d1055c6e0: 10.10.13.14 - anonymous [24/Jan/2023:15:06:37.911 +0000] "HEAD / HTTP/1.0" 200 5 - - - latency=0.000000000s 2023-01-24T15:06:39.043+0000 7f0b15e0b700 1 ====== starting new request req=0x7f0d1055c6e0 ===== 2023-01-24T15:06:39.043+0000 7f0b15e0b700 1 ====== req done req=0x7f0d1055c6e0 op status=0 http_status=200 latency=0.000000000s ====== 2023-01-24T15:06:39.043+0000 7f0b15e0b700 1 beast: 0x7f0d1055c6e0: 10.10.13.15 - anonymous [24/Jan/2023:15:06:39.043 +0000] "HEAD / HTTP/1.0" 200 5 - - - latency=0.000000000s 2023-01-24T15:06:39.859+0000 7f0b13e07700 1 ====== starting new request req=0x7f0d1055c6e0 ===== 2023-01-24T15:06:39.859+0000 7f0b13e07700 0 ERROR: client_io->complete_request() returned Connection reset by peer 2023-01-24T15:06:39.859+0000 7f0b13e07700 1 ====== req done req=0x7f0d1055c6e0 op status=0 http_status=200 latency=0.000000000s ====== 2023-01-24T15:06:39.859+0000 7f0b13e07700 1 beast: 0x7f0d1055c6e0: 10.10.13.13 - anonymous [24/Jan/2023:15:06:39.859 +0000] "HEAD / HTTP/1.0" 200 0 - - - latency=0.000000000s 2023-01-24T15:06:39.915+0000 7f0b12604700 1 ====== starting new request req=0x7f0d1055c6e0 ===== 2023-01-24T15:06:39.915+0000 7f0b12604700 1 ====== req done req=0x7f0d1055c6e0 op status=0 http_status=200 latency=0.000000000s ====== 2023-01-24T15:06:39.915+0000 7f0b12604700 1 beast: 0x7f0d1055c6e0: 10.10.13.14 - anonymous [24/Jan/2023:15:06:39.915 +0000] "HEAD / HTTP/1.0" 200 5 - - - latency=0.000000000s Best Regards Michel

1 year, 3 months

1
0
0 0

Mds crash at cscs

by Lo Re Giuseppe

Dear all, We have started to use more intensively cephfs for some wlcg related workload. We have 3 active mds instances spread on 3 servers, mds_cache_memory_limit=12G, most of the other configs are default ones. One of them has crashed this night leaving the log below. Do you have any hint on what could be the cause and how to avoid it? Regards, Giuseppe [root@naret-monitor03 ~]# journalctl -u ceph-63334166-d991-11eb-99de-40a6b72108d0(a)mds.cephfs.naret-monitor03.lqppte.service ... Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific > Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 1: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 2: abort() Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 3: /lib64/libstdc++.so.6(+0x987ba) [0x7fe2912567ba] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 4: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 5: /lib64/libstdc++.so.6(+0x95559) [0x7fe291253559] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 6: __gxx_personality_v0() Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 7: /lib64/libgcc_s.so.1(+0x10b03) [0x7fe290c34b03] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 8: _Unwind_Resume() Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 9: /usr/bin/ceph-mds(+0x18c104) [0x5638351e7104] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 10: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 11: gsignal() Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 12: abort() Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 13: /lib64/libstdc++.so.6(+0x9009b) [0x7fe29124e09b] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 14: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 15: /lib64/libstdc++.so.6(+0x96597) [0x7fe291254597] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 16: /lib64/libstdc++.so.6(+0x967f8) [0x7fe2912547f8] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 17: /lib64/libtcmalloc.so.4(+0x19fa4) [0x7fe29bae6fa4] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 18: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, vo> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 19: (std::shared_ptr<inode_t<mempool::mds_co::pool_allocator> > InodeSt> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 20: (CInode::_decode_base(ceph::buffer::v15_2_0::list::iterator_impl<tr> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 21: (CInode::decode_import(ceph::buffer::v15_2_0::list::iterator_impl<t> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 22: (Migrator::decode_import_inode(CDentry*, ceph::buffer::v15_2_0::lis> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 23: (Migrator::decode_import_dir(ceph::buffer::v15_2_0::list::iterator_> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 24: (Migrator::handle_export_dir(boost::intrusive_ptr<MExportDir const>> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 25: (Migrator::dispatch(boost::intrusive_ptr<Message const> const&)+0x1> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 26: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 27: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, boo> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 28: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 29: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x10> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 30: (DispatchQueue::entry()+0x126a) [0x7fe2930a5aba] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 31: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fe2931575d1] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 32: /lib64/libpthread.so.0(+0x81cf) [0x7fe291e451cf] Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: 33: clone() Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: NOTE: a copy of the executable, or `objdump -rdS <executable>` is neede> Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: --- begin dump of recent events --- Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]: terminate called recursively Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0(a)mds.cephfs.naret-monitor03.lqppte.service: Main process exited, code=exited, status=127/n/a Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0(a)mds.cephfs.naret-monitor03.lqppte.service: Failed with result 'exit-code'.

1 year, 3 months

2
1
0 0

rbd_mirroring_delete_delay not removing images with snaps

by Tyler Brekke

We use the rbd-mirror as a way to migrate volumes between clusters. The process is enable mirroring on the image to migrate, demote on the primary cluster, promote on the secondary cluster, and then disable mirroring on the image. When we started using `rbd_mirroring_delete_delay` so we could retain a backup of the source image, we noticed volumes with unprotected snaps do not get purged from the trash. Previously, the image and all its snaps would be successfully removed after disabling mirroring. I would expect a similar function when using `rbd_mirroring_delete_delay` as well. Is rbd trash just overly cautious here? -- Tyler Brekke Senior Engineer I tbrekke(a)digitalocean.com ------------------------------ We're Hiring! <https://do.co/careers> | @digitalocean <https://twitter.com/digitalocean> | YouTube <https://www.youtube.com/digitalocean>

1 year, 3 months

1
0
0 0

Pools and classes

by Massimo Sgaravatto

Dear all I have a ceph cluster where so far all OSDs have been rotational hdd disks (actually there are some SSDs, used only for block.db and wal.db) I now want to add some SSD disks to be used as OSD. My use case is: 1) for the existing pools keep using only hdd disks 2) create some new pools using only sdd disks Let's start with 1 (I didn't have added yet the ssd disks in the cluster) I have some replicated pools and some ec pools. The replicated pools are using a replicated_ruleset rule [*]. I created a new "replicated_hdd" rule [**] using the command: ceph osd crush rule create-replicated replicated_hdd default host hdd I then changed the crush rule of a existing pool (that was using 'replicated_ruleset') using the command: ceph osd pool set <poolname> crush_rule replicated_hdd This triggered the remapping of some pgs and therefore some data movement. Is this normal/expected, since for the time being I have only hdd osds ? Thanks, Massimo [*] rule replicated_ruleset { id 0 type replicated min_size 1 max_size 10 step take default step chooseleaf firstn 0 type host step emit } [**] rule replicated_hdd { id 7 type replicated min_size 1 max_size 10 step take default class hdd step chooseleaf firstn 0 type host step emit }

1 year, 3 months

2
3
0 0

trouble deploying custom config OSDs

by seccentral

Hi. I'm new to ceph, been toying around in a virtual environment (for now) trying to understand how to manage it. I made 3 vms in proxmox and provisioned a bunch of virtual drives to each. Bootstrapped following the quincy-branch official documentation. These are the drives: > /dev/sdb 128.00 GB sdb True False QEMU HARDDISK (HDD) > /dev/sdc 128.00 GB sdc True False QEMU HARDDISK (HDD) > /dev/sdd 32.00 GB sdd False False QEMU HARDDISK (SSD) This is the lvdisplay on /dev/sdd after creating two lvs: > db-0 dev0-db-0 -wi-a----- 16.00g > > db-1 dev0-db-0 -wi-a----- <16.00g My curiosity was to have OSDs with data=raw + block.db=lv created like this: > ceph-volume raw prepare --bluestore --data /dev/sdd --block.db /dev/mapper/dev0--db--0--db--0 This required tinkering with permissions and temporarily modifying /etc/ceph/ceph.keyring because by default it wasn't allowing access, RADOS complained about unauthorized client.boostrap-osd something but I got it to work eventually. (By the way, In a real environment, would RAW be of any benefit vs lvm everywhere ?) So now I have created 2 OSDs, each with the journal on the SSD and the data on the HDD. I repeated the steps on my other two boxes (btw, can't this be done from the local box via ceph cli ?) Now I am trying (and failing) to start OSD daemons on this host. I tried apply osd --all-available-devices, it tells me "Scheduled osd.all-available-devices update..." but nothing happens. I'm also not sure how to apply osds from a yaml file since that would provision them and .. they're already provisioned using the ceph-volume command above... right ? I'm having trouble getting a lot of things to work, this is just one of them and even if I feel nostalgic using mailing lists, It's inefficient. Is there any interactive community where I can find some people usually online and talk to them realtime like discord/slack etc ? I tried irc but most are afk. Thanks Sent with [Proton Mail](https://proton.me/) secure email.

1 year, 3 months

2
3
0 0

Re: Ceph Disk Prediction module issues

by Nikhil Shah

Hey, did you ever find a resolution for this?

1 year, 3 months

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users January 2023