June 2020 - ceph-users - lists.ceph.io

I like to understand why I have the "ceph mds slow requests" / "failing to respond to cache pressure" / "failing to respond to capability release" warnings

by Marc Roos

I am wondering what the problem is with these error messages I am having. I think it is not really related to capability release because those messages I am getting at a later time. I have two processes being affected by this a rsync on a ceph fuse mount and a rsync on a nfs-ganesha mount running at the same time. The ceph fuse mount gets at a later time the cache pressure notification. I think everything starts to go wrong after the mds is reporting on the xlock. This message is being logged: 2020-06-13 03:38:36.981 7fb5edd82700 0 log_channel(cluster) log [WRN] : slow request 244.377406 seconds old, received at 2020-06-13 03:34:32.604412: client_request(client.4019800:20354 setattr mtime=2020-04-22 12:58:47.000000 atime=2020-06-13 03:34:32.000000 #0x100001b9177 2020-06-13 03:34:32.604119 caller_uid=500, caller_gid=500{500,1,2,3,4,6,10,}) currently failed to xlock, waiting From the charts here you can see that caps are climbing and inodes are dropping. https://snapshot.raintank.io/dashboard/snapshot/4ij6AF1JoDzdZNI6WzCyewkn7Oq… (I am not entirely sure about the correctness of the used units, and the ino/caps chart should have dual y-axis) It looks like that if I start multiple concurrent rsyncs on the nfs-ganesha I can trigger this problem. Every time it seems I am getting the same xlock listed at the mds log. 2020-06-13 17:32:24.920 7fb5edd82700 0 log_channel(cluster) log [WRN] : slow request 240.294505 seconds old, received at 2020-06-13 17:28:24.626608: client_request(client.4021284:2468 setattr mtime=2020-04-22 12:58:47.000000 atime=2020-06-13 17:28:24.000000 #0x100001b9177 2020-06-13 17:28:24.626527 caller_uid=500, caller_gid=500{500,1,2,3,4,6,10,}) currently failed to xlock, waiting Question: I have seen this work around/solution[1] being offered a lot, but I do not get why I have the xlock. When does one get an xlock? I think when I can prevent the xlock, I do not need to set osd op queue options. [1] https://www.mail-archive.com/ceph-users@ceph.io/msg04421.html With 'work around': osd op queue = wpq osd op queue cut off = high

3 years, 10 months

1
0
0 0

ceph on rhel7 / centos7 till eol?

by Marc Roos

Will there be a ceph release available on rhel7 until the eol of rhel7?

3 years, 10 months

2
1
0 0

ceph grafana dashboards: osd device details keeps loading.

by Marc Roos

I have sometimes that the dashboard keeps loading when switching to 3 hours range. However I do not see any load on the prometheus server. Anyone having something similar?

3 years, 10 months

1
0
0 0

ceph grafana dashboards: rbd overview empty

by Marc Roos

The grafana dashboard 'rbd overview' is empty. Queries have measurements 'ceph_rbd_write_ops' that do not exist in prometheus (I think). Should I enable something more than just 'ceph mgr module enable prometheus' I am on Nautilus

3 years, 10 months

1
0
0 0

ceph grafana dashboards on git

by Marc Roos

I was wondering if I can fork and do pull request on the grafana dashboards at git[1] Clean up a bit inconsistent naming use of labels etc. [1] https://github.com/ceph/ceph/tree/master/monitoring/grafana/dashboards

3 years, 10 months

1
0
0 0

Upload speed slow for 7MB file cephfs+Samaba

by Amudhan P

Hi, I have Ceph Octopus 4 Node, each node has 12 disk cluster, which is configured with cephfs (replica 2) + exposed via samba for windows 10G client. When a user copies a folder containing 1000's of 7MB files from windows 10 client getting only a speed of 40MB/s. Client and Ceph nodes all connected in 10G. In the same setup copying 1GB file from windows client to samba getting 90 MB/s. Are there any kernel or network tunning needs to be done? Any suggestions? regards Amudhan P

3 years, 10 months

1
0
0 0

help with failed osds after reboot

by Seth Duncan

I had 5 of 10 osds fail on one of my nodes, after reboot the other 5 osds failed to start. I have tried running ceph-disk activate-all and get back and error message about the cluster fsid not matching in /etc/ceph/ceph.conf Has anyone experienced an issue such as this? ******************************************************************* IMPORTANT MESSAGE FOR RECIPIENTS IN THE U.S.A.: This message may constitute an advertisement of a BD group's products or services or a solicitation of interest in them. If this is such a message and you would like to opt out of receiving future advertisements or solicitations from this BD group, please forward this e-mail to optoutbygroup(a)bd.com. [BD.v1.0] ******************************************************************* This message (which includes any attachments) is intended only for the designated recipient(s). It may contain confidential or proprietary information and may be subject to the attorney-client privilege or other confidentiality protections. If you are not a designated recipient, you may not review, use, copy or distribute this message. If you received this in error, please notify the sender by reply e-mail and delete this message. Thank you. ******************************************************************* Corporate Headquarters Mailing Address: BD (Becton, Dickinson and Company) 1 Becton Drive Franklin Lakes, NJ 07417 U.S.A.

3 years, 10 months

1
0
0 0

Adding OSDs

by Will Payne

Hi, Total newbie question - I'm new to Ceph and am setting up a small test cluster. I've set up five nodes and can see the available drives but I'm unsure on exactly how I can add an OSD and specify the locations for WAL+DB. Maybe my Google-fu is weak but the only guides I can find refer to ceph-deploy which, as far as I can see, is deprecated. Guides which talk about using cephadm / ceph only mention adding a drive but not specifying the WAL+DB locations. I want to add HDDs as OSDs and put the WAL and DB onto separate LVs on an SSD. How? Will

3 years, 10 months

2
7
0 0

failing to respond to capability release / MDSs report slow requests / xlock?

by Marc Roos

I am still having this issue with nfs-ganesha with nautilus. I assume I do not have to change configuration of nfs-ganesha as mentioned here[1]. Since I did not have any issues with Luminous. Does someone with similar symptoms have also this xlock, waiting message? How to resolve this? 2020-06-11 14:14:19.403 7fb5f0587700 1 mds.a Updating MDS map to version 32035 from mon.0 2020-06-11 14:19:35.287 7fb5edd82700 0 log_channel(cluster) log [WRN] : 1 slow requests, 1 included below; oldest blocked for > 33.290954 secs 2020-06-11 14:19:35.287 7fb5edd82700 0 log_channel(cluster) log [WRN] : slow request 33.290953 seconds old, received at 2020-06-11 14:19:01.997415: client_request(client. 4020294:34354 setattr mtime=2020-04-22 12:58:47.000000 atime=2020-06-11 14:19:02.000000 #0x100001b9177 2020-06-11 14:19:01.997149 caller_uid=500, caller_gid=500{500,1,2,3, 4,6,10,}) currently failed to xlock, waiting [1] https://tracker.ceph.com/issues/44976#note-23

3 years, 10 months

1
0
0 0

radosgw-admin sync status output

by DHilsbos＠performair.com

All; We've been running our Ceph clusters (Nautilus / 14.2.8) for a while now (roughly 9 months), and I've become curious about the output of the "radosgw-admin sync status" command. Here's a the output from our secondary zone: realm <guid> (<realm-name>) zonegroup <guid> (<zonegroup name>) zone <guid> (<zone name>) metadata sync syncing full sync: 0/64 shards incremental sync: 64/64 shards metadata is caught up with master data sync source: <guid> (<master zone name>) syncing full sync: 0/128 shards incremental sync: 128/128 shards 3 shards are recovering recovering shards: [39,41,66] Radosgw is in use, so the active recovery for data doesn't really surprise me. What I am curious about is these 2 lines: full sync: 0/64 shards full sync: 0/128 shards Is this considered normal? If so, why have those lines present in this output? Are they relevant to a different type of replication than what we are doing (I'm not aware of a different type of radosgw replication, but I'm not omniscient)? Thank you, Dominic L. Hilsbos, MBA Director - Information Technology Perform Air International Inc. DHilsbos(a)PerformAir.com www.PerformAir.com

3 years, 10 months

4
3
0 0

2024

2023

2022

2021

2020

2019

ceph-users June 2020