I noticed a similar issue tonight. I'm still looking into the details, but here are the client logs I have:
Oct 9 19:27:59 mon5-cx kernel: libceph: mds0 ***:6800 socket closed (con state OPEN)
Oct 9 19:28:01 mon5-cx kernel: libceph: mds0 ***:6800 connection reset
Oct 9 19:28:01 mon5-cx kernel: libceph: reset on mds0
Oct 9 19:28:01 mon5-cx kernel: ceph: mds0 closed our session
Oct 9 19:28:01 mon5-cx kernel: ceph: mds0 reconnect start
Oct 9 19:28:01 mon5-cx kernel: ceph: mds0 reconnect denied
Oct 9 19:28:01 mon5-cx kernel: ceph: dropping dirty+flushing Fw state for ffff9109011c9980 1099517142146
Oct 9 19:28:01 mon5-cx kernel: ceph: dropping dirty+flushing Fw state for ffff91096cc788d0 1099517142307
Oct 9 19:28:01 mon5-cx kernel: ceph: dropping dirty+flushing Fw state for ffff9107da741f10 1099517142312
Oct 9 19:28:01 mon5-cx kernel: ceph: dropping dirty+flushing Fw state for ffff9109d5c40e60 1099517141612
Oct 9 19:28:01 mon5-cx kernel: ceph: dropping dirty+flushing Fw state for ffff9108c9337da0 1099517142313
Oct 9 19:28:01 mon5-cx kernel: ceph: dropping dirty+flushing Fw state for ffff9109d5c70340 1099517141565
Oct 9 19:28:01 mon5-cx kernel: ceph: dropping dirty+flushing Fw state for ffff910955acf810 1099517141792
Oct 9 19:28:01 mon5-cx kernel: ceph: dropping dirty+flushing Fw state for ffff91095ff56cf0 1099517142006
Oct 9 19:28:01 mon5-cx kernel: ceph: dropping dirty+flushing Fw state for ffff91096cc7f280 1099517142309
Oct 9 19:28:01 mon5-cx kernel: libceph: mds0 ***:6800 socket closed (con state NEGOTIATING)
Oct 9 19:28:02 mon5-cx kernel: ceph: mds0 rejected session
Oct 9 19:28:02 mon5-cx monit: Lookup for '/srv/repos' filesystem failed -- not found in /proc/self/mounts
Oct 9 19:28:02 mon5-cx monit: Filesystem '/srv/repos' not mounted
Oct 9 19:28:02 mon5-cx monit: 'repos' unable to read filesystem '/srv/repos' state
...
Oct 9 19:28:09 mon5-cx kernel: ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Oct 9 19:28:24 mon5-cx kernel: ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Oct 9 19:28:39 mon5-cx kernel: ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
...
Oct 9 21:27:09 mon5-cx kernel: ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Oct 9 21:27:24 mon5-cx kernel: ceph: get_quota_realm: ino (1.fffffffffffffffe) null i_snap_realm
Oct 9 21:27:27 mon5-cx monit: Lookup for '/srv/repos' filesystem failed -- not found in /proc/self/mounts
Oct 9 21:27:27 mon5-cx monit: Filesystem '/srv/repos' not mounted
Oct 9 21:27:27 mon5-cx monit: 'repos' unable to read filesystem '/srv/repos' state
Oct 9 21:27:27 mon5-cx monit: 'repos' trying to restart
>>> Do you have statistics on the size of the OSDMaps or count of them
>>> which were being maintained by the OSDs?
>> No, I don't think so. How can I find this information?
>
> Hmm I don't know if we directly expose the size of maps. There are
> perfcounters which expose the range of maps being kept around but I
> don't know their names off-hand.
FWIW I've been told that the size of an OSDMap is roughly equivalent to the output of `ceph pg dump | wc`, which, if true, would seem to mean that they're trivially small for most purposes. Reality may of course be quite different and/or more nuanced.
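If it helps, here are a couple of rough ways to look at this on a Filestore OSD (the OSD id and paths below are just examples for a default deployment; adjust to taste):

  # epoch range of maps this OSD is currently keeping (oldest_map / newest_map)
  ceph daemon osd.0 status

  # on-disk footprint and count of the stored osdmap objects
  du -sh /var/lib/ceph/osd/ceph-0/current/meta
  find /var/lib/ceph/osd/ceph-0/current/meta -name 'osdmap*' | wc -l

Comparing oldest_map/newest_map against the current epoch from `ceph osd dump | head -1` gives a rough idea of how many maps are being kept around.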
From my initial testing it looks like 14.2.4 fully supports the
deduplication mentioned here:
https://docs.ceph.com/docs/master/dev/deduplication/
However, I'm not sure where the struct object_manifest part fits in relation to foo and foo-chunk, and I'm not sure what the offsets/caspool should be.
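My tentative reading of that doc (and I may well be misreading it) is that struct object_manifest is internal to the OSD, and the user-facing piece is the rados manifest operation, roughly along the lines of the doc's example:

  rados -p base_pool set-chunk foo 0 1024 --target-pool caspool foo-chunk 0

i.e. the offset/length describe which byte range of foo is backed by the chunk object foo-chunk, and the caspool is simply the pool holding the deduplicated chunks. Please correct me if that's wrong.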
If this still isn't fully implemented, how does the dedup tool work? If I remove a file but it exists elsewhere on the volume, will it be purged, or would the tool need to run again to clear the data?
When trying to modify a zone in one of my clusters to promote it to the
master zone, I get this error:
~ $ radosgw-admin zone modify --rgw-zone atl --master
failed to update zonegroup: 2019-10-09 15:41:53.409 7f9ecae26840 0 ERROR:
found existing zone name atl (94d26f94-d64c-40d1-9a33-56afa948d86a) in
zonegroup seast
(17) File exists
~ $
Anyone have any ideas what's going on here?
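For context, the sequence I was expecting to need (going from the multisite failover docs, assuming the period and zonegroup are otherwise healthy) is roughly:

  radosgw-admin zone modify --rgw-zone=atl --master --default
  radosgw-admin period update --commit

followed by restarting the radosgw instances. Given the 'File exists' error, I'm also planning to compare `radosgw-admin zonegroup get --rgw-zonegroup=seast` against `radosgw-admin zone get --rgw-zone=atl` to check whether the zone id recorded in the zonegroup matches the zone's own id.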
Thanks all,
Mac
Hi all
I have a smallish test cluster (14 servers, 84 OSDs) running 14.2.4. Monthly OS patching, and the reboots that go along with it, have left the cluster very unwell.
Many of the servers in the cluster are OOM-killing the ceph-osd processes when they try to start (6 OSDs per server, running on Filestore). Strace shows the ceph-osd processes spending hours reading through the 220k osdmap files after being started.
This behavior started after we recently made it about 72% full to see how things behaved. We also upgraded it to Nautilus 14.2.2 at about the same time.
I’ve tried starting just one OSD per server at a time in hopes of avoiding the OOM killer. Also tried setting noin, rebooting the whole cluster, waiting a day, then marking each of the OSDs in manually. The end result is the same either way. About 60% of PGs are still down, 30% are peering, and the rest are in worse shape.
Anyone out there have suggestions about how I should go about getting this cluster healthy again? Any ideas appreciated.
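In case it helps frame suggestions, the knobs I'm currently eyeing (just a sketch of what I'm considering, not something I've confirmed helps) are along the lines of:

  # stop recovery/backfill traffic while OSDs are still churning through old maps
  ceph osd set nobackfill
  ceph osd set norecover
  ceph osd set norebalance

  # in ceph.conf on the OSD hosts, keep fewer maps in memory
  [osd]
      osd_map_cache_size = 20
      osd_map_message_max = 10

and then bringing OSDs back one at a time per host.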
Thanks!
- Aaron
Good morning
Q: Is it possible to have a 2nd cephfs_data volume and expose it to the same OpenStack environment?
Reason being:
Our current profile is configured with an erasure code of k=3,m=1 (rack level), but we're looking to buy another roughly 6 PB of storage with controllers, and we were thinking of moving to an erasure profile of k=2,m=1, since we're not so focused on data redundancy as on disk space and performance.
From what I understand you can't change the erasure profile of an existing pool, therefore we would essentially need to build a new Ceph cluster. We're trying to understand whether we can attach it to the existing OpenStack platform, then gradually move all the data over from the old cluster into the new one, destroy the old cluster, and integrate it with the new one.
If anyone has any recommendations for getting more space and performance at the cost of data redundancy, while keeping at least one rack of fault tolerance, please let me know as well.
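To make the question a bit more concrete: what we'd ideally like (and this is only a sketch pieced together from the docs, so I'm not sure it's sound) is to add a second data pool with a different EC profile to the existing filesystem instead of building a whole new cluster, something like:

  ceph osd erasure-code-profile set ec-k2m1 k=2 m=1 crush-failure-domain=rack
  ceph osd pool create cephfs_data2 1024 1024 erasure ec-k2m1
  ceph osd pool set cephfs_data2 allow_ec_overwrites true
  ceph fs add_data_pool cephfs cephfs_data2
  # point new directories at the new pool
  setfattr -n ceph.dir.layout.pool -v cephfs_data2 /mnt/cephfs/newdata

(the profile name, pool name, PG counts, filesystem name and mount path are placeholders). If something like that is workable, it might sidestep the second-cluster question entirely.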
Regards
--
Jeremi-Ernst Avenant, Mr.
Cloud Infrastructure Specialist
Inter-University Institute for Data Intensive Astronomy
5th Floor, Department of Physics and Astronomy,
University of Cape Town
Tel: 021 959 4137
Web: www.idia.ac.za
E-mail (IDIA): jeremi(a)idia.ac.za
Rondebosch, Cape Town, 7600
Hi!
Is it possible, and if so how, to remove all permissions to a subdirectory for a user?
I tried this:
ceph auth caps client.XYZ mon 'allow r' mds 'allow r, allow rws path=/XYZ, allow path=/ABC' osd 'allow rw pool=cephfs_data'
but got:
Error EINVAL: mds capability parse failed, stopped at ', allow path=/ABC' of 'allow r, allow rws path=/XYZ, allow path=/ABC'
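Looking at the error again, I suspect the parser simply wants a permission spec after every 'allow', so the last clause is missing its r/w letters. As far as I can tell there is no explicit 'deny' in MDS caps, so the closest valid variant would be something like the sketch below, which still grants read on /ABC rather than removing access to it:

  ceph auth caps client.XYZ mon 'allow r' mds 'allow r, allow rws path=/XYZ, allow r path=/ABC' osd 'allow rw pool=cephfs_data'

Is leaving a path out of the caps really the only way to withhold access to it?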
Thanks
Lars
Thx for the hint.
I fiddled around with the configuration and found this:
> root@vm-2:~# ceph zabbix send
> Failed to send data to Zabbix
while
> root@vm-2:~# zabbix_sender -vv -z 192.168.15.253 -p 10051 -s vm-2 -k ceph.num_osd -o 32
> zabbix_sender [1724513]: DEBUG: answer [{"response":"success","info":"processed: 1; failed: 0; total: 1; seconds spent: 0.000041"}]
> info from server: "processed: 1; failed: 0; total: 1; seconds spent: 0.000041"
> sent: 1; skipped: 0; total: 1
works just fine. I figured out that it could be a hostname mismatch between what "ceph zabbix send" transmits and the hostname that is configured on the Zabbix server. And well... it's almost embarrassing that I missed this for about 3 months now, but:
The hostname the ceph zabbix module was submitting was in capital letters, while the hostname configured in Zabbix was lowercase, even though the hostname for that machine is in fact lowercase.
I don't know why the ceph zabbix module makes it uppercase.
I configured the host on zabbix with capital letters and now it works...
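(For anyone hitting the same thing: instead of renaming the host on the Zabbix side, it should also be possible to pin the name the module reports via its identifier option, if I'm reading the module options correctly; I haven't tested this path myself:

  ceph zabbix config-set identifier vm-2
  ceph zabbix config-show
  ceph zabbix send
)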
kind regards
Ingo Schmidt
----------------------------------------
IT-Department
Island municipality Langeoog
with in-house operations
Tourismus Service and Schiffahrt
Hi,
I am currently dealing with a cluster that's been in use for 5 years and
during that time, has never had its radosgw usage log trimmed. Now that
the cluster has been upgraded to Nautilus (and has completed a full
deep-scrub), it is in a permanent state of HEALTH_WARN because of one
large omap object:
$ ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
1 large objects found in pool '.usage'
As far as I can tell, there are two thresholds that can trigger that
warning:
* The default omap object size warning threshold,
osd_deep_scrub_large_omap_object_value_sum_threshold, is 1G.
* The default omap object key count warning threshold,
osd_deep_scrub_large_omap_object_key_threshold, is 200000.
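(For completeness, you can double-check what a given OSD is actually using, in case something has overridden the defaults, by querying its admin socket on the OSD host, e.g.:

  ceph daemon osd.6 config get osd_deep_scrub_large_omap_object_key_threshold
  ceph daemon osd.6 config get osd_deep_scrub_large_omap_object_value_sum_threshold
)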
In this case, this was the original situation:
osd.6 [WRN] : Large omap object found. Object:
15:169282cd:::usage.20:head Key count: 5834118 Size (bytes): 917351868
So that's 5.8M keys (way above threshold) and 875 MiB total object size
(below threshold, but not by much).
The usage log in this case was no longer needed that far back, so I
trimmed it to keep only the entries from this year (radosgw-admin usage
trim --end-date 2018-12-31), a process that took upward of an hour.
After the trim (and a deep-scrub of the PG in question¹), my situation
looks like this:
osd.6 [WRN] Large omap object found. Object: 15:169282cd:::usage.20:head
Key count: 1185694 Size (bytes): 187061564
So both the key count and the total object size have diminished by about
80%, which is about what you expect when you trim 5 years of usage log
down to 1 year of usage log. However, my key count is still almost 6
times the threshold.
I am aware that I can silence the warning by increasing
osd_deep_scrub_large_omap_object_key_threshold by a factor of 10, but
that's not my question. My question is what I can do to prevent the
usage log from creating such large omap objects in the first place.
Now, there's something else that you should know about this radosgw,
which is that it is configured with the defaults for usage log sharding:
rgw_usage_max_shards = 32
rgw_usage_max_user_shards = 1
... and this cluster's radosgw is pretty much being used by a single
application user. So the fact that it's happy to shard the usage log 32
ways is irrelevant as long as it puts the usage log for one user all
into one shard.
So, I am assuming that if I bump rgw_usage_max_user_shards up to, say,
16 or 32, all *new* usage log entries will be sharded. But I am not
aware of any way to reshard the *existing* usage log. Is there such a
thing?
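(For concreteness, the bump I have in mind is simply something like this in ceph.conf on the radosgw host, followed by a radosgw restart; the section name below is a placeholder for whatever your gateway instance is called:

  [client.rgw.gateway1]
      rgw_usage_max_shards = 32
      rgw_usage_max_user_shards = 16
)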
Otherwise, it seems like the only option in this situation would be to
clear the usage log altogether, and tweak the sharding knobs, which
should at least make the problem not reappear. Or else, bump
osd_deep_scrub_large_omap_object_key_threshold and just live with the
large object.
Also, is anyone aware of any adverse side effects of increasing these
thresholds, and/or changing the usage log sharding settings, that I
should keep in mind here?
Thanks in advance for your thoughts.
Cheers,
Florian
¹For anyone reading this in the archives because they've run into the same problem and are wondering how to find out which PGs in a pool have too-large objects, here's a jq one-liner:
ceph --format=json pg ls-by-pool <poolname> \
| jq '.pg_stats[]|select(.stat_sum.num_large_omap_objects>0)'