June 2021 - ceph-users - lists.ceph.io

by Ramanathan S

Hi all, I just had created a ceph cluster to use cephfs. When i create the a ceph fs pool i get the filesystem below error. # ceph osd pool create cephfs_data 128 pool 'cephfs_data' created # ceph osd pool create cephfs_metadata 128 pool 'cephfs_metadata' created # ceph fs new cephfs cephfs_metadata cephfs_data new fs with metadata pool 6 and data pool 5 # ceph -s cluster: id: 1c27def45-f0f9-494d-sfke-eb4323432fd health: HEALTH_ERR 1 filesystem is offline 1 filesystem is online with fewer MDS than max_mds services: mon: 2 daemons, quorum ceph-mon01,ceph-mon02 mgr: ceph-adm01(active) mds: cephfs-0/0/1 up osd: 12 osds: 12 up, 12 in data: pools: 2 pools, 256 pgs objects: 0 objects, 0 B usage: 12 GiB used, 588 GiB / 600 GiB avail pgs: 256 active+clean but when i check the max_mds for the ceph fs it says 1 # ceph fs get cephfs | grep max_mds max_mds 1 Let anyone know what am i missing here? Any inputs is much appreciated. Regards, Ram Ceph-explorer..

3 weeks, 3 days

3
3
0 0

Re: NoSuchKey on key that is visible in s3 list/radosgw bk

by Eric Ivancich

I have some questions for those who’ve experienced this issue. 1. It seems like those reporting this issue are seeing it strictly after upgrading to Octopus. From what version did each of these sites upgrade to Octopus? From Nautilus? Mimic? Luminous? 2. Does anyone have any lifecycle rules on a bucket experiencing this issue? If so, please describe. 3. Is anyone making copies of the affected objects (to same or to a different bucket) prior to seeing the issue? And if they are making copies, does the destination bucket have lifecycle rules? And if they are making copies, are those copies ever being removed? 4. Is anyone experiencing this issue willing to run their RGWs with 'debug_ms=1'? That would allow us to see a request from an RGW to either remove a tail object or decrement its reference counter (and when its counter reaches 0 it will be deleted). Thanks, Eric > On Nov 12, 2020, at 4:54 PM, huxiaoyu(a)horebdata.cn wrote: > > Looks like this is a very dangerous bug for data safety. Hope the bug would be quickly identified and fixed. > > best regards, > > Samuel > > > > huxiaoyu(a)horebdata.cn <mailto:huxiaoyu@horebdata.cn> > > From: Janek Bevendorff > Date: 2020-11-12 18:17 > To: huxiaoyu(a)horebdata.cn <mailto:huxiaoyu@horebdata.cn>; EDH - Manuel Rios; Rafael Lopez > CC: Robin H. Johnson; ceph-users > Subject: Re: [ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk > I have never seen this on Luminous. I recently upgraded to Octopus and the issue started occurring only few weeks later. > > On 12/11/2020 16:37, huxiaoyu(a)horebdata.cn wrote: > which Ceph versions are affected by this RGW bug/issues? Luminous, Mimic, Octupos, or the latest? > > any idea? > > samuel > > > > huxiaoyu(a)horebdata.cn > > From: EDH - Manuel Rios > Date: 2020-11-12 14:27 > To: Janek Bevendorff; Rafael Lopez > CC: Robin H. Johnson; ceph-users > Subject: [ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk > This same error caused us to wipe a full cluster of 300TB... will be related to some rados index/database bug not to s3. > > As Janek exposed is a mayor issue, because the error silent happend and you can only detect it with S3, when you're going to delete/purge a S3 bucket. Dropping NoSuchKey. Error is not related to S3 logic .. > > Hope this time dev's can take enought time to find and resolve the issue. Error happens with low ec profiles, even with replica x3 in some cases. > > Regards > > > > -----Mensaje original----- > De: Janek Bevendorff <janek.bevendorff(a)uni-weimar.de <mailto:janek.bevendorff@uni-weimar.de>> > Enviado el: jueves, 12 de noviembre de 2020 14:06 > Para: Rafael Lopez <rafael.lopez(a)monash.edu <mailto:rafael.lopez@monash.edu>> > CC: Robin H. Johnson <robbat2(a)gentoo.org <mailto:robbat2@gentoo.org>>; ceph-users <ceph-users(a)ceph.io <mailto:ceph-users@ceph.io>> > Asunto: [ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk > > Here is a bug report concerning (probably) this exact issue: > https://tracker.ceph.com/issues/47866 <https://tracker.ceph.com/issues/47866> > > I left a comment describing the situation and my (limited) experiences with it. > > > On 11/11/2020 10:04, Janek Bevendorff wrote: >> >> Yeah, that seems to be it. There are 239 objects prefixed >> .8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh in my dump. However, there are none >> of the multiparts from the other file to be found and the head object >> is 0 bytes. >> >> I checked another multipart object with an end pointer of 11. >> Surprisingly, it had way more than 11 parts (39 to be precise) named >> .1, .1_1 .1_2, .1_3, etc. Not sure how Ceph identifies those, but I >> could find them in the dump at least. >> >> I have no idea why the objects disappeared. I ran a Spark job over all >> buckets, read 1 byte of every object and recorded errors. Of the 78 >> buckets, two are missing objects. One bucket is missing one object, >> the other 15. So, luckily, the incidence is still quite low, but the >> problem seems to be expanding slowly. >> >> >> On 10/11/2020 23:46, Rafael Lopez wrote: >>> Hi Janek, >>> >>> What you said sounds right - an S3 single part obj won't have an S3 >>> multipart string as part of the prefix. S3 multipart string looks >>> like "2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme". >>> >>> From memory, single part S3 objects that don't fit in a single rados >>> object are assigned a random prefix that has nothing to do with >>> the object name, and the rados tail/data objects (not the head >>> object) have that prefix. >>> As per your working example, the prefix for that would be >>> '.8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh'. So there would be (239) "shadow" >>> objects with names containing that prefix, and if you add up the >>> sizes it should be the size of your S3 object. >>> >>> You should look at working and non working examples of both single >>> and multipart S3 objects, as they are probably all a bit different >>> when you look in rados. >>> >>> I agree it is a serious issue, because once objects are no longer in >>> rados, they cannot be recovered. If it was a case that there was a >>> link broken or rados objects renamed, then we could work to >>> recover...but as far as I can tell, it looks like stuff is just >>> vanishing from rados. The only explanation I can think of is some >>> (rgw or rados) background process is incorrectly doing something with >>> these objects (eg. renaming/deleting). I had thought perhaps it was a >>> bug with the rgw garbage collector..but that is pure speculation. >>> >>> Once you can articulate the problem, I'd recommend logging a bug >>> tracker upstream. >>> >>> >>> On Wed, 11 Nov 2020 at 06:33, Janek Bevendorff >>> <janek.bevendorff(a)uni-weimar.de <mailto:janek.bevendorff@uni-weimar.de> >>> <mailto:janek.bevendorff@uni-weimar.de <mailto:janek.bevendorff@uni-weimar.de>>> wrote: >>> >>> Here's something else I noticed: when I stat objects that work >>> via radosgw-admin, the stat info contains a "begin_iter" JSON >>> object with RADOS key info like this >>> >>> >>> "key": { >>> "name": >>> "29/items/WIDE-20110924034843-crawl420/WIDE-20110924065228-02544.warc.gz", >>> "instance": "", >>> "ns": "" >>> } >>> >>> >>> and then "end_iter" with key info like this: >>> >>> >>> "key": { >>> "name": >>> ".8naRUHSG2zfgjqmwLnTPvvY1m6DZsgh_239", >>> "instance": "", >>> "ns": "shadow" >>> } >>> >>> However, when I check the broken 0-byte object, the "begin_iter" >>> and "end_iter" keys look like this: >>> >>> >>> "key": { >>> "name": >>> "29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.1", >>> "instance": "", >>> "ns": "multipart" >>> } >>> >>> [...] >>> >>> >>> "key": { >>> "name": >>> "29/items/WIDE-20110903143858-crawl428/WIDE-20110903143858-01166.warc.gz.2~m5Y42lPMIeis5qgJAZJfuNnzOKd7lme.19", >>> "instance": "", >>> "ns": "multipart" >>> } >>> >>> So, it's the full name plus a suffix and the namespace is >>> multipart, not shadow (or empty). This in itself may just be an >>> artefact of whether the object was uploaded in one go or as a >>> multipart object, but the second difference is that I cannot find >>> any of the multipart objects in my pool's object name dump. I >>> can, however, find the shadow RADOS object of the intact S3 object. >>> >>> >>> >>> >>> -- >>> *Rafael Lopez* >>> Devops Systems Engineer >>> Monash University eResearch Centre >>> >>> T: +61 3 9905 9118 <tel:%2B61%203%209905%209118> >>> E: rafael.lopez(a)monash.edu <mailto:rafael.lopez@monash.edu> >>> > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

3 weeks, 3 days

6
22
0 0

Why you might want packages not containers for Ceph deployments

by Matthew Vernon

Hi, In the discussion after the Ceph Month talks yesterday, there was a bit of chat about cephadm / containers / packages. IIRC, Sage observed that a common reason in the recent user survey for not using cephadm was that it only worked on containerised deployments. I think he then went on to say that he hadn't heard any compelling reasons why not to use containers, and suggested that resistance was essentially a user education question[0]. I'd like to suggest, briefly, that: * containerised deployments are more complex to manage, and this is not simply a matter of familiarity * reducing the complexity of systems makes admins' lives easier * the trade-off of the pros and cons of containers vs packages is not obvious, and will depend on deployment needs * Ceph users will benefit from both approaches being supported into the future We make extensive use of containers at Sanger, particularly for scientific workflows, and also for bundling some web apps (e.g. Grafana). We've also looked at a number of container runtimes (Docker, singularity, charliecloud). They do have advantages - it's easy to distribute a complex userland in a way that will run on (almost) any target distribution; rapid "cloud" deployment; some separation (via namespaces) of network/users/processes. For what I think of as a 'boring' Ceph deploy (i.e. install on a set of dedicated hardware and then run for a long time), I'm not sure any of these benefits are particularly relevant and/or compelling - Ceph upstream produce Ubuntu .debs and Canonical (via their Ubuntu Cloud Archive) provide .debs of a couple of different Ceph releases per Ubuntu LTS - meaning we can easily separate out OS upgrade from Ceph upgrade. And upgrading the Ceph packages _doesn't_ restart the daemons[1], meaning that we maintain control over restart order during an upgrade. And while we might briefly install packages from a PPA or similar to test a bugfix, we roll those (test-)cluster-wide, rather than trying to run a mixed set of versions on a single cluster - and I understand this single-version approach is best practice. Deployment via containers does bring complexity; some examples we've found at Sanger (not all Ceph-related, which we run from packages): * you now have 2 process supervision points - dockerd and systemd * docker updates (via distribution unattended-upgrades) have an unfortunate habit of rudely restarting everything * docker squats on a chunk of RFC 1918 space (and telling it not to can be a bore), which coincides with our internal network... * there is more friction if you need to look inside containers (particularly if you have a lot running on a host and are trying to find out what's going on) * you typically need to be root to build docker containers (unlike packages) * we already have package deployment infrastructure (which we'll need regardless of deployment choice) We also currently use systemd overrides to tweak some of the Ceph units (e.g. to do some network sanity checks before bringing up an OSD), and have some tools to pair OSD / journal / LVM / disk device up; I think these would be more fiddly in a containerised deployment. I'd accept that fixing these might just be a SMOP[2] on our part. Now none of this is show-stopping, and I am most definitely not saying "don't ship containers". But I think there is added complexity to your deployment from going the containers route, and that is not simply a "learn how to use containers" learning curve. I do think it is reasonable for an admin to want to reduce the complexity of what they're dealing with - after all, much of my job is trying to automate or simplify the management of complex systems! I can see from a software maintainer's point of view that just building one container and shipping it everywhere is easier than building packages for a number of different distributions (one of my other hats is a Debian developer, and I have a bunch of machinery for doing this sort of thing). But it would be a bit unfortunate if the general thrust of "let's make Ceph easier to set up and manage" was somewhat derailed with "you must use containers, even if they make your life harder". I'm not going to criticise anyone who decides to use a container-based deployment (and I'm sure there are plenty of setups where it's an obvious win), but if I were advising someone who wanted to set up and use a 'boring' Ceph cluster for the medium term, I'd still advise on using packages. I don't think this makes me a luddite :) Regards, and apologies for the wall of text, Matthew [0] I think that's a fair summary! [1] This hasn't always been true... [2] Simple (sic.) Matter of Programming -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE.

1 month

26
65
0 0

rgw multisite sync not syncing data, error: RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data log shards

by Christian Rohmann

Hey ceph-users, I setup a multisite sync between two freshly setup Octopus clusters. In the first cluster I created a bucket with some data just to test the replication of actual data later. I then followed the instructions on https://docs.ceph.com/en/octopus/radosgw/multisite/#migrating-a-single-site… to add a second zone. Things went well and both zones are now happily reaching each other and the API endpoints are talking. Also the metadata is in sync already - both sides are happy and I can see bucket listings and users are "in sync": > # radosgw-admin sync status > realm 13d1b8cb-dc76-4aed-8578-2ce5d3d010e8 (obst) > zonegroup 17a06c15-2665-484e-8c61-cbbb806e11d2 (obst-fra) > zone 6d2c1275-527e-432f-a57a-9614930deb61 (obst-rgn) > metadata sync no sync (zone is master) > data sync source: c07447eb-f93a-4d8f-bf7a-e52fade399f3 (obst-az1) > init > full sync: 128/128 shards > full sync: 0 buckets to sync > incremental sync: 0/128 shards > data is behind on 128 shards > behind shards: [0...127] > and on the other side ... > # radosgw-admin sync status > realm 13d1b8cb-dc76-4aed-8578-2ce5d3d010e8 (obst) > zonegroup 17a06c15-2665-484e-8c61-cbbb806e11d2 (obst-fra) > zone c07447eb-f93a-4d8f-bf7a-e52fade399f3 (obst-az1) > metadata sync syncing > full sync: 0/64 shards > incremental sync: 64/64 shards > metadata is caught up with master > data sync source: 6d2c1275-527e-432f-a57a-9614930deb61 (obst-rgn) > init > full sync: 128/128 shards > full sync: 0 buckets to sync > incremental sync: 0/128 shards > data is behind on 128 shards > behind shards: [0...127] > also the newly created buckets (read: their metadata) is synced. What is apparently not working in the sync of actual data. Upon startup the radosgw on the second site shows: > 2021-06-25T16:15:06.445+0000 7fe71eff5700 1 RGW-SYNC:meta: start > 2021-06-25T16:15:06.445+0000 7fe71eff5700 1 RGW-SYNC:meta: realm > epoch=2 period id=f4553d7c-5cc5-4759-9253-9a22b051e736 > 2021-06-25T16:15:11.525+0000 7fe71dff3700 0 > RGW-SYNC:data:sync:init_data_sync_status: ERROR: failed to read remote > data log shards > also when issuing # radosgw-admin data sync init --source-zone obst-rgn it throws > 2021-06-25T16:20:29.167+0000 7f87c2aec080 0 > RGW-SYNC:data:init_data_sync_status: ERROR: failed to read remote data > log shards Does anybody have any hints on where to look for what could be broken here? Thanks a bunch, Regards Christian

9 months, 1 week

3
4
0 0

Small RGW objects and RADOS 64KB minimun size

by Loïc Dachary

Bonjour, Reading Karan's blog post about benchmarking the insertion of billions objects to Ceph via S3 / RGW[0] from last year, it reads: > we decided to lower bluestore_min_alloc_size_hdd to 18KB and re-test. As represented in chart-5, the object creation rate found to be notably reduced after lowering the bluestore_min_alloc_size_hdd parameter from 64KB (default) to 18KB. As such, for objects larger than the bluestore_min_alloc_size_hdd , the default values seems to be optimal, smaller objects further require more investigation if you intended to reduce bluestore_min_alloc_size_hdd parameter. There also is a mail thread dated 2018 on this topic as well, with the same conclusion although using RADOS directly and not RGW[3]. I read the RGW data layout page in the documentation[1] and concluded that by default every object inserted with S3 / RGW will indeed use at least 64kb. A pull request from last year[2] seems to confirm it and also suggests modifying bluestore_min_alloc_size_hdd has adverse side effects. That being said, I'm curious to know if people developed strategies to cope with this overhead. Someone mentioned packing objects together client side to make them larger. But maybe there are simpler ways to do the same? Cheers [0] https://www.redhat.com/en/blog/scaling-ceph-billion-objects-and-beyond [1] https://docs.ceph.com/en/latest/radosgw/layout/ [2] https://github.com/ceph/ceph/pull/32809 [3] https://www.spinics.net/lists/ceph-users/msg45755.html -- Loïc Dachary, Artisan Logiciel Libre

11 months

4
6
0 0

MDS crash on FAILED ceph_assert(cur->is_auth())

by Peter van Heusden

I am running Ceph 15.2.13 on CentOS 7.9.2009 and recently my MDS servers have started failing with the error message In function 'void Server::handle_client_open(MDRequestRef&)' thread 7f0ca9908700 time 2021-06-28T09:21:11.484768+0200 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/15.2.13/rpm/el7/BUILD/ceph-15.2.13/src/mds/Server.cc: 4149: FAILED ceph_assert(cur->is_auth()) Complete log is: https://gist.github.com/pvanheus/4da555a6de6b5fa5e46cbf74f5500fbd ceph status output is: # ceph status cluster: id: ed7b2c16-b053-45e2-a1fe-bf3474f90508 health: HEALTH_WARN 30 OSD(s) experiencing BlueFS spillover insufficient standby MDS daemons available 1 MDSs report slow requests 2 mgr modules have failed dependencies 4347046/326505282 objects misplaced (1.331%) 6 nearfull osd(s) 23 pgs not deep-scrubbed in time 23 pgs not scrubbed in time 8 pool(s) nearfull services: mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 22m) mgr: ceph-mon1(active, since 11w), standbys: ceph-mon2, ceph-mon3 mds: SANBI_FS:2 {0=ceph-mon1=up:active(laggy or crashed),1=ceph-mon2=up:stopping} osd: 54 osds: 54 up (since 2w), 54 in (since 11w); 50 remapped pgs data: pools: 8 pools, 833 pgs objects: 42.37M objects, 89 TiB usage: 159 TiB used, 105 TiB / 264 TiB avail pgs: 4347046/326505282 objects misplaced (1.331%) 782 active+clean 49 active+clean+remapped 1 active+clean+scrubbing+deep 1 active+clean+remapped+scrubbing io: client: 29 KiB/s rd, 427 KiB/s wr, 37 op/s rd, 48 op/s wr When restarting a MDS it goes through states replace, reconnect, resolve and finally sets itself to active before this crash happens. Any advice on what to do? Thanks, Peter P.S. apologies if you received this email more than once - I have had some trouble figuring out the correct mailing list to use.

11 months, 4 weeks

3
5
0 0

cephadm cluster move /var/lib/docker to separate device fails

by Karsten Nielsen

Hi, I have setup a ceph cluster with cephadm with docker backend. I want to move /var/lib/docker to a separate device to get better performance and less load on the OS device. I tried that by stopping docker copy the content of /var/lib/docker to the new device and mount the new device to /var/lib/docker. The other containers started as expected and continues to work and run as expected. But the ceph containers seems to be broken. I am not able to get them back in working state. I have tried to remove the host with `ceph orch host rm itcnchn-bb4067` and readd it but no effect. The strange thing is that 2 of 4 containers comes up as expected. ceph orch ps itcnchn-bb4067 NAME HOST STATUS REFRESHED AGE VERSION IMAGE NAME IMAGE ID CONTAINER ID crash.itcnchn-bb4067 itcnchn-bb4067 running (18h) 10m ago 4w 15.2.7 docker.io/ceph/ceph:v15 2bc420ddb175 2af28c4571cf mds.cephfs.itcnchn-bb4067.qzoshl itcnchn-bb4067 error 10m ago 4w <unknown> docker.io/ceph/ceph:v15 <unknown> <unknown> mon.itcnchn-bb4067 itcnchn-bb4067 error 10m ago 18h <unknown> docker.io/ceph/ceph:v15 <unknown> <unknown> rgw.ikea.dc9-1.itcnchn-bb4067.gtqedc itcnchn-bb4067 running (18h) 10m ago 4w 15.2.7 docker.io/ceph/ceph:v15 2bc420ddb175 00d000aec32b Docker logs from the active manager does not say much about what is wrong debug 2021-01-05T09:57:52.537+0000 7fdb69691700 0 log_channel(cephadm) log [INF] : Reconfiguring mds.cephfs.itcnchn-bb4067.qzoshl (unknown last config time)... debug 2021-01-05T09:57:52.541+0000 7fdb69691700 0 log_channel(cephadm) log [INF] : Reconfiguring daemon mds.cephfs.itcnchn-bb4067.qzoshl on itcnchn-bb4067 debug 2021-01-05T09:57:52.973+0000 7fdb64e88700 0 log_channel(cluster) log [DBG] : pgmap v347: 241 pgs: 241 active+clean; 18 GiB data, 50 GiB used, 52 TiB / 52 TiB avail; 18 KiB/s rd, 78 KiB/s wr, 24 op/s debug 2021-01-05T09:57:53.085+0000 7fdb69691700 0 log_channel(cephadm) log [INF] : Reconfiguring mon.itcnchn-bb4067 (unknown last config time)... debug 2021-01-05T09:57:53.085+0000 7fdb69691700 0 log_channel(cephadm) log [INF] : Reconfiguring daemon mon.itcnchn-bb4067 on itcnchn-bb4067 debug 2021-01-05T09:57:53.625+0000 7fdb69691700 0 log_channel(cephadm) log [INF] : Reconfiguring rgw.ikea.dc9-1.itcnchn-bb4067.gtqedc (unknown last config time)... debug 2021-01-05T09:57:53.629+0000 7fdb69691700 0 log_channel(cephadm) log [INF] : Reconfiguring daemon rgw.ikea.dc9-1.itcnchn-bb4067.gtqedc on itcnchn-bb4067 debug 2021-01-05T09:57:54.141+0000 7fdb69691700 0 log_channel(cephadm) log [INF] : Reconfiguring crash.itcnchn-bb4067 (unknown last config time)... debug 2021-01-05T09:57:54.141+0000 7fdb69691700 0 log_channel(cephadm) log [INF] : Reconfiguring daemon crash.itcnchn-bb4067 on itcnchn-bb4067 - Karsten

1 year, 1 month

2
1
0 0

Stuck OSD service specification - can't remove

by David Orman

Has anybody run into a 'stuck' OSD service specification? I've tried to delete it, but it's stuck in 'deleting' state, and has been for quite some time (even prior to upgrade, on 15.2.x). This is on 16.2.3: NAME PORTS RUNNING REFRESHED AGE PLACEMENT osd.osd_spec 504/525 <deleting> 12m label:osd root@ceph01:/# ceph orch rm osd.osd_spec Removed service osd.osd_spec From active monitor: debug 2021-05-06T23:14:48.909+0000 7f17d310b700 0 log_channel(cephadm) log [INF] : Remove service osd.osd_spec Yet in ls, it's still there, same as above. --export on it: root@ceph01:/# ceph orch ls osd.osd_spec --export service_type: osd service_id: osd_spec service_name: osd.osd_spec placement: {} unmanaged: true spec: filter_logic: AND objectstore: bluestore We've tried --force, as well, with no luck. To be clear, the --export even prior to delete looks nothing like the actual service specification we're using, even after I re-apply it, so something seems 'bugged'. Here's the OSD specification we're applying: service_type: osd service_id: osd_spec placement: label: "osd" data_devices: rotational: 1 db_devices: rotational: 0 db_slots: 12 I would appreciate any insight into how to clear this up (without removing the actual OSDs, we're just wanting to apply the updated service specification - we used to use host placement rules and are switching to label-based). Thanks, David

1 year, 1 month

3
4
0 0

kernel client osdc ops stuck and mds slow reqs

by Dan van der Ster

Hi all, We are quite regularly (a couple times per week) seeing: HEALTH_WARN 1 clients failing to respond to capability release; 1 MDSs report slow requests MDS_CLIENT_LATE_RELEASE 1 clients failing to respond to capability release mdshpc-be143(mds.0): Client hpc-be028.cern.ch: failing to respond to capability release client_id: 52919162 MDS_SLOW_REQUEST 1 MDSs report slow requests mdshpc-be143(mds.0): 1 slow requests are blocked > 30 secs Which is being caused by osdc ops stuck in a kernel client, e.g.: 10:57:18 root hpc-be028 /root → cat /sys/kernel/debug/ceph/4da6fd06-b069-49af-901f-c9513baabdbd.client52919162/osdc REQUESTS 9 homeless 0 46559317 osd243 3.ee6ffcdb 3.cdb [243,501,92]/243 [243,501,92]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a01.00000057 0x400014 1 read 46559322 osd243 3.ee6ffcdb 3.cdb [243,501,92]/243 [243,501,92]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a01.00000057 0x400014 1 read 46559323 osd243 3.969cc573 3.573 [243,330,226]/243 [243,330,226]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056 0x400014 1 read 46559341 osd243 3.969cc573 3.573 [243,330,226]/243 [243,330,226]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056 0x400014 1 read 46559342 osd243 3.969cc573 3.573 [243,330,226]/243 [243,330,226]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056 0x400014 1 read 46559345 osd243 3.969cc573 3.573 [243,330,226]/243 [243,330,226]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a56.00000056 0x400014 1 read 46559621 osd243 3.6313e8ef 3.8ef [243,330,521]/243 [243,330,521]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a45.0000007a 0x400014 1 read 46559629 osd243 3.b280c852 3.852 [243,113,539]/243 [243,113,539]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f09a3a.0000007f 0x400014 1 read 46559928 osd243 3.1ee7bab4 3.ab4 [243,332,94]/243 [243,332,94]/243 e678697 fsvolumens_355f485c-6319-4ffe-acd6-94a07f2a14b4/10003f099ff.0000073f 0x400024 1 write LINGER REQUESTS BACKOFFS We can unblock those requests by doing `ceph osd down osd.243` (or restarting osd.243). This is ceph v14.2.6 and the client kernel is el7 3.10.0-957.27.2.el7.x86_64. Are there a better way to debug this? Best Regards, Dan

1 year, 2 months

4
12
0 0

OSD slow ops warning not clearing after OSD down

by Frank Schilder

Dear cephers, I have a strange problem. An OSD went down and recovery finished. For some reason, I have a slow ops warning for the failed OSD stuck in the system: health: HEALTH_WARN 430 slow ops, oldest one blocked for 36 sec, osd.580 has slow ops The OSD is auto-out: | 580 | ceph-22 | 0 | 0 | 0 | 0 | 0 | 0 | autoout,exists | It is probably a warning dating back to just before the fail. How can I clear it? Thanks and best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14

1 year, 3 months

4
7
0 0

2024

2023

2022

2021

2020

2019

ceph-users June 2021