As a sort of follow-up to my previous post: our Nautilus (14.2.16 on Ubuntu 18.04) cluster had some sort of event that caused many of the machines to have memory errors. The aftermath is that some OSDs initially hit (and continue to hit) this error, https://tracker.ceph.com/issues/48827, while others won't start for various reasons.
The OSDs that *will* start are badly behind the current epoch for the most part.
It sounds very similar to this:
We are having trouble getting things back online.
I think the path forward is to:
-set noup/nodown/noout/nobackfill and wait for the OSDs that run to come up; we were making good progress yesterday until some of the OSDs crashed with OOM errors. We are again moving forward, but understandably nervous.
-export the PGs from questionable OSDs and then rebuild the OSDs; import the PGs if necessary (very likely). Repeat until we are up.
Any suggestions for increasing speed? We are using noup/nobackfill/norebalance/pause but the epoch catchup is taking a very long time. Any tips for keeping the epoch from moving forward or speeding up the OSDs catching up? How can we estimate how long it should take?
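For estimating the catch-up time, one rough approach is to sample a lagging OSD's map epoch twice and extrapolate. A sketch (all the numbers below are hypothetical; read the real ones from the cluster):

```shell
# Read the real values from e.g.:
#   ceph osd dump | head -1          # first line shows the current cluster map epoch
#   ceph daemon osd.<id> status      # shows the newest map the OSD has caught up to
cluster_epoch=250000   # hypothetical: current cluster map epoch
osd_epoch=190000       # hypothetical: epoch the slowest OSD has reached
rate_per_min=100       # hypothetical: epochs consumed per minute, measured between two samples
echo "ETA: $(( (cluster_epoch - osd_epoch) / rate_per_min )) minutes"
# → ETA: 600 minutes
```

Sampling the same OSD a few minutes apart gives the rate; the flags you already set (noup/nobackfill/norebalance/pause) should also keep the epoch from advancing much while OSDs catch up.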
Thank you for any ideas or assistance anyone can provide.
I have Ceph 15.2.4 running in Docker. How do I configure it to use a
specific data pool? I tried putting the following line in ceph.conf, but the
change is not working:
rbd default data pool = Mydatapool
I need to configure an erasure-coded pool for use with CloudStack.
Can anyone help? Where is the ceph.conf I need to configure?
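For what it's worth, a sketch of the two usual ways to point RBD at an EC data pool (using the pool name from the mail; the image/pool names in the `rbd create` line are placeholders, and the EC pool must allow overwrites first):

```shell
# On the cluster: RBD requires overwrites on the EC pool
ceph osd pool set Mydatapool allow_ec_overwrites true

# Option 1: in the ceph.conf the client actually reads (inside the container),
# under the [client] section:
#   [client]
#   rbd default data pool = Mydatapool

# Option 2: set the data pool explicitly per image; the replicated pool
# holds the metadata, the EC pool holds the data (placeholder names):
rbd create --size 100G --data-pool Mydatapool rbdpool/vm-image-1
```

Note that `rbd default data pool` only takes effect for the client creating the image, so the conf file that matters is the one inside the container doing the `rbd create`.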
Thanks for the reply.
cephadm runs the Ceph containers automatically. How do I set privileged mode
on a Ceph container?
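For reference, this is roughly what privileged mode looks like at the plain Docker level (image name and volume path are placeholders; with cephadm-managed containers you would have to adjust the generated unit/run files rather than start the container by hand):

```shell
# Privileged + host networking, so NFSv3 can register with the host portmapper
docker run -d --privileged --net=host \
  -v /etc/ganesha:/etc/ganesha \
  my-ganesha-image   # placeholder: whatever image runs your NFS Ganesha
```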
> On 23/9/20 at 13:24, Daniel Gryniewicz wrote:
>> NFSv3 needs privileges to connect to the portmapper. Try running
>> your docker container in privileged mode, and see if that helps.
>> On 9/23/20 11:42 AM, Gabriel Medve wrote:
>>> I have Ceph 15.2.5 running in Docker. I configured NFS Ganesha
>>> with NFS version 3, but I cannot mount it.
>>> If I configure Ganesha with NFS version 4, I can mount without
>>> problems, but I need version 3.
>>> The error is: mount.nfs: Protocol not supported
>>> Can anyone help?
>> ceph-users mailing list -- ceph-users(a)ceph.io
>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
Is it possible to disable the check for 'x pool(s) have no replicas
configured', so I don't have this HEALTH_WARN constantly?
Or is there some other disadvantage to keeping some empty 1x-replication pools?
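If I remember correctly, recent Nautilus releases have a mon option for exactly this warning; a sketch:

```shell
# Disable the "pool(s) have no replicas configured" health warning
ceph config set global mon_warn_on_pool_no_redundancy false
```

The only real cost of empty size-1 pools is that any data accidentally written to them has no redundancy.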
I'm facing a weird issue with one of my CEPH clusters:
OS: CentOS - 8.2.2004 (Core)
CEPH: Nautilus 14.2.11 - stable
RBD using an erasure-coded profile (k=3, m=2)
When I format one of my RBD images (client side), I get the following
kernel messages multiple times with different sector IDs:
[2417011.790154] blk_update_request: I/O error, dev rbd23, sector 164743869184 op 0x3:(DISCARD) flags 0x4000 phys_seg 1 prio class 0
[2417011.791404] rbd: rbd23: discard at objno 20110336 2490368~1703936 result -1
At first I thought about a faulty disk, BUT the monitoring system is not
showing anything faulty, so I decided to run manual tests on all my OSDs to
check disk health using smartctl etc.
None of them is marked as unhealthy; they don't show any counters for faulty
sectors, reads or writes, and the wear level is at 99%.
So, the only particularity of this image is that it is an 80 TB image, but it
shouldn't be an issue, as we already have images of that size in use.
If anyone has a clue as to how I could sort this out, I'll be more than happy
to hear it.
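One way to narrow it down: since every error in the log is on a DISCARD op, formatting without issuing discards should show whether plain reads/writes are fine. A sketch, using the device name from the logs above:

```shell
# Skip the discard pass during mkfs (XFS):
mkfs.xfs -K /dev/rbd23
# or for ext4:
mkfs.ext4 -E nodiscard /dev/rbd23
```

If the format then succeeds cleanly, the problem is specific to how discards map onto the EC-backed objects rather than to the disks.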
I was doing some maintenance yesterday involving shutdown/power-up cycles of Ceph servers. With the last server I ran into a problem. The server runs an MDS and a couple of OSDs. After reboot, the MDS joined the MDS cluster without problems, but the OSDs didn't come up. This was 1 out of 12 servers and I had no such problems with the other 11. I also observed that "ceph status" was responding very slowly.
Upon further inspection, I found out that 2 of my 3 MONs (the leader and a peon) were running at 100% CPU. Client I/O was continuing, probably because the last cluster map remained valid. On our node performance monitoring I could see that the 2 busy MONs were showing extraordinary network activity.
This state lasted for over one hour. After the MONs settled down, the OSDs finally joined as well and everything went back to normal.
The other instance where I have seen similar behaviour was when I restarted a MON on an empty disk and the re-sync was extremely slow due to a too-large value of mon_sync_max_payload_size. This time, I'm pretty sure it was MON-client communication; see below.
Are there any settings similar to mon_sync_max_payload_size that could influence responsiveness of MONs in a similar way?
Why do I suspect it is MON-client communication? In our monitoring, I do not see the huge number of packets sent by the MONs arriving at any other Ceph daemon. They seem to be distributed over client nodes, but since we have a large number of client nodes (>550), this is covered by the background network traffic. A second clue is that I have had such extended lock-ups before and, whenever I checked, I only observed them when the leader had a large share of the client sessions.
For example, yesterday the client session count per MON was:
ceph-01: 1339 (leader)
ceph-02: 189 (peon)
ceph-03: 839 (peon)
I usually restart the leader when such a critical distribution occurs. As long as the leader has the fewest client sessions, I never observe this problem.
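In case it helps others reproduce the numbers: the per-MON session counts above can be pulled from each MON's admin socket (MON id is an example):

```shell
# Run on each MON host; counts client entries in the session list
ceph daemon mon.ceph-01 sessions | grep -c client
```

Restarting the leader then redistributes sessions, as described above.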
Ceph version is 13.2.10 (564bdc4ae87418a232fc901524470e1a0f76d641) mimic (stable).
Thanks for any clues!
AIT Risø Campus
Bygning 109, rum S14
Hello fellow CEPH-users,
currently we are updating our Ceph (14.2.16) cluster and making changes to
some of the nodes.
TLDR: is there a way to do a graceful shutdown of an active MDS node without
losing the caps, open files and client connections? Something like handing
over the active state, promoting a standby to active, ...?
Sadly, we ran into some difficulties when restarting MDS nodes. While we
had two active nodes and one standby, we initially thought that this would
give a nice handover when restarting the active rank ... sadly, we saw
the node going through the states replay-reconnect-rejoin-active, as nicely
visualized here.
This left some clients going into timeouts until the standby node had gone
into the active state again, most probably since the CephFS already has
some 600k folders and 3M files; from the client side the takeover took
quite a while.
So before the next MDS restart, the FS config was changed to one active and
one standby-replay node; the idea was that since the standby-replay node
follows the active one, the handover would be smoother. The active state
was reached faster, but we still noticed some hiccups on the clients
while the new active MDS was waiting for clients to reconnect (state
up:reconnect) after the failover.
The next idea was to do a manual node promotion, graceful shutdown or
something similar, where the open caps and sessions would be handed
over ... but I did not find any hint in the docs regarding this.
But this should somehow be possible (imho), since when adding a second
active MDS node (max_mds 2) and then removing it again (max_mds 1), the
rank 1 node goes into the stopping state and hands over all clients/caps to
rank 0 without interruptions for the clients.
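The max_mds dance just described, as a sketch (filesystem name is a placeholder):

```shell
ceph fs set cephfs max_mds 2   # the standby becomes active as rank 1
# wait until rank 1 is up:active, then shrink again:
ceph fs set cephfs max_mds 1   # rank 1 enters 'stopping' and hands its clients/caps to rank 0
```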
Therefore my question: how can one gracefully shut down an active rank 0
MDS node, or promote a standby node to the active state, without losing
open files/caps or client sessions?
Thanks in advance,
I'm replying to the list, as it may help others.
I have also reordered the response.
> On Mon, Jan 18, 2021 at 2:41 PM Gilles Mocellin <
> gilles.mocellin(a)nuagelibre.org> wrote:
> > Hello Cephers,
> > On a new cluster, I only have 2 RBD block images, and the Dashboard
> > doesn't manage to list them correctly.
> > I have this message :
> > Warning
> > Displaying previously cached data for pool veeam-repos.
> > Sometimes it disappears, but as soon as I reload or return to the listing
> > page, it's there.
> > What I've seen is a high CPU load due to ceph-mgr on the active
> > manager.
> > And also stack-traces like this :
> > dashboard.exceptions.ViewCacheNoDataException: ViewCache: unable to
> > retrieve data
> > I also see that, when I try to edit an image :
> > 2021-01-18T11:13:26.383+0100 7f00199ca700 0 [dashboard ERROR
> > frontend.error]
> > (https://fidcl-mrs4-sto-sds.fidcl.cloud:8443/#/block/rbd/edit/veeam-repos%252Fveeam-repo2-vol1):
> > Cannot read property 'features_name' of undefined
> > TypeError: Cannot read property 'features_name' of undefined
> > But that's perhaps just because I open an Edit window on the image and it
> > does not have the data.
> > The Edit window is empty, and I can't edit things; in particular, I want
> > to resize the image.
> > --
> > Gilles
On Thursday, 21 January 2021 at 21:56:58 CET, Ernesto Puerta wrote:
> Hey Gilles,
> If I'm not wrong, that exception (ViewCacheNoDataException) happens when
> the dashboard is unable to gather all required data from Ceph within a
> defined timeout (5 secs I think, since the UI refreshes the data every ~5
> seconds).
> It'd be great if you could provide the steps to reproduce it and some
> insights into your environment (number of RBD pools, number of RBD images,
> snapshots, etc.).
> Kind Regards,
As it is now, it always happens: on the image listing I get the warning, and
the list is not always up to date; if I create an image, I must wait a very
long time to see it.
Also, I cannot edit the 2 big images I have. Perhaps the size matters;
they are 2 images of 40 TB each.
If I create a 1 GB test image, I can edit and resize it.
But it's impossible with the big images: the window opens, but all the fields
are empty.
Also, if it can matter, the images use a data pool (EC 3+2).
I have 2 pools: a replicated one (3x) for metadata, veeam-repos, and a
data pool, veeam-repos.data (EC 3+2).
My cluster has 6 nodes, each with a 16-core AMD CPU, 128 GB RAM and 10 x 8 TB
HDDs, so 60 OSDs. We will soon be doubling everything to 12 nodes.
Usage, as the pool and image names suggest, is to mount an RBD image as an XFS
filesystem for a Veeam backup repository (krbd, because rbd-nbd failed
regularly, especially during fstrim).
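As a workaround while the dashboard edit form stays empty, the resize can be done from the CLI (image spec taken from the URL in the stack trace above; the target size is a placeholder):

```shell
rbd resize --size <new-size> veeam-repos/veeam-repo2-vol1
rbd info veeam-repos/veeam-repo2-vol1   # confirm the new size and features
```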
We are testing our S3 Ceph endpoints and we are not satisfied with their
speed. Our results are somewhere around 120-150 MB/s, depending on
smaller/bigger files. This is fine for a 1 Gbps connection, but not for
10GbE or more.
We've tried the most recent versions of the AWS CLI, s3cmd, s4cmd, s3fs
and other programs. Of course we are using multipart upload/download, which
is a precondition for parallel transfers. We also tried multi-threaded
transfer (25 or more threads) in s4cmd, but still we don't get the expected
speed.
As a proof of concept that high speed can be achieved, we have written a
small bash script which uses multipart & parallel transfer and can
saturate at least 10GbE without problems.
I would like to ask whether you know of a suitable program (and its
parameters) with which we could saturate n x 10GbE if needed?
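Not sure it reaches line rate, but the AWS CLI's own transfer parallelism is tunable, which often makes a large difference; a sketch (endpoint and bucket are placeholders):

```shell
# Raise the CLI's multipart parallelism (defaults: 10 concurrent requests, 8 MB chunks)
aws configure set default.s3.max_concurrent_requests 50
aws configure set default.s3.multipart_chunksize 64MB

# Then transfer against the RGW endpoint (placeholder URL/bucket)
aws --endpoint-url https://rgw.example.com s3 cp bigfile s3://testbucket/
```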
We are using the latest Nautilus.
The S3 gateways have much more compute power and bandwidth to the internet
than is being used right now.