(sending this email again as the first time was blocked because my attached
log file was too big)
Hi all,
*Context*: I'm running Ceph Octopus 15.2.5 (the latest as of this email)
using Rook on a toy Kubernetes cluster of two nodes. I've got a single Ceph
mon running perfectly with 3 OSDs. There are two pools, which were
created as part of a CephFS install.
*Problem*: when I try to add my 4th OSD, the Ceph mon starts crashing in
the OSDMonitor::build_incremental function. I've searched the mailing
lists and the wider web, and the last instance of this issue seems to have
been 7 years ago, so I'm probably not hitting the same thing!
*Question*: does anyone have ideas about what I might be doing
wrong? I'm very new to Ceph, so my suspicion is that it's something to do
with my configuration, but given that I'm literally just adding an OSD and
everything else is fine, I'm not sure what my mistake might be.
Please find the bug I filed on the Ceph tracker here
<https://tracker.ceph.com/issues/48026> where I've provided a mon log file
with log level 20.
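In case it helps anyone reproduce this, the debug-level mon log can be captured roughly as follows in a Rook deployment (a sketch; the rook-ceph namespace, the rook-ceph-tools deployment and the mon pod label are the Rook defaults and may differ in other setups):
# Raise the mon log verbosity from the Rook toolbox:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph config set mon debug_mon 20/20
# Reproduce the crash (add the 4th OSD), then grab the mon pod log:
kubectl -n rook-ceph logs -l app=rook-ceph-mon > mon-debug.log
# Revert to the default verbosity afterwards:
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph config set mon debug_mon 1/5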
Kind regards,
Lalit Maganti
Hi List,
After a successful upgrade from Mimic 13.2.8 to Nautilus 14.2.12 we
enabled msgr2. Soon after that both of the MDS servers (active /
active-standby) restarted.
We did not hit any ASSERTS this time, so that's good :>.
However, I have not seen this happening on four different test clusters
(while running a slightly older Nautilus release), so I certainly did
not expect that.
Most of the connections switched over to 3300 (apart from the cephfs
kernel clients) and that all kept on working.
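For reference, the enable step and a quick check of what the mons advertise afterwards look roughly like this (standard commands, nothing cluster-specific):
# Enable msgr2 once every mon runs Nautilus:
ceph mon enable-msgr2
# Each mon should now advertise both v2 (3300) and v1 (6789) addresses:
ceph mon dump | grep -E 'v2:|v1:'
# On a daemon host, see which ports the established connections use:
ss -tnp | grep -E ':3300|:6789'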
Has anybody else seen this behavior before?
Gr. Stefan
Hello everyone,
I am regularly seeing messages that the monitors are going down and coming back up:
2020-10-27T09:50:49.032431+0000 mon.arh-ibstorage2-ib (mon.1) 2248 : cluster [WRN] Health check failed: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
2020-10-27T09:50:49.123511+0000 mon.arh-ibstorage2-ib (mon.1) 2250 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s) set; 43 pgs not deep-scrubbed in time; 12 pgs not scrubbed in time
2020-10-27T09:50:52.735457+0000 mon.arh-ibstorage1-ib (mon.0) 31287 : cluster [INF] Health check cleared: MON_DOWN (was: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib)
2020-10-27T12:35:20.556458+0000 mon.arh-ibstorage2-ib (mon.1) 2260 : cluster [WRN] Health check failed: 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib (MON_DOWN)
2020-10-27T12:35:20.643282+0000 mon.arh-ibstorage2-ib (mon.1) 2262 : cluster [WRN] overall HEALTH_WARN 23 OSD(s) experiencing BlueFS spillover; 3 large omap objects; 1/4 mons down, quorum arh-ibstorage2-ib,arh-ibstorage3-ib,arh-ibstorage4-ib; noout flag(s) set; 47 pgs not deep-scrubbed in time; 14 pgs not scrubbed in time
This happens on a daily basis several times a day.
Could you please let me know how to fix this annoying problem?
I am running ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable) on Ubuntu 18.04 LTS with latest updates.
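As a starting point, a rough sketch of the usual first checks for flapping mons (the mon name below is an example; arh-ibstorage1-ib appears to be the one leaving quorum):
# Which mon is down and why the cluster is unhappy:
ceph health detail
ceph mon stat
# Clock skew is a common cause of mons dropping out of quorum:
ceph time-sync-status
# Look at that mon's own log around one of the MON_DOWN timestamps:
journalctl -u ceph-mon@arh-ibstorage1-ib --since "2020-10-27 09:45" --until "2020-10-27 09:55"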
Thanks
Andrei
How can I free up the store of my Ceph monitor?
------------------------------------------------------------------------
root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle# du -h -d1
542G ./store.db
542G .
------------------------------------------------------------------------
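For reference: the mon store only trims old cluster maps once the cluster is healthy (all PGs active+clean), so getting the cluster healthy is usually the first step. After that, a manual compaction can reclaim space; a rough sketch (mon name taken from the path above):
# Compact the monitor's RocksDB store:
ceph tell mon.fond-beagle compact
# Or compact automatically at every monitor start (ceph.conf):
# [mon]
#     mon_compact_on_start = true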
We are running Ceph 14.2.12 and would like to manage our object gateways
via the dashboard.
The rados gateways are running on different (virtual) machines than the
mon servers (where mon, mgr and mds are running).
The dashboard seems to be running fine.
But when we click on "Object Gateway" we get the following error message:
Sorry, we could not find what you were looking for
500 - Internal Server Error
The server encountered an unexpected condition which prevented it from
fulfilling the request.
10/27/20 12:30:32 PM
The logfiles (of mgr and radosgw) do not show anything helpful.
Maybe we have missed something that needs to be
enabled/installed/activated?
Any ideas?
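A common cause of this error is that the dashboard has no RGW API credentials or endpoint configured. A rough sketch of the usual setup on Nautilus (the 'dashboard' user name, host and port below are examples):
# Create (or reuse) a system user on the RGW side and note its keys:
radosgw-admin user create --uid=dashboard --display-name="Ceph Dashboard" --system
radosgw-admin user info --uid=dashboard
# Tell the dashboard how to reach the gateway:
ceph dashboard set-rgw-api-access-key <access_key>
ceph dashboard set-rgw-api-secret-key <secret_key>
ceph dashboard set-rgw-api-host rgw1.example.com
ceph dashboard set-rgw-api-port 8080
ceph dashboard set-rgw-api-scheme http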
I am trying to configure the cloud sync module in my Ceph cluster to implement backup to AWS S3. I could not figure out the configuration from the available documentation. Can someone help me implement this?
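For reference, the documented approach is to create a zone whose sync tier type is "cloud" and point its tier configuration at the S3 endpoint; a rough sketch (zone name, endpoints, bucket and keys are placeholders, and the radosgw unit name depends on the deployment):
radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=aws-backup \
    --endpoints=http://rgw-host:8080 --tier-type=cloud
radosgw-admin zone modify --rgw-zonegroup=default --rgw-zone=aws-backup \
    --tier-config=connection.endpoint=https://s3.amazonaws.com,connection.access_key=<key>,connection.secret=<secret>,target_path=my-backup-bucket
# Commit the period and restart the gateway serving that zone:
radosgw-admin period update --commit
systemctl restart ceph-radosgw@rgw.$(hostname -s)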
Thanks,
Sailaja
Hello all,
This is an invite to everyone interested in joining a working group being formed for 2020 Ceph User Survey planning. The focus is to augment the questionnaire coverage, explore survey delivery formats, and expand the survey's reach to audiences across the world. The popularity and adoption of Ceph are growing steadily, and so are the deployment options. Survey feedback has certainly helped the community focus on users' asks and make Ceph better for their needs. This time we want to make the survey experience more enriching for both the developer community and the user community.
As a sample, here are a few questions that have been collected to help build better hardware options for Ceph across enterprises and CSPs. The working group can help refine them for content and importance. There will also be questions in other categories, such as Ceph configuration, that members will help assess for relevance and inclusion, along with finding innovative approaches to reach a broader audience.
Do you prefer single socket servers for Ceph OSD nodes?
Which drive form factors are important to you (NVMe drives: U.2 or ruler (E1.L))?
How many drives per server fit your needs?
What drive capacities are important to you?
Do you separate metadata and data on different classes of media?
Do you use Optane 3D XPoint or NAND for BlueStore metadata?
Which caching method is more useful to you: client-side or OSD-side?
As always, many many thanks to Mike Perez who is driving the user survey effort and making it better every passing year.
Thank you,
Anantha Adiga
Well, with 7 hosts up, recovery started and then stopped after roughly
3 hours, and now the cluster is not recovering any more... could it be
that it needs more hosts?
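For reference, a generic way to see why recovery has stopped is to query one of the stuck PGs directly (the pgid below is a placeholder):
# List the PGs that are not clean:
ceph health detail
ceph pg dump_stuck unclean
# Query one stuck PG to see what it is waiting for:
ceph pg 4.1f query | less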
On 2020-10-27 13:58, Eugen Block wrote:
> Hm, it would be news to me that the mgr service is required for
> recovery, but maybe I missed something and it has become a crucial
> component in the meantime, I'm not sure.
>
> The large MON db could be due to a large backlog of cluster maps. The
> bigger and the longer a recovery is, the more space the MONs
> require, which is unfortunate. I think there are ways to trim that
> without a major impact, but I don't know how; I've never had to do that. But
> as the recovery progresses, the usage should decrease, I believe.
>
>
> Quoting "Ing. Luis Felipe Domínguez Vega" <luis.dominguez(a)desoft.cu>:
>
>> One thing: only when I started the ceph-mgr did the cluster begin
>> to recover... I added the 2 new hosts and now it is recovering, but I
>> don't know why the ceph monitor is consuming 500 GB of HDD....
>>
>>> On 2020-10-27 09:59, Eugen Block wrote:
>>> Your pool 'data_storage' has a size of 7 (or 7 chunks since it's
>>> erasure-coded) and the rule requires each chunk on a different host
>>> but you currently have only 5 hosts available, that's why the
>>> recovery
>>> is not progressing. It's waiting for two more hosts. Unfortunately,
>>> you can't change the EC profile or the rule of that pool. I'm not
>>> sure
>>> if it would work in the current cluster state, but if you can't add
>>> two more hosts (which would be your best option for recovery) it
>>> might
>>> be possible to create a new replicated pool (you seem to have enough
>>> free space) and copy the contents from that EC pool. But as I said,
>>> I'm not sure if that would work in a degraded state, I've never tried
>>> that.
>>>
>>> So your best bet is to get two more hosts somehow.
>>>
>>>
>>>> pool 4 'data_storage' erasure profile desoft size 7 min_size 5
>>>> crush_rule 1 object_hash rjenkins pg_num 32 pgp_num 32
>>>> autoscale_mode off last_change 154384 lfor 0/121016/121014 flags
>>>> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
>>>> application rbd
>>>
>>>
>>> Quoting "Ing. Luis Felipe Domínguez Vega"
>>> <luis.dominguez(a)desoft.cu>:
>>>
>>>> Needed data:
>>>>
>>>> ceph -s : https://pastebin.ubuntu.com/p/S9gKjyZtdK/
>>>> ceph osd tree : https://pastebin.ubuntu.com/p/SCZHkk6Mk4/
>>>> ceph osd df : (later, because I've been waiting for 10
>>>> minutes with no output yet)
>>>> ceph osd pool ls detail : https://pastebin.ubuntu.com/p/GRdPjxhv3D/
>>>> crush rules : (ceph osd crush rule dump)
>>>> https://pastebin.ubuntu.com/p/cjyjmbQ4Wq/
>>>>
>>>> On 2020-10-27 07:14, Eugen Block wrote:
>>>>>> I understand, but I deleted the OSDs from the CRUSH map, so Ceph
>>>>>> won't wait for those OSDs, am I right?
>>>>>
>>>>> It depends on your actual crush tree and rules. Can you share
>>>>> (maybe
>>>>> you already did)
>>>>>
>>>>> ceph osd tree
>>>>> ceph osd df
>>>>> ceph osd pool ls detail
>>>>>
>>>>> and a dump of your crush rules?
>>>>>
>>>>> As I already said, if you have rules in place that distribute data
>>>>> across 2 DCs and one of them is down the PGs will never recover
>>>>> even
>>>>> if you delete the OSDs from the failed DC.
>>>>>
>>>>>
>>>>>
>>>>> Quoting "Ing. Luis Felipe Domínguez Vega"
>>>>> <luis.dominguez(a)desoft.cu>:
>>>>>
>>>>>> I understand, but I deleted the OSDs from the CRUSH map, so Ceph
>>>>>> won't wait for those OSDs, am I right?
>>>>>>
>>>>>> On 2020-10-27 04:06, Eugen Block wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> just to clarify so I don't miss anything: you have two DCs and
>>>>>>> one of
>>>>>>> them is down. And two of the MONs were in that failed DC? Now you
>>>>>>> removed all OSDs and two MONs from the failed DC hoping that your
>>>>>>> cluster will recover? If you have reasonable crush rules in place
>>>>>>> (e.g. to recover from a failed DC) your cluster will never
>>>>>>> recover in
>>>>>>> the current state unless you bring OSDs back up on the second DC.
>>>>>>> That's why you don't see progress in the recovery process, the
>>>>>>> PGs are
>>>>>>> waiting for their peers in the other DC so they can follow the
>>>>>>> crush
>>>>>>> rules.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Eugen
>>>>>>>
>>>>>>>
>>>>>>> Quoting "Ing. Luis Felipe Domínguez Vega"
>>>>>>> <luis.dominguez(a)desoft.cu>:
>>>>>>>
>>>>>>>> I had 3 mons, but I have 2 physical datacenters; one of them
>>>>>>>> broke with no short-term fix, so I removed all OSDs and Ceph
>>>>>>>> mons (2 of them), and now I only have the OSDs of 1 datacenter
>>>>>>>> with the remaining monitor. I had stopped the ceph manager, but
>>>>>>>> I saw that when I restart the ceph manager, ceph -s shows
>>>>>>>> recovery info for roughly 20 minutes, and then all the info
>>>>>>>> disappears.
>>>>>>>>
>>>>>>>> The thing is that the cluster does not seem to be recovering on
>>>>>>>> its own, and the ceph monitor is "eating" all of the HDD.
>>>>>>>>
>>>>>>>> On 2020-10-26 15:57, Eugen Block wrote:
>>>>>>>>> The recovery process (ceph -s) is independent of the MGR
>>>>>>>>> service but
>>>>>>>>> only depends on the MON service. It seems you only have the one
>>>>>>>>> MON,
>>>>>>>>> if the MGR is overloading it (not clear why) it could help to
>>>>>>>>> leave
>>>>>>>>> MGR off and see if the MON service then has enough RAM to
>>>>>>>>> proceed with
>>>>>>>>> the recovery. Do you have any chance to add two more MONs? A
>>>>>>>>> single
>>>>>>>>> MON is of course a single point of failure.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Quoting "Ing. Luis Felipe Domínguez Vega"
>>>>>>>>> <luis.dominguez(a)desoft.cu>:
>>>>>>>>>
>>>>>>>>>> On 2020-10-26 15:16, Eugen Block wrote:
>>>>>>>>>>> You could stop the MGRs and wait for the recovery to finish,
>>>>>>>>>>> MGRs are
>>>>>>>>>>> not a critical component. You won’t have a dashboard or
>>>>>>>>>>> metrics
>>>>>>>>>>> during/of that time but it would prevent the high RAM usage.
>>>>>>>>>>>
>>>>>>>>>>> Quoting "Ing. Luis Felipe Domínguez Vega"
>>>>>>>>>>> <luis.dominguez(a)desoft.cu>:
>>>>>>>>>>>
>>>>>>>>>>>> On 2020-10-26 12:23, 胡 玮文 wrote:
>>>>>>>>>>>>>> On 2020-10-26 at 23:29, Ing. Luis Felipe Domínguez Vega
>>>>>>>>>>>>>> <luis.dominguez(a)desoft.cu> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> mgr: fond-beagle(active, since 39s)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your manager seems to be crash-looping; it has only been up
>>>>>>>>>>>>> for 39 seconds. Looking at the mgr logs may help you identify
>>>>>>>>>>>>> why your cluster is not recovering. You may be hitting a bug
>>>>>>>>>>>>> in the mgr.
>>>>>>>>>>>> Nope, I'm restarting the ceph manager because it eats all the
>>>>>>>>>>>> server RAM; I have a script that restarts the manager when
>>>>>>>>>>>> only 1 GB of free RAM is left (the server has 94 GB of RAM).
>>>>>>>>>>>> I don't know why this happens, and the manager logs are:
>>>>>>>>>>>>
>>>>>>>>>>>> -----------------------------------
>>>>>>>>>>>> root@fond-beagle:/var/lib/ceph/mon/ceph-fond-beagle/store.db#
>>>>>>>>>>>> tail -f /var/log/ceph/ceph-mgr.fond-beagle.log
>>>>>>>>>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) log [DBG] : pgmap v584: 2305 pgs: 4
>>>>>>>>>>>> active+undersized+degraded+remapped, 4
>>>>>>>>>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>>>>>>>>>> active+clean, 5 active+undersized+degraded, 34
>>>>>>>>>>>> incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used,
>>>>>>>>>>>> 21 TiB / 24 TiB avail; 347248/2606900 objects degraded
>>>>>>>>>>>> (13.320%); 107570/2606900 objects misplaced (4.126%);
>>>>>>>>>>>> 19/404328 objects unfound (0.005%)
>>>>>>>>>>>> 2020-10-26T12:54:12.497-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) do_log log to syslog
>>>>>>>>>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) log [DBG] : pgmap v585: 2305 pgs: 4
>>>>>>>>>>>> active+undersized+degraded+remapped, 4
>>>>>>>>>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>>>>>>>>>> active+clean, 5 active+undersized+degraded, 34
>>>>>>>>>>>> incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used,
>>>>>>>>>>>> 21 TiB / 24 TiB avail; 347248/2606900 objects degraded
>>>>>>>>>>>> (13.320%); 107570/2606900 objects misplaced (4.126%);
>>>>>>>>>>>> 19/404328 objects unfound (0.005%)
>>>>>>>>>>>> 2020-10-26T12:54:14.501-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) do_log log to syslog
>>>>>>>>>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) log [DBG] : pgmap v586: 2305 pgs: 4
>>>>>>>>>>>> active+undersized+degraded+remapped, 4
>>>>>>>>>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>>>>>>>>>> active+clean, 5 active+undersized+degraded, 34
>>>>>>>>>>>> incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used,
>>>>>>>>>>>> 21 TiB / 24 TiB avail; 347248/2606900 objects degraded
>>>>>>>>>>>> (13.320%); 107570/2606900 objects misplaced (4.126%);
>>>>>>>>>>>> 19/404328 objects unfound (0.005%)
>>>>>>>>>>>> 2020-10-26T12:54:16.517-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) do_log log to syslog
>>>>>>>>>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) log [DBG] : pgmap v587: 2305 pgs: 4
>>>>>>>>>>>> active+undersized+degraded+remapped, 4
>>>>>>>>>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>>>>>>>>>> active+clean, 5 active+undersized+degraded, 34
>>>>>>>>>>>> incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used,
>>>>>>>>>>>> 21 TiB / 24 TiB avail; 347248/2606900 objects degraded
>>>>>>>>>>>> (13.320%); 107570/2606900 objects misplaced (4.126%);
>>>>>>>>>>>> 19/404328 objects unfound (0.005%)
>>>>>>>>>>>> 2020-10-26T12:54:18.521-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) do_log log to syslog
>>>>>>>>>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) log [DBG] : pgmap v588: 2305 pgs: 4
>>>>>>>>>>>> active+undersized+degraded+remapped, 4
>>>>>>>>>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>>>>>>>>>> active+clean, 5 active+undersized+degraded, 34
>>>>>>>>>>>> incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used,
>>>>>>>>>>>> 21 TiB / 24 TiB avail; 347248/2606900 objects degraded
>>>>>>>>>>>> (13.320%); 107570/2606900 objects misplaced (4.126%);
>>>>>>>>>>>> 19/404328 objects unfound (0.005%)
>>>>>>>>>>>> 2020-10-26T12:54:20.537-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) do_log log to syslog
>>>>>>>>>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) log [DBG] : pgmap v589: 2305 pgs: 4
>>>>>>>>>>>> active+undersized+degraded+remapped, 4
>>>>>>>>>>>> active+recovery_unfound+undersized+degraded+remapped, 2104
>>>>>>>>>>>> active+clean, 5 active+undersized+degraded, 34
>>>>>>>>>>>> incomplete, 154 unknown; 1.7 TiB data, 2.9 TiB used,
>>>>>>>>>>>> 21 TiB / 24 TiB avail; 347248/2606900 objects degraded
>>>>>>>>>>>> (13.320%); 107570/2606900 objects misplaced (4.126%);
>>>>>>>>>>>> 19/404328 objects unfound (0.005%)
>>>>>>>>>>>> 2020-10-26T12:54:22.541-0400 7f2a8112b700 0
>>>>>>>>>>>> log_channel(cluster) do_log log to syslog
>>>>>>>>>>>> ---------------
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> OK, I will do that... but the thing is that the cluster does
>>>>>>>>>> not show any recovery activity; it doesn't show that it is
>>>>>>>>>> doing anything, like the recovery info normally shown by the
>>>>>>>>>> ceph -s command, so I don't know whether it is recovering or
>>>>>>>>>> what it is doing.
Hi!
I'm creating a custom balancer (a mail will follow soonish), and for it to work, I'm calculating OSD usage manually, with
the goal of simulating PG movements.
But the calculations don't match up; I'm missing some size component.
So far I have:
osd_used_bytes = sum(pg_shardsizes)
But that does not add up:
used = ceph osd df -> osd['kb_used']
occupied = pg shard sum
404 used=1.080T occupied=1.137T => 58.303G
405 used=1.031T occupied=1.089T => 59.255G
406 used=4.459T occupied=4.563T => 105.685G
407 used=4.414T occupied=4.433T => 19.751G
408 used=4.428T occupied=4.449T => 21.906G
409 used=4.440T occupied=4.417T => -23.441G
410 used=4.416T occupied=4.397T => -19.868G
411 used=4.446T occupied=4.488T => 42.905G
412 used=4.414T occupied=4.386T => -28.452G
413 used=4.439T occupied=4.461T => 23.326G
Especially weird are the negative deltas, where the OSD reports more used space than the sum of its PG shard sizes; for the other OSDs the shard sum exceeds the reported usage.
Could the latter be compression? If so, how can I get per-PG compression stats?
Missing from the calculation is the bluefs_db_size (is that available via json? I could only find the daemon-socket perf
counters and prometheus). It's around 2G for each OSD, and should not contribute much to the delta.
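For reference, the shard sum I compute is roughly equivalent to the following shell sketch (the jq path into pg dump may be .pg_stats instead of .pg_map.pg_stats depending on the release; the naive sum also ignores EC striping, compression and BlueFS/DB overhead):
# "occupied": sum the logical bytes of every PG shard per OSD
# (use .acting instead of .up for the current placement):
ceph pg dump --format json 2>/dev/null \
  | jq -r '.pg_map.pg_stats[] | . as $pg | $pg.up[] | "\(.) \($pg.stat_sum.num_bytes)"' \
  | awk '{occ[$1]+=$2} END {for (o in occ) print o, occ[o]}' | sort > occupied.txt
# "used": what each OSD itself reports:
ceph osd df --format json \
  | jq -r '.nodes[] | "\(.id) \(.kb_used * 1024)"' | sort > used.txt
# Per-OSD delta (occupied - used) in GiB:
join occupied.txt used.txt | awk '{printf "osd.%s  delta=%.3f GiB\n", $1, ($2-$3)/2^30}'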
In short: How do I reliably calculate the real OSD utilization when summing up sizes of currently mapped PGs?
-- Jonas