I was delighted to see the native Debian 12 (bookworm) packages turn up
in Reef 18.2.1.
We currently run a number of Ceph clusters on Debian 11 (bullseye) /
Quincy 17.2.7. These are not cephadm-managed.
I have attempted to upgrade a test cluster, and it is not going well.
Since Quincy only supports bullseye and Reef only supports bookworm, we are
reinstalling from bare metal. However, I don't think either of the two
problems below is related to that.
Problem 1
--------------
A simple "apt install ceph" goes most of the way, then errors with
Setting up cephadm (18.2.1-1~bpo12+1) ...
usermod: unlocking the user's password would result in a passwordless
account.
You should set a password with usermod -p to unlock this user's password.
mkdir: cannot create directory ‘/home/cephadm/.ssh’: No such file or
directory
dpkg: error processing package cephadm (--configure):
installed cephadm package post-installation script subprocess returned
error exit status 1
dpkg: dependency problems prevent configuration of ceph-mgr-cephadm:
ceph-mgr-cephadm depends on cephadm; however:
Package cephadm is not configured yet.
dpkg: error processing package ceph-mgr-cephadm (--configure):
dependency problems - leaving unconfigured
The two cephadm-related packages are then left in an error state, which
apt tries to finish configuring each time it is run.
The cephadm user has a login directory of /nonexistent, however the
cephadm postinst (--configure) script is trying to use /home/cephadm (as
it was on Quincy/bullseye).
Since we aren't using cephadm anyway, we decided to keep going, as the
other packages were actually installed, and to deal with the package
state later.
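For the record, the two ways out of the broken package state we are
weighing up (neither verified yet; removing ceph-mgr-cephadm may also
want to take the ceph metapackage with it, so check what apt proposes
first):

# option 1: we don't use cephadm, so drop the failing packages
apt remove cephadm ceph-mgr-cephadm

# option 2 (assumption, untested): give the cephadm user the home
# directory the postinst script expects, then let dpkg finish configuring
mkdir -p /home/cephadm/.ssh
chown -R cephadm:cephadm /home/cephadm
usermod -d /home/cephadm cephadm
dpkg --configure cephadm ceph-mgr-cephadm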
Problem 2
---------------
I upgraded 2/3 monitor nodes without any other problems, and (for the
moment) removed the other Quincy monitor prior to rebuild.
I then shut down the remaining Quincy manager and attempted to start the
Reef manager. Although the manager is running, "ceph mgr services" shows
it is only providing the restful and not the dashboard service. The log
file has lots of the following error:
ImportError: PyO3 modules may only be initialized once per interpreter
process
and ceph -s reports "Module 'dashboard' has failed dependency: PyO3
modules may only be initialized once per interpreter process".
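As a stopgap rather than a fix, I assume the failed module can simply be
switched off so the health warning clears while we dig into the PyO3
issue (the standard module toggle, not yet tried here):

ceph mgr module disable dashboard
# and later, once there is a fix to test:
ceph mgr module enable dashboard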
Questions
---------------
1. Have the Reef/bookworm packages ever been tested in a non-cephadm
environment?
2. I want to revert this cluster back to a fully functional state. I
cannot bring back up the remaining Quincy monitor though ("require
release 18 > 17"). Would I have to go through the procedure of starting
over, and trying to rescue the monmap from the OSDs? (OSDs and an active
MDS are still up and running Quincy). I'm aware that process exists but
have never had to delve into it.
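(For my own notes, this is my rough reading of that documented procedure,
condensed to a single OSD host and completely untested on our side:)

mkdir /tmp/mon-store
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path "$osd" --no-mon-config \
        --op update-mon-db --mon-store-path /tmp/mon-store
done
# then rebuild the mon store and seed a fresh monitor's data dir with it
ceph-monstore-tool /tmp/mon-store rebuild -- \
    --keyring /etc/ceph/ceph.client.admin.keyring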
Thanks, Chris
Hi
We have a cluster which currently looks like this:
services:
mon: 5 daemons, quorum lazy,jolly,happy,dopey,sleepy (age 13d)
mgr: jolly.tpgixt(active, since 25h), standbys: dopey.lxajvk, lazy.xuhetq
mds: 1/1 daemons up, 2 standby
osd: 449 osds: 425 up (since 15m), 425 in (since 5m); 5104 remapped pgs
data:
volumes: 1/1 healthy
pools: 13 pools, 11153 pgs
objects: 304.11M objects, 988 TiB
usage: 1.6 PiB used, 1.4 PiB / 2.9 PiB avail
pgs: 6/1617270006 objects degraded (0.000%)
366696947/1617270006 objects misplaced (22.674%)
6043 active+clean
5041 active+remapped+backfill_wait
66 active+remapped+backfilling
2 active+recovery_wait+degraded+remapped
1 active+recovering+degraded
It's currently rebalancing after adding a node, but this rebalance has
been rather slow -- right now it's running 66 backfills, but it seems to
stabilize around 8 backfills eventually. We figured that perhaps adding
another node might speed things up.
Immediately upon adding the node, we get slow ops and inactive PGs.
Removing the new node gets us back in working order.
It turns out that even adding 1 OSD breaks the cluster, and immediately
sends it here:
[WRN] PG_DEGRADED: Degraded data redundancy: 6/1617265712 objects degraded (0.000%), 3 pgs degraded
pg 37.c8 is active+recovery_wait+degraded+remapped, acting [410,163,236,209,7,283,155,143,78]
pg 37.1a1 is active+recovering+degraded, acting [234,424,163,74,22,128,177,153,181]
pg 37.1da is active+recovery_wait+degraded+remapped, acting [163,408,230,190,93,284,50,78,44]
[WRN] SLOW_OPS: 22 slow ops, oldest one blocked for 54 sec, daemons [osd.11,osd.110,osd.112,osd.117,osd.120,osd.123,osd.13,osd.136,osd.144,osd.157]... have slow ops.
The OSD added was number 431, so it does not appear to be the immediate
cause of the slow ops; however, removing 431 immediately clears the
problem.
We thought we might be experiencing 'CRUSH giving up too soon' symptoms
[1], as we have seen similar behaviour on another pool, but it does not
appear to be the case here. We went through the motions described on the
page and everything looked OK.
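(For completeness, this is roughly the check we ran; the rule id and the
max-x bound are placeholders here, and --num-rep 6 corresponds to the
4+2 pool:)

ceph osd getcrushmap -o crush.map
crushtool -i crush.map --test --show-bad-mappings \
    --rule 1 --num-rep 6 --min-x 1 --max-x 1048576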
At least one pool which stops working is a 4+2 EC pool, placed on
spinning rust, some 200-ish disks distributed across 13 nodes. I'm not
sure if other pools break, but that particular 4+2 EC pool is rather
important so I'm a little wary of experimenting blindly.
Any thoughts on where to look next?
Thanks,
Ruben Vestergaard
[1] https://docs.ceph.com/en/reef/rados/troubleshooting/troubleshooting-pg/#cru…
Hello,
So we were going to replace a Ceph cluster with some hardware we had lying around using SATA HBAs, but I was told that the only right way to build Ceph in 2023 is with direct-attach NVMe.
Does anyone have any recommendation for a 1U barebones server (we just drop in RAM, disks, and CPUs) with 8-10 2.5" NVMe bays that are direct-attached to the motherboard without a bridge or HBA, for Ceph specifically?
Thanks,
-Drew
Hi Jan,
I've just filed an upstream ticket for your case, see
https://tracker.ceph.com/issues/64053 for more details.
You might want to tune (or preferably just remove) your custom
bluestore_cache_.*_ratio settings to fix the issue.
This is reproducible and fixable in my lab this way.
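In case it helps, removing them centrally would look something like the
following (adjust to whichever of the bluestore_cache_.*_ratio options
you actually set; if they live in ceph.conf rather than the config
store, drop them there instead):

ceph config rm osd bluestore_cache_meta_ratio
ceph config rm osd bluestore_cache_kv_ratio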
Hope this helps.
Thanks,
Igor
On 15/01/2024 12:54, Jan Marek wrote:
> Hi Igor,
>
> I've tried to start the ceph-osd daemon as you advised me, and I'm
> sending the log osd.1.start.log
>
> About memory: according to 'top', the podman ceph daemon doesn't reach
> 2% of the whole server memory (64 GB)...
>
> I have switched on memory autotuning...
>
> My ceph config dump - see attached dump.txt
>
> Sincerely
> Jan Marek
>
> On Thu, Jan 11, 2024 at 04:02:02 CET, Igor Fedotov wrote:
>> Hi Jan,
>>
>> unfortunately this wasn't very helpful. Moreover, the log looks a bit messy -
>> it looks like a mixture of outputs from multiple running instances or
>> something. I'm not an expert in containerized setups, though.
>>
>> Could you please simplify things by running the ceph-osd process manually, as
>> you did for ceph-objectstore-tool, and force the log output to a file? The
>> command line should look something like the following:
>>
>> ceph-osd -i 0 --log-to-file --log-file <some-file> --debug-bluestore 5/20
>> --debug-prioritycache 10
>>
>> Please don't forget to run repair prior to that.
>>
>>
>> Also you haven't answered my questions about custom [memory] settings and
>> RAM usage during OSD startup. It would be nice to hear some feedback.
>>
>>
>> Thanks,
>>
>> Igor
>>
>> On 11/01/2024 16:47, Jan Marek wrote:
>>> Hi Igor,
>>>
>>> I've tried to start osd.1 with debug_prioritycache and
>>> debug_bluestore 5/20, see attached file...
>>>
>>> Sincerely
>>> Jan
>>>
>>> On Wed, Jan 10, 2024 at 01:03:07 CET, Igor Fedotov wrote:
>>>> Hi Jan,
>>>>
>>>> indeed this looks like some memory allocation problem - maybe the OSD's RAM
>>>> usage threshold was reached or something?
>>>>
>>>> Curious if you have any custom OSD settings or maybe any memory caps for
>>>> Ceph containers?
>>>>
>>>> Could you please set debug_bluestore to 5/20 and debug_prioritycache to 10
>>>> and try to start the OSD once again? Please monitor the process's RAM usage
>>>> during startup and share the resulting log.
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Igor
>>>>
>>>> On 10/01/2024 11:20, Jan Marek wrote:
Hi,
As I've read and thought a lot about this migration, since it is a bigger project, I was wondering if anyone has done it already and might share some notes or playbooks, because in everything I read there were some parts that were missing or hard for me to understand.
I do have some different approaches in mind, so maybe you have some suggestions or hints.
a) upgrade Nautilus on CentOS 7, with the few missing features like dashboard and Prometheus; after that, migrate one node after another to Ubuntu 20.04 with Octopus and then upgrade Ceph to the recent stable version.
b) migrate one node after another to Ubuntu 18.04 with Nautilus, then upgrade to Octopus and after that to Ubuntu 20.04.
or
c) upgrade one node after another to Ubuntu 20.04 with Octopus and join it to the cluster, until all nodes are upgraded.
As a test I tried c) with a mon node, but adding it to the cluster fails in some failed state, still probing for the other mons. (I don't have the right log at hand right now.)
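(Next time I reproduce it I will also capture the new mon's own view via the admin socket, something like the following, assuming the mon id matches the short hostname:)

ceph daemon mon.$(hostname -s) mon_status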
So my questions are:
a) What would be the best (most stable) migration path, and
b) is it in general possible to add a new Octopus mon (not an upgraded one) to a Nautilus cluster where the other mons are still on Nautilus?
I hope my thoughts and questions are understandable :)
Thanks for any hints and suggestions. Best, Götz
Hi,
after osd.15 died at the wrong moment, there is:
# ceph health detail
[WRN] PG_AVAILABILITY: Reduced data availability: 1 pg stale
pg 10.17 is stuck stale for 3d, current state
stale+active+undersized+degraded, last acting [15]
[WRN] PG_DEGRADED: Degraded data redundancy: 172/57063399 objects
degraded (0.000%), 1 pg degraded, 1 pg undersized
pg 10.17 is stuck undersized for 3d, current state
stale+active+undersized+degraded, last acting [15]
which will never resolve, as there is no osd.15 anymore.
So a
ceph pg 10.17 mark_unfound_lost delete
was executed.
Ceph seems to be a bit confused about pg 10.17 now.
While this worked before, it's not working anymore:
# ceph pg 10.17 query
Error ENOENT: i don't have pgid 10.17
And while this was pointing to 15 before, the map has now changed to 5 and 6
(which is correct):
# ceph pg map 10.17
osdmap e14425 pg 10.17 (10.17) -> up [5,6] acting [5,6]
According to ceph health, ceph assumes that osd.15 is still somehow in
charge.
The pg map seems to think that 10.17 is on osd.5 and osd.6
But pg 10.17 seems not to be really existing, as a query will fail.
Any idea what's going wrong and how to fix this?
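My current guess, please correct me, is that once we accept the data in
that PG is gone, the remaining step would be to recreate it empty,
something like:

ceph osd force-create-pg 10.17 --yes-i-really-mean-it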
Thank you!
--
Mit freundlichen Gruessen / Best regards
Oliver Dzombic
Layer7 Networks
mailto:info@layer7.net
Address:
Layer7 Networks GmbH
Zum Sonnenberg 1-3
63571 Gelnhausen
HRB 96293, district court (Amtsgericht) Hanau
Managing director: Oliver Dzombic
VAT ID: DE259845632
Happy new year everybody.
I just found out that the orchestrator in one of our clusters is not doing
anything.
What I tried until now:
- disabling / enabling cephadm (no impact)
- restarting hosts (no impact)
- starting upgrade to same version (no impact)
- starting downgrade (no impact)
- forcefully removing hosts and adding them again (now I have no daemons
anymore)
- applying new configurations (no impact)
The orchestrator just does nothing.
Cluster itself is fine.
I also checked the SSH connectivity from all hosts to all hosts (
https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#ssh-errors)
The logs always show a message like "took the task" but then nothing
happens.
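For reference, the checks still on my list, straight from the cephadm
troubleshooting docs (nothing exotic):

ceph orch status
ceph log last cephadm    # recent cephadm module events in the cluster log
ceph mgr fail            # fail over the active mgr to restart the module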
Cheers
Boris
Hi ceph community
I noticed the following problem after upgrading my ceph instance on Debian
12.4 from 17.2.7 to 18.2.1:
I had placed the bluestore block.db for the HDD OSDs on raid1/mirrored logical
volumes on 2 NVMe devices, so that if a single block.db NVMe device fails,
not all HDD OSDs fail.
That worked fine under 17.2.7 and had no problems during host/osd restarts.
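(For context, the mirrored DB LVs were created along these lines; sizes are
simplified, the data device is a placeholder, and the exact commands are
from memory:)

lvcreate --type raid1 -m 1 -L 44G -n ceph-db-osd1 optane /dev/sdg /dev/sdj
ceph-volume lvm create --data /dev/sdX --block.db optane/ceph-db-osd1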
During the upgrade to 18.2.1, the OSDs with the block.db on a mirrored LV
wouldn't start anymore, because the block.db symlink was updated to point to
the wrong device mapper device, and the OSD startup failed with an error
message that the block.db device is busy.
OSD1:
2024-01-05T19:56:43.592+0000 7fdde9f43640 -1
bluestore(/var/lib/ceph/osd/ceph-1) _minimal_open_bluefs add block
device(/var/lib/ceph/osd/ceph-1/block.db) returned: (16) Device or resource
busy
2024-01-05T19:56:43.592+0000 7fdde9f43640 -1
bluestore(/var/lib/ceph/osd/ceph-1) _open_db failed to prepare db
environment:
2024-01-05T19:56:43.592+0000 7fdde9f43640 1 bdev(0x55a2d5014000
/var/lib/ceph/osd/ceph-1/block) close
2024-01-05T19:56:43.892+0000 7fdde9f43640 -1 osd.1 0 OSD:init: unable to
mount object store
the symlink was updated to point to
lrwxrwxrwx 1 ceph ceph 111 Jan 5 20:57 block ->
/dev/mapper/ceph--dec5bd7c--d84f--40d9--ba14--6bd8aadf2957-osd--block--cdd02721--6876--4db8--bdb2--12ac6c70127c
lrwxrwxrwx 1 ceph ceph 48 Jan 5 20:57 block.db ->
/dev/mapper/optane-ceph--db--osd1_rimage_1_iorig
the correct symlink would have been:
lrwxrwxrwx 1 ceph ceph 111 Jan 5 20:57 block ->
/dev/mapper/ceph--dec5bd7c--d84f--40d9--ba14--6bd8aadf2957-osd--block--cdd02721--6876--4db8--bdb2--12ac6c70127c
lrwxrwxrwx 1 ceph ceph 48 Jan 5 20:57 block.db ->
/dev/mapper/optane-ceph--db--osd1
To continue with the upgrade I converted, one by one, all the block.db LVM
logical volumes back to linear volumes and fixed the symlinks manually.
Converting the LVs back to linear was necessary because, even when I fixed
the symlink manually, the symlink would be created wrongly again after an
OSD restart if the block.db pointed to a raid1 LV.
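Per OSD that looked roughly like this (osd.1 as the example; the integrity
layer may need to be removed first, and whether a plain systemctl restart
is right depends on how the OSD was deployed):

# remove the integrity layer and the second raid leg, back to a linear LV
lvconvert --raidintegrity n optane/ceph-db-osd1
lvconvert -m 0 optane/ceph-db-osd1
# point the symlink back at the top-level LV instead of the rimage sub-LV
ln -sfn /dev/mapper/optane-ceph--db--osd1 /var/lib/ceph/osd/ceph-1/block.db
chown -h ceph:ceph /var/lib/ceph/osd/ceph-1/block.db
systemctl restart ceph-osd@1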
Here's an example of how the symlink looked before an OSD was touched by the
18.2.1 upgrade:
OSD2:
lrwxrwxrwx 1 ceph ceph 93 Jan 4 03:38 block ->
/dev/ceph-17a894d6-3a64-4e5e-9fa0-8dd3b5f4bf33/osd-block-3cd7a5af-9002-47a7-b4c2-540381d53be7
lrwxrwxrwx 1 ceph ceph 24 Jan 4 03:38 block.db ->
/dev/optane/ceph-db-osd2
Here's what the output of lvs -a -o +devices looked like for the OSD1 block.db
device when it was a raid1 LV:
LV                            VG     Attr       LSize   Pool Origin                        Data% Meta% Move Log Cpy%Sync Convert Devices
ceph-db-osd1                  optane rwi-a-r---  44.00g                                                        100.00           ceph-db-osd1_rimage_0(0),ceph-db-osd1_rimage_1(0)
[ceph-db-osd1_rimage_0]       optane gwi-aor---  44.00g      [ceph-db-osd1_rimage_0_iorig]                     100.00           ceph-db-osd1_rimage_0_iorig(0)
[ceph-db-osd1_rimage_0_imeta] optane ewi-ao---- 428.00m                                                                         /dev/sdg(55482)
[ceph-db-osd1_rimage_0_imeta] optane ewi-ao---- 428.00m                                                                         /dev/sdg(84566)
[ceph-db-osd1_rimage_0_iorig] optane -wi-ao----  44.00g                                                                         /dev/sdg(9216)
[ceph-db-osd1_rimage_0_iorig] optane -wi-ao----  44.00g                                                                         /dev/sdg(82518)
[ceph-db-osd1_rimage_1]       optane gwi-aor---  44.00g      [ceph-db-osd1_rimage_1_iorig]                     100.00           ceph-db-osd1_rimage_1_iorig(0)
[ceph-db-osd1_rimage_1_imeta] optane ewi-ao---- 428.00m                                                                         /dev/sdj(55392)
[ceph-db-osd1_rimage_1_imeta] optane ewi-ao---- 428.00m                                                                         /dev/sdj(75457)
[ceph-db-osd1_rimage_1_iorig] optane -wi-ao----  44.00g                                                                         /dev/sdj(9218)
[ceph-db-osd1_rimage_1_iorig] optane -wi-ao----  44.00g                                                                         /dev/sdj(73409)
[ceph-db-osd1_rmeta_0]        optane ewi-aor---   4.00m                                                                         /dev/sdg(55388)
[ceph-db-osd1_rmeta_1]        optane ewi-aor---   4.00m                                                                         /dev/sdj(9217)
It would be good if the symlinks were recreated to point to the correct
device even when the block.db is on a raid1 LV.
I'm not sure if this problem has been reported yet.
Cheers
Reto