Hello list,
first of all: yes, I made mistakes, and now I am trying to recover :-/
I had a healthy 3-node cluster which I wanted to convert to a single one.
My goal was to reinstall a fresh 3-node cluster and start with 2 nodes.
I managed to turn it from a healthy 3-node cluster into a healthy 2-node cluster.
Then the problems began.
I changed the pool to size=1 and min_size=1.
Health was okay until that point. Then, all of a sudden, both nodes got
fenced: one node refused to boot, mons were missing, etc. To make a
long story short, here is where I am right now:
root@node03:~ # ceph -s
cluster b3be313f-d0ef-42d5-80c8-6b41380a47e3
health HEALTH_WARN
53 pgs stale
53 pgs stuck stale
monmap e4: 2 mons at {0=10.15.15.3:6789/0,1=10.15.15.2:6789/0}
election epoch 298, quorum 0,1 1,0
osdmap e6097: 14 osds: 9 up, 9 in
pgmap v93644673: 512 pgs, 1 pools, 1193 GB data, 304 kobjects
1088 GB used, 32277 GB / 33366 GB avail
459 active+clean
53 stale+active+clean
root@node03:~ # ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 32.56990 root default
-2 25.35992 host node03
0 3.57999 osd.0 up 1.00000 1.00000
5 3.62999 osd.5 up 1.00000 1.00000
6 3.62999 osd.6 up 1.00000 1.00000
7 3.62999 osd.7 up 1.00000 1.00000
8 3.62999 osd.8 up 1.00000 1.00000
19 3.62999 osd.19 up 1.00000 1.00000
20 3.62999 osd.20 up 1.00000 1.00000
-3 7.20998 host node02
3 3.62999 osd.3 up 1.00000 1.00000
4 3.57999 osd.4 up 1.00000 1.00000
1 0 osd.1 down 0 1.00000
9 0 osd.9 down 0 1.00000
10 0 osd.10 down 0 1.00000
17 0 osd.17 down 0 1.00000
18 0 osd.18 down 0 1.00000
My main mistakes seem to have been running, in this order:
--------------------------------
ceph osd out osd.1
ceph auth del osd.1
systemctl stop ceph-osd@1
ceph osd rm 1
umount /var/lib/ceph/osd/ceph-1
ceph osd crush remove osd.1
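For comparison, the commonly recommended removal order (a sketch, not cluster-specific advice) drains the OSD and removes it from CRUSH *before* deleting its auth key, so the cluster can migrate the data away first. With size=1 the PGs on osd.1 held their only copy, so draining would have been the only safe path:

```shell
ceph osd out osd.1           # stop placing data, start draining the OSD
# ... wait until all PGs are active+clean again ...
systemctl stop ceph-osd@1
ceph osd crush remove osd.1
ceph auth del osd.1
ceph osd rm 1
umount /var/lib/ceph/osd/ceph-1
```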
As far as I can tell, Ceph still waits for and needs data from osd.1
(which I removed):
root@node03:~ # ceph health detail
HEALTH_WARN 53 pgs stale; 53 pgs stuck stale
pg 0.1a6 is stuck stale for 5086.552795, current state
stale+active+clean, last acting [1]
pg 0.142 is stuck stale for 5086.552784, current state
stale+active+clean, last acting [1]
pg 0.1e is stuck stale for 5086.552820, current state
stale+active+clean, last acting [1]
pg 0.e0 is stuck stale for 5086.552855, current state
stale+active+clean, last acting [1]
pg 0.1d is stuck stale for 5086.552822, current state
stale+active+clean, last acting [1]
pg 0.13c is stuck stale for 5086.552791, current state
stale+active+clean, last acting [1]
[...] SNIP [...]
pg 0.e9 is stuck stale for 5086.552955, current state
stale+active+clean, last acting [1]
pg 0.87 is stuck stale for 5086.552939, current state
stale+active+clean, last acting [1]
When I try to start OSD.1 manually, I get:
--------------------------------------------
2020-02-10 18:48:26.107444 7f9ce31dd880 0 ceph version 0.94.10
(b1e0532418e4631af01acbc0cedd426f1905f4af), process ceph-osd, pid
10210
2020-02-10 18:48:26.134417 7f9ce31dd880 0
filestore(/var/lib/ceph/osd/ceph-1) backend xfs (magic 0x58465342)
2020-02-10 18:48:26.184202 7f9ce31dd880 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is supported and appears to work
2020-02-10 18:48:26.184209 7f9ce31dd880 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
FIEMAP ioctl is disabled via 'filestore fiemap' config option
2020-02-10 18:48:26.184526 7f9ce31dd880 0
genericfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_features:
syncfs(2) syscall fully supported (by glibc and kernel)
2020-02-10 18:48:26.184585 7f9ce31dd880 0
xfsfilestorebackend(/var/lib/ceph/osd/ceph-1) detect_feature: extsize
is disabled by conf
2020-02-10 18:48:26.309755 7f9ce31dd880 0
filestore(/var/lib/ceph/osd/ceph-1) mount: enabling WRITEAHEAD journal
mode: checkpoint is not enabled
2020-02-10 18:48:26.633926 7f9ce31dd880 1 journal _open
/var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size
4096 bytes, directio = 1, aio = 1
2020-02-10 18:48:26.642185 7f9ce31dd880 1 journal _open
/var/lib/ceph/osd/ceph-1/journal fd 20: 5367660544 bytes, block size
4096 bytes, directio = 1, aio = 1
2020-02-10 18:48:26.664273 7f9ce31dd880 0 <cls>
cls/hello/cls_hello.cc:271: loading cls_hello
2020-02-10 18:48:26.732154 7f9ce31dd880 0 osd.1 6002 crush map has
features 1107558400, adjusting msgr requires for clients
2020-02-10 18:48:26.732163 7f9ce31dd880 0 osd.1 6002 crush map has
features 1107558400 was 8705, adjusting msgr requires for mons
2020-02-10 18:48:26.732167 7f9ce31dd880 0 osd.1 6002 crush map has
features 1107558400, adjusting msgr requires for osds
2020-02-10 18:48:26.732179 7f9ce31dd880 0 osd.1 6002 load_pgs
2020-02-10 18:48:31.939810 7f9ce31dd880 0 osd.1 6002 load_pgs opened 53 pgs
2020-02-10 18:48:31.940546 7f9ce31dd880 -1 osd.1 6002 log_to_monitors
{default=true}
2020-02-10 18:48:31.942471 7f9ce31dd880 1 journal close
/var/lib/ceph/osd/ceph-1/journal
2020-02-10 18:48:31.969205 7f9ce31dd880 -1 ** ERROR: osd
init failed: (1) Operation not permitted
It's mounted:
/dev/sdg1 3.7T 127G 3.6T 4% /var/lib/ceph/osd/ceph-1
Is there any way I can get osd.1 back in?
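For reference, one possible recovery path (a sketch only, not verified against this cluster): "osd init failed: (1) Operation not permitted" is typically cephx refusing the daemon because its key was deleted with `ceph auth del osd.1`. Since the keyring is still on the intact, mounted data disk, re-registering it and recreating the id removed by `ceph osd rm 1` may let the OSD rejoin; the CRUSH weight below is illustrative, copied from the peer OSDs on node02:

```shell
ceph osd create                    # should hand back the lowest free id, i.e. 1
ceph auth add osd.1 osd 'allow *' mon 'allow profile osd' \
    -i /var/lib/ceph/osd/ceph-1/keyring
ceph osd crush add osd.1 3.63 host=node02   # weight is illustrative
systemctl start ceph-osd@1
```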
Thanks a lot,
mario
Dear people on this mailing list,
I've got the "problem" that our MAX AVAIL value increases by about
5-10 TB when I reboot a whole OSD node. After the reboot the value
goes back to normal.
I would love to know WHY.
Under normal circumstances I would ignore this behavior, but because I
am very new to Ceph I would like to understand why things like this
happen.
What I have read is that this value is calculated from the most-filled OSD.
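That matches the usual explanation. The idea can be sketched as follows (a simplified model, not Ceph's exact code; the numbers are made up): MAX AVAIL is a projection of how much the pool can still grow before the most-constrained OSD fills up. While a node's OSDs are down during a reboot their stats are missing from the calculation, so if one of them was the fullest, the projection jumps up until they report back.

```python
def max_avail(osds, replicas):
    """Projected pool MAX AVAIL for a replicated pool (simplified).

    osds: list of (free_tb, crush_weight) for the OSDs backing the pool.
    Data lands on each OSD in proportion to its weight share, so the
    pool can only grow until the most constrained OSD fills up.
    """
    total_weight = sum(w for _, w in osds)
    projected_raw = min(free * total_weight / w for free, w in osds if w > 0)
    return projected_raw / replicas

# Three equal-weight OSDs, one much fuller than the rest:
cluster = [(10.0, 1.0), (10.0, 1.0), (2.0, 1.0)]
print(max_avail(cluster, replicas=3))   # 2.0: bounded by the fullest OSD

# While that OSD's node is rebooting and its stats are missing,
# the projection is computed from the remaining OSDs and rises:
print(max_avail(cluster[:2], replicas=3))
```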
I've set noout and norebalance while the node is offline and I unset
both values after the reboot.
We are currently on nautilus.
Cheers and thanks in advance
Boris
This means it has been applied:
# ceph osd dump -f json | jq .require_osd_release
"nautilus"
-- dan
On Mon, Feb 17, 2020 at 11:10 AM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
>
>
> How do you check if you issued this command in the past?
>
>
> -----Original Message-----
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] Re: Excessive write load on mons after upgrade
> from 12.2.13 -> 14.2.7
>
> Hi Peter,
>
> could be a totally different problem but did you run the command "ceph
> osd require-osd-release nautilus" after the upgrade?
> We had poor performance after upgrading to nautilus and running this
> command fixed it. The same was reported by others for previous updates.
> Here is my original message regarding this issue:
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/OYFRWSJXPV…
>
> We did not observe the master election problem though.
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
On Wed, May 27, 2020 at 10:09 PM Dylan McCulloch <dmc(a)unimelb.edu.au> wrote:
>
> Hi all,
>
> The single active MDS on one of our Ceph clusters is close to running out of RAM.
>
> MDS total system RAM = 528GB
> MDS current free system RAM = 4GB
> mds_cache_memory_limit = 451GB
> current mds cache usage = 426GB
This mds_cache_memory_limit is way too high for the available RAM. We
normally recommend that your RAM be 150% of your cache limit but we
lack data for such large cache sizes.
> Presumably we need to reduce our mds_cache_memory_limit and/or mds_max_caps_per_client, but would like some guidance on whether it’s possible to do that safely on a live production cluster when the MDS is already pretty close to running out of RAM.
>
> Cluster is Luminous - 12.2.12
> Running single active MDS with two standby.
> 890 clients
> Mix of kernel client (4.19.86) and ceph-fuse.
> Clients are 12.2.12 (398) and 12.2.13 (3)
v12.2.12 has the changes necessary to throttle MDS cache size
reduction. You should be able to reduce mds_cache_memory_limit to any
lower value without destabilizing the cluster.
> The kernel clients have stayed under “mds_max_caps_per_client”: “1048576". But the ceph-fuse clients appear to hold very large numbers according to the ceph-fuse asok.
> e.g.
> “num_caps”: 1007144398,
> “num_caps”: 1150184586,
> “num_caps”: 1502231153,
> “num_caps”: 1714655840,
> “num_caps”: 2022826512,
This data from the ceph-fuse asok is actually the number of caps ever
received, not the current number. I've created a ticket for this:
https://tracker.ceph.com/issues/45749
Look at the data from `ceph tell mds.foo session ls` instead.
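For example, a small sketch of pulling per-session cap counts out of that JSON (the sample is hand-made and trimmed to the fields used here; real session entries carry many more):

```python
import json

# Trimmed, hand-made sample of `ceph tell mds.<name> session ls` output;
# only "id" and "num_caps" are kept from the real schema.
session_ls = json.loads("""
[
  {"id": 4101, "num_caps": 1048576},
  {"id": 4102, "num_caps": 1200000},
  {"id": 4103, "num_caps": 900}
]
""")

def top_cap_holders(sessions, limit=3):
    """Sessions holding the most caps, largest first."""
    return sorted(sessions, key=lambda s: s["num_caps"], reverse=True)[:limit]

for s in top_cap_holders(session_ls):
    print("client.%d holds %d caps" % (s["id"], s["num_caps"]))
```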
> Dropping caches on the clients appears to reduce their cap usage but does not free up RAM on the MDS.
The MDS won't free up RAM until the cache memory limit is reached.
> What is the safest method to free cache and reduce RAM usage on the MDS in this situation (without having to evict or remount clients)?
reduce mds_cache_memory_limit
> I’m concerned that reducing mds_cache_memory_limit even in very small increments may trigger a large recall of caps and overwhelm the MDS.
That used to be the case in older versions of Luminous but not any longer.
--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
Hi there,
I am trying to get my head around RocksDB spillovers and how to deal
with them. In particular, I have one OSD which does not have any pools
associated with it (as per ceph pg ls-by-osd $osd), yet it does show up
in ceph health detail as:
osd.$osd spilled over 2.9 MiB metadata from 'db' device (49 MiB
used of 37 GiB) to slow device
Compaction doesn't help. I am well aware of
https://tracker.ceph.com/issues/38745 , yet I find it really
counter-intuitive that an empty OSD with a more-or-less optimally sized
DB volume can't fit its RocksDB on the former.
Is there any way to repair this, apart from re-creating the OSD? FWIW,
dumping the database with
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$osd dump >
bluestore_kv.dump
yields a file of less than 100 MB in size.
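In case it helps with cross-checking: the BlueFS usage the health warning is based on can also be read from the OSD's admin socket (a sketch; the exact set of socket commands varies a little between releases):

```shell
ceph daemon osd.$osd bluefs stats        # per-device (db/wal/slow) usage
ceph daemon osd.$osd perf dump bluefs    # db_used_bytes, slow_used_bytes, ...
ceph daemon osd.$osd compact             # trigger an online compaction
```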
And, while we're at it, a few more related questions:
- Am I right to assume that the leveldb and rocksdb arguments to
ceph-kvstore-tool are only relevant for OSDs with a FileStore backend?
- Does ceph-kvstore-tool bluestore-kv … also deal with RocksDB items for
OSDs with a BlueStore backend?
thank you very much & with kind regards,
thoralf.
Hi, I am trying to migrate a second Ceph cluster to cephadm. All the hosts migrated successfully from "legacy" except one of the OSD hosts (cephadm kept duplicating OSD ids, e.g. two "osd.5"; still not sure why). To make things easier, we re-provisioned the node (reinstalled from netinstall, applied the same SaltStack traits as the other nodes, wiped the disks) and tried to use cephadm to set up the OSDs.
So, orch correctly starts the provisioning process (a docker container running ceph-volume is created), but the provisioning never completes (docker exec):
# ps axu
root 1 0.1 0.2 99272 22488 ? Ss 15:26 0:01 /usr/libexec/platform-python -s /usr/sbin/ceph-volume lvm batch --no-auto /dev/sdb /dev/sdc --dmcrypt --yes --no-systemd
root 807 0.9 0.5 154560 44120 ? S<L 15:26 0:06 /usr/sbin/cryptsetup --key-file - --allow-discards luksOpen /dev/ceph-851cae40-3270-45ea-b788-be6e05465e92/osd-data-e3157b54-f6b9-4ec9-ab12-e289f52c00a4 Afr6Ct-ok4h-pBEy-GfFF-xxYl-EKwi-cHhjZc
# cat /var/log/ceph/ceph-volume.log
Running command: /usr/sbin/cryptsetup --batch-mode --key-file - luksFormat /dev/ceph-851cae40-3270-45ea-b788-be6e05465e92/osd-data-e3157b54-f6b9-4ec9-ab12-e289f52c00a4
Running command: /usr/sbin/cryptsetup --key-file - --allow-discards luksOpen /dev/ceph-851cae40-3270-45ea-b788-be6e05465e92/osd-data-e3157b54-f6b9-4ec9-ab12-e289f52c00a4 Afr6Ct-ok4h-pBEy-GfFF-xxYl-EKwi-cHhjZc
# docker ps
2956dec0450d ceph/ceph:v15 "/usr/sbin/ceph-volu…" 14 minutes ago Up 14 minutes condescending_nightingale
# cat osd_spec_default.yaml
service_type: osd
service_id: osd_spec_default
placement:
  host_pattern: '*'
data_devices:
  all: true
encrypted: true
It looks like cephadm hangs on luksOpen.
Is this expected? Encryption is mentioned as supported, although there seems to be no documentation for it.
This is again about our bad cluster with too many objects, where the
HDD OSDs have a DB device that is (much) too small (e.g. 20 GB, i.e.
3 GB usable). Now several OSDs do not come up any more.
Typical error message:
/build/ceph-14.2.8/src/os/bluestore/BlueFS.cc: 2261: FAILED
ceph_assert(h->file->fnode.ino != 1)
I also just tried to add a few GB to the DB device (lvextend,
ceph-bluestore-tool bluefs-bdev-expand), but that crashes too, with the
same message.
Options that helped us before (thanks Wido :-) do not help here, e.g.
CEPH_ARGS="--bluestore-rocksdb-options compaction_readahead_size=0"
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$OSD compact
Any ideas that I could try to save these OSDs?
Cheers
Harry
Hi,
I have 15.2.1 installed on all machines. On the primary machine I executed the ceph upgrade command:
$ ceph orch upgrade start --ceph-version 15.2.2
When I check ceph -s I see this:
progress:
Upgrade to docker.io/ceph/ceph:v15.2.2 (30m)
[=...........................] (remaining: 8h)
It says 8 hours, but it has already been running for 3 hours and no
daemon has been upgraded; it is stuck at this point.
Is there any way to find out why it is stuck?
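For anyone hitting the same thing, a few commands that should show what the orchestrator is doing (a sketch for Octopus; daemon names will differ per cluster):

```shell
ceph orch upgrade status     # target image and current upgrade state
ceph health detail           # upgrade problems surface as health warnings
ceph -W cephadm              # follow the cephadm module's log live
ceph orch ps                 # which daemons are running which version
```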
Thanks,
Gencer.