May 2021 - ceph-users - lists.ceph.io

by Szabo, Istvan (Agoda)

Hi, I had issue with the snaptrim after a hug amount of deleted data, it slows down the team operations due to the snaptrim and snaptrim_wait pgs. I've changed couple of things: debug_ms = 0/0 #default 0/5 osd_snap_trim_priority = 1 # default 5 osd_pg_max_concurrent_snap_trims = 1 # default 2 But didn't help. I've found this thread about buffered io and seems like it helped to them: https://forum.proxmox.com/threads/ceph-storage-all-pgs-snaptrim-every-night… I don't use swap on the OSD nodes, so I gave a try on 1 osd node and it caused basically the complete node's pg-s are degraded. Is it normal? I hope it will not rebalance the complete node because I don't have space for that. I changed it back but still slowly decreasing, so not sure this settings is correct or not or this behavior is good or not? 2021-05-14 12:18:11.447628 mon.2004 [WRN] Health check update: 3353/91976715 objects misplaced (0.004%) (OBJECT_MISPLACED) 2021-05-14 12:18:11.447640 mon.2004 [WRN] Health check update: Degraded data redundancy: 33078466/91976715 objects degraded (35.964%), 254 pgs degraded, 253 pgs undersized (PG_DEGRADED) Istvan Szabo Senior Infrastructure Engineer --------------------------------------------------- Agoda Services Co., Ltd. e: istvan.szabo(a)agoda.com<mailto:istvan.szabo@agoda.com> --------------------------------------------------- ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

2 years, 11 months

4
17
0 0

Octopus MDS hang under heavy setfattr load

by Nigel Williams

One of my colleagues attempted to set quotas on a large number (some dozens) of users with the session below, but it caused the MDS to hang and reject client requests. Offending command was: cat recent-users | xargs -P16 -I% setfattr -n ceph.quota.max_bytes -v 8796093022208 /scratch/% Result was to hang /scratch and any other mounts managed by the same MDS on all clients. Status of ceph-mds while broken was: root@cnx-14:~# systemctl status ceph-mds@cnx-14 ● ceph-mds(a)cnx-14.service - Ceph metadata server daemon Loaded: loaded (/lib/systemd/system/ceph-mds@.service; indirect; vendor preset: enabled) Active: active (running) since Thu 2021-05-06 17:16:45 AEST; 1 weeks 3 days ago Main PID: 2385 (ceph-mds) Tasks: 23 CGroup: /system.slice/system-ceph\x2dmds.slice/ceph-mds(a)cnx-14.service └─2385 /usr/bin/ceph-mds -f --cluster ceph --id cnx-14 --setuser ceph --setgroup cephMay 13 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-13T06:25:01.724+1000 7f5444832700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-m May 13 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-13T06:25:01.736+1000 7f5444832700 -1 received signal: Hangup from (PID: 229281) UID: 0 May 14 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-14T06:25:01.992+1000 7f5444832700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-m May 14 06:25:02 cnx-14 ceph-mds[2385]: 2021-05-14T06:25:02.004+1000 7f5444832700 -1 received signal: Hangup from (PID: 232464) UID: 0 May 15 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-15T06:25:01.468+1000 7f5444832700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-m May 15 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-15T06:25:01.480+1000 7f5444832700 -1 received signal: Hangup from (PID: 236005) UID: 0 May 16 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-16T06:25:01.989+1000 7f5444832700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-m May 16 06:25:02 cnx-14 ceph-mds[2385]: 2021-05-16T06:25:02.001+1000 7f5444832700 -1 received signal: Hangup from (PID: 239260) UID: 0 May 17 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-17T06:25:01.813+1000 7f5444832700 -1 received signal: Hangup from killall -q -1 ceph-mon ceph-m May 17 06:25:01 cnx-14 ceph-mds[2385]: 2021-05-17T06:25:01.829+1000 7f5444832700 -1 received signal: Hangup from (PID: 242044) UID: 0 Fix was to run: systemctl restart ceph-mds@cnx-14 Non parallelised run of xargs with sleep 1 between each iteration worked.

2 years, 11 months

1
0
0 0

Re: Limit memory of ceph-mgr

by mabi

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Monday, May 17, 2021 4:31 AM, Anthony D'Atri <anthony.datri(a)gmail.com> wrote: > You’re running on so small a node that 3.6GB is a problem?? Yes, I have hardware constraints based on the hardware where my hardware has maximum 8 GB of RAM as it is a Raspberry Pi 4. I am doing a proof of concept to see if it is possible to run a small cluster on this type of hardware. So it is no problem to me to scale horizontally (add more nodes) but scaling vertically (adding more RAM or CPU) is not possible. It is also a great way of learning and experiencing ceph and see what can be done in terms of optimization.

2 years, 11 months

1
0
0 0

after upgrade to 16.2.3 16.2.4 and after adding few hdd's OSD's started to fail 1 by 1.

by Andrius Jurkus

Hello, I will try to keep it sad and short :) :( PS sorry if this dublicate I tried post it from web also. Today I upgraded from 16.2.3 to 16.2.4 and added few hosts and osds. After data migration for few hours, 1 SSD failed, then another and another 1 by 1. Now I have cluster in pause and 5 failed SSD's, same host has SSD and HDD, but only SSD's are failing so I think this has to be balancing refiling or something bug and probably not upgrade bug. Cluster has been in pause for 4 hours and no more OSD's are failing. full trace https://pastebin.com/UxbfFYpb Now I m googling and learning but, Is there a way how to easily test lets say 15.2.XX version on osd without losing anything? Any help would be appreciated. Error start like this May 14 16:58:52 dragon-ball-radar systemd[1]: Starting Ceph osd.2 for 4e01640b-951b-4f75-8dca-0bad4faf1b11... May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14 16:58:53.057836433 +0000 UTC m=+0.454352919 container create 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate, GIT_BRANCH=HEAD, maintainer=D May 14 16:58:53 dragon-ball-radar systemd[1]: Started libcrun container. May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14 16:58:53.3394116 +0000 UTC m=+0.735928098 container init 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate, maintainer=Dimitri Savineau <dsav May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14 16:58:53.446921192 +0000 UTC m=+0.843437626 container start 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate, GIT_BRANCH=HEAD, org.label-sch May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14 16:58:53.447050119 +0000 UTC m=+0.843566553 container attach 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate, org.label-schema.name=CentOS May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2 May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-45e6ef2e-fbdc-4289-a900-3d1ffc81ee14/osd-block-973cfe73-06c8-4ea0-9aea-1361d063eb25 --path /var/lib/ceph/osd/ceph-2 --no-mon-config May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: /usr/bin/ln -snf /dev/ceph-45e6ef2e-fbdc-4289-a900-3d1ffc81ee14/osd-block-973cfe73-06c8-4ea0-9aea-1361d063eb25 /var/lib/ceph/osd/ceph-2/block May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: /usr/bin/chown -R ceph:ceph /dev/dm-1 May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2 May 14 16:58:53 dragon-ball-radar bash[113558]: --> ceph-volume lvm activate successful for osd ID: 2 May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14 16:58:53.8147653 +0000 UTC m=+1.211281741 container died 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate) May 14 16:58:55 dragon-ball-radar podman[113650]: 2021-05-14 16:58:55.044964534 +0000 UTC m=+2.441480996 container remove 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate, CEPH_POINT_RELEASE=-16.2.4, R May 14 16:58:55 dragon-ball-radar podman[113909]: 2021-05-14 16:58:55.594265612 +0000 UTC m=+0.369978347 container create 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2, RELEASE=HEAD, org.label-schema.build-d May 14 16:58:55 dragon-ball-radar podman[113909]: 2021-05-14 16:58:55.864589286 +0000 UTC m=+0.640302021 container init 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2, org.label-schema.schema-version=1.0, GIT May 14 16:58:55 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:55.896+0000 7fcf16aa2080 0 set uid:gid to 167:167 (ceph:ceph) May 14 16:58:55 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:55.896+0000 7fcf16aa2080 0 ceph version 16.2.4 (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable), process ceph-osd, pid 2 May 14 16:58:55 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:55.896+0000 7fcf16aa2080 0 pidfile_write: ignore empty --pid-file May 14 16:58:55 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:55.896+0000 7fcf16aa2080 1 bdev(0x564ad3a8c800 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block May 14 16:58:55 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8c800 /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, 466 GiB) block_size 4096 (4 KiB) non-rotational discard supported May 14 16:58:55 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size 3221225472 meta 0.45 kv 0.45 data 0.06 May 14 16:58:55 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8cc00 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block May 14 16:58:55 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8cc00 /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, 466 GiB) block_size 4096 (4 KiB) non-rotational discard supported May 14 16:58:55 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bluefs add_block_device bdev 1 path /var/lib/ceph/osd/ceph-2/block size 466 GiB May 14 16:58:55 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8cc00 /var/lib/ceph/osd/ceph-2/block) close May 14 16:58:55 dragon-ball-radar podman[113909]: 2021-05-14 16:58:55.972267166 +0000 UTC m=+0.747979911 container start 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2, ceph=True, GIT_REPO=https://github.com/ May 14 16:58:55 dragon-ball-radar bash[113558]: 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da May 14 16:58:55 dragon-ball-radar systemd[1]: Started Ceph osd.2 for 4e01640b-951b-4f75-8dca-0bad4faf1b11. May 14 16:58:56 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:56.184+0000 7fcf16aa2080 1 bdev(0x564ad3a8c800 /var/lib/ceph/osd/ceph-2/block) close May 14 16:58:56 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:56.444+0000 7fcf16aa2080 1 objectstore numa_node 0 May 14 16:58:56 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:56.444+0000 7fcf16aa2080 0 starting osd.2 osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal May 14 16:58:56 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:56.444+0000 7fcf16aa2080 -1 unable to find any IPv4 address in networks '10.0.199.0/24' interfaces '' May 14 16:58:56 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:56.444+0000 7fcf16aa2080 -1 unable to find any IPv4 address in networks '172.16.199.0/24' interfaces '' May 14 16:58:56 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:56.452+0000 7fcf16aa2080 0 load: jerasure load: lrc load: isa May 14 16:58:56 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1 bdev(0x564ad476e400 /var/lib/ceph/osd/ceph-2/block) open path /var/lib/ceph/osd/ceph-2/block May 14 16:58:56 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1 bdev(0x564ad476e400 /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, 466 GiB) block_size 4096 (4 KiB) non-rotational discard supported May 14 16:58:56 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1 bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size 3221225472 meta 0.45 kv 0.45 data 0.06 May 14 16:58:56 dragon-ball-radar conmon[113957]: debug 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1 bdev(0x564ad476e400 /var/lib/ceph/osd/ceph-2/block) close

2 years, 11 months

4
4
1 0

v16.2.4 Pacific released

by David Galloway

This is a hotfix release addressing a number of security issues and regressions. We recommend all users update to this release. For a detailed release notes with links & changelog please refer to the official blog entry at https://ceph.io/releases/v16-2-4-pacific-released Getting Ceph ------------ * Git at git://github.com/ceph/ceph.git * Tarball at https://download.ceph.com/tarballs/ceph-16.2.4.tar.gz * For packages, see https://docs.ceph.com/docs/master/install/get-packages/ * Release git sha1: 3cbe25cde3cfa028984618ad32de9edc4c1eaed0

2 years, 11 months

2
1
0 0

CephFS Snaptrim stuck?

by Andras Sali

Dear Ceph Users, We are experiencing a strange behaviour on Ceph v15.2.9 that a set of PGs seem to be stuck in active + clean + snaptrim state. (for almost a day now) Usually snaptrim is quite fast (done in a few minutes), however now in the osd logs we see slowly increasing trimq numbers, with such entries coming constantly (every few seconds): 2021-05-16T13:58:28.795+0000 7fc668f2e700 -1 osd.9 pg_epoch: 91137 pg[2.39( v 91137'1584600 (91059'1581379,91137'1584600] local-lis/les=91119/91120 n=115714 ec=165/115 lis/c=91119/91119 les/c/f=91120/91120/0 sis=91119) [9,10,3] r=0 lpr=91119 luod=91137'1584598 crt=91137'1584600 lcod 91137'1584597 mlcod 91137'1584597 active+clean+snaptrim* trimq=82* ps=[75fe~1,7600~1,868a~1,8caa~1,a30c~1,a422~1,a65e~1,c0cf~1,c569~1]] removing snap head We tried restarting the OSD-s, but no difference. Otherwise the cluster reports itself as being healthy. If anyone has any ideas what might be causing this and how to get these PGs finished with the snaptrim state, that would be very much appreciated. Kind regards, András Sali

2 years, 11 months

1
0
0 0

dedicated metadata servers

by mabi

Hello, On my small Octopus 6 nodes cluster I have only 8 GB per node available and I am co-locating the active and standby MDS on two out of the 3 OSD nodes. But because memory is tight (8GB per node max, limited due to hardware constraints) I was thinking to add two new 8GB nodes dedicated only for MDS so they can benefit of more RAM. So my question is, is this a good idea? Are there any downsides of having MDS active and passive on two dedicated nodes? Regards, Mabi

2 years, 11 months

2
1
0 0

radosgw lost config during upgrade 14.2.16 -> 21

by Jan Kasprzak

Hello, I have just upgraded my cluster from 14.2.16 to 14.2.21, and after the upgrade, radosgw was listening on the default port 7480 instead of the SSL port it used before the upgrade. It might be I mishandled "ceph config assimilate-conf" previously or forgot to restart radosgw after the assimilate-conf or something. What is the correct way to store radosgw configuration in ceph config? I have the following (which I think worked previously, but I might be wrong, e.g. forgot to restart radosgw or something): # ceph config dump [...] client.rgw.<myhost> basic rgw_frontends beast ssl_port=<myport> ssl_certificate=/etc/pki/tls/certs/<myhost>.crt+bundle ssl_private_key=/etc/pki/tls/private/<myhost>.key * However, after rgw startup, there was the following in /var/log/ceph/ceph-client.rgw.<myhost>.log: 2021-05-14 21:38:35.075 7f6ffd621900 1 mgrc service_daemon_register rgw.<myhost> metadata {arch=x86_64,ceph_release=nautilus,ceph_version=ceph version 14.2.21 (5ef401921d7a88aea18ec7558f7f9374ebd8f5a6) nautilus (stable),ceph_version_short=14.2.21,cpu=AMD ...,distro=centos,distro_description=CentOS Linux 7 (Core),distro_version=7,frontend_config#0=beast port=7480,frontend_type#0=beast,hostname=<myhost>,kernel_description=#1 SMP ...,kernel_version=...,mem_swap_kb=...,mem_total_kb=...,num_handles=1,os=Linux,pid=20451,zone_id=...,zone_name=default,zonegroup_id=...,zonegroup_name=default} (note the port=7480 and no SSL). After adding the following into /etc/ceph/ceph.conf on the host where rgw is running, it started to use the correct SSL port again: [client.rgw.<myhost>] rgw_frontends = beast ssl_port=<myport> ssl_certificate=/etc/pki/tls/certs/<myhost>.crt+bundle ssl_private_key=/etc/pki/tls/private/<myhost>.key How can I configure this using "ceph config"? Thanks, -Yenya -- | Jan "Yenya" Kasprzak <kas at {fi.muni.cz - work | yenya.net - private}> | | http://www.fi.muni.cz/~kas/ GPG: 4096R/A45477D5 | We all agree on the necessity of compromise. We just can't agree on when it's necessary to compromise. --Larry Wall

2 years, 11 months

2
1
0 0

Re: after upgrade to 16.2.3 16.2.4 and after adding few hdd's OSD's started to fail 1 by 1.

by Neha Ojha

You are welcome! We still need to get to the bottom of this, I will update the tracker to make a note of this occurrence. Thanks, Neha On Fri, May 14, 2021 at 12:25 PM Andrius Jurkus <andrius.jurkus(a)unishop.lt> wrote: > > Big thanks, Much appreciated help. > > It probably is same bug. > > bluestore_allocator = bitmap > > by setting this parameter all failed OSD started. > > Thanks again! > > On 2021-05-14 21:09, Neha Ojha wrote: > > On Fri, May 14, 2021 at 10:47 AM Andrius Jurkus > > <andrius.jurkus(a)unishop.lt> wrote: > >> > >> Hello, I will try to keep it sad and short :) :( PS sorry if this > >> dublicate I tried post it from web also. > >> > >> Today I upgraded from 16.2.3 to 16.2.4 and added few hosts and osds. > >> After data migration for few hours, 1 SSD failed, then another and > >> another 1 by 1. Now I have cluster in pause and 5 failed SSD's, same > >> host has SSD and HDD, but only SSD's are failing so I think this has > >> to > >> be balancing refiling or something bug and probably not upgrade bug. > >> > >> Cluster has been in pause for 4 hours and no more OSD's are failing. > >> > >> full trace > >> https://pastebin.com/UxbfFYpb > > > > This looks very similar to https://tracker.ceph.com/issues/50656. > > Adding Igor for more ideas. > > > > Neha > > > >> > >> Now I m googling and learning but, Is there a way how to easily test > >> lets say 15.2.XX version on osd without losing anything? > >> > >> Any help would be appreciated. > >> > >> Error start like this > >> > >> May 14 16:58:52 dragon-ball-radar systemd[1]: Starting Ceph osd.2 for > >> 4e01640b-951b-4f75-8dca-0bad4faf1b11... > >> May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14 > >> 16:58:53.057836433 +0000 UTC m=+0.454352919 container create > >> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd > >> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, > >> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate, > >> GIT_BRANCH=HEAD, maintainer=D > >> May 14 16:58:53 dragon-ball-radar systemd[1]: Started libcrun > >> container. > >> May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14 > >> 16:58:53.3394116 +0000 UTC m=+0.735928098 container init > >> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd > >> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, > >> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate, > >> maintainer=Dimitri Savineau <dsav > >> May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14 > >> 16:58:53.446921192 +0000 UTC m=+0.843437626 container start > >> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd > >> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, > >> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate, > >> GIT_BRANCH=HEAD, org.label-sch > >> May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14 > >> 16:58:53.447050119 +0000 UTC m=+0.843566553 container attach > >> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd > >> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, > >> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate, > >> org.label-schema.name=CentOS > >> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: > >> /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2 > >> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: > >> /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev > >> /dev/ceph-45e6ef2e-fbdc-4289-a900-3d1ffc81ee14/osd-block-973cfe73-06c8-4ea0-9aea-1361d063eb25 > >> --path /var/lib/ceph/osd/ceph-2 --no-mon-config > >> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: > >> /usr/bin/ln -snf > >> /dev/ceph-45e6ef2e-fbdc-4289-a900-3d1ffc81ee14/osd-block-973cfe73-06c8-4ea0-9aea-1361d063eb25 > >> /var/lib/ceph/osd/ceph-2/block > >> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: > >> /usr/bin/chown -h ceph:ceph /var/lib/ceph/osd/ceph-2/block > >> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: > >> /usr/bin/chown -R ceph:ceph /dev/dm-1 > >> May 14 16:58:53 dragon-ball-radar bash[113558]: Running command: > >> /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-2 > >> May 14 16:58:53 dragon-ball-radar bash[113558]: --> ceph-volume lvm > >> activate successful for osd ID: 2 > >> May 14 16:58:53 dragon-ball-radar podman[113650]: 2021-05-14 > >> 16:58:53.8147653 +0000 UTC m=+1.211281741 container died > >> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd > >> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, > >> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate) > >> May 14 16:58:55 dragon-ball-radar podman[113650]: 2021-05-14 > >> 16:58:55.044964534 +0000 UTC m=+2.441480996 container remove > >> 3b44520aa651b8196cd0bf0c96daa2bd03845ef5f8cfaf9a689410a1f98d84dd > >> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, > >> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2-activate, > >> CEPH_POINT_RELEASE=-16.2.4, R > >> May 14 16:58:55 dragon-ball-radar podman[113909]: 2021-05-14 > >> 16:58:55.594265612 +0000 UTC m=+0.369978347 container create > >> 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da > >> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, > >> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2, RELEASE=HEAD, > >> org.label-schema.build-d > >> May 14 16:58:55 dragon-ball-radar podman[113909]: 2021-05-14 > >> 16:58:55.864589286 +0000 UTC m=+0.640302021 container init > >> 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da > >> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, > >> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2, > >> org.label-schema.schema-version=1.0, GIT > >> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:55.896+0000 7fcf16aa2080 0 set uid:gid to 167:167 > >> (ceph:ceph) > >> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:55.896+0000 7fcf16aa2080 0 ceph version 16.2.4 > >> (3cbe25cde3cfa028984618ad32de9edc4c1eaed0) pacific (stable), process > >> ceph-osd, pid 2 > >> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:55.896+0000 7fcf16aa2080 0 pidfile_write: ignore > >> empty > >> --pid-file > >> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:55.896+0000 7fcf16aa2080 1 bdev(0x564ad3a8c800 > >> /var/lib/ceph/osd/ceph-2/block) open path > >> /var/lib/ceph/osd/ceph-2/block > >> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8c800 > >> /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, > >> 466 GiB) block_size 4096 (4 KiB) non-rotational discard supported > >> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 > >> bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size > >> 3221225472 meta 0.45 kv 0.45 data 0.06 > >> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8cc00 > >> /var/lib/ceph/osd/ceph-2/block) open path > >> /var/lib/ceph/osd/ceph-2/block > >> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8cc00 > >> /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, > >> 466 GiB) block_size 4096 (4 KiB) non-rotational discard supported > >> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bluefs add_block_device > >> bdev > >> 1 path /var/lib/ceph/osd/ceph-2/block size 466 GiB > >> May 14 16:58:55 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:55.900+0000 7fcf16aa2080 1 bdev(0x564ad3a8cc00 > >> /var/lib/ceph/osd/ceph-2/block) close > >> May 14 16:58:55 dragon-ball-radar podman[113909]: 2021-05-14 > >> 16:58:55.972267166 +0000 UTC m=+0.747979911 container start > >> 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da > >> (image=docker.io/ceph/ceph@sha256:54e95ae1e11404157d7b329d0bef866ebbb214b195a009e87aae4eba9d282949, > >> name=ceph-4e01640b-951b-4f75-8dca-0bad4faf1b11-osd.2, ceph=True, > >> GIT_REPO=https://github.com/ > >> May 14 16:58:55 dragon-ball-radar bash[113558]: > >> 31364008fcb8b290643d6e892fba16d19618f5682f590373feabed23061749da > >> May 14 16:58:55 dragon-ball-radar systemd[1]: Started Ceph osd.2 for > >> 4e01640b-951b-4f75-8dca-0bad4faf1b11. > >> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:56.184+0000 7fcf16aa2080 1 bdev(0x564ad3a8c800 > >> /var/lib/ceph/osd/ceph-2/block) close > >> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:56.444+0000 7fcf16aa2080 1 objectstore numa_node 0 > >> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:56.444+0000 7fcf16aa2080 0 starting osd.2 osd_data > >> /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal > >> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:56.444+0000 7fcf16aa2080 -1 unable to find any IPv4 > >> address in networks '10.0.199.0/24' interfaces '' > >> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:56.444+0000 7fcf16aa2080 -1 unable to find any IPv4 > >> address in networks '172.16.199.0/24' interfaces '' > >> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:56.452+0000 7fcf16aa2080 0 load: jerasure load: lrc > >> load: isa > >> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1 bdev(0x564ad476e400 > >> /var/lib/ceph/osd/ceph-2/block) open path > >> /var/lib/ceph/osd/ceph-2/block > >> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1 bdev(0x564ad476e400 > >> /var/lib/ceph/osd/ceph-2/block) open size 500103643136 (0x7470800000, > >> 466 GiB) block_size 4096 (4 KiB) non-rotational discard supported > >> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1 > >> bluestore(/var/lib/ceph/osd/ceph-2) _set_cache_sizes cache_size > >> 3221225472 meta 0.45 kv 0.45 data 0.06 > >> May 14 16:58:56 dragon-ball-radar conmon[113957]: debug > >> 2021-05-14T16:58:56.456+0000 7fcf16aa2080 1 bdev(0x564ad476e400 > >> /var/lib/ceph/osd/ceph-2/block) close > >> _______________________________________________ > >> ceph-users mailing list -- ceph-users(a)ceph.io > >> To unsubscribe send an email to ceph-users-leave(a)ceph.io > >> >

2 years, 11 months

1
0
0 0

Re: "No space left on device" when deleting a file

by Mark Schouten

[Resent because of incorrect ceph-users@ address..] On Tue, Mar 26, 2019 at 05:19:24PM +0000, Toby Darling wrote: > Hi Dan > > Thanks! > > ceph tell mds.ceph1 config set mds_bal_fragment_size_max 200000 > > got us running again. This helped me too. However, should I see num_strays decrease again? I'm running a `find -ls` over my CephFS tree.. -- Mark Schouten | Tuxis B.V. KvK: 74698818 | http://www.tuxis.nl/ T: +31 318 200208 | info(a)tuxis.nl

2 years, 11 months

1
2
0 0

2024

2023

2022

2021

2020

2019

ceph-users May 2021