Hello,
my cluster is currently showing a metadata imbalance. Normally, all OSDs
have around 23 GB of metadata (META column), but 4 out of 56 OSDs have 34 GB.
Compacting reduces the metadata for some OSDs, but not for others, and the
OSDs where the compaction worked quickly grow back to 34 GB.
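For reference (illustrative commands, not necessarily the exact ones used; osd.0 is just an example ID), the per-OSD usage check and a manual compaction look roughly like this:
ceph osd df tree              # per-OSD usage, including the OMAP and META columns
ceph daemon osd.0 compact     # online RocksDB compaction via the admin socket, run on the OSD's host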
Our cluster configuration:
* 8 nodes, each with 6 HDD OSDs and 1 SSD used for block.db and WAL
* k=4 m=2 EC
* v14.2.14
Normal OSD:
ID CLASS WEIGHT   REWEIGHT SIZE   RAW USE DATA    OMAP    META   AVAIL   %USE  VAR  PGS STATUS
40 hdd   11.09470  1.00000 11 TiB 8.6 TiB 8.4 TiB 1.3 GiB 23 GiB 2.5 TiB 77.15 1.01 130 up
Big OSD:
ID CLASS WEIGHT   REWEIGHT SIZE   RAW USE DATA    OMAP    META   AVAIL   %USE  VAR  PGS STATUS
 0 hdd   11.09499  1.00000 11 TiB 8.6 TiB 8.4 TiB 1.8 GiB 30 GiB 2.5 TiB 77.59 1.02 130 up
There are 56 OSDs in the cluster, 4 of which are bigger. These OSDs are all
on different hosts.
Why is that? Is it dangerous, or could it lead to problems such as
performance degradation?
Thanks,
Paul
Hi,
I'm still evaluating Ceph 15.2.5 in a lab, so the problem is not really hurting me, but I want to understand it and hopefully fix it; it's good practice. To test the resilience of the cluster I try to break it by doing all kinds of things. Today I powered off (clean shutdown) one OSD node and powered it back on. The last time I tried this there was no problem getting it back online: after a few minutes the cluster health was back to OK. This time it stayed degraded forever.
I checked and noticed that the service osd.0 on the OSD node was failing. So I googled it, and people recommended simply deleting the OSD and re-creating it. I tried that and still can't get the OSD back in service.
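For context, the failing OSD service can be checked on the node itself; a minimal sketch, assuming cephadm's ceph-<fsid>@osd.<id> systemd unit naming and the cluster fsid shown in the output below:
systemctl status ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
cephadm logs --name osd.0     # fetch the logs of the containerized daemon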
First I removed the osd:
[root@gedasvl02 ~]# ceph osd out 0
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
osd.0 is already out.
[root@gedasvl02 ~]# ceph auth del 0
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
Error EINVAL: bad entity name
[root@gedasvl02 ~]# ceph auth del osd.0
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
updated
[root@gedasvl02 ~]# ceph osd rm 0
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
removed osd.0
[root@gedasvl02 ~]# ceph osd tree
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.43658 root default
-7 0.21829 host gedaopl01
2 ssd 0.21829 osd.2 up 1.00000 1.00000
-3 0 host gedaopl02
-5 0.21829 host gedaopl03
3 ssd 0.21829 osd.3 up 1.00000 1.00000
Looks OK, it's gone...
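As an aside, the three commands above can usually be collapsed into a single purge, which removes the OSD from the CRUSH map, deletes its auth entry and removes it from the OSD map in one step (sketch, assuming OSD ID 0):
ceph osd purge 0 --yes-i-really-mean-it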
Then I zapped it:
[root@gedasvl02 ~]# ceph orch device zap gedaopl02 /dev/sdb --force
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
INFO:cephadm:/usr/bin/podman:stderr WARNING: The same type, major and minor should not be used for multiple devices.
INFO:cephadm:/usr/bin/podman:stderr --> Zapping: /dev/sdb
INFO:cephadm:/usr/bin/podman:stderr --> Zapping lvm member /dev/sdb. lv_path is /dev/ceph-3bf1bb28-0858-4464-a848-d7f56319b40a/osd-block-3a79800d-2a19-45d8-a850-82c6a8113323
INFO:cephadm:/usr/bin/podman:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/ceph-3bf1bb28-0858-4464-a848-d7f56319b40a/osd-block-3a79800d-2a19-45d8-a850-82c6a8113323 bs=1M count=10 conv=fsync
INFO:cephadm:/usr/bin/podman:stderr stderr: 10+0 records in
INFO:cephadm:/usr/bin/podman:stderr 10+0 records out
INFO:cephadm:/usr/bin/podman:stderr 10485760 bytes (10 MB, 10 MiB) copied, 0.0314447 s, 333 MB/s
INFO:cephadm:/usr/bin/podman:stderr stderr:
INFO:cephadm:/usr/bin/podman:stderr --> Only 1 LV left in VG, will proceed to destroy volume group ceph-3bf1bb28-0858-4464-a848-d7f56319b40a
INFO:cephadm:/usr/bin/podman:stderr Running command: /usr/sbin/vgremove -v -f ceph-3bf1bb28-0858-4464-a848-d7f56319b40a
INFO:cephadm:/usr/bin/podman:stderr stderr: Removing ceph--3bf1bb28--0858--4464--a848--d7f56319b40a-osd--block--3a79800d--2a19--45d8--a850--82c6a8113323 (253:0)
INFO:cephadm:/usr/bin/podman:stderr stderr: Archiving volume group "ceph-3bf1bb28-0858-4464-a848-d7f56319b40a" metadata (seqno 5).
INFO:cephadm:/usr/bin/podman:stderr stderr: Releasing logical volume "osd-block-3a79800d-2a19-45d8-a850-82c6a8113323"
INFO:cephadm:/usr/bin/podman:stderr stderr: Creating volume group backup "/etc/lvm/backup/ceph-3bf1bb28-0858-4464-a848-d7f56319b40a" (seqno 6).
INFO:cephadm:/usr/bin/podman:stderr stdout: Logical volume "osd-block-3a79800d-2a19-45d8-a850-82c6a8113323" successfully removed
INFO:cephadm:/usr/bin/podman:stderr stderr: Removing physical volume "/dev/sdb" from volume group "ceph-3bf1bb28-0858-4464-a848-d7f56319b40a"
INFO:cephadm:/usr/bin/podman:stderr stdout: Volume group "ceph-3bf1bb28-0858-4464-a848-d7f56319b40a" successfully removed
INFO:cephadm:/usr/bin/podman:stderr Running command: /usr/bin/dd if=/dev/zero of=/dev/sdb bs=1M count=10 conv=fsync
INFO:cephadm:/usr/bin/podman:stderr stderr: 10+0 records in
INFO:cephadm:/usr/bin/podman:stderr 10+0 records out
INFO:cephadm:/usr/bin/podman:stderr stderr: 10485760 bytes (10 MB, 10 MiB) copied, 0.0355641 s, 295 MB/s
INFO:cephadm:/usr/bin/podman:stderr --> Zapping successful for: <Raw Device: /dev/sdb>
And re-added it:
[root@gedasvl02 ~]# ceph orch daemon add osd gedaopl02:/dev/sdb
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
Created osd(s) 0 on host 'gedaopl02'
But the osd is still out...
[root@gedasvl02 ~]# ceph osd tree
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.43658 root default
-7 0.21829 host gedaopl01
2 ssd 0.21829 osd.2 up 1.00000 1.00000
-3 0 host gedaopl02
-5 0.21829 host gedaopl03
3 ssd 0.21829 osd.3 up 1.00000 1.00000
0 0 osd.0 down 0 1.00000
Looking at the cluster log in the web UI, I see the following error:
Failed to apply osd.dashboard-admin-1606745745154 spec DriveGroupSpec(name=dashboard-admin-1606745745154->placement=PlacementSpec(host_pattern='*'), service_id='dashboard-admin-1606745745154', service_type='osd', data_devices=DeviceSelection(size='223.6GB', rotational=False, all=False), osd_id_claims={}, unmanaged=False, filter_logic='AND', preview_only=False): No filters applied
Traceback (most recent call last):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2108, in _apply_all_services
    if self._apply_service(spec):
  File "/usr/share/ceph/mgr/cephadm/module.py", line 2005, in _apply_service
    self.osd_service.create_from_spec(cast(DriveGroupSpec, spec))
  File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 43, in create_from_spec
    ret = create_from_spec_one(self.prepare_drivegroup(drive_group))
  File "/usr/share/ceph/mgr/cephadm/services/osd.py", line 127, in prepare_drivegroup
    drive_selection = DriveSelection(drive_group, inventory_for_host)
  File "/lib/python3.6/site-packages/ceph/deployment/drive_selection/selector.py", line 32, in __init__
    self._data = self.assign_devices(self.spec.data_devices)
  File "/lib/python3.6/site-packages/ceph/deployment/drive_selection/selector.py", line 138, in assign_devices
    if not all(m.compare(disk) for m in FilterGenerator(device_filter)):
  File "/lib/python3.6/site-packages/ceph/deployment/drive_selection/selector.py", line 138, in <genexpr>
    if not all(m.compare(disk) for m in FilterGenerator(device_filter)):
  File "/lib/python3.6/site-packages/ceph/deployment/drive_selection/matchers.py", line 410, in compare
    raise Exception("No filters applied")
Exception: No filters applied
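For reference, the OSD service specs the orchestrator is trying to apply (including the dashboard-created one from the traceback) and the daemon states can be inspected roughly like this; --export may not be available on every minor release:
ceph orch ls osd              # list OSD service specs and their status
ceph orch ls osd --export     # dump the specs as YAML, if supported
ceph orch ps                  # cephadm-managed daemons and their reported state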
I also have a "pgs undersized" warning; maybe this is causing trouble too?
[root@gedasvl02 ~]# ceph -s
INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af
INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config
INFO:cephadm:Using recent ceph image docker.io/ceph/ceph:v15
  cluster:
    id:     d0920c36-2368-11eb-a5de-005056b703af
    health: HEALTH_WARN
            Degraded data redundancy: 13142/39426 objects degraded (33.333%), 176 pgs degraded, 225 pgs undersized

  services:
    mon: 1 daemons, quorum gedasvl02 (age 2w)
    mgr: gedasvl02.vqswxg(active, since 2w), standbys: gedaopl02.yrwzqh
    mds: cephfs:1 {0=cephfs.gedaopl01.zjuhem=up:active} 1 up:standby
    osd: 3 osds: 2 up (since 4d), 2 in (since 94m)

  task status:
    scrub status:
        mds.cephfs.gedaopl01.zjuhem: idle

  data:
    pools:   7 pools, 225 pgs
    objects: 13.14k objects, 77 GiB
    usage:   148 GiB used, 299 GiB / 447 GiB avail
    pgs:     13142/39426 objects degraded (33.333%)
             176 active+undersized+degraded
             49  active+undersized

  io:
    client:   0 B/s rd, 6.1 KiB/s wr, 0 op/s rd, 0 op/s wr
Best Regards,
Oliver
This is the 7th backport release in the Octopus series. This release fixes
a serious bug in RGW that has been shown to cause data loss when a read of
a large RGW object (i.e., one with at least one tail segment) takes longer than
one half the time specified in the configuration option `rgw_gc_obj_min_wait`.
The bug causes the tail segments of that read object to be added to the RGW
garbage collection queue, which will in turn cause them to be deleted after
a period of time.
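For operators assessing their exposure, the deferral window and the RGW garbage-collection queue can be inspected roughly as follows (illustrative commands; adjust the config target, e.g. client.rgw.<instance>, to your deployment):
ceph config get client.rgw rgw_gc_obj_min_wait
radosgw-admin gc list --include-all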
Changelog
---------
* rgw: during GC defer, prevent new GC enqueue
Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at http://download.ceph.com/tarballs/ceph-15.2.7.tar.gz
* For packages, see http://docs.ceph.com/docs/master/install/get-packages/
* Release git sha1: 88e41c6c49beb18add4fdb6b4326ca466d931db8