We have a user-provisioned infrastructure (bare-metal) installation of an OpenShift cluster running version 4.12, and we are using OpenShift Data Foundation as the storage system. Earlier we had 3 disks attached to the storage system and 3 OSDs available in the cluster. Today, while adding additional disks to the storage cluster, we increased the number of disks from 3 to 9, i.e. 3 per node. The addition of storage capacity was successful, resulting in 6 new OSDs in the cluster.
But after this operation, we noticed that Rebuilding Data Resiliency is stuck at 5% and not moving forward. At the same time, ceph status showed 65% of objects misplaced and PGs that are not in the active+clean state.
Here is more information about the ceph cluster:
sh-4.4$ ceph status
  cluster:
    id:     18bf836d-4937-4925-b964-7a026c1d548d
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum b,u,v (age 2w)
    mgr: a(active, since 7w)
    mds: 1/1 daemons up, 1 hot standby
    osd: 9 osds: 9 up (since 5h), 9 in (since 5h); 191 remapped pgs
    rgw: 1 daemon active (1 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   12 pools, 305 pgs
    objects: 2.69M objects, 2.9 TiB
    usage:   8.8 TiB used, 27 TiB / 36 TiB avail
    pgs:     4723077/8079717 objects misplaced (58.456%)
             188 active+remapped+backfill_wait
             114 active+clean
             3   active+remapped+backfilling

  io:
    client:   679 KiB/s rd, 11 MiB/s wr, 13 op/s rd, 622 op/s wr
    recovery: 20 MiB/s, 89 keys/s, 22 objects/s

sh-4.4$ ceph balancer status
{
    "active": true,
    "last_optimize_duration": "0:00:00.000276",
    "last_optimize_started": "Tue Sep 12 17:36:03 2023",
    "mode": "upmap",
    "optimize_result": "Too many objects (0.581933 > 0.050000) are misplaced; try again later",
    "plans": []
}
One more thing we observed is that the number of misplaced objects is decreasing, and the percentage is dropping as well. What might be the reason for Rebuilding Data Resiliency not moving forward?
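In case it is relevant, we have also been watching the recovery from the toolbox pod with the commands below; querying osd_max_backfills is just our attempt to rule out backfill throttling (we have not changed it from whatever the default is):
sh-4.4$ ceph osd pool stats
sh-4.4$ ceph tell 'osd.*' config get osd_max_backfills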
Any inputs would be appreciated.
Thanks
Dear Ceph users,
I just upgraded my cluster to Reef, and with the new version also came a
revamped dashboard. Unfortunately, the new dashboard is really awful to me:
1) it's no longer possible to see the status of the PGs: in the old
dashboard it was very easy to see, e.g., how many PGs were recovering,
how many were scrubbing, etc., by clicking on the PG Status widget. Now
the interface shows just how many are OK and how many are working,
without details, and I have to go to the command line to understand
what's happening (not really comfortable on mobile; the commands I fall
back to are sketched after this list)
2) The new timeline graphs do not work properly: changing the time
frame sometimes produces empty graphs,
3) The instant values in Cluster utilization are refreshed so slowly
that I cannot properly monitor the cluster behavior in real time
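For reference, this is roughly what I now run instead (the pipeline at the end is just my own quick hack to count PGs per state, nothing official):
ceph pg stat
ceph pg dump pgs_brief 2>/dev/null | awk 'NR>1 {print $2}' | sort | uniq -c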
Is it just me, or are my impressions shared by someone else? Is there
anything that can be done to improve the situation?
Thanks,
Nicola
Hi,
I am currently trying to adopt our staging cluster with cephadm, and
some hosts just pull strange images.
root@0cc47a6df330:/var/lib/containers/storage/overlay-images# podman ps
CONTAINER ID  IMAGE                                           COMMAND               CREATED        STATUS             PORTS  NAMES
a532c37ebe42  docker.io/ceph/daemon-base:latest-master-devel  -n mgr.0cc47a6df3...  2 minutes ago  Up 2 minutes ago          ceph-03977a23-f00f-4bb0-b9a7-de57f40ba853-mgr-0cc47a6df330-fxrfyl
root@0cc47a6df330:~# ceph orch ps
NAME                     HOST                             PORTS   STATUS         REFRESHED  AGE  MEM USE  MEM LIM  VERSION                IMAGE ID      CONTAINER ID
mgr.0cc47a6df14e.vqizdz  0cc47a6df14e.f00f.gridscale.dev  *:9283  running (3m)   3m ago     3m   10.8M    -        16.2.11                de4b0b384ad4  00b02cd82a1c
mgr.0cc47a6df330.iijety  0cc47a6df330.f00f.gridscale.dev  *:9283  running (5s)   2s ago     4s   10.5M    -        17.0.0-7183-g54142666  75e3d7089cea  662c6baa097e
mgr.0cc47aad8ce8         0cc47aad8ce8.f00f.gridscale.dev          running (65m)  8m ago     60m  553M     -        17.2.6                 22cd8daf4d70  8145c63fdc44
Any idea what I need to do to change that?
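I assume the fix is to pin the image explicitly, and something like the following is what I would try next (the image tag is just my guess, untested):
root@0cc47a6df330:~# ceph config set global container_image quay.io/ceph/ceph:v17.2.6
root@0cc47a6df330:~# cephadm --image quay.io/ceph/ceph:v17.2.6 adopt --style legacy --name mgr.0cc47a6df330
Is that the right approach?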
--
The "UTF-8 problems" self-help group will meet, as an exception, in the
large hall this time.
Hello,
We have a cluster with 21 nodes, each having 12 x 18 TB drives and 2 NVMe devices for db/wal.
We need to add more nodes.
The last time we did this, the PGs remained at 1024, so the number of PGs per OSD decreased.
Currently, we are at 43 PGs per OSD.
Does auto-scaling work correctly in Ceph version 17.2.5?
Should we increase the number of PGs before adding nodes?
Should we keep PG auto-scaling active?
If we disable auto-scaling, should we increase the number of PGs to reach 100 PGs per OSD?
Note that we use this cluster with a large EC pool (8+3); the commands we are considering are sketched below.
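For reference, this is roughly what we would run if we scale manually (the pool name and target pg_num are placeholders, not values we have verified):
ceph osd pool autoscale-status
ceph osd pool set <pool> pg_autoscale_mode off
ceph osd pool set <pool> pg_num 2048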
Thank you for your assistance.
Hey ceph-users,
I am running two (now) Quincy clusters doing RGW multi-site replication
with only one actually being written to by clients.
The other site is intended simply as a remote copy.
On the primary cluster I am observing an ever-growing (in objects and
bytes) "sitea.rgw.log" pool; not so on the remote "siteb.rgw.log", which
sits at only 300 MiB and around 15k objects with no growth.
Metrics show that the growth of the pool on the primary has been linear
for at least 6 months, so no sudden spikes or anything. Also, the sync
status appears to be totally happy.
There are also no warnings in regards to large OMAPs or anything similar.
I was under the impression that RGW will trim its three logs (md, bi,
data) automatically and only keep data that has not yet been replicated
by the other zonegroup members?
The config option "ceph config get mgr rgw_sync_log_trim_interval" is
set to 1200, so 20 Minutes.
So I am wondering if there might be some inconsistency or how I can best
analyze what the cause for the accumulation of log data is?
There are older questions on the ML, such as [1], but there was not
really a solution or root cause identified.
I know there is manual trimming, but I'd rather analyze the current
situation first and figure out why auto-trimming is not happening.
* Do I need to go through all buckets, count logs, and look at their
timestamps? Which queries make sense here? (A sketch of what I have in
mind follows after this list.)
* Is there usually any logging of the log-trimming activity that I
should expect? Or anything that might indicate why trimming does not
happen?
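For lack of a better idea, this is the kind of thing I had in mind (the rados pipeline is just my own way of grouping log objects by name prefix; I have not confirmed how meaningful it is):
radosgw-admin sync status
radosgw-admin mdlog status
radosgw-admin datalog status
rados -p sitea.rgw.log ls | cut -d. -f1 | sort | uniq -c | sort -rn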
Regards
Christian
[1]
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/WZCFOAMLWV…
Hi Team,
We are facing a similar situation; any help would be appreciated.
Thanks once again for the support.
-Lokendra
On Tue, Sep 5, 2023 at 10:51 AM Kushagr Gupta <kushagrguptasps.mun(a)gmail.com>
wrote:
> *Ceph-version*: Quincy
> *OS*: Centos 8 stream
>
> *Issue*: Not able to find a standardized restoration procedure for
> subvolume snapshots.
>
> *Description:*
> Hi team,
>
> We are currently working with a 3-node Ceph cluster.
> We are currently exploring the scheduled snapshot capability of the
> ceph-mgr module.
> To enable/configure scheduled snapshots, we followed this link:
>
> https://docs.ceph.com/en/quincy/cephfs/snap-schedule/
>
> The scheduled snapshots are working as expected. But we are unable to find
> any standardized restoration procedure for the same.
>
> We have found the following link (not official documentation):
> https://www.suse.com/support/kb/doc/?id=000019627
>
> We have also found a link about cloning a new subvolume from snapshots:
> https://docs.ceph.com/en/reef/cephfs/fs-volumes/
> (Section: Cloning Snapshots)
>
> Is there a standard procedure to restore from a snapshot?
> By this I mean, is there some kind of command, maybe like
> ceph fs subvolume snapshot restore <snapshot-name>
>
> Or, if there is any other procedure, please let us know.
>
> Thanks and Regards,
> Kushagra Gupta
>
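For what it's worth, the closest thing we have found so far is a clone-based restore along these lines (names are placeholders, command shape taken from the fs-volumes docs; note it creates a new subvolume rather than restoring in place):
ceph fs subvolume snapshot clone <vol> <subvol> <snap> <restored_subvol>
ceph fs clone status <vol> <restored_subvol>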
--
~ Lokendra
skype: lokendrarathour
Hi There,
I have a Ceph cluster running on my Proxmox system, and it all seemed to upgrade successfully. However, after the reboot, my ceph-mon and ceph-osd services are failing to start, or are crashing by the looks of it.
```
ceph version 17.2.6 (810db68029296377607028a6c6da1ec06f5a2b27) quincy (stable)
1: /lib/x86_64-linux-gnu/libc.so.6(+0x3bfd0) [0x7f10aba5afd0]
2: gf_init_hard()
3: gf_init_easy()
4: galois_init_default_field()
5: jerasure_init()
6: __erasure_code_init()
7: (ceph::ErasureCodePluginRegistry::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ceph::ErasureCodePlugin**, std::ostream*)+0x2b5) [0x55b04c32c605]
8: (ceph::ErasureCodePluginRegistry::preload(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::ostream*)+0x9f) [0x55b04c32cbaf]
9: (global_init_preload_erasure_code(ceph::common::CephContext const*)+0x7c2) [0x55b04bdd9f92]
10: main()
11: /lib/x86_64-linux-gnu/libc.so.6(+0x271ca) [0x7f10aba461ca]
12: __libc_start_main()
13: _start()
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
```
I am still quite new to Ceph and would like advice on how to troubleshoot this and get the services working again.
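So far the only thing I have checked is the journal (unit names assumed from a standard Proxmox setup):
```
# mon unit for this node, and one of the failing OSDs (<id> is a placeholder)
journalctl -b -u ceph-mon@$(hostname).service
journalctl -b -u ceph-osd@<id>.service
```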
Regards
Ross
Hi,
I want to create a new OSD on a 4TB Samsung MZ1L23T8HBLA-00A07
enterprise NVMe device in a hyper-converged Proxmox 8 environment.
Creating the OSD works, but it cannot be initialized and therefore does
not start.
In the log I see an entry about a failed assert:
./src/os/bluestore/fastbmap_allocator_impl.cc: 405: FAILED ceph_assert((aligned_extent.length % l0_granularity) == 0)
Is this the culprit?
In addition, at the end of the logfile, a failed mount and a failed OSD
init are mentioned:
2023-09-11T16:30:04.708+0200 7f99aa28f3c0 -1 bluefs _check_allocations OP_FILE_UPDATE_INC invalid extent 1: 0x140000~10000: duplicate reference, ino 30
2023-09-11T16:30:04.708+0200 7f99aa28f3c0 -1 bluefs mount failed to replay log: (14) Bad address
2023-09-11T16:30:04.708+0200 7f99aa28f3c0 20 bluefs _stop_alloc
2023-09-11T16:30:04.708+0200 7f99aa28f3c0 -1 bluestore(/var/lib/ceph/osd/ceph-43) _open_bluefs failed bluefs mount: (14) Bad address
2023-09-11T16:30:04.708+0200 7f99aa28f3c0 10 bluefs maybe_verify_layout no memorized_layout in bluefs superblock
2023-09-11T16:30:04.708+0200 7f99aa28f3c0 -1 bluestore(/var/lib/ceph/osd/ceph-43) _open_db failed to prepare db environment:
2023-09-11T16:30:04.708+0200 7f99aa28f3c0  1 bdev(0x5565c261fc00 /var/lib/ceph/osd/ceph-43/block) close
2023-09-11T16:30:04.940+0200 7f99aa28f3c0 -1 osd.43 0 OSD:init: unable to mount object store
2023-09-11T16:30:04.940+0200 7f99aa28f3c0 -1 ** ERROR: osd init failed: (5) Input/output error
I verified that the hardware of the new nvme is working fine.
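For reference, if a clean re-create is the way to go, this is what I would try next (the device path is a placeholder for the new NVMe):
# wipe the previous attempt, including LVM metadata
ceph-volume lvm zap /dev/nvme0n1 --destroy
# then re-create the OSD the Proxmox way
pveceph osd create /dev/nvme0n1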
--
Regards,
ppa. Martin Konold
--
Martin Konold - Prokurist, CTO
KONSEC GmbH - make things real
Amtsgericht Stuttgart, HRB 23690
Managing Director: Andreas Mack
Im Köller 3, 70794 Filderstadt, Germany