Hi all,
Ceph Nautilus 14.2.16.
We have been hitting a strange and critical problem since this morning.
Our cephfs metadata pool suddenly grew from 2.7% to 100% usage (in less
than 5 hours) while there was no significant activity on the data OSDs!
Here are some numbers:
# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       205 TiB     103 TiB     102 TiB     102 TiB          49.68
    nvme      4.4 TiB     2.2 TiB     2.1 TiB     2.2 TiB          49.63
    TOTAL     210 TiB     105 TiB     104 TiB     104 TiB          49.68

POOLS:
    POOL                     ID     PGS      STORED      OBJECTS     USED        %USED     MAX AVAIL
    cephfs_data_home          7     512      11 TiB      22.58M      11 TiB      18.31        17 TiB
    cephfs_metadata_home      8     128      724 GiB      2.32M      724 GiB    100.00           0 B
    rbd_backup_vms            9     1024     19 TiB       5.00M      19 TiB      37.08        11 TiB
The cephfs_data pool uses less than half of the storage space, and there
was no significant increase during (or before) the period in which the
metadata pool became full.
Has anyone encountered this before?
Currently, I have no idea how to solve this problem. Restarting the
associated OSD and MDS services has not helped.
Let me know if you need more information or logs.
Thank you for your help.
Regards,
Hervé
Dear all, I have a problem with a Ceph cluster: recovery has stopped,
and the status remains at HEALTH_WARN. Strangely enough, when I run
'ceph osd df', some OSDs report a size of zero. Maybe this is related
to my recovery problem.
ceph status : https://pastebin.com/rEwG0nD5
ceph osd df : https://pastebin.com/P2Dn1z7A
ceph osd tree : https://pastebin.com/rpy7hkiw
Thanks.
--
Julien Lenseigne
Responsable informatique LMD
Tel: 0169335172
Ecole Polytechnique, route de saclay, 91128 Palaiseau
Bat 83 - Bureau 83.30.13
Hi Hervé,
On 01.06.21 14:00, Hervé Ballans wrote:
> I'm aware of your points, and maybe I was not really clear in my
> previous email (written in a hurry!)
> The problematic pool is the metadata one. All its OSDs (x3) are full.
> The associated data pool is OK and no OSD is full on the data pool.
Are you saying that you only have 3 OSDs for your metadata pool, and
those are the full ones? Alright, then you can - at least for this
specific issue - disregard my previous comment.
>
> The problem is that the metadata pool suddenly and continuously grew
> from 3% to 100% in 5 hours (from 5 am to 10 am, then it crashed).
724 GiB stored in the metadata pool with only 11 TiB cephfs data size
does seem huge at first glance. For reference, I have about 160 TiB
cephfs data with only 31 GiB stored in the metadata pool.
I don't have an explanation for this behaviour, as I am relatively new
to Ceph. Maybe the list can chime in?
>
> And we don't understand the reason, since there was no specific
> activity on the data pool.
> This cluster has been running perfectly with the current configuration
> for many years.
Probably unrelated to your issues: I noticed that the STORED and USED
columns in your `ceph df` output are identical. Is that because of
Nautilus (I am running Octopus myself, where USED is the expected
multiple of STORED depending on the replication factor / EC
configuration of the pool), or are you running a specific configuration
that might cause that?
Cheers
Sebastian
Hi Hervé,
On 01.06.21 13:15, Hervé Ballans wrote:
> # ceph status
> cluster:
> id: 838506b7-e0c6-4022-9e17-2d1cf9458be6
> health: HEALTH_ERR
> 1 filesystem is degraded
> 3 full osd(s)
> 1 pool(s) full
> 1 daemons have recently crashed
You have full OSDs and therefore a full pool. The "fullness" of a pool
is limited by its fullest OSD, i.e. a single full OSD can block your
pool. Take a look at `ceph osd df` and you will notice very
non-uniform OSD usage (both in numbers of PGs / size and in usage %).
> osd: 126 osds: 126 up (since 5m), 126 in (since 5M)
> pgs: 1662 active+clean
The PG/OSD ratio seems very low to me. The general recommendation is
~100 PGs per OSD post-replication (and a power of 2 for each pool). In
my cluster I actually run with ~200 PGs per OSD on the SSDs which hold
the cephfs metadata.
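As a back-of-the-envelope check against the numbers posted in this thread (a sketch only; I'm assuming 3x replication for all three pools, which the thread does not state, and ignoring that the metadata pool apparently lives on just 3 NVMe OSDs):

```python
# Rough post-replication PG-per-OSD estimate for the cluster above.
# Assumption: all three pools are 3x replicated (not stated in the thread).

pools = {  # pool name -> pg_num, taken from the `ceph df` output
    "cephfs_data_home": 512,
    "cephfs_metadata_home": 128,
    "rbd_backup_vms": 1024,
}
replication = 3
num_osds = 126  # from `ceph status`

pg_replicas = sum(pools.values()) * replication
pgs_per_osd = pg_replicas / num_osds
print(f"{pgs_per_osd:.1f} PG replicas per OSD")  # prints: 39.6 PG replicas per OSD
```

Under those assumptions the cluster averages well below the ~100 PGs/OSD guideline, which is consistent with the uneven usage seen in `ceph osd df`.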
>
> Thanks a lot if you have some ways for trying to solve this...
You have to get your OSDs to rebalance, which probably includes
increasing the number of PGs in some pools. Details depend on which Ceph
version you are running and your CRUSH rules (maybe your cephfs metadata
pool is residing only on NVMe?). Take a look at the balancer module [1]
and the autoscaler [2] (`ceph osd pool autoscale-status` is most
interesting).
Theoretically, you could (temporarily!) increase the full_ratio.
However, this is a very dangerous operation which you should not do
unless you know *exactly* what you are doing.
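For reference, the relevant commands might look like the following sketch (verify against the docs for your Ceph version before running anything; the pg_num value is illustrative, and the full-ratio change is the dangerous last resort mentioned above):

```shell
# See what the autoscaler would recommend for each pool
ceph osd pool autoscale-status

# Enable the balancer (upmap mode) to even out the PG distribution
ceph balancer status
ceph balancer mode upmap
ceph balancer on

# If a pool is under-split, raise its pg_num (power of 2; value illustrative)
ceph osd pool set cephfs_metadata_home pg_num 256

# DANGEROUS, temporary last resort: raise the full ratio slightly
# (default 0.95) so the cluster accepts writes while you rebalance
ceph osd set-full-ratio 0.97
```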
Cheers & Best of luck
Sebastian
[1] https://docs.ceph.com/en/latest/rados/operations/balancer/
[2] https://docs.ceph.com/en/latest/rados/operations/placement-groups/
Replace "latest" in the URIs with your Ceph version string (e.g.
octopus, nautilus) for version-specific documentation.
hi,
Can you show me how to mirror the Ceph docker images that are
available from quay.ceph.io?
Our Ceph clusters are on a private network and are installed from a
local repository, because they cannot directly access the internet.
With Octopus, I need to be able to create a local mirror of the
various packages and images.
What do you recommend as a product for creating a local registry?
Thanks for your advice.
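What I have in mind is something like this sketch (the registry hostname and image tag are illustrative, and the source path should match whatever your deployment actually pulls):

```shell
# Run a plain local registry on a reachable host
# (add TLS and auth before using it in production)
docker run -d -p 5000:5000 --restart=always --name registry registry:2

# On a machine with internet access: pull, retag, push into the mirror
docker pull quay.io/ceph/ceph:v15.2.13
docker tag quay.io/ceph/ceph:v15.2.13 myregistry.local:5000/ceph/ceph:v15.2.13
docker push myregistry.local:5000/ceph/ceph:v15.2.13

# Alternatively, copy directly between registries with skopeo
skopeo copy docker://quay.io/ceph/ceph:v15.2.13 \
    docker://myregistry.local:5000/ceph/ceph:v15.2.13
```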
Sébastien.
Hi,
Any way to clean up a large-omap warning in the index pool?
A PG deep-scrub didn't help.
I know how to clean it up in the log pool, but I have no idea how in
the index pool :/
It's an Octopus deployment, 15.2.10.
Thank you
________________________________
This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.
So the bucket has been deleted on the master zone, and the deletion has
propagated to the other zones as well. On the master zone the large
omap warning disappeared after a deep scrub, but on the secondary zone
it is still there.
There were 3 large omap objects at the beginning; after I scrubbed the
affected OSDs (not just the PGs) I have 6.
The bucket which I assumed caused the issue is not there anymore.
This is the log:
2021-06-01T08:48:11.560449+0700 osd.39 (osd.39) 1 : cluster [DBG] 20.6 deep-scrub starts
2021-06-01T08:48:13.218814+0700 osd.39 (osd.39) 2 : cluster [WRN] Large omap object found. Object: 20:62d24dc9:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.29868038.2.5:head PG: 20.93b24b46 (20.6) Key count: 244664 Size (bytes): 69659918
2021-06-01T08:48:15.245975+0700 osd.39 (osd.39) 3 : cluster [DBG] 20.6 deep-scrub ok
2021-06-01T08:48:16.623097+0700 osd.39 (osd.39) 4 : cluster [DBG] 20.e deep-scrub starts
2021-06-01T08:48:20.201926+0700 osd.39 (osd.39) 5 : cluster [WRN] Large omap object found. Object: 20:77cbb881:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.29868038.2.9:head PG: 20.811dd3ee (20.e) Key count: 244363 Size (bytes): 69582418
2021-06-01T08:48:20.275906+0700 osd.39 (osd.39) 6 : cluster [DBG] 20.e deep-scrub ok
2021-06-01T08:48:21.560212+0700 osd.39 (osd.39) 7 : cluster [DBG] 20.15 deep-scrub starts
2021-06-01T08:48:22.456133+0700 osd.39 (osd.39) 8 : cluster [WRN] Large omap object found. Object: 20:a8a0840e:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.29868038.2.3:head PG: 20.70210515 (20.15) Key count: 244169 Size (bytes): 69508179
2021-06-01T08:48:25.202051+0700 osd.39 (osd.39) 9 : cluster [DBG] 20.15 deep-scrub ok
2021-06-01T08:48:26.019422+0700 osd.37 (osd.37) 4 : cluster [DBG] 20.f deep-scrub starts
2021-06-01T08:48:28.370919+0700 osd.37 (osd.37) 5 : cluster [WRN] Large omap object found. Object: 20:f59fa8d7:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.29868038.2.6:head PG: 20.eb15f9af (20.f) Key count: 244243 Size (bytes): 69539068
2021-06-01T08:48:29.010877+0700 osd.37 (osd.37) 6 : cluster [DBG] 20.f deep-scrub ok
2021-06-01T08:48:29.573160+0700 osd.39 (osd.39) 10 : cluster [DBG] 20.18 deep-scrub starts
2021-06-01T08:48:32.682416+0700 osd.39 (osd.39) 11 : cluster [WRN] Large omap object found. Object: 20:1e2579ff:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.29868038.2.4:head PG: 20.ff9ea478 (20.18) Key count: 243858 Size (bytes): 69426645
2021-06-01T08:48:33.383843+0700 osd.39 (osd.39) 12 : cluster [DBG] 20.18 deep-scrub ok
2021-06-01T08:48:35.021940+0700 osd.37 (osd.37) 7 : cluster [DBG] 20.14 deep-scrub starts
2021-06-01T08:48:36.139892+0700 osd.37 (osd.37) 8 : cluster [WRN] Large omap object found. Object: 20:291b1669:::.dir.9213182a-14ba-48ad-bde9-289a1c0c0de8.29868038.2.1:head PG: 20.9668d894 (20.14) Key count: 244533 Size (bytes): 69611691
2021-06-01T08:48:38.843235+0700 osd.37 (osd.37) 9 : cluster [DBG] 20.14 deep-scrub ok
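For what it's worth, the flagged objects are named `.dir.<bucket marker>.<shard id>`, so the marker from the log lines above can be checked against the bucket metadata (a sketch; whether stale-instance cleanup applies here is an assumption, and it is not supported on multisite deployments):

```shell
# Does any bucket instance still reference the marker from the warnings?
radosgw-admin metadata list bucket.instance | grep 29868038.2

# Index shards left behind by deleted or resharded buckets may show up
# as stale instances (listing is safe; removal is disabled on multisite)
radosgw-admin reshard stale-instances list
```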
Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo(a)agoda.com
---------------------------------------------------
-----Original Message-----
From: Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com>
Sent: Monday, May 31, 2021 10:53 PM
To: Matt Vandermeulen <storage(a)reenigne.net>
Cc: ceph-users(a)ceph.io
Subject: [ceph-users] Re: [Suspicious newsletter] Re: The always welcomed large omap
Yeah, I found a bucket whose deletion is currently in progress; I will pre-shard it.
Istvan Szabo
Senior Infrastructure Engineer
---------------------------------------------------
Agoda Services Co., Ltd.
e: istvan.szabo(a)agoda.com
---------------------------------------------------
-----Original Message-----
From: Matt Vandermeulen <storage(a)reenigne.net>
Sent: Monday, May 31, 2021 9:43 PM
To: Szabo, Istvan (Agoda) <Istvan.Szabo(a)agoda.com>
Cc: ceph-users(a)ceph.io
Subject: [Suspicious newsletter] [ceph-users] Re: The always welcomed large omap
All the index data will be in OMAP; you can see a per-OSD listing of
omap usage with `ceph osd df tree`.
Do you have large buckets (many, many objects in a single bucket) with few shards? You may have to reshard one (or some) of your buckets.
If you're using multisite, it'll take some reading to coordinate the reshard (though I'm unfamiliar with how it works with multisite in Octopus).
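The check-and-reshard steps might look like this sketch (the bucket name and shard count are illustrative; on multisite, coordinate first as noted):

```shell
# List buckets whose per-shard object count exceeds the recommendation
radosgw-admin bucket limit check

# Inspect a suspect bucket's current shard count and object count
radosgw-admin bucket stats --bucket=BIGBUCKET

# Queue it for resharding and run the reshard
radosgw-admin reshard add --bucket=BIGBUCKET --num-shards=101
radosgw-admin reshard process
```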
On 2021-05-31 02:25, Szabo, Istvan (Agoda) wrote:
> Hi,
>
> Any way to clean up large-omap in the index pool?
> PG deep_scrub didn't help.
> I know how to clean in the log pool, but no idea in the index pool :/
> It's an octopus deployment 15.2.10.
>
> Thank you
>
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io