Hi,
I've successfully updated my Luminous lab environment to Nautilus, so next week I'll give the prod env a try, but two things came up in the upgrade notes:
1. Upmap: I've never used this before and I don't know how I missed it, because it is quite a cool feature. The docs say to leave the balancer in upmap mode. My question: much like the pg_autoscaler module, which is not the best thing to use in 'on' mode and is better left in 'warn', is upmap mode considered, let's say in Ceph terms, "safe" to use?
2. Regarding this assimilate-conf: it's not clear to me what it actually does. The docs say:
"This is also a good time to fully transition any config options in ceph.conf into the cluster's configuration database. On each host, you can use the following command to import any option into the monitors with ceph config assimilate-conf -i /etc/ceph/ceph.conf".
What does this mean? If I have different configs on different servers and I run it on each of them, will it merge all the configs together? Or what else can I do with this feature?
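For reference, this is roughly what I plan to run, based on my reading of the Nautilus docs; it's only a sketch, not something I've run on prod yet, and the leftover file name is just something I picked:

ceph osd set-require-min-compat-client luminous   # upmap needs luminous or newer clients
ceph balancer mode upmap
ceph balancer on

ceph config assimilate-conf -i /etc/ceph/ceph.conf -o /etc/ceph/ceph.conf.leftover   # run on each host; -o keeps whatever the monitors did not accept
ceph config dump   # verify what ended up in the central config database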
Thank you
________________________________
Hi,
I have a Ceph cluster used for RGW and RBD. I found that all I/Os to RGW seemed to be blocked during dynamic resharding. Could you tell me whether this behavior is by design or not?
I attached a graph showing that I/O seemed to be blocked. The x-axis is time and the y-axis is the number of RADOS objects. Dynamic resharding ran between 16:22:30 and 16:31:30.
I read the official documentation about dynamic resharding, but it says nothing about blocking during resharding.
https://docs.ceph.com/en/octopus/radosgw/dynamicresharding/
In addition, I read the following Red Hat blog post.
https://www.redhat.com/ja/blog/ceph-rgw-dynamic-bucket-sharding-performance…
> You do not need to stop reading or writing objects to the bucket while resharding is happening.
This would mean that dynamic resharding is an online operation. However, it's not clear whether this feature blocks I/Os or not.
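In case it helps anyone reproduce this, I assume the reshard window can be observed from the RGW side with something like the following (BUCKETNAME is a placeholder for the affected bucket):

radosgw-admin reshard list
radosgw-admin reshard status --bucket=BUCKETNAME
radosgw-admin bucket stats --bucket=BUCKETNAME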
Thanks,
Satoru
Hi all,
We need to store millions of files using the S3 protocol in Ceph (version Nautilus), but we have projects where it isn't appropriate or possible to create a lot of S3 accounts. Is it better to have multiple S3 buckets or one bucket with sub-folders?
For example, the AWS service from Amazon allows you to create up to 100 buckets in each of your AWS cloud accounts. You can request more buckets, up to a maximum quota of 1,000, by submitting a service limit increase. There is no limit on the number of objects you can store in a bucket, but in Ceph we run into problems with listing and resharding when there are millions of objects in one bucket.
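For context, this is roughly how we check how full the shards of a large bucket are; I assume these are the right commands for Nautilus, and BUCKETNAME is a placeholder:

radosgw-admin bucket limit check   # shows objects per shard and fill_status per bucket
radosgw-admin bucket stats --bucket=BUCKETNAME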
Thank you
Michal
Hi,
We have a problem with a PG that was inconsistent; the PGs in our cluster have 3 copies.
It was not possible for us to repair this PG with "ceph pg repair" (the PG is on OSDs 14, 1 and 2), so we deleted the copy on osd 14 with the following command:
ceph-objectstore-tool --data-path /var/lib/ceph/osd.14/ --pgid 22.f --op
remove --force
This caused an automatic attempt to recreate the missing copy, entering the backfilling state, but doing so crashed OSDs 1 and 2 and dropped the IOPS to 0, freezing the cluster.
Is there any way to remove this entire PG, or recreate the missing copy, or ignore it completely? It is causing instability in the cluster.
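For completeness, this is what we have not tried yet: inspecting the inconsistency and exporting the remaining copies before touching anything else. As far as I understand, the export has to run with the OSD stopped, and the output file name is just an example:

rados list-inconsistent-obj 22.f --format=json-pretty
ceph-objectstore-tool --data-path /var/lib/ceph/osd.1/ --pgid 22.f --op export --file /root/pg-22.f-osd1.export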
Thank you, I await comments
--
Gabriel I. Medve
Hi all
An accidental power failure happened.
That resulted in CephFS going offline, and it cannot be mounted.
I have 3 MDS daemons, but the cluster complains "1 mds daemon damaged".
It seems a PG of cephfs_metadata is inconsistent. I tried to repair it, but it doesn't get repaired.
How do I repair the damaged MDS and bring CephFS back up/online?
Details are included below.
Many thanks in advance.
Sagara
# ceph -s
  cluster:
    id:     abc...
    health: HEALTH_ERR
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            4 scrub errors
            Possible data damage: 1 pg inconsistent

  services:
    mon: 3 daemons, quorum a,b,c (age 107s)
    mgr: a(active, since 22m), standbys: b, c
    mds: cephfs:0/1 3 up:standby, 1 damaged
    osd: 3 osds: 3 up (since 96s), 3 in (since 96s)

  data:
    pools:   3 pools, 192 pgs
    objects: 281.05k objects, 327 GiB
    usage:   2.4 TiB used, 8.1 TiB / 11 TiB avail
    pgs:     191 active+clean
             1   active+clean+inconsistent
# ceph health detail
HEALTH_ERR 1 filesystem is degraded; 1 filesystem is offline; 1 mds daemon damaged; 4 scrub errors; Possible data damage: 1 pg inconsistent
FS_DEGRADED 1 filesystem is degraded
fs cephfs is degraded
MDS_ALL_DOWN 1 filesystem is offline
fs cephfs is offline because no MDS is active for it.
MDS_DAMAGE 1 mds daemon damaged
fs cephfs mds.0 is damaged
OSD_SCRUB_ERRORS 4 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
pg 2.44 is active+clean+inconsistent, acting [0,2,1]
# ceph osd lspools
2 cephfs_metadata
3 cephfs_data
4 rbd
# ceph pg repair 2.44
# ceph -w
2021-05-22 01:48:04.775783 osd.0 [ERR] 2.44 shard 0 soid 2:22efaf6a:::200.00006048:head : candidate size 1540096 info size 1555896 mismatch
2021-05-22 01:48:04.775786 osd.0 [ERR] 2.44 shard 1 soid 2:22efaf6a:::200.00006048:head : candidate size 1540096 info size 1555896 mismatch
2021-05-22 01:48:04.775787 osd.0 [ERR] 2.44 shard 2 soid 2:22efaf6a:::200.00006048:head : candidate size 1441792 info size 1555896 mismatch
2021-05-22 01:48:04.775789 osd.0 [ERR] 2.44 soid 2:22efaf6a:::200.00006048:head : failed to pick suitable object info
2021-05-22 01:48:04.775849 osd.0 [ERR] repair 2.44 2:22efaf6a:::200.00006048:head : on disk size (1540096) does not match object info size (1555896) adjusted for ondisk to (1555896)
2021-05-22 01:48:04.787167 osd.0 [ERR] 2.44 repair 4 errors, 0 fixed
--- End of detail ---
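PS: I have not touched the MDS journal yet. As I understand the CephFS disaster-recovery docs, the first step would be to inspect and back it up, roughly like this (the backup file name is just a placeholder):

cephfs-journal-tool --rank cephfs:0 journal inspect
cephfs-journal-tool --rank cephfs:0 journal export /root/mds0-journal-backup.bin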
Oh these are filestore OSDs? I didn't expect that.
> 2021-05-20 23:56:52.223 2227200 0 mds.0.journaler.mdlog(ro)
> _finish_read got less than expected (1555896)
> 2021-05-20 23:56:52.223 2229180 0 mds.0.log _replay journaler got
> error -22, aborting
> 2021-05-20 23:56:52.223 2229180 -1 log_channel(cluster) log [ERR] :
> Error loading MDS rank 0: (22) Invalid argument
This confirms that the MDS is not starting because of the object size.
I assume there was some recovery going on when this happened? The OSD
uptime from your status was quite short.
I'm not really sure what to do next. I found an old thread [4] with almost the same error, only that the user ended up truncating the objects because the expected size was smaller than the object info; in your case it's the other way around. In any case I would back up all three copies of that object, including md5sums, before changing anything.
[4] https://www.spinics.net/lists/ceph-devel/msg16516.html
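A minimal sketch of what I mean by backing up, using the filestore paths from your mail; the backup directory is just a name I picked, and the find is there because the object may sit in a DIR_* subdirectory on some OSDs:

mkdir -p /root/pg2.44-backup
find /var/lib/ceph/osd/ceph-0/current/2.44_head -name '200.00006048__head_*' -exec cp -a {} /root/pg2.44-backup/ \;
md5sum /root/pg2.44-backup/*

(Repeat on the other hosts with ceph-1 and ceph-2.)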
Quoting Sagara Wijetunga <sagarawmw(a)yahoo.com>:
> Here are the physical file sizes of the "200.00006048*":
> OSD.0:-rw-r--r-- 1 ceph ceph 1540096 May 20 22:47
> /var/lib/ceph/osd/ceph-0/current/2.44_head/200.00006048__head_56F5F744__
>
> OSD.1:-rw-r--r-- 1 ceph ceph 1540096 May 20 22:47
> /var/lib/ceph/osd/ceph-1/current/2.44_head/DIR_4/DIR_4/200.00006048__head_56F5F744__2
>
> OSD2:-rw-r--r-- 1 ceph ceph 1441792 May 20 22:47
> /var/lib/ceph/osd/ceph-2/current/2.44_head/DIR_4/DIR_4/200.00006048__head_56F5F744__2
>
> Sagara
>
Hi Eugen, this is the output of "ceph mgr module ls":
{
    "always_on_modules": [
        "balancer",
        "crash",
        "devicehealth",
        "orchestrator",
        "pg_autoscaler",
        "progress",
        "rbd_support",
        "status",
        "telemetry",
        "volumes"
    ],
    "enabled_modules": [
        "cephadm",
        "dashboard",
        "diskprediction_local",
        "iostat",
        "prometheus",
        "restful"
    ],
    "disabled_modules": [
        ...
    ]
}
As you can see, balancer/crash/… are in the always_on section. I checked this on all 3 monitor nodes with the same output.
Then, looking at disabled_modules, I saw many modules that could help us gather more information (logs) about our problem, like:
- alerts
- insights
- test_orchestrator
- and other…
But we are not sure whether we can safely enable some of them. Right now none of the Ceph logs we have are showing errors. Would enabling some of those modules help us get more useful logs?
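If it would help, I assume enabling one of them is just the following (taking alerts as an example); we have not run it yet:

ceph mgr module enable alerts
ceph mgr module ls   # confirm it moved to enabled_modules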
On the other hand, regarding the hanging commands you asked about: the containers that launch them keep running, since they are waiting for the command to finish. Here you can see the list of tests that are still running (hung):
af13bda77a1a 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph osd status" 23 hours ago Up 23 hours wizardly_leavitt
5b5c760454c7 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph telemetry stat…" 24 hours ago Up 24 hours intelligent_bardeen
a98e6061489d 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph service dump" 24 hours ago Up 24 hours romantic_mendel
66c943a032f8 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph service status" 24 hours ago Up 24 hours happy_shannon
7e18899dffc5 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph crash stat" 24 hours ago Up 24 hours xenodochial_germain
8268082e753b 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph crash ls" 24 hours ago Up 24 hours stoic_volhard
fc5c434a4e23 172.16.3.146:4000/ceph/ceph:v15.2.9 "ceph balancer status" 24 hours ago Up 24 hours epic_mendel
So those containers have to be removed manually.
As for the logs of these containers, nothing appears in the container output (docker logs xxxx); only when you kill it can you see the following (with --verbose):
[ceph: root@spsrc-mon-1 /]# ceph --verbose pg stat
….
validate_command: pg stat
better match: 2.5 > 0: pg stat
bestcmds_sorted:
[{'flags': 8,
'help': 'show placement group status.',
'module': 'pg',
'perm': 'r',
'sig': [argdesc(<class 'ceph_argparse.CephPrefix'>, req=True, name=prefix, n=1, numseen=0, prefix=pg),
argdesc(<class 'ceph_argparse.CephPrefix'>, req=True, name=prefix, n=1, numseen=0, prefix=stat)]}]
Submitting command: {'prefix': 'pg stat', 'target': ('mon-mgr', '')}
submit ['{"prefix": "pg stat", "target": ["mon-mgr", ""]}'] to mon-mgr
[hung forever …]
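One thing we are considering trying next, assuming a mgr failover is acceptable here, is failing the active mgr, since all of these commands hang on the mon-mgr target (the daemon name below is a placeholder; ours has a cephadm suffix):

ceph mgr fail spsrc-mon-1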
Kind regards,
Manu.
Hi,
I have a 12.2.13 cluster that I want to upgrade. However, there are a whole bunch of stray files/inodes(?) that I would like to have processed first, also because I get a lot of 'No space left on device' messages. I started a 'find . -ls' in the root of the CephFS filesystem, but that causes overload and takes a lot of time, while not necessarily reducing num_strays.
How do I force the MDSes to process those strays so that clients do not get 'incorrect' errors?
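For reference, this is how I am checking num_strays, via the admin socket on the MDS host (mds.a is a placeholder for the actual daemon name):

ceph daemon mds.a perf dump | grep -i stray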
--
Mark Schouten | Tuxis B.V.
KvK: 74698818 | http://www.tuxis.nl/
T: +31 318 200208 | info(a)tuxis.nl
I recently configured Prometheus to scrape the mgr /metrics endpoint and added Grafana dashboards.
All daemons are currently at 15.2.11.
I use Hashicorp consul to advertise the active mgr in DNS, and Prometheus
points at a single DNS target. (Is anyone else using this method, or just
statically pointing Prometheus at all potentially active managers?)
All was working fine initially, and it's *mostly* still working fine. For
the first couple of days, all went well, and then a few rate metrics
stopped meaningfully increasing — essentially pegged at zero, which is
implausible in a healthy cluster. Some cluster maintenance was occurring
such as outing and recreating some OSDs, so I have a baseline for
throughput and recovery.
Metric graphs that stopped functioning:
Throughput: ceph_osd_op_r_out_bytes, ceph_osd_op_w_in_bytes,
ceph_osd_op_rw_in_bytes
Recovery: ceph_osd_recovery_ops
I can see that Grafana output is using this method of converting the
counters to rates:
sum(irate(ceph_osd_recovery_ops{job="$job"}[$interval]))
The underlying counters appear to be sane, and reading the raw values from Prometheus is also valid, so I'm guessing a failure in either the irate or the sum function? By inspection in Grafana, the queries return correct timestamps with zero values, so that leaves "sum(irate)" as the likely source of the problem.
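For comparison, I am now testing the same panel with rate over a wider window instead of irate, to rule out irate itself (the 5m window is my own choice, not from the bundled dashboards):

sum(rate(ceph_osd_recovery_ops{job="$job"}[5m]))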
Does anyone have experience with this? I admit it is possibly tangential to Ceph itself, but as the Prometheus/Grafana integration is more or less supported, I thought I'd try here first among active mgr/Prometheus users.
--
Jeremy Austin
jhaustin(a)gmail.com
Hello
I have a weird problem on a 3-node cluster ("Nautilus 14.2.9").
When I simulate a power failure, the OSDs are not marked DOWN and the MDS no longer responds.
If I manually set the OSDs down, the MDS becomes active again.
BTW: only 2 nodes have OSDs; the third node is only for the MON.
I've set mon_osd_down_out_interval = 0.3 in ceph.conf in the global section and restarted all MONs, but when I check it with "ceph daemon mon.ID config show" I still see mon_osd_down_out_interval: "600". I don't get why it's still "600", and honestly I don't know whether it even has any effect on my problem.
Where should I check?
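In case it matters, I assume the centralized config database is the other way to set this in Nautilus, roughly like below (0.3 is simply the value I had put in ceph.conf):

ceph config set mon mon_osd_down_out_interval 0.3
ceph config dump | grep mon_osd_down_out_interval   # check what the cluster actually uses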