- ceph-users - lists.ceph.io

pg repair or pg deep-scrub does not start

by Marcel Kuiper

Hi I've got an old cluster running ceph 10.2.11 with filestore backend. Last week a PG was reported inconsistent with a scrub error # ceph health detail HEALTH_ERR 1 pgs inconsistent; 1 scrub errors pg 38.20 is active+clean+inconsistent, acting [1778,1640,1379] 1 scrub errors I first tried 'ceph pg repair' but nothing seemed to happen, then # rados list-inconsistent-obj 38.20 --format=json-pretty showed that the problem was on osd 1379. The logs showed that that osd had read errors so I decided to mark that osd out for replacement. Later on removed it from the crush map en deleted the osd. My thoughts were that the missing replica gets backfilled on another osd and everything would be ok again. It got another osd assigned but the health error stayed # ceph health detail HEALTH_ERR 1 pgs inconsistent; 1 scrub errors pg 38.20 is active+clean+inconsistent, acting [1778,1640,1384] 1 scrub errors Now I get an error on: # rados list-inconsistent-obj 38.20 --format=json-pretty No scrub information available for pg 38.20 error 2: (2) No such file or directory And if I try # ceph pg deep-scrub 38.20 instructing pg 38.20 on osd.1778 to deep-scrub The deepscrub does not get scheduled. Same goes for # ceph daemon osd.1778 trigger_scrub 38.20 on the storage node Nothing appears in the logs concerning the scrubbing of PG 38.20. I see in the log that other PG's get (deep) scrubbed according to the automatic scheduling There is no recovery going on but just to be sure I'd set ceph daemon osd.1778 config set osd_scrub_during_recovery true Also the load limit is set way higher then the actual system load I checked the other osds en there are no scrubs going on on these when I schedule the deep-scrub I found some report of people that had the same problem. However no solution was found (for example https://tracker.ceph.com/issues/15781). Even in mimic and luminous there were sort of the same cases - Does anyone know what logging I should incraese in order to get more information as to why my deep-scrub does not get scheduled - Is there a way in jewel to see the list of scheduled scrubs and their dates for an osd - Does someone have advice on how to proceed in clearing this PG error Thanks for any help Marcel

3 years, 3 months

1
0
0 0

osd recommended scheduler

by Andrei Mikhailovsky

Hello everyone, Could some one please let me know what is the recommended modern kernel disk scheduler that should be used for SSD and HDD osds? The information in the manuals is pretty dated and refer to the schedulers which have been deprecated from the recent kernels. Thanks Andrei

3 years, 3 months

4
5
0 0

CephFS per client monitoring

by Erwin Bogaard

Hi, we're using mainly CephFS to give access to storage. At all times we can see that all clients combines use "X MiB/s" and "y op/s" for read and write by using the cli or ceph dashboard. With a tool like iftop, I can get a bit of insight to which clients most data 'flows', but it isn't really precise. is there any way to get a MiB/s and op/s number per CephFS client? Thanks, Erwin

3 years, 3 months

2
1
0 0

radosgw process crashes multiple times an hour

by Andrei Mikhailovsky

Hello, I am experiencing very frequent crashes of the radosgw service. It happens multiple times every hour. As an example, over the last 12 hours we've had 35 crashes. Has anyone experienced similar behaviour of the radosgw octopus release service? More info below: Radosgw service is running on two Ubuntu servers. I have tried upgrading OS on one of the servers to Ubuntu 20.04 with latest updates. The second server is still running Ubuntu 18.04. Both services crash occasionally, but the service which is running on Ubuntu 20.04 crashes far more often it seems. The ceph cluster itself is pretty old and was initially setup around 2013. The cluster was updated pretty regularly with every major release. Currently, I've got Octopus 15.2.8 running on all osd, mon, mgr and radosgw servers. Crash Backtrace: ceph crash info 2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8 |less { "backtrace": [ "(()+0x46210) [0x7f815a49a210]", "(gsignal()+0xcb) [0x7f815a49a18b]", "(abort()+0x12b) [0x7f815a479859]", "(()+0x9e951) [0x7f8150ee9951]", "(()+0xaa47c) [0x7f8150ef547c]", "(()+0xaa4e7) [0x7f8150ef54e7]", "(()+0xaa799) [0x7f8150ef5799]", "(()+0x344ba) [0x7f815a1404ba]", "(()+0x71e04) [0x7f815a17de04]", "(librados::v14_2_0::IoCtx::nobjects_begin(librados::v14_2_0::ObjectCursor const&, ceph::buffer::v15_2_0::list const&)+0x5d) [0x7f815a18c7bd]", "(RGWSI_RADOS::Pool::List::init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWAccessListFilter*)+0x115) [0x7f815b0d9935]", "(RGWSI_SysObj_Core::pool_list_objects_init(rgw_pool const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, RGWSI_SysObj::Pool::ListCtx*)+0x255) [0x7f815abd7035]", "(RGWSI_MetaBackend_SObj::list_init(RGWSI_MetaBackend::Context*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x206) [0x7f815b0ccfe6]", "(RGWMetadataHandler_GenericMetaBE::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x41) [0x7f815ad23201]", "(RGWMetadataManager::list_keys_init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void**)+0x71) [0x7f815ad254d1]", "(AsyncMetadataList::_send_request()+0x9b) [0x7f815b13c70b]", "(RGWAsyncRadosProcessor::handle_request(RGWAsyncRadosRequest*)+0x25) [0x7f815ae60f25]", "(RGWAsyncRadosProcessor::RGWWQ::_process(RGWAsyncRadosRequest*, ThreadPool::TPHandle&)+0x11) [0x7f815ae69401]", "(ThreadPool::worker(ThreadPool::WorkThread*)+0x5bb) [0x7f81517b072b]", "(ThreadPool::WorkThread::entry()+0x15) [0x7f81517b17f5]", "(()+0x9609) [0x7f815130d609]", "(clone()+0x43) [0x7f815a576293]" ], "ceph_version": "15.2.8", "crash_id": "2021-01-28T11:36:48.912771Z_08f80efd-c0ad-4551-88ce-905ca9cd3aa8", "entity_name": "client.radosgw1.gateway", "os_id": "ubuntu", "os_name": "Ubuntu", "os_version": "20.04.1 LTS (Focal Fossa)", "os_version_id": "20.04", "process_name": "radosgw", "stack_sig": "347474f09a756104ac2bb99d80e0c1fba3e9dc6f26e4ef68fe55946c103b274a", "timestamp": "2021-01-28T11:36:48.912771Z", "utsname_hostname": "arh-ibstorage1-ib", "utsname_machine": "x86_64", "utsname_release": "5.4.0-64-generic", "utsname_sysname": "Linux", "utsname_version": "#72-Ubuntu SMP Fri Jan 15 10:27:54 UTC 2021" } radosgw.log file (file names were redacted): -25> 2021-01-28T11:36:48.794+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-u115134.JPG HTTP/1.1" 400 460 - - -24> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== starting new request req=0x7f80437f5780 ===== -23> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s initializing for trans_id = tx000000000000000001431-006012a1d0-31197b5c-default -22> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s getting op 1 -21> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj verifying requester -20> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj normalizing buckets and tenants -19> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj init permissions -18> 2021-01-28T11:36:48.814+0000 7f80437fe700 0 req 5169 0s NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY -17> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 op->ERRORHANDLER: err_no=-22 new_err_no=-22 -16> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj op status=0 -15> 2021-01-28T11:36:48.814+0000 7f80437fe700 2 req 5169 0s s3:put_obj http status=400 -14> 2021-01-28T11:36:48.814+0000 7f80437fe700 1 ====== req done req=0x7f80437f5780 op status=0 http_status=400 latency=0s ====== -13> 2021-01-28T11:36:48.822+0000 7f80437fe700 1 civetweb: 0x7f814c0cf9e8: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-d20201223-u115132.JPG HTTP/1.1" 400 460 - - -12> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== starting new request req=0x7f8043ff6780 ===== -11> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s initializing for trans_id = tx000000000000000001432-006012a1d0-31197b5c-default -10> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s getting op 1 -9> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj verifying requester -8> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj normalizing buckets and tenants -12> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== starting new request req=0x7f8043ff6780 ===== -11> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s initializing for trans_id = tx000000000000000001432-006012a1d0-31197b5c-default -10> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s getting op 1 -9> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj verifying requester -12> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== starting new request req=0x7f8043ff6780 ===== -11> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s initializing for trans_id = tx000000000000000001432-006012a1d0-31197b5c-default -10> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s getting op 1 -9> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj verifying requester -8> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj normalizing buckets and tenants -7> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj init permissions -6> 2021-01-28T11:36:48.878+0000 7f8043fff700 0 req 5170 0s NOTICE: invalid dest placement: default-placement/REDUCED_REDUNDANCY -5> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 op->ERRORHANDLER: err_no=-22 new_err_no=-22 -4> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj op status=0 -3> 2021-01-28T11:36:48.878+0000 7f8043fff700 2 req 5170 0s s3:put_obj http status=400 -2> 2021-01-28T11:36:48.878+0000 7f8043fff700 1 ====== req done req=0x7f8043ff6780 op status=0 http_status=400 latency=0s ====== -1> 2021-01-28T11:36:48.886+0000 7f8043fff700 1 civetweb: 0x7f814c0cf010: 176.35.173.88 - - [28/Jan/2021:11:36:48 +0000] "PUT /<file_name>-223-u115136.JPG HTTP/1.1" 400 460 - - 0> 2021-01-28T11:36:48.910+0000 7f8128ff9700 -1 *** Caught signal (Aborted) ** 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 deferred set uid:gid to 64045:64045 (ceph:ceph) 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable), process radosgw, pid 30417 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework: civetweb 2021-01-28T11:36:49.810+0000 7f76032db9c0 0 framework conf key: port, val: 443s Could someone help me troubleshoot and fix the issue? Thanks Andrei

3 years, 3 months

3
4
0 0

Issue with cephadm upgrading containers.

by Darrin Hodges

Hi all, I'm attempting to upgrade our octopus 15.2.4 containers to 15.2.8. If I run 'ceph orch upgrade start --ceph-version 15.2.8' it eventually errors with: '"message": "Error: UPGRADE_FAILED_PULL: Upgrade: failed to pull target image", The documentation suggests that this is caused by specifying the incorrect version or the registry is not reachable. If I run 'cephadm pull' it doesn't complain: Using recent ceph image ceph/ceph:v15.2.8 Pulling container image ceph/ceph:v15.2.8... { "ceph_version": "ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable)", "image_id": "5553b0cb212ca2aa220d33ba39d9c602c8412ce6c5febc57ef9cdc9c5844b185", "repo_digest": "ceph/ceph@sha256:37939a3739e4e037dcf1b1f5828058d721d8c6de958212609f9e7d920b9c62bf" } Not sure what the issue is with upgrading. many thanks Darrin -- CONFIDENTIALITY NOTICE: This email is intended for the named recipients only. It may contain privileged, confidential or copyright information. If you are not the named recipients, any use, reliance upon, disclosure or copying of this email or any attachments is unauthorised. If you have received this email in error, please reply via email or telephone +61 2 8004 5928.

3 years, 3 months

1
1
0 0

`cephadm` not deploying OSDs from a storage spec

by Davor Cubranic

Hello, I am trying to set up a test cluster with the cephadm tool on Ubuntu 20.04 nodes. Following the directions at https://docs.ceph.com/en/octopus/cephadm/install/, I have set up the monitor and manager on a management node, and added two hosts that I want to use for storage. All storage devices present on those nodes are included in the output of `ceph orch device ls`, and all are marked “available”. However, when I try to deploy OSDs with `ceph orch apply osd -i spec.yml`, following the example for HDD+SSD storage spec at https://docs.ceph.com/en/latest/cephadm/drivegroups/#the-simple-case, I see the new service in the output of `ceph orch ls`, but it is not running anywhere (“0/2”), and no OSDs get created. I am not sure how to debug this, and any pointers would be much appreciated. Thank you, Davor Output: ``` # ceph orch host ls INFO:cephadm:Inferring fsid 150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3 INFO:cephadm:Inferring config /var/lib/ceph/150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3/mon.sps-head/config INFO:cephadm:Using recent ceph image ceph/ceph:v15 HOST ADDR LABELS STATUS sps-head sps-head mon sps-st1 sps-st1 mon sps-st2 sps-st2 # ceph orch device ls INFO:cephadm:Inferring fsid 150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3 INFO:cephadm:Inferring config /var/lib/ceph/150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3/mon.sps-head/config INFO:cephadm:Using recent ceph image ceph/ceph:v15 Hostname Path Type Serial Size Health Ident Fault Available sps-head /dev/nvme0n1 ssd S5JXNS0N504446R 1024G Unknown N/A N/A Yes sps-st1 /dev/nvme0n1 ssd S5JXNS0N504948D 1024G Unknown N/A N/A Yes sps-st1 /dev/nvme1n1 ssd S5JXNS0N504958T 1024G Unknown N/A N/A Yes sps-st1 /dev/sdb hdd 5000cca28ed36018 14.0T Unknown N/A N/A Yes sps-st1 /dev/sdc hdd 5000cca28ed353e5 14.0T Unknown N/A N/A Yes […] # cat /mnt/osd_spec.yml service_type: osd service_id: default_drive_group placement: host_pattern: 'sps-st[1-6]' data_devices: rotational: 1 db_devices: rotational: 0 [**After running `ceph orch apply osd -i spec.yml`:**] # ceph orch ls NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID alertmanager 1/1 9m ago 6h count:1 docker.io/prom/alertmanager:v0.20.0 0881eb8f169f crash 3/3 9m ago 6h * docker.io/ceph/ceph:v15 5553b0cb212c grafana 1/1 9m ago 6h count:1 docker.io/ceph/ceph-grafana:6.6.2 a0dce381714a mgr 2/2 9m ago 6h count:2 docker.io/ceph/ceph:v15 5553b0cb212c mon 1/2 9m ago 3h label:mon docker.io/ceph/ceph:v15 5553b0cb212c node-exporter 0/3 - - * <unknown> <unknown> osd.default_drive_group 0/2 - - sps-st[1-6] <unknown> <unknown> prometheus 1/1 9m ago 6h count:1 docker.io/prom/prometheus:v2.18.1 de242295e225 [** I am not sure why neither “osd.default_drive_group” nor “node-exporter” are running anywhere. How do I check that? **] # ceph osd tree INFO:cephadm:Inferring fsid 150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3 INFO:cephadm:Inferring config /var/lib/ceph/150b5f1a-64bf-11eb-a7e9-d96bd5ac4db3/mon.sps-head/config INFO:cephadm:Using recent ceph image ceph/ceph:v15 ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -1 0 root default # ceph orch --version ceph version 15.2.8 (bdf3eebcd22d7d0b3dd4d5501bee5bac354d5b55) octopus (stable) ```

3 years, 3 months

1
0
0 0

Bucket synchronization works only after disable/enable, once finished, some operation maxes out SSDs/nvmes and sync degraded.

by Szabo, Istvan (Agoda)

Hello, We have a 3 geo locational freshly installed multisite setup with an upgraded octopus from 15.2.5 to 15.2.7. We have 6 osd nodes, 3 mon/mgr/rgw in each dc, full SSD, 3 ssd is using 1 nvme for journaling. Each zone backed with 3 RGW, one on each mon/mgr node. The goal is to replicate 2 (currently) big buckets in the zonegroup but it only works if I disable and reenable the bucket sync. Big buckets means, one bucket is presharded for 9000 shards (9 billions objects), the 2nd bucket that I'm detailing in this ticket 24000 (24 billions objects) shards. Once picked up the objects (not all, only the ones that was on the source site at that given time when it was enabled) it will slows down a lot from 100.000 objects / 15 minutes in and 10GB/15 minutes to 50 objects/4 hours. Once it synchronized after enabled/disabled, it maxing out the osd nodes with NVME/SSD drives with some operation which I don't know what is it. Let me show you the symptoms below. Let me summarize as much as I can. We have 1 realm, in this realm we have 1 zonegroup (please help me to check if the sync policies are ok) and in this zonegroup we have 1 cluster in US, 1 in Hong Kong (master) and 1 in Singapore. Here is the realm, zonegroup and zones definition: https://pastebin.com/raw/pu66tqcf Let me show you one enable/disable operation when I've disabled on the HKG master site the pix-bucket and enabled it. In this screenshot: https://i.ibb.co/WNC0gNQ/6nodes6day.png the highlighted area is when the data sync is running after disable enable. You can see almost no operation. You can see also when sync is not running, the green and yellow is the NVME drive rocksdb+wal drives. The screenshot represents the 6 Singapore nodes SSD/NVME disk utilizations. The first node you can see in the last hours no green and yellow, it's because I've reinstalled in that nodes all the osds to not use NVME. In the following 1st screenshot you can see the HKG object usage where the user is uploading the objects. 2nd screenshot the SGP one where you can see the highlighted area is the disable/enable operation. HKG where user upload: https://i.ibb.co/vj2VFYP/pixhkg6d.png SGP where sync happened: https://i.ibb.co/w41rmQT/pixsgp6d.png Let me show you some troubleshooting logs regarding bucket sync status, cluster sync status, reshard list (which might be because of previous testing), sync error list https://pastebin.com/raw/TdwiZFC1 The issue might be very similar to this issue: https://tracker.ceph.com/issues/21591 Where I should move forward or how can I help you to provide more logs to help me please? Thank you in advance ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

3 years, 3 months

1
0
0 0

rbd resize progress report?

by Jorge Garcia

I'm trying to resize a block device using "rbd --resize". The block device is pretty huge (100+ TB). The resize has been running for over a week, and I have no idea if it's actually doing anything, or if it's just hanging or in some infinite loop. Is there any way of getting a progress report from the resize to get an idea if this is ever going to finish? Thanks!

3 years, 3 months

1
0
0 0

Unable to enable RBD-Mirror Snapshot on image when VM is using RBD

by Adam Boyhan

This is a odd one. I don't hit it all the time so I don't think its expected behavior. Sometimes I have no issues enabling rbd-mirror snapshot mode on a rbd when its in use by a KVM VM. Other times I hit the following error, the only way I can get around it is to power down the KVM VM. root@Ccscephtest1:~# rbd mirror image enable CephTestPool1/vm-101-disk-0 snapshot 2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f1e7c012440 handle_create_snapshot: failed to create mirror snapshot: (22) Invalid argument 2021-01-29T09:29:07.875-0500 7f1e99ffb700 -1 librbd::mirror::EnableRequest: 0x5597667fd200 handle_create_primary_snapshot: failed to create initial primary snapshot: (22) Invalid argument 2021-01-29T09:29:07.875-0500 7f1ea559f3c0 -1 librbd::api::Mirror: image_enable: cannot enable mirroring: (22) Invalid argument

3 years, 3 months

2
4
0 0

Balancing with upmap

by Francois Legrand

Hi all, I have a cluster with 116 disks (24 new disks of 16TB added in december and the rest of 8TB) running nautilus 14.2.16. I moved (8 month ago) from crush_compat to upmap balancing. But the cluster seems not well balanced, with a number of pgs on the 8TB disks varying from 26 to 52 ! And an occupation from 35 to 69%. The recent 16 TB disks are more homogeneous with 48 to 61 pgs and space between 30 and 43%. Last week, I realized that some osd were maybe not using upmap because I did a ceph osd crush weight-set ls and got (compat) as result. Thus I ran a ceph osd crush weight-set rm-compat which triggered some rebalancing. Now there is no more recovery for 2 days, but the cluster is still unbalanced. As far as I understand, upmap is supposed to reach an equal number of pgs on all the disks (I guess weighted by their capacity). Thus I would expect more or less 30 pgs on the 8TB disks and 60 on the 16TB and around 50% usage on all. Which is not the case (by far). The problem is that it impact the free available space in the pools (264Ti while there is more than 578Ti free in the cluster) because free space seems to be based on space available before the first osd will be full ! Is it normal ? Did I missed something ? What could I do ? F.

3 years, 3 months

3
11
0 0

2024

2023

2022

2021

2020

2019

ceph-users