June 2023 - ceph-users - lists.ceph.io

Quincy release -Swift integration with Keystone

by fsbiz＠yahoo.com

Hi folks, My ceph cluster with Quincy and Rocky9 is up and running. But I'm having issues with swift authenticating with keystone. Was wondering if I'm missed anything in the configuration. From the debug logs below, it appears that radosgw is still trying to authenticate with Swift instead of Keystone. Any pointers will be appreciated. thanks, Here is my configuration. # ceph config dump | grep rgw client advanced debug_rgw 20/20 client advanced rgw_keystone_accepted_roles admin,user * client advanced rgw_keystone_admin_domain Default * client advanced rgw_keystone_admin_password <secret> * client advanced rgw_keystone_admin_project service * client advanced rgw_keystone_admin_user ceph-ks-svc * client advanced rgw_keystone_api_version 3 client advanced rgw_keystone_implicit_tenants false * client advanced rgw_keystone_token_cache_size 0 client basic rgw_keystone_url <Identity URL> * client advanced rgw_s3_auth_use_keystone true client advanced rgw_swift_account_in_url true client basic rgw_thread_pool_size 512 client.rgw.s_rgw.dev-ipp1-u1-control01.ojmddc basic rgw_frontends beast port=7480 * client.rgw.s_rgw.dev-ipp1-u1-control02.adnjrx basic rgw_frontends beast port=7480 Here's the debug log. If I interpret it correctly, it is trying to do a swift authentication and failing. Am I missing any configuration for Keystone based authentication ? Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: beast: 0x7fddeb8e7710: 10.117.53.10 - - [03/Jun/2023:18:47:03.060 +0000] "GET /swift/v1/AUTH_c668ed224e434c88a9e0fce125056112?format=json HTTP/1.1" 401 119 - "openstacksdk/0.52.0 keystoneauth1/4.0.0 python-requests/2.22.0 CPython/3.8.10" - latency=0.000000000s Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_ACCEPT=*/* Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_ACCEPT_ENCODING=gzip, deflate Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_CONNECTION=close Jun 03 11:47:03 dev-ipp1-u1-control02.radosgw[2802861]: HTTP_HOST=dev-ipp1-u1-object-store Jun 03 11:47:03 dev-ipp1-u1-control02radosgw[2802861]: HTTP_USER_AGENT=openstacksdk/0.52.0 keystoneauth1/4.0.0 python-requests/2.22.0 CPython/3.8.10 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_VERSION=1.1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_X_AUTH_TOKEN=gAAAAABke4qn779UQ_XMz0EDL3P3TgjBQsGG6p-MNhviJxLZTuMTnTDmpT5Yfi9UpgO_T3LOOsPjQAw6zoMUIaC22wPeryp5x-UumB3XwXOWp-qSXLbuN3b9oj_Qg5kCZWA0waWNRHzQ1mwtlEmmpTgvTXbU5V1ym6hEBOn6Q3RWhn34Hj3cF9o Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_X_FORWARDED_FOR=10.117.148.3 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: QUERY_STRING=format=json Jun 03 11:47:03 dev-ipp1-u1-control02.radosgw[2802861]: REMOTE_ADDR=10.117.53.10 Jun 03 11:47:03 dev-ipp1-u1-control02.radosgw[2802861]: REQUEST_METHOD=GET Jun 03 11:47:03 dev-ipp1-u1-control02.radosgw[2802861]: REQUEST_URI=/swift/v1/AUTH_c668ed224e434c88a9e0fce125056112?format=json Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: SCRIPT_URI=/swift/v1/AUTH_c668ed224e434c88a9e0fce125056112 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: SERVER_PORT=7480 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: ====== starting new request req=0x7fddeb8e7710 ===== Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s initializing for trans_id = tx000003991cfc5c1791f95-00647b8aa7-30c56-default Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s rgw api priority: s3=8 s3website=7 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s host=dev-ipp1-u1-object-store Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/swift/v1/AUTH_c668ed224e434c88a9e0fce125056112 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s name: format val: json Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s ver=v1 first= req= Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s handler=29RGWHandler_REST_Service_SWIFT Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s getting op 0 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s get_system_obj_state: rctx=0x7fddeb8e6790 obj=default.rgw.log:script.prerequest. state=0x55f743b97720 s->prefetch_data=0 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s cache get: name=default.rgw.log++script.prerequest. : hit (negative entry) Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets scheduling with throttler client=3 cost=1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets op=29RGWListBuckets_ObjStore_SWIFT Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets verifying requester Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::TempURLEngine Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::TempURLEngine denied with reason=-13 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::SignedTokenEngine Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::SignedTokenEngine denied with reason=-1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::SwiftAnonymousEngine Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::SwiftAnonymousEngine denied with reason=-1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets Failed the auth strategy, reason=-1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: failed to authorize request Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s op->ERRORHANDLER: err_no=-1 new_err_no=-1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s get_system_obj_state: rctx=0x7fddeb8e6790 obj=default.rgw.log:script.postrequest. state=0x55f743b97960 s->prefetch_data=0 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s cache get: name=default.rgw.log++script.postrequest. : hit (negative entry) Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets op status=0 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets http status=401

11 months, 2 weeks

2
1
0 0

RADOSGW integration with Keystone not working in Quincy release ??

by fsbiz＠yahoo.com

I have a ceph cluster installed using cephadm. The cluster is up and running but I'm unable to get Keystone integration working with RADOSGW. Is this a known issue? thanks,Fred.

11 months, 2 weeks

1
0
0 0

Unexpected behavior of directory mtime after being set explicitly

by Sandip Divekar

Hi Team, I'm writing to bring to your attention an issue we have encountered with the "mtime" (modification time) behavior for directories in the Ceph filesystem. Upon observation, we have noticed that when the mtime of a directory (let's say: dir1) is explicitly changed in CephFS, subsequent additions of files or directories within 'dir1' fail to update the directory's mtime as expected. This behavior appears to be specific to CephFS - we have reproduced this issue on both Quincy and Pacific. Similar steps work as expected in the ext4 filesystem amongst others. Reproduction steps: 1. Create a directory - mkdir dir1 2. Modify mtime using the touch command - touch dir1 3. Create a file or directory inside of 'dir1' - mkdir dir1/dir2 Expected result: mtime for dir1 should change to the time the file or directory was created in step 3 Actual result: there was no change to the mtime for 'dir1' Note : For more detail, kindly find the attached logs. Our queries are : 1. Is this expected behavior for CephFS? 2. If so, can you explain why the directory behavior is inconsistent depending on whether the mtime for the directory has previously been manually updated. Best Regards, Sandip Divekar Component QA Lead SDET.

11 months, 2 weeks

7
14
0 0

How to show used size of specific storage class in Radosgw?

by Huy Nguyen

Hi, I'm not able to find the information about used size of a storage class. - bucket stats - usage show - user stats ... Does Radosgw support it? Thanks

11 months, 2 weeks

1
0
0 0

RGW: bucket notification issue with Kafka

by Huy Nguyen

Hi, In Ceph Radosgw 15.2.17, I get this issue when trying to create a push endpoint to Kafka Here is push endpoint configuration: endpoint_args = 'push-endpoint=kafka://abcef:123456@kafka.endpoint:9093&use-ssl=true&ca-location=/etc/ssl/certs/ca.crt' attributes = {nvp[0] : nvp[1] for nvp in urllib.parse.parse_qsl(endpoint_args, keep_blank_values=True)} response = snsclient.create_topic(Name=topic_name, Attributes=attributes) When I put an object, the radosgw log show this: Kafka connect: failed to create producer: ssl.ca.location failed: crypto/x509/by_file.c:199: error:0B084002:x509 certificate routines:X509_load_cert_crl_file:system lib: I have checked my ca.crt file and it is definitely in x509 format. If I use RGW v16.2.13, the producer will be created successfully. Anyone have any ideal? Thanks

11 months, 2 weeks

1
0
0 0

RADOSGW not authenticating with Keystone. Quincy release

by fsbiz＠yahoo.com

Hi folks, My ceph cluster with Quincy and Rocky9 is up and running. But I'm having issues with RADOSGW authenticating with keystone. Was wondering if I'm missed anything in the configuration. From the debug logs below, it appears that radosgw is still trying to authenticate with Swift instead of Keystone. Any pointers will be appreciated. thanks, Here is my configuration. # ceph config dump | grep rgw client advanced debug_rgw 20/20 client advanced rgw_keystone_accepted_roles admin,user * client advanced rgw_keystone_admin_domain Default * client advanced rgw_keystone_admin_password <secret> * client advanced rgw_keystone_admin_project service * client advanced rgw_keystone_admin_user ceph-ks-svc * client advanced rgw_keystone_api_version 3 client advanced rgw_keystone_implicit_tenants false * client advanced rgw_keystone_token_cache_size 0 client basic rgw_keystone_url <Identity URL> * client advanced rgw_s3_auth_use_keystone true client advanced rgw_swift_account_in_url true client basic rgw_thread_pool_size 512 client.rgw.s_rgw.dev-ipp1-u1-control01.ojmddc basic rgw_frontends beast port=7480 * client.rgw.s_rgw.dev-ipp1-u1-control02.adnjrx basic rgw_frontends beast port=7480 Here's the debug log. If I interpret it correctly, it is trying to do a swift authentication and failing. Am I missing any configuration for Keystone based authentication ? Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: beast: 0x7fddeb8e7710: 10.117.53.10 - - [03/Jun/2023:18:47:03.060 +0000] "GET /swift/v1/AUTH_c668ed224e434c88a9e0fce125056112?format=json HTTP/1.1" 401 119 - "openstacksdk/0.52.0 keystoneauth1/4.0.0 python-requests/2.22.0 CPython/3.8.10" - latency=0.000000000s Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_ACCEPT=*/* Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_ACCEPT_ENCODING=gzip, deflate Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_CONNECTION=close Jun 03 11:47:03 dev-ipp1-u1-control02.radosgw[2802861]: HTTP_HOST=dev-ipp1-u1-object-store Jun 03 11:47:03 dev-ipp1-u1-control02radosgw[2802861]: HTTP_USER_AGENT=openstacksdk/0.52.0 keystoneauth1/4.0.0 python-requests/2.22.0 CPython/3.8.10 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_VERSION=1.1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_X_AUTH_TOKEN=gAAAAABke4qn779UQ_XMz0EDL3P3TgjBQsGG6p-MNhviJxLZTuMTnTDmpT5Yfi9UpgO_T3LOOsPjQAw6zoMUIaC22wPeryp5x-UumB3XwXOWp-qSXLbuN3b9oj_Qg5kCZWA0waWNRHzQ1mwtlEmmpTgvTXbU5V1ym6hEBOn6Q3RWhn34Hj3cF9o Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: HTTP_X_FORWARDED_FOR=10.117.148.3 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: QUERY_STRING=format=json Jun 03 11:47:03 dev-ipp1-u1-control02.radosgw[2802861]: REMOTE_ADDR=10.117.53.10 Jun 03 11:47:03 dev-ipp1-u1-control02.radosgw[2802861]: REQUEST_METHOD=GET Jun 03 11:47:03 dev-ipp1-u1-control02.radosgw[2802861]: REQUEST_URI=/swift/v1/AUTH_c668ed224e434c88a9e0fce125056112?format=json Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: SCRIPT_URI=/swift/v1/AUTH_c668ed224e434c88a9e0fce125056112 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: SERVER_PORT=7480 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: ====== starting new request req=0x7fddeb8e7710 ===== Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s initializing for trans_id = tx000003991cfc5c1791f95-00647b8aa7-30c56-default Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s rgw api priority: s3=8 s3website=7 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s host=dev-ipp1-u1-object-store Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/swift/v1/AUTH_c668ed224e434c88a9e0fce125056112 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s name: format val: json Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s ver=v1 first= req= Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s handler=29RGWHandler_REST_Service_SWIFT Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s getting op 0 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s get_system_obj_state: rctx=0x7fddeb8e6790 obj=default.rgw.log:script.prerequest. state=0x55f743b97720 s->prefetch_data=0 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s cache get: name=default.rgw.log++script.prerequest. : hit (negative entry) Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets scheduling with throttler client=3 cost=1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets op=29RGWListBuckets_ObjStore_SWIFT Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets verifying requester Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::TempURLEngine Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::TempURLEngine denied with reason=-13 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::SignedTokenEngine Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::SignedTokenEngine denied with reason=-1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::DefaultStrategy: trying rgw::auth::swift::SwiftAnonymousEngine Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets rgw::auth::swift::SwiftAnonymousEngine denied with reason=-1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets Failed the auth strategy, reason=-1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: failed to authorize request Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s op->ERRORHANDLER: err_no=-1 new_err_no=-1 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s get_system_obj_state: rctx=0x7fddeb8e6790 obj=default.rgw.log:script.postrequest. state=0x55f743b97960 s->prefetch_data=0 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s cache get: name=default.rgw.log++script.postrequest. : hit (negative entry) Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets op status=0 Jun 03 11:47:03 dev-ipp1-u1-control02 radosgw[2802861]: req 4148325180046385045 0.000000000s swift:list_buckets http status=401

11 months, 2 weeks

1
0
0 0

PGs stuck undersized and not scrubbed

by Nicola Mori

Dear Ceph users, after an outage and recovery of one machine I have several PGs stuck in active+recovering+undersized+degraded+remapped. Furthermore, many PGs have not been (deep-)scrubbed in time. See below for status and health details. It's been like this for two days, with no recovery I/O being reported, so I guess something is stuck in a bad state. I'd need some help in understanding what's going on here and how to fix it. Thanks, Nicola --------------------- # ceph -s cluster: id: b1029256-7bb3-11ec-a8ce-ac1f6b627b45 health: HEALTH_WARN 2 OSD(s) have spurious read errors Degraded data redundancy: 7349/147534197 objects degraded (0.005%), 22 pgs degraded, 22 pgs undersized 332 pgs not deep-scrubbed in time 503 pgs not scrubbed in time (muted: OSD_SLOW_PING_TIME_BACK OSD_SLOW_PING_TIME_FRONT) services: mon: 5 daemons, quorum bofur,balin,aka,romolo,dwalin (age 2d) mgr: bofur.tklnrn(active, since 32h), standbys: balin.hvunfe, aka.wzystq mds: 2/2 daemons up, 1 standby osd: 104 osds: 104 up (since 37h), 104 in (since 37h); 22 remapped pgs data: volumes: 1/1 healthy pools: 3 pools, 529 pgs objects: 18.53M objects, 40 TiB usage: 54 TiB used, 142 TiB / 196 TiB avail pgs: 7349/147534197 objects degraded (0.005%) 2715/147534197 objects misplaced (0.002%) 507 active+clean 20 active+recovering+undersized+degraded+remapped 2 active+recovery_wait+undersized+degraded+remapped # ceph health detail [WRN] PG_DEGRADED: Degraded data redundancy: 7349/147534197 objects degraded (0.005%), 22 pgs degraded, 22 pgs undersized pg 3.2c is stuck undersized for 37h, current state active+recovery_wait+undersized+degraded+remapped, last acting [79,83,34,37,65,NONE,18,95] pg 3.57 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [57,99,37,NONE,15,104,55,40] pg 3.76 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [57,5,37,15,100,33,85,NONE] pg 3.9c is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [57,86,88,NONE,11,69,20,10] pg 3.106 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,15,89,NONE,36,32,23,64] pg 3.107 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,NONE,64,20,61,92,104,43] pg 3.10c is stuck undersized for 37h, current state active+recovery_wait+undersized+degraded+remapped, last acting [79,34,NONE,95,104,16,69,18] pg 3.11e is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,89,64,46,32,NONE,40,15] pg 3.14e is stuck undersized for 37h, current state active+recovering+undersized+degraded+remapped, last acting [57,34,69,97,85,NONE,46,62] pg 3.160 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [57,1,101,84,18,33,NONE,69] pg 3.16a is stuck undersized for 37h, current state active+recovering+undersized+degraded+remapped, last acting [57,16,59,103,13,38,49,NONE] pg 3.16e is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [57,0,27,96,55,10,81,NONE] pg 3.170 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [NONE,57,14,46,55,99,15,40] pg 3.19b is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [NONE,79,59,8,32,17,7,90] pg 3.1a0 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [NONE,79,26,50,104,24,97,40] pg 3.1a5 is stuck undersized for 37h, current state active+recovering+undersized+degraded+remapped, last acting [57,100,61,27,20,NONE,24,85] pg 3.1a8 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,24,NONE,3,55,40,98,45] pg 3.1aa is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,91,48,NONE,24,3,8,85] pg 3.1af is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,NONE,90,33,104,69,26,8] pg 3.1c1 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,95,NONE,53,54,27,18,85] pg 3.1c4 is stuck undersized for 2d, current state active+recovering+undersized+degraded+remapped, last acting [79,69,56,84,95,8,NONE,4] pg 3.1d5 is stuck undersized for 37h, current state active+recovering+undersized+degraded+remapped, last acting [57,48,NONE,104,34,16,37,89] [WRN] PG_NOT_DEEP_SCRUBBED: 332 pgs not deep-scrubbed in time pg 3.1ff not deep-scrubbed since 2023-05-18T21:06:57.883787+0000 pg 3.1fe not deep-scrubbed since 2023-05-22T19:50:11.497538+0000 pg 3.1fd not deep-scrubbed since 2023-05-22T19:44:12.680598+0000 pg 3.1fc not deep-scrubbed since 2023-05-20T19:56:43.746580+0000 pg 3.1fb not deep-scrubbed since 2023-05-22T18:29:12.794152+0000 pg 3.1f9 not deep-scrubbed since 2023-05-19T08:19:16.636964+0000 pg 3.1f8 not deep-scrubbed since 2023-05-22T21:49:28.891350+0000 pg 3.1f5 not deep-scrubbed since 2023-05-18T21:18:19.636068+0000 pg 3.1f4 not deep-scrubbed since 2023-05-18T18:00:41.241562+0000 pg 3.1f3 not deep-scrubbed since 2023-05-21T01:36:32.735139+0000 pg 3.1f2 not deep-scrubbed since 2023-05-23T03:59:02.154966+0000 pg 3.1f1 not deep-scrubbed since 2023-05-22T21:47:46.419880+0000 pg 3.1f0 not deep-scrubbed since 2023-05-22T19:17:38.327356+0000 pg 3.1ef not deep-scrubbed since 2023-05-19T01:49:04.133392+0000 pg 3.1ee not deep-scrubbed since 2023-05-21T12:25:52.010406+0000 pg 3.1ed not deep-scrubbed since 2023-05-19T20:13:20.675257+0000 pg 3.1eb not deep-scrubbed since 2023-05-18T12:13:53.684650+0000 pg 3.1ea not deep-scrubbed since 2023-05-18T09:45:57.172578+0000 pg 3.1e9 not deep-scrubbed since 2023-05-23T00:26:18.621324+0000 pg 3.1e8 not deep-scrubbed since 2023-05-21T05:15:03.969687+0000 pg 3.1e4 not deep-scrubbed since 2023-05-21T16:21:11.738145+0000 pg 3.1e3 not deep-scrubbed since 2023-05-22T13:13:19.611165+0000 pg 3.1e0 not deep-scrubbed since 2023-05-21T17:43:36.545240+0000 pg 3.1de not deep-scrubbed since 2023-05-18T00:03:49.873073+0000 pg 3.1dd not deep-scrubbed since 2023-05-22T20:30:56.025015+0000 pg 3.1db not deep-scrubbed since 2023-05-22T18:12:44.615539+0000 pg 3.1da not deep-scrubbed since 2023-05-20T21:11:00.060022+0000 pg 3.1d9 not deep-scrubbed since 2023-05-22T19:02:03.292022+0000 pg 3.1d8 not deep-scrubbed since 2023-05-23T17:37:05.320161+0000 pg 3.1d6 not deep-scrubbed since 2023-05-19T15:19:58.293551+0000 pg 3.1d4 not deep-scrubbed since 2023-05-23T02:28:54.392188+0000 pg 3.1d3 not deep-scrubbed since 2023-05-18T06:02:14.181321+0000 pg 3.1d2 not deep-scrubbed since 2023-05-18T11:46:29.582700+0000 pg 3.1d1 not deep-scrubbed since 2023-05-19T08:31:54.033426+0000 pg 3.1cd not deep-scrubbed since 2023-05-21T08:52:41.817826+0000 pg 3.1cc not deep-scrubbed since 2023-05-22T22:51:02.466708+0000 pg 3.1c9 not deep-scrubbed since 2023-05-18T08:06:50.220587+0000 pg 3.1c7 not deep-scrubbed since 2023-05-22T17:07:35.346608+0000 pg 3.1c5 not deep-scrubbed since 2023-05-20T17:09:12.048012+0000 pg 3.1c1 not deep-scrubbed since 2023-05-21T11:39:47.640196+0000 pg 3.1c0 not deep-scrubbed since 2023-05-22T20:22:57.166475+0000 pg 3.1bf not deep-scrubbed since 2023-05-19T19:08:08.313143+0000 pg 3.1be not deep-scrubbed since 2023-05-21T12:28:17.345386+0000 pg 3.1bd not deep-scrubbed since 2023-05-18T19:19:29.002801+0000 pg 3.1bb not deep-scrubbed since 2023-05-19T07:15:53.508751+0000 pg 3.1b8 not deep-scrubbed since 2023-05-19T18:50:27.701909+0000 pg 3.1b6 not deep-scrubbed since 2023-05-19T03:30:55.707248+0000 pg 3.1b5 not deep-scrubbed since 2023-05-20T20:37:48.346272+0000 pg 3.1b4 not deep-scrubbed since 2023-05-23T02:11:04.833784+0000 pg 3.1b3 not deep-scrubbed since 2023-05-18T20:46:40.876590+0000 282 more pgs... [WRN] PG_NOT_SCRUBBED: 503 pgs not scrubbed in time pg 3.1ff not scrubbed since 2023-05-24T23:37:22.323516+0000 pg 3.1fe not scrubbed since 2023-05-25T02:01:18.754476+0000 pg 3.1fd not scrubbed since 2023-05-24T20:31:23.239794+0000 pg 3.1fc not scrubbed since 2023-05-25T00:42:05.670791+0000 pg 3.1fb not scrubbed since 2023-05-24T19:29:29.438626+0000 pg 3.1fa not scrubbed since 2023-05-24T21:50:04.911965+0000 pg 3.1f9 not scrubbed since 2023-05-25T20:44:49.010622+0000 pg 3.1f8 not scrubbed since 2023-05-24T18:17:49.471926+0000 pg 3.1f7 not scrubbed since 2023-05-24T17:27:43.545337+0000 pg 3.1f6 not scrubbed since 2023-05-24T22:16:04.008644+0000 pg 3.1f5 not scrubbed since 2023-05-24T20:14:01.159271+0000 pg 3.1f4 not scrubbed since 2023-05-24T16:20:29.746958+0000 pg 3.1f3 not scrubbed since 2023-05-25T00:45:49.464448+0000 pg 3.1f2 not scrubbed since 2023-05-24T17:37:58.701570+0000 pg 3.1f1 not scrubbed since 2023-05-24T20:21:46.824657+0000 pg 3.1f0 not scrubbed since 2023-05-25T00:59:02.693836+0000 pg 3.1ef not scrubbed since 2023-05-24T21:35:10.061965+0000 pg 3.1ee not scrubbed since 2023-05-24T17:13:37.835095+0000 pg 3.1ed not scrubbed since 2023-05-24T18:17:21.739348+0000 pg 3.1ec not scrubbed since 2023-05-24T17:54:23.365899+0000 pg 3.1eb not scrubbed since 2023-05-24T23:18:31.345229+0000 pg 3.1ea not scrubbed since 2023-05-25T00:25:06.747723+0000 pg 3.1e9 not scrubbed since 2023-05-25T19:27:39.496774+0000 pg 3.1e8 not scrubbed since 2023-05-25T01:31:11.083814+0000 pg 3.1e7 not scrubbed since 2023-05-25T01:43:43.116599+0000 pg 3.1e6 not scrubbed since 2023-05-24T18:26:39.778008+0000 pg 3.1e4 not scrubbed since 2023-05-24T22:18:59.986309+0000 pg 3.1e3 not scrubbed since 2023-05-24T14:34:52.095564+0000 pg 3.1e2 not scrubbed since 2023-05-24T23:56:04.083842+0000 pg 3.1e1 not scrubbed since 2023-05-25T02:00:18.766811+0000 pg 3.1e0 not scrubbed since 2023-05-25T02:01:42.094304+0000 pg 3.1df not scrubbed since 2023-05-24T19:41:59.890557+0000 pg 3.1de not scrubbed since 2023-05-24T23:57:49.463552+0000 pg 3.1dd not scrubbed since 2023-05-25T17:42:33.397660+0000 pg 3.1dc not scrubbed since 2023-05-24T17:34:43.656366+0000 pg 3.1db not scrubbed since 2023-05-24T21:48:10.126232+0000 pg 3.1da not scrubbed since 2023-05-24T17:54:43.136739+0000 pg 3.1d9 not scrubbed since 2023-05-24T20:22:14.256914+0000 pg 3.1d8 not scrubbed since 2023-05-24T23:34:56.555311+0000 pg 3.1d7 not scrubbed since 2023-05-25T18:08:08.689329+0000 pg 3.1d6 not scrubbed since 2023-05-24T20:23:30.301130+0000 pg 3.1d5 not scrubbed since 2023-05-25T20:30:25.691077+0000 pg 3.1d4 not scrubbed since 2023-05-24T21:21:46.923743+0000 pg 3.1d3 not scrubbed since 2023-05-24T18:12:50.468466+0000 pg 3.1d2 not scrubbed since 2023-05-24T20:33:32.376232+0000 pg 3.1d1 not scrubbed since 2023-05-24T20:32:55.981738+0000 pg 3.1d0 not scrubbed since 2023-05-24T18:16:51.195524+0000 pg 3.1cf not scrubbed since 2023-05-24T22:32:00.879058+0000 pg 3.1ce not scrubbed since 2023-05-25T02:46:02.834267+0000 pg 3.1cd not scrubbed since 2023-05-24T21:02:08.288116+0000 453 more pgs...

11 months, 2 weeks

2
2
0 0

Duplicate help statements in Prometheus metrics in 16.2.13

by Andreas Haupt

Dear all, after the update to CEPH 16.2.13 the Prometheus exporter is wrongly exporting multiple metric help & type lines for ceph_pg_objects_repaired: [mon1] /root #curl -sS http://localhost:9283/metrics # HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count # TYPE ceph_pg_objects_repaired counter ceph_pg_objects_repaired{poolid="34"} 0.0 # HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count # TYPE ceph_pg_objects_repaired counter ceph_pg_objects_repaired{poolid="33"} 0.0 # HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count # TYPE ceph_pg_objects_repaired counter ceph_pg_objects_repaired{poolid="32"} 0.0 [...] This annoys our exporter_exporter service so it rejects the export of ceph metrics. Is this a known issue? Will this be fixed in the next update? Cheers, Andreas -- | Andreas Haupt | E-Mail: andreas.haupt(a)desy.de | DESY Zeuthen | WWW: http://www.zeuthen.desy.de/~ahaupt | Platanenallee 6 | Phone: +49/33762/7-7359 | D-15738 Zeuthen | Fax: +49/33762/7-7216

11 months, 2 weeks

2
1
0 0

Converting to cephadm : Error EINVAL: Failed to connect

by David Barton

I am trying to debug an issue with ceph orch host add Is there a way to debug the specific ssh commands being issued or add debugging code to a python script? There is nothing useful in my syslog or /var/log/ceph/cephadm.log Is there a way to get the command to log, or can someone point me in the direction of the source code so I can have a look? I've run tcpdump on port 22 to listen for outgoing packets and also for traffic going to the target IP, and there is nothing going out when I run ceph orch host add If I run ssh inside the cephadm shell then I see the packets go out and it works as I document below. I was going to upgrade to Quincy from Pacific 16.2.5 and decided to upgrade from ceph-deploy to cephadm I initially had problems because I run ssh on a non-standard port. Allowing port 22 has allowed me to run the command below on every node except one. ceph orch host add [short hostname] [ip address] That one host fails, inexplicably with the error: Error EINVAL: Failed to connect to cephstorage-rs01 (103.XXX.YY.ZZ). If I run cephadm shell (without --no-hosts as that gives the error: unknown flag: --no-hosts) it works as expected. # cephadm shell Inferring fsid 525ec8aa-b401-4ddf-aa8f-4493727dac02 Inferring config /var/lib/ceph/525ec8aa-b401-4ddf-aa8f-4493727dac02/mon.cephstorage-ig03/config Using recent ceph image ceph/daemon-base@sha256:a038c6dc35064edff40bb7e824783f1bbd325c888e722ec5e814671406216ad5 root@cephstorage-ig03:/# ceph cephadm get-ssh-config > ssh_config root@cephstorage-ig03:/# ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key root@cephstorage-ig03:/# chmod 0600 ~/cephadm_private_key root@cephstorage-ig03:/# ssh -F ssh_config -i ~/cephadm_private_key root(a)103.XXX.YY.ZZ Warning: Permanently added '103.XXX.YY.ZZ' (ECDSA) to the list of known hosts. Welcome to XXXXX I was mucking around with custom ssh-config files to get around the port issue, but it did not seem to work so I and reverted back to the vanilla version with: ceph cephadm clear-ssh-config So when I am inside the shell it works, but it doesn't work properly via ceph orch host add There is one thing that is unusual that I think is worth mentioning. When I was adding the servers with custom ssh config files, I had a bad entry in the hosts file for cephstorage-rs01 on that server, resolving to 127.0.0.1 When I added it, it said it added the IP as 127.0.0.127# ceph orch host ls HOST ADDR LABELS STATUS ... cephstorage-rs01 127.0.0.127 Offline ... I then ran ceph orch host rm cephstorage-rs01 I have tried an iptables re-route in the vain idea that if there was some kind of host to IP cache it would route to localhost and tell me that the host name didn't match. That did not work. Right now, I am a sad panda as my ceph cluster is half transitioned. My next port of call is probably to try and adopt what I can into cephadm and make sure the cluster is ok, and then finally drop the problem node and then re-add it. Any help will be appreciated. Regards, David

11 months, 3 weeks

3
3
0 0

Re: CEPH Version choice

by Frank Schilder

Hi Marc, I uploaded all scripts and a rudimentary readme to https://github.com/frans42/cephfs-bench . I hope it is sufficient to get started. I'm afraid its very much tailored to our deployment and I can't make it fully configurable anytime soon. I hope it serves a purpose though - at least I discovered a few bugs with it. We actually kept the benchmark running through an upgrade from mimic to octopus. Was quite interesting to see how certain performance properties change with that. This benchmark makes it possible to compare versions with live timings coming in. Best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________________________________________ From: Marc <Marc(a)f1-outsourcing.eu> Sent: Monday, May 15, 2023 11:28 PM To: Frank Schilder Subject: RE: [ceph-users] Re: CEPH Version choice > I planned to put it on-line. The hold-back is that the main test is un- > taring a nasty archive and this archive might contain personal > information, so I can't just upload it as is. I can try to put together > a similar archive from public sources. Please give me a bit of time. I'm > also a bit under stress right now with our users being hit by an FS meta > data corruption. That's also why I'm a bit trigger happy. > Ok thanks, very nice, no hurry!!!

11 months, 3 weeks

2
2
0 0

2024

2023

2022

2021

2020

2019

ceph-users June 2023