Hi community,
I'm running a Ceph cluster with 10 nodes and 180 OSDs. I created an erasure-coded 4+2 pool with 256 PGs, but PG creation is very slow and the PGs get stuck peering:
HEALTH_WARN Reduced data availability: 5 pgs inactive, 5 pgs peering
[WRN] PG_AVAILABILITY: Reduced data availability: 5 pgs inactive, 5 pgs peering
    pg 59.6b is stuck peering for 4m, current state creating+peering, last acting [17,87,92,117,71,149]
    pg 59.78 is stuck peering for 4m, current state creating+peering, last acting [94,16,137,98,41,79]
    pg 59.86 is stuck peering for 4m, current state creating+peering, last acting [37,107,24,138,144,25]
Here is the output of a pg query:
"recovery_state": [
{
"name": "Started/Primary/Peering/GetInfo",
"enter_time": "2024-01-04T11:02:09.208218+0000",
"requested_info_from": [
{
"osd": "101(4)"
}
]
},
{
"name": "Started/Primary/Peering",
"enter_time": "2024-01-04T11:02:09.208209+0000",
"past_intervals": [
{
"first": "0",
"last": "0",
"all_participants": [],
"intervals": []
}
],
"probing_osds": [
"0(3)",
"36(5)",
"74(2)",
"100(0)",
"101(4)",
"150(1)"
],
"down_osds_we_would_probe": [],
"peering_blocked_by": []
},
{
"name": "Started",
"enter_time": "2024-01-04T11:02:09.208161+0000"
}
],
"agent_state": {}
Why is PG peering so slow? Could it be affected by the network? My network uses LACP bonding across two 10Gbps NICs.
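(For anyone hitting a similar state, a few standard commands that can help narrow down what a peering PG is waiting on; the PG id below is just one of the examples above:)

    # list all PGs currently stuck inactive (peering PGs are inactive)
    ceph pg dump_stuck inactive
    # show which OSDs, if any, are blocking peering
    ceph osd blocked-by
    # full peering state for a single PG
    ceph pg 59.6b query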
--
----------------------------------------------------------------------------
*Tran Thanh Phong*
Email: tranphong079(a)gmail.com
Skype: tranphong079
I’d like to upgrade from 16.2.11 to the latest version. Is it possible to do this in one jump or do I need to go from 16.2.11 -> 16.2.14 -> 17.1.0 -> 17.2.7 -> 18.1.0 -> 18.2.1? I’m using cephadm.
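(Not an authoritative answer, but for reference this is the basic cephadm flow; whether 16.2.11 -> 18.2.1 is supported in one jump is exactly the question, so the target version below is only an example:)

    # check current versions and cluster health first
    ceph versions
    ceph -s
    # start a cephadm-orchestrated upgrade to a specific version (example target)
    ceph orch upgrade start --ceph-version 18.2.1
    # monitor progress
    ceph orch upgrade status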
Thanks
-jeremy
Hi All,
we just upgraded to Reef and everything looks great, except the new dashboard: the Recovery Throughput graph is empty. Recovery has been ongoing for 18 hours and there is still no data. I tried moving the Prometheus service to another node and redeployed it a couple of times, but still no data.
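(A sketch of typical checks for missing dashboard metrics; the mgr host is a placeholder and 9283 is only the default exporter port, neither confirmed for this cluster:)

    # verify the dashboard knows where Prometheus lives
    ceph dashboard get-prometheus-api-host
    # confirm the mgr prometheus module is enabled
    ceph mgr module ls | grep prometheus
    # check that recovery metrics are being exported at all
    curl -s http://<active-mgr-host>:9283/metrics | grep -i recovery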
Kind Regards
Zoltan
Hello!
I want to increase the interval between deep scrubs on all OSDs.
I tried this, but it was not applied:
ceph config set osd.* osd_deep_scrub_interval 1209600
I have 50 OSDs. Do I have to configure each OSD individually?
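(For reference, a sketch of the section-wide form, which applies to every OSD at once without a per-daemon wildcard; this assumes a release with the centralized config database, i.e. Mimic or later:)

    # set the option for the whole 'osd' section rather than per daemon
    ceph config set osd osd_deep_scrub_interval 1209600
    # confirm the value
    ceph config get osd osd_deep_scrub_interval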
thanks for the support!
Dear all,
Could you please share recent experiences with CephFS access from macOS clients?
I couldn't find any threads on this matter on the list since 2021 (Daniel Persson), where he stated that:
> Mac Mini Intel Catalina - Connected and working fine.
> Mac Mini M1 BigSur - Can't compile brew cask, no popups for extra
> permissions in the GUI.
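(For context, the client-side invocation in question would be plain ceph-fuse, assuming it can still be built on macOS on top of macFUSE; the client name and mount point below are placeholders:)

    # mount CephFS via FUSE (placeholder client name and mount point)
    sudo ceph-fuse -n client.myuser /mnt/cephfs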
Thanks a lot!
Cheers,
Fabien
Hello,
We've been using Ceph to manage our storage infrastructure, and we recently upgraded to the latest version (Ceph v18.2.1 "reef"). However, we've noticed that the "refresh interval" option seems to be missing from the dashboard, and we are facing challenges with monitoring our cluster in real time.
In the earlier version of the Ceph dashboard, there was a useful "refresh interval" option that allowed us to customize the update frequency of the dashboard. This was particularly handy for monitoring changes and responding promptly. However, after the upgrade to Ceph v18.2.1 "reef", we can't find this option anywhere in the dashboard.
Additionally, we observed an automatic refresh occurring every 25 seconds. We are seeking guidance on locating and tuning the refresh-interval setting in the latest version of Ceph, to potentially reduce this interval.
We've explored the dashboard settings thoroughly and reviewed the release notes for Ceph v18.2.1 "reef", but we couldn't find any mention of the removal of the "refresh interval" option.
Any guidance or insights would be greatly appreciated!
Thanks,
Mohammad Saif
Ceph Enthusiast
>>
>> You can do that for a PoC, but that's a bad idea for any production workload. You'd want at least three nodes with OSDs to use the default RF=3 replication. You can do RF=2, but at the peril of your mortal data.
>
> I'm not sure I agree - I think size=2, min_size=2 is no worse than
> RAID1 for data security.
size=2, min_size=2 *is* RAID1. Except that you become unavailable if a single drive is unavailable.
> That isn't even the main risk as I understand it. Of course a double
> failure is going to be a problem with size=2, or traditional RAID1,
> and I think anybody choosing this configuration accepts this risk.
We see people often enough who don’t know that. I’ve seen double failures. ymmv.
> As I understand it, the reason min_size=1 is a trap has nothing to do
> with double failures per se.
It’s one of the concerns.
>
> The issue is that Ceph OSDs are somewhat prone to flapping during
> recovery (OOM, etc). So even if the disk is fine, an OSD can go down
> for a short time. If you have size=2, min=1 configured, then when
> this happens the PG will become degraded and will continue operating
> on the other OSD, and the flapping OSD becomes stale. Then when it
> comes back up it recovers. The problem is that if the other OSD has a
> permanent failure (disk crash/etc) while the first OSD is flapping,
> now you have no good OSDs, because when the flapping OSD comes back up
> it is stale, and its PGs have no peer.
Indeed, arguably that’s an overlapping failure. I’ve seen this too, and have a pg query to demonstrate it.
> I suspect there are ways to re-activate it, though this will result in potential data
> inconsistency since writes were allowed to the cluster and will then
> get rolled back.
Yep.
> With only two OSDs I'm guessing that would be the
> main impact (well, depending on journaling behavior/etc), but if you
> have more OSDs than that then you could have situations where one file
> is getting rolled back, and some other file isn't, and so on.
But you’d have a voting majority.
>
> With min_size=2 you're fairly safe from flapping because there will
> always be two replicas that have the most recent version of every PG,
> and so you can still tolerate a permanent failure of one of them.
Exactly.
>
> size=2, min=2 doesn't suffer this failure mode, because anytime there
> is flapping the PG goes inactive and no writes can be made, so when
> the other OSD comes back up there is nothing to recover. Of course
> this results in IO blocks and downtime, which is obviously
> undesirable, but it is likely a more recoverable state than
> inconsistent writes.
Agreed, the difference between availability and durability. Depends what’s important to you.
>
> Apologies if I've gotten any of that wrong, but my understanding is
> that it is these sorts of failure modes that cause min_size=1 to be a
> trap. This isn't the sort of thing that typically happens in a RAID1
> config, or at least that admins don't think about.
It’s both.
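(For readers following along, the settings under discussion are per-pool; a minimal sketch, with the pool name as a placeholder:)

    # inspect current replication settings (pool name is a placeholder)
    ceph osd pool get mypool size
    ceph osd pool get mypool min_size
    # the safer production configuration discussed above
    ceph osd pool set mypool size 3
    ceph osd pool set mypool min_size 2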
Happy 2024!
Today's CLT meeting covered the following:
1. 2024 brings a focus on the performance of Crimson (some information here: https://docs.ceph.com/en/reef/dev/crimson/crimson/ )
   1. Status is available here: https://github.com/ceph/ceph.io/pull/635
   2. There will be a new Crimson performance weekly meeting, led by Matan Breizman
      1. This does not replace the existing performance weekly, and is focused on Crimson
      2. An email will follow with more details about this meeting
2. Ceph Quarterly will be published on/around the 14th of January, 2024.
   1. See https://ceph.io/en/community/cq/ for previous issues of CQ
3. A development freeze on Squid is tentatively scheduled for January 31, 2024
4. Upcoming releases
   1. 16.2.15 is next (the last Pacific release)
      1. Anticipated by the end of January
   2. 17.2.8 will follow (Quincy)
   3. 18.2.2 will follow this (Reef)
Hello and a happy new year!
I'm wondering whether there have been some structural changes regarding the releases page [1]. It still doesn't contain version 18.2.1 (Reef), and the latest two Quincy releases (17.2.6, 17.2.7) are missing as well. For Pacific it's even worse: the latest entry is for 16.2.11, although we're close to 16.2.15 being released. I wanted to check the changelog for 18.2.1, which can be found in the news blog [2]. I'd understand if the changelogs were published in only one place (the blog?), but then they should at least be referenced in the docs.
Thanks,
Eugen
[1] https://docs.ceph.com/en/reef/releases/
[2] https://ceph.io/en/news/blog/2023/v18-2-1-reef-released/