Hello,
I'm attempting to set up an OpenID Connect provider with RGW. I'm doing this using the boto3 API and Python. However, it seems that the APIs are failing in unexpected ways because radosgw was not set up correctly. There is sample code below, and yes, I know there are "secrets" in it, but this is an offline test lab, so I am fine with that.
The first error shows this in the logs.
2023-02-16T00:45:26.860-0500 7fe19fef7700 1 ====== starting new request req=0x7fe2ccb54680 =====
2023-02-16T00:45:26.904-0500 7fe19def3700 0 req 17562030806519127926 0.044000439s ERROR: listing filtered objects failed: OIDC pool: default.rgw.meta: oidc_url.: (2) No such file or directory
2023-02-16T00:45:26.904-0500 7fe19aeed700 1 ====== req done req=0x7fe2ccb54680 op status=-2 http_status=404 latency=0.044000439s ======
2023-02-16T00:45:26.904-0500 7fe19aeed700 1 beast: 0x7fe2ccb54680: 10.20.104.178 - authentik [16/Feb/2023:00:45:26.860 -0500] "POST / HTTP/1.1" 404 189 - "Boto3/1.26.71 Python/3.11.1 Linux/6.0.6-76060006-generic Botocore/1.29.72" - latency=0.044000439s
So the object "oidc_url" is missing from the "default.rgw.meta" pool?
rados --pool default.rgw.meta ls --all
users.uid root.buckets
users.uid authentik.buckets
root test4
root .bucket.meta.test2:3866fac0-854b-48b5-b3b7-bf84a166a404.1165645.1
users.keys ZVBTLTYRRPY7JU39WOR9
users.uid authentik
users.uid cephadmin
users.keys NIVIV0JSKD9D2LDC3IH4
users.uid root
users.email tester(a)lab.dev
users.keys L70QT3LN71SQXWHS97Y4
root .bucket.meta.test:3866fac0-854b-48b5-b3b7-bf84a166a404.1204730.1
root .bucket.meta.test4:3866fac0-854b-48b5-b3b7-bf84a166a404.1204730.2
root test
root test2
Well the object is clearly not there and I do not know how to fix this.
The second call produces this error in the log:
2023-02-16T01:11:29.304-0500 7fe1976e6700 1 ====== starting new request req=0x7fe2ccb54680 =====
2023-02-16T01:11:29.312-0500 7fe18c6d0700 1 ====== req done req=0x7fe2ccb54680 op status=-22 http_status=400 latency=0.008000083s ======
2023-02-16T01:11:29.312-0500 7fe18c6d0700 1 beast: 0x7fe2ccb54680: 10.20.104.178 - authentik [16/Feb/2023:01:11:29.304 -0500] "POST / HTTP/1.1" 400 189 - "Boto3/1.26.71 Python/3.11.1 Linux/6.0.6-76060006-generic Botocore/1.29.72" - latency=0.008000083s
It's much less clear what is going on here; it just returns 400. Boto raises this exception: "botocore.exceptions.ClientError: An error occurred (Unknown) when calling the CreateOpenIDConnectProvider operation: Unknown".
Has anyone seen this before, and does anyone know how to set up the correct objects for OpenID Connect?
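One thing I am not sure about is whether the calling user needs extra admin caps for the OIDC provider operations; the STS documentation shows something along these lines (the uid here is ours, and whether this is what's missing in my setup is only a guess on my part):

radosgw-admin caps add --uid="authentik" --caps="oidc-provider=*"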
Version info
==============================================
ceph version 17.2.5 (e04241aa9b639588fa6c864845287d2824cb6b55) quincy (stable)
Examples below
==============================================
import boto3

# Creating the client works fine - I can see my user authenticate in the radosgw logs
access_key_id = 'L70QT3LN71SQXWHS97Y4'
secret_access_key = 'QEXLa5V0Zm38068n3goDtm8V6WlaDwxVmAq9W2XV'
iam = boto3.client('iam',
                   aws_access_key_id=access_key_id,
                   aws_secret_access_key=secret_access_key,
                   region_name="default",
                   endpoint_url="https://s3.lab")

# First error
providers_response = iam.list_open_id_connect_providers()

# Second error
oidc_response = iam.create_open_id_connect_provider(
    # Issuer URL
    Url="https://login.lab/application/o/d7d64496e26c156ca9ea0802c5d7ed1c/",
    ClientIDList=['authentik'],
    ThumbprintList=['BDCC44F40254E7E1258DA4698833FFE2E8AECA3D3799044D8A1F97F7DFF20511'])
Hi,
We have a discussion going on about which flag is correct to use for some maintenance on an OSD: should it be "noout" or "norebalance"? This was sparked because we need to take an OSD out of service for a short while to upgrade its firmware.
One school of thought is:
- "ceph norebalance" prevents automatic rebalancing of data between OSDs, which Ceph does to ensure all OSDs have roughly the same amount of data.
- "ceph noout" on the other hand prevents OSDs from being marked as out-of-service during maintenance, which helps maintain cluster performance and availability.
- Additionally, if another OSD fails while the "norebalance" flag is set, the data redundancy and fault tolerance of the Ceph cluster may be compromised.
- So if we're going to maintain performance and reliability, we need to set the "noout" flag to prevent the OSD from being marked out of service during maintenance, and to allow Ceph's automatic data redistribution feature to work as intended.
The other opinion is:
- With the noout flag set, Ceph clients are forced to assume that the OSD exists and is accessible, so they continue sending requests to it. The OSD also remains in the CRUSH map with no indication that it is actually out. If an additional OSD fails in the cluster while the noout flag is set, Ceph is forced to keep treating that newly failed OSD as OK as well, which leads to stalled or delayed responses from the OSD side to clients.
- norebalance, on the other hand, takes the in/out status of OSDs into account but prevents data rebalancing. Clients are aware of the real OSD status, so no requests go to an OSD that is actually out. If an additional OSD fails, only the required temporary PGs are created to maintain at least two existing copies of the same data (well, generally whatever the pool's min_size requires).
The upstream docs seem pretty clear that noout should be used for maintenance (https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-osd/), but the second opinion strongly suggests that norebalance is actually better and the Ceph docs are out of date.
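For reference, the noout-based procedure we have in mind follows the docs roughly like this (just a sketch; osd.12 is a placeholder and the systemd unit name depends on how the OSD is deployed):

ceph osd set noout
systemctl stop ceph-osd@12      # stop the OSD being serviced
# ... upgrade the drive firmware ...
systemctl start ceph-osd@12
ceph osd unset noout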
So what is the feedback from the wider community?
Thanks,
Will
Hey guys,
Most of my OSDs have an HDD for block and an SSD for the DB. But according to "ceph osd metadata", bluefs_db_type = hdd and bluefs_db_rotational = 1.
lsblk -o name,rota reveals the following (sdb is the DB device for 3 HDDs):
sdb 0
├─ceph--block--dbs--b77a8d7c--bdb5--420b--ad27--65e1d5080550-osd--block--db--b164ba4c--48c9--41a0--8b5e--ae3a6a23a22c 1
├─ceph--block--dbs--b77a8d7c--bdb5--420b--ad27--65e1d5080550-osd--block--db--1c7aa9a1--791d--4ed6--8049--9fba8d5ac828 1
└─ceph--block--dbs--b77a8d7c--bdb5--420b--ad27--65e1d5080550-osd--block--db--ec92f9c6--d651--46ed--b6cd--4cf37c8ce284 1
I am pretty sure sdb had rota 0 during OSD deployment (it still does, but the LVM volumes don't).
Question 1:
Is the output of "ceph osd metadata" for bluefs_db_type and bluefs_db_rotational relevant to how the OSD process treats the disks, or does it just reflect the value of /sys/block/<device>/queue/rotational?
Question 2:
How can I verify that the OSD process is treating the DB device as an SSD? If I remember correctly, the OSD process uses different parameters if the DB is on an SSD instead of an HDD.
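To illustrate what I mean, this is the kind of check I have in mind (device names are taken from the lsblk output above; dm-X stands for whichever device-mapper device backs the DB LV, and osd.0 is just an example):

cat /sys/block/sdb/queue/rotational        # raw DB device
cat /sys/block/dm-X/queue/rotational       # LV on top of it
ceph osd metadata 0 | grep -E '"bluefs_db_(type|rotational)"'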
Best regards
Felix
Hi
I am trying to set up the “High availability service for RGW” using SSL
both to the HAProxy and from the HAProxy to the RGW backend.
The SSL certificate gets applied to both HAProxy and the RGW. If I use
the RGW instances directly they work as expected.
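By using the RGW instances directly I mean hitting the individual endpoints, for example something along these lines (172.16.1.131:6443 is one of the backends listed further down; curl is only an illustration):

curl -kv https://172.16.1.131:6443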
The RGW config is as follows:
service_type: rgw
service_id: rgw
service_name: rgw.rgw
placement:
  label: rgw
  count_per_host: 2
spec:
  ssl: true
  rgw_frontend_port: 6443
  rgw_frontend_ssl_certificate: |
    -----BEGIN CERTIFICATE-----
    -----END PRIVATE KEY-----
Ingress as follows:
service_type: ingress
service_id: rgw.rgw
placement:
  hosts:
    - cephrgw01
    - cephrgw02
    - cephrgw03
spec:
  backend_service: rgw.rgw
  virtual_ip: 172.16.1.130/16
  frontend_port: 443
  monitor_port: 1967
  ssl_cert: |
    -----BEGIN CERTIFICATE-----
    -----END CERTIFICATE-----
The issue is that the haproxy.cfg gets generated like this, without SSL
enabled on the backends:
# This file is generated by cephadm.
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/lib/haproxy/haproxy.pid
    maxconn 8000
    daemon
    stats socket /var/lib/haproxy/stats

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout queue 20s
    timeout connect 5s
    timeout http-request 1s
    timeout http-keep-alive 5s
    timeout client 1s
    timeout server 1s
    timeout check 5s
    maxconn 8000
frontend stats
    mode http
    bind 172.16.1.130:1967
    bind localhost:1967
    stats enable
    stats uri /stats
    stats refresh 10s
    stats auth admin:abcdefg
    http-request use-service prometheus-exporter if { path /metrics }
    monitor-uri /health

frontend frontend
    bind 172.16.1.130:443 ssl crt /var/lib/haproxy/haproxy.pem
    default_backend backend

backend backend
    option forwardfor
    balance static-rr
    option httpchk HEAD / HTTP/1.0
    server rgw.rgw.cephrgw01.euvqmd 172.16.1.131:6443 check weight 100
    server rgw.rgw.cephrgw01.aphsnx 172.16.1.131:6444 check weight 100
    server rgw.rgw.cephrgw02.ovckaw 172.16.1.132:6443 check weight 100
    server rgw.rgw.cephrgw02.jevtrb 172.16.1.132:6444 check weight 100
    server rgw.rgw.cephrgw03.gzdame 172.16.1.133:6443 check weight 100
    server rgw.rgw.cephrgw03.bchspq 172.16.1.133:6444 check weight 100
This of course does not work, as the backends use SSL.
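For comparison, what I believe the backend lines would need in order to speak SSL to the RGWs is something along these lines (hand-written by me, not generated by cephadm; the exact verify/CA options are an assumption on my part):

server rgw.rgw.cephrgw01.euvqmd 172.16.1.131:6443 check weight 100 ssl verify none

(and likewise for the other five servers)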
Is there some configuration that I have missed or should I file a bug
report?
/Jimmy
Hi,
Today our entire cluster froze, or, to be specific, anything that uses librbd did.
ceph version 16.2.10
The message that saved me was "256 slow ops, oldest one blocked for 2893 sec, osd.7 has slow ops", because it made it immediately clear that this OSD was the issue.
I stopped the OSD, which made the cluster available again. Restarting the OSD makes the cluster stuck again, although that OSD has nothing in its error log and the underlying SSD is healthy. It's just that one OSD out of 27; there's nothing unique about it. We use the same disk product in other OSDs, and the host is also running other OSDs just fine.
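For context, the steps described above boil down to roughly these commands (osd.7, run on its host; the ops dump at the end is simply what I would look at next, not something I am claiming explains the problem):

ceph health detail                      # showed "osd.7 has slow ops"
systemctl stop ceph-osd@7               # cluster becomes responsive again
systemctl start ceph-osd@7              # cluster gets stuck again
ceph daemon osd.7 dump_ops_in_flight    # inspect the blocked ops via the admin socket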
How does this happen, and why can't the cluster recover from this automatically, for example by stopping the affected OSD, or at least by having a timeout for ops?
Thanks
Hi all,
today's topics were:
- Labs:
  - Keeping a catalog
  - Have a dedicated group to debug/work through the issues.
  - Looking for interested parties that would like to contribute to the lab maintenance tasks
  - Poll for meeting time; looking for a central person to follow up / organize
  - No one has been actively coordinating on the lab issues apart from Laura. David Orman volunteered to help coordinate the lab issues if needed.
- Reef release
  - [casey] things aren't looking good for an end-of-February freeze
  - Since the whole thing depends on test infra, we can't really estimate the time frame.
  - The freeze may be delayed
- Dev Summit in Amsterdam: estimate how many would attend in person vs. remote
  - 50/50 of those present would attend (as per the voting)
  - Ad hoc virtual could work
- Need to update the component leads page: https://ceph.io/en/community/team/
  - Vikhyath volunteered before, so Josh will check with him.
Regards,
--
Nizamudeen A
Software Engineer
Red Hat <https://www.redhat.com/>
Hi,
I have two Ceph clusters in a multi-zone setup. The first one (the master zone) would be accessible to users for their interactions via RGW.
The second one is set to sync from the master zone, with the zone's tier type set to archive (to version all files).
My question is: is there an option to set a lifecycle policy for the version files saved in the archive zone? For example, keep only 5 versions per file, or delete versions older than one year?
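To make it concrete, in plain S3 terms the kind of rule I have in mind looks roughly like the sketch below (the bucket name is a placeholder, NewerNoncurrentVersions may or may not be supported by RGW, and whether such a rule would even be applied to the archive zone's versions is exactly what I am asking):

# lifecycle.json
{
  "Rules": [
    {
      "ID": "expire-old-versions",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "NoncurrentVersionExpiration": {"NoncurrentDays": 365, "NewerNoncurrentVersions": 5}
    }
  ]
}

aws s3api put-bucket-lifecycle-configuration --bucket my-bucket --lifecycle-configuration file://lifecycle.json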
Thanks a lot.
Hi, Experts,
We already have a CephFS cluster, called A, and now we want to set up another CephFS cluster (called B) at another site.
We need to synchronize data between them for some directories (if all directories can be synchronized, even better). That means when we write a file on cluster A, the file should automatically sync to cluster B, and when we create a file or directory on cluster B, that file or directory should automatically sync to cluster A.
Our question is: are there any best practices for doing this with CephFS?
Thanks in advance!
Thanks,
zx