Hi,
I'm facing something strange! One of the PGs in my pool became inconsistent,
but when I ran `rados list-inconsistent-obj $PG_ID --format=json-pretty`
the `inconsistents` key was empty! What is going on? Is it a bug in Ceph, or..?
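For reference, this is roughly the sequence (a minimal sketch; the PG id is a placeholder, and the deep-scrub step reflects my assumption that a fresh scrub should repopulate the report):
```
import json
import subprocess

PG_ID = '1.2f'  # placeholder for the affected PG

# Assumption: a fresh deep scrub repopulates the inconsistency report
subprocess.check_call(['ceph', 'pg', 'deep-scrub', PG_ID])

# ... once the scrub has finished ...
out = subprocess.check_output(
    ['rados', 'list-inconsistent-obj', PG_ID, '--format=json'])
report = json.loads(out)
print(report['inconsistents'])  # comes back empty, which is the puzzle
```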
Thanks.
Hi all,
I'm trying to run the Ceph client tools on an Odroid XU4 (armhf) with Ubuntu 20.04
and Python 3.8.5.
Unfortunately, every "ceph" command (even `ceph --help`) fails with the
following error:
Traceback (most recent call last):
  File "/usr/bin/ceph", line 1275, in <module>
    retval = main()
  File "/usr/bin/ceph", line 981, in main
    cluster_handle = run_in_thread(rados.Rados,
  File "/usr/lib/python3/dist-packages/ceph_argparse.py", line 1342, in run_in_thread
    raise Exception("timed out")
Exception: timed out
From this server I access an existing Ceph cluster built on the same hardware.
I checked the relevant code; it just starts a thread and joins it (waiting
for a RadosThread to finish).
Could this be a Python problem in combination with the armhf architecture? Maybe
someone can help.
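To narrow it down I can also try the rados bindings directly, bypassing the CLI wrapper (a small sketch, assuming the default /etc/ceph/ceph.conf and keyring are in place):
```
import rados

# Connect with an explicit timeout instead of the CLI's run_in_thread
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect(timeout=5)  # raises rados.TimedOut if it cannot connect
print(cluster.get_fsid())
cluster.shutdown()
```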
Thanks and greetings
Dominik
Hello cephers,
I run Nautilus (14.2.15).
Here is my context: each night a script takes a snapshot of each RBD
volume in a pool (all the disks of the hosted VMs) on my production Ceph
cluster. Each snapshot is then exported (rbd export-diff | rbd
import-diff over SSH) to my backup Ceph cluster.
0. Yesterday, and for the first time, I had this error when I ran "rbd
-p MY_POOL du" on the Ceph backup cluster:
2021-01-14 11:41:29.294 7f1f2fbcf0c0 -1 librbd::DiffIterate:
diff_object_map: failed to load object map
*rbd_object_map.781776950f8dd5.000000000000b740*
1. I tried to find the image affected by the error, and I found this
one, whose id differs slightly from the object-map name (the trailing number):
rbd image 'MY_IMAGE':
        size 30 GiB in 7680 objects
        order 22 (4 MiB objects)
        snapshot_count: 32
        id: 781776950f8dd5
        block_name_prefix: *rbd_data.781776950f8dd5*
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, operations
        op_features: snap-trash
        flags:
        create_timestamp: Thu Mar 19 01:00:27 2020
        access_timestamp: Thu Mar 19 01:00:27 2020
        modify_timestamp: Thu Jan 14 01:42:23 2021
2. Then I ran a check, which did not show any error:
rbd -p MY_POOL object-map check "MY_IMAGE"
Object Map Check: 100% complete...done.
3. Then I ran a rebuild anyway:
rbd -p MY_POOL object-map rebuild "MY_IMAGE"
Object Map Rebuild: 100% complete...done.
4. But the error is still there:
2021-01-15 09:33:58.775 7fa088e350c0 -1 librbd::DiffIterate:
diff_object_map: failed to load object map
rbd_object_map.781776950f8dd5.000000000000b740
Does someone know how to find the image affected by the object-map
error, and how to fix it?
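In case it helps, here is how I am now trying to map the object-map name back to a snapshot, under the assumption that the trailing hex suffix (000000000000b740) is a snapshot id, which would also explain why rebuilding the image's HEAD object map did not help:
```
import rados
import rbd

SNAP_ID = int('000000000000b740', 16)  # suffix from the error message

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('MY_POOL')
with rbd.Image(ioctx, 'MY_IMAGE') as image:
    for snap in image.list_snaps():
        if snap['id'] == SNAP_ID:
            # This snapshot's object map would then need its own rebuild:
            #   rbd object-map rebuild MY_POOL/MY_IMAGE@<snap name>
            print(snap['name'])
ioctx.close()
cluster.shutdown()
```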
Thank you for your help,
Rafael
Hi,
I am trying to get some statistics via the Python API but fail to run the equivalent of "ceph df detail".
On the command line I get:
# ceph -f json df |jq .pools[0]
{
  "name": "rbd",
  "id": 1,
  "stats": {
    "stored": 27410520278,
    "objects": 6781,
    "kb_used": 80382849,
    "bytes_used": 82312036566,
    "percent_used": 0.1416085809469223,
    "max_avail": 166317473792
  }
}
# ceph -f json df detail |jq .pools[0]
{
  "name": "rbd",
  "id": 1,
  "stats": {
    "stored": 27410520278,
    "objects": 6781,
    "kb_used": 80382849,
    "bytes_used": 82312036566,
    "percent_used": 0.1416085809469223,
    "max_avail": 166317473792,
    "quota_objects": 0,
    "quota_bytes": 0,
    "dirty": 6781,
    "rd": 309130743,
    "rd_bytes": 327278814208,
    "wr": 155492443,
    "wr_bytes": 155528225792,
    "compress_bytes_used": 0,
    "compress_under_bytes": 0,
    "stored_raw": 82231558358,
    "avail_raw": 498952444191
  }
}
In Python I just get an error:
# python
Python 2.7.16 (default, Oct 10 2019, 22:02:15)
[GCC 8.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import json
>>> import rados
>>> cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>>> cluster.connect()
>>> cluster.mon_command(json.dumps({'prefix': 'df', 'format': 'json'}), b'')
(0, '{"stats":{"total_bytes":643914731520,"total_avail_bytes":550794231808,"total_used_bytes":86678048768,"total_used_raw_bytes":93120499712,"total_used_raw_ratio":0.14461618661880493,"num_osds":6,"num_per_pool_osds":6},"stats_by_class":{"hdd":{"total_bytes":643914731520,"total_avail_bytes":550794231808,"total_used_bytes":86678048768,"total_used_raw_bytes":93120499712,"total_used_raw_ratio":0.14461618661880493}},"pools":[{"name":"rbd","id":1,"stats":{"stored":27410520278,"objects":6781,"kb_used":80382849,"bytes_used":82312036566,"percent_used":0.1416085809469223,"max_avail":166317473792}},{"name":"cephfs_data","id":3,"stats":{"stored":1282414464,"objects":307,"kb_used":3757248,"bytes_used":3847421952,"percent_used":0.0076519949361681938,"max_avail":166317473792}},{"name":"cephfs_metadata","id":4,"stats":{"stored":458803,"objects":22,"kb_used":2693,"bytes_used":2757248,"percent_used":5.5260434237425216e-06,"max_avail":166317473792}}]}\n', u'')
>>> cluster.mon_command(json.dumps({'prefix': 'df detail', 'format': 'json'}), b'')
(-22, '', u'command not known')
>>>
Anything I can do to get the output of "ceph df detail" via Python API?
I would like to have the stats fields "rd", "wr", "rd_bytes" and "wr_bytes" per pool.
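One guess I still want to verify: maybe "detail" is an optional argument of the "df" mon command rather than part of the prefix, something like:
```
import json
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
# Guess: pass 'detail' as a command argument, not inside the prefix
ret, out, err = cluster.mon_command(
    json.dumps({'prefix': 'df', 'detail': 'detail', 'format': 'json'}), b'')
for pool in json.loads(out)['pools']:
    stats = pool['stats']
    print(pool['name'], stats['rd'], stats['rd_bytes'],
          stats['wr'], stats['wr_bytes'])
cluster.shutdown()
```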
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
http://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
Mandatory disclosures per §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin
Hi,
Originally this pool was created with 512 PGs, which leaves a couple of OSDs holding around 500 PGs each 😲
What are the safe steps to copy this pool over?
These are the objects in this pool:
default.realm
period_config.f320e60d-8cff-4824-878e-c316423cc519
periods.18d63a25-8a50-4e17-9561-d452621f62fa.latest_epoch
default.zonegroup.f320e60d-8cff-4824-878e-c316423cc519
zone_info.ba16656f-2191-40bb-bc39-9f19448d215d
periods.6605bb4c-2226-4509-a3be-d5c95300fe14.1
default.zone.f320e60d-8cff-4824-878e-c316423cc519
zonegroup_info.f47a81ba-b214-4b8d-9b0e-84c14bc153cf
periods.f320e60d-8cff-4824-878e-c316423cc519:staging
realms.f320e60d-8cff-4824-878e-c316423cc519.control
periods.6605bb4c-2226-4509-a3be-d5c95300fe14.latest_epoch
realms.f320e60d-8cff-4824-878e-c316423cc519
realms_names.default
periods.18d63a25-8a50-4e17-9561-d452621f62fa.1
zone_names.default
zonegroups_names.default
It's a single-site cluster.
Do I need to stop rados gateway?
Afterwards I would follow these steps:
ceph osd pool create .rgw.root.new 8
rados cppool .rgw.root .rgw.root.new
ceph osd pool delete .rgw.root .rgw.root --yes-i-really-really-mean-it
ceph osd pool rename .rgw.root.new .rgw.root
ceph osd pool application enable .rgw.root rgw
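Before the delete step above I would also sanity-check that the copy is complete (a rough sketch with the rados Python bindings, comparing the object listings of the two pools):
```
import rados

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

def object_names(pool):
    ioctx = cluster.open_ioctx(pool)
    try:
        return sorted(obj.key for obj in ioctx.list_objects())
    finally:
        ioctx.close()

# Both listings should match before .rgw.root is deleted
assert object_names('.rgw.root') == object_names('.rgw.root.new')
cluster.shutdown()
```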
I'm just not sure whether I need to file an outage request due to stopping the RADOS gateways.
Thank you.
Hi,
Just curious how you are all moving forward with this CentOS 8 change.
We just finished installing our full multisite cluster, and it looks like we need to change the operating system.
So if you are using CentOS 8 with Ceph, I'm curious where you plan to move.
Thank you
Hello,
I have a 3-DC Octopus multisite setup with a bucket sync policy applied.
I have 2 buckets, one presharded to 24,000 shards and the other to 9,000, because the users want a single bucket with a huge number of objects (2,400,000,000 and 900,000,000 respectively), and with multisite we need to preshard buckets as described in the documentation.
Do I need to fine-tune something on the syncing to make this query faster?
Below is the output after 5-10 minutes of query time. To be honest I'm not sure whether it is healthy or not; I haven't really found a good explanation of this output in the Ceph documentation.
From the master zone I can't really query at all because it times out, but in the secondary zone I can see this:
radosgw-admin sync status
          realm 5fd28798-9195-44ac-b48d-ef3e95caee48 (realm)
      zonegroup 31a5ea05-c87a-436d-9ca0-ccfcbad481e3 (data)
           zone 61c9d940-fde4-4bed-9389-edc8d7741817 (sin)
  metadata sync syncing
                full sync: 0/64 shards
                incremental sync: 64/64 shards
                metadata is caught up with master
      data sync source: 9213182a-14ba-48ad-bde9-289a1c0c0de8 (hkg)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 128 shards
                        behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
                        oldest incremental change not applied: 2021-01-14T12:01:00.131104+0700 [11]
                source: f20ddd64-924b-4f78-8d2d-dd6c65f98ba9 (ash)
                        syncing
                        full sync: 0/128 shards
                        incremental sync: 128/128 shards
                        data is behind on 128 shards
                        behind shards: [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127]
                        oldest incremental change not applied: 2021-01-14T12:05:26.879014+0700 [98]
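In the meantime I am watching whether the number of behind shards actually shrinks over time (a quick sketch that just re-runs the command above and naively counts the shards listed as behind):
```
import re
import subprocess
import time

while True:
    out = subprocess.check_output(['radosgw-admin', 'sync', 'status'])
    # One match per sync source, e.g. "behind shards: [0,1,2,...]"
    behind = re.findall(r'behind shards: \[([0-9,]*)\]', out.decode())
    counts = [len(ids.split(',')) if ids else 0 for ids in behind]
    print(time.strftime('%H:%M:%S'), 'behind shards per source:', counts)
    time.sleep(60)
```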
Hope I can find some expert in the multisite area 😊
Thank you in advance.
Hello fellow Ceph users,
we are currently investigating latency spikes in our Ceph (14.2.11) prod
cluster, usually occurring under heavy load.
TL;DR: Do you have an idea where to investigate kv commit latency
spikes on a Ceph cluster with an LSI 9300-8i HBA and all-SSD (Intel,
Micron) OSDs?
The cluster consists of 3 MDS nodes (2 active + 1 standby-replay), 3 MON
nodes (each running an MGR and a MON daemon) and 4 OSD nodes (each with 8 SSD
bluestore OSDs).
All nodes run Ubuntu 18.04 with kernel 5.4 (2 OSD servers are still on
4.15, but spikes are seen on all of the OSD servers).
As the spikes seem to be randomly distributed across time (under load)
and OSDs, we traced them to the following messages on the OSD
nodes:
```
bluestore(/var/lib/ceph/osd/ceph-31) log_latency slow operation observed
for kv_sync, latency = 5.22298s
bluestore(/var/lib/ceph/osd/ceph-31) log_latency_fn slow operation
observed for _txc_committed_kv, latency = 5.5732s, txc = 0x55b1a98d9e00
...
bluestore(/var/lib/ceph/osd/ceph-31) log_latency_fn slow operation
observed for _txc_committed_kv, latency = 5.50842s, txc = 0x55b1aa197800
bluestore(/var/lib/ceph/osd/ceph-31) log_latency_fn slow operation
observed for _txc_committed_kv, latency = 5.5058s, txc = 0x55b1b7e75c00
```
We found temporally correlated kernel messages, which suggest that it might
have something to do with the underlying SSDs:
```
kernel: [3613612.312027] sd 4:0:10:0: attempting task
abort!scmd(0x00000000dac86408), outstanding for 31384 ms & timeout 30000
ms
kernel: [3613612.312034] sd 4:0:10:0: [sdg] tag#744 CDB: Write(10) 2a 00
be 11 b8 80 00 00 08 00
kernel: [3613612.312036] scsi target4:0:10: handle(0x0013),
sas_address(0x4433221104000000), phy(4)
kernel: [3613612.312038] scsi target4:0:10: enclosure logical
id(0x500605b00e70a7b0), slot(7)
kernel: [3613612.312039] scsi target4:0:10: enclosure level(0x0000),
connector name( )
kernel: [3613612.312040] sd 4:0:10:0: No reference found at driver,
assuming scmd(0x00000000dac86408) might have completed
kernel: [3613612.312042] sd 4:0:10:0: task abort: SUCCESS
scmd(0x00000000dac86408)
```
There are lots of blocks like the above, until the kernel apparently has had
enough of them and just resets the device/interface:
```
kernel: [3613612.312267] sd 4:0:10:0: attempting task
abort!scmd(0x00000000d7aaff5a), outstanding for 31388 ms & timeout 30000
ms
kernel: [3613612.312269] sd 4:0:10:0: [sdg] tag#520 CDB: Write(10) 2a 00
be 11 b4 e0 00 00 08 00
kernel: [3613612.312269] scsi target4:0:10: handle(0x0013),
sas_address(0x4433221104000000), phy(4)
kernel: [3613612.312270] scsi target4:0:10: enclosure logical
id(0x500605b00e70a7b0), slot(7)
kernel: [3613612.312271] scsi target4:0:10: enclosure level(0x0000),
connector name( )
kernel: [3613612.312272] sd 4:0:10:0: No reference found at driver,
assuming scmd(0x00000000d7aaff5a) might have completed
kernel: [3613612.312273] sd 4:0:10:0: task abort: SUCCESS
scmd(0x00000000d7aaff5a)
kernel: [3613612.653004] sd 4:0:10:0: Power-on or device reset occurred
kernel: [3613613.254064] sd 4:0:10:0: Power-on or device reset occurred
```
The OSD nodes are equipped with an "LSI 9300-8i SAS HBA", and we use two
types of SSDs: "Intel SSD D3-S4510 Series 1.92 TB" and "Micron 5210 ION
1.92TB SSD".
Since these resets happen on both SSD models, we figured the least common
denominator is the HBA, so we upgraded to the latest FW/BIOS on one
OSD node. Sadly this did not solve the issue.
* The question now is: does anyone have a similar hardware configuration and
similar issues with it?
* Do you have an idea what could be the cause of this behaviour?
* Or which part should we investigate further?
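For completeness, this is roughly how we sample the bluestore latency counters on one OSD while the spikes occur (a sketch, run on the OSD's host; osd.31 is the example from the logs above, and kv_sync_lat/kv_commit_lat are the counters we assume correspond to the log_latency messages):
```
import json
import subprocess
import time

OSD = 'osd.31'  # example OSD from the log excerpts above

while True:
    out = subprocess.check_output(['ceph', 'daemon', OSD, 'perf', 'dump'])
    bs = json.loads(out)['bluestore']
    # avgtime is the running average latency in seconds
    print(time.strftime('%H:%M:%S'),
          'kv_sync_lat:', bs['kv_sync_lat']['avgtime'],
          'kv_commit_lat:', bs['kv_commit_lat']['avgtime'])
    time.sleep(10)
```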
Thanks for your hints and time reading :)
M