Hey,
Just to confirm my understanding: if I quickly set up a 3-host cluster
with an EC 4+2 pool, and I set the CRUSH failure domain to osd, the data
will be distributed among the OSDs, and of course there won't be any
protection against host failure. And yes, I know that's a bad idea, but I
need the extra storage really fast, and it's a backup of other data, so
availability is important, but not critical.
If I then add 5 more hosts a week later, I can just edit the CRUSH map,
change the failure domain from osd to host, inject the CRUSH map again,
and Ceph should automatically redistribute all the PGs over the OSDs to
be fully host-fault tolerant, right?
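For reference, this is roughly the workflow I have in mind (a sketch only;
the file names are placeholders):
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# in crushmap.txt, change the EC rule's step from
#   step chooseleaf indep 0 type osd
# to
#   step chooseleaf indep 0 type host
crushtool -c crushmap.txt -o crushmap.new
ceph osd setcrushmap -i crushmap.new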
Am I understanding this correctly?
Angelo.
Hello Users,
We have a big cluster (Quincy) with almost 1.7 billion RGW objects, and
we've enabled SSE as per
https://docs.ceph.com/en/quincy/radosgw/encryption/#automatic-encryption-fo…
(yes, we've chosen this insecure method of storing the key).
We're now in the process of implementing RGW multisite, but we are stuck
due to https://tracker.ceph.com/issues/46062 and the thread at
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/PQW66JJ5DCR…
I was wondering if there is a way to decrypt the objects in place with the
applied symmetric key. I tried removing
rgw_crypt_default_encryption_key from the mon configuration database
(on a test cluster), but, as expected, the RGW daemons throw 500 server
errors, as they cannot work on the encrypted objects.
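For context, the key was set (and, on the test cluster, removed) roughly
like this; the client.rgw config section is an assumption based on our
setup:
# set as per the documentation above (insecure: key lives in the config db)
ceph config set client.rgw rgw_crypt_default_encryption_key <base64-encoded-key>
# what I tried on the test cluster:
ceph config rm client.rgw rgw_crypt_default_encryption_key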
There is a PR in progress that introduces a command option for this, at
https://github.com/ceph/ceph/pull/51842, but it appears it will take some
time to be merged.
Cheers,
Jayanth Reddy
The documentation very briefly explains a few core commands for restarting
things:
https://docs.ceph.com/en/quincy/cephadm/operations/#starting-and-stopping-d…
but I feel I'm missing quite a few details about what is safe to do.
I have a system in production, with clusters connected via CephFS and some
shared block devices.
We would like to restart some things due to some new network
configurations. Going daemon by daemon would take forever, so I'm curious
about what happens if one tries the command:
ceph orch restart osd
Will that try to be smart and restart just a few OSDs at a time to keep
things up and available, or will it trigger a restart everywhere
simultaneously?
I guess that in my current scenario, restarting one host at a time makes
the most sense, with a
systemctl restart ceph-{fsid}.target
and then checking that "ceph -s" says OK before proceeding to the next
host, but I'm still curious what the "ceph orch restart xxx" command
would do (though not curious enough to try it out in production).
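Roughly what I have in mind, as a sketch (the hostnames and the noout flag
are my own assumptions, not from the docs):
ceph osd set noout    # avoid rebalancing while OSDs bounce
for host in ceph01 ceph02 ceph03; do    # placeholder hostnames
    ssh "$host" "systemctl restart ceph-{fsid}.target"
    # wait until the cluster reports healthy before moving on
    until ceph health | grep -q HEALTH_OK; do sleep 10; done
done
ceph osd unset noout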
Best regards, Mikael
Chalmers University of Technology
Hi,
we are running a cluster that has been alive for a long time, and we tread carefully regarding updates. We are still lagging a bit: our cluster (which started around Firefly) is currently at Nautilus. We are updating, and we know we're still behind, but we keep running into challenges along the way that are typically still unfixed on main, so, as I said, we have to tread carefully.
Nevertheless, mistakes happen, and we found ourselves in this situation: we converted our RGW data pool from replicated (n=3) to erasure coded (k=10, m=3, with 17 hosts). When selecting the EC profile we missed that our hosts are not evenly sized (this is a growing cluster: some machines have around 20 TiB of capacity for the RGW data pool, whereas newer machines have around 160 TiB), and we should rather have gone with k=4, m=3. In any case, having 13 chunks causes too many hosts to participate in each object. Going for k+m=7 will allow distribution to be more effective, as we have 7 hosts with the 160 TiB sizing.
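For the record, a sketch of the profile we should have chosen instead (the profile name is a placeholder):
ceph osd erasure-code-profile set rgw-ec-4-3 k=4 m=3 crush-failure-domain=host
ceph osd erasure-code-profile get rgw-ec-4-3    # verify the resulting profile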
Our original migration used the "cache tiering" approach, but that only works once, when moving from replicated to EC, and cannot be used for further migrations.
The amount of data, at 215 TiB, is somewhat significant, so we need an approach that scales when copying data[1], to avoid ending up with months of migration.
I’ve run out of ideas doing this on a low-level (i.e. trying to fix it on a rados/pool level) and I guess we can only fix this on an application level using multi-zone replication.
I have the setup nailed in general, but I’m running into issues with buckets in our staging and production environment that have `explicit_placement` pools attached, AFAICT is this an outdated mechanisms but there are no migration tools around. I’ve seen some people talk about patched versions of the `radosgw-admin metadata put` variant that (still) prohibits removing explicit placements.
AFAICT those explicit placements will be synced to the secondary zone and the effect that I’m seeing underpins that theory: the sync runs for a while and only a few hundred objects show up in the new zone, as the buckets/objects are already found in the old pool that the new zone uses due to the explicit placement rule.
I’m currently running out of ideas, but open for any other options.
Looking at https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/ULKK5RU2VXL… I’m wondering whether the relevant patch is available somewhere, or whether I’ll have to try building that patch again on my own.
Going through the docs and the code, I'm wondering whether `explicit_placement` is actually a really crufty residual piece that won't get used in newer clusters, while older clusters don't really have an option to get away from it?
In my specific case, the placement rules are identical to the explicit placements stored on the (apparently older) buckets, and the only thing I need to do is remove them. I can accept a bit of downtime to avoid any race conditions if needed, so maybe a small tool that just removes those entries while all RGWs are down would be fine. A call to `radosgw-admin bucket stats` takes about 18s for all buckets in production, and I guess that is a good baseline for the timing to expect when running an update on the metadata.
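To illustrate, the kind of workflow I imagine such a tool would perform (a sketch only; it assumes a patched radosgw-admin that doesn't re-add the placement, and the bucket/instance names are placeholders):
# dump the bucket instance metadata
radosgw-admin metadata get bucket.instance:<bucket>:<instance-id> > bi.json
# blank out the explicit_placement data_pool/index_pool entries in bi.json,
# then, with all RGWs stopped, write it back:
radosgw-admin metadata put bucket.instance:<bucket>:<instance-id> < bi.json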
I’ll also be in touch with colleagues from Heinlein and 42on but I’m open to other suggestions.
Hugs,
Christian
[1] We currently have 215 TiB of data in 230M objects. Using the "official" "cache-flush-evict-all" approach was unfeasible here, as it only yielded around 50 MiB/s. Using cache limits and targeting the cache sizes towards 0 caused proper parallelization and was able to flush/evict at an almost constant 1 GiB/s in the cluster.
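Roughly the kind of settings involved (a sketch; the pool name and exact values are placeholders, and note the very small non-zero targets, since 0 disables a limit):
ceph osd pool set rgw-cache target_max_bytes 1
ceph osd pool set rgw-cache target_max_objects 1
ceph osd pool set rgw-cache cache_target_dirty_ratio 0.0
ceph osd pool set rgw-cache cache_target_full_ratio 0.01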
--
Christian Theune · ct(a)flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
Hello,
is it possible to recover an OSD after it was removed?
The systemd service was removed, but the block device is still listed by
lsblk
and the config files are still available under
/var/lib/ceph/{fsid}/removed
It is a containerized cluster, so I think we need to re-add the cephx
entries, use ceph-volume, fix the CRUSH entries, and so on.
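In case it helps: my rough guess at the steps, as a sketch (the host name
is a placeholder, and I'm not sure this is complete):
# let ceph-volume rediscover the still-present LVM OSD on the host
cephadm ceph-volume lvm list
# then have the orchestrator re-activate existing OSDs on that host
ceph cephadm osd activate <host>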
Best regards,
Malte
Hi ceph gurus,
I am experimenting with the RGW multisite sync feature on the Quincy release (17.2.5). I am using zone-level sync, not a bucket-level sync policy. During my experiment, my setup somehow got into a situation that it doesn't seem to get out of: one zone is perpetually behind the other, although there are no ongoing client requests.
Here is the output of my "sync status":
root@mon1-z1:~# radosgw-admin sync status
          realm f90e4356-3aa7-46eb-a6b7-117dfa4607c4 (test-realm)
      zonegroup a5f23c9c-0640-41f2-956f-a8523eccecb3 (zg)
           zone bbe3e2a1-bdba-4977-affb-80596a6fe2b9 (z1)
  metadata sync no sync (zone is master)
      data sync source: 9645a68b-012e-4889-bf24-096e7478f786 (z2)
                syncing
                full sync: 0/128 shards
                incremental sync: 128/128 shards
                data is behind on 14 shards
                behind shards: [56,61,63,107,108,109,110,111,112,113,114,115,116,117]
It stays behind forever while the RGW daemons are almost completely idle
(1% CPU).
Any suggestions on how to drill deeper to see what happened?
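I'm guessing the per-shard status and the error list would be the place to
start, something like this (shard id taken from the list above):
# per-shard detail for one of the shards that is behind
radosgw-admin data sync status --source-zone=z2 --shard-id=56
# check for accumulated sync errors
radosgw-admin sync error list
but I'm not sure how to interpret or act on the output.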
Thanks,
Yixin