- ceph-users - lists.ceph.io

Re: [ceph-users] OSD's keep crasching after clusterreboot

by Ansgar Jazdzewski

another update, we now took the more destructive route and removed the cephfs pools (lucky we had only test date in the filesystem) Our hope was that within the startup-process the osd will delete the no longer needed PG, But this is NOT the Case. So we are still have the same issue the only difference is that the PG does not belong to a pool anymore. -360> 2019-08-07 14:52:32.655 7fb14db8de00 5 osd.44 pg_epoch: 196586 pg[23.f8s0(unlocked)] enter Initial -360> 2019-08-07 14:52:32.659 7fb14db8de00 -1 /build/ceph-13.2.6/src/osd/ECUtil.h: In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 7fb14db8de00 time 2019-08-07 14:52:32.660169 /build/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % stripe_size == 0) we now can take one rout and try to delete the pg by hand in the OSD (bluestore) how this can be done? OR we try to upgrade to Nautilus and hope for the beset. any help hints are welcome, have a nice one Ansgar Am Mi., 7. Aug. 2019 um 11:32 Uhr schrieb Ansgar Jazdzewski <a.jazdzewski(a)googlemail.com>: > > Hi, > > as a follow-up: > * a full log of one OSD failing to start https://pastebin.com/T8UQ2rZ6 > * our ec-pool cration in the fist place https://pastebin.com/20cC06Jn > * ceph osd dump and ceph osd erasure-code-profile get cephfs > https://pastebin.com/TRLPaWcH > > as we try to dig more into it, it looks like a bug in the cephfs or > erasure-coding part of ceph. > > Ansgar > > > Am Di., 6. Aug. 2019 um 14:50 Uhr schrieb Ansgar Jazdzewski > <a.jazdzewski(a)googlemail.com>: > > > > hi folks, > > > > we had to move one of our clusters so we had to boot all servers, now > > we found an Error on all OSD with the EC-Pool. > > > > do we miss some opitons, will an upgrade to 13.2.6 help? > > > > > > Thanks, > > Ansgar > > > > 2019-08-06 12:10:16.265 7fb337b83200 -1 > > /build/ceph-13.2.4/src/osd/ECUtil.h: In function > > 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread > > 7fb337b83200 time 2019-08-06 12:10:16.263025 > > /build/ceph-13.2.4/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % > > stripe_size == 0) > > > > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic > > (stable) 1: (ceph::ceph_assert_fail(char const, char const, int, char > > const)+0x102) [0x7fb32eeb83c2] 2: (()+0x2e5587) [0x7fb32eeb8587] 3: > > (ECBackend::ECBackend(PGBackend::Listener, coll_t const&, > > boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore, > > CephContext, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned > > long)+0x4de) [0xa4cbbe] 4: (PGBackend::build_pg_backend(pg_pool_t > > const&, std::map<std::cxx11::basic_string<char, > > std::char_traits<char>, std::allocator<char> >, > > std::cxx11::basic_string<char, std::char_traits<char>, > > std::allocator<char> >, std::less<std::cxx11::basic_string<char, > > std::char_traits<char>, std::allocator<char> > >, std > > ::allocator<std::pair<std::__cxx11::basic_string<char, > > std::char_traits<char>, std::allocator<char> > const, > > std::cxx11::basic_string<char, std::char_traits<char>, > > std::allocator<char> > > > > const&, PGBackend::Listener, coll_t, > > boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore, > > CephContext)+0x2f9 ) [0x9474e9] 5: > > (PrimaryLogPG::PrimaryLogPG(OSDService, std::shared_ptr<OSDMap const>, > > PGPool const&, std::map<std::cxx11::basic_string<char, > > std::char_traits<char>, std::allocator<char> >, > > std::cxx11::basic_string<char, std::char_traits<char>, > > std::allocator<char> >, std::less<std::cxx11::basic_string<char, > > std::char_tra its<char>, std::allocator<char> > >, > > std::allocator<std::pair<std::__cxx11::basic_string<char, > > std::char_traits<char>, std::allocator<char> > const, > > std::cxx11::basic_string<char, std::char_traits<char>, > > std::allocator<char> > > > > const&, spg_t)+0x138) [0x8f96e8] 6: > > (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x11d3) > > [0x753553] 7: (OSD::load_pgs()+0x4a9) [0x758339] 8: > > (OSD::init()+0xcd3) [0x7619c3] 9: (main()+0x3678) [0x64d6a8] 10: > > (libc_start_main()+0xf0) [0x7fb32ca68830] 11: (_start()+0x29) > > [0x717389] NOTE: a copy of the executable, or objdump -rdS > > <executable> is needed to interpret this.

4 years, 8 months

1
0
0 0

Re: [ceph-users] bluestore write iops calculation

by vitalif＠yourcmc.ru

> I can add RAM ans is there a way to increase rocksdb caching , can I > increase bluestore_cache_size_hdd to higher value to cache rocksdb? In recent releases it's governed by the osd_memory_target parameter. In previous releases it's bluestore_cache_size_hdd. Check release notes to know for sure. > This we have planned to add some SSDs and how many OSD's rocks db we > can add per SSDs and i guess if one SSD is down then all related OSDs > has to be re-installed. Yes. At least you'd better not put all 24 block.db's on a single SSD :) 4-8 HDDs per an SSD is usually fine. Also check db_used_bytes in `ceph daemon osd.0 perf dump` (replace 0 with actual OSD numbers) to figure out how much space your DBs use. If it's below 30gb you're lucky because in that case DBs will fit on 30GB SSD partitions. https://yourcmc.ru/wiki/Ceph_performance#About_block.db_sizing -- Vitaliy Filippov

4 years, 8 months

1
0
0 0

Re: [ceph-users] New CRUSH device class questions

by Paul Emmerich

On Wed, Aug 7, 2019 at 9:30 AM Robert LeBlanc <robert(a)leblancnet.us> wrote: >> # ceph osd crush rule dump replicated_racks_nvme >> { >> "rule_id": 0, >> "rule_name": "replicated_racks_nvme", >> "ruleset": 0, >> "type": 1, >> "min_size": 1, >> "max_size": 10, >> "steps": [ >> { >> "op": "take", >> "item": -44, >> "item_name": "default~nvme" <------------ >> }, >> { >> "op": "chooseleaf_firstn", >> "num": 0, >> "type": "rack" >> }, >> { >> "op": "emit" >> } >> ] >> } >> ``` > > > Yes, our HDD cluster is much like this, but not Luminous, so we created as separate root with SSD OSD for the metadata and set up a CRUSH rule for the metadata pool to be mapped to SSD. I understand that the CRUSH rule should have a `step take default class ssd` which I don't see in your rule unless the `~` in the item_name means device class. ~ is the internal implementation of device classes. Internally it's still using separate roots, that's how it stays compatible with older clients that don't know about device classes. And since it wasn't mentioned here yet: consider upgrading to Nautilus to benefit from the new and improved accounting for metadata space. You'll be able to see how much space is used for metadata and quotas should work properly for metadata usage. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Tel: +49 89 1896585 90 > > Thanks > ---------------- > Robert LeBlanc > PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 > _______________________________________________ > ceph-users mailing list > ceph-users(a)lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

4 years, 8 months

1
0
0 0

out of memory bluestore osds

by Jaime Ibar

Hi all, we run a Ceph Luminous 12.2.12 cluster, 7 osds servers 12x4TB disks each. Recently we redeployed the osds of one of them using bluestore backend, however, after this, we're facing Out of memory errors(invoked oom-killer) and the OS kills one of the ceph-osd process. The osd is restarted automatically and back online after one minute. We're running Ubuntu 16.04, kernel 4.15.0-55-generic. The server has 32GB of RAM and 4GB of swap partition. All the disks are hdd, no ssd disks. Bluestore settings are the default ones "osd_memory_target": "4294967296" "osd_memory_cache_min": "134217728" "bluestore_cache_size": "0" "bluestore_cache_size_hdd": "1073741824" "bluestore_cache_autotune": "true" As stated in the documentation, bluestore assigns by default 4GB of RAM per osd(1GB of RAM for 1TB). So in this case 48GB of RAM would be needed. Am I right? Are these the minimun requirements for bluestore? In case adding more RAM is not an option, can any of osd_memory_target, osd_memory_cache_min, bluestore_cache_size_hdd be decrease to fit in our server specs? Would this have any impact on performance? Thanks Jaime -- Jaime Ibar High Performance & Research Computing, IS Services Lloyd Building, Trinity College Dublin, Dublin 2, Ireland. http://www.tchpc.tcd.ie/ | jaime(a)tchpc.tcd.ie Tel: +353-1-896-3725

4 years, 8 months

2
1
0 0

Re: [ceph-users] 14.2.2 - OSD Crash

by Igor Fedotov

Hi Manuel, as Brad pointed out timeouts and suicides are rather consequences of some other issues with OSDs. I recall at least two recent relevant tickets: https://tracker.ceph.com/issues/36482 https://tracker.ceph.com/issues/40741 (see last comments) Both had massive and slow reads from RocksDB which caused timeouts.. Visible symptom for both cases was unexpectedly high read I/O from underlying disks (main and/or DB). You can use iotop for inspection.., These were worsened by having significant part of DB at spinners due to spillovers. So wondering what's your layout in this respect: what drives back troublesome OSDs, is there any spillover to slow device, how massive it is? Also could you please inspect your OSD logs for the presence of lines containing "slow operation observed" substring. And share them if any.. Hope this helps. Thanks, Igor On 8/7/2019 2:16 AM, EDH - Manuel Rios Fernandez wrote: > > Hi > > We got a pair of OSD located in node that crash randomly since 14.2.2 > > OS Version : Centos 7.6 > > There’re a ton of lines before crash , I will unespected: > > -- > > 3045> 2019-08-07 00:39:32.013 7fe9a4996700 1 heartbeat_map is_healthy > 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15 > > -3044> 2019-08-07 00:39:32.013 7fe9a3994700 1 heartbeat_map > is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15 > > -3043> 2019-08-07 00:39:32.033 7fe9a4195700 1 heartbeat_map > is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15 > > -3042> 2019-08-07 00:39:32.033 7fe9a4996700 1 heartbeat_map > is_healthy 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15 > > -- > > ----- > > Some hundred lines of: > > -164> 2019-08-07 00:47:36.628 7fe9a3994700 1 heartbeat_map is_healthy > 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60 > > -163> 2019-08-07 00:47:36.632 7fe9a3994700 1 heartbeat_map > is_healthy 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60 > > -162> 2019-08-07 00:47:36.632 7fe9a3994700 1 heartbeat_map > is_healthy 'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60 > > ----- > > -78> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient: tick > > -77> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:50:21.756453) > > -76> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient: tick > > -75> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:50:31.756604) > > -74> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient: tick > > -73> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:50:41.756788) > > -72> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient: tick > > -71> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:50:51.756982) > > -70> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient: tick > > -69> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:51:01.757206) > > -68> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient: tick > > -67> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:51:11.757364) > > -66> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient: tick > > -65> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:51:21.757535) > > -64> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map > clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had timed out > after 15 > > -63> 2019-08-07 00:51:52.861 7fe987e49700 1 heartbeat_map > clear_timeout 'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed > out after 150 > > -62> 2019-08-07 00:51:52.948 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1721180160 mapped: 4297818112 > old cache_size: 1994018210 new cache size: 1992784572 > > -61> 2019-08-07 00:51:52.948 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 1992784572 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: > 763363328 meta_used: 654593191 data_alloc: 452984832 data_used: 455929856 > > -60> 2019-08-07 00:51:57.923 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 1994110827 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: > 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -59> 2019-08-07 00:51:57.973 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 1994110827 new cache size: 1994442069 > > -58> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient: tick > > -57> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:51:31.757684) > > -56> 2019-08-07 00:52:02.933 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 1995765747 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: > 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -55> 2019-08-07 00:52:02.983 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 1995765747 new cache size: 1996096345 > > -54> 2019-08-07 00:52:07.943 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 1997417449 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: > 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -53> 2019-08-07 00:52:07.993 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 1997417449 new cache size: 1997747404 > > -52> 2019-08-07 00:52:11.757 7fe995bfa700 10 monclient: tick > > -51> 2019-08-07 00:52:11.757 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:51:41.757855) > > -50> 2019-08-07 00:52:12.952 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 1999065941 kv_alloc: 763363328 kv_used: 749381098 meta_alloc: > 763363328 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -49> 2019-08-07 00:52:13.002 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 1999065941 new cache size: 1999395254 > > -48> 2019-08-07 00:52:17.962 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2000711226 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -47> 2019-08-07 00:52:18.012 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2000711226 new cache size: 2001039899 > > -46> 2019-08-07 00:52:21.756 7fe995bfa700 10 monclient: tick > > -45> 2019-08-07 00:52:21.756 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:51:51.758043) > > -44> 2019-08-07 00:52:22.971 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2002353314 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -43> 2019-08-07 00:52:23.022 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2002353314 new cache size: 2002681348 > > -42> 2019-08-07 00:52:27.982 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2003992210 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -41> 2019-08-07 00:52:28.031 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2003992210 new cache size: 2004319607 > > -40> 2019-08-07 00:52:31.756 7fe995bfa700 10 monclient: tick > > -39> 2019-08-07 00:52:31.756 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:52:01.758219) > > -38> 2019-08-07 00:52:32.991 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2005627920 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -37> 2019-08-07 00:52:33.041 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2005627920 new cache size: 2005954680 > > -36> 2019-08-07 00:52:38.001 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2007260450 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -35> 2019-08-07 00:52:38.051 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2007260450 new cache size: 2007586575 > > -34> 2019-08-07 00:52:41.757 7fe995bfa700 10 monclient: tick > > -33> 2019-08-07 00:52:41.757 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:52:11.758447) > > -32> 2019-08-07 00:52:43.011 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2008889806 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -31> 2019-08-07 00:52:43.061 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2008889806 new cache size: 2009215297 > > -30> 2019-08-07 00:52:48.021 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2010515995 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -29> 2019-08-07 00:52:48.071 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2010515995 new cache size: 2010840853 > > -28> 2019-08-07 00:52:51.757 7fe995bfa700 10 monclient: tick > > -27> 2019-08-07 00:52:51.757 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:52:21.758631) > > -26> 2019-08-07 00:52:53.031 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2012139023 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -25> 2019-08-07 00:52:53.081 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2012139023 new cache size: 2012463250 > > -24> 2019-08-07 00:52:58.042 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2013758896 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -23> 2019-08-07 00:52:58.092 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2013758896 new cache size: 2014082492 > > -22> 2019-08-07 00:53:01.758 7fe995bfa700 10 monclient: tick > > -21> 2019-08-07 00:53:01.758 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:52:31.758799) > > -20> 2019-08-07 00:53:03.052 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2015375620 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -19> 2019-08-07 00:53:03.102 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2015375620 new cache size: 2015698587 > > -18> 2019-08-07 00:53:08.062 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2016989201 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -17> 2019-08-07 00:53:08.112 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2016989201 new cache size: 2017311541 > > -16> 2019-08-07 00:53:11.758 7fe995bfa700 10 monclient: tick > > -15> 2019-08-07 00:53:11.758 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:52:41.759013) > > -14> 2019-08-07 00:53:13.071 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2018599645 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -13> 2019-08-07 00:53:13.121 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2018599645 new cache size: 2018921358 > > -12> 2019-08-07 00:53:18.081 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2020206960 kv_alloc: 771751936 kv_used: 749381098 meta_alloc: > 771751936 meta_used: 654590799 data_alloc: 452984832 data_used: 451538944 > > -11> 2019-08-07 00:53:18.130 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2020206960 new cache size: 2020528048 > > -10> 2019-08-07 00:53:21.757 7fe995bfa700 10 monclient: tick > > -9> 2019-08-07 00:53:21.757 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:52:51.759214) > > -8> 2019-08-07 00:53:23.090 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2021811150 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: > 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944 > > -7> 2019-08-07 00:53:23.140 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2021811150 new cache size: 2022131613 > > -6> 2019-08-07 00:53:28.100 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2023412220 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: > 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944 > > -5> 2019-08-07 00:53:28.150 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2023412220 new cache size: 2023732060 > > -4> 2019-08-07 00:53:31.758 7fe995bfa700 10 monclient: tick > > -3> 2019-08-07 00:53:31.758 7fe995bfa700 10 monclient: > _check_auth_rotating have uptodate secrets (they expire after > 2019-08-07 00:53:01.759334) > > -2> 2019-08-07 00:53:33.110 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: > 2025010178 kv_alloc: 780140544 kv_used: 749381098 meta_alloc: > 780140544 meta_used: 654590799 data_alloc: 461373440 data_used: 451538944 > > -1> 2019-08-07 00:53:33.160 7fe99966c700 5 > bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: > 4294967296 heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 > old cache_size: 2025010178 new cache size: 2025329397 > > 0> 2019-08-07 00:53:37.655 7fe987e49700 -1 *** Caught signal > (Aborted) ** > > in thread 7fe987e49700 thread_name:tp_osd_tp > > ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) > nautilus (stable) > > 1: (()+0xf5d0) [0x7fe9a7cba5d0] > > 2: (pthread_kill()+0x31) [0x7fe9a7cb79d1] > > 3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char > const*, unsigned long)+0x466) [0x55fef8748176] > > 4: (ceph::HeartbeatMap::clear_timeout(ceph::heartbeat_handle_d*)+0x7b) > [0x55fef874878b] > > 5: > (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, > std::vector<ObjectStore::Transaction, > std::allocator<ObjectStore::Transaction> >&, > boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0xa9e) > [0x55fef86085de] > > 6: > (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, > ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, > ThreadPool::TPHandle*)+0x7f) [0x55fef81cd7ff] > > 7: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, > ThreadPool::TPHandle*)+0x58) [0x55fef8118298] > > 8: (OSD::dequeue_peering_evt(OSDShard*, PG*, > std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x202) > [0x55fef81767c2] > > 9: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, > ThreadPool::TPHandle&)+0x50) [0x55fef83eb490] > > 10: (OSD::ShardedOpWQ::_process(unsigned int, > ceph::heartbeat_handle_d*)+0x9f4) [0x55fef816aef4] > > 11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433) > [0x55fef8769ce3] > > 12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fef876cd80] > > 13: (()+0x7dd5) [0x7fe9a7cb2dd5] > > 14: (clone()+0x6d) [0x7fe9a6b7202d] > > About server load: > > [root@CEPH008 ~]# top > > top - 00:57:30 up 186 days, 22 min, 1 user, load average: 11.65, > 13.42, 13.51 > > Tasks: 316 total, 1 running, 315 sleeping, 0 stopped, 0 zombie > > %Cpu(s): 2.3 us, 1.2 sy, 0.0 ni, 74.1 id, 22.4 wa, 0.0 hi, 0.1 > si, 0.0 st > > KiB Mem : 65737480 total, 431824 free, 49046608 used, 16259048 > buff/cache > > KiB Swap: 29241340 total, 19406504 free, 9834836 used. 15917556 avail Mem > > Currently the server is doing some deep-scrub that we got off during > the last two weeks due a node evict and a new node install. > > > _______________________________________________ > ceph-users mailing list > ceph-users(a)lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

4 years, 8 months

2
2
0 0

Nautilus - Balancer is always on

by EDH - Manuel Rios Fernandez

Hi All, ceph mgr module disable balancer Error EINVAL: module 'balancer' cannot be disabled (always-on) Whats the way to restart balanacer? Restart MGR service? I wanna suggest to Balancer developer to setup a ceph-balancer.log for this module get more information about whats doing. Regards Manuel

4 years, 8 months

1
0
0 0

Re: [ceph-users] Error Mounting CephFS

by Frank Schilder

On Centos7, the option "secretfile" requires installation of ceph-fuse. Best regards, ================= Frank Schilder AIT Risø Campus Bygning 109, rum S14 ________________________________________ From: ceph-users <ceph-users-bounces(a)lists.ceph.com> on behalf of Yan, Zheng <ukernel(a)gmail.com> Sent: 07 August 2019 10:10:19 To: DHilsbos(a)performair.com Cc: ceph-users Subject: Re: [ceph-users] Error Mounting CephFS On Wed, Aug 7, 2019 at 3:46 PM <DHilsbos(a)performair.com> wrote: > > All; > > I have a server running CentOS 7.6 (1810), that I want to set up with CephFS (full disclosure, I'm going to be running samba on the CephFS). I can mount the CephFS fine when I use the option secret=, but when I switch to secretfile=, I get an error "No such process." I installed ceph-common. > > Is there a service that I'm not aware I should be starting? > Do I need to install another package? > mount.ceph is missing. check if it exists and is located in $PATH > Thank you, > > Dominic L. Hilsbos, MBA > Director - Information Technology > Perform Air International Inc. > DHilsbos(a)PerformAir.com > www.PerformAir.com > > > _______________________________________________ > ceph-users mailing list > ceph-users(a)lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com _______________________________________________ ceph-users mailing list ceph-users(a)lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

4 years, 8 months

1
0
0 0

Re: [ceph-users] OSD's keep crasching after clusterreboot

by Ansgar Jazdzewski

Hi, as a follow-up: * a full log of one OSD failing to start https://pastebin.com/T8UQ2rZ6 * our ec-pool cration in the fist place https://pastebin.com/20cC06Jn * ceph osd dump and ceph osd erasure-code-profile get cephfs https://pastebin.com/TRLPaWcH as we try to dig more into it, it looks like a bug in the cephfs or erasure-coding part of ceph. Ansgar Am Di., 6. Aug. 2019 um 14:50 Uhr schrieb Ansgar Jazdzewski <a.jazdzewski(a)googlemail.com>: > > hi folks, > > we had to move one of our clusters so we had to boot all servers, now > we found an Error on all OSD with the EC-Pool. > > do we miss some opitons, will an upgrade to 13.2.6 help? > > > Thanks, > Ansgar > > 2019-08-06 12:10:16.265 7fb337b83200 -1 > /build/ceph-13.2.4/src/osd/ECUtil.h: In function > 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread > 7fb337b83200 time 2019-08-06 12:10:16.263025 > /build/ceph-13.2.4/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % > stripe_size == 0) > > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic > (stable) 1: (ceph::ceph_assert_fail(char const, char const, int, char > const)+0x102) [0x7fb32eeb83c2] 2: (()+0x2e5587) [0x7fb32eeb8587] 3: > (ECBackend::ECBackend(PGBackend::Listener, coll_t const&, > boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore, > CephContext, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned > long)+0x4de) [0xa4cbbe] 4: (PGBackend::build_pg_backend(pg_pool_t > const&, std::map<std::cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> >, > std::cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> >, std::less<std::cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > >, std > ::allocator<std::pair<std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > const, > std::cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> > > > > const&, PGBackend::Listener, coll_t, > boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore, > CephContext)+0x2f9 ) [0x9474e9] 5: > (PrimaryLogPG::PrimaryLogPG(OSDService, std::shared_ptr<OSDMap const>, > PGPool const&, std::map<std::cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> >, > std::cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> >, std::less<std::cxx11::basic_string<char, > std::char_tra its<char>, std::allocator<char> > >, > std::allocator<std::pair<std::__cxx11::basic_string<char, > std::char_traits<char>, std::allocator<char> > const, > std::cxx11::basic_string<char, std::char_traits<char>, > std::allocator<char> > > > > const&, spg_t)+0x138) [0x8f96e8] 6: > (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x11d3) > [0x753553] 7: (OSD::load_pgs()+0x4a9) [0x758339] 8: > (OSD::init()+0xcd3) [0x7619c3] 9: (main()+0x3678) [0x64d6a8] 10: > (libc_start_main()+0xf0) [0x7fb32ca68830] 11: (_start()+0x29) > [0x717389] NOTE: a copy of the executable, or objdump -rdS > <executable> is needed to interpret this.

4 years, 8 months

1
0
0 0

Fwd: Re: [ceph-users] Ceph Scientific Computing User Group

by Jan Fajerski

Including new ceph-users list. ----- Forwarded message from Mike Perez <miperez(a)redhat.com> ----- Date: Fri, 2 Aug 2019 10:08:20 -0700 From: Mike Perez <miperez(a)redhat.com> To: Kevin Hrpcek <kevin.hrpcek(a)ssec.wisc.edu> CC: "ceph-users(a)lists.ceph.com" <ceph-users(a)lists.ceph.com> Subject: Re: [ceph-users] Ceph Scientific Computing User Group We have scheduled the next meeting on the community calendar for August 28 at 14:30 UTC. Each meeting will then take place on the last Wednesday of each month. Here's the pad to collect agenda/notes: [1]https://pad.ceph.com/p/Ceph_Science_User_Group_Index -- Mike Perez (thingee) On Tue, Jul 23, 2019 at 10:40 AM Kevin Hrpcek <[2]kevin.hrpcek(a)ssec.wisc.edu> wrote: Update We're going to hold off until August for this so we can promote it on the Ceph twitter with more notice. Sorry for the inconvenience if you were planning on the meeting tomorrow. Keep a watch on the list, twitter, or ceph calendar for updates. Kevin On 7/5/19 11:15 PM, Kevin Hrpcek wrote: We've had some positive feedback and will be moving forward with this user group. The first virtual user group meeting is planned for July 24th at 4:30pm central European time/10:30am American eastern time. We will keep it to an hour in length. The plan is to use the ceph bluejeans video conferencing and it will be put on the ceph community calendar. I will send out links when it is closer to the 24th. The goal of this user group is to promote conversations and sharing ideas for how ceph is used in the the scientific/hpc/htc communities. Please be willing to discuss your use cases, cluster configs, problems you've had, shortcomings in ceph, etc... Not everyone pays attention to the ceph lists so feel free to share the meeting information with others you know that may be interested in joining in. Contact me if you have questions, comments, suggestions, or want to volunteer a topic for meetings. I will be brainstorming some conversation starters but it would also be interesting to have people give a deep dive into their use of ceph and what they have built around it to support the science being done at their facility. Kevin On 6/17/19 10:43 AM, Kevin Hrpcek wrote: Hey all, At cephalocon some of us who work in scientific computing got together for a BoF and had a good conversation. There was some interest in finding a way to continue the conversation focused on ceph in scientific computing and htc/hpc environments. We are considering putting together monthly video conference user group meeting to facilitate sharing thoughts and ideas for this part of the ceph community. At cephalocon we mostly had teams present from the EU so I'm interested in hearing how much community interest there is in a ceph+science/HPC/HTC user group meeting. It will be impossible to pick a time that works well for everyone but initially we considered something later in the work day for EU countries. Reply to me if you're interested and please include your timezone. Kevin _______________________________________________ ceph-users mailing list [3]ceph-users(a)lists.ceph.com [4]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com _______________________________________________ ceph-users mailing list [5]ceph-users(a)lists.ceph.com [6]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com _______________________________________________ ceph-users mailing list [7]ceph-users(a)lists.ceph.com [8]http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com References 1. https://pad.ceph.com/p/Ceph_Science_User_Group_Index 2. mailto:kevin.hrpcek@ssec.wisc.edu 3. mailto:ceph-users@lists.ceph.com 4. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 5. mailto:ceph-users@lists.ceph.com 6. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 7. mailto:ceph-users@lists.ceph.com 8. http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com _______________________________________________ ceph-users mailing list ceph-users(a)lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ----- End forwarded message ----- -- Jan Fajerski Engineer Enterprise Storage SUSE Linux GmbH, GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 21284 (AG Nürnberg)

4 years, 8 months

1
0
0 0

Can kstore be used as OSD objectstore backend when deploying a Ceph Storage Cluster? If can, how to?

by R.R.Yuan

Hi, All, When deploying a development cluster, there are three types of OSD objectstore backend: filestore, bluestore and kstore. But there is no "--kstore" option when using "ceph-deploy osd"command to deploy a real ceph cluster. Can kstore be used as OSD objectstore backend when deploy a real ceph cluster? If can, how to ? Thanks a lot R.R.Yuan

4 years, 8 months

1
0
0 0

2024

2023

2022

2021

2020

2019