Hi all,
We have a Ceph cluster in production with 6 OSD servers (each with 16x8TB
disks), 3 mons/mgrs and 3 MDSs. Both the public and cluster networks are
10Gb and work well.
After a major crash in April, we set the option bluefs_buffered_io to
false to work around the large-write bug that occurs when bluefs_buffered_io
is true (we were on version 14.2.8, and the default value at that time was
true).
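(For context, roughly how such a change is applied; a sketch, where osd.16 is just an example daemon:)
  # set the option for all OSDs via the centralized config store
  ceph config set osd bluefs_buffered_io false
  # check what a given daemon actually runs with (on the OSD host)
  ceph daemon osd.16 config get bluefs_buffered_io
  # restart the OSDs if the new value does not take effect at runtime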
Since then, we have regularly had some OSDs wrongly marked down by the
cluster after a heartbeat timeout (heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f03f1384700' had timed out after 15).
Generally the OSD restarts and the cluster comes back healthy, but several
times, after many of these kick-offs, the OSD has reached
osd_op_thread_suicide_timeout and gone down for good.
We increased osd_op_thread_timeout and
osd_op_thread_suicide_timeout... The problem still occurs (but less
frequently).
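(Roughly the kind of change we made, as a sketch with illustrative values rather than the exact ones we used:)
  ceph config set osd osd_op_thread_timeout 60             # default is 15 seconds
  ceph config set osd osd_op_thread_suicide_timeout 300    # default is 150 seconds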
A few days ago, we upgraded to 14.2.11 and reverted the timeouts to their
default values, hoping that it would solve the problem (we thought it
might be related to this bug: https://tracker.ceph.com/issues/45943),
but it didn't. We still have some OSDs wrongly marked down.
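(Reverting was just a matter of dropping our overrides from the config database, roughly:)
  ceph config rm osd osd_op_thread_timeout
  ceph config rm osd osd_op_thread_suicide_timeout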
Can somebody help us fix this problem?
Thanks.
Here is an extract of an osd log at failure time:
---------------------------------
2020-08-28 02:19:05.019 7f03f1384700 0 log_channel(cluster) log [DBG] :
44.7d scrub starts
2020-08-28 02:19:25.755 7f040e43d700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f03f1384700' had timed out after 15
2020-08-28 02:19:25.755 7f040dc3c700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f03f1384700' had timed out after 15
this last line is repeated more than 1000 times
...
2020-08-28 02:20:17.484 7f040d43b700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f03f1384700' had timed out after 15
2020-08-28 02:20:17.551 7f03f1384700 0
bluestore(/var/lib/ceph/osd/ceph-16) log_latency_fn slow operation
observed for _collection_list, latency = 67.3532s, lat = 67s cid
=44.7d_head start GHMAX end GHMAX max 25
...
2020-08-28 02:20:22.600 7f040dc3c700 1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7f03f1384700' had timed out after 15
2020-08-28 02:21:20.774 7f03f1384700 0
bluestore(/var/lib/ceph/osd/ceph-16) log_latency_fn slow operation
observed for _collection_list, latency = 63.223s, lat = 63s cid
=44.7d_head start
#44:beffc78d:::rbd_data.1e48e8ab988992.00000000000011bd:0# end #MAX# max
2147483647
2020-08-28 02:21:20.774 7f03f1384700 1 heartbeat_map reset_timeout
'OSD::osd_op_tp thread 0x7f03f1384700' had timed out after 15
2020-08-28 02:21:20.805 7f03f1384700 0 log_channel(cluster) log [DBG] :
44.7d scrub ok
2020-08-28 02:21:21.099 7f03fd997700 0 log_channel(cluster) log [WRN] :
Monitor daemon marked osd.16 down, but it is still running
2020-08-28 02:21:21.099 7f03fd997700 0 log_channel(cluster) log [DBG] :
map e609411 wrongly marked me down at e609410
2020-08-28 02:21:21.099 7f03fd997700 1 osd.16 609411
start_waiting_for_healthy
2020-08-28 02:21:21.119 7f03fd997700 1 osd.16 609411 start_boot
2020-08-28 02:21:21.124 7f03f0b83700 1 osd.16 pg_epoch: 609410
pg[36.3d0( v 609409'481293 (449368'478292,609409'481293]
local-lis/les=609403/609404 n=154651 ec=435353/435353 lis/c
609403/609403 les/c/f 609404/609404/0 609410/609410/608752) [25,72] r=-1
lpr=609410 pi=[609403,609410)/1 luod=0'0 lua=609392'481198
crt=609409'481293 lcod 609409'481292 active mbc={}]
start_peering_interval up [25,72,16] -> [25,72], acting [25,72,16] ->
[25,72], acting_primary 25 -> 25, up_primary 25 -> 25, role 2 -> -1,
features acting 4611087854031667199 upacting 4611087854031667199
...
2020-08-28 02:21:21.166 7f03f0b83700 1 osd.16 pg_epoch: 609411
pg[36.56( v 609409'480511 (449368'477424,609409'480511]
local-lis/les=609403/609404 n=153854 ec=435353/435353 lis/c
609403/609403 les/c/f 609404/609404/0 609410/609410/609410) [103,102]
r=-1 lpr=609410 pi=[609403,609410)/1 crt=609409'480511 lcod
609409'480510 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
2020-08-28 02:21:21.307 7f04073b0700 1 osd.16 609413 set_numa_affinity
public network em1 numa node 0
2020-08-28 02:21:21.307 7f04073b0700 1 osd.16 609413 set_numa_affinity
cluster network em2 numa node 0
2020-08-28 02:21:21.307 7f04073b0700 1 osd.16 609413 set_numa_affinity
objectstore and network numa nodes do not match
2020-08-28 02:21:21.307 7f04073b0700 1 osd.16 609413 set_numa_affinity
not setting numa affinity
2020-08-28 02:21:21.566 7f040a435700 1 osd.16 609413 tick checking mon
for new map
2020-08-28 02:21:22.515 7f03fd997700 1 osd.16 609414 state: booting ->
active
2020-08-28 02:21:22.515 7f03f0382700 1 osd.16 pg_epoch: 609414
pg[36.20( v 609409'483167 (449368'480117,609409'483167]
local-lis/les=609403/609404 n=155171 ec=435353/435353 lis/c
609403/609403 les/c/f 609404/609404/0 609414/609414/609361) [97,16,72]
r=1 lpr=609414 pi=[609403,609414)/1 crt=609409'483167 lcod 609409'483166
unknown NOTIFY mbc={}] start_peering_interval up [97,72] -> [97,16,72],
acting [97,72] -> [97,16,72], acting_primary 97 -> 97, up_primary 97 ->
97, role -1 -> 1, features acting 4611087854031667199 upacting
4611087854031667199
...
2020-08-28 02:21:22.522 7f03f1384700 1 osd.16 pg_epoch: 609414
pg[36.2f3( v 609409'479796 (449368'476712,609409'479796]
local-lis/les=609403/609404 n=154451 ec=435353/435353 lis/c
609403/609403 les/c/f 609404/609404/0 609414/609414/609414) [16,34,21]
r=0 lpr=609414 pi=[609403,609414)/1 crt=609409'479796 lcod 609409'479795
mlcod 0'0 unknown NOTIFY mbc={}] start_peering_interval up [34,21] ->
[16,34,21], acting [34,21] -> [16,34,21], acting_primary 34 -> 16,
up_primary 34 -> 16, role -1 -> 0, features acting 4611087854031667199
upacting 4611087854031667199
2020-08-28 02:21:22.522 7f03f1384700 1 osd.16 pg_epoch: 609414
pg[36.2f3( v 609409'479796 (449368'476712,609409'479796]
local-lis/les=609403/609404 n=154451 ec=435353/435353 lis/c
609403/609403 les/c/f 609404/609404/0 609414/609414/609414) [16,34,21]
r=0 lpr=609414 pi=[609403,609414)/1 crt=609409'479796 lcod 609409'479795
mlcod 0'0 unknown mbc={}] state<Start>: transitioning to Primary
2020-08-28 02:21:24.738 7f03f1384700 0 log_channel(cluster) log [DBG] :
36.2f3 scrub starts
2020-08-28 02:22:18.857 7f03f1384700 0 log_channel(cluster) log [DBG] :
36.2f3 scrub ok
Hi!
We've recently upgraded all our clusters from Mimic to Octopus (15.2.4). Since
then, our largest cluster is experiencing random crashes on OSDs attached to the
mon hosts.
This is the crash we are seeing (cut for brevity, see links in post scriptum):
{
"ceph_version": "15.2.4",
"utsname_release": "4.15.0-72-generic",
"assert_condition": "r == 0",
"assert_func": "void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)",
"assert_file": "/build/ceph-15.2.4/src/os/bluestore/BlueStore.cc <http://bluestore.cc/>",
"assert_line": 11430,
"assert_thread_name": "bstore_kv_sync",
"assert_msg": "/build/ceph-15.2.4/src/os/bluestore/BlueStore.cc <http://bluestore.cc/>: In function 'void BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)' thread 7fc56311a700 time 2020-08-26T08:52:24.917083+0200\n/build/ceph-15.2.4/src/os/bluestore/BlueStore.cc <http://bluestore.cc/>: 11430: FAILED ceph_assert(r == 0)\n",
"backtrace": [
"(()+0x12890) [0x7fc576875890]",
"(gsignal()+0xc7) [0x7fc575527e97]",
"(abort()+0x141) [0x7fc575529801]",
"(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a5) [0x559ef9ae97b5]",
"(ceph::__ceph_assertf_fail(char const*, char const*, int, char const*, char const*, ...)+0) [0x559ef9ae993f]",
"(BlueStore::_txc_apply_kv(BlueStore::TransContext*, bool)+0x3a0) [0x559efa0245b0]",
"(BlueStore::_kv_sync_thread()+0xbdd) [0x559efa07745d]",
"(BlueStore::KVSyncThread::entry()+0xd) [0x559efa09cd3d]",
"(()+0x76db) [0x7fc57686a6db]",
"(clone()+0x3f) [0x7fc57560a88f]"
]
}
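(The dump above is the crash module's metadata; it can be pulled from the cluster with something like the following, where <crash-id> is one of the IDs listed by "ceph crash ls":)
  ceph crash ls
  ceph crash info <crash-id>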
Right before the crash occurs, we see the following message in the crash log:
-3> 2020-08-26T08:52:24.787+0200 7fc569b2d700 2 rocksdb: [db/db_impl_compaction_flush.cc:2212] Waiting after background compaction error: Corruption: block checksum mismatch: expected 2548200440, got 2324967102 in db/815839.sst offset 67107066 size 3808, Accumulated background error counts: 1
-2> 2020-08-26T08:52:24.852+0200 7fc56311a700 -1 rocksdb: submit_common error: Corruption: block checksum mismatch: expected 2548200440, got 2324967102 in db/815839.sst offset 67107066 size 3808 code = 2 Rocksdb transaction:
In short, when this happens we see a RocksDB corruption error after a background compaction.
When an OSD crashes, which happens about 10-15 times a day, it restarts and
resumes work without any further problems.
We are pretty confident that this is not a hardware issue, due to the following facts:
* The crashes occur on 5 different hosts over 3 different racks.
* There is no smartctl/dmesg output that could explain it.
* It usually happens to a different OSD that did not crash before.
Still, we checked the following on a few OSDs/hosts (commands sketched after this list):
* We can do a manual compaction, both offline and online.
* We successfully ran "ceph-bluestore-tool fsck --deep yes" on one of the OSDs.
* We manually compacted a number of OSDs, one of which crashed hours later.
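The commands involved were roughly the following (a sketch; ceph-16/osd.16 stand in for whichever OSD is being checked):
  # online compaction of a running OSD
  ceph tell osd.16 compact
  # offline compaction, with the OSD stopped
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-16 compact
  # deep fsck, also with the OSD stopped
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-16 --deep yes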
The only thing we have noticed so far: It only happens to OSDs that are attached
to a mon host. *None* of the non-mon host OSDs have had a crash!
Does anyone have a hint as to what could be causing this? We currently have no good
theory that explains it, much less a fix or workaround.
Any help would be greatly appreciated.
Denis
Crash: https://public-resources.objects.lpg.cloudscale.ch/osd-crash/meta.txt
Log: https://public-resources.objects.lpg.cloudscale.ch/osd-crash/log.txt
Hi everyone, a bucket was over quota (the default quota of 300k objects per bucket), so I enabled the object quota for this bucket and set a quota of 600k objects.
We are on Luminous (12.2.12) and dynamic resharding is disabled, so I manually resharded the bucket from 3 to 6 shards.
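(The manual reshard was roughly the following, with <bucket-name> standing in for the real bucket:)
  radosgw-admin bucket reshard --bucket=<bucket-name> --num-shards=6
  radosgw-admin bucket stats --bucket=<bucket-name>    # check the result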
Since then, radosgw-admin bucket stats reports an `rgw.none` entry in the usage section for this bucket.
I searched the mailing lists, Bugzilla and GitHub; it looks like I can ignore the rgw.none stats (0-byte objects, entries left in the index marked as cancelled...),
but the num_objects in rgw.none is counted as part of the quota usage.
I bumped the quota to 800k objects to work around the problem (without resharding).
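(Roughly like this, assuming I recall the flags correctly; <bucket-name> is a placeholder:)
  radosgw-admin quota set --quota-scope=bucket --bucket=<bucket-name> --max-objects=800000
  radosgw-admin quota enable --quota-scope=bucket --bucket=<bucket-name>
  radosgw-admin bucket stats --bucket=<bucket-name>    # verify the new limit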
Is there a way I can garbage-collect the rgw.none entries?
Is this problem fixed in Mimic/Nautilus/Octopus?
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 417827
},
"rgw.main": {
"size": 1390778138502,
"size_actual": 1391581007872,
"size_utilized": 1390778138502,
"size_kb": 1358181776,
"size_kb_actual": 1358965828,
"size_kb_utilized": 1358181776,
"num_objects": 305637
}
},
Thanks!
Hello again
So I have changed the network configuration.
Now my Ceph is reachable from outside; this also means all OSDs of all nodes are reachable.
I still have the same behaviour, which is a timeout.
The client can resolve all nodes by their hostnames.
The mons are still listening on the internal network, so the NAT rule is still there.
I have set "public bind addr" to the external IP and restarted the mon, but it's still not working.
[root@testnode1 ~]# ceph config get mon.public_bind_addr
WHO MASK LEVEL OPTION VALUE RO
mon advanced public_bind_addr v2:[ext-addr]:0/0 *
Do I have to change them somewhere else too?
Thanks in advance,
Simon
From: Janne Johansson [mailto:icepic.dz@gmail.com]
Sent: 27 August 2020 20:01
To: Simon Sutter <ssutter(a)hosttech.ch>
Subject: Re: [ceph-users] cephfs needs access from two networks
On Thu 27 Aug 2020 at 12:05, Simon Sutter <ssutter(a)hosttech.ch> wrote:
Hello Janne
Oh, I missed that point. No, the client cannot talk directly to the OSDs.
In this case it's extremely difficult to set this up.
This is an absolute requirement to be a ceph client.
How does the mon tell the client which host and port of the OSD it should connect to?
The same port and IP that the OSD called into the mon with when it started up and joined the cluster.
Can I influence that?
Well, you set the IP on the OSD hosts, and the port ranges used by OSDs are changeable/settable, but that would not really help the above-mentioned client.
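If it helps, a sketch of where to look (osd.16 is just an example):
  # the addresses the mons hand out to clients for each OSD
  ceph osd dump | grep '^osd'
  # the port range OSDs bind to (run on the OSD host)
  ceph daemon osd.16 config get ms_bind_port_min
  ceph daemon osd.16 config get ms_bind_port_max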
From: Janne Johansson [mailto:icepic.dz@gmail.com]
Sent: 26 August 2020 15:09
To: Simon Sutter <ssutter(a)hosttech.ch>
Cc: ceph-users(a)ceph.io
Subject: Re: [ceph-users] cephfs needs access from two networks
On Wed 26 Aug 2020 at 14:16, Simon Sutter <ssutter(a)hosttech.ch> wrote:
Hello,
So I know the mon services can only bind to one IP.
But I have to make it accessible from two networks, because internal and external servers have to mount the CephFS.
The internal IP is 10.99.10.1 and the external one is some public IP.
I tried NATing it with this: "firewall-cmd --zone=public --add-forward-port=port=6789:proto=tcp:toport=6789:toaddr=10.99.10.1 --permanent"
The NAT is working, because I get a "ceph v027" (along with some gibberish) when I do a telnet: "telnet *public-ip* 6789"
But when I try to mount it, I just get a timeout:
mount -vvvv -t ceph *public-ip*:6789:/testing /mnt -o name=test,secretfile=/root/ceph.client.test.key
mount error 110 = Connection timed out
tcpdump also sees a "Ceph Connect" packet coming from the mon.
How can I get around this problem?
Is there something I have missed?
Any Ceph client will also need direct access to all the OSDs involved. Your mail doesn't really say whether the cephfs-mounting client can talk to the OSDs?
In Ceph, traffic is not shuffled via the mons; the mons only tell the client which OSDs it needs to talk to, and then all I/O goes directly from the client to any involved OSD servers.
--
May the most significant bit of your life be positive.
The mons get their bind address from the monmap, I believe. So this means
changing the IP addresses of the monitors in the monmap with
monmaptool.
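A rough sketch of that procedure (untested here; mon-a and ext-addr are
placeholders, and the monitor must be stopped before injecting the new map):
  ceph mon dump                                    # what the monmap currently advertises
  ceph mon getmap -o /tmp/monmap                   # grab the current monmap
  monmaptool --print /tmp/monmap
  monmaptool --rm mon-a /tmp/monmap                # drop the old address entry
  monmaptool --add mon-a ext-addr:6789 /tmp/monmap # re-add with the new address
  ceph-mon -i mon-a --inject-monmap /tmp/monmap    # with mon-a stopped; then start it again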
Regards
Marcel
> Hello again
>
> So I have changed the network configuration.
> Now my Ceph is reachable from outside; this also means all OSDs of all
> nodes are reachable.
> I still have the same behaviour, which is a timeout.
>
> The client can resolve all nodes by their hostnames.
> The mons are still listening on the internal network, so the NAT rule is
> still there.
> I have set "public bind addr" to the external IP and restarted the mon,
> but it's still not working.
>
> [root@testnode1 ~]# ceph config get mon.public_bind_addr
> WHO MASK LEVEL OPTION VALUE RO
> mon advanced public_bind_addr v2:[ext-addr]:0/0 *
>
> Do I have to change them somewhere else too?
>
> Thanks in advance,
> Simon
>
Hi all,
I tried to set a bucket quota using the admin API as shown below:
admin/user?quota&uid=bse&bucket=test&quota-type=bucket
with the payload in JSON format:
{
"enabled": true,
"max_size": 1099511627776,
"max_size_kb": 1073741824,
"max_objects": -1
}
It returned success, but the quota change did not happen, as confirmed by
the 'radosgw-admin bucket stats --bucket=test' command.
Am I missing something obvious? Please kindly advise/suggest.
By the way, I am using Ceph Mimic (v13.2.4). Setting the quota with radosgw-admin
quota set --bucket=${BUCK} --max-size=1T --quota-scope=bucket does work, but I
want to do it programmatically.
Thanks in advance,
-Youzhong
Is there a way to remove an OSD spec from the mgr? I've got one in there that I don't want. It shows up when I do "ceph orch osd spec --preview", and I can't find any way to get rid of it.