Hi,
We have lately been experiencing very slow "rm" operations on a client, and it is not clear what is wrong. The symptom looks like this:
On the client:
# cat /sys/kernel/debug/ceph/*/mdsc
3436737 mds0 rmdir #100216d0113/utils
(hpc/session/H9MKDmgVn2unmmR0Xox1SiGmABFKDmABFKDmmH9XDmABFKDmzmjnGo/pilot/radical/utils)
On the MDS:
# ceph daemon mds.velikaponca dump_ops_in_flight
{
"ops": [
{
"description": "client_request(client.14321210:3436737
rmdir #0x100216d0113/utils 2019-07-03 09:48:02.000548 caller_uid=0,
caller_gid=0{})",
"initiated_at": "2019-07-03 09:48:02.001005",
"age": 4.046726,
"duration": 4.046759,
"type_data": {
"flag_point": "failed to xlock, waiting",
"reqid": "client.14321210:3436737",
"op_type": "client_request",
"client_info": {
"client": "client.14321210",
"tid": 3436737
},
"events": [
{
"time": "2019-07-03 09:48:02.001005",
"event": "initiated"
},
{
"time": "2019-07-03 09:48:02.001005",
"event": "header_read"
},
{
"time": "2019-07-03 09:48:02.001006",
"event": "throttled"
},
{
"time": "2019-07-03 09:48:02.001011",
"event": "all_read"
},
{
"time": "2019-07-03 09:48:02.001131",
"event": "dispatched"
},
{
"time": "2019-07-03 09:48:02.001261",
"event": "failed to wrlock, waiting"
},
{
"time": "2019-07-03 09:48:02.001744",
"event": "failed to xlock, waiting"
},
{
"time": "2019-07-03 09:48:02.010963",
"event": "failed to xlock, waiting"
}
]
}
}
],
"num_ops": 1
}
Client requests in general appear to be very fast, but rmdir frequently fails
to take the xlock, and it can take up to 10 s before the operation
completes, which significantly slows down the services running on the client;
it also does not appear to be related to load on the Ceph servers. Restarting
the MDS solves the issue for a few minutes, but then it reappears.
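For completeness, this is roughly how we watch the stuck requests on the MDS
side (a rough sketch; mds.velikaponca is our daemon name, as above):
# poll the in-flight ops once per second, printing each op's age and lock state
while true; do
    ceph daemon mds.velikaponca dump_ops_in_flight | grep -E '"age"|"flag_point"'
    sleep 1
done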
The version is Mimic 13.2.6 and the cluster is healthy; most of the
clients are on 5.1.* kernels. The server experiencing the issues is on
CentOS 7 with kernel 3.10.0-957.21.3.el7.x86_64; we have also tested it
with a 5.1.15 kernel, which shows the same symptoms.
Any ideas how to solve this problem?
Cheers,
Andrej
--
_____________________________________________________________
prof. dr. Andrej Filipcic, E-mail: Andrej.Filipcic(a)ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674 Fax: +386-1-477-3166
-------------------------------------------------------------
Hello,
we are experiencing crashing OSDs in multiple independent Ceph clusters.
As far as I can tell, each OSD has very similar log entries around the
time of the crash.
Example log: https://pastebin.com/raw/vQ2AJ5ud
I can provide you with more log files; they are too large for pastebin
and I'm not aware of this mailing list's email attachment policy.
Every log contains entries like the following:
2019-07-10 21:36:31.903886 7f322aeff700 -1 rocksdb: submit_transaction
error: Corruption: block checksum mismatch code = 2 Rocksdb transaction:
Put( Prefix = M key =
0x00000000000008c1'.0000461231.00000000000125574325' Value size = 184)
Put( Prefix = M key = 0x00000000000008c1'._fastinfo' Value size = 186)
Put( Prefix = O key =
0x7f80000000000000015806b4'(!rbd_data.7c012a6b8b4567.000000000000004e!='0xfffffffffffffffeffffffffffffffff6f002f0000'x'
Value size = 325)
Put( Prefix = O key =
0x7f80000000000000015806b4'(!rbd_data.7c012a6b8b4567.000000000000004e!='0xfffffffffffffffeffffffffffffffff'o'
Value size = 1608)
Put( Prefix = L key = 0x000000000226dc7a Value size = 16440)
2019-07-10 21:36:31.913113 7f322aeff700 -1
/build/ceph/src/os/bluestore/BlueStore.cc: In function 'void
BlueStore::_kv_sync_thread()' thread 7f322aeff700 time 2019-07-10
21:36:31.903909
/build/ceph/src/os/bluestore/BlueStore.cc: 8808: FAILED assert(r == 0)
ceph version 12.2.12-7-g1321c5e91f
(1321c5e91f3d5d35dd5aa5a0029a54b9a8ab9498) luminous (stable)
Unfortunately, I'm unable to interpret these dumps. I hope you can help me
with this issue.
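In case it helps the analysis, we plan to run an offline consistency check
on one of the affected OSDs along these lines (a sketch; OSD id 12 is only
an example):
# stop the OSD, then fsck its BlueStore instance
systemctl stop ceph-osd@12
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12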
Regards,
Daniel
--
Kind regards
Daniel Aberger
Your Profihost Team
-------------------------------
Profihost AG
Expo Plaza 1
30539 Hannover
Germany
Tel.: +49 (511) 5151 8181 | Fax.: +49 (511) 5151 8282
URL: http://www.profihost.com | E-Mail: info(a)profihost.com
Registered office: Hannover, VAT ID DE813460827
Register court: Amtsgericht Hannover, registration no. HRB 202350
Management board: Cristoph Bluhm, Sebastian Bluhm, Stefan Priebe
Supervisory board: Prof. Dr. iur. Winfried Huck (Chairman)
I want to set up Ceph but I am facing this error when I run this command:
# ceph-deploy install ceph-deploy monnode1 osd0 osd1
[ceph-deploy][WARNIN] E: Sub-process /usr/bin/dpkg returned an error code
(1)
[ceph-deploy][ERROR ] RuntimeError: command returned non-zero exit status:
100
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: env
DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get
--assume-yes -q --no-install-recommends install -o
Dpkg::Options::=--force-confnew ceph ceph-osd ceph-mds ceph-mon radosgw
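For reference, the underlying apt-get command that ceph-deploy runs on each
node (quoted in the error above) can be re-run by hand on the failing host
to see the full dpkg error:
# ssh monnode1
# sudo env DEBIAN_FRONTEND=noninteractive DEBIAN_PRIORITY=critical apt-get --assume-yes -q --no-install-recommends install -o Dpkg::Options::=--force-confnew ceph ceph-osd ceph-mds ceph-mon radosgw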
Now what can I do?
As a brand-new Ceph user, I'm trying to figure out whether it is a good solution given the hardware I have to work with and the problem I need to solve.
At first look, Ceph does appear to do what is required, namely provide a replicating file system, but it might be too "heavy".
The environment is two Ubuntu Linux servers, each with two hard drives, which will be clustered. Nothing other than the hardware and OS is decided; we can create any solution that works.
One of the servers in the cluster will be primary, and it will fail over to the secondary until the primary is brought back online.
Does Ceph make sense for keeping the hard drives/filesystems synced?
Thanks for your input.
Joseph
Hi community,
we're designing our first Ceph cluster here and we're struggling with the RAM recommendations for the MON (and also MDS) nodes. The documentation says something like "1 GB per daemon instance", but which daemons are meant here? The readings range from per MON (or MDS) daemon, which would be fairly low, to per OSD daemon in the cluster, which would be fairly high; a worked example of the two readings follows below. Any help would be appreciated.
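To make the gap concrete (node and daemon counts are invented for illustration): on a cluster with 3 MONs, 1 MDS, and 300 OSDs, the per-MON/MDS reading gives roughly 1 GB per MON or MDS daemon, i.e. about 4 GB in total across those nodes, while the per-OSD reading gives roughly 300 GB. The two interpretations differ by about two orders of magnitude.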
Kind regards
Sebastian
--
Sebastian Wiesner
IT-Systemadministrator / Application Developer
Department of Operative Processes and Systems (OPS)
Dresden University of Technology
Center for Information Services and High Performance Computing (ZIH)
D-01062 Dresden, Germany
phone: +49 351 463-37828
fax: +49 351 463-32164
e-mail: sebastian.wiesner(a)tu-dresden.de
WWW: http://www.tu-dresden.de/zih
Greetings,
I was wondering if there are limitations or known performance concerns if
we increase the number of buckets per account to more than 1,000. Our
use case is to have close to 1,500 buckets per account, with approximately
40k objects per bucket on average.
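(For context: we understand the default per-user cap itself can be raised,
e.g. with something along the lines of:
radosgw-admin user modify --uid=<user-id> --max-buckets=1500
where <user-id> is a placeholder. Our question is specifically about the
limitations or performance impact of running with that many buckets.)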
Thanks,
Rajesh
Hi all,
I am using ceph 14.2.1 (Nautilus)
I am unable to increase the pg_num of a pool.
I have a pool named Backup whose current pg_num is 64:
# ceph osd pool get Backup pg_num
pg_num: 64
When I try to increase it, the command appears to succeed:
# ceph osd pool set Backup pg_num 512
set pool 6 pg_num to 512
But when I check again, nothing has changed:
# ceph osd pool get Backup pg_num
pg_num: 64
I don't know how to increase the pg_num of this pool. I also tried the
autoscaler module, but it doesn't work either: I am unable to activate the
autoscaler, as it always stays in warn mode.
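For reference, here are the additional checks I plan to run (a sketch;
pool id 6 comes from the output above):
# show the pool's full state (pg_num, pgp_num, any pending pg_num target, autoscale mode)
ceph osd pool ls detail | grep Backup
# confirm whether the pg_autoscaler mgr module is enabled at all
ceph mgr module ls | grep pg_autoscaler
# if it is, show what the autoscaler thinks of each pool
ceph osd pool autoscale-status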
Thank you for your help,
Cabeur.
Hi all,
I am having trouble getting consistent RBD latency from our cluster to our KVM virtual machines, which attach their images via the KRBD driver. When measuring with tools like rbd perf image iotop, we constantly see latency spike from around 1-2 ms to 100+ ms, which seems to kill SQL performance in our Windows VMs. I essentially have two questions:
1) Am I missing something in my configuration that should be applied to get consistently low latency for the VM guests?
2) When measuring the disks, sequential IO seems to result in higher latency than random IO. Is this expected, or is there a way to tune it with the KRBD driver?
Configuration:
3 x MON/MGR nodes
12 x OSD nodes (24 x HDD, 2 x NVMe for DB and WAL)
KVM clients attaching the RBD images via KRBD
1 pool w/ 16384 PGs
Ceph version 14.2.1
Ceph.conf:
[global]
mon host = 10.97.11.17,10.97.11.27,10.97.11.37
public network = 10.97.11.0/24
cluster network = 10.97.12.0/24
auth cluster required = cephx
auth service required = cephx
auth client required = cephx
osd journal size = 30720
osd pool default size = 3
osd pool default min size = 2
osd pool default pg num = 4096
osd pool default pgp num = 4096
osd crush chooseleaf type = 1
[osd]
bluestore_default_buffered_write = false
bluestore_default_buffered_read = true
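For reference, this is roughly how we measure per-IO latency from a client
(a sketch; /dev/rbd0 is an example device, writing to it is destructive,
and iodepth=1 is used to expose per-IO latency):
# sequential 64k writes
fio --name=seq --filename=/dev/rbd0 --ioengine=libaio --direct=1 --rw=write --bs=64k --iodepth=1 --runtime=60 --time_based
# random 4k reads
fio --name=rand --filename=/dev/rbd0 --ioengine=libaio --direct=1 --rw=randread --bs=4k --iodepth=1 --runtime=60 --time_based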
Thank you,