Hi Ilya,
ISTR there were some anti-spam measures put in place. Is your account
waiting for manual approval? If so, David should be able to help.
Yes, if I remember correctly, I get "waiting approval" when I try to log in.
Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9287
ffff911a9a26bd00 fail -12
Dec 1 03:14:36 c04 kernel: ceph: build_snap_context 100020c9283
It is failing to allocate memory. "low load" isn't very specific,
can you describe the setup and the workload in more detail?
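
For reference, the "fail -12" in the log lines above is a negated errno value, and on Linux errno 12 is ENOMEM, which is why it reads as an allocation failure. A minimal Python sketch for decoding such a code (the helper name decode_kernel_errno is made up for illustration, it is not part of any ceph tooling):

```python
import errno
import os
from typing import Tuple


def decode_kernel_errno(code: int) -> Tuple[str, str]:
    """Map a (possibly negative) kernel return code to (symbol, message)."""
    number = abs(code)
    # errno.errorcode maps numeric values to symbolic names on this platform.
    symbol = errno.errorcode.get(number, "UNKNOWN")
    return symbol, os.strerror(number)


if __name__ == "__main__":
    symbol, message = decode_kernel_errno(-12)
    print(symbol, "-", message)
```

The mapping is platform-specific, so this should be run on a Linux box to match what the kernel logged.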
4 nodes (OSD and mon combined); the 4th node has a local cephfs mount and
is rsync'ing some files from VMs. By 'low load' I mean that this is a sort
of test setup that is going to production. Mostly the nodes are below a load
of 1 (except when the concurrent rsync starts).
How many snapshots do you have?
I don't know how to count them. I have a script running on 2000 dirs. If
one of these dirs is not empty, it creates a snapshot. So in theory I
could have 2000 x 7 days = 14000 snapshots.
(btw the cephfs snapshots are in a different tree than the rsync is
using)
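
On counting: CephFS exposes a directory's snapshots as entries in its hidden .snap directory (that is the default name, and it can be changed with a mount option), so listing it is one way to get a number. A rough sketch, assuming the snapshots are taken directly on the immediate subdirectories of one parent. The path /mnt/cephfs/backups is hypothetical, and entries starting with "_" are skipped on the assumption that they are snapshots inherited from an ancestor directory:

```python
import os


def count_snapshots(parent, snapdir=".snap"):
    """Return {subdir_name: snapshot_count} for each immediate subdir of parent."""
    counts = {}
    with os.scandir(parent) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            snap_path = os.path.join(entry.path, snapdir)
            try:
                names = os.listdir(snap_path)
            except OSError:
                # No snapshot directory here (or not readable): count as zero.
                names = []
            # Assumption: names starting with "_" are inherited from a parent.
            counts[entry.name] = sum(1 for n in names if not n.startswith("_"))
    return counts


if __name__ == "__main__":
    root = "/mnt/cephfs/backups"  # hypothetical path, adjust to the real tree
    if os.path.isdir(root):
        per_dir = count_snapshots(root)
        print("total snapshots:", sum(per_dir.values()))
```

Walking 2000 directories this way is cheap compared to the rsync, and it gives the real number rather than the theoretical 14000.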
Do you keep track of memory consumption on the node?
A bit; attached is a nagios graph. I have 100GB in this node. Since then,
I have disabled all the hugepages (2MB, 1GB) I had created there, to free up
more memory.
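
To check how much memory hugepage reservations hold back, /proc/meminfo reports HugePages_Total and Hugepagesize for the default hugepage size. The sketch below parses that text and multiplies the two. It is only an approximation under that assumption: pools of a non-default size (for example 1GB pages when the default is 2MB) are reported under /sys/kernel/mm/hugepages/ and are not covered here.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            # Value is in kB for sized fields and a plain count otherwise.
            fields[name.strip()] = int(parts[0])
    return fields


def default_hugepage_reserved_kb(fields):
    """kB reserved by hugepages of the default size."""
    return fields.get("HugePages_Total", 0) * fields.get("Hugepagesize", 0)


if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = {}
    print("MemAvailable kB:", info.get("MemAvailable"))
    print("hugepages reserved kB:", default_hugepage_reserved_kb(info))
```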
Finally, you say "crash" in the subject. Does the kernel actually
crash or perhaps it locks up? If it actually crashes, do you have the
panic message?
The whole server was gone. The logs are from the remote syslog server.
The new situation: with more memory and the kernel updated to
3.10.0-1062.4.3.el7.x86_64, rsync is very slow and I have a kworker at 100%
load.