I also found the following errors in dmesg:

[332884.028810] systemd-journald[6240]: Failed to parse kernel command line, ignoring: Cannot allocate memory
[332885.054147] systemd-journald[6240]: Out of memory.
[332894.844765] systemd[1]: systemd-journald.service: Main process exited, code=exited, status=1/FAILURE
[332897.199736] systemd[1]: systemd-journald.service: Failed with result 'exit-code'.
[332906.503076] systemd[1]: Failed to start Journal Service.
[332937.909198] systemd[1]: ceph-crash.service: Main process exited, code=exited, status=1/FAILURE
[332939.308341] systemd[1]: ceph-crash.service: Failed with result 'exit-code'.
[332949.545907] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
[332949.546631] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 7.
[332949.546781] systemd[1]: Stopped Journal Service.
[332949.566402] systemd[1]: Starting Journal Service...
[332950.190332] systemd[1]: ceph-osd@1.service: Main process exited, code=killed, status=6/ABRT
[332950.190477] systemd[1]: ceph-osd@1.service: Failed with result 'signal'.
[332950.842297] systemd-journald[6249]: File /var/log/journal/8f2559099bf54865adc95e5340d04447/system.journal corrupted or uncleanly shut down, renaming and replacing.
[332951.019531] systemd[1]: Started Journal Service.

On Tue, Sep 10, 2019 at 3:04 PM Amudhan P <amudhan83@gmail.com> wrote:

Hi,

I am running Ceph version 13.2.6 (Mimic) on a test setup and trying out CephFS.

My current setup:
3 nodes: 1 node contains two bricks and the other 2 nodes contain a single brick each.

The volume uses 3 replicas, and I am trying to simulate a node failure.

I powered down one host, and the other systems then started returning this message for every command:
"-bash: fork: Cannot allocate memory"
After that, the systems stopped responding to commands.

What could be the reason for this?
At this stage, I am able to read some of the data stored in the volume, while other reads just hang waiting for I/O.

Output from "sudo ceph -s":
  cluster:
    id:     7c138e13-7b98-4309-b591-d4091a1742b4
    health: HEALTH_WARN
            1 osds down
            2 hosts (3 osds) down
            Degraded data redundancy: 5313488/7970232 objects degraded (66.667%), 64 pgs degraded

  services:
    mon: 1 daemons, quorum mon01
    mgr: mon01(active)
    mds: cephfs-tst-1/1/1 up {0=mon01=up:active}
    osd: 4 osds: 1 up, 2 in

  data:
    pools:   2 pools, 64 pgs
    objects: 2.66 M objects, 206 GiB
    usage:   421 GiB used, 3.2 TiB / 3.6 TiB avail
    pgs:     5313488/7970232 objects degraded (66.667%)
             64 active+undersized+degraded

  io:
    client: 79 MiB/s rd, 24 op/s rd, 0 op/s wr

Output from "sudo ceph osd df":
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 0   hdd 1.81940        0     0 B     0 B     0 B     0    0   0
 3   hdd 1.81940        0     0 B     0 B     0 B     0    0   0
 1   hdd 1.81940  1.00000 1.8 TiB 211 GiB 1.6 TiB 11.34 1.00   0
 2   hdd 1.81940  1.00000 1.8 TiB 210 GiB 1.6 TiB 11.28 1.00  64
                    TOTAL 3.6 TiB 421 GiB 3.2 TiB 11.31
MIN/MAX VAR: 1.00/1.00  STDDEV: 0.03
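
A quick sanity check on the degraded figure from the "ceph -s" output above. This is only a sketch: it assumes both pools use a replica count of 3 (per the "3 replica" description), and the variable names are made up for illustration. Under that assumption, the numbers suggest that two of the three copies of every object are currently unavailable.

```python
# Figures copied from the "ceph -s" output above.
degraded_copies = 5313488
total_copies = 7970232
replicas = 3  # assumption: every pool has size 3

# Number of distinct objects implied by the total copy count.
objects = total_copies // replicas

# How many copies of each object are counted as degraded.
missing_per_object = degraded_copies // objects

# Degraded share, as a percentage rounded the way "ceph -s" prints it.
degraded_pct = round(100 * degraded_copies / total_copies, 3)

print(objects, missing_per_object, degraded_pct)
```

The object count this yields (about 2.66 million) matches the "2.66 M objects" line in the same output.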

Regards,
Amudhan