It's a test cluster; each node has a single OSD and 4 GB RAM.

On Tue, Sep 10, 2019 at 3:42 PM Ashley Merrick <singapore@amerrick.co.uk> wrote:
What are the specs of the machines?

Recovery work will use more memory than normal clean operation, and it looks like you're maxing out the available memory on the machines while Ceph is trying to recover.
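If these are BlueStore OSDs, one thing worth trying (an untested sketch; the 1.5 GiB figure below is illustrative, not a tuned value) is lowering the OSD memory target so each daemon fits inside 4 GB of RAM, e.g. in ceph.conf on each OSD node:

[osd]
# default osd_memory_target is 4 GiB, which leaves no headroom on a 4 GB node
osd memory target = 1610612736

Then restart the OSDs. Throttling recovery (osd_max_backfills, osd_recovery_max_active) can also reduce the memory spike while the cluster heals.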



---- On Tue, 10 Sep 2019 18:10:50 +0800 amudhan83@gmail.com wrote ----

I have also found the errors below in dmesg.

[332884.028810] systemd-journald[6240]: Failed to parse kernel command line, ignoring: Cannot allocate memory
[332885.054147] systemd-journald[6240]: Out of memory.
[332894.844765] systemd[1]: systemd-journald.service: Main process exited, code=exited, status=1/FAILURE
[332897.199736] systemd[1]: systemd-journald.service: Failed with result 'exit-code'.
[332906.503076] systemd[1]: Failed to start Journal Service.
[332937.909198] systemd[1]: ceph-crash.service: Main process exited, code=exited, status=1/FAILURE
[332939.308341] systemd[1]: ceph-crash.service: Failed with result 'exit-code'.
[332949.545907] systemd[1]: systemd-journald.service: Service has no hold-off time, scheduling restart.
[332949.546631] systemd[1]: systemd-journald.service: Scheduled restart job, restart counter is at 7.
[332949.546781] systemd[1]: Stopped Journal Service.
[332949.566402] systemd[1]: Starting Journal Service...
[332950.190332] systemd[1]: ceph-osd@1.service: Main process exited, code=killed, status=6/ABRT
[332950.190477] systemd[1]: ceph-osd@1.service: Failed with result 'signal'.
[332950.842297] systemd-journald[6249]: File /var/log/journal/8f2559099bf54865adc95e5340d04447/system.journal corrupted or uncleanly shut down, renaming and replacing.
[332951.019531] systemd[1]: Started Journal Service.
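For anyone reproducing this, a quick way to check whether the kernel OOM killer also fired (generic shell, nothing Ceph-specific):

dmesg -T | grep -iE 'out of memory|killed process'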

On Tue, Sep 10, 2019 at 3:04 PM Amudhan P <amudhan83@gmail.com> wrote:
Hi,

I am using Ceph version 13.2.6 (Mimic) on a test setup, trying out CephFS.

My current setup:
3 nodes; one node contains two OSDs and the other two nodes contain a single OSD each.

The volume is a 3-replica pool, and I am trying to simulate a node failure.
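(For reference, the filesystem was created roughly along these lines; the pool names and PG counts are illustrative, only the fs name cephfs-tst is from my cluster:

ceph osd pool create cephfs_data 32
ceph osd pool create cephfs_metadata 32
ceph fs new cephfs-tst cephfs_metadata cephfs_data
)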

I powered down one host and, when running any command on the other systems, started getting the message
"-bash: fork: Cannot allocate memory", and the systems stopped responding to commands.

What could be the reason for this?
At this stage I could still read some of the data stored in the volume, while other reads were just stuck waiting on I/O.
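In case it helps, these are the kinds of commands to see where the memory is going on a surviving node (osd.1 below is just an example id, assuming the admin socket is in the default location):

free -h
sudo ceph daemon osd.1 dump_mempools
ps -o rss,cmd -C ceph-osd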

output from "sudo ceph -s"
  cluster:
    id:     7c138e13-7b98-4309-b591-d4091a1742b4
    health: HEALTH_WARN
            1 osds down
            2 hosts (3 osds) down
            Degraded data redundancy: 5313488/7970232 objects degraded (66.667%), 64 pgs degraded

  services:
    mon: 1 daemons, quorum mon01
    mgr: mon01(active)
    mds: cephfs-tst-1/1/1 up  {0=mon01=up:active}
    osd: 4 osds: 1 up, 2 in

  data:
    pools:   2 pools, 64 pgs
    objects: 2.66 M objects, 206 GiB
    usage:   421 GiB used, 3.2 TiB / 3.6 TiB avail
    pgs:     5313488/7970232 objects degraded (66.667%)
             64 active+undersized+degraded

  io:
    client:   79 MiB/s rd, 24 op/s rd, 0 op/s wr
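(If I read the numbers right, the degraded count matches two of three replicas being offline: 2,656,744 objects x 3 replicas = 7,970,232 copies, and 2 x 2,656,744 = 5,313,488 of them are on the down OSDs, which is exactly 66.667%.)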

Output from "sudo ceph osd df":
ID CLASS WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE  VAR  PGS
 0   hdd 1.81940        0     0 B     0 B     0 B     0    0   0
 3   hdd 1.81940        0     0 B     0 B     0 B     0    0   0
 1   hdd 1.81940  1.00000 1.8 TiB 211 GiB 1.6 TiB 11.34 1.00   0
 2   hdd 1.81940  1.00000 1.8 TiB 210 GiB 1.6 TiB 11.28 1.00  64
                    TOTAL 3.6 TiB 421 GiB 3.2 TiB 11.31
MIN/MAX VAR: 1.00/1.00  STDDEV: 0.03

Regards,
Amudhan
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io