The osd_memory_target of the failed osd on one ceph-osd node was changed
to 6G while the other osds' memory_target is 3G. Starting the failed osd
with the 6G memory_target causes other osds on that ceph-osd node to go
"down", and the failed osd is still down.
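
For reference, the override and the single-OSD start were done roughly
like this (a sketch; osd.34 is only an example id and the systemd unit
name assumes a non-containerized deployment):

# ceph config set osd.34 osd_memory_target 6442450944
# systemctl start ceph-osd@34
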
On Mon, Aug 31, 2020 at 2:19 PM Eugen Block <eblock(a)nde.ag> wrote:
Can you try the opposite and turn up the memory_target and only try to
start a single OSD?
Zitat von Vahideh Alinouri <vahideh.alinouri(a)gmail.com>:
osd_memory_target was changed to 3G. Starting the failed osd causes the
ceph-osd nodes to crash, and the failed osd is still "down".
On Fri, Aug 28, 2020 at 1:13 PM Vahideh Alinouri <vahideh.alinouri(a)gmail.com> wrote:
> Yes, each osd node has 7 osds with 4 GB memory_target.
>
>
> On Fri, Aug 28, 2020, 12:48 PM Eugen Block <eblock(a)nde.ag> wrote:
>
>> Just to confirm, each OSD node has 7 OSDs with 4 GB memory_target?
>> That leaves only 4 GB RAM for the rest, and under heavy load the
>> OSDs use even more. I would suggest reducing the memory_target to
>> 3 GB and seeing if they start successfully.
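>>
>> A sketch of that change (3 GB = 3221225472 bytes; this sets the
>> default for all OSDs unless a per-OSD override exists):
>>
>> # ceph config set osd osd_memory_target 3221225472
>> # ceph config get osd osd_memory_target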
>>
>>
>> Zitat von Vahideh Alinouri <vahideh.alinouri(a)gmail.com>:
>>
>> > osd_memory_target is 4294967296.
>> > Cluster setup:
>> > 3 mon, 3 mgr, 21 osds on 3 ceph-osd nodes, lvm scenario. Each ceph-osd
>> > node has 32G RAM, a 4-core CPU and 4TB osd disks; 9 osds have
>> > block.wal on SSDs. Public network is 1G and cluster network is 10G.
>> > The cluster was installed and upgraded using ceph-ansible.
>> >
>> > On Thu, Aug 27, 2020 at 7:01 PM Eugen Block <eblock(a)nde.ag> wrote:
>> >
>> >> What is the memory_target for your OSDs? Can you share more details
>> >> about your setup? You write about high memory usage; are the OSD nodes
>> >> affected by the OOM killer? You could try to reduce the
>> >> osd_memory_target and see if that helps bring the OSDs back up.
>> >> Splitting the PGs is a very heavy operation.
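>> >>
>> >> One way to check whether the OOM killer hit the OSD nodes (a sketch):
>> >>
>> >> # dmesg -T | grep -i 'out of memory'
>> >> # journalctl -k | grep -i oom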
>> >>
>> >>
>> >> Zitat von Vahideh Alinouri <vahideh.alinouri(a)gmail.com>:
>> >>
>> >> > The ceph cluster was updated from nautilus to octopus. On the
>> >> > ceph-osd nodes we have high I/O wait.
>> >> >
>> >> > After increasing one pool's pg_num from 64 to 128, following the
>> >> > warning message (more objects per pg), cpu load and ram usage on the
>> >> > ceph-osd nodes went up and finally the whole cluster crashed. Three
>> >> > osds, one on each host, are stuck in the down state (osd.34, osd.35,
>> >> > osd.40).
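>> >> >
>> >> > The pg_num change was, roughly (a sketch; the pool name is a
>> >> > placeholder):
>> >> >
>> >> > # ceph osd pool set <pool> pg_num 128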
>> >> >
>> >> > Starting a down osd's service causes high ram usage and cpu load and
>> >> > makes the ceph-osd node crash until the osd service fails.
>> >> >
>> >> > The active mgr service on each mon host crashes after consuming
>> >> > almost all available ram on the physical host.
>> >> >
>> >> > I need to recover the pgs and resolve the corruption. How can I
>> >> > recover unknown and down pgs? Is there any way to start up the failed
>> >> > osds?
>> >> >
>> >> >
>> >> > The following steps were done:
>> >> >
>> >> > 1- The osd nodes' kernel was upgraded to 5.4.2 before the ceph
>> >> > cluster upgrade. Reverting to the previous kernel 4.2.1 was tested to
>> >> > reduce iowait, but it had no effect.
>> >> >
>> >> > 2- Recovering 11 pgs from the failed osds by exporting them with the
>> >> > ceph-objectstore-tool utility and importing them on other osds. The
>> >> > result: 9 pgs are “down” and 2 pgs are “unknown”.
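>> >> >
>> >> > The export/import was done roughly like this (a sketch; the osd ids,
>> >> > pgid and file path are examples, and the osds must be stopped while
>> >> > the tool runs):
>> >> >
>> >> > # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
>> >> >     --pgid 2.39 --op export --file /root/pg2.39.export
>> >> > # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-37 \
>> >> >     --op import --file /root/pg2.39.export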
>> >> >
>> >> > 2-1) 9 pgs were exported and imported successfully, but their status
>> >> > is “down” because of "peering_blocked_by" the 3 failed osds. I cannot
>> >> > mark the osds lost, to avoid losing the unknown pgs. These pgs are
>> >> > only KBs to MBs in size.
>> >> >
>> >> > "peering_blocked_by": [
>> >> >     {
>> >> >         "osd": 34,
>> >> >         "current_lost_at": 0,
>> >> >         "comment": "starting or marking this osd lost may let us proceed"
>> >> >     },
>> >> >     {
>> >> >         "osd": 35,
>> >> >         "current_lost_at": 0,
>> >> >         "comment": "starting or marking this osd lost may let us proceed"
>> >> >     },
>> >> >     {
>> >> >         "osd": 40,
>> >> >         "current_lost_at": 0,
>> >> >         "comment": "starting or marking this osd lost may let us proceed"
>> >> >     }
>> >> > ]
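>> >> >
>> >> > For reference, the action that comment refers to would look roughly
>> >> > like this (a sketch; it can cause data loss and was deliberately not
>> >> > run):
>> >> >
>> >> > # ceph osd lost 34 --yes-i-really-mean-it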
>> >> >
>> >> >
>> >> > 2-2) 1 pg (2.39) was exported and imported successfully, but after
>> >> > starting the osd service (the pg was imported to it), RAM and CPU
>> >> > consumption on the ceph-osd node increase and the node crashes until
>> >> > the osd service fails. Other osds on that ceph-osd node become "down".
>> >> > The pg status is “unknown”. I cannot use "force-create-pg" because it
>> >> > would lose data. pg 2.39 is 19G in size.
>> >> >
>> >> > # ceph pg map 2.39
>> >> >
>> >> > osdmap e40347 pg 2.39 (2.39) -> up [32,37] acting [32,37]
>> >> >
>> >> > # ceph pg 2.39 query
>> >> >
>> >> > Error ENOENT: i don't have pgid 2.39
>> >> >
>> >> >
>> >> > pg 2.39 info on the failed osd:
>> >> >
>> >> > # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 --op info --pgid 2.39
>> >> >
>> >> > {
>> >> >     "pgid": "2.39",
>> >> >     "last_update": "35344'6456084",
>> >> >     "last_complete": "35344'6456084",
>> >> >     "log_tail": "35344'6453182",
>> >> >     "last_user_version": 10595821,
>> >> >     "last_backfill": "MAX",
>> >> >     "purged_snaps": [],
>> >> >     "history": {
>> >> >         "epoch_created": 146,
>> >> >         "epoch_pool_created": 79,
>> >> >         "last_epoch_started": 25208,
>> >> >         "last_interval_started": 25207,
>> >> >         "last_epoch_clean": 25208,
>> >> >         "last_interval_clean": 25207,
>> >> >         "last_epoch_split": 370,
>> >> >         "last_epoch_marked_full": 0,
>> >> >         "same_up_since": 8347,
>> >> >         "same_interval_since": 25207,
>> >> >         "same_primary_since": 8321,
>> >> >         "last_scrub": "35328'6440139",
>> >> >         "last_scrub_stamp": "2020-08-19T12:00:59.377593+0430",
>> >> >         "last_deep_scrub": "35261'6031075",
>> >> >         "last_deep_scrub_stamp": "2020-08-17T01:59:26.606037+0430",
>> >> >         "last_clean_scrub_stamp": "2020-08-19T12:00:59.377593+0430",
>> >> >         "prior_readable_until_ub": 0
>> >> >     },
>> >> >     "stats": {
>> >> >         "version": "35344'6456082",
>> >> >         "reported_seq": "11733156",
>> >> >         "reported_epoch": "35344",
>> >> >         "state": "active+clean",
>> >> >         "last_fresh": "2020-08-19T14:16:18.587435+0430",
>> >> >         "last_change": "2020-08-19T12:00:59.377747+0430",
>> >> >         "last_active": "2020-08-19T14:16:18.587435+0430",
>> >> >         "last_peered": "2020-08-19T14:16:18.587435+0430",
>> >> >         "last_clean": "2020-08-19T14:16:18.587435+0430",
>> >> >         "last_became_active": "2020-08-06T00:23:51.016769+0430",
>> >> >         "last_became_peered": "2020-08-06T00:23:51.016769+0430",
>> >> >         "last_unstale": "2020-08-19T14:16:18.587435+0430",
>> >> >         "last_undegraded": "2020-08-19T14:16:18.587435+0430",
>> >> >         "last_fullsized": "2020-08-19T14:16:18.587435+0430",
>> >> >         "mapping_epoch": 8347,
>> >> >         "log_start": "35344'6453182",
>> >> >         "ondisk_log_start": "35344'6453182",
>> >> >         "created": 146,
>> >> >         "last_epoch_clean": 25208,
>> >> >         "parent": "0.0",
>> >> >         "parent_split_bits": 7,
>> >> >         "last_scrub": "35328'6440139",
>> >> >         "last_scrub_stamp": "2020-08-19T12:00:59.377593+0430",
>> >> >         "last_deep_scrub": "35261'6031075",
>> >> >         "last_deep_scrub_stamp": "2020-08-17T01:59:26.606037+0430",
>> >> >         "last_clean_scrub_stamp": "2020-08-19T12:00:59.377593+0430",
>> >> >         "log_size": 2900,
>> >> >         "ondisk_log_size": 2900,
>> >> >         "stats_invalid": false,
>> >> >         "dirty_stats_invalid": false,
>> >> >         "omap_stats_invalid": false,
>> >> >         "hitset_stats_invalid": false,
>> >> >         "hitset_bytes_stats_invalid": false,
>> >> >         "pin_stats_invalid": false,
>> >> >         "manifest_stats_invalid": false,
>> >> >         "snaptrimq_len": 0,
>> >> >         "stat_sum": {
>> >> >             "num_bytes": 19749578960,
>> >> >             "num_objects": 2442,
>> >> >             "num_object_clones": 20,
>> >> >             "num_object_copies": 7326,
>> >> >             "num_objects_missing_on_primary": 0,
>> >> >             "num_objects_missing": 0,
>> >> >             "num_objects_degraded": 0,
>> >> >             "num_objects_misplaced": 0,
>> >> >             "num_objects_unfound": 0,
>> >> >             "num_objects_dirty": 2442,
>> >> >             "num_whiteouts": 0,
>> >> >             "num_read": 16120686,
>> >> >             "num_read_kb": 82264126,
>> >> >             "num_write": 19731882,
>> >> >             "num_write_kb": 379030181,
>> >> >             "num_scrub_errors": 0,
>> >> >             "num_shallow_scrub_errors": 0,
>> >> >             "num_deep_scrub_errors": 0,
>> >> >             "num_objects_recovered": 2861,
>> >> >             "num_bytes_recovered": 21673259070,
>> >> >             "num_keys_recovered": 32,
>> >> >             "num_objects_omap": 2,
>> >> >             "num_objects_hit_set_archive": 0,
>> >> >             "num_bytes_hit_set_archive": 0,
>> >> >             "num_flush": 0,
>> >> >             "num_flush_kb": 0,
>> >> >             "num_evict": 0,
>> >> >             "num_evict_kb": 0,
>> >> >             "num_promote": 0,
>> >> >             "num_flush_mode_high": 0,
>> >> >             "num_flush_mode_low": 0,
>> >> >             "num_evict_mode_some": 0,
>> >> >             "num_evict_mode_full": 0,
>> >> >             "num_objects_pinned": 0,
>> >> >             "num_legacy_snapsets": 0,
>> >> >             "num_large_omap_objects": 0,
>> >> >             "num_objects_manifest": 0,
>> >> >             "num_omap_bytes": 152,
>> >> >             "num_omap_keys": 16,
>> >> >             "num_objects_repaired": 0
>> >> >         },
>> >> >         "up": [
>> >> >             40,
>> >> >             35,
>> >> >             34
>> >> >         ],
>> >> >         "acting": [
>> >> >             40,
>> >> >             35,
>> >> >             34
>> >> >         ],
>> >> >         "avail_no_missing": [],
>> >> >         "object_location_counts": [],
>> >> >         "blocked_by": [],
>> >> >         "up_primary": 40,
>> >> >         "acting_primary": 40,
>> >> >         "purged_snaps": []
>> >> >     },
>> >> >     "empty": 0,
>> >> >     "dne": 0,
>> >> >     "incomplete": 0,
>> >> >     "last_epoch_started": 25208,
>> >> >     "hit_set_history": {
>> >> >         "current_last_update": "0'0",
>> >> >         "history": []
>> >> >     }
>> >> > }
>> >> >
>> >> >
>> >> > pg 2.39 info on the osd it was imported to:
>> >> >
>> >> > # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-37 --op info --pgid 2.39
>> >> >
>> >> > PG '2.39' not found
>> >> >
>> >> >
>> >> > 2-3) 1 pg (2.79) is lost! This pg is not found on any of the three
>> >> > failed osds (osd.34, osd.35, osd.40)! Its status is “unknown”.
>> >> > Exporting pg 2.79 fails with "PG '2.79' not found".
>> >> >
>> >> >
>> >> >
>> >> > # ceph pg map 2.79
>> >> >
>> >> > Error ENOENT: i don't have pgid 2.79
>> >> >
>> >> > # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 --op info --pgid 2.79
>> >> >
>> >> > PG '2.79' not found
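>> >> >
>> >> > To confirm the pg really is not present on an osd, listing every pg
>> >> > on that osd with the same tool may help (a sketch; the osd must be
>> >> > stopped):
>> >> >
>> >> > # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 --op list-pgs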
>> >> >
>> >> >
>> >> > 3- Using https://gitlab.lbader.de/kryptur/ceph-recovery/tree/master,
>> >> > but it does not work for recent ceph versions and was only tested on
>> >> > the “hammer” release.
>> >> >
>> >> > 4- Using
>> >> > https://ceph.io/planet/recovering-from-a-complete-node-failure/
>> >> > but in the lvm scenario I could not mount the failed osd's lv to a new
>> >> > /var/lib/ceph/osd/ceph-x, and could not prepare and activate a new osd
>> >> > on the failed osd's disk.
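>> >> >
>> >> > For lvm-based osds, activation normally goes through ceph-volume
>> >> > rather than a manual mount; a sketch (the osd id and fsid are
>> >> > placeholders taken from the "ceph-volume lvm list" output):
>> >> >
>> >> > # ceph-volume lvm list
>> >> > # ceph-volume lvm activate <osd-id> <osd-fsid>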
>> >> >
>> >> > 5- Setting min_size=1 on the pool the down pgs belong to and
>> >> > restarting the osds the pgs were imported to, but no change.
>> >> >
>> >> > 6- Setting min_size=1 on the pool pg 2.39 belongs to and restarting
>> >> > the osds the pg was imported to, but no change.
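>> >> >
>> >> > The min_size change, roughly (a sketch; the pool name is a
>> >> > placeholder):
>> >> >
>> >> > # ceph osd pool set <pool> min_size 1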
>> >> >
>> >> > 7- Repairing the failed osds using ceph-objectstore-tool, marking
>> >> > them “in” and starting them, but no change.
>> >> >
>> >> > # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-x --op repair
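>> >> >
>> >> > Marking an osd in and starting it looked roughly like this (osd.34 as
>> >> > an example id; non-containerized systemd unit assumed):
>> >> >
>> >> > # ceph osd in 34
>> >> > # systemctl start ceph-osd@34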
>> >> >
>> >> >
>> >> > 8- Repairing the 2 unknown pgs, but no change.
>> >> >
>> >> > # ceph pg repair 2.39
>> >> >
>> >> > # ceph pg repair 2.79
>> >> >
>> >> > 9- Forcing recovery of the 2 unknown pgs, but no change.
>> >> >
>> >> > # ceph pg force-recovery 2.39
>> >> >
>> >> > # ceph pg force-recovery 2.79
>> >> >
>> >> > 10- Checked the PID limit on the ceph-osd nodes, because the osd
>> >> > services failed to start.
>> >> >
>> >> > kernel.pid_max = 4194304
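>> >> >
>> >> > Checked roughly like this (a sketch):
>> >> >
>> >> > # sysctl kernel.pid_max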
>> >> >
>> >> > 11- Raised osd_op_thread_suicide_timeout to 900, but no change.
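>> >> >
>> >> > That change, as a sketch (applies to all osds):
>> >> >
>> >> > # ceph config set osd osd_op_thread_suicide_timeout 900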
>> >> > _______________________________________________
>> >> > ceph-users mailing list -- ceph-users(a)ceph.io
>> >> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>> >>
>> >>