After confirming that the corruption was limited to a single object, we deleted the object
(first via radosgw-admin, and then via a 'rados rm'), and restarted the new OSD in the set.
The backfill has continued past the point of the original crash, so things are looking
promising.
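For the record, the cleanup amounted to roughly the following (the bucket, key and pool names below are placeholders, not the real values; 2449 is the new 'up' OSD from the earlier logs):

```shell
# 1) Delete the object through RGW first, so the bucket index stays consistent
#    (bucket and key are placeholders):
radosgw-admin object rm --bucket=somebucket --object='some/key.nxspe'

# 2) The leftover multipart piece then went via rados directly (pool name and
#    object name are placeholders; the real name is the long __multipart_ one
#    from the OSD logs):
rados -p default.rgw.buckets.data rm 'the__multipart_object_name'

# 3) Bring the new OSD back so the backfill can resume:
systemctl start ceph-osd@2449
```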
I'm still concerned about how we ended up in this state. Clearly there was a very sad
disk involved, but it's worrying that the object was corrupted across all OSDs in
the set, and also that the user had no idea the file was corrupted (200 HTTP
codes were returned).
I'm going to dig further into the logs of the original primary to see if anything
looks suspicious around the time of the original write, but for now I'm
glad to have a happy cluster in time for the weekend!
Cheers,
Tom
-----Original Message-----
From: Byrne, Thomas (STFC,RAL,SC) <tom.byrne(a)stfc.ac.uk>
Sent: 10 December 2020 23:34
To: 'ceph-users' <ceph-users(a)ceph.io>
Cc: Dan van der Ster <daniel.vanderster(a)cern.ch>
Subject: [ceph-users] Re: Incomplete PG due to primary OSD crashing during
EC backfill - get_hash_info: Mismatch of total_chunk_size 0
A few more things of note after more poking with the help of Dan vdS.
1) The object that the backfill is crashing on has an mtime of a few minutes
before the original primary died this morning, and a 'rados get' gives an
input/output error. So it looks like a new object that was possibly corrupted by
the dying primary OSD. I can't see any disk I/O errors in any of the PG's OSD
logs when trying the 'get', but I do see this error in most of the OSD logs:
2020-12-10 23:22:31.840 7fc7161e3700 0 osd.4134 pg_epoch: 1162547 pg[11.214s8( v 1162547'714924 (1162114'711864,1162547'714924] local-lis/les=1162304/1162305 n=133402 ec=1069520/992 lis/c 1162304/1125301 les/c/f 1162305/1125302/257760 1162303/1162304/1162301) [2147483647,1708,2099,1346,4309,777,5098,4501,4134,217,4643]p1708(1) r=8 lpr=1162304 pi=[1125301,1162304)/2 luod=0'0 crt=1162547'714924 active mbc={}] get_hash_info: Mismatch of total_chunk_size 0
2020-12-10 23:22:31.840 7fc7161e3700 -1 log_channel(cluster) log [ERR] : Corruption detected: object 11:28447b4a:::962de230-ed6c-44f2-ab02-788c52ea6a82.3210530112.122__multipart_201%2fin5%2fexp_4-05-737%2fprocessed%2fspe%2fsqw_187570.nxspe.2~bgZPo_rC64ZXJWKyTfdn4dIApqLNDPp.22:head is missing hash_info
This error was present in the logs of OSDs 1708, 2099, 1346, 4309, 777, 5098, 4501
and 4134, and absent for 217 and 4643 (possibly because they are unused parity
shards?). Checking on one of the FileStore OSDs that returned the error
message, the underlying file is present and at least the correct size.
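As a sanity check on "the correct size": the shard file size does line up with the EC arithmetic, assuming each shard holds object_size / k bytes for k data chunks (a trivial sketch, not anything clever):

```python
# In our k=8, m=3 EC profile, a full 4 MiB RADOS object should leave
# object_size / k bytes on each shard.
k = 8                   # data chunks in the 8+3 profile
object_size = 4194304   # 'size: 4194304' from the recovery log (4 MiB)

shard_size = object_size // k
print(shard_size)       # 524288, matching 'found on disk, size 524288'
```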
I'm checking all objects in the PG for corruption now. I'm only 25% through
the 133385 objects in the PG, but that object is the only corrupted one I've
seen so far, so hopefully it is an isolated corruption. If so, I can possibly try
deleting the problematic object and seeing if the backfill can continue.
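The check itself is nothing clever, roughly this (pool name is a placeholder, and --pgid needs a reasonably recent rados client):

```shell
#!/bin/bash
# Walk every object in PG 11.214 and flag any that can't be read back.
pool=default.rgw.buckets.data   # placeholder pool name
rados -p "$pool" --pgid 11.214 ls | while read -r obj; do
    if ! rados -p "$pool" get "$obj" /dev/null 2>/dev/null; then
        echo "unreadable: $obj"
    fi
done
```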
2) This PG is on a mix of FileStore and BlueStore OSDs, all 14.2.9. The original
primary that died (1466) was FileStore. BlueStore: 1708, 4309, 5098, 4501,
4134, 4643; FileStore: 2099, 1346, 777, 217.
PG query for reference:
https://pastebin.com/ZUUH2mQ6
Cheers,
Tom
-----Original Message-----
From: Byrne, Thomas (STFC,RAL,SC) <tom.byrne(a)stfc.ac.uk>
Sent: 10 December 2020 18:40
To: 'ceph-users' <ceph-users(a)ceph.io>
Subject: [ceph-users] Incomplete PG due to primary OSD crashing during
EC backfill - get_hash_info: Mismatch of total_chunk_size 0
Hi all,
Got an odd issue that I'm not sure how to solve on our Nautilus 14.2.9
EC cluster.
The primary OSD of an EC 8+3 PG died this morning with a very sad disk
(thousands of pending sectors). After the down out interval a new 'up'
primary was assigned and the backfill started. Twenty minutes later
the acting primary (not the new 'up' primary) started crashing with a
"get_hash_info: Mismatch of total_chunk_size 0" error (see log below).
This crash always happens at the same object, with different acting
primaries and a different new 'up' primary. I can't see anything
in the logs that points to a particular OSD being the issue, so I
suspect there is a corrupted object in the PG that is causing issues,
but I'm not sure how to dig into this further. The PG is currently
active (but degraded), but only whilst nobackfill and noout are set
(and the new OSD is turned off); if the flags are unset the backfill
will eventually crash enough OSDs to render the PG incomplete, which
is not ideal. I would appreciate being able to resolve this so I can
go back to letting Ceph deal with down OSDs itself :)
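For reference, "keeping the PG active" currently amounts to:

```shell
# Pause backfill and stop Ceph marking down OSDs out...
ceph osd set nobackfill
ceph osd set noout
# ...and stop the new 'up' OSD (2449 here) so it doesn't trigger the crash:
systemctl stop ceph-osd@2449

# Unsetting the flags restarts the backfill (and, eventually, the crashes):
ceph osd unset nobackfill
ceph osd unset noout
```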
Does anyone have some pointers on how to dig into or resolve this?
Happy to create a tracker ticket and post more logs if this looks like a bug.
Thanks,
Tom
OSD log with debug_osd=20 (preamble cut from subsequent lines in an
attempt to improve readability...):
2020-12-10 15:14:16.130 7fc0a1575700 10 osd.1708 pg_epoch: 1162259 pg[11.214s1( v 1162255'714638 (1162110'711564,1162255'714638] local-lis/les=1162253/1162254 n=133385 ec=1069520/992 lis/c 1162253/1125301 les/c/f 1162254/1125302/257760 1162252/1162253/1162253) [2449,1708,2099,1346,4309,777,5098,4501,4134,217,4643]/[2147483647,1708,2099,1346,4309,777,5098,4501,4134,217,4643]p1708(1) backfill=[2449(0)] r=1 lpr=1162253 pi=[1125301,1162253)/3 rops=1 crt=1162255'714638 lcod 1162254'714637 mlcod 1162254'714637 active+undersized+degraded+remapped+backfilling mbc={}] run_recovery_op: starting RecoveryOp(hoid=11:28447b4a:::962de230-ed6c-44f2-ab02-788c52ea6a82.3210530112.122__multipart_201%2fin5%2fexp_4-05-737%2fprocessed%2fspe%2fsqw_187570.nxspe.2~bgZPo_rC64ZXJWKyTfdn4dIApqLNDPp.22:head v=1162125'713150 missing_on=2449(0) missing_on_shards=0 recovery_info=ObjectRecoveryInfo(11:28447b4a:::962de230-ed6c-44f2-ab02-788c52ea6a82.3210530112.122__multipart_201%2fin5%2fexp_4-05-737%2fprocessed%2fspe%2fsqw_187570.nxspe.2~bgZPo_rC64ZXJWKyTfdn4dIApqLNDPp.22:head@1162125'713150, size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:{}) recovery_progress=ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:true, error:false) obc refcount=3 state=IDLE waiting_on_pushes= extent_requested=0,0)
continue_recovery_op: continuing RecoveryOp(hoid=11:28447b4a:::962de230-ed6c-44f2-ab02-788c52ea6a82.3210530112.122__multipart_201%2fin5%2fexp_4-05-737%2fprocessed%2fspe%2fsqw_187570.nxspe.2~bgZPo_rC64ZXJWKyTfdn4dIApqLNDPp.22:head v=1162125'713150 missing_on=2449(0) missing_on_shards=0 recovery_info=ObjectRecoveryInfo(11:28447b4a:::962de230-ed6c-44f2-ab02-788c52ea6a82.3210530112.122__multipart_201%2fin5%2fexp_4-05-737%2fprocessed%2fspe%2fsqw_187570.nxspe.2~bgZPo_rC64ZXJWKyTfdn4dIApqLNDPp.22:head@1162125'713150, size: 4194304, copy_subset: [], clone_subset: {}, snapset: 0=[]:{}) recovery_progress=ObjectRecoveryProgress(first, data_recovered_to:0, data_complete:false, omap_recovered_to:, omap_complete:true, error:false) obc refcount=4 state=IDLE waiting_on_pushes= extent_requested=0,0)
get_hash_info: Getting attr on 11:28447b4a:::962de230-ed6c-44f2-ab02-788c52ea6a82.3210530112.122__multipart_201%2fin5%2fexp_4-05-737%2fprocessed%2fspe%2fsqw_187570.nxspe.2~bgZPo_rC64ZXJWKyTfdn4dIApqLNDPp.22:head
get_hash_info: not in cache 11:28447b4a:::962de230-ed6c-44f2-ab02-788c52ea6a82.3210530112.122__multipart_201%2fin5%2fexp_4-05-737%2fprocessed%2fspe%2fsqw_187570.nxspe.2~bgZPo_rC64ZXJWKyTfdn4dIApqLNDPp.22:head
get_hash_info: found on disk, size 524288
get_hash_info: Mismatch of total_chunk_size 0
2020-12-10 15:14:16.136 7fc0a1575700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/osd/ECBackend.cc: In function 'void ECBackend::continue_recovery_op(ECBackend::RecoveryOp&, RecoveryMessages*)' thread 7fc0a1575700 time 2020-12-10 15:14:16.132060
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.9/rpm/el7/BUILD/ceph-14.2.9/src/osd/ECBackend.cc: 585: FAILED ceph_assert(op.hinfo)
ceph version 14.2.9 (581f22da52345dba46ee232b73b990f06029a2a0) nautilus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x55e1569acf7d]
2: (()+0x4cb145) [0x55e1569ad145]
3: (ECBackend::continue_recovery_op(ECBackend::RecoveryOp&, RecoveryMessages*)+0x1764) [0x55e156da5834]
4: (ECBackend::run_recovery_op(PGBackend::RecoveryHandle*, int)+0x65b) [0x55e156da6c6b]
5: (PrimaryLogPG::recover_backfill(unsigned long, ThreadPool::TPHandle&, bool*)+0x1491) [0x55e156c26681]
6: (PrimaryLogPG::start_recovery_ops(unsigned long, ThreadPool::TPHandle&, unsigned long*)+0x114c) [0x55e156c29f7c]
7: (OSD::do_recovery(PG*, unsigned int, unsigned long, ThreadPool::TPHandle&)+0x2ff) [0x55e156a8b32f]
8: (PGRecovery::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x19) [0x55e156d1aa19]
9: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x55e156aa6b4f]
10: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x55e15704b216]
11: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55e15704dd30]
12: (()+0x7ea5) [0x7fc0c2d5dea5]
13: (clone()+0x6d) [0x7fc0c1c208dd]
2020-12-10 15:14:16.143 7fc0a1575700 -1 *** Caught signal (Aborted) **
This email and any attachments are intended solely for the use of the
named recipients. If you are not the intended recipient you must not
use, disclose, copy or distribute this email or any of its attachments
and should notify the sender immediately and delete this email from
your system. UK Research and Innovation (UKRI) has taken every
reasonable precaution to minimise risk of this email or any
attachments containing viruses or malware but the recipient should
carry out its own virus and malware checks before opening the
attachments. UKRI does not accept any liability for any losses or
damages which the recipient may sustain due to presence of any
viruses. Opinions, conclusions or other information in this message
and attachments that are not related directly to UKRI business are solely those of
the author and do not represent the views of UKRI.
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io