Hi Dan,
it went unnoticed and is in all log files, including the rotated ones. I also wondered about the difference in the number of auth keys and looked at it. However, we have only 23 auth keys (it's a small test cluster). No idea what the 77/78 mean. Maybe they include some history?
I went ahead and rebuilt the mon store before I got your e-mail, so no more debugging is possible unless some log info might be useful.
I'm more wondering why this is not flagged as a health issue. Is it harmless? What if things degrade even more over time?
In older versions (well, luminous) it seems that it was flagged as an error. It would also be nice to have a command like "ceph mon repair" or "ceph mon resync" instead of having to do a complete manual daemon rebuild.
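For the record, the manual rebuild I did was essentially the standard remove-and-re-add procedure for a single mon (non-containerized deployment; the hostname and paths below are just examples, double-check against the add/remove monitor docs before copying):

systemctl stop ceph-mon@tceph-01
ceph mon remove tceph-01        # run from a host that is still in quorum
mv /var/lib/ceph/mon/ceph-tceph-01 /var/lib/ceph/mon/ceph-tceph-01.bak
ceph auth get mon. -o /tmp/mon.keyring
ceph mon getmap -o /tmp/monmap
ceph-mon -i tceph-01 --mkfs --monmap /tmp/monmap --keyring /tmp/mon.keyring
chown -R ceph:ceph /var/lib/ceph/mon/ceph-tceph-01
systemctl start ceph-mon@tceph-01

Having that wrapped in a single built-in command would be much nicer.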
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: Dan van der Ster <dvanders(a)gmail.com>
Sent: 03 January 2023 19:41
To: Frank Schilder
Cc: Eugen Block; ceph-users(a)ceph.io
Subject: Re: [ceph-users] Re: mon scrub error (scrub mismatch)
Hi Frank,
Can you work backwards in the logs to when this first appeared?
The scrub error is showing that mon.0 has 78 auth keys while the other
two have 77. So you'd have to query the auth keys of each mon to see if
you get a different response each time (e.g. ceph auth list), and
compare with what you expect.
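A rough way to do that from the client side, since plain queries can land
on a different mon each time, is to checksum a handful of repeated runs
(just a sketch):

for i in $(seq 1 10); do ceph auth ls | md5sum; done

If two different checksums show up, dump both variants to files and diff
them against the keys you expect to exist.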
Cheers, Dan
On Tue, Jan 3, 2023 at 9:29 AM Frank Schilder <frans(a)dtu.dk> wrote:
>
> Hi Eugen,
>
> thanks for your answer. All our mons use rocksdb.
>
> I found some old threads, but they never really explained anything. What irritates me is that this is a silent corruption. If you don't read the logs every day you will not see it; ceph status reports HEALTH_OK. That's also why I'm wondering if this is a real issue or not.
>
> It would be great if someone could shed light on (1) how serious this is, (2) why it doesn't trigger a health warning/error and (3) why the affected mon doesn't sync back from the majority right away.
>
> Thanks and best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Eugen Block <eblock(a)nde.ag>
> Sent: 03 January 2023 15:04:34
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] Re: mon scrub error (scrub mismatch)
>
> Hi Frank,
>
> I had this a few years back and ended up recreating the MON with the
> scrub mismatch, so in your case it probably would be mon.0. To test if
> the problem still exists you can trigger a mon scrub manually:
>
> ceph mon scrub
>
> Are all MONs on rocksdb back end in this cluster? I didn't check back
> then if this was the case in our cluster, so I'm just wondering if
> that could be an explanation.
>
> Regards,
> Eugen
>
> Zitat von Frank Schilder <frans(a)dtu.dk>:
>
> > Hi all,
> >
> > we have these messages in our logs daily:
> >
> > 1/3/23 12:20:00 PM [INF] overall HEALTH_OK
> > 1/3/23 12:19:46 PM [ERR] mon.2 ScrubResult(keys {auth=77,config=2,health=11,logm=10} crc {auth=688385498,config=4279003239,health=3522308637,logm=132403602})
> > 1/3/23 12:19:46 PM [ERR] mon.0 ScrubResult(keys {auth=78,config=2,health=11,logm=9} crc {auth=325876668,config=4279003239,health=3522308637,logm=1083913445})
> > 1/3/23 12:19:46 PM [ERR] scrub mismatch
> > 1/3/23 12:19:46 PM [ERR] mon.1 ScrubResult(keys {auth=77,config=2,health=11,logm=10} crc {auth=688385498,config=4279003239,health=3522308637,logm=132403602})
> > 1/3/23 12:19:46 PM [ERR] mon.0 ScrubResult(keys {auth=78,config=2,health=11,logm=9} crc {auth=325876668,config=4279003239,health=3522308637,logm=1083913445})
> > 1/3/23 12:19:46 PM [ERR] scrub mismatch
> > 1/3/23 12:17:04 PM [INF] Cluster is now healthy
> > 1/3/23 12:17:04 PM [INF] Health check cleared: MON_CLOCK_SKEW (was: clock skew detected on mon.tceph-02)
> >
> > Cluster is health OK:
> >
> > # ceph status
> >   cluster:
> >     id:     bf1f51f5-b381-4cf7-b3db-88d044c1960c
> >     health: HEALTH_OK
> >
> >   services:
> >     mon: 3 daemons, quorum tceph-01,tceph-02,tceph-03 (age 3M)
> >     mgr: tceph-01(active, since 8w), standbys: tceph-03, tceph-02
> >     mds: fs:1 {0=tceph-02=up:active} 2 up:standby
> >     osd: 9 osds: 9 up (since 3M), 9 in
> >
> >   task status:
> >
> >   data:
> >     pools:   4 pools, 321 pgs
> >     objects: 9.94M objects, 336 GiB
> >     usage:   1.6 TiB used, 830 GiB / 2.4 TiB avail
> >     pgs:     321 active+clean
> >
> > Unfortunately, google wasn't of too much help. Is this scrub error
> > something to worry about?
> >
> > Thanks and best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> > _______________________________________________
> > ceph-users mailing list -- ceph-users(a)ceph.io
> > To unsubscribe send an email to ceph-users-leave(a)ceph.io
>
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
Hi Felix,
On Thu, Dec 15, 2022 at 8:03 PM Stolte, Felix <f.stolte(a)fz-juelich.de> wrote:
>
> Hi Patrick,
>
> we used your script to repair the damaged objects on the weekend and it went smoothly. Thanks for your support.
>
> We adjusted your script to scan for damaged files on a daily basis; the runtime is about 6h. Until Thursday last week, we had exactly the same 17 files. On Thursday at 13:05 a snapshot was created and our active MDS crashed once at that moment:
>
> 2022-12-08T13:05:48.919+0100 7f440afec700 -1 /build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time 2022-12-08T13:05:48.921223+0100
> /build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE)
This crash is the same as detailed in
https://tracker.ceph.com/issues/49132. Fix is under backport to p/q
releases.
>
> 12 minutes later the unlink_local error crashes appeared again, this time with a new file. During debugging we noticed an MTU mismatch between the MDS (1500) and the client (9000) with a CephFS kernel mount. The client is also creating the snapshots via mkdir in the .snap directory.
>
> We disabled snapshot creation for now, but we really need this feature. I uploaded the MDS logs of the first crash along with the information above to https://tracker.ceph.com/issues/38452
>
> I would greatly appreciate it if you could answer the following question:
>
> Is the bug related to our MTU mismatch? We also fixed the MTU issue over the weekend by going back to 1500 on all nodes in the Ceph public network.
>
> If you need a debug level 20 log of the ScatterLock for further analysis, I could schedule snapshots at the end of our workdays and increase the debug level 5 minutes around snapshot creation.
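>
> Roughly what I have in mind (the mount path is just an example):
>
> ceph config set mds debug_mds 20
> mkdir /mnt/cephfs/somedir/.snap/debug-$(date +%Y%m%d)   # snapshot created on the client
> ceph config rm mds debug_mds                            # drop back to the default afterwards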
>
> Regards
> Felix
> ---------------------------------------------------------------------------------------------
> ---------------------------------------------------------------------------------------------
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
> ---------------------------------------------------------------------------------------------
> ---------------------------------------------------------------------------------------------
>
> Am 02.12.2022 um 20:08 schrieb Patrick Donnelly <pdonnell(a)redhat.com>:
>
> On Thu, Dec 1, 2022 at 5:08 PM Stolte, Felix <f.stolte(a)fz-juelich.de> wrote:
>
> The script has been running for ~2 hours and, according to the line count in the memo file, we are at 40% (CephFS is still online).
>
> We had to modify the script, putting a try/except around the for loop in lines 78 to 87. For some reason there are some objects (186 at this moment) which throw a UnicodeDecodeError exception during the iteration:
>
> <rados.OmapIterator object at 0x7f9606f8bcf8>
> Traceback (most recent call last):
>   File "first-damage.py", line 138, in <module>
>     traverse(f, ioctx)
>   File "first-damage.py", line 79, in traverse
>     for (dnk, val) in it:
>   File "rados.pyx", line 1382, in rados.OmapIterator.__next__
>   File "rados.pyx", line 311, in rados.decode_cstr
> UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 10-11: invalid continuation byte
>
> I don't know if this is because the filesystem is still running. We saved the object names in a separate file and I will investigate further tomorrow. We should be able to modify the script to only check the objects which threw the exception instead of searching through the whole pool again.
>
> That shouldn't be caused by the fs running. It may be that you have some
> file names which contain invalid unicode characters?
>
> Regarding the mds logfiles with debug 20:
> We cannot run this debug level for longer than one hour since the logfile growth is too high for the local storage on the MDS servers where the logs are stored (we don't have central logging yet).
>
> Okay.
>
> But if you are just interested in the time frame around the crash, I could set the debug level to 20, trigger the crash on the weekend and send you the logs.
>
> The crash is unlikely to point to what causes the corruption. I was
> hoping we could locate an instance of damage while the MDS is running.
>
> Regards Felix
>
>
> ---------------------------------------------------------------------------------------------
> ---------------------------------------------------------------------------------------------
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
> ---------------------------------------------------------------------------------------------
> ---------------------------------------------------------------------------------------------
>
> Am 01.12.2022 um 20:51 schrieb Patrick Donnelly <pdonnell(a)redhat.com>:
>
> On Thu, Dec 1, 2022 at 3:55 AM Stolte, Felix <f.stolte(a)fz-juelich.de> wrote:
>
>
> I set debug_mds=20 in ceph.conf and applied it on the running daemon via "ceph daemon mds.mon-e2-1 config set debug_mds 20". I have to check with my superiors if I am allowed to provide you the logs though.
>
>
> Suggest using `ceph config set` instead of ceph.conf. It's much easier.
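>
> For example, something like this should achieve the same as the ceph.conf edit (daemon name taken from above):
>
> ceph config set mds.mon-e2-1 debug_mds 20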
>
> Regarding the tool:
> <pool> is referring to the cephfs_metadata pool? (just want to be sure)
>
>
> Yes.
>
> How long are the runs going to take? We have 15M objects in our metadata pool and 330M in the data pools.
>
>
> Not sure. You can monitor the number of lines generated on the memo
> file to get an idea of objects/s.
>
> You can speed-test the tool without bringing the file system down by
> **not** using `--remove`.
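>
> A rough way to watch the rate (the memo path is whatever you pointed the script at):
>
> watch -n 60 'wc -l /path/to/memo'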
>
> Regarding the root cause:
> As far as I can tell, all damaged inodes have only been accessed via two Samba servers running with CTDB. We are also running NFS gateways on different systems, but there hasn't been a damaged inode (yet).
>
> Samba servers are running Ubuntu 18.04 with kernel 5.4.0-132 and Samba version 4.7.6.
> CephFS is accessed via kernel mount and
>
> ceph version is 16.2.10 across all nodes
> we have one filesystem and two data pools and are using CephFS snapshots
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
>
>
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Principal Software Engineer
> Red Hat, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io
--
Cheers,
Venky
Resending this mail; it seems ceph-users(a)ceph.io was down for the last
few days.
I see many users recently reporting that they have been struggling with
this Onode::put race condition issue [1] on both the latest Octopus and
Pacific.
Igor opened a PR [2] to address this issue; I've been reviewing it for a
while and it looks good to me.
I'm hoping this could get some priority from the community.
For those who have been hitting this issue, I would like to share a
workaround that could very likely unblock you:
During the investigation of this issue, I found this race condition always
happens after the bluestore onode cache size becomes 0.
Setting debug_bluestore = 1/30 will allow you to see the cache size after
the crash:
---
2022-10-25T00:47:26.562+0000 7f424f78e700 30
bluestore.MempoolThread(0x564a9dae2a68)
_resize_shards max_shard_onodes: 0 max_shard_buffer: 8388608
---
This is apparently wrong, as it means the bluestore metadata cache is
basically disabled, but it goes a long way towards explaining why we hit
the race condition so easily -- an onode will be trimmed right away after
it is unpinned.
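If you want that logging in place before the next crash, it can also be
set at runtime, e.g.:
---
ceph config set osd debug_bluestore 1/30
# or, without persisting it, only on the daemons you are watching:
ceph tell 'osd.*' config set debug_bluestore 1/30
---
Either form should be enough to get the _resize_shards lines into the log
that is dumped on a crash.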
Continuing the investigation, it turned out the culprit for the 0-sized
cache is a leak in the bluestore_cache_other mempool.
Please refer to the bug tracker [3], which has the details of the leak
issue; it was already fixed by [4], and the next Pacific point release
will have it.
But it was never backported to Octopus, so if you are hitting the same issue:
For Octopus, you can manually backport this patch to fix the leak and
prevent the race condition from happening.
For Pacific, you can wait for 16.2.11 (or manually backport the fix as
well if you can't wait).
By the way, I'm backporting the fix to Ubuntu Octopus and Pacific through
this SRU [5], so it will land in Ubuntu's packages soon.
[1] https://tracker.ceph.com/issues/56382
[2] https://github.com/ceph/ceph/pull/47702
[3] https://tracker.ceph.com/issues/56424
[4] https://github.com/ceph/ceph/pull/46911
[5] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1996010
Cheers,
Dongdong
Hi,
I found that some mailing list archive links from my notes throw "Page not
found" errors, e.g.
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/J4U24YRJEJ…
Looking around in the archive web interface, it appears that only some of
the most recent threads can be found; everything else says "no email
threads could be found for this month".
Could somebody please look into this?
Regards
Matthias Ferdinand
Hello. I really screwed up my Ceph cluster. Hoping to get data off it
so I can rebuild it.
In summary, too many changes too quickly caused the cluster to develop
incomplete PGs. Some PGs were reporting that OSDs needed to be probed.
I've re-created those OSD IDs (empty), however this didn't clear the
incompletes. The incomplete PGs are part of EC pools. Running 17.2.5.
This is the overall state:
  cluster:
    id:     49057622-69fc-11ed-b46e-d5acdedaae33
    health: HEALTH_WARN
            Failed to apply 1 service(s): osd.dashboard-admin-1669078094056
            1 hosts fail cephadm check
            cephadm background work is paused
            Reduced data availability: 28 pgs inactive, 28 pgs incomplete
            Degraded data redundancy: 55 pgs undersized
            2 slow ops, oldest one blocked for 4449 sec, daemons [osd.25,osd.50,osd.51] have slow ops.
These are the incomplete PGs that HAVE DATA (objects > 0) [via ceph pg ls incomplete]:
PG     OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES        OMAP_BYTES*  OMAP_KEYS*  LOG   STATE                SINCE  VERSION     REPORTED       UP                      ACTING                  SCRUB_STAMP                      DEEP_SCRUB_STAMP                 LAST_SCRUB_DURATION  SCRUB_SCHEDULING
2.35   23199    0         0          0        95980273664  0            0           2477  incomplete           10s    2104'46277  28260:686871   [44,4,37,3,40,32]p44    [44,4,37,3,40,32]p44    2023-01-03T03:54:47.821280+0000  2022-12-29T18:53:09.287203+0000  14                   queued for deep scrub
2.53   22821    0         0          0        94401175552  0            0           2745  remapped+incomplete  10s    2104'45845  28260:565267   [60,48,52,65,67,7]p60   [60]p60                 2023-01-03T10:18:13.388383+0000  2023-01-03T10:18:13.388383+0000  408                  queued for scrub
2.9f   22858    0         0          0        94555983872  0            0           2736  remapped+incomplete  10s    2104'45636  28260:759872   [56,59,3,57,5,32]p56    [56]p56                 2023-01-03T10:55:49.848693+0000  2023-01-03T10:55:49.848693+0000  376                  queued for scrub
2.be   22870    0         0          0        94429110272  0            0           2661  remapped+incomplete  10s    2104'45561  28260:813759   [41,31,37,9,7,69]p41    [41]p41                 2023-01-03T14:02:15.790077+0000  2023-01-03T14:02:15.790077+0000  360                  queued for scrub
2.e4   22953    0         0          0        94912278528  0            0           2648  remapped+incomplete  20m    2104'46048  28259:732896   [37,46,33,4,48,49]p37   [37]p37                 2023-01-02T18:38:46.268723+0000  2022-12-29T18:05:47.431468+0000  18                   queued for deep scrub
17.78  20169    0         0          0        84517834400  0            0           2198  remapped+incomplete  10s    3735'53405  28260:1243673  [4,37,2,36,66,0]p4      [41]p41                 2023-01-03T14:21:41.563424+0000  2023-01-03T14:21:41.563424+0000  348                  queued for scrub
17.d8  20328    0         0          0        85196053130  0            0           1852  remapped+incomplete  10s    3735'54458  28260:1309564  [38,65,61,37,58,39]p38  [53]p53                 2023-01-02T18:32:35.371071+0000  2022-12-28T19:08:29.492244+0000  21                   queued for deep scrub
At present I'm unable to reliably access my data due to the incomplete PGs
above. I'll post whatever outputs are requested (not posting them now as
they can be rather verbose). Is there hope?
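For a start I can run something like this on one of them and post the
recovery_state section, which should show which OSDs each PG still wants
to probe:

ceph pg 2.53 query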
Hi,
The Quincy documentation shows that we could set the Prometheus
retention_time within a service specification:
https://docs.ceph.com/en/quincy/cephadm/services/monitoring/#setting-up-pro…
When trying this, "ceph orch apply" only shows:
Error EINVAL: ServiceSpec: __init__() got an unexpected keyword argument
'retention_time'
It looks like release 17.2.5 does not contain this code yet.
Why is the content of the documentation already online when
https://github.com/ceph/ceph/pull/47943 has not been released yet?
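For reference, the spec I tried follows the documented example, roughly
(the retention value is just an example):

service_type: prometheus
placement:
  count: 1
spec:
  retention_time: "15d"

applied with "ceph orch apply -i prometheus.yaml".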
Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de
Tel: 030-405051-43
Fax: 030-405051-19
Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin
On Wednesday, January 4, 2023 10:35:56 AM EST John Zachary Dover wrote:
> Do you use the header navigation bar on docs.ceph.com? See the attached
> file (sticky_header.png) if you are unsure of what "header navigation bar"
> means. In the attached file, the header navigation bar is indicated by
> means of two large, ugly, red-and-green arrows.
>
> *Cards on the Table*
> The navigation bar is the kind of thing that is sometimes referred to as a
> "sticky header", and it can get in the way of linked-to sections. I would
> like to remove this header bar. If there is community support for the
> header bar, though, I won't remove it.
>
> *What is Zac Complaining About?*
> Follow this procedure to see the behavior that has provoked my complaint:
>
> 1. Go to https://docs.ceph.com/en/quincy/glossary/
> 2. Scroll down to the "Ceph Cluster Map" entry.
> 3. Click the "Cluster Map" link in the line that reads "See Cluster Map".
> 4. Notice that the header navigation bar obscures the headword "Cluster
> Map".
>
> If you have any opinion at all on this matter, voice it. Please.
>
FWIW I am not able to reproduce the problem you are describing. In all cases
the thin blue-green bar appeared above the term with the selected anchor
link.
I tried Firefox (108, Linux), Chromium (107, Linux) and for giggles Firefox on
Android. In all cases things looked fine to me and the selected term was not
hidden by that nav bar. I share because I was surprised by the result given
that others on the list seem to see the problem. But I also don't see what I
would describe as "two large, ugly, red-and-green arrows." Perhaps the page
is rendering differently for some people and we don't hit the issue in that
case?
PS. I also didn't see the png file in question. Perhaps this list strips
attachments?