Dear All,
I'm trying to recover failed MDS metadata by following the link below but am having trouble. Thanks in advance.
Question 1: How do I scan 2 data pools with scan_extents (cmd 1)? The command didn't work with two pools specified. Should I scan one and then the other?
Question 2: For scan_inodes (cmd 2), should I specify only the first data pool, as the document says? I'm concerned that if the 2nd pool is not scanned, that will cause metadata loss.
My fs name: cephfs, data pools: cephfs_hdd, cephfs_ssd
cmd 1: cephfs-data-scan scan_extents --filesystem cephfs cephfs_hdd cephfs_ssd
cmd 2: cephfs-data-scan scan_inodes --filesystem cephfs cephfs_hdd
cephfs-data-scan scan_extents [<data pool> [<extra data pool> ...]]
cephfs-data-scan scan_inodes [<data pool>]
cephfs-data-scan scan_links
Note, the data pool parameters for ‘scan_extents’, ‘scan_inodes’ and ‘cleanup’ commands are optional, and usually the tool will be able to detect the pools automatically. Still you may override this. The ‘scan_extents’ command needs all data pools to be specified, while ‘scan_inodes’ and ‘cleanup’ commands need only the main data pool.
https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/
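For reference, my reading of the quoted usage applied to the two pools above would be the following (unverified; this is exactly what I'm unsure about):
cephfs-data-scan scan_extents --filesystem cephfs cephfs_hdd cephfs_ssd   # all data pools?
cephfs-data-scan scan_inodes --filesystem cephfs cephfs_hdd               # main data pool only?
cephfs-data-scan scan_links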
--
Best Regards,
Justin Li
IT Support/Systems Administrator
Justin.Li2030(a)Gmail.com
http://www.linkedin.com/in/justinli7
Hi,
lately, we have had some issues with our MDSs (Ceph version 16.2.10
Pacific).
Some of them are related to the MDS being behind on trimming.
I checked the documentation and found the following information (
https://docs.ceph.com/en/pacific/cephfs/health-messages/):
> CephFS maintains a metadata journal that is divided into *log segments*.
The length of journal (in number of segments) is controlled by the setting
mds_log_max_segments, and when the number of segments exceeds that setting
the MDS starts writing back metadata so that it can remove (trim) the
oldest segments. If this writeback is happening too slowly, or a software
bug is preventing trimming, then this health message may appear. The
threshold for this message to appear is controlled by the config option
mds_log_warn_factor, the default is 2.0.
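To see how far the journal actually is from that threshold, I believe one can compare the live segment count with the configured limits via the admin socket (the daemon name below is a placeholder):
```
ceph daemon mds.<name> perf dump mds_log | grep '"seg"'    # current number of log segments
ceph daemon mds.<name> config get mds_log_max_segments     # trim target
ceph daemon mds.<name> config get mds_log_warn_factor      # warning threshold = target * factor
```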
Some resources on the web (https://www.suse.com/support/kb/doc/?id=000019740)
indicated that a solution would be to change the `mds_log_max_segments`.
Which I did:
```
ceph --cluster floki tell mds.* injectargs '--mds_log_max_segments=400000'
```
Of course, the warning disappeared, but I have a feeling that I just hid the problem. Pushing the value to 400'000 when the default is 512 is a lot.
Why is the trimming not taking place? How can I troubleshoot this further?
Best,
Emmanuel
Hi all,
I have an annoying problem with a specific ceph fs client. We have a file server on which we re-export kernel mounts via samba (all mounts with noshare option). On one of these re-exports we have recurring problems. Today I caught it with
2023-05-10T13:39:50.963685+0200 mds.ceph-23 (mds.1) 1761 : cluster [WRN] client.205899841 isn't responding to mclientcaps(revoke), ino 0x20011d3e5cb pending pAsLsXsFscr issued pAsLsXsFscr, sent 61.705410 seconds ago
and I wanted to look up what path the inode 0x20011d3e5cb points to. Unfortunately, the command
ceph tell "mds.*" dump inode 0x20011d3e5cb
crashes an MDS in a way that it restarts itself, but doesn't seem to come back clean (it does not fail over to a stand-by). If I repeat the command above, it crashes the MDS again. Execution on other MDS daemons succeeds, for example:
# ceph tell "mds.ceph-24" dump inode 0x20011d3e5cb
2023-05-10T14:14:37.091+0200 7fa47ffff700 0 client.210149523 ms_handle_reset on v2:192.168.32.88:6800/3216233914
2023-05-10T14:14:37.124+0200 7fa4857fa700 0 client.210374440 ms_handle_reset on v2:192.168.32.88:6800/3216233914
dump inode failed, wrong inode number or the inode is not cached
The caps recall gets the client evicted at some point but it doesn't manage to come back clean. On a single ceph fs mount point I see this
# ls /shares/samba/rit-oil
ls: cannot access '/shares/samba/rit-oil': Stale file handle
All other mount points are fine, just this one acts up. A "mount -o remount /shares/samba/rit-oil" crashed the entire server and I had to do a cold reboot. On reboot I see this message: https://imgur.com/a/bOSLxBb , which only occurs on this one file server (we are running a few of those). Does this point to a more serious problem, like a file system corruption? Should I try an fs scrub on the corresponding path?
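In case I do, my understanding is that a read-only recursive scrub would be started roughly like this (the path inside the file system and the daemon name are placeholders; not tested on this cluster):
ceph tell mds.ceph-24 scrub start /path/inside/cephfs recursive
ceph tell mds.ceph-24 scrub status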
Some info about the system:
The file server's kernel version is quite recent, updated two weeks ago:
$ uname -r
4.18.0-486.el8.x86_64
# cat /etc/redhat-release
CentOS Stream release 8
Our ceph cluster is octopus latest and we use the packages from the octopus el8 repo on this server.
We have several such shares and they all work fine. It is only on one share where we have persistent problems with the mount point hanging or the server freezing and crashing.
After working hours I will try a proper fail of the "broken" MDS to see if I can execute the dump inode command without it crashing everything.
In the meantime, any hints would be appreciated. I see that we have an exceptionally large MDS log for the problematic one; any hint what to look for would be helpful, as it contains a lot from the recovery operations:
# pdsh -w ceph-[08-17,23-24] ls -lh "/var/log/ceph/ceph-mds.ceph-??.log"
ceph-23: -rw-r--r--. 1 ceph ceph 15M May 10 14:28 /var/log/ceph/ceph-mds.ceph-23.log *** huge ***
ceph-24: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 /var/log/ceph/ceph-mds.ceph-24.log
ceph-10: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 /var/log/ceph/ceph-mds.ceph-10.log
ceph-13: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 /var/log/ceph/ceph-mds.ceph-13.log
ceph-08: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 /var/log/ceph/ceph-mds.ceph-08.log
ceph-15: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 /var/log/ceph/ceph-mds.ceph-15.log
ceph-17: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 /var/log/ceph/ceph-mds.ceph-17.log
ceph-14: -rw-r--r--. 1 ceph ceph 16K May 10 14:28 /var/log/ceph/ceph-mds.ceph-14.log
ceph-09: -rw-r--r--. 1 ceph ceph 16K May 10 14:28 /var/log/ceph/ceph-mds.ceph-09.log
ceph-16: -rw-r--r--. 1 ceph ceph 15K May 10 14:28 /var/log/ceph/ceph-mds.ceph-16.log
ceph-11: -rw-r--r--. 1 ceph ceph 14K May 10 14:28 /var/log/ceph/ceph-mds.ceph-11.log
ceph-12: -rw-r--r--. 1 ceph ceph 394 May 10 14:02 /var/log/ceph/ceph-mds.ceph-12.log
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Hello,
I have a Quincy (17.2.6) cluster, looking to create a multi-zone /
multi-region RGW service and have a few questions with respect to published
docs - https://docs.ceph.com/en/quincy/radosgw/multisite/.
In general, I understand the process as:
1. Create a new REALM, ZONEGROUP, ZONE:
radosgw-admin realm create --rgw-realm=my_new_realm [--default]
radosgw-admin zonegroup create --rgw-zonegroup=my_country --endpoints=http://rgw1:80 --rgw-realm=my_new_realm --master --default
radosgw-admin zone create --rgw-zonegroup=my_country --rgw-zone=my-region \
    --master --default \
    --endpoints={http://fqdn}[,{http://fqdn}]
## Question:
If I have multiple RGWs deployed on my cluster, do I specify all of them as individual endpoints, or does specifying one RGW automatically propagate the config to all of them?
2. Create SYSTEM user
radosgw-admin user create --uid="synchronization-user" --display-name="Synchronization User" --system
radosgw-admin zone modify --rgw-zone={zone-name} --access-key={access-key} --secret={secret}
radosgw-admin period update --commit
## Question:
Is the SYSTEM user used only for replication? Will creating a new REALM, ZONEGROUP, ZONE reset any administrative access to management of RGWs through the ceph-dashboard?
3. Remove DEFAULT REALM, ZONEGROUP, ZONE and supporting pools
radosgw-admin zonegroup remove --rgw-zonegroup=default --rgw-zone=default
radosgw-admin period update --commit
radosgw-admin zone delete --rgw-zone=default
radosgw-admin period update --commit
radosgw-admin zonegroup delete --rgw-zonegroup=default
radosgw-admin period update --commit
ceph osd pool rm default.rgw.control default.rgw.control --yes-i-really-really-mean-it
ceph osd pool rm default.rgw.data.root default.rgw.data.root --yes-i-really-really-mean-it
ceph osd pool rm default.rgw.gc default.rgw.gc --yes-i-really-really-mean-it
ceph osd pool rm default.rgw.log default.rgw.log --yes-i-really-really-mean-it
ceph osd pool rm default.rgw.users.uid default.rgw.users.uid --yes-i-really-really-mean-it
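If I understand correctly, these pool deletions also require the monitors to allow pool removal in the first place, e.g.:
ceph config set mon mon_allow_pool_delete true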
4. UPDATING CEPH CONFIG FILE / RGW CONFIG VIA CEPH ORCH
# QUESTION:
Since I’m using ceph orch, would I simply set the rgw_zone property via CLUSTER -> CONFIGURATION on the ceph-dashboard?
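What I have in mind is either an RGW service spec through the orchestrator or a plain config setting; just a sketch, the service name and placement below are placeholders:
ceph orch apply rgw my_rgw_service --realm=my_new_realm --zone=my-region --placement="2 rgw1 rgw2"
# or, alternatively, via the config database:
ceph config set client.rgw rgw_realm my_new_realm
ceph config set client.rgw rgw_zone my-region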
Thank you.
Hi all,
on an NFS re-export of a ceph-fs (kernel client) I observe a very strange error. I'm un-tarring a larger package (1.2G) and after some time I get these errors:
ln: failed to create hard link 'file name': Read-only file system
The strange thing is that this seems only temporary. When I used "ln src dst" for manual testing, the command failed as above. However, after that I tried "ln -v src dst" and this command created the hard link with exactly the same path arguments. During the period when the error occurs, I can't see any FS in read-only mode, neither on the NFS client nor the NFS server. Funny thing is that file creation and writes still work; it's only the hard-link creation that fails.
For details, the set-up is:
file-server: mount ceph-fs at /shares/path, export /shares/path as nfs4 to other server
other server: mount /shares/path as NFS
More precisely, on the file-server:
fstab: MON-IPs:/shares/folder /shares/nfs/folder ceph defaults,noshare,name=NAME,secretfile=sec.file,mds_namespace=FS-NAME,_netdev 0 0
exports: /shares/nfs/folder -no_root_squash,rw,async,mountpoint,no_subtree_check DEST-IP
On the host at DEST-IP:
fstab: FILE-SERVER-IP:/shares/nfs/folder /mnt/folder nfs defaults,_netdev 0 0
Both the file server and the client server are virtual machines. The file server is on CentOS 8 Stream (4.18.0-338.el8.x86_64) and the client machine is on AlmaLinux 8 (4.18.0-425.13.1.el8_7.x86_64).
When I change the NFS export from "async" to "sync" everything works. However, that's a rather bad workaround and not a solution. Although this looks like an NFS issue, I'm afraid it is a problem with hard links and ceph-fs. It looks like a race with scheduling and executing operations on the ceph-fs kernel mount.
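For anyone who wants to try to reproduce it, a loop along these lines on the NFS client (using the /mnt/folder mount from the fstab above; the test directory name is a placeholder) exercises the same create-then-hardlink pattern as tar:
mkdir -p /mnt/folder/linktest && cd /mnt/folder/linktest
for i in $(seq 1 1000); do
  echo test > "src.$i"
  ln "src.$i" "dst.$i" || echo "link failed for src.$i"
done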
Has anyone seen something like that?
Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
Dear Ceph community,
on our way towards getting our cluster to a current Ceph release, we updated all hosts and clients to Nautilus 14.2.22. But despite setting `ceph osd set-require-min-compat-client nautilus`, the release reported by `ceph features` is still "luminous".
Is this supposed to be like this? If not, does anyone have an idea what might be missing to make the features being reported as "nautilus" as well?
```
~ # ceph mon dump
epoch 66
fsid b67bad36-3273-11e3-a2ed-0200000311bf
last_changed 2022-12-12 12:20:39.244333
created 2013-10-11 14:57:32.291514
min_mon_release 14 (nautilus)
0: [v2:172.20.4.10:3300/0,v1:172.20.4.10:6789/0] mon.host1
1: [v2:172.20.4.100:3300/0,v1:172.20.4.100:6789/0] mon.host2
2: [v2:172.20.4.101:3300/0,v1:172.20.4.101:6789/0] mon.host3
dumped monmap epoch 66
~ # ceph features
{
    "mon": [
        {
            "features": "0x3ffddff8ffecffff",
            "release": "luminous",
            "num": 3
        }
    ],
    "osd": [
        {
            "features": "0x3ffddff8ffecffff",
            "release": "luminous",
            "num": 14
        }
    ],
    "client": [
        {
            "features": "0x3ffddff8ffecffff",
            "release": "luminous",
            "num": 137
        }
    ],
    "mgr": [
        {
            "features": "0x3ffddff8ffecffff",
            "release": "luminous",
            "num": 3
        }
    ]
}
```
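For completeness, one way to double-check that the min-compat setting was actually applied (output omitted here):
```
~ # ceph osd dump | grep min_compat_client
```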
All the best
--
Oliver Schmidt · os(a)flyingcircus.io · Systems Engineer
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
Sorry Patrick, the last email was rejected because of the attachment size limit. Here is a link for you to download the log. Thanks.
https://drive.google.com/drive/folders/1bV_X7vyma_-gTfLrPnEV27QzsdmgyK4g?us…
Justin Li
Senior Technical Officer
School of Information Technology
Faculty of Science, Engineering and Built Environment
For ICT Support please see https://www.deakin.edu.au/sebeicthelp
Deakin University
Melbourne Burwood Campus, 221 Burwood Highway, Burwood, VIC 3125
+61 3 9246 8932
Justin.li(a)deakin.edu.au
http://www.deakin.edu.au/
Deakin University CRICOS Provider Code 00113B
Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone.
Deakin University does not warrant that this email and any attachments are error or virus free.
-----Original Message-----
From: Justin Li
Sent: Wednesday, May 24, 2023 8:21 AM
To: Patrick Donnelly <pdonnell(a)redhat.com>
Cc: ceph-users(a)ceph.io
Subject: RE: [ceph-users] [Help appreciated] ceph mds damaged
Hi Patrick,
I attached two logs here. Those two servers are one of the monitors and one of the MDSs. Let me know if you need more logs. Thanks.
Justin Li
Senior Technical Officer
School of Information Technology
Faculty of Science, Engineering and Built Environment
For ICT Support please see https://www.deakin.edu.au/sebeicthelp
Deakin University
Melbourne Burwood Campus, 221 Burwood Highway, Burwood, VIC 3125
+61 3 9246 8932
Justin.li(a)deakin.edu.au
http://www.deakin.edu.au/
Deakin University CRICOS Provider Code 00113B
-----Original Message-----
From: Patrick Donnelly <pdonnell(a)redhat.com>
Sent: Wednesday, May 24, 2023 7:35 AM
To: Justin Li <justin.li(a)deakin.edu.au>
Cc: ceph-users(a)ceph.io
Subject: Re: [ceph-users] [Help appreciated] ceph mds damaged
Hello Justin,
On Tue, May 23, 2023 at 4:55 PM Justin Li <justin.li(a)deakin.edu.au> wrote:
>
> Dear All,
>
> After an unsuccessful upgrade to Pacific, the MDS daemons were offline and could not get back on. I checked the MDS log and found the entries below. See the cluster info below as well. I'd appreciate it if anyone can point me in the right direction. Thanks.
>
>
> MDS log:
>
> 2023-05-24T06:21:36.831+1000 7efe56e7d700 1 mds.0.cache.den(0x600 1005480d3b2) loaded already corrupt dentry: [dentry #0x100/stray0/1005480d3b2 [19ce,head] rep@0,-2.0 NULL (dversion lock) pv=0 v=2154265030 ino=(nil) state=0 0x556433addb80]
>
> -5> 2023-05-24T06:21:36.831+1000 7efe56e7d700 -1 mds.0.damage notify_dentry Damage to dentries in fragment * of ino 0x600 is fatal because it is a system directory for this rank
>
> -4> 2023-05-24T06:21:36.831+1000 7efe56e7d700 5 mds.beacon.posco set_want_state: up:active -> down:damaged
>
> -3> 2023-05-24T06:21:36.831+1000 7efe56e7d700 5 mds.beacon.posco Sending beacon down:damaged seq 5339
>
> -2> 2023-05-24T06:21:36.831+1000 7efe56e7d700 10 monclient: _send_mon_message to mon.ceph-3 at v2:10.120.0.146:3300/0
>
> -1> 2023-05-24T06:21:37.659+1000 7efe60690700 5 mds.beacon.posco received beacon reply down:damaged seq 5339 rtt 0.827966
>
> 0> 2023-05-24T06:21:37.659+1000 7efe56e7d700 1 mds.posco respawn!
>
>
> Cluster info:
> root@ceph-1:~# ceph -s
> cluster:
> id: e2b93a76-2f97-4b34-8670-727d6ac72a64
> health: HEALTH_ERR
> 1 filesystem is degraded
> 1 filesystem is offline
> 1 mds daemon damaged
>
> services:
> mon: 3 daemons, quorum ceph-1,ceph-2,ceph-3 (age 26h)
> mgr: ceph-3(active, since 15h), standbys: ceph-1, ceph-2
> mds: 0/1 daemons up, 3 standby
> osd: 135 osds: 133 up (since 10h), 133 in (since 2w)
>
> data:
> volumes: 0/1 healthy, 1 recovering; 1 damaged
> pools: 4 pools, 4161 pgs
> objects: 230.30M objects, 276 TiB
> usage: 836 TiB used, 460 TiB / 1.3 PiB avail
> pgs: 4138 active+clean
> 13 active+clean+scrubbing
> 10 active+clean+scrubbing+deep
>
>
>
> root@ceph-1:~# ceph health detail
> HEALTH_ERR 1 filesystem is degraded; 1 filesystem is offline; 1 mds daemon damaged
> [WRN] FS_DEGRADED: 1 filesystem is degraded
>     fs cephfs is degraded
> [ERR] MDS_ALL_DOWN: 1 filesystem is offline
> fs cephfs is offline because no MDS is active for it.
> [ERR] MDS_DAMAGE: 1 mds daemon damaged
> fs cephfs mds.0 is damaged
Do you have a complete log you can share? Try:
https://docs.ceph.com/en/quincy/man/8/ceph-post-file/
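(ceph-post-file is run against the file(s) directly; the description text and log path below are just examples:)
ceph-post-file -d 'mds damaged after pacific upgrade' /var/log/ceph/ceph-mds.<name>.log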
To get your upgrade to complete, you may set:
ceph config set mds mds_go_bad_corrupt_dentry false
--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
I recently upgraded to Quincy, and toggled on the BULK flag of a few pools. As a result, my cluster has been spending the last several days shuffling data while growing the pool pg counts. That in turn has resulted in a steadily increasing number of pgs being flagged PG_NOT_DEEP_SCRUBBED. And that has resulted in my getting hundreds of alert emails about pgs not being deep scrubbed, because I get a new email whenever the count changes.
I tried using "ceph health mute PG_NOT_DEEP_SCRUBBED --sticky", but all that did (in terms of the email alerts) was make the emails say "HEALTH_OK" instead of "HEALTH WARN", which is less than helpful.
I haven't found a way to stop the cluster from sending me these alert emails other than turning off email notifications entirely. If there is one, I'd love to know what it is. If not, I feel like there ought to be one, either as part of muting the health warning, or as a separate toggle. Hundreds of emails over what is expected behavior is rather silly.
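(For what it's worth, a health mute can also be given a TTL so it expires on its own once the backfill settles, e.g. the line below; it doesn't address the email noise itself, though:)
ceph health mute PG_NOT_DEEP_SCRUBBED 7d --sticky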
-----
Edward Huyer
Golisano College of Computing and Information Sciences
Rochester Institute of Technology
Golisano 70-2373
152 Lomb Memorial Drive
Rochester, NY 14623
585-475-6651
erhvks(a)rit.edu<mailto:erhvks@rit.edu>
Obligatory Legalese:
The information transmitted, including attachments, is intended only for the person(s) or entity to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you received this in error, please contact the sender and destroy any copies of this information.
Hi all,
we did a major update from Pacific to Quincy (17.2.5) a month ago
without any problems.
Now we have tried a minor update from 17.2.5 to 17.2.6 (ceph orch upgrade). It gets stuck at the MDS upgrade phase. At this point the cluster tries to scale down the MDSs (ceph fs set max_mds 1). We waited a few hours.
We are running two active mds with 1 standby. No subdir pinning
configured. CephFS data pool: 575 TB
While upgrading, the rank 1 MDS remains in the stopping state. During this state, clients are not able to reconnect. So we paused the upgrade, set max_mds back to 2, and failed rank 1. After that, the standby became active.
In the logs of the MDS (rank 1, in the stopping state) we can see: waiting for strays to migrate
In our second try, we evicted all clients first, without success.
We make daily snapshots on / and rotate them via snapshot scheduler
after one week.
Is there a way to get rid of stray entries without scaling down the MDSs, or do we have to wait longer?
We had about the same amount of strays before we did the major upgrade.
So, it is a bit curious.
Current output from ceph perf dump
Rank0:
"num_strays": 417304,
"num_strays_delayed": 3,
"num_strays_enqueuing": 0,
"strays_created": 567879,
"strays_enqueued": 561803,
"strays_reintegrated": 13751,
"strays_migrated": 4,
Rank1:
ceph daemon mds.fdi-cephfs.ceph-service-13.rwdkqs perf dump | grep stray
"num_strays": 172528,
"num_strays_delayed": 0,
"num_strays_enqueuing": 0,
"strays_created": 418365,
"strays_enqueued": 396142,
"strays_reintegrated": 67406,
"strays_migrated": 4,
Any help would be appreciated.
best regards
Henning