Hi again, hopefully for the last time with problems.
We had an MDS crash earlier, with the MDS stuck in the failed state, and used a command to reset the filesystem (this was wrong, I know now; thanks to Patrick Donnelly for pointing this out). A full scrub of the filesystem found two damaged files. One of those was repaired, but the following file keeps giving errors and can't be removed.
What can I do now? Some information is below.
# ceph tell mds.atlassian-prod:0 damage ls
[
{
"damage_type": "backtrace",
"id": 2244444901,
"ino": 1099534008829,
"path": "/app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01"
}
]
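To make sure the damage entry points at the same file, I also dumped the inode (the decimal ino is the one from the damage listing above):
----------
# ceph tell mds.atlassian-prod:0 dump inode 1099534008829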
Trying to repair the error (online research suggests this should work for the backtrace damage type):
----------
# ceph tell mds.atlassian-prod:0 scrub start /app1/shared/data/repositories/11271 recursive,repair,force
{
"return_code": 0,
"scrub_tag": "d10ead42-5280-4224-971e-4f3022e79278",
"mode": "asynchronous"
}
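While the scrub runs, progress can be polled against the same MDS target:
----------
# ceph tell mds.atlassian-prod:0 scrub status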
Cluster logs after this
----------
1/2/24 9:37:05 AM
[INF]
scrub summary: idle
1/2/24 9:37:02 AM
[INF]
scrub summary: idle+waiting paths [/app1/shared/data/repositories/11271]
1/2/24 9:37:01 AM
[INF]
scrub summary: active paths [/app1/shared/data/repositories/11271]
1/2/24 9:37:01 AM
[INF]
scrub summary: idle+waiting paths [/app1/shared/data/repositories/11271]
1/2/24 9:37:01 AM
[INF]
scrub queued for path: /app1/shared/data/repositories/11271
But the error doesn't disappear, and the file still can't be removed.
On the client, trying to remove the file (we have a backup):
----------
$ rm -f /mnt/shared_disk-app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01
rm: cannot remove '/mnt/shared_disk-app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01': Input/output error
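If a repair ever succeeds, my understanding is that the stale entry could then be cleared from the damage table using the id from the listing above. I haven't tried this yet, as I'm not sure it's safe while the backtrace is still damaged:
----------
# ceph tell mds.atlassian-prod:0 damage rm 2244444901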
Best regards,
Sake
Hi!
While reading through the documentation about subtree pinning, I was wondering whether the following is possible.
We've got the following directory structure.
/
/app1
/app2
/app3
/app4
Can I pin /app1 to MDS ranks 0 and 1, the directory /app2 to rank 2, and finally /app3 and /app4 to rank 3?
I would like to load-balance the subfolders of /app1 across two (or three) MDS servers.
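For context, what I have in mind is something like the following (the /mnt/cephfs mount point is just an example). As far as I understand, ceph.dir.pin accepts only a single rank, which is why I'm asking about spreading /app1; the docs also mention ephemeral distributed pinning, which might cover that case:

$ setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/app2
$ setfattr -n ceph.dir.pin -v 3 /mnt/cephfs/app3
$ setfattr -n ceph.dir.pin -v 3 /mnt/cephfs/app4
$ setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/app1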
Best regards,
Sake
Hi all,
I have a problem upgrading a Ceph cluster from Pacific to Quincy with
cephadm. I successfully upgraded the cluster to the latest Pacific
(16.2.11), but when I run the following command to upgrade the cluster to
17.2.5, the upgrade process stops with "Unexpected error" after upgrading
3/4 mgrs (everything is on a private network):
ceph orch upgrade start my-private-repo/quay-io/ceph/ceph:v17.2.5
I also tried the 17.2.4 version.
cephadm fails to check the hosts' status and marks them as offline:
cephadm 2023-04-06T10:19:59.998510+0000 mgr.host9.arhpnd (mgr.4516356) 5782
: cephadm [DBG] host host4 (x.x.x.x) failed check
cephadm 2023-04-06T10:19:59.998553+0000 mgr.host9.arhpnd (mgr.4516356) 5783
: cephadm [DBG] Host "host4" marked as offline. Skipping daemon refresh
cephadm 2023-04-06T10:19:59.998581+0000 mgr.host9.arhpnd (mgr.4516356) 5784
: cephadm [DBG] Host "host4" marked as offline. Skipping gather facts
refresh
cephadm 2023-04-06T10:19:59.998609+0000 mgr.host9.arhpnd (mgr.4516356) 5785
: cephadm [DBG] Host "host4" marked as offline. Skipping network refresh
cephadm 2023-04-06T10:19:59.998633+0000 mgr.host9.arhpnd (mgr.4516356) 5786
: cephadm [DBG] Host "host4" marked as offline. Skipping device refresh
cephadm 2023-04-06T10:19:59.998659+0000 mgr.host9.arhpnd (mgr.4516356) 5787
: cephadm [DBG] Host "host4" marked as offline. Skipping osdspec preview
refresh
cephadm 2023-04-06T10:19:59.998682+0000 mgr.host9.arhpnd (mgr.4516356) 5788
: cephadm [DBG] Host "host4" marked as offline. Skipping autotune
cluster 2023-04-06T10:20:00.000151+0000 mon.host8 (mon.0) 158587 : cluster
[ERR] Health detail: HEALTH_ERR 9 hosts fail cephadm check; Upgrade: failed
due to an unexpected exception
cluster 2023-04-06T10:20:00.000191+0000 mon.host8 (mon.0) 158588 : cluster
[ERR] [WRN] CEPHADM_HOST_CHECK_FAILED: 9 hosts fail cephadm check
cluster 2023-04-06T10:20:00.000202+0000 mon.host8 (mon.0) 158589 : cluster
[ERR] host host7 (x.x.x.x) failed check: Unable to reach remote host
host7. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000213+0000 mon.host8 (mon.0) 158590 : cluster
[ERR] host host2 (x.x.x.x) failed check: Unable to reach remote host
host2. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000220+0000 mon.host8 (mon.0) 158591 : cluster
[ERR] host host8 (x.x.x.x) failed check: Unable to reach remote host
host8. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000228+0000 mon.host8 (mon.0) 158592 : cluster
[ERR] host host4 (x.x.x.x) failed check: Unable to reach remote host
host4. Process exited with non-zero exit status 3
cluster 2023-04-06T10:20:00.000240+0000 mon.host8 (mon.0) 158593 : cluster
[ERR] host host3 (x.x.x.x) failed check: Unable to reach remote host
host3. Process exited with non-zero exit status 3
Here are some relevant command outputs:
[root@host8 ~]# ceph -s
cluster:
id: xxx
health: HEALTH_ERR
9 hosts fail cephadm check
Upgrade: failed due to an unexpected exception
services:
mon: 5 daemons, quorum host8,host1,host7,host2,host9 (age 2w)
mgr: host9.arhpnd(active, since 105m), standbys: host8.jowfih,
host1.warjsr, host2.qyavjj
mds: 1/1 daemons up, 3 standby
osd: 37 osds: 37 up (since 8h), 37 in (since 3w)
data:
io:
client:
progress:
Upgrade to 17.2.5 (0s)
[............................]
[root@host8 ~]# ceph orch upgrade status
{
"target_image": "my-private-repo/quay-io/ceph/ceph@sha256
:34c763383e3323c6bb35f3f2229af9f466518d9db926111277f5e27ed543c427",
"in_progress": true,
"which": "Upgrading all daemon types on all hosts",
"services_complete": [],
"progress": "3/59 daemons upgraded",
"message": "Error: UPGRADE_EXCEPTION: Upgrade: failed due to an
unexpected exception",
"is_paused": true
}
[root@host8 ~]# ceph cephadm check-host host7
check-host failed:
Host 'host7' not found. Use 'ceph orch host ls' to see all managed hosts.
[root@host8 ~]# ceph versions
{
"mon": {
"ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894)
pacific (stable)": 5
},
"mgr": {
"ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894)
pacific (stable)": 1,
"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757)
quincy (stable)": 3
},
"osd": {
"ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894)
pacific (stable)": 37
},
"mds": {
"ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894)
pacific (stable)": 4
},
"overall": {
"ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894)
pacific (stable)": 47,
"ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757)
quincy (stable)": 3
}
}
The strange thing is that I can roll back the cluster status by failing
over to a not-yet-upgraded mgr, like this:
ceph mgr fail
ceph orch upgrade start my-private-repo/quay-io/ceph/ceph:v16.2.11
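For completeness, this is how I raised the cephadm log level to capture the debug lines above (standard commands from the cephadm troubleshooting docs):

ceph config set mgr mgr/cephadm/log_to_cluster_level debug
ceph -W cephadm --watch-debug

and reset it afterwards with:

ceph config set mgr mgr/cephadm/log_to_cluster_level info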
Would you happen to have any idea about this?
Best regards,
Reza
Hi Ceph users
We are using Ceph Pacific (16) in this specific deployment.
In our use case we do not want our users to be able to generate signature v4 URLs, because these bypass the policies that we set on buckets (e.g. IP restrictions).
Currently we have a sidecar reverse proxy running that filters out requests carrying the signature-URL-specific request parameters (a sketch of the matching rule is below).
This is obviously not very efficient, and we are looking to replace it somehow in the future.
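To illustrate, the filter boils down to rejecting any request whose query string contains the standard SigV4 presign parameters. A minimal sketch of the rule, not our actual proxy config:

#!/usr/bin/env bash
# Reject requests whose query string carries AWS SigV4 presign parameters.
query_string="$1"
if printf '%s' "$query_string" | grep -qiE 'X-Amz-(Algorithm|Credential|Signature)='; then
    echo "403 Forbidden: presigned URLs are not allowed"
    exit 1
fi
echo "request allowed"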
1. Is there an option in RGW to disable these signed URLs (e.g. by returning status 403)?
2. If not, is this planned, or would it make sense to add it as a configuration option?
3. Or is RGW's behaviour of not respecting bucket policies for signature v4 URLs a bug, and should they actually be applied?
Thank you for your help, and let me know if you have any questions.
Marc Singer
Hi folks,
I'm currently testing erasure-code-lrc (1) in a multi-room, multi-rack setup.
The idea is to be able to repair disk failures within the rack
itself to lower bandwidth usage:
```bash
ceph osd erasure-code-profile set lrc_hdd \
plugin=lrc \
crush-root=default \
crush-locality=rack \
crush-failure-domain=host \
crush-device-class=hdd \
mapping=__DDDDD__DDDDD__DDDDD__DDDDD \
layers='
[
[ "_cDDDDD_cDDDDD_cDDDDD_cDDDDD", "" ],
[ "cDDDDDD_____________________", "" ],
[ "_______cDDDDDD______________", "" ],
[ "______________cDDDDDD_______", "" ],
[ "_____________________cDDDDDD", "" ],
]' \
crush-steps='[
[ "choose", "room", 4 ],
[ "choose", "rack", 1 ],
[ "chooseleaf", "host", 7 ],
]'
```
The rule picks 4 out of 5 rooms and keeps the PG in one rack, as expected!
However, it looks like the PG will not move to another room if the PG
is undersized or an entire room or rack is down!
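For reference, this is how I checked the resulting placements (the rule id 2 is an assumption here; it can be looked up with `ceph osd crush rule dump lrc_hdd`):

```bash
# Dump the compiled CRUSH map and test the LRC rule against it;
# --num-rep 28 matches the 28 shards in the mapping above.
ceph osd getcrushmap -o /tmp/crushmap
crushtool -i /tmp/crushmap --test --rule 2 --num-rep 28 --show-mappings
```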
Questions:
* Am I missing something to allow LRC PGs to move across racks/rooms for repair?
* Is it even possible to build such a 'multi-stage' crushmap?
Thanks for your help,
Ansgar
1) https://docs.ceph.com/en/quincy/rados/operations/erasure-code-jerasure/
Just in case anybody is interested: Using dm-cache works and boosts
performance -- at least for my use case.
The "challenge" was to get 100 (identical) Linux-VMs started on a three
node hyperconverged cluster. The hardware is nothing special, each node
has a Supermicro server board with a single CPU with 24 cores and 4 x 4
TB hard disks. And there's that extra 1 TB NVMe...
I know that the general recommendation is to use the NVMe for WAL and
metadata, but this didn't seem appropriate for my use case and I'm still
not quite sure about failure scenarios with this configuration. So
instead I made each drive a logical volume (managed by an OSD) and added
85 GiB of NVMe to each LV as a read-only cache.
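For anyone wanting to reproduce this: the per-OSD setup was plain LVM, along these lines (the VG/LV names are made up here; writethrough is the closest dm-cache mode to a read-only cache):

# carve an 85 GiB cache volume from the NVMe and attach it to the OSD's LV
lvcreate -L 85G -n osd0_cache vg_osd0 /dev/nvme0n1p1
lvconvert --type cache --cachevol osd0_cache --cachemode writethrough vg_osd0/osd0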
Each VM uses an RBD image, cloned from a snapshot of the master image, as
its system disk. The idea was that with this configuration, all VMs should
share most (actually almost all) of the data on their system disks, and
this data should be available from the cache.
Well, it works. When booting the 100 VMs, almost all read operations are
satisfied from the cache. So I get close to NVMe speed but have paid
for conventional hard drives only (well, SSDs aren't that much more
expensive nowadays, but the hardware is 4 years old).
So, nothing sophisticated, but as I couldn't find anything about this
kind of setup, it might be of interest nevertheless.
- Michael
I have logged this as https://tracker.ceph.com/issues/64213
On 16/01/2024 14:18, DERUMIER, Alexandre wrote:
> Hi,
>
>>> ImportError: PyO3 modules may only be initialized once per
>>> interpreter
>>> process
>>>
>>> and ceph -s reports "Module 'dashboard' has failed dependency: PyO3
>>> modules may only be initialized once per interpreter process
> We have the same problem on Proxmox 8 (based on Debian 12) with Ceph
> Quincy or Reef.
>
> It seems to be related to the Python version on Debian 12.
>
> (we have no fix for this currently)