Hi all,
I'm still searching for orphan objects and came across a strange bug:
there is a huge multipart upload in progress (around 4 TB), and listing the
rados objects in the bucket loops endlessly over the multipart upload.
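For reference, we're listing the backing rados objects roughly like this (the bucket name is an example); the radoslist output is where the loop shows up:

  # list the rados objects backing the bucket
  radosgw-admin bucket radoslist --bucket=mybucket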
--
As an exception, the "UTF-8 Problems" self-help group will meet in the large hall this time.
Hi, we're running 15.2.7 and our cluster is warning us about LARGE_OMAP_OBJECTS (1 large omap objects).
Here is what the distribution looks like for the bucket in question, and as you can see all but 3 of the keys reside in shard 2.
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.0 1
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.8 0
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.9 0
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.7 0
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.1 0
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.4 0
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.3 1
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.2 262384
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.6 0
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.5 0
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.12 0
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.10 1
.dir.5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221.11 0
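(For anyone wanting to reproduce the numbers: we counted the omap keys per index shard roughly like this; the index pool name is an example.)

  for obj in $(rados -p default.rgw.buckets.index ls | grep '5a5c812a-3d31-4d79-87e6-1a17206228ac.18635192.221'); do
      echo "$obj $(rados -p default.rgw.buckets.index listomapkeys "$obj" | wc -l)"
  done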
osd_deep_scrub_large_omap_object_key_threshold is set to 200000 by default, hence the warning observed for this bucket.
Dynamic resharding is enabled, and the bucket is not in the process of being resharded.
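For completeness, this is roughly how we verified that (bucket name is a placeholder):

  radosgw-admin reshard status --bucket=mybucket   # per-shard status, should be "not-resharding"
  radosgw-admin reshard list                       # the bucket should not appear here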
Versioning is not in use for this bucket, so we're not affected by https://tracker.ceph.com/issues/46456.
Can anyone help us understand why all the keys are getting mapped to a single shard? Is there a bug here, or is this expected behaviour?
Could it be related to the fact that the bucket contains large multipart uploads? Object names look like this:
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~0W-YhP3F7qc70Ad8JoBIugKzu225qs2.5900
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~0W-YhP3F7qc70Ad8JoBIugKzu225qs2.5901
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~0W-YhP3F7qc70Ad8JoBIugKzu225qs2.5902
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~0W-YhP3F7qc70Ad8JoBIugKzu225qs2.5903
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~0W-YhP3F7qc70Ad8JoBIugKzu225qs2.5904
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~0W-YhP3F7qc70Ad8JoBIugKzu225qs2.5905
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~0W-YhP3F7qc70Ad8JoBIugKzu225qs2.5906
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~0W-YhP3F7qc70Ad8JoBIugKzu225qs2.5907
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~0W-YhP3F7qc70Ad8JoBIugKzu225qs2.5908
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~0W-YhP3F7qc70Ad8JoBIugKzu225qs2.5909
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~2uuwqny_HicO6kx_lPmWEf0zoyvdm_9.7152
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~2uuwqny_HicO6kx_lPmWEf0zoyvdm_9.7153
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~2uuwqny_HicO6kx_lPmWEf0zoyvdm_9.7154
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~2uuwqny_HicO6kx_lPmWEf0zoyvdm_9.7155
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~2uuwqny_HicO6kx_lPmWEf0zoyvdm_9.7156
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~2uuwqny_HicO6kx_lPmWEf0zoyvdm_9.7157
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~2uuwqny_HicO6kx_lPmWEf0zoyvdm_9.7158
_multipart_TOOTHROT/anonymised/TOOTHROT-DISK1-8c59002f-cffd-4f74-a680-147383ab8d78.vhdx.2~2uuwqny_HicO6kx_lPmWEf0zoyvdm_9.7159
Hi,
I caught up with Sage's talk on what to expect in Pacific (
https://www.youtube.com/watch?v=PVtn53MbxTc ) and there was no mention
of ceph-ansible at all.
Is it going to continue to be supported? We use it (and uncontainerised
packages) for all our clusters, so I'd be a bit alarmed if it was going
to go away...
Regards,
Matthew
--
The Wellcome Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.
Hi,
After an unscheduled power outage, our Ceph (Octopus) cluster reports a
healthy state with "ceph status". However, when we run "ceph orch status",
the command hangs forever.
Are there other commands that we can run for a more thorough health check
of the cluster?
After looking at:
https://docs.ceph.com/en/octopus/rados/operations/health-checks/
I also ran "ceph crash ls-new", but it hangs forever as well.
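Since both hanging commands are served by mgr modules, the next thing we are looking at is the active mgr itself, roughly like this (the daemon name in the restart is an example):

  ceph mgr stat                      # which mgr is active, and is it available?
  ceph mgr module ls | head          # which modules are enabled
  systemctl restart ceph-mgr@node1   # restart the active mgr on its host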
Any ideas?
Our Ceph cluster is currently used as backend storage for our OpenStack
cluster, and we are also having issues with storage volumes attached to
VMs, but we don't know how to narrow down the root cause.
Any feedback is highly appreciated.
Best regards,
Sebastian
Dear Ceph users,
I am currently constructing a small hyperconverged Proxmox cluster with
Ceph as storage. So far I have always had 3 nodes, which I linked directly
together via two bonded 10G network interfaces for the Ceph storage, so I
never needed any switches.
This new cluster has more nodes, so I am considering using a 10G switch
for the storage network. As I have no experience with such a setup, I
wonder if there are any specific issues that I should think of (latency...)?
As the whole cluster should not be too expensive, I am currently
thinking of the following solution:
2* CRS317-1G-16s+RM switches:
https://mikrotik.com/product/crs317_1g_16s_rm#fndtn-testresults
SFP+ Cables like these:
https://www.fs.com/de/products/48883.html
Some network interface for each node with two SFP+ ports, e.g.:
https://ark.intel.com/content/www/de/de/ark/products/39776/intel-ethernet-c…
Each node would connect one port to each switch, configured as an
active/backup (master/slave) bond so that the switches are redundant.
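Concretely, each node would get an active-backup bond across the two switches, roughly like this (interface names and the address are examples, Proxmox-style /etc/network/interfaces):

  auto bond0
  iface bond0 inet static
      address 10.10.10.11/24
      bond-slaves enp5s0f0 enp5s0f1   # one port to each switch
      bond-mode active-backup
      bond-miimon 100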
What do you think of this setup - or is there any information /
recommendation for an optimized setup of a 10G storage network?
Best Regards,
Hermann
--
hermann(a)qwer.tk
PGP/GPG: 299893C7 (on keyservers)
I'm new to Ceph and have just deployed my first cluster using ceph-ansible, and I'm running into some issues that I hope someone can point me in the right direction on.
I have 5 servers to start with: 5 OSD nodes, 3 monitors, 2 managers, and 4 iSCSI gateways. My intention is to use this environment as iSCSI storage for ESXi.
When I went to add an iSCSI target, an error popped up in the browser (it went away too fast to read), and now whenever I click on the "Targets" tab of the iSCSI Dashboard page, I continuously get errors popping up:
500 - Internal Server Error
The server encountered an unexpected condition which prevented it from fulfilling the request.
I’m unable to add any targets now. The log files show this:
2021-05-19T12:58:05.294-0400 7f2db6115700 0 [dashboard ERROR exception] Internal Server Error
Traceback (most recent call last):
File "/usr/share/ceph/mgr/dashboard/services/exception.py", line 46, in dashboard_exception_handler
return handler(*args, **kwargs)
File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
return self.callable(*self.args, **self.kwargs)
File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 694, in inner
ret = func(*args, **kwargs)
File "/usr/share/ceph/mgr/dashboard/controllers/__init__.py", line 907, in wrapper
return func(*vpath, **params)
File "/usr/share/ceph/mgr/dashboard/controllers/iscsi.py", line 266, in list
IscsiTarget._set_info(target)
File "/usr/share/ceph/mgr/dashboard/controllers/iscsi.py", line 990, in _set_info
raise e
File "/usr/share/ceph/mgr/dashboard/controllers/iscsi.py", line 980, in _set_info
target_iqn)
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 531, in func_wrapper
**kwargs)
File "/usr/share/ceph/mgr/dashboard/services/iscsi_client.py", line 254, in get_targetinfo
return request()
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 326, in __call__
data, raw_content, headers)
File "/usr/share/ceph/mgr/dashboard/rest_client.py", line 449, in do_request
resp.content)
dashboard.rest_client.RequestException: iscsi REST API failed request with status code 503
(b'{\n "message": "failed, gateway(s) unavailable:cxcto-c240-j27-01(UNKNOWN'
b' state)"\n}\n')
2021-05-19T12:58:05.294-0400 7f2db6115700 0 [dashboard ERROR request] [10.117.244.166:58270] [GET] [500] [0.398s] [admin] [513.0B] /api/iscsi/target
2021-05-19T12:58:05.294-0400 7f2db6115700 0 [dashboard ERROR request] [b'{"status": "500 Internal Server Error", "detail": "The server encountered an unexpected condition which prevented it from fulfilling the request.", "request_id": "7bc691e8-e814-48c1-a69c-57ea1f3f3dbd"} ']
Clearly it’s showing an error indicating that the gateway is “unavailable”.
The output from gwcli shows this:
[root@cxcto-c240-j27-01 ~]# ./gwcli
1 gateway is inaccessible - updates will be disabled
/> ls
o- / ......................................................................................................................... [...]
o- cluster ......................................................................................................... [Clusters: 1]
| o- ceph ............................................................................................................ [HEALTH_OK]
| o- pools .......................................................................................................... [Pools: 4]
| | o- device_health_metrics ................................................. [(x3), Commit: 0.00Y/15911599M (0%), Used: 0.00Y]
| | o- iscsi ............................................................... [(x3), Commit: 0.00Y/15911599M (0%), Used: 262511b]
| | o- rbd ................................................................... [(x3), Commit: 0.00Y/15911599M (0%), Used: 1650b]
| | o- test .................................................................. [(x3), Commit: 0.00Y/15911599M (0%), Used: 0.00Y]
| o- topology .............................................................................................. [OSDs: 110,MONs: 3]
o- disks ....................................................................................................... [0.00Y, Disks: 0]
o- iscsi-targets ............................................................................... [DiscoveryAuth: CHAP, Targets: 1]
o- iqn.2001-07.com.ceph:1621437640904 ................................................................ [Auth: None, Gateways: 1]
o- disks .......................................................................................................... [Disks: 0]
o- gateways ............................................................................................ [Up: 0/1, Portals: 1]
| o- cxcto-c240-j27-01 ....................................................................... [10.122.242.196 (UNAUTHORIZED)]
o- host-groups .................................................................................................. [Groups : 0]
o- hosts ....................................................................................... [Auth: ACL_ENABLED, Hosts: 0]
In this case it says "UNAUTHORIZED".
When I created the target, I had selected all 4 iSCSI gateways, but it looks like something happened during the addition process that has left it in a weird state.
The dashboard seems to think that all the gateways are up and running:
[screenshot: Dashboard iSCSI overview showing all gateways up]
Notice that it only shows the target on the one node, which is the node reporting the error.
I've tried deleting the target from gwcli, but that fails, and I'm not really sure where to look next. The "UNAUTHORIZED" in the gwcli output makes me wonder if there is some kind of authorization issue, but I'm not sure what that would be.
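For what it's worth, this is what I plan to check next on the gateway nodes (a sketch; service and file names as per the ceph-iscsi docs):

  systemctl status rbd-target-api rbd-target-gw   # the REST API and gateway services
  grep -E 'api_user|api_password|trusted_ip_list' /etc/ceph/iscsi-gateway.cfg
  # these settings should be identical on all gateways; a mismatch is one
  # known cause of UNAUTHORIZED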
This is what I see if I navigate to the Targets tab on the iSCSI page:
[screenshot: the Targets tab of the iSCSI page]
Any thoughts or guidance are greatly appreciated.
-Paul
Hermann,
I think there was a discussion on recommended switches not too long ago.
You should be able to find it in the mailing list archives.
I think network latency is usually very minor compared to Ceph's
dependency on CPU and disk latency, so for a simple cluster I wouldn't
worry about it too much.
I have found that fs.com's DAC cables get stuck a lot, so I don't use them
anymore; I usually buy Dell or Mellanox cables.
Regarding network cards, I've found the Intel cards to be not that great,
due to bugs with LACP bonds, the embedded LLDP agent getting in the way,
and other issues. So I'm using Mellanox cards instead, but Broadcom should
also work.
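(If you do end up with the Intel cards: on the i40e driver the firmware
LLDP agent can at least be disabled, something like this; the interface
name is an example.)

  ethtool --set-priv-flags enp5s0f0 disable-fw-lldp on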
hope it helps!
best regards,
Max
On Wed, May 19, 2021 at 1:48 PM <ceph-users-request(a)ceph.io> wrote:
> [quoted message trimmed]
Hi guys,
We have recently been testing rbd-nbd on a Ceph N (Nautilus) cluster. After
mapping an RBD image, running mkfs, and mounting the nbd device, rbd-nbd
and dmesg show the following errors during read/write testing.
rbd-nbd log:
2021-05-18 11:35:08.034 7efdb8ff9700 20 []rbd-nbd: reader_entry:
waiting for nbd request
...
2021-05-18 11:35:08.066 7efdb8ff9700 -1 []rbd-nbd: failed to read nbd
request header: (33) Numerical argument out of domain
2021-05-18 11:35:08.066 7efdb3fff700 20 []rbd-nbd: writer_entry: no io
requests, terminating
2021-05-18 11:35:08.066 7efdea8d1a00 20 []librbd::ImageState:
0x564a2be2b3c0 unregister_update_watcher: handle=0
2021-05-18 11:35:08.066 7efdea8d1a00 20 []librbd::ImageState:
0x564a2be2b4b0 ImageUpdateWatchers::unregister_watcher: handle=0
2021-05-18 11:35:08.066 7efdea8d1a00 20 []librbd::ImageState:
0x564a2be2b4b0 ImageUpdateWatchers::unregister_watcher: completing
unregister
2021-05-18 11:35:08.066 7efdea8d1a00 10 []rbd-nbd: ~NBDServer: terminating
2021-05-18 11:35:08.066 7efdea8d1a00 20 []librbd::ImageState:
0x564a2be2b3c0 close
dmesg:
[Tue May 18 11:35:07 2021] EXT4-fs (nbd0): mounted filesystem with
ordered data mode. Opts: discard
[Tue May 18 11:35:07 2021] block nbd0: shutting down sockets
[Tue May 18 11:35:09 2021] blk_update_request: I/O error, dev nbd0,
sector 75592 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
client host info:
centos7.x
kernel 5.4.109
It looks like the kernel nbd device shut down its socket for some
reason, but we haven't figured out why. BTW, we have tried turning the
rbd cache on and off, using different filesystems (ext4/xfs), and using
an EC pool as well as a replicated pool, but the error remains. It is
easier for us to reproduce when we batch map, mkfs, and mount rbd-nbd
devices on several hosts simultaneously.
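For reference, a single-host reproduction looks roughly like this
(pool/image names and the fio job are examples):

  rbd create testpool/img1 --size 100G
  rbd-nbd map testpool/img1                  # returns e.g. /dev/nbd0
  mkfs.ext4 /dev/nbd0
  mount -o discard /dev/nbd0 /mnt/img1
  fio --name=t --directory=/mnt/img1 --rw=randrw --bs=4k --size=1G --numjobs=4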
Thanks for any suggestions.
Regards,
Zhi Zhang (David)
Contact: zhang.david2011(a)gmail.com
zhangz.david(a)outlook.com
Hi
In the last couple of weeks we've been getting BlueFS spillover warnings
on multiple (>10) OSDs, e.g.
BLUEFS_SPILLOVER BlueFS spillover detected on 1 OSD(s)
osd.327 spilled over 58 MiB metadata from 'db' device (30 GiB used
of 66 GiB) to slow device
I know this can be corrected with "ceph tell osd.$osd compact" or
ignored by setting "bluestore_warn_on_bluefs_spillover=false", but my
concern is that these warnings have only recently started.
Could this be a sign of something nasty heading our way that I'm not
aware of? Is there a performance penalty for just ignoring it, rather
than compacting?
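In case the numbers are useful for diagnosis, this is how I'm reading the
per-OSD BlueFS usage (a sketch, run on the OSD's host; jq is only for
readability):

  ceph daemon osd.327 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_used_bytes}'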
Many thanks for any pointers.
Cheers
Toby
--
Toby Darling, Scientific Computing (2N249)
MRC Laboratory of Molecular Biology
Hello,
On my Octopus cluster with 6 nodes (3 mon/mgr, 3 OSD), I would like to re-install the operating system of the first mon/mgr node. For that purpose I tried "ceph orch host rm mynode", but then I got the following two health warnings:
2 stray daemon(s) not managed by cephadm
1 stray host(s) with 2 daemon(s) not managed by cephadm
So I did not proceed with the re-installation and added the node back.
What would be the correct procedure for doing that? I can live with the mons/mgrs running at reduced quorum for the duration of the re-installation; I just need my CephFS to stay available. Right now there are 2 mon daemons running, 1 active mgr, and 2 standby mgrs.
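My current guess, which I have not dared to try yet, is to move the daemons off the host first, roughly like this (hostnames are examples for my other nodes):

  ceph orch apply mon --placement="node2,node3"   # temporarily 2 mons, without mynode
  ceph orch apply mgr --placement="node2,node3"
  ceph orch host rm mynode                        # should no longer leave strays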
Thank you,
Mabi