I'd like to say that it was something smart, but it was a bit of luck.
I logged in on a hypervisor (we run OSDs and OpenStack hypervisors on the
same hosts) to deal with another issue, and while checking the system I
noticed that one of the OSDs was using a lot more CPU than the others. It
made me think that the increased IOPS could be putting a strain on some of the
OSDs without impacting the whole cluster, so I decided to increase pg_num to
spread the operations across more OSDs, and it did the trick. The qlen metric
went back to something similar to what we had before the problems started.
We're going to look into adding CPU/RAM monitoring for all the OSDs next.
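For reference, the change itself was just a pg_num bump on the busy pool, along
these lines (pool name and target value are placeholders; the earlier change on
the index pool quoted below went from 256 to 512):

  # spread the operations over more PGs, and therefore more OSDs
  ceph osd pool set <pool> pg_num <new_pg_num>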
Gauvain
On Fri, Dec 22, 2023 at 2:58 PM Drew Weaver <drew.weaver(a)thenap.com> wrote:
> Can you say how you determined that this was a problem?
>
> -----Original Message-----
> From: Gauvain Pocentek <gauvainpocentek(a)gmail.com>
> Sent: Friday, December 22, 2023 8:09 AM
> To: ceph-users(a)ceph.io
> Subject: [ceph-users] Re: RGW requests piling up
>
> Hi again,
>
> It turns out that our rados cluster wasn't that happy: the rgw index pool
> wasn't able to handle the load. Scaling the PG number helped (256 to 512),
> and the RGW is back to normal behaviour.
>
> There is still a huge number of read IOPS on the index, and we'll try to
> figure out what's happening there.
>
> Gauvain
>
> On Thu, Dec 21, 2023 at 1:40 PM Gauvain Pocentek <
> gauvainpocentek(a)gmail.com>
> wrote:
>
> > Hello Ceph users,
> >
> > We've been having an issue with RGW for a couple days and we would
> > appreciate some help, ideas, or guidance to figure out the issue.
> >
> > We run a multi-site setup which has been working pretty well so far.
> > We don't actually have data replication enabled yet, only metadata
> > replication. In the master region we've started to see requests piling
> > up in the rgw process, leading to very slow operations and failures
> > all over the place (clients time out before getting responses from
> > rgw). The workaround for now is to restart the rgw containers regularly.
> >
> > We made a mistake and forcefully deleted a bucket on a secondary
> > zone; this might be the trigger, but we are not sure.
> >
> > Other symptoms include:
> >
> > * Increased memory usage of the RGW processes (we bumped the container
> > limits from 4G to 48G to cater for that)
> > * Lots of read IOPS on the index pool (4 or 5 times more compared to
> > what we were seeing before)
> > * The prometheus ceph_rgw_qlen and ceph_rgw_qactive metrics (number of
> > active requests) seem to show that the number of concurrent requests
> > increases with time, although we don't see more requests coming in on
> > the load-balancer side.
> >
> > The current thought is that the RGW process doesn't close the requests
> > properly, or that some requests just hang. After a restart of the
> > process things look OK but the situation turns bad fairly quickly
> > (after 1 hour we start to see many timeouts).
> >
> > The rados cluster seems completely healthy; it is also used for rbd
> > volumes, and we haven't seen any degradation there.
> >
> > Has anyone experienced that kind of issue? Anything we should be
> > looking at?
> >
> > Thanks for your help!
> >
> > Gauvain
> >
>
Hi ceph-users,
I'm not sure if this mail got sent correctly; my colleague seems not to have received it.
Either way, we've managed to replicate this issue with a local http_check test. The ceph-mgr seems to go down on every first visit, and works perfectly fine after a couple of re-visits.
Has anyone else seen this issue before?
Thanks in advance,
- Demian
From: Demian Romeijn <dromeijn(a)tuxis.nl>
To: <ceph-users(a)ceph.io>
Sent: 12/22/2023 2:14 PM
Subject: ceph-dashboard odd behavior when visiting through haproxy
I'm currently trying to set up a ceph-dashboard using the official documentation on how to do so.
I've managed to log in by just visiting the URL & port, and by visiting it through haproxy. However, using haproxy to visit the site results in odd behavior.
On my first login, nothing loads on the page, and after ~5s it times out and sends me back to the log-in screen.
After logging back on to the dashboard, everything loads and functions as expected. I can refresh my browser as many times as I want and it still keeps on working.
After some time, usually ~30 minutes or so of inactivity, the problem arises again.
Haproxy tells us the server is down for about ~10 seconds; running a simple HTTP check gives the following as well: CRITICAL - Socket timeout after 10 seconds.
In the ceph-mgr logs there isn't any special error other than: [dashboard ERROR frontend.error] (https://*redacted*/#/login): Http failure response for https://*redacted*/ui-api/orchestrator/get_name: 401 OK None
It seems as if the ceph dashboard is "overloaded"; changing the haproxy config (which follows the official ceph documentation on how to set it up) to do health checks less often makes the problem happen less often.
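For context, the relevant part of the haproxy backend looks roughly like this (server name, address and interval are illustrative, not our exact config):

  backend ceph_dashboard
      option httpchk GET /
      # less frequent health checks made the problem show up less often
      server ceph-mgr-1 192.0.2.10:8443 check check-ssl inter 30s verify none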
Anything I might've overlooked that could sort out the issue?
Hi community,
I am running a Ceph cluster with RBD block storage on 6 nodes, using an
erasure-coded pool (4+2) with min_size 4.
Three OSDs are down and a PG is in the down state, so some pools can't serve
writes. Assuming those three OSDs can never be started again and the PG stays
stuck in the down state, how can I delete or recreate the PG to replace the
down one, or is there another way to allow the pool to read/write data again?
Thanks to the community.
*Tran Thanh Phong*
Email: tranphong079(a)gmail.com
Skype: tranphong079
Hi,
I just upgraded from 17.2.6 to 18.2.1 and have some issues with the mds.
The mds started crashing with:
2023-12-27T13:21:30.491+0100 7f717b5886c0 1 mds.f9sn015 Updating MDS map to version 2689280 from mon.5
2023-12-27T13:21:30.491+0100 7f717b5886c0 1 mds.0.2689276 handle_mds_map i am now mds.0.2689276
2023-12-27T13:21:30.491+0100 7f717b5886c0 1 mds.0.2689276 handle_mds_map state change up:clientreplay --> up:active
2023-12-27T13:21:30.491+0100 7f717b5886c0 1 mds.0.2689276 active_start
2023-12-27T13:21:30.524+0100 7f717b5886c0 1 mds.0.2689276 cluster recovered.
2023-12-27T13:21:30.551+0100 7f7176d7f6c0 -1 /var/tmp/portage/sys-cluster/ceph-18.2.1-r2/work/ceph-18.2.1/src/mds/Server.cc: In function 'CInode* Server::prepare_new_inode(MDRequestRef&, CDir*, inodeno_t, unsigned int, const file_layout_t*)' thread 7f7176d7f6c0 time 2023-12-27T13:21:30.548697+0100
/var/tmp/portage/sys-cluster/ceph-18.2.1-r2/work/ceph-18.2.1/src/mds/Server.cc: 3441: FAILED ceph_assert(_inode->gid != (unsigned)-1)
and I could not bring it back up again. As a workaround I was able to start
the 17.2.6 mds and it somehow recovered.
Then I started the 18.2.1 mds again, which soon after startup finds this corruption:
[
    {
        "damage_type": "dentry",
        "id": 4247331390,
        "ino": 1,
        "frag": "*",
        "dname": "lost+found",
        "snap_id": "head",
        "path": "/lost+found"
    }
]
There are a few corrupted files in some other directories (leftovers from
several releases ago that I never managed to fix), and if I start an mds
scrub there, the mds crashes again, maybe because of the corrupted lost+found.
If I try to remove lost+found, the mds crashes again.
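For reference, the damage listing and the scrub were triggered with something
along these lines (fs name and path are placeholders; this is a sketch rather
than an exact transcript):

  ceph tell mds.<fs_name>:0 damage ls
  ceph tell mds.<fs_name>:0 scrub start <path> recursive,repair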
Do you have any hint on how to recover from this?
Best regards,
Andrej
--
_____________________________________________________________
prof. dr. Andrej Filipcic, E-mail: Andrej.Filipcic(a)ijs.si
Department of Experimental High Energy Physics - F9
Jozef Stefan Institute, Jamova 39, P.o.Box 3000
SI-1001 Ljubljana, Slovenia
Tel.: +386-1-477-3674 Fax: +386-1-477-3166
-------------------------------------------------------------
We are running rook-ceph deployed as an operator in Kubernetes, with rook
version 1.10.8 and ceph 17.2.5.
It's working fine, but every 3-4 days we see an OSD daemon crash and then
restart without any problem; we are also seeing flapping OSDs, i.e. OSDs
going up and down.
Recently the daemon crash happened for 2 OSDs at the same time on different
nodes, with the below error in the crash info:
-305> 2023-12-17T14:50:14.413+0000 7f53b5f91700 -1 *** Caught signal (Aborted) **
in thread 7f53b5f91700 thread_name:tp_osd_tp
ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
1: /lib64/libpthread.so.0(+0x12cf0) [0x7f53d93ddcf0]
2: gsignal()
3: abort()
4: /lib64/libc.so.6(+0x21d79) [0x7f53d8025d79]
5: /lib64/libc.so.6(+0x47456) [0x7f53d804b456]
6: (MOSDRepOp::encode_payload(unsigned long)+0x2d0) [0x55acc0f81730]
7: (Message::encode(unsigned long, int, bool)+0x2e) [0x55acc140ec2e]
8: (ProtocolV2::send_message(Message*)+0x25e) [0x55acc16a5aae]
9: (AsyncConnection::send_message(Message*)+0x18e) [0x55acc167dc4e]
10: (OSDService::send_message_osd_cluster(int, Message*, unsigned int)+0x2bd) [0x55acc0b4b11d]
11: (ReplicatedBackend::issue_op(hobject_t const&, eversion_t const&, unsigned long, osd_reqid_t, eversion_t, eversion_t, hobject_t, hobject_t, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> > const&, std::optional<pg_hit_set_history_t>&, ReplicatedBackend::InProgressOp*, ceph::os::Transaction&)+0x6c8) [0x55acc0f69368]
12: (ReplicatedBackend::submit_transaction(hobject_t const&, object_stat_sum_t const&, eversion_t const&, std::unique_ptr<PGTransaction, std::default_delete<PGTransaction> >&&, eversion_t const&, eversion_t const&, std::vector<pg_log_entry_t, std::allocator<pg_log_entry_t> >&&, std::optional<pg_hit_set_history_t>&, Context*, unsigned long, osd_reqid_t, boost::intrusive_ptr<OpRequest>)+0x5e7) [0x55acc0f6c907]
13: (PrimaryLogPG::issue_repop(PrimaryLogPG::RepGather*, PrimaryLogPG::OpContext*)+0x50d) [0x55acc0c92ebd]
14: (PrimaryLogPG::execute_ctx(PrimaryLogPG::OpContext*)+0xd25) [0x55acc0cf0295]
15: (PrimaryLogPG::do_op(boost::intrusive_ptr<OpRequest>&)+0x288d) [0x55acc0cf78fd]
16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x1c0) [0x55acc0b56900]
17: (ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x6d) [0x55acc0e552ad]
18: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x115f) [0x55acc0b69dbf]
19: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x435) [0x55acc12c78c5]
20: (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x55acc12c9fe4]
21: /lib64/libpthread.so.0(+0x81ca) [0x7f53d93d31ca]
22: clone()
It also has the below errors before the crash:
scrub-queue::*remove_from_osd_queue* removing pg[2.4f0] failed. State was:
unregistering
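For completeness, the crash details above come from the crash module and can be
re-fetched with something like (the crash id is a placeholder):

  ceph crash ls
  ceph crash info <crash_id>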
Please help us troubleshoot and fix this issue.
We already posted on the ceph tracker, but there has been no reply there for 3-4 days.
Hi,
how can I increase the file deletion speed? All files were deleted from
cephfs on my pool, but ceph df still shows 50% usage of the pool. I know
about delayed deletion (https://docs.ceph.com/en/latest), but is there
some way to speed this up a little?
I significantly increased mds_max_purge_ops and mds_max_purge_files, but
this has not helped.
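For reference, I raised them roughly like this (the values are only what I
experimented with, not a recommendation):

  ceph config set mds mds_max_purge_ops 32768
  ceph config set mds mds_max_purge_files 1024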
Thanks for any response.
Svoboda Miroslav
Howdy,
I am going to be replacing an old cluster pretty soon and I am looking for a few suggestions.
#1 cephadm or ceph-ansible for management?
#2 Since the whole... CentOS thing... what distro appears to be the most straightforward to use with Ceph? I was going to try and deploy it on Rocky 9.
That is all I have.
Thanks,
-Drew
Hello,
in our cluster we have one node with SSDs, which are in use, but one of them
does not show up in "ceph orch device ls". Everything else looks OK. For better
understanding, the disk name is /dev/sda, and it's osd.138:
~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 1 7T 0 disk
~# wipefs /dev/sda
DEVICE OFFSET TYPE UUID LABEL
sda 0x0 ceph_bluestore
~# ceph osd tree
-9 15.42809 host ceph06
138 ssd 6.98630 osd.138 up 1.00000 1.00000
The file ceph-osd.138.log does not look unusual to me.
ceph-volume.log shows that the SSD is found by the "lsblk" command during
volume processing.
It is not possible to add the SSD with
# ceph orch daemon add osd ceph06:/dev/sda
The error message in this case asks whether the device is already in use, even
if the SSD is fully wiped via "wipefs -a" or by overwriting the entire disk
with the dd command. But it is possible to add it to the cluster by using
the option "--method raw".
Do you have an idea what happened here and how I can debug this behaviour?
Hi all,
I am completely new to ceph and I come from gluster.
We have had our eyes on ceph for several years,
and as the gluster project seems to be slowing down we
now think it is time to start looking into ceph.
I have manually configured a ceph cluster with ceph fs on debian
bookworm.
What is the difference between installing with cephadm and a manual
install; are there any benefits that you miss with a manual install?
There are also a couple of other things that I cannot figure out from
reading the documentation.
Most of our files are small and, from my understanding, replication is
then recommended, right?
The plan is to set ceph up like this:
1 x "admin node"
2 x "storage nodes"
The admin node will run mon, mgr and mds.
The storage nodes will run mon, mgr, mds and 8x osd (8 disks).
This works well to set up, but what I cannot get my head around is
how things are replicated over nodes and disks.
In ceph.conf I set the following:
osd pool default size = 2
osd pool default min size = 1
So the idea is that we always have 2 copies of the data.
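As I understand it, the same defaults can also be set on a running cluster
through the config database; a sketch:

  ceph config set global osd_pool_default_size 2
  ceph config set global osd_pool_default_min_size 1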
I do not seem to be able to figure out how the replication behaves
when things start to fail.
If the admin node goes down, one of the data nodes will
run the mon, mgr and mds. This will slow things down but
will be fine until we have a new admin node in place again.
(or is there something I am missing here?)
If just one data node goes down we will still not lose any
data, and that is fine until we have a new server.
But what if one data node goes down and one disk of the other
data node breaks, will I lose data then?
Or how many disks can I lose before I lose data?
This is what I cannot get my head around: how to think
when disaster strikes, and how much hardware I can lose before
I lose data.
Or have I got it all wrong?
Is it a bad idea with just 2 file servers; are more servers required?
The second thing I have a problem with is snapshots.
I managed to create a snapshot in the root with the command:
ceph fs subvolume snapshot create <vol_name> / <snap_name>
But it fails if I try to create a snapshot in any
other directory than the root.
Secondly, if I try to create a snapshot from the
client with:
mkdir /mnt-ceph/.snap/my_snapshot
I get the same error in all directories:
Permission denied.
I have not found any solution to this;
am I missing something here as well?
Any config missing?
Many thanks for your support!!
Best regards
Marcus