Hi Patrick,
Thanks for getting back to me. It looks like I found the issue: I thought I had
increased max_file_size on Ceph to 20 TB, but it turns out I missed a zero and set
it to 1.89 TB.
I had originally tried to fallocate the space for the 8 TB volume, which kept
erroring. I then tried dd, and dd wrote the entire space needed without errors. What
I don't understand is what happens to CephFS when you do this.
The files I'm writing into the pre-allocated volume in Ceph are still there,
luckily, but I thought that Ceph would stop you from writing to CephFS once a file
hit the upper limit of max_file_size.
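In case it helps anyone reproduce this, the commands were roughly the following
(the path is from memory, so treat it as illustrative):

    # failed once it hit the ~1.89 TB limit, as I'd expect
    fallocate -l 8T /mnt/cephfs/container.img

    # this ran to completion without errors (8 TiB in 1 MiB blocks)
    dd if=/dev/zero of=/mnt/cephfs/container.img bs=1M count=8388608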
Kind regards,
Kyle
________________________________
From: Patrick Donnelly <pdonnell@redhat.com>
Sent: 11 May 2021 03:14
To: Kyle Dean <k.s-dean@outlook.com>
Cc: ceph-users@ceph.io <ceph-users@ceph.io>
Subject: Re: [ceph-users] Write Ops on CephFS Increasing exponentially
Hi Kyle,
On Thu, May 6, 2021 at 7:56 AM Kyle Dean <k.s-dean@outlook.com> wrote:
Hi, I'm hoping someone could help me get to the bottom of this particular issue
I'm having.
I have Ceph Octopus installed using ceph-ansible.
Currently, I have 3 MDS servers running, and one client connected to the active MDS.
I'm currently storing a very large encrypted container, 8 TB worth, on the CephFS
file system, and I'm writing data into it from the client host.
Recently I have noticed a severe impact on performance: the time taken to process
files within the container has increased from 1 minute to 11 minutes.
In the Ceph dashboard, when I take a look at the performance tab on the file system
page, the Write Ops are increasing exponentially over time.
At the end of April, around the 22nd, I had 49 Write Ops on the performance page for
the MDS daemons. This is now at 266467 Write Ops and increasing. Also, the client
requests have gone from 14 to 67 to 117 and are now at 283.
Would someone be able to help me make sense of why the performance has decreased and
what is going on with the client requests and write operations?
I suggest you look at the "perf dump" statistics from the MDS (via
ceph tell or the admin socket) over a period of time to get an idea of
what operations it's performing. It's probable that your workload
changed somehow and that is the cause.
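For example (substitute the name of your active MDS daemon):

    # via ceph tell
    ceph tell mds.<name> perf dump

    # or via the admin socket on the MDS host
    ceph daemon mds.<name> perf dump

The counters are cumulative, so comparing two dumps taken a few minutes
apart should show which operations are growing.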
--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D