I have encountered an issue where clients hang on opening a file.
Worse, any other client that visited the same file or directory hung as
well. The only way to resolve it was to reboot the client servers. This
happened with the kernel client only, on the Luminous version. Since
then I have chosen the FUSE client, except where a client has large
performance needs.
Derrick Lin <klin938(a)gmail.com> wrote on Mon, Jun 15, 2020 at 3:28 PM:
Hi guys,
I tried to mount via the kernel driver, and it works beautifully. I was
surprised; below is one of the FIO tests, which wasn't able to run at
all on the FUSE mount:
# /usr/bin/fio --randrepeat=1 --ioengine=libaio --direct=1 \
    --gtod_reduce=1 --name=FIO --filename=fio.test --bs=4M --iodepth=16 \
    --size=50G --readwrite=randrw --rwmixread=75
FIO: (g=0): rw=randrw, bs=(R) 4096KiB-4096KiB, (W) 4096KiB-4096KiB,
    (T) 4096KiB-4096KiB, ioengine=libaio, iodepth=16
fio-3.7
Starting 1 process
FIO: Laying out IO file (1 file / 51200MiB)
Jobs: 1 (f=1): [m(1)][100.0%][r=1021MiB/s,w=340MiB/s][r=255,w=85 IOPS][eta 00m:00s]
FIO: (groupid=0, jobs=1): err= 0: pid=131431: Thu Jun 11 17:13:22 2020
  read: IOPS=249, BW=999MiB/s (1047MB/s)(37.5GiB/38408msec)
   bw (  KiB/s): min=819200, max=1171456, per=100.00%, avg=1023387.46, stdev=69360.06, samples=76
   iops        : min=  200, max=  286, avg=249.83, stdev=16.96, samples=76
  write: IOPS=83, BW=334MiB/s (351MB/s)(12.5GiB/38408msec)
   bw (  KiB/s): min=229376, max=475136, per=99.96%, avg=342204.45, stdev=40407.55, samples=76
   iops        : min=   56, max=  116, avg=83.51, stdev= 9.87, samples=76
  cpu          : usr=1.56%, sys=4.44%, ctx=12050, majf=0, minf=24
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=99.9%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=9590,3210,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=999MiB/s (1047MB/s), 999MiB/s-999MiB/s (1047MB/s-1047MB/s),
         io=37.5GiB (40.2GB), run=38408-38408msec
  WRITE: bw=334MiB/s (351MB/s), 334MiB/s-334MiB/s (351MB/s-351MB/s),
         io=12.5GiB (13.5GB), run=38408-38408msec
On Tue, Jun 9, 2020 at 6:16 PM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
Hi Derrick,
I am not sure what this 200-300MB/s on hdd is, but it is probably not
really relevant. I test native disk performance before using disks with
Ceph, using the fio script below. It is a bit lengthy because I want to
have data for possible future use cases.
Furthermore, since I upgraded to Nautilus I have been having issues with
the kernel-mounted CephFS on the OSD nodes and had to revert to FUSE,
even with 88GB of free memory.
https://tracker.ceph.com/issues/45663
https://tracker.ceph.com/issues/44100
[global]
ioengine=libaio
#ioengine=posixaio
invalidate=1
ramp_time=30
iodepth=1
runtime=180
time_based
direct=1
filename=/dev/sdX
#filename=/mnt/cephfs/ssd/fio-bench.img
[write-4k-seq]
stonewall
bs=4k
rw=write
#write_bw_log=sdx-4k-write-seq.results
#write_iops_log=sdx-4k-write-seq.results
[randwrite-4k-seq]
stonewall
bs=4k
rw=randwrite
#write_bw_log=sdx-4k-randwrite-seq.results
#write_iops_log=sdx-4k-randwrite-seq.results
[read-4k-seq]
stonewall
bs=4k
rw=read
#write_bw_log=sdx-4k-read-seq.results
#write_iops_log=sdx-4k-read-seq.results
[randread-4k-seq]
stonewall
bs=4k
rw=randread
#write_bw_log=sdx-4k-randread-seq.results
#write_iops_log=sdx-4k-randread-seq.results
[rw-4k-seq]
stonewall
bs=4k
rw=rw
#write_bw_log=sdx-4k-rw-seq.results
#write_iops_log=sdx-4k-rw-seq.results
[randrw-4k-seq]
stonewall
bs=4k
rw=randrw
#write_bw_log=sdx-4k-randrw-seq.results
#write_iops_log=sdx-4k-randrw-seq.results
[write-128k-seq]
stonewall
bs=128k
rw=write
#write_bw_log=sdx-128k-write-seq.results
#write_iops_log=sdx-128k-write-seq.results
[randwrite-128k-seq]
stonewall
bs=128k
rw=randwrite
#write_bw_log=sdx-128k-randwrite-seq.results
#write_iops_log=sdx-128k-randwrite-seq.results
[read-128k-seq]
stonewall
bs=128k
rw=read
#write_bw_log=sdx-128k-read-seq.results
#write_iops_log=sdx-128k-read-seq.results
[randread-128k-seq]
stonewall
bs=128k
rw=randread
#write_bw_log=sdx-128k-randread-seq.results
#write_iops_log=sdx-128k-randread-seq.results
[rw-128k-seq]
stonewall
bs=128k
rw=rw
#write_bw_log=sdx-128k-rw-seq.results
#write_iops_log=sdx-128k-rw-seq.results
[randrw-128k-seq]
stonewall
bs=128k
rw=randrw
#write_bw_log=sdx-128k-randrw-seq.results
#write_iops_log=sdx-128k-randrw-seq.results
[write-1024k-seq]
stonewall
bs=1024k
rw=write
#write_bw_log=sdx-1024k-write-seq.results
#write_iops_log=sdx-1024k-write-seq.results
[randwrite-1024k-seq]
stonewall
bs=1024k
rw=randwrite
#write_bw_log=sdx-1024k-randwrite-seq.results
#write_iops_log=sdx-1024k-randwrite-seq.results
[read-1024k-seq]
stonewall
bs=1024k
rw=read
#write_bw_log=sdx-1024k-read-seq.results
#write_iops_log=sdx-1024k-read-seq.results
[randread-1024k-seq]
stonewall
bs=1024k
rw=randread
#write_bw_log=sdx-1024k-randread-seq.results
#write_iops_log=sdx-1024k-randread-seq.results
[rw-1024k-seq]
stonewall
bs=1024k
rw=rw
#write_bw_log=sdx-1024k-rw-seq.results
#write_iops_log=sdx-1024k-rw-seq.results
[randrw-1024k-seq]
stonewall
bs=1024k
rw=randrw
#write_bw_log=sdx-1024k-randrw-seq.results
#write_iops_log=sdx-1024k-randrw-seq.results
[write-4096k-seq]
stonewall
bs=4096k
rw=write
#write_bw_log=sdx-4096k-write-seq.results
#write_iops_log=sdx-4096k-write-seq.results
[randwrite-4096k-seq]
stonewall
bs=4096k
rw=randwrite
#write_bw_log=sdx-4096k-randwrite-seq.results
#write_iops_log=sdx-4096k-randwrite-seq.results
[read-4096k-seq]
stonewall
bs=4096k
rw=read
#write_bw_log=sdx-4096k-read-seq.results
#write_iops_log=sdx-4096k-read-seq.results
[randread-4096k-seq]
stonewall
bs=4096k
rw=randread
#write_bw_log=sdx-4096k-randread-seq.results
#write_iops_log=sdx-4096k-randread-seq.results
[rw-4096k-seq]
stonewall
bs=4096k
rw=rw
#write_bw_log=sdx-4096k-rw-seq.results
#write_iops_log=sdx-4096k-rw-seq.results
[randrw-4096k-seq]
stonewall
bs=4096k
rw=randrw
#write_bw_log=sdx-4096k-randrw-seq.results
#write_iops_log=sdx-4096k-randrw-seq.results
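For what it's worth, assuming the jobfile above is saved as
disk-bench.fio (the filename is an assumption, not from the thread) with
filename= pointed at the device or image under test, it would be invoked
along these lines:

```shell
# Run every stonewalled section in order; each job starts only after the
# previous one finishes, so the per-blocksize results don't overlap.
# disk-bench.fio is a placeholder name for the jobfile above.
fio disk-bench.fio --output=disk-bench.results
```

The stonewall directive in each section is what serializes the jobs;
without it, fio would start all sections concurrently.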
-----Original Message-----
From: Derrick Lin [mailto:klin938@gmail.com]
Sent: Tuesday, June 9, 2020 4:12 AM
To: Mark Nelson
Cc: ceph-users(a)ceph.io
Subject: [ceph-users] Re: poor cephFS performance on Nautilus 14.2.9
deployed by ceph_ansible
Thanks Mark & Marc
We will do more testing, including the kernel client, as well as testing
the block storage performance first.
We just did some direct raw performance tests on a single spinning disk
(formatted as ext4) and it could deliver 200-300MB/s throughput in
various write and mixed tests. But the FUSE client could only give
~50MB/s.
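A raw sequential-write check of the kind described can be sketched with
dd; the target path below is a placeholder, and on the real ext4 disk
you would add oflag=direct so the page cache doesn't inflate the figure:

```shell
# Write 64 MiB in 4 MiB blocks to a scratch file; dd reports throughput
# on completion. TARGET is a placeholder path -- point it at the disk
# under test, and append oflag=direct on a real disk to bypass the page
# cache.
TARGET=./dd.test
dd if=/dev/zero of="$TARGET" bs=4M count=16
rm -f "$TARGET"
```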
Cheers,
D
On Thu, Jun 4, 2020 at 1:27 PM Mark Nelson <mnelson(a)redhat.com> wrote:
Try using the kernel client instead of the FUSE client. The FUSE client
is known to be slow for a variety of reasons and I suspect you may see
faster performance with the kernel client.
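For reference, the two mount paths being compared look roughly like
this; the monitor address, credentials, and mount point are
placeholders, not values from this thread:

```shell
# Kernel client: mount CephFS through the in-kernel driver.
# 192.168.0.1:6789 stands in for a monitor address.
sudo mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs \
    -o name=admin,secretfile=/etc/ceph/admin.secret

# FUSE client: mount the same tree through ceph-fuse instead.
sudo ceph-fuse -n client.admin /mnt/cephfs
```

Both clients read cluster details from /etc/ceph/ceph.conf by default,
which is why ceph-fuse needs no monitor address on the command line.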
Thanks,
Mark
On 6/2/20 8:00 PM, Derrick Lin wrote:
> Hi guys,
>
> We just deployed a CEPH 14.2.9 cluster with the following hardware:
>
> MDSS x 1
> Xeon Gold 5122 3.6Ghz
> 192GB
> Mellanox ConnectX-4 Lx 25GbE
>
>
> MON x 3
> Xeon Bronze 3103 1.7Ghz
> 48GB
> Mellanox ConnectX-4 Lx 25GbE
> 6 x 600GB 10K SAS
>
> OSD x 5
> Xeon Silver 4110 2.1Ghz x 2
> 192GB
> Mellanox ConnectX-4 Lx 25GbE
> 16 x 10TB 7.2K NLSAS (block)
> 2 x 2TB Intel P4600 NVMe (block.db)
>
> Network is all Mellanox SN2410/SN2700 configured at 25GbE for both
> front and back network.
>
> Just for POC at this stage, the cluster was deployed by ceph_ansible
> without much customization and the initial
test on its cephFS FUSE
> mount performance seems to be very low. We did some test with iozone
the
result as follow:
]# /opt/iozone/bin/iozone -i 0 -i 1-r 128k -s 5G -t 20
> Iozone: Performance Test of File I/O
>         Version $Revision: 3.465 $
>         Compiled for 64 bit mode.
>         Build: linux-AMD64
> Contributors: William Norcott, Don Capps, Isom Crawford, Kirby Collins,
>         Al Slater, Scott Rhine, Mike Wisner, Ken Goss,
>         Steve Landherr, Brad Smith, Mark Kelly, Dr. Alain CYR,
>         Randy Dunlap, Mark Montague, Dan Million, Gavin Brebner,
>         Jean-Marc Zucconi, Jeff Blomberg, Benny Halevy, Dave Boone,
>         Erik Habbinga, Kris Strecker, Walter Wong, Joshua Root,
>         Fabrice Bacchella, Zhenghua Xue, Qin Li, Darren Sawyer,
>         Vangel Bojaxhi, Ben England, Vikentsi Lapa,
>         Alexey Skidanov.
>
> Run began: Tue Jun 2 16:40:53 2020
>
> File size set to 5242880 kB
> Command line used: /opt/iozone/bin/iozone -i 0 -i 1 -r 128k -s 5G -t 20
> Output is in kBytes/sec
> Time Resolution = 0.000001 seconds.
> Processor cache size set to 1024 kBytes.
> Processor cache line size set to 32 bytes.
> File stride size set to 17 * record size.
> Throughput test with 20 processes
> Each process writes a 5242880 kByte file in 4 kByte records
>
> Children see throughput for 20 initial writers  =    35001.12 kB/sec
> Parent sees throughput for 20 initial writers   =    34967.65 kB/sec
> Min throughput per process                      =     1748.22 kB/sec
> Max throughput per process                      =     1751.62 kB/sec
> Avg throughput per process                      =     1750.06 kB/sec
> Min xfer                                        =  5232724.00 kB
>
> Children see throughput for 20 rewriters        =    35704.79 kB/sec
> Parent sees throughput for 20 rewriters         =    35704.30 kB/sec
> Min throughput per process                      =     1783.44 kB/sec
> Max throughput per process                      =     1786.29 kB/sec
> Avg throughput per process                      =     1785.24 kB/sec
> Min xfer                                        =  5234532.00 kB
>
> Children see throughput for 20 readers          = 49368539.50 kB/sec
> Parent sees throughput for 20 readers           = 49317231.38 kB/sec
> Min throughput per process                      =  2414424.00 kB/sec
> Max throughput per process                      =  2599996.25 kB/sec
> Avg throughput per process                      =  2468426.98 kB/sec
> Min xfer                                        =  4868708.00 kB
>
> Children see throughput for 20 re-readers       = 48675891.50 kB/sec
> Parent sees throughput for 20 re-readers        = 48617335.67 kB/sec
> Min throughput per process                      =  2316395.25 kB/sec
> Max throughput per process                      =  2703868.75 kB/sec
> Avg throughput per process                      =  2433794.58 kB/sec
> Min xfer                                        =  4491704.00 kB
> We also did some dd tests; the write speed in a single test on our
> standard server is ~50MB/s, but on a server with very large memory the
> speed is roughly double, ~80-90MB/s.
> We have zero experience with Ceph and, as said, we haven't done more
> tuning at this stage. But is this sort of performance way too low for
> this hardware spec?
>
> Any hints will be appreciated.
>
> Cheers
> D
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an
email to
ceph-users-leave(a)ceph.io
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an
email to ceph-users-leave(a)ceph.io
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an
email to ceph-users-leave(a)ceph.io
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io