[ceph-users] Re: CephFS thrashing through the page cache

15 Mar 2023

Hi Ashu,

Are you talking about the kernel client? I can't find "stripe size" anywhere
in its mount documentation. Could you post exactly what you did, i.e. the
mount/fstab line and any config settings?

Thanks!
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Ashu Pachauri <ashu210890(a)gmail.com>
Sent: 14 March 2023 19:23:42
To: ceph-users(a)ceph.io
Subject: [ceph-users] Re: CephFS thrashing through the page cache

Got the answer to my own question; posting it here in case someone else
encounters the same problem. The issue is that the default stripe size in a
cephfs mount is 4 MB. If you are doing small reads (like the 4k reads in the
test I posted) inside a file, you'll end up pulling at least 4 MB to the
client (and then discarding most of the pulled data), even if you set
readahead to zero. So, the solution for us was to set a smaller stripe size,
which aligns better with our workloads.
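
The message does not say how the stripe size was lowered. One documented mechanism is the CephFS file layout, exposed as virtual extended attributes; the sketch below assumes that route and is not confirmed by the thread. The directory `/mnt/cephfs/dbdata`, the file `test_1` and the 64 KiB value are hypothetical. As far as the Ceph file layout documentation goes, the stripe unit has to be a multiple of 64 KiB and the object size a multiple of the stripe unit. The script only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Dry-run sketch: prints each command, and executes it only when APPLY=1.
# DIR and STRIPE_UNIT are hypothetical values, not taken from the thread.
DIR="${DIR:-/mnt/cephfs/dbdata}"
STRIPE_UNIT="${STRIPE_UNIT:-65536}"   # 64 KiB; the 4 MiB default object size is a multiple of it

run() {
  echo "+ $*"
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  fi
}

# Show the layout of an existing file (hypothetical file name).
run getfattr -n ceph.file.layout "$DIR/test_1"

# Files created under DIR afterwards inherit this stripe unit;
# files that already exist keep the layout they were created with.
run setfattr -n ceph.dir.layout.stripe_unit -v "$STRIPE_UNIT" "$DIR"
```

Because a directory layout only affects files created afterwards, existing data would have to be rewritten (for example copied into the directory) to pick up the new layout.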

Thanks and Regards,
Ashu Pachauri

On Fri, Mar 10, 2023 at 9:41 PM Ashu Pachauri <ashu210890(a)gmail.com> wrote:

...
 Also, I am able to reproduce the network read amplification when I try to
 do very small reads from larger files, e.g.

 for i in $(seq 1 10000); do
   dd if=test_${i} of=/dev/null bs=5k count=10
 done

 This loop generates 3.3 GB of network traffic while actually reading only
 approx. 500 MB of data.
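
The figures above can be checked with a little arithmetic. This is a sketch that assumes the reported 3.3 GB is a decimal figure (3.3e9 bytes) and relies on `dd` treating `bs=5k` as 5120 bytes; the variable names are illustrative only.

```shell
#!/bin/sh
# Bytes requested by the dd loop: 10000 files x 10 blocks x 5 KiB.
files=10000
count=10
bs=5120
requested_bytes=$((files * count * bs))

# Reported network traffic, assumed to be decimal gigabytes.
observed_bytes=3300000000

# Ratio of transferred bytes to requested bytes, one decimal place.
amplification=$(LC_ALL=C awk -v o="$observed_bytes" -v r="$requested_bytes" \
  'BEGIN { printf "%.1f", o / r }')

# Average traffic per file in kB, versus 51.2 kB requested per file.
per_file_kb=$((observed_bytes / files / 1000))

echo "requested:     $requested_bytes bytes"
echo "amplification: ${amplification}x"
echo "per file:      ${per_file_kb} kB transferred"
```

Under these assumptions the loop requests 512 MB and the traffic is roughly 6.4 times that, or about 330 kB per file.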

 Thanks and Regards,
 Ashu Pachauri

 On Fri, Mar 10, 2023 at 9:22 PM Ashu Pachauri <ashu210890(a)gmail.com> wrote:

 We have an internal use case where we back the storage of a proprietary
 database by a shared file system. We noticed something very odd when
 testing some workload with a local block device backed file system vs
 cephfs. We noticed that the amount of network IO done by cephfs is almost
 double compared to the IO done in case of a local file system backed by an
 attached block device.

 We also noticed that CephFS thrashes through the page cache very quickly
 compared to the amount of data being read and think that the two issues
 might be related. So, I wrote a simple test.

 1. I wrote 10k files 400KB each using dd (approx 4 GB data).
 2. I dropped the page cache completely.
 3. I then read these files serially, again using dd. The page cache usage
 shot up to 39 GB for reading such a small amount of data.
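
A quick sanity check on these numbers, as a sketch only: it compares the data actually written with what the cache would hold if every file read pulled in one full 4 MiB unit, the default size named in the reply at the top of the thread. Treating the reported "39 GB" as GiB is an assumption.

```shell
#!/bin/sh
# Data written by the test: 10000 files x 100 blocks x 4 KiB.
files=10000
file_bytes=$((100 * 4096))
written_bytes=$((files * file_bytes))

# Hypothetical worst case: one full 4 MiB unit cached per file.
unit_bytes=$((4 * 1024 * 1024))
full_unit_bytes=$((files * unit_bytes))

# Convert the worst case to GiB, one decimal place.
full_unit_gib=$(LC_ALL=C awk -v b="$full_unit_bytes" \
  'BEGIN { printf "%.1f", b / (1024 * 1024 * 1024) }')

echo "written:            $written_bytes bytes"
echo "one unit per file:  $full_unit_gib GiB"
```

The worst case comes out at about 39.1 GiB, which is consistent with the observed page cache growth but does not by itself prove the cause.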

 Following is the code used to repro this in bash:

 for i in $(seq 1 10000); do
   dd if=/dev/zero of=test_${i} bs=4k count=100
 done

 sync; echo 1 > /proc/sys/vm/drop_caches

 for i in $(seq 1 10000); do
   dd if=test_${i} of=/dev/null bs=4k count=100
 done
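
The thread does not say how page cache usage was measured. One possible way, sketched here as an assumption, is to compare the `Cached:` field of `/proc/meminfo` before and after the read loop. The helper names `cached_kib` and `cache_growth_kib` are made up for this example, and other activity on the machine will also move the number.

```shell
#!/bin/sh
# Print the "Cached:" value in KiB from a meminfo-style file
# (defaults to /proc/meminfo).
cached_kib() {
  awk '/^Cached:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Run a command and print how much "Cached:" grew (in KiB) while it ran.
cache_growth_kib() {
  before=$(cached_kib)
  "$@" >/dev/null 2>&1
  after=$(cached_kib)
  echo $((after - before))
}
```

For example, after dropping caches, `cache_growth_kib sh -c 'for i in $(seq 1 10000); do dd if=test_${i} of=/dev/null bs=4k count=100; done'` would report the growth for the read loop above.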

 The ceph version being used is:
 ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)

 The ceph configs being overridden:
 WHO     MASK  LEVEL     OPTION                                 VALUE        RO
 mon           advanced  auth_allow_insecure_global_id_reclaim  false
 mgr           advanced  mgr/balancer/mode                      upmap
 mgr           advanced  mgr/dashboard/server_addr              127.0.0.1    *
 mgr           advanced  mgr/dashboard/server_port              8443         *
 mgr           advanced  mgr/dashboard/ssl                      false        *
 mgr           advanced  mgr/prometheus/server_addr             0.0.0.0      *
 mgr           advanced  mgr/prometheus/server_port             9283         *
 osd           advanced  bluestore_compression_algorithm        lz4
 osd           advanced  bluestore_compression_mode             aggressive
 osd           advanced  bluestore_throttle_bytes               536870912
 osd           advanced  osd_max_backfills                      3
 osd           advanced  osd_op_num_threads_per_shard_ssd       8            *
 osd           advanced  osd_scrub_auto_repair                  true
 mds           advanced  client_oc                              false
 mds           advanced  client_readahead_max_bytes             4096
 mds           advanced  client_readahead_max_periods           1
 mds           advanced  client_readahead_min                   0
 mds           basic     mds_cache_memory_limit                 21474836480
 client        advanced  client_oc                              false
 client        advanced  client_readahead_max_bytes             4096
 client        advanced  client_readahead_max_periods           1
 client        advanced  client_readahead_min                   0
 client        advanced  fuse_disable_pagecache                 false

 The cephfs mount options (note that readahead was disabled for this test):
 /mnt/cephfs type ceph (rw,relatime,name=cephfs,secret=<hidden>,acl,rasize=0)

 Any help or pointers are appreciated; this is a major performance issue
 for us.

 Thanks and Regards,
 Ashu Pachauri

 _______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io
