multi-node NFS Ganesha + libcephfs caching - ceph-users - lists.ceph.io


Maged Mokhtar

23 Mar 2020

1:49 p.m.

Hello all,

For multi-node NFS Ganesha over CephFS, is it OK to leave libcephfs write caching on, or should it be configured off for failover?

Cheers,
/Maged


Jeff Layton

23 Mar

6:50 p.m.

On Mon, 2020-03-23 at 15:49 +0200, Maged Mokhtar wrote:

> Hello all,
>
> For multi-node NFS Ganesha over CephFS, is it OK to leave libcephfs write caching on, or should it be configured off for failover?

You can do libcephfs write caching, as the caps would need to be recalled for any competing access. What you really want to avoid is any sort of caching at the ganesha daemon layer.

-- 
Jeff Layton <jlayton(a)redhat.com>
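A toy model of Jeff's point, assuming a drastically simplified cap scheme: a client may buffer writes only while it holds the cap for a file, and a competing access makes the MDS recall that cap, which forces the holder to flush first. This is not Ceph's cap protocol. The classes, the single exclusive per-path cap, and the `recall` method are hypothetical, and any access (read or write) is treated as needing the cap.

```python
class SharedStore:
    """Stands in for the durable data in the Ceph cluster."""

    def __init__(self):
        self.data = {}


class ToyMDS:
    """Hypothetical MDS handing out one exclusive cap per path."""

    def __init__(self):
        self.holder = {}  # path -> client currently holding the cap

    def acquire(self, client, path):
        current = self.holder.get(path)
        if current is not None and current is not client:
            # Competing access: recall the cap; the holder flushes first.
            current.recall(path)
        self.holder[path] = client


class ToyClient:
    """Hypothetical libcephfs-like client that buffers writes under a cap."""

    def __init__(self, mds, store):
        self.mds = mds
        self.store = store
        self.dirty = {}  # path -> buffered data not yet in the store

    def write(self, path, value):
        self.mds.acquire(self, path)
        self.dirty[path] = value  # cached only while we hold the cap

    def read(self, path):
        self.mds.acquire(self, path)
        return self.dirty.get(path, self.store.data.get(path))

    def recall(self, path):
        # Flush dirty data before giving the cap back.
        if path in self.dirty:
            self.store.data[path] = self.dirty.pop(path)


store = SharedStore()
mds = ToyMDS()
gw_a, gw_b = ToyClient(mds, store), ToyClient(mds, store)
gw_a.write("/f", "v1")
print(store.data)       # prints {} because v1 is only in gw_a's cache
print(gw_b.read("/f"))  # prints v1 because gw_b's access forced gw_a to flush
```

The point of the ordering is that buffered data never becomes invisible to another client: the recall always happens before the competing access is served.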


Maged Mokhtar

8:31 p.m.

On 23/03/2020 20:50, Jeff Layton wrote:


> You can do libcephfs write caching, as the caps would need to be recalled for any competing access. What you really want to avoid is any sort of caching at the ganesha daemon layer.

Hi Jeff,

Thanks for your reply. I meant the caching done by libcephfs inside the Ganesha Ceph FSAL plugin; from your reply I am not sure whether this is what you refer to as the ganesha daemon layer (or does the latter mean the internal mdcache in Ganesha?). I would really appreciate it if you could clarify this point.

I really doubt that it is safe to leave write caching on in the plugin and still have safe failover, yet I see comments in the conf file such as:

# The libcephfs client will aggressively cache information while it
# can, so there is little benefit to ganesha actively caching the same
# objects.

Or is it up to the NFS client to issue cache syncs and re-submit writes if it detects failover?

Appreciate your help.
/Maged


Daniel Gryniewicz

24 Mar

11:35 a.m.

On 3/23/20 4:31 PM, Maged Mokhtar wrote:


> Hi Jeff,
>
> Thanks for your reply. I meant the caching done by libcephfs inside the Ganesha Ceph FSAL plugin; from your reply I am not sure whether this is what you refer to as the ganesha daemon layer (or does the latter mean the internal mdcache in Ganesha?). I would really appreciate it if you could clarify this point.

Caching in libcephfs is fine; it's caching above the FSAL layer that you should avoid.

> I really doubt that it is safe to leave write caching on in the plugin and still have safe failover, yet I see comments in the conf file such as:
>
> # The libcephfs client will aggressively cache information while it
> # can, so there is little benefit to ganesha actively caching the same
> # objects.
>
> Or is it up to the NFS client to issue cache syncs and re-submit writes if it detects failover?

Correct. During failover, NFS goes into its grace period, which blocks new state and allows the NFS clients to re-acquire their existing state (opens, locks, delegations, etc.). This includes re-sending any uncommitted writes (a COMMIT causes the data to be saved to the cluster, not just to the libcephfs cache). Once all of this is done, normal operation proceeds. It should be safe, even with caching in libcephfs.

Daniel
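The grace-period behavior described above can be sketched as a toy lock server. The status names are borrowed from the NFSv4 error codes (`NFS4ERR_GRACE`, `NFS4ERR_NO_GRACE`, `NFS4ERR_DENIED`); everything else, including the `ToyLockServer` class, its `failover` method and its one-lock-per-path model, is hypothetical. A real server also has to check that a reclaiming client actually held that state before the restart, which this sketch skips.

```python
from enum import Enum


class Status(Enum):
    # Names borrowed from NFSv4 error codes.
    OK = "NFS4_OK"
    GRACE = "NFS4ERR_GRACE"        # new state refused during grace
    NO_GRACE = "NFS4ERR_NO_GRACE"  # reclaim attempted after grace ended
    DENIED = "NFS4ERR_DENIED"      # conflicting lock held by another client


class ToyLockServer:
    """Hypothetical server holding one lock per path in memory."""

    def __init__(self):
        self.in_grace = False
        self.locks = {}  # path -> client id

    def failover(self):
        # Models a takeover: in-memory state is gone and grace begins.
        self.locks.clear()
        self.in_grace = True

    def end_grace(self):
        self.in_grace = False

    def lock(self, client, path, reclaim=False):
        if self.in_grace and not reclaim:
            return Status.GRACE
        if reclaim and not self.in_grace:
            return Status.NO_GRACE
        if self.locks.get(path, client) != client:
            return Status.DENIED
        self.locks[path] = client
        return Status.OK


srv = ToyLockServer()
srv.lock("client-1", "/f")
srv.failover()
print(srv.lock("client-2", "/f"))                # Status.GRACE
print(srv.lock("client-1", "/f", reclaim=True))  # Status.OK
srv.end_grace()
print(srv.lock("client-2", "/f"))                # Status.DENIED
```

Blocking new state during grace is what lets client-1 win back its lock even though client-2 asked first after the takeover.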


Maged Mokhtar

12:19 p.m.

On 24/03/2020 13:35, Daniel Gryniewicz wrote:


> Correct. During failover, NFS goes into its grace period, which blocks new state and allows the NFS clients to re-acquire their existing state (opens, locks, delegations, etc.). This includes re-sending any uncommitted writes (a COMMIT causes the data to be saved to the cluster, not just to the libcephfs cache). Once all of this is done, normal operation proceeds. It should be safe, even with caching in libcephfs.

Thanks Daniel for the clarification. So it is the responsibility of the client to re-send writes. Two questions so I can understand this better:

- If this is handled at the client, why is it OK on the gateway to cache at the FSAL layer but not above?
- At what level/layer on the client does this get handled: the NFS client layer (which will detect failover), the filesystem layer, the page cache...?

Thanks for your patience :)
/Maged


Daniel Gryniewicz

1:14 p.m.

On 3/24/20 8:19 AM, Maged Mokhtar wrote:


> Thanks Daniel for the clarification. So it is the responsibility of the client to re-send writes. Two questions so I can understand this better:
>
> - If this is handled at the client, why is it OK on the gateway to cache at the FSAL layer but not above?

In principle, it's fine above. However, that requires a level of coordination that isn't there right now. The libcephfs cache is integrated with the caps system, so it knows when it can cache and when it needs to flush. There is work to do to extend that to the higher layers.
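A minimal illustration of why an uncoordinated cache above the FSAL is the risky part. In this hypothetical two-gateway model, `SharedBackend` stands in for the coherent view that libcephfs maintains through caps, while `Gateway(upper_cache=True)` keeps its own attribute cache that nothing ever invalidates. The class names and the size-only attribute are assumptions made for the example.

```python
class SharedBackend:
    """Stands in for the coherent view libcephfs keeps via caps."""

    def __init__(self):
        self.size = {}  # path -> file size


class Gateway:
    """Hypothetical NFS gateway, optionally with its own cache above the FSAL."""

    def __init__(self, backend, upper_cache=False):
        self.backend = backend
        self.upper_cache = upper_cache
        self.attr_cache = {}  # never invalidated by other gateways

    def getattr_size(self, path):
        if self.upper_cache and path in self.attr_cache:
            return self.attr_cache[path]  # answered without asking the backend
        size = self.backend.size.get(path, 0)
        if self.upper_cache:
            self.attr_cache[path] = size
        return size

    def write(self, path, size):
        self.backend.size[path] = size


backend = SharedBackend()
gw_a = Gateway(backend, upper_cache=True)
gw_b = Gateway(backend)
gw_c = Gateway(backend)
gw_a.getattr_size("/f")         # gw_a caches size 0
gw_b.write("/f", 4096)          # another gateway extends the file
print(gw_a.getattr_size("/f"))  # 0 (stale)
print(gw_c.getattr_size("/f"))  # 4096
```

Making the upper cache safe would mean wiring it into the same recall path the lower layer already has, which is the missing coordination.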

> - At what level/layer on the client does this get handled: the NFS client layer (which will detect failover), the filesystem layer, the page cache...?

The NFS client layer, interacting with the VFS/page cache. (NFS is the filesystem in this case, so technically it is the filesystem layer.)

Daniel
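The client-side replay relies on the NFS write verifier: an UNSTABLE WRITE and a later COMMIT each return a verifier, and the server changes it when it loses uncommitted data (e.g. on restart), so a mismatch tells the client to re-send the data it still holds in its page cache. The sketch below models that loop under simplifying assumptions: the verifier is a string instead of an 8-byte opaque value, a `restart` stands in for failover, the server's `unstable` dict stands in for data buffered in memory, and all class and method names are hypothetical.

```python
class ToyServer:
    """Hypothetical NFS server with a volatile write cache."""

    def __init__(self):
        self.durable = {}  # stands in for data stored in the Ceph cluster
        self.boot_count = 0
        self.restart()

    def restart(self):
        # Uncommitted data is lost, and a new verifier announces that.
        self.unstable = {}  # stands in for data buffered in memory
        self.boot_count += 1
        self.verifier = f"boot-{self.boot_count}"

    def write_unstable(self, offset, data):
        self.unstable[offset] = data
        return self.verifier

    def commit(self):
        self.durable.update(self.unstable)
        self.unstable.clear()
        return self.verifier


class ToyClient:
    """Keeps each write (as a page cache would) until a COMMIT covers it."""

    def __init__(self, server):
        self.server = server
        self.pending = {}  # offset -> (data, verifier returned by WRITE)

    def write(self, offset, data):
        verf = self.server.write_unstable(offset, data)
        self.pending[offset] = (data, verf)

    def commit(self):
        while self.pending:
            verf = self.server.commit()
            stale = {o: d for o, (d, v) in self.pending.items() if v != verf}
            if not stale:
                self.pending.clear()  # everything is now durable
                return
            # Verifier mismatch: the server lost these writes, so re-send them.
            for offset, data in stale.items():
                self.write(offset, data)


srv = ToyServer()
cli = ToyClient(srv)
cli.write(0, b"aaaa")
srv.restart()  # failover before COMMIT: the first write is gone
cli.write(4, b"bbbb")
cli.commit()
print(sorted(srv.durable.items()))  # [(0, b'aaaa'), (4, b'bbbb')]
```

The verifier is tracked per write rather than once per client; otherwise a write accepted before the restart would look committed as soon as a later write matched the new verifier.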


Maged Mokhtar

2:48 p.m.

On 24/03/2020 15:14, Daniel Gryniewicz wrote:


> The NFS client layer, interacting with the VFS/page cache. (NFS is the filesystem in this case, so technically it is the filesystem layer.)

Thank you so much for the clarification.
Maged


Maged Mokhtar

5:16 p.m.

On 24/03/2020 16:48, Maged Mokhtar wrote:


> Thank you so much for the clarification.
> Maged
> _______________________________________________
> ceph-users mailing list -- ceph-users(a)ceph.io
> To unsubscribe send an email to ceph-users-leave(a)ceph.io

One more thing: the NFS clients of non-Linux systems, specifically VMware, may not behave the same way, correct? In the iSCSI domain, VMware does not have any kind of buffer/page cache, probably to support failover among ESXi nodes. Should I test this, or am I on the wrong track?

/Maged


Daniel Gryniewicz

5:38 p.m.

On 3/24/20 1:16 PM, Maged Mokhtar wrote:


> One more thing: the NFS clients of non-Linux systems, specifically VMware, may not behave the same way, correct? In the iSCSI domain, VMware does not have any kind of buffer/page cache, probably to support failover among ESXi nodes. Should I test this, or am I on the wrong track?

This behavior is a requirement of the spec; all compliant NFS implementations behave this way. If you don't have a client-side cache, then you must do only stable writes (each write is synced to the backing store before it is acknowledged). This is slower, but it's safe. If VMware doesn't do this, then they *will* lose data if the server ever crashes, and it will be entirely their fault.

Daniel
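The stable-write rule can be sketched with the three `stable_how` levels defined by the NFS specs (`UNSTABLE`, `DATA_SYNC`, `FILE_SYNC`). The server class below is hypothetical, and it treats `DATA_SYNC` like `FILE_SYNC` (the real distinction concerns whether file metadata is also flushed). A client that keeps no copy of its data cannot re-send after a crash, so only its stable writes survive.

```python
from enum import Enum


class StableHow(Enum):
    # Values mirror the NFS stable_how enum.
    UNSTABLE = 0
    DATA_SYNC = 1
    FILE_SYNC = 2


class ToyServer:
    """Hypothetical server: stable writes reach durable storage before the reply."""

    def __init__(self):
        self.durable = {}   # survives a crash
        self.volatile = {}  # server-side cache, lost on crash

    def write(self, offset, data, stable):
        if stable is StableHow.UNSTABLE:
            self.volatile[offset] = data
        else:
            # Simplification: DATA_SYNC is treated the same as FILE_SYNC.
            self.durable[offset] = data

    def crash(self):
        self.volatile.clear()


# A cacheless client keeps no copy after the reply, so it cannot re-send.
srv = ToyServer()
srv.write(0, b"safe", StableHow.FILE_SYNC)
srv.write(4, b"lost", StableHow.UNSTABLE)
srv.crash()
print(srv.durable)  # {0: b'safe'}
```

The unstable write at offset 4 is gone for good here only because the client kept nothing to replay; a client with a page cache would have re-sent it after noticing the verifier change.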


ceph-users@ceph.io


participants (3)

Daniel Gryniewicz
Jeff Layton
Maged Mokhtar