Hi all,
This email describes the design of:
1) ReplicaDaemon:
   The daemon runs on a host with DCPMM and an RNIC (RDMA NIC). The
   design covers what information it reports to Ceph/Monitor.
2) ReplicaMonitor:
   ReplicaMonitor, a new PaxosService in Ceph/Monitor, manages the
   ReplicaDaemons' info and handles librbd's requests by selecting
   the appropriate ReplicaDaemons' info and returning it to librbd.
This email doesn't cover:
   How librbd communicates with ReplicaDaemon and completes the
   replication after it gets the ReplicaDaemons' info.
RFC PR: [WIP] aggregate client state and route info
https://github.com/ceph/ceph/pull/37931
Detail:
+-----------------------------------+ +-----------------------------------------------+
|+---------------------------------+| |                         +--------------------+|
|| ReplicaDaemonInfo:              || |                         |PaxosServiceMessage ||
||                                 || |+---------------------------------------------+|
|| daemon_id;                      || ||MReplicaDaemonBlink(MSG_REPLICADAEMON_BLINK):||
|| rnic_bind_port;                 || ||                                             ||
|| rnic_addr;                      || ||ReplicaDaemonInfo;                           ||
|| free_size;                      || |+---------------------------------------------+|
|+---------------------------------+| |                         +--------------------+|
|+---------------------------------+| |                         |PaxosServiceMessage ||
|| ReqReplicaDaemonInfo:           || |+---------------------------------------------+|
||                                 || ||MMonGetReplicaDaemonMap                      ||
|| replicas;                       || ||  (CEPH_MSG_MON_GET_REPLICADAEMONMAP):       ||
|| replica_size;                   || ||                                             ||
|+---------------------------------+| ||ReqReplicaDaemonInfo;                        ||
|+---------------------------------+| |+---------------------------------------------+|
|| ReplicaDaemonMap:               || |                                      +-------+|
||                                 || |                                      |Message||
|| std::vector<ReplicaDaemonInfo>; || |+---------------------------------------------+|
|+---------------------------------+| ||MReplicaDaemonMap(CEPH_MSG_REPLICADAEMON_MAP)||
| MetaData(need encode/decode)      | ||                                             ||
|                                   | ||                                             ||
|                                   | ||ReplicaDaemonMap;                            ||
|                                   | |+---------------------------------------------+|
|                                   | |                                               |
|                                   | | Three messages defined for the MetaData       |
+-----------------------------------+ +-----------------------------------------------+
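
To make the "need encode/decode" note concrete, here is a minimal standalone
sketch of the MetaData with a round-trip encoding. The field types, the
little-endian layout and the helper names (Buffer, put_int, get_int,
decode_info, decode_map) are assumptions for illustration only; the diagram
names the fields but not their types. In Ceph itself this would go through the
tree's own encode/decode helpers on bufferlist rather than a hand-rolled byte
vector.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Field types are assumed; the email only lists the field names.
struct ReplicaDaemonInfo {
  uint64_t daemon_id = 0;
  uint32_t rnic_bind_port = 0;
  std::string rnic_addr;   // textual address of the RNIC (format assumed)
  uint64_t free_size = 0;  // free DCPMM space (unit assumed: bytes)
};

struct ReplicaDaemonMap {
  std::vector<ReplicaDaemonInfo> daemons;
};

using Buffer = std::vector<uint8_t>;  // stand-in for a real message buffer

// Append a fixed-width unsigned integer in little-endian byte order.
template <typename T>
void put_int(Buffer& out, T v) {
  for (size_t i = 0; i < sizeof(T); ++i)
    out.push_back(static_cast<uint8_t>(v >> (8 * i)));
}

// Read a fixed-width unsigned integer written by put_int and advance pos.
template <typename T>
T get_int(const Buffer& in, size_t& pos) {
  if (pos + sizeof(T) > in.size()) throw std::out_of_range("short buffer");
  T v = 0;
  for (size_t i = 0; i < sizeof(T); ++i)
    v |= static_cast<T>(in[pos + i]) << (8 * i);
  pos += sizeof(T);
  return v;
}

void encode(const ReplicaDaemonInfo& info, Buffer& out) {
  put_int<uint64_t>(out, info.daemon_id);
  put_int<uint32_t>(out, info.rnic_bind_port);
  // Strings are length-prefixed so the decoder knows where they end.
  put_int<uint32_t>(out, static_cast<uint32_t>(info.rnic_addr.size()));
  out.insert(out.end(), info.rnic_addr.begin(), info.rnic_addr.end());
  put_int<uint64_t>(out, info.free_size);
}

ReplicaDaemonInfo decode_info(const Buffer& in, size_t& pos) {
  ReplicaDaemonInfo info;
  info.daemon_id = get_int<uint64_t>(in, pos);
  info.rnic_bind_port = get_int<uint32_t>(in, pos);
  uint32_t len = get_int<uint32_t>(in, pos);
  if (pos + len > in.size()) throw std::out_of_range("short buffer");
  info.rnic_addr.assign(in.begin() + pos, in.begin() + pos + len);
  pos += len;
  info.free_size = get_int<uint64_t>(in, pos);
  return info;
}

// The map is encoded as an entry count followed by each entry.
void encode(const ReplicaDaemonMap& map, Buffer& out) {
  put_int<uint32_t>(out, static_cast<uint32_t>(map.daemons.size()));
  for (const auto& d : map.daemons) encode(d, out);
}

ReplicaDaemonMap decode_map(const Buffer& in) {
  size_t pos = 0;
  ReplicaDaemonMap map;
  uint32_t n = get_int<uint32_t>(in, pos);
  for (uint32_t i = 0; i < n; ++i) map.daemons.push_back(decode_info(in, pos));
  return map;
}
```

Encoding a ReplicaDaemonMap into a Buffer and passing that Buffer to decode_map
should give back the same entries, which is the property each of the three
messages relies on when it carries its MetaData payload.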
      +--------+                                          +------------+
      |Dispatch|                                          |PaxosService|
+---------------------+  Update ReplicaDaemonInfo  +---------------------------+
| ReplicaDaemon:      |          through           | ReplicaMonitor:           |
|                     |    MReplicaDaemonBlink     |                           |
| ReplicaDaemonInfo;  |--------------------------->| ReplicaDaemonMap;         |
|                     |                            |                           |
| ms_dispatch;        |                            | //Need implement some APIs|
+---------------------+                            +------^-------------|------+
                                 Request ReplicaDaemonMap |             | Feedback ReplicaDaemonMap
                                                  through |             | through
                                  MMonGetReplicaDaemonMap |             | MReplicaDaemonMap
                                                   +------|-------------v------+
                                                   | librbd                    |
                                                   +---------------------------+
ReplicaDaemon reports its ReplicaDaemonInfo to ReplicaMonitor with the
MReplicaDaemonBlink message.
ReplicaMonitor stores all the ReplicaDaemonInfo in ReplicaDaemonMap after going
through Paxos.
The client (librbd) sends MMonGetReplicaDaemonMap to ReplicaMonitor. ReplicaMonitor
chooses the appropriate ReplicaDaemons, packs their info into a new ReplicaDaemonMap,
and sends it back to the client in an MReplicaDaemonMap message.
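
As a rough illustration of the flow above, here is a standalone sketch of the
two ReplicaMonitor-side steps: refreshing the map when a blink arrives, and
picking daemons for a client request. Everything beyond the struct and field
names is an assumption: the field types, the reading of "replicas" as a count
and "replica_size" as the space needed on each daemon, the helper names
update_daemon and select_daemons, and the selection policy itself (enough free
space, most free space first). It also leaves out Paxos and the message
plumbing entirely.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Field types are assumed; the email only lists the field names.
struct ReplicaDaemonInfo {
  uint64_t daemon_id = 0;
  uint32_t rnic_bind_port = 0;
  std::string rnic_addr;
  uint64_t free_size = 0;
};

struct ReqReplicaDaemonInfo {
  uint32_t replicas = 0;      // assumed: number of replicas wanted
  uint64_t replica_size = 0;  // assumed: space needed on each daemon
};

struct ReplicaDaemonMap {
  std::vector<ReplicaDaemonInfo> daemons;
};

// Blink handling: insert a new daemon or refresh an existing entry by id.
void update_daemon(ReplicaDaemonMap& map, const ReplicaDaemonInfo& info) {
  auto it = std::find_if(map.daemons.begin(), map.daemons.end(),
                         [&](const ReplicaDaemonInfo& d) {
                           return d.daemon_id == info.daemon_id;
                         });
  if (it != map.daemons.end())
    *it = info;
  else
    map.daemons.push_back(info);
}

// Hypothetical policy: keep daemons with enough free space, prefer the ones
// with the most free space, and return at most req.replicas of them.
// The result may hold fewer entries than requested; the caller must check.
ReplicaDaemonMap select_daemons(const ReplicaDaemonMap& all,
                                const ReqReplicaDaemonInfo& req) {
  ReplicaDaemonMap chosen;
  for (const auto& d : all.daemons)
    if (d.free_size >= req.replica_size) chosen.daemons.push_back(d);
  std::stable_sort(chosen.daemons.begin(), chosen.daemons.end(),
                   [](const ReplicaDaemonInfo& a, const ReplicaDaemonInfo& b) {
                     return a.free_size > b.free_size;
                   });
  if (chosen.daemons.size() > req.replicas) chosen.daemons.resize(req.replicas);
  return chosen;
}
```

The returned ReplicaDaemonMap is what would be packed into MReplicaDaemonMap.
Returning a fresh, smaller map instead of the full one keeps the client from
seeing daemons it cannot use.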
B.R.
Changcheng