Hi,
So you need to think about failure domains.
The failure domain will be set to host.
If you put all the DBs on one SSD and all the
WALs on the other SSD, then a failure of either of those SSDs will result in a
failure of all the OSDs behind them. So in this case all 10 OSDs would have
failed.
Splitting it so that each SSD holds the RocksDB and WAL for 5 OSDs means that a
failure of an SSD only impacts 5 OSDs.
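For example, recent ceph-volume releases can lay that split out for you with the batch
subcommand (just a sketch, the device names are made up and it assumes a version that
supports --db-devices):

  # first 5 HDDs with their DB (and WAL) on the first SSD
  ceph-volume lvm batch --bluestore /dev/sd{a,b,c,d,e} --db-devices /dev/sdk
  # the other 5 HDDs on the second SSD
  ceph-volume lvm batch --bluestore /dev/sd{f,g,h,i,j} --db-devices /dev/sdl

With no separate --wal-devices, the WAL simply lives inside each DB LV, which is usually
what you want when you only have two SSDs.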
A failure of an SSD will take down all the OSDs that are behind that SSD.
That is what I wondered, thanks for confirming it.
That's one of the reasons I would always say you
need one node's worth of spare capacity in the cluster to allow automated rebuilds to
happen.
As for your EC 7+5, I would have gone for something like 8+3, as then you have a spare
node active in the cluster and can still provide full protection in the event of a node
failure.
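Something along these lines (only a sketch, the profile name, pool name and PG count are
placeholders):

  # 8+3 profile with host as the failure domain
  ceph osd erasure-code-profile set ec-8-3 k=8 m=3 crush-failure-domain=host
  ceph osd pool create ec83pool 1024 1024 erasure ec-8-3

With k=8 and m=3 each PG only needs 11 of your 12 hosts, so after losing a node the
cluster can still place all 11 chunks of every PG, whereas 7+5 needs all 12 hosts up to
be fully protected.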
Makes sense! On another cluster, I have an EC 7+5 pool for CephFS, but there are 4 servers
per chassis and I still need to be able to access the data if I lose one chassis.
But for that cluster, you are right, 8+3 may be enough redundancy.
Think about software updates that require a reboot of
a node. Any data written during that time will need recovering to bring it back to full
protection, whereas if you have a spare node then that data can be written and not
require a later recovery.
It is mostly a read-only cluster to distribute public datasets over S3 inside our network,
so it is fine for me if write operations are not fully
protected for a couple of days. All write operations are managed by us to update
datasets.
But as mentioned above, 8+3 may be a good compromise.
Best,
Yoann
On 03/09/2019, 10:29, "Yoann Moulin"
<yoann.moulin(a)epfl.ch> wrote:
Hello,
I am deploying a new Nautilus cluster and I would like to know what would be the best
OSD scenario configuration in this case:
10x 6TB Disk OSDs (data)
2x 480GB SSDs previously used for journals, which can be used for WAL and/or DB
Is it better to put all the WALs on one SSD and all the DBs on the other one? Or to put
the WAL and DB of the first 5 OSDs on the first SSD and those of the other 5 on
the second one?
A more general question: what is the impact on an OSD if we lose the WAL? The DB?
Both?
I plan to use EC 7+5 on 12 servers and I am OK if I lose one server temporarily. I
have spare servers and I can easily add another one to this
cluster.
To deploy this cluster, I use ceph-ansible (stable-4.0). I am not sure how to
configure the playbook to use SSDs and disks with LVM.
https://github.com/ceph/ceph-ansible/blob/master/docs/source/osds/scenarios…
Is this good?
osd_objectstore: bluestore
lvm_volumes:
  - data: data-lv1
    data_vg: data-vg1
    db: db-lv1
    db_vg: db-vg1
    wal: wal-lv1
    wal_vg: wal-vg1
  - data: data-lv2
    data_vg: data-vg2
    db: db-lv2
    db_vg: db-vg2
    wal: wal-lv2
    wal_vg: wal-vg2
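If I understand the lvm scenario correctly, each entry should end up being passed to
ceph-volume more or less like this (just my guess, reusing the VG/LV names from the
config above):

  ceph-volume lvm create --bluestore \
      --data data-vg1/data-lv1 \
      --block.db db-vg1/db-lv1 \
      --block.wal wal-vg1/wal-lv1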
Is it possible to let the playbook configure LVM for each disk in a mixed case? It
looks like I must configure LVM before running the playbook,
but I am not sure if I missed something.
Can wal_vg and db_vg be identical (one VG per SSD shared by multiple OSDs)?
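If LVM really has to be prepared beforehand, I would create one VG per SSD and carve out
the DB/WAL LVs by hand, something like this (device names and sizes are only examples):

  # one VG per 480GB SSD, shared by the DB/WAL LVs of 5 OSDs each
  vgcreate ssd1-vg /dev/sdk
  vgcreate ssd2-vg /dev/sdl
  for i in 1 2 3 4 5;  do lvcreate -L 70G -n db-lv$i ssd1-vg; lvcreate -L 2G -n wal-lv$i ssd1-vg; done
  for i in 6 7 8 9 10; do lvcreate -L 70G -n db-lv$i ssd2-vg; lvcreate -L 2G -n wal-lv$i ssd2-vg; done

and then point both db_vg and wal_vg of 5 OSDs at the same VG.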
Thanks for your help.
Best regards,
--
Yoann Moulin
EPFL IC-IT