Hello Martin,
that is much less than the amount of disk space I have seen allocated
when something goes wrong with the cluster.
I have defined at least 10 GB, and there have been situations in the
past when this space was quickly filled by
syslog
user.log
messages
daemon.log
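To see which of these files is actually taking the space, ranking them by size works well. A minimal sketch (a throwaway directory with dummy files stands in for /var/log so it runs anywhere; in practice, point `du` at the real paths, which vary by distribution):

```shell
# Sketch: rank log files by size, largest first, the way you would for
# /var/log/{syslog,user.log,messages,daemon.log}.
logdir=$(mktemp -d)
head -c 8192 /dev/zero > "$logdir/syslog"      # dummy 8 KiB file
head -c 100  /dev/zero > "$logdir/daemon.log"  # dummy 100 B file
du -ab "$logdir"/* | sort -rn                  # apparent size in bytes, largest first
rm -rf "$logdir"
```

Against the real /var/log, `du -ah /var/log | sort -rh | head` gives the same ranking in human-readable units.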
Regards
Thomas
On 23.03.2020 at 09:39, Martin Verges wrote:
Hello Thomas,
by default we allocate 1 GB per host on the management node, and
nothing on the PXE-booted servers.
This value can be changed in the management container config file
(/config/config.yml):
...
logFilesPerServerGB: 1
...
After changing the config, you need to restart the mgmt container.
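So to reserve the 10 GB Thomas mentions, the change would look something like this (a sketch based on the key shown above; the value 10 is illustrative, not a recommendation):

```yaml
# /config/config.yml on the management node -- only this key changes.
# Raises the per-host log quota from the 1 GB default to 10 GB.
logFilesPerServerGB: 10
```

Followed by a restart of the mgmt container, as described above.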
--
Martin Verges
Managing director
Mobile: +49 174 9335695
E-Mail: martin.verges@croit.io
Chat: https://t.me/MartinVerges
croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io
YouTube: https://goo.gl/PGE1Bx
On Mon., 23 March 2020 at 09:30, Thomas Schneider
<74cmonty@gmail.com> wrote:
Hello Martin,
how much disk space do you reserve for logs in the PXE setup?
Regards
Thomas
On 22.03.2020 at 20:50, Martin Verges wrote:
Hello Samuel,
we at croit.io don't use NFS to boot up servers. We copy the OS
directly into RAM (approximately 0.5-1 GB). Think of it like a
container: you start it and throw it away when you no longer need it.
This way we can free up the OS hard disk slots to add more storage per
node and reduce overall costs, as 1 GB of RAM is cheaper than an OS
disk and consumes less power.
If our management node is down, nothing will happen to the cluster. No
impact, no downtime. However, you do need the mgmt node to boot up the
cluster. So after a very rare total power outage, the first system you
bring up would be the mgmt node and then the cluster itself. But
again, if you configure your systems correctly, no manual work is
required to recover from that. For everything else, it is possible
(but definitely not needed) to deploy our mgmt node in active/passive
HA.
We have several hundred installations worldwide in production
environments. Our strong PXE knowledge comes from more than 20 years
of datacenter hosting experience, and it has never failed us in the
last 10+ years.
The main benefits of this approach:
- Immutable OS, freshly booted: every host has exactly the same
version, same libraries, same kernel, same Ceph versions, ...
- OS heavily tested by us: every croit deployment runs exactly the
same image, so we can find errors much faster and hit far fewer of
them.
- Easy updates: updating the OS, Ceph, or anything else is just a node
reboot. No cluster downtime, no service impact, fully automatic
handling by our mgmt software.
- No OS to install: no maintenance costs, no labor required, no
separate OS management.
- Centralized logs/stats: as the OS is booted in memory, all logs and
statistics are collected in a central place for easy access.
- Easy to scale: it doesn't matter whether you boot 3 or 300 nodes,
all boot the exact same image in a few seconds.
.. lots more
Please do not hesitate to contact us directly. We always try to offer
an excellent service and are strongly customer-oriented.
--
Martin Verges
On Sat., 21 March 2020 at 13:53, huxiaoyu@horebdata.cn
<huxiaoyu@horebdata.cn> wrote:
> Hello, Martin,
>
> I notice that croit advocates the use of Ceph clusters without OS
> disks, but with PXE boot.
>
> Do you use an NFS server to serve the root file system for each
> node, such as hosting configuration files, user and password data,
> log files, etc.? My question is: will the NFS server be a single
> point of failure? If the NFS server goes down, or the network
> experiences any outage, Ceph nodes may not be able to write to their
> local file systems, possibly leading to a service outage.
>
> How do you deal with the above potential issues in production? I am
> a bit worried...
>
> best regards,
> samuel
------------------------------
huxiaoyu@horebdata.cn
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io