On May 29, 2020, at 10:53 AM, Dave Hall <kdhall@binghamton.edu> wrote:
I agree with Paul 100%. Going further: there are many more 'knobs to turn'
than just Jumbo Frames, which makes the problem even harder. Changing any
one setting may just move the bottleneck, or possibly introduce
instabilities. In the worst case, one might tune a Linux system so well
that it overruns the switch it's connected to. Then we have to add more
knobs in the switch and see what we can do there, or de-tune Linux to make
it play nice with the switch.
Just to be sure, I will add a disclaimer at the top of my document to emphasize
before/after benchmarking.
-Dave
Dave Hall
Binghamton University
kdhall@binghamton.edu
607-760-2328 (Cell)
607-777-4641 (Office)
On 5/29/2020 6:29 AM, Paul Emmerich wrote:
Please do not apply any optimization without benchmarking *before* and
*after* in a somewhat realistic scenario.
No, iperf is likely not a realistic setup, because it will usually be
limited by available network bandwidth, which is (or should be) rarely
maxed out on your actual Ceph setup.
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at
https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Fri, May 29, 2020 at 2:15 AM Dave Hall <kdhall@binghamton.edu> wrote:
Hello.
A few days ago I offered to share the notes I've compiled on network
tuning. Right now it's a Google Doc:
https://docs.google.com/document/d/1nB5fzIeSgQF0ti_WN-tXhXAlDh8_f8XF9GhU7J1…
I've set it up to allow comments, and I'd be glad for questions and
feedback. If Google Docs is not an acceptable format, I'll try to put it
up somewhere as HTML or a wiki. Disclosure: some sections were copied
verbatim from other sources.
Regarding the current discussion about iperf, the likely bottleneck is
buffering. There is a per-NIC output queue set with 'ip link' and a
per-CPU-core input queue set with 'sysctl'. Both should be set to some
multiple of the frame size, based on calculations involving link speed
and latency (i.e., the bandwidth-delay product). Jumping from 1500 to
9000 could negatively impact performance because one buffer or the other
might be 1500 bytes short of a low multiple of 9000.
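
To illustrate, a rough sketch of the two knobs I mean (the interface name
and sizes are made-up examples, not recommendations; size against your
own link speed and measured latency):

    # Bandwidth-delay product for a hypothetical 10 Gbit/s link with
    # 0.5 ms RTT: 10^10 bit/s * 0.0005 s / 8 = ~625 KB in flight.

    # Per-NIC output queue (in packets):
    ip link set dev eth0 txqueuelen 10000

    # Per-CPU-core input queue (in packets):
    sysctl -w net.core.netdev_max_backlog=10000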
It would be interesting to see the iperf tests repeated with
corresponding buffer sizing. I will perform this experiment as soon as I
complete some day-job tasks.
-Dave
Dave Hall
Binghamton University
kdhall@binghamton.edu
607-760-2328 (Cell)
607-777-4641 (Office)
On 5/27/2020 6:51 AM, EDH - Manuel Rios wrote:
Can anyone share their table with other MTU values?
I'm also interested in the switch CPU load.
KR,
Manuel
-----Original Message-----
From: Marc Roos <M.Roos@f1-outsourcing.eu>
Sent: Wednesday, 27 May 2020 12:01
To: chris.palmer <chris.palmer@pobox.com>; paul.emmerich
<paul.emmerich@croit.io>
Cc: amudhan83 <amudhan83@gmail.com>; anthony.datri
<anthony.datri@gmail.com>; ceph-users <ceph-users@ceph.io>; doustar
<doustar@rayanexon.ir>; kdhall <kdhall@binghamton.edu>; sstkadu
<sstkadu@gmail.com>
Subject: [ceph-users] Re: [External Email] Re: Ceph Nautius not
working after setting MTU 9000
Interesting table. I get this on a production 10 Gbit cluster at a
datacenter (obviously not doing that much).
[@]# iperf3 -c 10.0.0.13 -P 1 -M 9000
Connecting to host 10.0.0.13, port 5201
[ 4] local 10.0.0.14 port 52788 connected to 10.0.0.13 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 1.14 GBytes 9.77 Gbits/sec 0 690 KBytes
[ 4] 1.00-2.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.08 MBytes
[ 4] 2.00-3.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes
[ 4] 3.00-4.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes
[ 4] 4.00-5.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.08 MBytes
[ 4] 5.00-6.00 sec 1.15 GBytes 9.90 Gbits/sec 0 1.21 MBytes
[ 4] 6.00-7.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes
[ 4] 7.00-8.00 sec 1.15 GBytes 9.88 Gbits/sec 0 1.21 MBytes
[ 4] 8.00-9.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes
[ 4] 9.00-10.00 sec 1.15 GBytes 9.89 Gbits/sec 0 1.21 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  11.5 GBytes  9.87 Gbits/sec    0    sender
[  4]   0.00-10.00  sec  11.5 GBytes  9.87 Gbits/sec         receiver
-----Original Message-----
Subject: Re: [ceph-users] Re: [External Email] Re: Ceph Nautius not
working after setting MTU 9000
To elaborate on some aspects that have been mentioned already, and to add
some others:
* Test using iperf3.
* Don't try to use jumbos on networks where you don't have complete
control over every host. This usually includes the main Ceph network.
It's just too much grief. You can consider using it for limited-access
networks (e.g. the Ceph cluster network, hypervisor migration network,
etc.) where you know every switch & host is tuned correctly. (This works
even when those nets share a VLAN trunk with non-jumbo VLANs: just set
the max value on the trunk itself, and individual values on each VLAN.)
* If you are pinging, make sure it doesn't fragment, otherwise you will
get misleading results. Note that the size argument is the ICMP payload,
not the MTU, so for MTU 9000 use: ping -M do -s 8972 x.x.x.x
(8972 = 9000 minus the 20-byte IP header and 8-byte ICMP header).
* Do not assume that 9000 is the best value. It depends on your NICs,
your switch, kernel/device parameters, etc. Try different values (using
iperf3). As an example, the results below were obtained with a small,
cheap MikroTik 10G switch and HPE 10G NICs. They highlight how in this
configuration 9000 is worse than 1500, and that 5139 is optimal yet 5140
is the worst. The same pattern (obviously with different values) was
apparent when multiple tests were run concurrently. Always test your own
network in a controlled manner; a sketch of such a sweep follows the
table below. And of course if you introduce anything different later on,
test again. With enterprise-grade kit this might not be so common, but
always test if you fiddle.
MTU   Gbps (actual data transfer values using iperf3; one particular
configuration only)
9600 8.91 (max value)
9000 8.91
8000 8.91
7000 8.91
6000 8.91
5500 8.17
5200 7.71
5150 7.64
5140 7.62
5139 9.81 (optimal)
5138 9.81
5137 9.81
5135 9.81
5130 9.81
5120 9.81
5100 9.81
5000 9.81
4000 9.76
3000 9.68
2000 9.28
1500 9.37 (default)
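
For anyone wanting to reproduce this kind of sweep, a rough sketch (the
interface, peer address, and MTU list are placeholders; start 'iperf3 -s'
on the peer first, and make sure the switch and peer accept each MTU
being tested):

    #!/bin/sh
    IFACE=eth0
    PEER=10.0.0.13
    for mtu in 1500 3000 5000 5139 5140 9000; do
        ip link set dev "$IFACE" mtu "$mtu"
        sleep 2                    # let the link settle
        printf 'MTU %s: ' "$mtu"
        # keep just the receiver-side average throughput
        iperf3 -c "$PEER" -t 10 -P 1 | awk '/receiver/ {print $7, $8}'
    done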
Whether any of this will make a tangible difference for Ceph is moot. I
just spend a little time getting the network stack correct as above, then
leave it. That way I know I am probably getting some benefit, and not
doing any harm. If you blindly change things you may well do harm that
can manifest itself in all sorts of ways outside of Ceph. Getting some
test results for this using Ceph will be easy; getting MEANINGFUL results
that way will be hard.
Chris
On 27/05/2020 09:25, Marc Roos wrote:
I would not call a Ceph page a random tuning tip. At least I hope they
are not. NVMe-only with 100 Gbit is not really a standard setup. I assume
with such a setup you have the luxury of not noticing many optimizations.
What I mostly read is that changing to MTU 9000 will allow you to better
saturate the 10 Gbit adapter, and I expect this to show on a low-end busy
cluster. Don't you have any test results of such a setup?
-----Original Message-----
Subject: Re: [ceph-users] Re: [External Email] Re: Ceph Nautius not
working after setting MTU 9000
Don't optimize stuff without benchmarking *before and after*, and don't
apply random tuning tips from the Internet without benchmarking them.
My experience with Jumbo frames: 3% performance gain, on an NVMe-only
setup with a 100 Gbit/s network.
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at
https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
On Tue, May 26, 2020 at 7:02 PM Marc Roos <M.Roos@f1-outsourcing.eu>
wrote:
Look what I have found!!! :)
https://ceph.com/geen-categorie/ceph-loves-jumbo-frames/
-----Original Message-----
From: Anthony D'Atri [mailto:anthony.datri@gmail.com]
Sent: Monday, 25 May 2020 22:12
To: Marc Roos
Cc: kdhall; martin.verges; sstkadu; amudhan83; ceph-users; doustar
Subject: Re: [ceph-users] Re: [External Email] Re: Ceph Nautius not
working after setting MTU 9000
Quick and easy depends on your network infrastructure. Sometimes it is
difficult or impossible to retrofit a live cluster without disruption.
On May 25, 2020, at 1:03 AM, Marc Roos <M.Roos@f1-outsourcing.eu> wrote:
>
> I am interested. I am always setting MTU to 9000. To be honest I cannot
> imagine there is no optimization, since you have fewer interrupt
> requests and you can move x times as much data per packet. Every time
> something is written about optimizing, the first thing mentioned is
> changing to MTU 9000, because it is a quick and easy win.
>
> -----Original Message-----
> From: Dave Hall [mailto:kdhall@binghamton.edu]
> Sent: Monday, 25 May 2020 5:11
> To: Martin Verges; Suresh Rama
> Cc: Amudhan P; Khodayar Doustar; ceph-users
> Subject: [ceph-users] Re: [External Email] Re: Ceph Nautius not
> working after setting MTU 9000
>
> All,
>
> Regarding Martin's observations about Jumbo Frames....
>
> I have recently been gathering some notes from various internet
> sources regarding Linux network performance, and Linux performance in
> general, to be applied to a Ceph cluster I manage but also to the rest
> of the Linux server farm I'm responsible for.
>
> In short, enabling Jumbo Frames without also tuning a number of other
> kernel and NIC attributes will not provide the performance increases
> we'd like to see. I have not yet had a chance to go through the rest
> of the testing I'd like to do, but I can confirm (via iperf3) that
> enabling Jumbo Frames alone didn't make a significant difference.
>
> Some of the other attributes I'm referring to are incoming and
> outgoing buffer sizes at the NIC, IP, and TCP levels, interrupt
> coalescing, NIC offload functions that should or shouldn't be turned
> on, packet queuing disciplines (tc), the best choice of TCP slow-start
> algorithms, and other TCP features and attributes.
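>
> For concreteness, the kind of commands I mean; a sketch with
> illustrative values only, not recommendations - check your own NIC and
> kernel documentation before applying any of these:
>
>     # Socket buffer ceilings (bytes):
>     sysctl -w net.core.rmem_max=67108864
>     sysctl -w net.core.wmem_max=67108864
>     # TCP receive autotuning min/default/max (bytes):
>     sysctl -w net.ipv4.tcp_rmem="4096 87380 67108864"
>     # Interrupt coalescing and offloads:
>     ethtool -C eth0 rx-usecs 50
>     ethtool -K eth0 gro on gso on tso on
>     # Queueing discipline and congestion control:
>     sysctl -w net.core.default_qdisc=fq
>     sysctl -w net.ipv4.tcp_congestion_control=bbr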
>
> The most off-beat item I saw was something about adding IPTABLES
> rules to bypass CONNTRACK table lookups.
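>
> The conntrack bypass would look something like this (raw table,
> NOTRACK target; the port range shown is Ceph's default OSD range -
> verify for your own cluster before use):
>
>     iptables -t raw -A PREROUTING -p tcp --dport 6800:7300 -j NOTRACK
>     iptables -t raw -A OUTPUT -p tcp --sport 6800:7300 -j NOTRACK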
>
> In order to do anything meaningful to assess the effect of all of
> these settings I'd like to figure out how to set them all via Ansible
> - so more to learn before I can give opinions.
>
> --> If anybody has added this type of configuration to Ceph Ansible,
> I'd be glad for some pointers.
>
> I have started to compile a document containing my notes. It's rough,
> but I'd be glad to share if anybody is interested.
>
> -Dave
>
> Dave Hall
> Binghamton University
>
>> On 5/24/2020 12:29 PM, Martin Verges wrote:
>>
>> Just save yourself the trouble. You won't have any real benefit from
>> MTU 9000. It has some smallish gains, but it is not worth the effort,
>> problems, and loss of reliability for most environments.
>> Try it yourself and do some benchmarks, especially with your regular
>> workload on the cluster (not the maximum peak performance), then drop
>> the MTU to default ;).
>>
>> Please, if anyone has other real-world benchmarks showing huge
>> differences in regular Ceph clusters, feel free to post them here.
>>
>> --
>> Martin Verges
>> Managing director
>>
>> Mobile: +49 174 9335695
>> E-Mail: martin.verges@croit.io
>> Chat: https://t.me/MartinVerges
>>
>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>> CEO: Martin Verges - VAT-ID: DE310638492
>> Com. register: Amtsgericht Munich HRB 231263
>>
>> Web: https://croit.io
>> YouTube: https://goo.gl/PGE1Bx
>>
>>
>>> On Sun, May 24, 2020 at 15:54, Suresh Rama <sstkadu@gmail.com>
>>> wrote:
>>
>>> Ping with 9000 MTU won't get a response, as I said; it should be
>>> 8972. Glad it is working, but you should know what happened to
>>> avoid this issue later.
>>
>>> On Sun, May 24, 2020, 3:04 AM Amudhan P <amudhan83@gmail.com>
>>> wrote:
>>>
>>>> No, ping with MTU size 9000 didn't work.
>>>>
>>>> On Sun, May 24, 2020 at 12:26 PM Khodayar Doustar
>>>> <doustar@rayanexon.ir> wrote:
>>>
>>>> Does your ping work or not?
>>>>
>>>>
>>>> On Sun, May 24, 2020 at 6:53 AM Amudhan P <amudhan83@gmail.com>
>>>> wrote:
>>>>>
>>>>>> Yes, I have set the setting on the switch side also.
>>>>>>
>>>>>> On Sat, 23 May 2020, 6:47 PM Khodayar Doustar
>>>>>> <doustar@rayanexon.ir> wrote:
>>>>>>
>>>>>>> The problem should be with the network. When you change MTU it
>>>>>>> should be changed all over the network; every single hop on your
>>>>>>> network should speak and accept 9000 MTU packets. You can check
>>>>>>> it on your hosts with the "ifconfig" command, and there are also
>>>>>>> equivalent commands for other network/security devices.
>>>>>>>
>>>>>>> If you have just one node which is not correctly configured for
>>>>>>> MTU 9000, it won't work.
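>>>>>>>
>>>>>>> For example, a quick check on each host (the interface name and
>>>>>>> next-hop address are placeholders):
>>>>>>>
>>>>>>>     ip link show dev eth0 | grep mtu
>>>>>>>     ping -M do -s 8972 <next-hop>   # 8972 = 9000 - 28 header bytes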
>>>>>>> On Sat, May 23, 2020 at 2:30 PM sinan@turka.nl
>>>>>>> <sinan@turka.nl> wrote:
>>>>>>>> Can the servers/nodes ping each other using large packet
>>>>>>>> sizes? I guess not.
>>>>>>>
>>>>>>> Sinan Polat
>>>>>>>
>>>>>>>> On 23 May 2020 at 14:21, Amudhan P <amudhan83@gmail.com>
>>>>>>>> wrote:
>>>>>>>>> In the OSD logs: "heartbeat_check: no reply from OSD"
>>>>>>>>>
>>>>>>>>>> On Sat, May 23, 2020 at 5:44 PM Amudhan P
>>>>>>>>>> <amudhan83@gmail.com> wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I have set the network switch with MTU size 9000 and also in
>>>>>>>>>> my netplan configuration.
>>>>>>>>>
>>>>>>>>> What else needs to be checked?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On Sat, May 23, 2020 at 3:39 PM Wido den Hollander
>>>>>>>>>> <wido@42on.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> On 5/23/20 12:02 PM, Amudhan P wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I am using Ceph Nautilus on Ubuntu 18.04, working fine with
>>>>>>>>>>> MTU size 1500 (default); recently I tried to update the MTU
>>>>>>>>>>> size to 9000.
>>>>>>>>>>> After setting Jumbo frames, running "ceph -s" is timing out.
>>>>>>>>>> Ceph can run just fine with an MTU of 9000. But there is
>>>>>>>>>> probably something else wrong on the network which is causing
>>>>>>>>>> this.
>>>>>>>>>>
>>>>>>>>>> Check the Jumbo Frames settings on all the switches as well
>>>>>>>>>> to make sure they forward all the packets.
>>>>>>>>>>>
>>>>>>>>>>> This is definitely not a Ceph issue.
>>>>>>>>>>>
>>>>>>>>>>> Wido
>>>>>>>>>>>
>>>>>>>>>>>> regards
>>>>>>>>>>>> Amudhan P
>>>>>>>>>>>>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io