Just for the sake of curiosity, if you do a show all on /cX/vX, what is shown for the VD
properties?
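For reference, here is what my controller shows, via e.g.:
storcli /c0/v0 show all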
VD0 Properties :
==============
Strip Size = 256 KB
Number of Blocks = 1953374208
VD has Emulated PD = No
Span Depth = 1
Number of Drives Per Span = 1
Write Cache(initial setting) = WriteBack
Disk Cache Policy = Disabled
Encryption = None
Data Protection = Disabled
Active Operations = None
Exposed to OS = Yes
Creation Date = 17-06-2016
Creation Time = 02:49:02 PM
Emulation type = default
Cachebypass size = Cachebypass-64k
Cachebypass Mode = Cachebypass Intelligent
Is LD Ready for OS Requests = Yes
SCSI NAA Id = 600304801bb4c0001ef6ca5ea0fcb283
I'm wondering if the pdcache value must be set at vd creation, as it is a creation
option as well.
If that's the case, maybe consider blowing away one of the SSD vd's, recreating the vd
and OSD, and seeing if you can measure a difference on that disk specifically in testing.
A sketch of what that might look like is below.
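Something along these lines (the drives= value is a placeholder for your enclosure:slot
IDs - double check against your layout before destroying anything):
storcli /c0/v0 del force
storcli /c0 add vd type=raid0 drives=252:0 wt nora direct pdcache=off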
It might also be helpful to document some of these values from /cX show all:
Version :
=======
Firmware Package Build = 24.7.0-0026
Firmware Version = 4.270.00-3972
Bios Version = 6.22.03.0_4.16.08.00_0x060B0200
Ctrl-R Version = 5.08-0006
Preboot CLI Version = 01.07-05:#%0000
NVDATA Version = 3.1411.00-0009
Boot Block Version = 3.06.00.00-0001
Driver Name = megaraid_sas
Driver Version = 07.703.05.00-rc1
Supported Adapter Operations :
============================
Support Shield State = Yes
Block SSD Write Disk Cache Change = Yes
Support Suspend Resume BG ops = Yes
Support Emergency Spares = Yes
Support Set Link Speed = Yes
Support Boot Time PFK Change = No
Support JBOD = Yes
Supported VD Operations :
=======================
Read Policy = Yes
Write Policy = Yes
IO Policy = Yes
Access Policy = Yes
Disk Cache Policy = Yes
Reconstruction = Yes
Deny Locate = No
Deny CC = No
Allow Ctrl Encryption = No
Enable LDBBM = No
Support FastPath = Yes
Performance Metrics = Yes
Power Savings = No
Support Powersave Max With Cache = No
Support Breakmirror = Yes
Support SSC WriteBack = No
Support SSC Association = No
Support VD Hide = Yes
Support VD Cachebypass = Yes
Support VD discardCacheDuringLDDelete = Yes
Advanced Software Option :
========================
----------------------------------------
Adv S/W Opt Time Remaining Mode
----------------------------------------
MegaRAID FastPath Unlimited -
MegaRAID RAID6 Unlimited -
MegaRAID RAID5 Unlimited -
----------------------------------------
Namely, on my 3108 controller, "Block SSD Write Disk Cache Change = Yes" stands out to me.
My controller has SAS HDD's behind it, though, so I may just not be running into the
same issue that pertains to you.
Also wondering if FastPath is enabled as well. I know on some of the older controllers it
was a paid feature, but they later opened it up for free, though you may still need a
software key to enable it.
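You can check with something like:
storcli /c0 show all | grep -i fastpath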
Just looking to widen the net and hope we catch something.
Reed
On Sep 2, 2020, at 7:38 AM, VELARTIS Philipp Dürhammer
<p.duerhammer(a)velartis.at> wrote:
> I assume you are referencing this parameter?
> storcli /c0/v0 set ssdcaching=<on|off>
> If so, this is for CacheCade, which is LSI's cache tiering solution, which should both
> be off and not in use for ceph.
No. storcli /cx/vx set pdcache=off is denied because of the LSI setting "Block SSD
Write Disk Cache Change = Yes".
I cannot find any firmware to upload or any other way to change this.
Do you think that disabling the write cache on the ssd itself helps a lot? (Ceph is not
aware of this, because smartctl -g wcache /dev/sdX shows the cache disabled - the cache
setting on the LSI side is already disabled.)
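(For reference, the checks I am looking at, with /dev/sdX as a placeholder; sdparm should
read the same WCE bit from the caching mode page, if I understand it correctly:)
smartctl -g wcache /dev/sdX
sdparm --get=WCE /dev/sdX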
The only other way would be to buy some HBA cards and add them to the servers. But that's
a lot of work, without knowing whether it will improve the speed a lot.
I am using RBD with hyperconverged nodes (4 at the moment); pools are 2x and 3x
replicated. Actually the performance for Windows and Linux VMs on the hdd osd pool was
ok, but over time it got a little bit slower. I just want to get ready for the future:
we plan to put some bigger database servers on the cluster (they are on local storage at
the moment), and therefore I want to increase the small random iops of the cluster a lot.
-----Original Message-----
From: Reed Dier <reed.dier(a)focusvq.com>
Sent: Tuesday, 1 September 2020 23:44
To: VELARTIS Philipp Dürhammer <p.duerhammer(a)velartis.at>
Cc: ceph-users(a)ceph.io
Subject: Re: [ceph-users] Can 16 server grade ssd's be slower than 60 hdds? (no extra
journals)
> there is an option set in the controller "Block SSD Write Disk Cache Change = Yes"
> which does not permit deactivating the ssd cache. I could not find any solution on
> Google for this controller (LSI MegaRAID SAS 9271-8i) to change this setting.
I assume you are referencing this parameter?
storcli /c0/v0 set ssdcaching=<on|off>
If so, this is for CacheCade, which is LSI's cache tiering solution, which should
both be off and not in use for ceph.
Single thread and single iodepth benchmarks will tend to be underwhelming.
Ceph shines with aggregate performance from lots of clients.
And in an odd twist of fate, I typically see better performance on RBD for random
benchmarks rather than sequential benchmarks, as it distributes the load across more
OSD's.
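To illustrate (a sketch using fio's rbd engine; the pool and image names are
placeholders, and the runs write to that image, so don't point them at real data):
fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=bench --name=qd1 \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based
fio --ioengine=rbd --clientname=admin --pool=rbd --rbdname=bench --name=qd32 \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based
The same cluster that crawls at queue depth 1 will usually post far higher aggregate
numbers on the second run.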
It might also help others offer some pointers for tuning if you describe the
pool/application a bit more, i.e. RBD vs CephFS vs RGW, 3x replicated vs EC, etc.
At least things are trending in a positive direction.
Reed
On Sep 1, 2020, at 4:21 PM, VELARTIS Philipp
Dürhammer <p.duerhammer(a)velartis.at> wrote:
Thank you. I was working in this direction. The situation is a lot better, but I think I
can still get far better.
I could set the controller to writethrough, direct and no read ahead for the ssds.
But I cannot disable the pdcache ☹ There is an option set in the controller, "Block
SSD Write Disk Cache Change = Yes", which does not permit deactivating the ssd cache.
I could not find any solution on Google for this controller (LSI MegaRAID SAS 9271-8i) to
change this setting.
I don't know how much performance gain deactivating the ssd cache will bring. At least
the Micron 5200 MAX has capacitors, so I hope it is safe against data loss in case of
power failure. I wrote a request to LSI / Broadcom asking if they know how I can change
this setting. This is really annoying.
I will check the cpu power settings. I also read somewhere that it can improve iops a lot
(if it is set badly).
At the moment I get 600 iops for 4k random writes with 1 thread and 1 iodepth. I get 40k
4k random iops for some instances with 32 iodepth. It's not the world, but a lot better
than before. Reads are around 100k iops. That is with 16 ssd's and 2 x dual 10G nics.
I was reading that good tuning and hardware config can get more than 2000 iops single
threaded out of the ssds. I know that ceph does not shine with a single thread, but 600
iops is not very much...
Philipp
-----Original Message-----
From: Reed Dier <reed.dier(a)focusvq.com>
Sent: Tuesday, 1 September 2020 22:37
To: VELARTIS Philipp Dürhammer <p.duerhammer(a)velartis.at>
Cc: ceph-users(a)ceph.io
Subject: Re: [ceph-users] Can 16 server grade ssd's be slower than 60 hdds? (no extra
journals)
If using storcli/perccli for manipulating the LSI controller, you can disable the on-disk
write cache with:
storcli /cx/vx set pdcache=off
You can also ensure that you turn off write caching at the controller level with
storcli /cx/vx set iopolicy=direct
storcli /cx/vx set wrcache=wt
You can also tweak the readahead value for the vd if you want, though with an ssd, I
don't think it will be much of an issue.
storcli /cx/vx set rdcache=nora
I'm sure the megacli alternatives are available with some quick searches.
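From memory (untested, so treat these as a sketch rather than gospel), the equivalents
should be along the lines of:
MegaCli -LDSetProp -DisDskCache -Lall -aALL
MegaCli -LDSetProp -Direct -Lall -aALL
MegaCli -LDSetProp WT -Lall -aALL
MegaCli -LDSetProp NORA -Lall -aALL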
May also want to check your c-states and p-states to make sure there isn't any
aggressive power saving features getting in the way.
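For example (the usual sysfs/cpupower checks; package names vary by distro):
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
cpupower idle-info
cpupower frequency-set -g performance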
Reed
On Aug 31, 2020, at 7:44 AM, VELARTIS Philipp
Dürhammer <p.duerhammer(a)velartis.at> wrote:
We have older LSI RAID controllers with no HBA/JBOD option, so we expose the single disks
as raid0 devices. Ceph should not be aware of the cache status in that case?
But digging deeper into it, it seems that 1 out of 4 servers is performing a lot better
and has super low commit/apply latencies, while the others have a lot more (20+) on heavy
writes. This just applies to the ssds; for the hdds I can't see a difference...
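(Those commit/apply numbers are the per-OSD latencies reported by, e.g.:)
ceph osd perf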
-----Original Message-----
From: Frank Schilder <frans(a)dtu.dk>
Sent: Monday, 31 August 2020 13:19
To: VELARTIS Philipp Dürhammer <p.duerhammer(a)velartis.at>;
'ceph-users(a)ceph.io' <ceph-users(a)ceph.io>
Subject: Re: Can 16 server grade ssd's be slower than 60 hdds? (no extra journals)
Yes, they can - if volatile write cache is not disabled. There are many threads on this,
also recent. Search for "disable write cache" and/or "disable volatile
write cache".
You will also find different methods of doing this automatically.
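For example, a udev rule that switches the cache off on every boot, along these lines (a
sketch assuming drives where hdparm works; behind a SAS HBA you would call sdparm
instead, and the match fields will differ per setup):
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", RUN+="/usr/sbin/hdparm -W 0 /dev/%k"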
Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
________________________________________
From: VELARTIS Philipp Dürhammer <p.duerhammer(a)velartis.at>
Sent: 31 August 2020 13:02:45
To: 'ceph-users(a)ceph.io'
Subject: [ceph-users] Can 16 server grade ssd's be slower than 60 hdds? (no extra
journals)
I have a production cluster with 60 osd's. No extra journals. It's performing okay. Now I
added an extra ssd pool with 16 Micron 5100 MAX, and the performance is a little slower
than or equal to the 60 hdd pool, for 4k random as well as sequential reads. All on a
dedicated 2 x 10G network. HDDs are still on filestore, SSDs on bluestore. Ceph Luminous.
What should be possible with 16 ssd's vs. 60 hdd's, no extra journals?
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io