SAS vs SATA for OSD

List overview All Threads
Download

newer

older

CRUSH rule for EC 6+2 on 6-node...

OSD Won't Start - LVM IOCTL Error...

Dave Hall

3 Jun 2021 3 Jun '21

6:25 p.m.

Hello, We're planning another batch of OSD nodes for our cluster. Our prior nodes have been 8 x 12TB SAS drives plus 500GB NVMe per HDD. Due to market circumstances and the shortage of drives those 12TB SAS drives are in short supply. Our integrator has offered an option of 8 x 14TB SATA drives (still Enterprise). For Ceph, will the switch to SATA carry a performance difference that I should be concerned about? Thanks. -Dave -- Dave Hall Binghamton University kdhall(a)binghamton.edu

Show replies by date

Mark Nelson

3 Jun 3 Jun

6:35 p.m.

I suspect the behavior of the controller and the behavior of the drive firmware will end up mattering more than SAS vs SATA. As always it's best if you can test it first before committing to buying a pile of them. Historically I have seen SATA drives that have performed well as far as HDDs go though. Mark On 6/3/21 4:25 PM, Dave Hall wrote:

...

Anthony D'Atri

7:10 p.m.

Agreed. I think oh …. maybe 15-20 years ago there was often a wider difference between SAS and SATA drives, but with modern queuing etc. my sense is that there is less of an advantage. Seek and rotational latency I suspect dwarf interface differences wrt performance. The HBA may be a bigger bottleneck (and way more trouble). 500 GB NVMe seems like a lot per HDD, are you using that as WAL+DB with RGW, or as dmcache or something? Depending on your constraints, QLC flash might be more competitive than you think ;) — aad

...

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

Dave Hall

7:18 p.m.

New subject: SAS vs SATA for OSD - WAL+DB sizing.

Anthony, I had recently found a reference in the Ceph docs that indicated something like 40GB per TB for WAL+DB space. For a 12TB HDD that comes out to 480GB. If this is no longer the guideline I'd be glad to save a couple dollars. -Dave -- Dave Hall Binghamton University kdhall(a)binghamton.edu On Thu, Jun 3, 2021 at 6:10 PM Anthony D'Atri <anthony.datri(a)gmail.com> wrote:

...

I suspect the behavior of the controller and the behavior of the drive

firmware will end up mattering more than SAS vs SATA. As always it's best if you can test it first before committing to buying a pile of them. Historically I have seen SATA drives that have performed well as far as HDDs go though.

Mark On 6/3/21 4:25 PM, Dave Hall wrote: > Hello, > > We're planning another batch of OSD nodes for our cluster. Our prior

nodes

> have been 8 x 12TB SAS drives plus 500GB NVMe per HDD. Due to market > circumstances and the shortage of drives those 12TB SAS drives are in

short

supply. Our integrator has offered an option of 8 x 14TB SATA drives (still Enterprise). For Ceph, will the switch to SATA carry a performance difference that I should be concerned about? Thanks. -Dave -- Dave Hall Binghamton University kdhall(a)binghamton.edu _______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

Mark Nelson

8:38 p.m.

New subject: SAS vs SATA for OSD - WAL+DB sizing.

FWIW, those guidelines try to be sort of a one-size-fits-all recommendation that may not apply to your situation. Typically RBD has pretty low metadata overhead so you can get away with smaller DB partitions. 4% should easily be enough. If you are running heavy RGW write workloads with small objects, you will almost certainly use more than 4% for metadata (I've seen worst case up to 50%, but that was before column family sharding which should help to some extent). Having said that, bluestore will roll the higher rocksdb levels over to the slow device and keep the wall, L0, and other lower LSM levels on the fast device. It's not necessarily the end of the world if you end up with some of the more rarely used metadata on the HDD but having it on flash certain is nice. Mark On 6/3/21 5:18 PM, Dave Hall wrote:

...

I suspect the behavior of the controller and the behavior of the drive

Mark On 6/3/21 4:25 PM, Dave Hall wrote: > Hello, > > We're planning another batch of OSD nodes for our cluster. Our prior

nodes

> have been 8 x 12TB SAS drives plus 500GB NVMe per HDD. Due to market > circumstances and the shortage of drives those 12TB SAS drives are in

short

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

Dave Hall

9:29 p.m.

New subject: SAS vs SATA for OSD - WAL+DB sizing.

Mark, We are running a mix of RGW, RDB, and CephFS. Our CephFS is pretty big, but we're moving a lot of it to RGW. What prompted me to go looking for a guideline was a high frequency of Spillover warnings as our cluster filled up past the 50% mark. That was with 14.2.9, I think. I understand that some things have changed since, but I think I'd like to have the flexibility and performance of a generous WAL+DB - the cluster is used to store research data, and the usage pattern is tending to change as the research evolves. No telling what our mix will be a year from now. -Dave -- Dave Hall Binghamton University kdhall(a)binghamton.edu 607-760-2328 (Cell) 607-777-4641 (Office) On Thu, Jun 3, 2021 at 7:39 PM Mark Nelson <mnelson(a)redhat.com> wrote:

...

Anthony, I had recently found a reference in the Ceph docs that indicated

something

like 40GB per TB for WAL+DB space. For a 12TB HDD that comes out to 480GB. If this is no longer the guideline I'd be glad to save a couple dollars. -Dave -- Dave Hall Binghamton University kdhall(a)binghamton.edu On Thu, Jun 3, 2021 at 6:10 PM Anthony D'Atri <anthony.datri(a)gmail.com> wrote: > Agreed. I think oh …. maybe 15-20 years ago there was often a wider > difference between SAS and SATA drives, but with modern queuing etc. my > sense is that there is less of an advantage. Seek and rotational

latency I

> suspect dwarf interface differences wrt performance. The HBA may be a > bigger bottleneck (and way more trouble). > > 500 GB NVMe seems like a lot per HDD, are you using that as WAL+DB with > RGW, or as dmcache or something? > > Depending on your constraints, QLC flash might be more competitive than > you think ;) > > — aad > > >> I suspect the behavior of the controller and the behavior of the drive > firmware will end up mattering more than SAS vs SATA. As always it's

best

if you can test it first before committing to buying a pile of them. Historically I have seen SATA drives that have performed well as far as HDDs go though.

Mark On 6/3/21 4:25 PM, Dave Hall wrote: > Hello, > > We're planning another batch of OSD nodes for our cluster. Our prior

nodes

> have been 8 x 12TB SAS drives plus 500GB NVMe per HDD. Due to market > circumstances and the shortage of drives those 12TB SAS drives are in

short

> supply. > > Our integrator has offered an option of 8 x 14TB SATA drives (still > Enterprise). For Ceph, will the switch to SATA carry a performance > difference that I should be concerned about? > > Thanks. > > -Dave > > -- > Dave Hall > Binghamton University > kdhall(a)binghamton.edu > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io > _______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

Anthony D'Atri

9:43 p.m.

New subject: SAS vs SATA for OSD - WAL+DB sizing.

In releases before … Pacific I think, there are certain discrete capacities that DB will actually utilize: the sum of RocksDB levels. Lots of discussion in the archives. AIUI in those releases, with a 500 GB BlueStore WAL+DB device, you’ll with default settings only actually use ~~300 GB most of the time, though the extra might accelerate compaction. With Pacific I believe code was merged that shards OSD RocksDB to make better use of arbitrary partition / devices sizes. With older releases one can (or so I’ve read) game this a bit by carefully adjusting rocksdb.max-bytes-for-level-base; ISTR that Karan did that for his impressive 10 Billion Object exercise. I’ve seen threads on the list over the past couple of years that seemed to show spillover despite the DB device not being fully utilized; I hope that’s since been addressed. My understanding is that with column sharding, compaction only takes place on a fraction of the DB at any one time, so the transient space used for it (and thus prone to spillover) should be lessened. I may of course be out of my Vulcan mind, but HTH. — aad

...

On Jun 3, 2021, at 5:29 PM, Dave Hall <kdhall(a)binghamton.edu> wrote: Mark, We are running a mix of RGW, RDB, and CephFS. Our CephFS is pretty big, but we're moving a lot of it to RGW. What prompted me to go looking for a guideline was a high frequency of Spillover warnings as our cluster filled up past the 50% mark. That was with 14.2.9, I think. I understand that some things have changed since, but I think I'd like to have the flexibility and performance of a generous WAL+DB - the cluster is used to store research data, and the usage pattern is tending to change as the research evolves. No telling what our mix will be a year from now. -Dave -- Dave Hall Binghamton University kdhall(a)binghamton.edu 607-760-2328 (Cell) 607-777-4641 (Office) On Thu, Jun 3, 2021 at 7:39 PM Mark Nelson <mnelson(a)redhat.com> wrote:

Anthony, I had recently found a reference in the Ceph docs that indicated

something

latency I

best

if you can test it first before committing to buying a pile of them. Historically I have seen SATA drives that have performed well as far as HDDs go though. > > Mark > > On 6/3/21 4:25 PM, Dave Hall wrote: >> Hello, >> >> We're planning another batch of OSD nodes for our cluster. Our prior nodes >> have been 8 x 12TB SAS drives plus 500GB NVMe per HDD. Due to market >> circumstances and the shortage of drives those 12TB SAS drives are in short >> supply. >> >> Our integrator has offered an option of 8 x 14TB SATA drives (still >> Enterprise). For Ceph, will the switch to SATA carry a performance >> difference that I should be concerned about? >> >> Thanks. >> >> -Dave >> >> -- >> Dave Hall >> Binghamton University >> kdhall(a)binghamton.edu >> _______________________________________________ >> ceph-users mailing list -- ceph-users(a)ceph.io >> To unsubscribe send an email to ceph-users-leave(a)ceph.io >> > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io _______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

_______________________________________________ ceph-users mailing list -- ceph-users(a)ceph.io To unsubscribe send an email to ceph-users-leave(a)ceph.io

Jamie Fargen

6:43 p.m.

Dave- These are just general observations of how SATA drives operate in storage clusters. It has been a while since I have run a storage cluster with SATA drives, but in the past I did notice that SATA drives would drop off the controllers pretty frequently. Depending on many factors, it may just be a brief outage where the drive wasn't available but recoverable, sometimes it meant going into the controller and rescanning for drives before they could be added back to the system, the worst was one chassis that would mark the drives as failed after the drive dropped off a certain number of times and the vendor could not correct the issue with a firmware update and had to replace the storage chassis. Regards, -Jamie On Thu, Jun 3, 2021 at 5:26 PM Dave Hall <kdhall(a)binghamton.edu> wrote:

...

-- Jamie Fargen Senior Consultant jfargen(a)redhat.com 813-817-4430

1063

days inactive

1064

days old

ceph-users@ceph.io

Manage subscription

7 comments

4 participants

tags (0)

participants (4)

Anthony D'Atri
Dave Hall
Jamie Fargen
Mark Nelson