I've not been replying to the list; apologies.
> just the write metadata to the mon, with the actual write data
> content not having to cross a physical ethernet cable but directly to
> the chassis-local osds via the 'virtual' internal switch?
This is my understanding as well, yes. I've not explored the ceph source
yet though.
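
For what it's worth, that matches how the data path is documented:
librbd/librados clients pull the cluster maps from the mons and compute
placement with CRUSH themselves, then write straight to the primary
osd, which forwards to the replica osds; the mons never sit in the data
path. You can see the placement a client would compute without moving
any data, e.g. (the pool and object names here are made up):

    ceph osd map rbd-node1 rbd_data.abc123.0000000000000000

That prints the pg and the up/acting osd set the client would talk to.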
On 2020-06-29 8:37 p.m., Harry G. Coin wrote:
Jeff, thanks for the lead. When a user space rbd write has as its
destination three replica osds in the same chassis, does the whole
write get shipped out to the mon and then back, or just the write
metadata to the mon, with the actual write data content not having to
cross a physical ethernet cable but going directly to the chassis-local
osds via the 'virtual' internal switch? I thought, when I read the
layout of how ceph works, that only the control traffic goes to the
mons and the data goes directly from the generator to the osds. Did I
get that wrong?
On 6/29/20 10:32 PM, Jeff W wrote:
> You mentioned setting up pools per host but still hitting network
> limits; did you try tcpdumping the NIC to see who's talking to whom?
> Perhaps something isn't configured the way you expect? That may also
> help you narrow down what is using the NIC: mon or osd or whatnot.
> If it's local, I would think the NIC wouldn't be a bottleneck, and
> if it is a bottleneck I would suspect my own configs, but that's
> just my 2c.
>
> Off the top of my head I'm thinking it's the mon, because even if
> you set up multiple pools I can't think of a way to have multiple
> groups of mons maintaining their own shards of consensus. Unless
> your workload is largely read-only, in which case... I'm not sure
> what the bottleneck would be.
>
>
> On Mon., Jun. 29, 2020, 7:32 p.m., Harry G. Coin <hgcoin@gmail.com>
> wrote:
>
>     I need exactly what ceph is for, for a whole lot of work; that
>     work just doesn't represent a large fraction of the total local
>     traffic. Ceph is the right choice. Plainly ceph has tremendous
>     support for replication within a chassis, among chassis, and
>     among racks. I just need intra-chassis traffic to not hit the
>     net much. That seems not such an unreasonable thing, given the
>     intra-chassis crush rules and all. After all, ceph's name wasn't
>     chosen for where it can't go....
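
For the archives: a crush rule pinned to a single host looks something
like this in a decompiled map (the host bucket name and rule id are
made up; any unused id works):

    rule chassis-local-node1 {
        id 10
        type replicated
        step take node1
        step chooseleaf firstn 0 type osd
        step emit
    }

A pool using that rule keeps all of its replicas on node1's osds, so
client, primary, and replica traffic can all stay inside the box, at
the cost of any protection against that whole host failing.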
>
> On 6/29/20 1:57 PM, Marc Roos wrote:
>     > I wonder if you should not have chosen a different product?
>     > Ceph is meant to distribute data across nodes, racks, data
>     > centers, etc. For a nail use a hammer; for a screw use a
>     > screwdriver.
> >
> >
> > -----Original Message-----
>     > To: ceph-users@ceph.io
>     > Subject: [ceph-users] layout help: need chassis local io to
>     > minimize net links
> >
> > Hi
> >
>     > I have a few servers each with 6 or more disks, with a storage
>     > workload that's around 80% done entirely within each server.
>     > From a work-to-be-done perspective there's no need for 80% of
>     > the load to traverse network interfaces; the rest needs what
>     > ceph is all about. So I cooked up a set of crush maps and
>     > pools, one map/pool for each server and one map/pool for the
>     > whole. Skipping the long story, the performance remains
>     > network link speed bound and has got to change. "Chassis
>     > local" io is too slow. I even tried putting a mon within each
>     > server. I'd like to avoid having to revert to some other HA
>     > filesystem per server, with ceph at the chassis layer, if I
>     > can help it.
>     >
>     > Any notions that would allow 'chassis local' rbd traffic to
>     > avoid or mostly avoid leaving the box?
>     >
>     > Thanks!
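
And the CLI equivalent of the per-host pools described above, for
anyone finding this later (host, rule, and pool names are made up; pg
counts are only illustrative):

    ceph osd crush rule create-replicated node1-local node1 osd
    ceph osd pool create rbd-node1 64 64 replicated node1-local
    rbd pool init rbd-node1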