I think the idea behind a pool size of 1 is that Hadoop already writes
copies to two other pools(?).
However, that leaves the possibility that PGs of these three pools can
share an OSD, and if that OSD fails, you lose data in these pools. I
have no idea what the chance is that the same data from different pools
ends up on the same OSD.
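The mapping is easy to inspect from the CLI if you want to see it for a
concrete cluster. A minimal sketch (the pool names "hadoop1"/"hadoop2"/
"hadoop3" and the object name are just placeholders, and HDFS stores its
block replicas under different object names, so this only illustrates that
CRUSH places each pool independently):

  # Show which PG and which OSDs a given object name maps to in each pool.
  # Placement is computed per pool, so nothing stops the acting sets from
  # overlapping on the same OSDs/hosts.
  ceph osd map hadoop1 blk_1073741825
  ceph osd map hadoop2 blk_1073741825
  ceph osd map hadoop3 blk_1073741825

  # Or list every PG of a pool with its acting OSD set and compare pools.
  ceph pg ls-by-pool hadoop1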
-----Original Message-----
To: ceph-users(a)ceph.io
Subject: [ceph-users] HBase/HDFS on Ceph/CephFS
Hi
We have a 3-year-old Hadoop cluster - up for refresh - so it is time to
evaluate options. The "only" use case is running an HBase installation
which is important for us, and migrating out of HBase would be a hassle.
Our Ceph usage has expanded and in general - we really like what we see.
Thus - can this be "sanely" consolidated somehow? I have seen this:
https://docs.ceph.com/docs/jewel/cephfs/hadoop/
But it seems really, really bogus to me.
It recommends that you set:
pool 3 'hadoop1' rep size 1 min_size 1
Which would - if I understand correctly - be disastrous. The Hadoop end
would replicate 3x - but within Ceph the replication would be
1.
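(For reference - and assuming a reasonably recent Ceph release, with the
pool name taken from the doc - the replication settings can be inspected
and raised like this:

  # What the pool is currently set to
  ceph osd pool get hadoop1 size
  ceph osd pool get hadoop1 min_size

  # Back to a sane replicated setup; existing PGs will backfill
  ceph osd pool set hadoop1 size 3
  ceph osd pool set hadoop1 min_size 2

...but then we are back at the 9x-copies problem mentioned below.)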
The 1x replication in Ceph means pulling an OSD node would "guarantee"
that its PGs go inactive - which could be OK - but there is nothing
guaranteeing that the other Hadoop replicas are not served out of the
same OSD node/PG? In which case rebooting an OSD node would make the
Hadoop cluster unavailable.
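As far as I can tell, a host-level failure domain in the CRUSH rule only
helps within a single Ceph pool; CRUSH places each pool independently, so
it cannot keep the three HDFS replicas (living in three different pools)
away from the same OSD node. Just to sketch the per-pool part (assuming a
Luminous-or-newer cluster, rule/pool names made up):

  # Replicated rule that spreads each PG's replicas across hosts
  ceph osd crush rule create-replicated by-host default host
  # Point the pool at it
  ceph osd pool set hadoop1 crush_rule by-host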
Is anyone serving HBase out of Ceph - how does the stack and
configuration look? If I went for 3x replication in both Ceph and HDFS
then it would definitely work, but 9x copies of the dataset is a bit
more than what looks feasible at the moment.
Thanks for your reflections/input.
Jesper
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io