I think the idea behind a pool size of 1 is that Hadoop already writes
copies to two other pools(?).
However, that leaves the possibility that PGs of these three pools can
share an OSD, and if that OSD fails, you lose data in these pools. I
have no idea what the chance is that the same data from different pools
ends up on the same OSD.
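The mapping is easy to inspect from the CLI if you want to see it for a
concrete cluster. A minimal sketch (the pool names "hadoop1"/"hadoop2"/
"hadoop3" and the object name are just placeholders, and HDFS stores its
block replicas under different object names, so this only illustrates that
CRUSH places each pool independently):

  # Show which PG and which OSDs a given object name maps to in each pool.
  # Placement is computed per pool, so nothing stops the acting sets from
  # overlapping on the same OSDs/hosts.
  ceph osd map hadoop1 blk_1073741825
  ceph osd map hadoop2 blk_1073741825
  ceph osd map hadoop3 blk_1073741825

  # Or list every PG of a pool with its acting OSD set and compare pools.
  ceph pg ls-by-pool hadoop1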
-----Original Message-----
To: ceph-users(a)ceph.io
Subject: [ceph-users] HBase/HDFS on Ceph/CephFS
Hi
We have a 3-year-old Hadoop cluster - up for refresh - so it is time to
evaluate options. The "only" use case is running an HBase installation
which is important for us, and migrating out of HBase would be a hassle.
Our Ceph usage has expanded and in general - we really like what we see.
Thus - can this be "sanely" consolidated somehow? I have seen this:
https://docs.ceph.com/docs/jewel/cephfs/hadoop/
But it seems really, really bogus to me.
It recommends that you set:
pool 3 'hadoop1' rep size 1 min_size 1
Which would - if I understand correctly - be disastrous. The Hadoop end
would replicate 3x - but within Ceph the replication would be
1.
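(For reference - and assuming a reasonably recent Ceph release, with the
pool name taken from the doc - the replication settings can be inspected
and raised like this:

  # What the pool is currently set to
  ceph osd pool get hadoop1 size
  ceph osd pool get hadoop1 min_size

  # Back to a sane replicated setup; existing PGs will backfill
  ceph osd pool set hadoop1 size 3
  ceph osd pool set hadoop1 min_size 2

...but then we are back at the 9x-copies problem mentioned below.)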
The 1x replication in Ceph means pulling an OSD node would "guarantee"
that its PGs go inactive - which could be OK - but there is nothing
guaranteeing that the other Hadoop replicas are not served out of the
same OSD node/PG? In which case rebooting an OSD node would make the
Hadoop cluster unavailable.
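As far as I can tell, a host-level failure domain in the CRUSH rule only
helps within a single Ceph pool; CRUSH places each pool independently, so
it cannot keep the three HDFS replicas (living in three different pools)
away from the same OSD node. Just to sketch the per-pool part (assuming a
Luminous-or-newer cluster, rule/pool names made up):

  # Replicated rule that spreads each PG's replicas across hosts
  ceph osd crush rule create-replicated by-host default host
  # Point the pool at it
  ceph osd pool set hadoop1 crush_rule by-host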
Is anyone serving HBase out of Ceph - how does the stack and
configuration look? If I went for 3x replication in both Ceph and HDFS
then it would definitely work, but 9x copies of the dataset is a bit
more than what looks feasible at the moment.
Thanks for your reflections/input.
Jesper
_______________________________________________
ceph-users mailing list -- ceph-users(a)ceph.io
To unsubscribe send an email to ceph-users-leave(a)ceph.io