Hi Steve,
Thanks, it's an interesting discussion, but I don't think it's the same
problem, because in my case Bluestore eats additional space during
rebalance, and Ceph doesn't seem to do small overwrites during
rebalance. As I understand it, it does the opposite: it reads and
writes the whole object... Also, I have had bluestore_min_alloc_size
set to 4K from the beginning, and Igor says that works around that
bug... bug-o-feature. :D
> Hi Vitaliy,
>
> You may be coming across the EC space amplification issue:
> https://tracker.ceph.com/issues/44213
>
> I am not aware of any recent updates to resolve this issue.
>
> Sincerely,
>
> On Tue, Mar 24, 2020 at 12:53 PM <vitalif@yourcmc.ru> wrote:
>
>> Hi.
>>
>> I'm experiencing some kind of space leak in Bluestore. I use EC,
>> compression and snapshots. At first I thought the leak was caused by
>> "virtual clones" (issue #38184). However, I then got rid of most of
>> the snapshots but continued to experience the problem.
>>
>> I first suspected something when I added a new disk to the cluster
>> and the free space in the cluster didn't increase (!).
>>
>> So, to track down the issue, I moved one PG (34.1a) using upmaps from
>> OSDs 11,6,0 to 6,0,7 and then back to 11,6,0.
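>>
>> (The moves themselves were plain upmap overrides, along the lines of
>> "ceph osd pg-upmap-items 34.1a <from-osd> <to-osd> ..." with pairs of
>> OSD ids, and "ceph osd rm-pg-upmap-items 34.1a" to drop the override
>> again; the exact pairs aren't important here.)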
>>
>> It ate +59 GB after the first move and +51 GB after the second. As I
>> understand it, this proves that it's not #38184: devirtualization of
>> virtual clones couldn't eat additional space after the SECOND
>> rebalance of the same PG.
>>
>> The PG has ~39000 objects, it is EC 2+1 and compression is enabled.
>> The compression ratio is about 2.7 in my setup, so the PG should use
>> ~90 GB of raw space.
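>>
>> Rough arithmetic behind that estimate, assuming an average object
>> size of about 4 MB (the 4 MB figure is only an assumption here):
>>
>> objects = 39000
>> object_size = 4 * 2**20        # assumed average object size, bytes
>> compression_ratio = 2.7        # observed ratio in my setup
>> ec_overhead = 3 / 2            # EC 2+1: 3 shards stored per 2 of data
>>
>> raw = objects * object_size / compression_ratio * ec_overhead
>> print(round(raw / 1e9), "GB")  # prints 91, i.e. roughly 90 GB raw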
>>
>> Before and after moving the PG I stopped osd0, mounted it with
>> ceph-objectstore-tool with debug bluestore = 20/20 and opened the
>> 34.1a***/all directory. In that case it seems to dump all object
>> extents into the log. So now I have two logs with all allocated
>> extents for osd0 (I hope all extents are there). I parsed both logs
>> and added up all the compressed blob sizes ("get_ref Blob ... 0x20000
>> -> 0x... compressed"). They add up to ~39 GB before the first
>> rebalance (34.1as2), ~22 GB after it (34.1as1) and ~41 GB again after
>> the second move (34.1as2), which doesn't indicate a leak.
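>>
>> For reference, the mount was done with the objectstore tool's fuse
>> mode, roughly "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0
>> --op fuse --mountpoint /mnt/osd0" (both paths are only illustrative
>> here), with debug bluestore = 20/20 set for that run. The summing was
>> done with a small script along these lines; the regex just matches the
>> log fragment quoted above, so it may need adjusting for other
>> Bluestore versions:
>>
>> import re
>> import sys
>>
>> # Sum the compressed (right-hand) blob sizes from a debug bluestore log.
>> pat = re.compile(r'get_ref Blob.*0x([0-9a-f]+)\s*->\s*0x([0-9a-f]+)\s*compressed')
>>
>> total = 0
>> with open(sys.argv[1]) as log:
>>     for line in log:
>>         m = pat.search(line)
>>         if m:
>>             total += int(m.group(2), 16)  # compressed size, bytes
>>
>> print("%.1f GiB of compressed blobs" % (total / 2**30))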
>>
>> But the raw space usage still exceeds the initial usage by a lot, so
>> it's clear that there's a leak somewhere.
>>
>> What additional details can I provide for you to identify the bug?
>>
>> I posted the same message in the issue tracker:
>> https://tracker.ceph.com/issues/44731
>>
>> --
>> Vitaliy Filippov
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-leave@ceph.io
>
> --
>
> Steven Pine
>
> webair.com
>
> P 516.938.4100 x
>
> E steven.pine@webair.com