Hi Andrei,
Probably the first thing to check is whether you have objects smaller than
the min_alloc_size. Those objects waste space because each one still
consumes a full min_alloc_size allocation, i.e. with the defaults a 1K RGW
object will take 64KB on HDD and 16KB on NVMe. We are considering setting
the min_alloc_size to 4K in master now that we've improved performance of
the write path, but there is a trade-off: it will result in more rocksdb
metadata and likely more overhead as the DB grows. We still have testing to
do before we know whether it's a good default value. We are also
considering inlining very small (<4K) objects in the onode itself, but that
too will require significant testing, as it may put additional load on the
DB as well.
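
One quick way to check on a given OSD is to compare what bluestore has
allocated against what it is actually storing, and to confirm the
min_alloc_size it is using. Run these on the node hosting the OSD and
substitute your own OSD id; the option and counter names below are from
memory, so double-check them against your version:

# ceph daemon osd.8 config get bluestore_min_alloc_size_hdd
# ceph daemon osd.8 config get bluestore_min_alloc_size_ssd
# ceph daemon osd.8 perf dump | grep -E 'bluestore_(allocated|stored)'

If bluestore_allocated is far larger than bluestore_stored, small-object
allocation overhead is very likely what you're seeing. As a rough worked
example, a million 1K RGW objects on an HDD OSD with a 64K min_alloc_size
would allocate about 61GiB to hold under 1GiB of actual data. Also keep in
mind the min_alloc_size is baked in when the OSD is created, so changing
the config only affects OSDs you redeploy afterwards.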
Mark
On 9/26/19 4:58 AM, Andrei Mikhailovsky wrote:
> Hi Georg,
>
> I am having a similar issue with the RGW pool, although not to the extent of a 10x
> error rate. In my case the error rate is about 2-3x: my real data usage is around 6TB,
> but Ceph uses over 17TB. I have asked this question here, but no one seems to know the
> solution or how to go about finding the wasted space and clearing it.
>
> @ceph_guys - does anyone in the company work in the area of finding the bugs that
> relate to the wasted space? Could anyone assist us in debugging and fixing our issues?
>
> Thanks
>
> Andrei
>
> ----- Original Message -----
>> From: "Georg F" <georg(a)pace.car>
>> To: ceph-users(a)ceph.io
>> Sent: Thursday, 26 September, 2019 10:50:01
>> Subject: [ceph-users] Raw use 10 times higher than data use
>> Hi all,
>>
>> I've recently moved a 1TiB pool (3TiB raw use) from hdd osds (7) to newly added
>> nvme osds (14). The hdd osds should be almost empty by now, as just small pools
>> reside on them. The pools on the hdd osds in sum store about 25GiB, which
>> should use about 75GiB with a pool size of 3. Wal and db are on separate
>> devices.
>>
>> However the outputs of ceph df and ceph osd df tell a different story:
>>
>> # ceph df
>> RAW STORAGE:
>> CLASS SIZE AVAIL USED RAW USED %RAW USED
>> hdd 19 TiB 18 TiB 775 GiB 782 GiB 3.98
>>
>> # ceph osd df | egrep "(ID|hdd)"
>> ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS
>>  8   hdd 2.72392  1.00000 2.8 TiB 111 GiB  10 GiB 111 KiB 1024 MiB 2.7 TiB 3.85 0.60  65 up
>>  6   hdd 2.17914  1.00000 2.3 TiB 112 GiB  11 GiB  83 KiB 1024 MiB 2.2 TiB 4.82 0.75  58 up
>>  3   hdd 2.72392  1.00000 2.8 TiB 114 GiB  13 GiB  71 KiB 1024 MiB 2.7 TiB 3.94 0.62  76 up
>>  5   hdd 2.72392  1.00000 2.8 TiB 109 GiB 7.6 GiB  83 KiB 1024 MiB 2.7 TiB 3.76 0.59  63 up
>>  4   hdd 2.72392  1.00000 2.8 TiB 112 GiB  11 GiB  55 KiB 1024 MiB 2.7 TiB 3.87 0.60  59 up
>>  7   hdd 2.72392  1.00000 2.8 TiB 114 GiB  13 GiB   8 KiB 1024 MiB 2.7 TiB 3.93 0.61  66 up
>>  2   hdd 2.72392  1.00000 2.8 TiB 111 GiB 9.9 GiB  78 KiB 1024 MiB 2.7 TiB 3.84 0.60  69 up
>>
>> The sum of "DATA" is 75.5GiB, which is what I am expecting to be used by the
>> pools. How come the sum of "RAW USE" is 783GiB? More than 10x the size of the
>> stored data. On my nvme osds the "RAW USE" to "DATA" overhead is <1%:
>>
>> ceph osd df|egrep "(ID|nvme)"
>> ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP   META     AVAIL   %USE VAR  PGS STATUS
>>  0  nvme 2.61989  1.00000 2.6 TiB 181 GiB 180 GiB 31 KiB  1.0 GiB 2.4 TiB 6.74 1.05  12 up
>>  1  nvme 2.61989  1.00000 2.6 TiB 151 GiB 150 GiB 39 KiB 1024 MiB 2.5 TiB 5.62 0.88  10 up
>> 13  nvme 2.61989  1.00000 2.6 TiB 239 GiB 238 GiB 55 KiB  1.0 GiB 2.4 TiB 8.89 1.39  16 up
>> -- truncated --
>>
>> I am running ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39)
>> nautilus (stable) which was upgraded recently from 13.2.1.
>>
>> Any help is appreciated.
>>
>> Best regards,
>> Georg