Hello, thank you for your response.
Erasure coding is getting better, and we really cannot afford the
storage overhead of 3x replication.
Anyway, as I understand it, the problem is also present with
replication, just less amplified (blocks are not divided between OSDs,
just replicated in full).
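To make the amplification concrete, here is a back-of-the-envelope model I put together (my own sketch with assumed values, not numbers measured on our cluster) of how bluestore_min_alloc_size interacts with EC chunking:

```python
import math

def allocated_bytes(obj_size, k, m, min_alloc=64 * 1024):
    """Space actually allocated for one object on an EC k+m pool,
    assuming each of the k data chunks (and m coding chunks) gets
    rounded up to min_alloc on its OSD."""
    chunk = math.ceil(obj_size / k)                      # bytes per data chunk
    per_osd = math.ceil(chunk / min_alloc) * min_alloc   # rounded to allocation unit
    return per_osd * (k + m)

def amplification(obj_size, k, m, min_alloc=64 * 1024):
    """Allocated size divided by the ideal EC footprint obj_size * (k+m)/k."""
    ideal = obj_size * (k + m) / k
    return allocated_bytes(obj_size, k, m, min_alloc) / ideal

# A 16 KiB object on a 4+2 pool with 64k min_alloc: each 4 KiB data
# chunk still consumes a full 64 KiB allocation on its OSD.
print(amplification(16 * 1024, 4, 2))                       # 16.0
# A 10 MiB object is barely affected:
print(round(amplification(10 * 1024 * 1024, 4, 2), 3))      # 1.0
```

Under this model the waste only matters for objects smaller than roughly min_alloc x k, which matches the big-files use case below being mostly safe.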
On 2021-02-02 16:50, Steven Pine wrote:
> You are unlikely to avoid the space amplification bug by using larger
> block sizes. I honestly do not recommend using an EC pool, it is
> generally less performant and EC pools are not as well supported by
> the ceph development community.
>
> On Tue, Feb 2, 2021 at 5:11 AM Gilles Mocellin
> <gilles.mocellin(a)nuagelibre.org> wrote:
>
>> Hello,
>>
>> As we know, with 64k for bluestore_min_alloc_size_hdd (I'm only
>> using HDDs), in certain conditions, especially with erasure coding,
>> space is wasted when writing objects smaller than 64k x k (EC: k+m).
>>
>> Every object is divided into k chunks, each written to a different OSD.
>>
>> My main use case is big (40 TB) RBD images mounted as XFS filesystems
>> on Linux servers, exposed to our backup software.
>> So it's mainly big files.
>>
>> My thought, but I'd like some other points of view, is that I could
>> deal with the amplification by using bigger block sizes on my XFS
>> filesystems, instead of reducing bluestore_min_alloc_size_hdd on all
>> OSDs.
>>
>> What do you think?
>
> --
>
> Steven Pine
>
> E steven.pine(a)webair.com | P 516.938.4100 x
>
> Webair | 501 Franklin Avenue Suite 200, Garden City NY, 11530
>
> webair.com
>
Hi,
After upgrading from 15.2.5 to 15.2.8, I see this health error.
Has anyone seen this? "ceph log last cephadm" doesn't show anything
about it. How can I trace it?
Thanks!
Tony
Hi,
I added a host with "ceph orch host add ceph-osd-5 10.6.10.84 ceph-osd".
I can see the host with "ceph orch host ls", but no devices are listed
by "ceph orch device ls ceph-osd-5". I tried "ceph orch device zap
ceph-osd-5 /dev/sdc --force", which works fine. Why are no devices
listed? What am I missing here?
Thanks!
Tony
Hi,
With 3 replicas, a PG has 3 OSDs. If all 3 of those OSDs are down,
the PG becomes unknown. Is that right?
If those 3 OSDs are replaced and marked in and up, will that PG
eventually come back to active? Or does anything else have to be done
to fix it?
Thanks!
Tony
We have a fairly old cluster that has over time been upgraded to Nautilus. While digging through some things, we found 3 bucket indexes without a corresponding bucket. They should have been deleted but were somehow left behind. When we try to delete a bucket index, it is not allowed because the bucket is not found. The bucket index list command works fine without the bucket, though. Is there a way to delete the indexes? Or maybe somehow relink the bucket so it can be deleted again?
Thanks,
Kevin
Hi Davor,
Use "ceph orch ls osd --format yaml" to get more info about the problems
deploying the OSD service; that will probably give you clues about what is
happening. Share the output if you cannot solve the problem :-)
The same command can be used for other services like the node-exporter,
although in that case I think that the problem was a bug fixed a few days
ago.
https://github.com/ceph/ceph/pull/38946
The fix was backported to pacific last week.
BR
--
Juan Miguel Olmo Martínez
Senior Software Engineer
Red Hat <https://www.redhat.com/>
jolmomar(a)redhat.com
<https://www.redhat.com/>
Hi Eugen Block,
Useful tips for creating OSDs:
1. Check devices availability in your cluster hosts:
# ceph orch device ls
2. Devices not available:
This usually means that you have created LVs on these devices (I mean the
devices are not cleaned). A "ceph orch zap <device>" will fix that.
3. The OSD does not start. Check its status with:
ceph orch ls osd --format yaml
--
Juan Miguel Olmo Martínez
Senior Software Engineer
Red Hat <https://www.redhat.com/>
jolmomar(a)redhat.com
<https://www.redhat.com/>
Hi
I've got an old cluster running ceph 10.2.11 with filestore backend. Last
week a PG was reported inconsistent with a scrub error
# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 38.20 is active+clean+inconsistent, acting [1778,1640,1379]
1 scrub errors
I first tried 'ceph pg repair' but nothing seemed to happen, then
# rados list-inconsistent-obj 38.20 --format=json-pretty
showed that the problem was on OSD 1379. The logs showed that that OSD
had read errors, so I decided to mark it out for replacement. Later on I
removed it from the crush map and deleted the OSD. My thought was that
the missing replica would get backfilled onto another OSD and everything
would be OK again. The PG got another OSD assigned, but the health error
stayed:
# ceph health detail
HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
pg 38.20 is active+clean+inconsistent, acting [1778,1640,1384]
1 scrub errors
Now I get an error on:
# rados list-inconsistent-obj 38.20 --format=json-pretty
No scrub information available for pg 38.20
error 2: (2) No such file or directory
And if I try
# ceph pg deep-scrub 38.20
instructing pg 38.20 on osd.1778 to deep-scrub
the deep scrub does not get scheduled. The same goes for
# ceph daemon osd.1778 trigger_scrub 38.20
on the storage node.
Nothing appears in the logs concerning the scrubbing of PG 38.20. I do
see in the logs that other PGs get (deep) scrubbed according to the
automatic scheduling.
There is no recovery going on, but just to be sure I set
# ceph daemon osd.1778 config set osd_scrub_during_recovery true
Also, the load limit is set way higher than the actual system load.
I checked the other OSDs and there are no scrubs going on on them when I
schedule the deep scrub.
I found some reports of people who had the same problem, but no solution
was found (for example https://tracker.ceph.com/issues/15781). Even in
mimic and luminous there were similar cases.
- Does anyone know which logging I should increase in order to get more
information on why my deep scrub does not get scheduled?
- Is there a way in jewel to see the list of scheduled scrubs and their
dates for an OSD?
- Does anyone have advice on how to proceed in clearing this PG error?
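On the logging question, this is the sort of thing I would try first (a sketch of jewel-era debug knobs, not a verified recipe; the subsystem levels you actually need may differ):

```ini
; ceph.conf fragment on the storage node hosting osd.1778, or set the
; same options at runtime, e.g.:
;   ceph daemon osd.1778 config set debug_osd 20/20
[osd]
debug osd = 20/20        ; scrub scheduling decisions show up at high debug_osd levels
debug filestore = 10/10  ; backend activity, relevant for a filestore cluster
```

Remember to lower the levels again afterwards; debug_osd 20 is extremely verbose.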
Thanks for any help
Marcel
Hello everyone,
Could someone please let me know which modern kernel disk scheduler is recommended for SSD and HDD OSDs? The information in the manuals is pretty dated and refers to schedulers which have been deprecated in recent kernels.
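Not an authoritative answer, but on blk-mq kernels the scheduler is visible and settable per device through sysfs. A defensive sketch for inspecting what you currently have (device names will differ on your hosts):

```shell
#!/bin/sh
# Print the rotational flag and the scheduler line (the active one is
# shown in brackets) for every block device that exposes a scheduler.
# Tolerates hosts/containers with no such devices.
for sched in /sys/block/*/queue/scheduler; do
    [ -e "$sched" ] || continue
    dev=$(basename "$(dirname "$(dirname "$sched")")")
    rot=$(cat "${sched%scheduler}rotational" 2>/dev/null)
    printf '%s rotational=%s scheduler=%s\n' "$dev" "$rot" "$(cat "$sched")"
done
# Changing it at runtime (not persistent across reboots), e.g. for an HDD:
#   echo mq-deadline > /sys/block/sdX/queue/scheduler
```

With the legacy schedulers (cfq, deadline, noop) gone from recent kernels, the usual blk-mq choices are none, mq-deadline, bfq, and kyber; a common starting point is none for NVMe/SSD and mq-deadline for HDD, then benchmark with your own workload.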
Thanks
Andrei
Hi,
we're mainly using CephFS to give access to storage.
At all times we can see that all clients combined use "X MiB/s" and "Y
op/s" for read and write, using the CLI or the Ceph dashboard.
With a tool like iftop, I can get a bit of insight into which clients
most data 'flows' to, but it isn't really precise.
Is there any way to get MiB/s and op/s numbers per CephFS client?
Thanks,
Erwin