Howdy, the dashboard on our cluster keeps showing LARGE_OMAP_OBJECTS.
I went through this document
https://www.suse.com/support/kb/doc/?id=000019698
I've found that we have a total of 5 buckets, each owned by a different user.
From what I have read on this issue, opinions seem to flip-flop between "this is an actual problem that will cause real-world issues" and "we just raised the limit in the next version".
Does anyone have expertise on whether this is an actual problem or whether we should just tune the thresholds, and how do you determine that?
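For context, this is roughly how I've been identifying them (the log path is from a mon node, and the threshold option name is just my understanding of what drives the warning):

  # buckets whose index objects exceed the per-shard limits
  radosgw-admin bucket limit check

  # current warning threshold (keys per omap object)
  ceph config get osd osd_deep_scrub_large_omap_object_key_threshold

  # cluster log entries naming the offending objects
  grep 'Large omap object found' /var/log/ceph/ceph.log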
One other quick question: is there a way to add usage information for buckets into mgr for version 14?
Thanks,
-Drew
Hello,
A while back, I was having an issue with an OSD repeatedly crashing. I
ultimately reweighted it to zero and then marked it 'out', since I found
that the logs for those crashes match https://tracker.ceph.com/issues/46490.
Since the OSD is in a 'Safe-to-Destroy' state, I'm wondering the best
course of action - should I just mark it back in? Or should I destroy and
rebuild it? If clearing it in the way I have, in combination with updating
to 14.2.16, will prevent it from misbehaving, why go through the trouble of
destroying and rebuilding?
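For reference, the two paths I'm weighing look roughly like this (osd.12 is just a placeholder ID):

  # Option A: put it back into service
  ceph osd in osd.12
  ceph osd crush reweight osd.12 <original-weight>

  # Option B: confirm it is safe, then rebuild
  ceph osd safe-to-destroy osd.12
  ceph osd destroy osd.12 --yes-i-really-mean-it
  # ...then redeploy the disk with ceph-volume, reusing the same OSD ID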
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall(a)binghamton.edu
607-760-2328 (Cell)
607-777-4641 (Office)
All,
In looking at the options for setting the default PG autoscale mode, I
notice that there is both a global setting and a per-pool setting.
It seems that the values at the pool level are off, warn, and
on; the same, I assume, for the global setting.
Is there a way to get rid of the per-pool setting and set the pool to honor
the global setting? I think I'm looking for 'off, warn, on, or global'.
It seems that once the per-pool option is set for all of one's pools, the
global value is irrelevant. This also implies that if one wanted to
temporarily suspend autoscaling, it would be necessary to modify the
setting for each pool and then set it back afterward, something like the
loop sketched below.
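As far as I can tell, suspending it cluster-wide today would mean roughly this (and the reverse to re-enable):

  # switch every existing pool to warn-only, pool by pool
  for p in $(ceph osd pool ls); do
      ceph osd pool set "$p" pg_autoscale_mode warn
  done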
Thoughts?
Thanks
-Dave
--
Dave Hall
Binghamton University
kdhall(a)binghamton.edu
On Mon, Mar 29, 2021 at 1:44 PM Anthony D'Atri <anthony.datri(a)gmail.com>
wrote:
> Yes, the PG autoscaler has a way of reducing the PG count way too far. There’s
> a claim that it’s better in Pacific, but I tend to recommend disabling it
> and calculating / setting pg_num manually.
>
> > On Mar 29, 2021, at 9:06 AM, Dave Hall <kdhall(a)binghamton.edu> wrote:
> >
> > Eugen,
> >
> > I didn't really think my cluster was eating itself, but I also didn't
> want
> > to be in denial.
> >
> > Regarding the autoscaler, I really thought that it only went up - I
> didn't
> > expect that it would decrease the number of PGs. Plus, I thought I had
> it
> > turned off. I see now that it's off globally but enabled for this
> > particular pool. Also, I see that the target PG count is lower than the
> > current.
> >
> > I guess you learn something new every day.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdhall(a)binghamton.edu
> > 607-760-2328 (Cell)
> > 607-777-4641 (Office)
> >
> >
> > On Mon, Mar 29, 2021 at 7:52 AM Eugen Block <eblock(a)nde.ag> wrote:
> >
> >> Hi,
> >>
> >> that sounds like the pg_autoscaler is doing its work. Check with:
> >>
> >> ceph osd pool autoscale-status
> >>
> >> I don't think ceph is eating itself or that you're losing data. ;-)
> >>
> >>
> >> Zitat von Dave Hall <kdhall(a)binghamton.edu>:
> >>
> >>> Hello,
> >>>
> >>> About 3 weeks ago I added a node and increased the number of OSDs in my
> >>> cluster from 24 to 32, and then marked one old OSD down because it was
> >>> frequently crashing.
> >>>
> >>> After adding the new OSDs the PG count jumped fairly dramatically, but
> >> ever
> >>> since, amidst a continuous low level of rebalancing, the number of PGs
> >> has
> >>> gradually decreased to about 25% below its maximum value. Although I
> don't
> >>> have specific notes, my perception is that the current number of PGs is
> >>> actually lower than it was before I added OSDs.
> >>>
> >>> So what's going on here? It is possible to imagine that my cluster is
> >>> slowly eating itself, and that I'm about to lose 200TB of data. It's
> also
> >>> possible to imagine that this is all due to the gradual optimization of
> >> the
> >>> pools.
> >>>
> >>> Note that the primary pool is an EC 8 + 2 containing about 124TB.
> >>>
> >>> Thanks.
> >>>
> >>> -Dave
> >>>
> >>> --
> >>> Dave Hall
> >>> Binghamton University
> >>> kdhall(a)binghamton.edu
Hello,
About 3 weeks ago I added a node and increased the number of OSDs in my
cluster from 24 to 32, and then marked one old OSD down because it was
frequently crashing.
After adding the new OSDs the PG count jumped fairly dramatically, but ever
since, amidst a continuous low level of rebalancing, the number of PGs has
gradually decreased to about 25% below its maximum value. Although I don't
have specific notes, my perception is that the current number of PGs is
actually lower than it was before I added OSDs.
So what's going on here? It is possible to imagine that my cluster is
slowly eating itself, and that I'm about to lose 200TB of data. It's also
possible to imagine that this is all due to the gradual optimization of the
pools.
Note that the primary pool is an EC 8 + 2 containing about 124TB.
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall(a)binghamton.edu
Hello everybody.
I searched in several places and couldn't find any information about what
the best bucket index and WAL/DB layout would be.
I have several hosts consisting of 12 HDDs and 2 NVMes, and currently one
of the NVMes serves as WAL/DB for the 10 OSDs, while the other NVMe is
partitioned in two, serving as 2 OSDs dedicated to the S3 index pool.
I saw in ceph-ansible a playbook (infrastructure-playbooks/lv-create.yml)
that creates a layout where an OSD lives alongside its journal on the
same NVMe. The problem is that lv-vars.yaml, used by lv-create.yml, says
that this only applies to filestore. Is this correct, or can I use the
same structure with bluestore?
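My working assumption is that with bluestore the equivalent layout would be expressed directly with ceph-volume rather than via the filestore-era LV playbook; a rough sketch, with device names purely illustrative:

  # HDD OSDs with their DB/WAL carved out of the first NVMe
  ceph-volume lvm batch --bluestore /dev/sd[b-k] --db-devices /dev/nvme0n1

  # two OSDs on the second NVMe for the RGW index pool
  ceph-volume lvm batch --bluestore --osds-per-device 2 /dev/nvme1n1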
Thank you all,
Marcelo.
Hello,
We are in the process of bringing new hardware online that will allow us
to get all of the MGRs, MONs, MDSs, etc. off our OSD nodes and onto
dedicated management nodes. I've created MGRs and MONs on the new
nodes, and I found procedures for removing the MONs from the OSD nodes.
Now I'm looking for the correct procedure to remove the MGRs from the
OSD nodes. I haven't found any reference to this in the documentation.
Is it as simple as stopping and disabling the systemd service/target?
Or are there Ceph commands? Do I need to clean up /var/lib/ceph/mgr?
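My best guess, for a non-cephadm deployment, is something along these lines (the host name 'osd01' is a placeholder), but I'd welcome confirmation:

  systemctl stop ceph-mgr@osd01
  systemctl disable ceph-mgr@osd01
  ceph auth del mgr.osd01                # drop the daemon's cephx key
  rm -rf /var/lib/ceph/mgr/ceph-osd01    # clean up the (small) data directory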
Same questions about MDS in the near term, but I haven't searched the
docs yet.
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall(a)binghamton.edu
Hi,
Here is a snippet from top on a node with 10 OSDs.
===========================
MiB Mem : 257280.1 total, 2070.1 free, 31881.7 used, 223328.3 buff/cache
MiB Swap: 128000.0 total, 126754.7 free, 1245.3 used. 221608.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30492 167 20 0 4483384 2.9g 16696 S 6.0 1.2 707:05.25 ceph-osd
35396 167 20 0 4444952 2.8g 16468 S 5.0 1.1 815:58.52 ceph-osd
33488 167 20 0 4161872 2.8g 16580 S 4.7 1.1 496:07.94 ceph-osd
36371 167 20 0 4387792 3.0g 16748 S 4.3 1.2 762:37.64 ceph-osd
39185 167 20 0 5108244 3.1g 16576 S 4.0 1.2 998:06.73 ceph-osd
38729 167 20 0 4748292 2.8g 16580 S 3.3 1.1 895:03.67 ceph-osd
34439 167 20 0 4492312 2.8g 16796 S 2.0 1.1 921:55.50 ceph-osd
31473 167 20 0 4314500 2.9g 16684 S 1.3 1.2 680:48.09 ceph-osd
32495 167 20 0 4294196 2.8g 16552 S 1.0 1.1 545:14.53 ceph-osd
37230 167 20 0 4586020 2.7g 16620 S 1.0 1.1 844:12.23 ceph-osd
===========================
Does it look OK with only 2GB free?
I can't tell what that 220GB of buffer/cache is being used for.
Is it used by the OSDs? Is it controlled by configuration, or auto-scaled based
on physical memory? Any clarification would be helpful.
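In case it's relevant, I assume osd_memory_target is the setting that governs the OSD processes themselves; something like this should show it:

  ceph config get osd osd_memory_target            # cluster-wide setting, in bytes
  ceph tell osd.0 config get osd_memory_target     # value one OSD is actually using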
Thanks!
Tony
Hello there,
Thank you for your response.
There is no error at syslog, dmesg, or SMART.
# ceph health detail
HEALTH_WARN Too many repaired reads on 2 OSDs
OSD_TOO_MANY_REPAIRS Too many repaired reads on 2 OSDs
osd.29 had 38 reads repaired
osd.16 had 17 reads repaired
How can I clear this warning?
My Ceph is version 14.2.9 (clear_shards_repaired is not supported).
/dev/sdh1 on /var/lib/ceph/osd/ceph-16 type xfs (rw,relatime,attr2,inode64,noquota)
# cat dmesg | grep sdh
[ 12.990728] sd 5:2:3:0: [sdh] 19531825152 512-byte logical blocks: (10.0 TB/9.09 TiB)
[ 12.990728] sd 5:2:3:0: [sdh] Write Protect is off
[ 12.990728] sd 5:2:3:0: [sdh] Mode Sense: 1f 00 00 08
[ 12.990728] sd 5:2:3:0: [sdh] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 13.016616] sdh: sdh1 sdh2
[ 13.017780] sd 5:2:3:0: [sdh] Attached SCSI disk
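SMART also looks clean to me; roughly what I checked on the drive backing osd.16 (sdh):

  smartctl -H /dev/sdh                                            # overall health assessment
  smartctl -A /dev/sdh | grep -i -E 'realloc|pending|uncorrect'   # usual error counters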
# ceph tell osd.29 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 6.464404,
"bytes_per_sec": 166100668.21318716,
"iops": 39.60148530320815
}
# ceph tell osd.16 bench
{
"bytes_written": 1073741824,
"blocksize": 4194304,
"elapsed_sec": 9.6168945000000008,
"bytes_per_sec": 111651617.26584397,
"iops": 26.619819942914003
}
Thank you
> On 26 Mar 2021, at 16:04, Anthony D'Atri <anthony.datri(a)gmail.com> wrote:
>
> Did you look at syslog, dmesg, or SMART? Most likely the drives are failing.
>
>
>> On Mar 25, 2021, at 9:55 PM, jinguk.kwon(a)ungleich.ch wrote:
>>
>> Hello there,
>>
>> Thank you in advance.
>> My ceph is ceph version 14.2.9
>> I have a repair issue too.
>>
>> ceph health detail
>> HEALTH_WARN Too many repaired reads on 2 OSDs
>> OSD_TOO_MANY_REPAIRS Too many repaired reads on 2 OSDs
>> osd.29 had 38 reads repaired
>> osd.16 had 17 reads repaired
>>
>> ~# ceph tell osd.16 bench
>> {
>> "bytes_written": 1073741824,
>> "blocksize": 4194304,
>> "elapsed_sec": 7.1486738159999996,
>> "bytes_per_sec": 150201541.10217974,
>> "iops": 35.81083800844663
>> }
>> ~# ceph tell osd.29 bench
>> {
>> "bytes_written": 1073741824,
>> "blocksize": 4194304,
>> "elapsed_sec": 6.9244327500000002,
>> "bytes_per_sec": 155065672.9246161,
>> "iops": 36.970537406114602
>> }
>>
>> But it looks like those OSDs are OK. How can I clear this warning?
>>
>> Best regards
>> JG
Hi,
Do I need to update ceph.conf and restart each OSD after adding more MONs?
This is with 15.2.8 deployed by cephadm.
When adding a MON, "mon_host" should be updated accordingly.
Given [1], is that update "the monitor cluster’s centralized configuration
database" or "runtime overrides set by an administrator"?
[1] https://docs.ceph.com/en/latest/rados/configuration/ceph-conf/#config-sourc…
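My working assumption is that the updated monmap is what the daemons follow at runtime, and that a fresh minimal ceph.conf can be regenerated rather than hand-edited, e.g.:

  ceph mon dump                         # confirm the monmap now lists all MONs
  ceph config generate-minimal-conf     # emit a minimal ceph.conf with the current mon_host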
Thanks!
Tony