On 2020-10-21 10:08, Mark Nelson wrote:
> On 10/21/20 7:54 AM, Ing. Luis Felipe Domínguez Vega wrote:
>> On 2020-10-21 08:43, Mark Nelson wrote:
>>> Theoretically we shouldn't be spiking memory as much these days during
>>> recovery, but the code is complicated and it's tough to reproduce
>>> these kinds of issues in-house. If you happen to catch it in the act,
>>> do you see the pglog mempool stats also spiking up?
>>>
>>>
>>> Mark
>>>
>>>
>>> On 10/21/20 2:34 AM, Dan van der Ster wrote:
>>>> Hi,
>>>>
>>>> This might be the pglog issue which has been coming up a few times
>>>> on the list.
>>>> If the OSD cannot boot without going OOM, you might have success by
>>>> trimming the pglog, e.g. search this list for "ceph-objectstore-tool
>>>> --op trim-pg-log" for some recipes. The thread "OSDs taking too much
>>>> memory, for pglog" in particular might help.
>>>>
>>>> Cheers, Dan
>>>>
>>>>
>>>>
>>>> On Tue, Oct 20, 2020 at 11:57 PM Ing. Luis Felipe Domínguez Vega
>>>> <luis.dominguez(a)desoft.cu> wrote:
>>>>> Hi, today my infra provider had a blackout, and Ceph then tried to
>>>>> recover but is in an inconsistent state, because many OSDs cannot
>>>>> recover on their own: the kernel kills them via the OOM killer. Even
>>>>> one OSD that was previously OK has now gone down, OOM-killed.
>>>>>
>>>>> Even on a server with 32 GB of RAM the OSD uses all of it and never
>>>>> recovers; I think this could be a memory leak. Ceph version: Octopus
>>>>> 15.2.3.
>>>>>
>>>>> In: https://pastebin.pl/view/59089adc
>>>>> You can see that buffer_anon reaches 32 GB, but why? My whole cluster
>>>>> is down because of it.
>>>>> _______________________________________________
>>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>> This https://pastebin.pl/view/59089adc was captured just as the OSD was
>> about to be killed by OOM.
>>
>
> Ok, that is very interesting! The OSD memory autotuning code shrank
> the caches to be almost nothing to try and compensate for the huge
> growth in buffer_anon (and to a lesser extent osd_pglog) usage but
> obviously couldn't do anything with that much memory being used. Any
> chance you could create a tracker ticket and paste the memory pool
> info in along with ceph version/etc?
>
>
> https://tracker.ceph.com/
>
>
> Mark
Thanks, https://tracker.ceph.com/issues/47929
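The pglog trim recipe Dan referred to can be sketched roughly as follows. This is a sketch only: the OSD id (12), pgid (2.7f), and data path are placeholders, and the OSD must be stopped before running ceph-objectstore-tool against it.

```shell
# Stop the affected OSD; ceph-objectstore-tool works on an offline OSD.
systemctl stop ceph-osd@12

# List the PGs held by this OSD (path assumes a default deployment).
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op list-pgs

# Trim the PG log of one PG; repeat for each PG reported above.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --op trim-pg-log --pgid 2.7f

# Start the OSD again once trimming is done.
systemctl start ceph-osd@12
```

As I recall, how far the log gets trimmed is governed by the osd pglog length settings (e.g. osd_max_pg_log_entries), so check the threads mentioned above for the exact values people used.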
Hi all,
There is a huge difference between node exporter and ceph exporter
(prometheus mgr module) data. For example, node exporter shows 120 MB/s of
writes on my disk, but the ceph exporter says it is 22 MB/s! The same goes
for latency, IOPS, and so on.
Which one is reliable?
Thanks.
Right, both Norman and I set the pg_num before the pgp_num. For example,
here are my current pool settings:
*"pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7
crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024 pgp_num_target
2048 last_change 8458830 lfor 0/0/8445757 flags
hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576 fast_read
1 application rgw"*
So, when I set:
"ceph osd pool set hou-ec-1.rgw.buckets.data pgp_num 2048"
it returns:
"set pool 40 pgp_num to 2048"
But upon checking the pool details again:
"*pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7
crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024 pgp_num_target
2048 last_change 8458870 lfor 0/0/8445757 flags
hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576 fast_read
1 application rgw*"
and the pgp_num value does not increase. Am I just doing something
totally wrong?
Thanks,
Mac Wynkoop
On Tue, Oct 6, 2020 at 2:32 PM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
> pg_num and pgp_num need to be the same, not?
>
> 3.5.1. Set the Number of PGs
>
> To set the number of placement groups in a pool, you must specify the
> number of placement groups at the time you create the pool. See Create a
> Pool for details. Once you set placement groups for a pool, you can
> increase the number of placement groups (but you cannot decrease the
> number of placement groups). To increase the number of placement groups,
> execute the following:
>
> ceph osd pool set {pool-name} pg_num {pg_num}
>
> Once you increase the number of placement groups, you must also increase
> the number of placement groups for placement (pgp_num) before your
> cluster will rebalance. The pgp_num should be equal to the pg_num. To
> increase the number of placement groups for placement, execute the
> following:
>
> ceph osd pool set {pool-name} pgp_num {pgp_num}
>
>
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/s…
>
> -----Original Message-----
> To: norman
> Cc: ceph-users
> Subject: [ceph-users] Re: pool pgp_num not updated
>
> Hi everyone,
>
> I'm seeing a similar issue here. Any ideas on this?
> Mac Wynkoop,
>
>
>
> On Sun, Sep 6, 2020 at 11:09 PM norman <norman.kern(a)gmx.com> wrote:
>
> > Hi guys,
> >
> > When I updated the pg_num of a pool, I found it did not work (no
> > rebalance happened); does anyone know the reason? The pool's info:
> >
> > pool 21 'openstack-volumes-rs' replicated size 3 min_size 2 crush_rule
> > 21 object_hash rjenkins pg_num 1024 pgp_num 512 pgp_num_target 1024
> > autoscale_mode warn last_change 85103 lfor 82044/82044/82044 flags
> > hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
> > removed_snaps
> > [1~1e6,1e8~300,4e9~18,502~3f,542~11,554~1a,56f~1d7]
> > pool 22 'openstack-vms-rs' replicated size 3 min_size 2 crush_rule 22
> > object_hash rjenkins pg_num 512 pgp_num 512 pg_num_target 256
> > pgp_num_target 256 autoscale_mode warn last_change 84769 lfor
> > 0/0/55294 flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0
> > application rbd
> >
> > The pgp_num_target is set, but pgp_num is not.
> >
> > I had scaled out new OSDs and backfilling was in progress before I set
> > the value; could that be the reason?
> >
>
>
>
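To make the two-step procedure above concrete, a minimal sketch (the pool name is a placeholder; note that on Nautilus and later the pgp_num increase is applied gradually via pgp_num_target and is throttled while backfill or misplaced objects are in flight, which is consistent with what Norman and Mac are seeing):

```shell
# Step 1: raise the PG count.
ceph osd pool set mypool pg_num 2048

# Step 2: raise the placement count to match; rebalancing only starts
# after this. On Nautilus+ this sets pgp_num_target and the mons walk
# pgp_num up gradually, pausing while backfill keeps the misplaced
# ratio above the configured limit.
ceph osd pool set mypool pgp_num 2048

# Check progress: pgp_num should converge toward pgp_num_target
# as recovery completes.
ceph osd pool get mypool pgp_num
ceph osd dump | grep "'mypool'"
```

So "pgp_num_target set but pgp_num unchanged" during heavy backfill is expected behaviour rather than a failure; pgp_num should catch up once the cluster settles.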
The best F/OSS conference in the southern hemisphere is back again,
virtualized, January 23-25. The CFP is open until November 6. Submit
early, submit often! ;-)
-------- Forwarded Message --------
Subject: [lca-announce] linux.conf.au 2021 - Call for Sessions and
Miniconfs Open
Date: Thu, 15 Oct 2020 20:43:24 +1000
From: linux.conf.au Announcements <lca-announce(a)lists.linux.org.au>
Reply-To: lca-announce(a)lists.linux.org.au
To: lca-announce(a)lists.linux.org.au
We're excited! The linux.conf.au 2021 Call for Sessions and Call for
Miniconfs are now open. They will stay open until 6th November 2020
Anywhere on Earth
(AoE) (https://en.wikipedia.org/wiki/Anywhere_on_Earth). This is only 3
weeks away - so don't delay, get your talks in early!
Our theme is "So what's next?".
We all know we're living through unprecedented change and uncertain
times. How can open source play a role in creating, helping and adapting
to this ongoing change? What new developments in software and coding can
we look forward to in 2021 and beyond?
If you have ideas or developments you'd like to share with the open
source community at linux.conf.au, we'd love to hear from you.
Call for Sessions
The main conference runs on Sunday 24 and Monday 25 January, with
multiple streams catering for a wide range of interest areas.
We invite you to submit a session
(https://linux.conf.au/programme/sessions/) proposal for a talk. Talks
are generally 35-45 minute presentations on a single topic presented in
lecture format.
Call for Miniconfs
Miniconfs are dedicated day-long streams focusing on single topics,
creating a more immersive experience for delegates than a session. We
encourage you to get creative with how you could deliver your Miniconf
virtually!
Running a Miniconf (https://linux.conf.au/programme/miniconfs/) is a
great way to gain experience, provide exposure for your project or
topic, and raise your professional profile. They're a crowd favourite
and an awesome way to kick off the conference.
Miniconfs will run on Saturday 23 January, before the main conference
commences on Sunday.
No need to book flights or hotels
Don't forget: the conference will be a fully online, virtual experience.
This means our speakers will be beaming in from their own homes or
workplaces. The organising team will be able to help speakers with their
tech set-up. Each accepted presenter will have a tech check prior to the
event to smooth out any difficulties and there will be an option to
pre-record presentations.
Have we piqued your interest?
You can find out how to submit your session or miniconf proposals at
https://linux.conf.au/programme/proposals/.
If you have any other questions you can contact us via email at
contact(a)lca2021.linux.org.au.
Timeline
16th October: Call for sessions opens
6th November: Call for sessions closes
January 23 2021: Miniconfs!
January 24-25 2021: Main conference presentations
Tickets will go on sale in the coming weeks. We'll keep you posted.
We're looking forward to reading your submissions.
linux.conf.au 2021 Organising Team
About linux.conf.au 2021
Running since 1999, linux.conf.au is the largest Linux and open source
conference in the Asia-Pacific region. The conference provides deeply
technical presentations from industry leaders and experts on a wide
array of subjects relating to open source projects, data and open
government and community engagement.
---
Read this online at
https://lca2021.linux.org.au/news/call-for-sessions-miniconfs-open/
_______________________________________________
lca-announce mailing list
lca-announce(a)lists.linux.org.au
http://lists.linux.org.au/mailman/listinfo/lca-announce
On 2020-10-20 23:17, Anthony D'Atri wrote:
>> On Oct 20, 2020, at 6:23 PM, Ing. Luis Felipe Domínguez Vega
>> <luis.dominguez(a)desoft.cu> wrote:
>>
>> On 2020-10-20 19:33, Anthony D'Atri wrote:
>>> You have a *lot* of peering and recovery going on.
>>> Write a script that monitors available memory on the system and
>>> restarts the OSD process using the most when it crosses some
>>> threshold. Run that on all OSD nodes. OSDs will come up, make some
>>> progress, get restarted, but eventually they’ll sync up.
>>>> On Oct 20, 2020, at 2:57 PM, Ing. Luis Felipe Domínguez Vega
>>>> <luis.dominguez(a)desoft.cu> wrote:
>>>> Hi, today my infra provider had a blackout, and Ceph then tried to
>>>> recover but is in an inconsistent state, because many OSDs cannot
>>>> recover on their own: the kernel kills them via the OOM killer. Even
>>>> one OSD that was previously OK has now gone down, OOM-killed.
>>>> Even on a server with 32 GB of RAM the OSD uses all of it and never
>>>> recovers; I think this could be a memory leak. Ceph version: Octopus
>>>> 15.2.3.
>>>> In: https://pastebin.pl/view/59089adc
>>>> You can see that buffer_anon reaches 32 GB, but why? My whole cluster
>>>> is down because of it.
>> Is that the only solution?
>
> I didn’t say that. If you don’t like it, don’t do it. You can also
> try eras’ idea. You’ve offered almost no detail about your cluster or
> hardware.
>
>
>> there is nothing to limit the OSD resource usage on recover?
Sorry, what is eras' idea? And I can send my cluster info; what do you
need in order to work out a solution? (I'm already running your suggestion
of a script that restarts the OSDs on high memory usage.)
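The watchdog Anthony describes could be sketched roughly like this. Assumptions (not from the thread): systemd-managed OSDs started with the usual `--id` flag, a GNU userland, and an illustrative 24 GiB threshold; it is written as a one-shot check meant to run from cron every minute rather than as a daemon.

```shell
#!/bin/bash
# Restart the ceph-osd process using the most memory once it crosses a
# threshold. Invoke periodically, e.g. "* * * * *" from root's crontab.
THRESHOLD_KB=$((24 * 1024 * 1024))   # 24 GiB, in KiB as ps reports rss

# Largest resident set size among ceph-osd processes, with its PID.
read -r rss pid <<< "$(ps -C ceph-osd -o rss=,pid= --sort=-rss | head -n 1)"

if [ -n "$rss" ] && [ "$rss" -gt "$THRESHOLD_KB" ]; then
    # Extract the OSD id from the command line (assumes the --id flag).
    osd_id=$(ps -p "$pid" -o args= | grep -oP '(?<=--id )\d+')
    if [ -n "$osd_id" ]; then
        logger "osd-watchdog: restarting osd.${osd_id} (rss ${rss} KiB)"
        systemctl restart "ceph-osd@${osd_id}"
    fi
fi
```

The restarted OSD loses its in-progress work, but as Anthony notes, repeated rounds of "come up, make progress, restart" eventually converge.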
Hello, we have integrated Ceph's RGW with LDAP and have authenticated users using the mail attribute successfully. We would like to shift to SSO and are evaluating the new OIDC feature in Ceph together with dexIdP with an LDAP connector as an upstream IdP.
We are trying to understand the flow of user authentication and how it will affect our current LDAP users' buckets, which are already created in Ceph under LDAP users.
Will the Ceph RGW be able to pass the token to be verified to the IdP and what type of user will then be created in Ceph? Is this the intended way of OIDC integration?
Thanks for any assistance
Hello
I'm facing an issue with Ceph: I cannot run any ceph command, it literally hangs. I need to hit CTRL-C to get this:
^CCluster connection interrupted or timed out
This is on Ubuntu 16.04. Also, I use Grafana with Prometheus to get information from the cluster, but now there is no data to graph. Any clue?
cephadm version

INFO:cephadm:Using recent ceph image ceph/ceph:v15 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
cephadm ls
[
{
"style": "cephadm:v1",
"name": "mon.osswrkprbe001",
"fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
"systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d(a)mon.osswrkprbe001",
"enabled": true,
"state": "running",
"container_id": "afbe6ef76198bf05ec972e832077849d4a4438bd56f2e177aeb9b11146577baf",
"container_image_name": "docker.io/ceph/ceph:v15.2.1",
"container_image_id": "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
"version": "15.2.1",
"started": "2020-10-19T19:03:16.759730",
"created": "2020-09-04T23:30:30.250336",
"deployed": "2020-09-04T23:48:20.956277",
"configured": "2020-09-04T23:48:22.100283"
},
{
"style": "cephadm:v1",
"name": "mgr.osswrkprbe001",
"fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
"systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d(a)mgr.osswrkprbe001",
"enabled": true,
"state": "running",
"container_id": "1737b2cf46310025c0ae853c3b48400320fb35b0443f6ab3ef3d6cbb10f460d8",
"container_image_name": "docker.io/ceph/ceph:v15.2.1",
"container_image_id": "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
"version": "15.2.1",
"started": "2020-10-19T20:43:38.329529",
"created": "2020-09-04T23:30:31.110341",
"deployed": "2020-09-04T23:47:41.604057",
"configured": "2020-09-05T00:00:21.064246"
}
]
Thank you in advance.
Regards,
EMANUEL CASTELLI
Arquitecto de Información - Gerencia OSS
C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | ecastelli(a)telecentro.net.ar
Lavardén 157 1er piso. CABA (C1437FBC)
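A few checks that can help narrow down a hanging ceph CLI in a cephadm deployment like this one (a sketch only; the fsid and hostnames are taken from the `cephadm ls` output above, and whether the admin socket is reachable from `cephadm shell` is an assumption worth verifying):

```shell
# Fail fast instead of hanging forever while trying to reach the mons.
ceph -s --connect-timeout 15

# Is the mon container actually running? (unit name from cephadm ls)
systemctl status 'ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001'

# Ask the mon directly over its admin socket, bypassing normal
# cluster connectivity, from inside the cephadm shell.
cephadm shell -- ceph daemon mon.osswrkprbe001 mon_status
```

A CLI that hangs until CTRL-C usually means no mon quorum is reachable, so the mon container state and `mon_status` output are the first things to look at.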
Hi,
I have a few existing RBDs, and I would like to create a new RBD image for PostgreSQL. Do you have any suggestions for such a use case? The current defaults are:
Object size (4MB) and Stripe Unit (None)
Features: Deep flatten + Layering + Exclusive Lock + Object Map + Fast Diff
Should I use them as is, or should I use a 16KB object size and a different set of features for PostgreSQL?
Thanks,
Gencer.
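As one data point, an image along the lines Gencer suggests might be created like this (a sketch, not a recommendation: the pool/image names and size are placeholders, and whether a smaller object size actually helps a PostgreSQL workload should be benchmarked before committing to it):

```shell
# 16 KiB objects to sit closer to PostgreSQL's 8 KiB page size; the
# feature set is trimmed to the commonly used ones (object-map and
# fast-diff both depend on exclusive-lock).
rbd create mypool/pgdata --size 200G \
    --object-size 16K \
    --image-feature layering,exclusive-lock,object-map,fast-diff
```

The trade-off is more objects per image (more metadata and potentially more OSD ops) in exchange for less write amplification on small random writes, which is why it needs measuring rather than assuming.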
Hi,
I've received a warning this morning:
HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot of disk space
MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot of disk space
mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB)
mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB)
mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB)
It hit the 15 GB threshold, so I restarted all three mons, which triggered compaction.
I've also run this command on the first node:
ceph tell mon.`hostname -s` compact
but it only went down to 13 GB.
du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
13G total
Anything else I can do to reduce it?
Luminous 12.2.8 is the version.
Thank you in advance.
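Beyond the on-demand compaction already tried, a couple of options that are sometimes used (a sketch; verify the option names against the Luminous docs, since on 12.2.x mon settings live in ceph.conf rather than the config database):

```shell
# On-demand compaction, per mon (what was already tried):
ceph tell mon.monserver-2c01 compact

# On Luminous, enabling compaction at every mon start goes in ceph.conf:
#   [mon]
#   mon compact on start = true
# then restart the mon to apply it:
systemctl restart ceph-mon@monserver-2c01
```

Note also that mons retain old cluster maps for as long as any PG is not active+clean, so the store may simply refuse to shrink much until the cluster is fully healthy; if the cluster is healthy and the store is still large, compaction is the right lever.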