On 2020-10-21 10:08, Mark Nelson wrote:
> On 10/21/20 7:54 AM, Ing. Luis Felipe Domínguez Vega wrote:
>> On 2020-10-21 08:43, Mark Nelson wrote:
>>> Theoretically we shouldn't be spiking memory as much these days during
>>> recovery, but the code is complicated and it's tough to reproduce
>>> these kinds of issues in-house. If you happen to catch it in the act,
>>> do you see the pglog mempool stats also spiking up?
>>>
>>>
>>> Mark
>>>
>>>
>>> On 10/21/20 2:34 AM, Dan van der Ster wrote:
>>>> Hi,
>>>>
>>>> This might be the pglog issue which has been coming up a few times
>>>> on the list.
>>>> If the OSD cannot boot without going OOM, you might have success by
>>>> trimming the pglog, e.g. search this list for "ceph-objectstore-tool
>>>> --op trim-pg-log" for some recipes. The thread "OSDs taking too much
>>>> memory, for pglog" in particular might help.
>>>>
>>>> Cheers, Dan
>>>>
>>>>
>>>>
>>>> On Tue, Oct 20, 2020 at 11:57 PM Ing. Luis Felipe Domínguez Vega
>>>> <luis.dominguez(a)desoft.cu> wrote:
>>>>> Hi, today my infra provider had a blackout, and Ceph then tried to
>>>>> recover but is in an inconsistent state, because many OSDs cannot
>>>>> recover on their own: the kernel kills them via the OOM killer. Even
>>>>> one OSD that was previously OK has now gone down, OOM-killed.
>>>>>
>>>>> Even on a server with 32 GB of RAM the OSD uses all of it and never
>>>>> recovers; I think this could be a memory leak. Ceph version: Octopus
>>>>> 15.2.3.
>>>>>
>>>>> In: https://pastebin.pl/view/59089adc
>>>>> You can see that buffer_anon reaches 32 GB, but why? My whole cluster
>>>>> is down because of it.
>>>>> _______________________________________________
>>>>> ceph-users mailing list -- ceph-users(a)ceph.io
>>>>> To unsubscribe send an email to ceph-users-leave(a)ceph.io
>> This https://pastebin.pl/view/59089adc was captured just as the OSD was
>> about to be killed by OOM.
>>
>
> Ok, that is very interesting! The OSD memory autotuning code shrank
> the caches to be almost nothing to try and compensate for the huge
> growth in buffer_anon (and to a lesser extent osd_pglog) usage but
> obviously couldn't do anything with that much memory being used. Any
> chance you could create a tracker ticket and paste the memory pool
> info in along with ceph version/etc?
>
>
> https://tracker.ceph.com/
>
>
> Mark
Thanks, https://tracker.ceph.com/issues/47929
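The pglog trim recipe Dan referred to can be sketched roughly as follows. This is a sketch only: the OSD id (12), pgid (2.7f), and data path are placeholders, and the OSD must be stopped before running ceph-objectstore-tool against it.

```shell
# Stop the affected OSD; ceph-objectstore-tool works on an offline OSD.
systemctl stop ceph-osd@12

# List the PGs held by this OSD (path assumes a default deployment).
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op list-pgs

# Trim the PG log of one PG; repeat for each PG reported above.
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
    --op trim-pg-log --pgid 2.7f

# Start the OSD again once trimming is done.
systemctl start ceph-osd@12
```

As I recall, how far the log gets trimmed is governed by the osd pglog length settings (e.g. osd_max_pg_log_entries), so check the threads mentioned above for the exact values people used.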
Hi all,
There is a huge difference between node exporter and ceph exporter
(prometheus mgr module) data. For example, node exporter shows 120 MB/s of
writes on my disk, but the ceph exporter says it is 22 MB/s! The same goes
for latency, IOPS, and so on.
Which one is reliable?
Thanks.
Right, both Norman and I set the pg_num before the pgp_num. For example,
here are my current pool settings:
*"pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7
crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024 pgp_num_target
2048 last_change 8458830 lfor 0/0/8445757 flags
hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576 fast_read
1 application rgw"*
So, when I set:
"ceph osd pool set hou-ec-1.rgw.buckets.data pgp_num 2048"
it returns:
"set pool 40 pgp_num to 2048"
But upon checking the pool details again:
"*pool 40 '*redacted*.rgw.buckets.data' erasure size 9 min_size 7
crush_rule 2 object_hash rjenkins pg_num 2048 pgp_num 1024 pgp_num_target
2048 last_change 8458870 lfor 0/0/8445757 flags
hashpspool,ec_overwrites,nodelete,backfillfull stripe_width 24576 fast_read
1 application rgw*"
and the pgp_num value does not increase. Am I just doing something
totally wrong?
Thanks,
Mac Wynkoop
On Tue, Oct 6, 2020 at 2:32 PM Marc Roos <M.Roos(a)f1-outsourcing.eu> wrote:
> pg_num and pgp_num need to be the same, not?
>
> 3.5.1. Set the Number of PGs
>
> To set the number of placement groups in a pool, you must specify the
> number of placement groups at the time you create the pool. See Create a
> Pool for details. Once you set placement groups for a pool, you can
> increase the number of placement groups (but you cannot decrease the
> number of placement groups). To increase the number of placement groups,
> execute the following:
>
> ceph osd pool set {pool-name} pg_num {pg_num}
>
> Once you increase the number of placement groups, you must also increase
> the number of placement groups for placement (pgp_num) before your
> cluster will rebalance. The pgp_num should be equal to the pg_num. To
> increase the number of placement groups for placement, execute the
> following:
>
> ceph osd pool set {pool-name} pgp_num {pgp_num}
>
>
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/s…
>
> -----Original Message-----
> To: norman
> Cc: ceph-users
> Subject: [ceph-users] Re: pool pgp_num not updated
>
> Hi everyone,
>
> I'm seeing a similar issue here. Any ideas on this?
> Mac Wynkoop,
>
>
>
> On Sun, Sep 6, 2020 at 11:09 PM norman <norman.kern(a)gmx.com> wrote:
>
> > Hi guys,
> >
> > When I updated the pg_num of a pool, I found it did not work (no
> > rebalance happened); does anyone know the reason? The pool's info:
> >
> > pool 21 'openstack-volumes-rs' replicated size 3 min_size 2 crush_rule
> > 21 object_hash rjenkins pg_num 1024 pgp_num 512 pgp_num_target 1024
> > autoscale_mode warn last_change 85103 lfor 82044/82044/82044 flags
> > hashpspool,nodelete,selfmanaged_snaps stripe_width 0 application rbd
> > removed_snaps
> > [1~1e6,1e8~300,4e9~18,502~3f,542~11,554~1a,56f~1d7]
> > pool 22 'openstack-vms-rs' replicated size 3 min_size 2 crush_rule 22
> > object_hash rjenkins pg_num 512 pgp_num 512 pg_num_target 256
> > pgp_num_target 256 autoscale_mode warn last_change 84769 lfor
> > 0/0/55294 flags hashpspool,nodelete,selfmanaged_snaps stripe_width 0
> > application rbd
> >
> > The pgp_num_target is set, but pgp_num is not.
> >
> > I had scaled out new OSDs and backfilling was in progress before I set
> > the value; could that be the reason?
> >
>
>
>
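To make the two-step procedure above concrete, a minimal sketch (the pool name is a placeholder; note that on Nautilus and later the pgp_num increase is applied gradually via pgp_num_target and is throttled while backfill or misplaced objects are in flight, which is consistent with what Norman and Mac are seeing):

```shell
# Step 1: raise the PG count.
ceph osd pool set mypool pg_num 2048

# Step 2: raise the placement count to match; rebalancing only starts
# after this. On Nautilus+ this sets pgp_num_target and the mons walk
# pgp_num up gradually, pausing while backfill keeps the misplaced
# ratio above the configured limit.
ceph osd pool set mypool pgp_num 2048

# Check progress: pgp_num should converge toward pgp_num_target
# as recovery completes.
ceph osd pool get mypool pgp_num
ceph osd dump | grep "'mypool'"
```

So "pgp_num_target set but pgp_num unchanged" during heavy backfill is expected behaviour rather than a failure; pgp_num should catch up once the cluster settles.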
The best F/OSS conference in the southern hemisphere is back again,
virtualized, January 23-25. The CFP is open until November 6. Submit
early, submit often! ;-)
-------- Forwarded Message --------
Subject: [lca-announce] linux.conf.au 2021 - Call for Sessions and
Miniconfs Open
Date: Thu, 15 Oct 2020 20:43:24 +1000
From: linux.conf.au Announcements <lca-announce(a)lists.linux.org.au>
Reply-To: lca-announce(a)lists.linux.org.au
To: lca-announce(a)lists.linux.org.au
We're excited! The linux.conf.au 2021 Call for Sessions and Call for
Miniconfs are now open. They will stay open until 6th November 2020
Anywhere on Earth
(AoE) (https://en.wikipedia.org/wiki/Anywhere_on_Earth). This is only 3
weeks away - so don't delay, get your talks in early!
Our theme is "So what's next?".
We all know we're living through unprecedented change and uncertain
times. How can open source play a role in creating, helping and adapting
to this ongoing change? What new developments in software and coding can
we look forward to in 2021 and beyond?
If you have ideas or developments you'd like to share with the open
source community at linux.conf.au, we'd love to hear from you.
Call for Sessions
The main conference runs on Sunday 24 and Monday 25 January, with
multiple streams catering for a wide range of interest areas.
We invite you to submit a session
(https://linux.conf.au/programme/sessions/) proposal for a talk. Talks
are generally 35-45 minute presentations on a single topic presented in
lecture format.
Call for Miniconfs
Miniconfs are dedicated day-long streams focusing on single topics,
creating a more immersive experience for delegates than a session. We
encourage you to get creative with how you could deliver your Miniconf
virtually!
Running a Miniconf (https://linux.conf.au/programme/miniconfs/) is a
great way to gain experience, provide exposure for your project or
topic, and raise your professional profile. They're a crowd favourite
and an awesome way to kick off the conference.
Miniconfs will run on Saturday 23 January, before the main conference
commences on Sunday.
No need to book flights or hotels
Don't forget: the conference will be a fully online, virtual experience.
This means our speakers will be beaming in from their own homes or
workplaces. The organising team will be able to help speakers with their
tech set-up. Each accepted presenter will have a tech check prior to the
event to smooth out any difficulties and there will be an option to
pre-record presentations.
Have we piqued your interest?
You can find out how to submit your session or miniconf proposals at
https://linux.conf.au/programme/proposals/.
If you have any other questions you can contact us via email at
contact(a)lca2021.linux.org.au.
Timeline
16th October: Call for sessions opens
6th November: Call for sessions closes
January 23 2021: Miniconfs!
January 24-25 2021: Main conference presentations
Tickets will go on sale in the coming weeks. We'll keep you posted.
We're looking forward to reading your submissions.
linux.conf.au 2021 Organising Team
About linux.conf.au 2021
Running since 1999, linux.conf.au is the largest Linux and open source
conference in the Asia-Pacific region. The conference provides deeply
technical presentations from industry leaders and experts on a wide
array of subjects relating to open source projects, data and open
government and community engagement.
---
Read this online at
https://lca2021.linux.org.au/news/call-for-sessions-miniconfs-open/
_______________________________________________
lca-announce mailing list
lca-announce(a)lists.linux.org.au
http://lists.linux.org.au/mailman/listinfo/lca-announce
On 2020-10-20 23:17, Anthony D'Atri wrote:
>> On Oct 20, 2020, at 6:23 PM, Ing. Luis Felipe Domínguez Vega
>> <luis.dominguez(a)desoft.cu> wrote:
>>
>> On 2020-10-20 19:33, Anthony D'Atri wrote:
>>> You have a *lot* of peering and recovery going on.
>>> Write a script that monitors available memory on the system and
>>> restarts the OSD process using the most when it crosses some
>>> threshold. Run that on all OSD nodes. OSDs will come up, make some
>>> progress, get restarted, but eventually they’ll sync up.
>>>> On Oct 20, 2020, at 2:57 PM, Ing. Luis Felipe Domínguez Vega
>>>> <luis.dominguez(a)desoft.cu> wrote:
>>>> Hi, today my infra provider had a blackout, and Ceph then tried to
>>>> recover but is in an inconsistent state, because many OSDs cannot
>>>> recover on their own: the kernel kills them via the OOM killer. Even
>>>> one OSD that was previously OK has now gone down, OOM-killed.
>>>> Even on a server with 32 GB of RAM the OSD uses all of it and never
>>>> recovers; I think this could be a memory leak. Ceph version: Octopus
>>>> 15.2.3.
>>>> In: https://pastebin.pl/view/59089adc
>>>> You can see that buffer_anon reaches 32 GB, but why? My whole cluster
>>>> is down because of it.
>> Is that the only solution?
>
> I didn’t say that. If you don’t like it, don’t do it. You can also
> try eras’ idea. You’ve offered almost no detail about your cluster or
> hardware.
>
>
>> there is nothing to limit the OSD resource usage on recover?
Sorry, what is eras' idea? And I can send my cluster info; what do you
need in order to work out a solution? (I'm already running your suggestion
of a script that restarts the OSDs on high memory usage.)
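The watchdog Anthony describes could be sketched roughly like this. Assumptions (not from the thread): systemd-managed OSDs started with the usual `--id` flag, a GNU userland, and an illustrative 24 GiB threshold; it is written as a one-shot check meant to run from cron every minute rather than as a daemon.

```shell
#!/bin/bash
# Restart the ceph-osd process using the most memory once it crosses a
# threshold. Invoke periodically, e.g. "* * * * *" from root's crontab.
THRESHOLD_KB=$((24 * 1024 * 1024))   # 24 GiB, in KiB as ps reports rss

# Largest resident set size among ceph-osd processes, with its PID.
read -r rss pid <<< "$(ps -C ceph-osd -o rss=,pid= --sort=-rss | head -n 1)"

if [ -n "$rss" ] && [ "$rss" -gt "$THRESHOLD_KB" ]; then
    # Extract the OSD id from the command line (assumes the --id flag).
    osd_id=$(ps -p "$pid" -o args= | grep -oP '(?<=--id )\d+')
    if [ -n "$osd_id" ]; then
        logger "osd-watchdog: restarting osd.${osd_id} (rss ${rss} KiB)"
        systemctl restart "ceph-osd@${osd_id}"
    fi
fi
```

The restarted OSD loses its in-progress work, but as Anthony notes, repeated rounds of "come up, make progress, restart" eventually converge.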
Hello, we have integrated Ceph's RGW with LDAP and have authenticated users using the mail attribute successfully. We would like to shift to SSO and are evaluating the new OIDC feature in Ceph together with dexIdP with an LDAP connector as an upstream IdP.
We are trying to understand the flow of user authentication and how it will affect our current LDAP users' buckets, which are already created in Ceph under LDAP users.
Will the Ceph RGW be able to pass the token to be verified to the IdP and what type of user will then be created in Ceph? Is this the intended way of OIDC integration?
Thanks for any assistance
Hello
I'm facing an issue with Ceph: I cannot run any ceph command, it literally hangs. I need to hit CTRL-C to get this:
^CCluster connection interrupted or timed out
This is on Ubuntu 16.04. Also, I use Grafana with Prometheus to get information from the cluster, but now there is no data to graph. Any clue?
cephadm version

INFO:cephadm:Using recent ceph image ceph/ceph:v15 ceph version 15.2.4 (7447c15c6ff58d7fce91843b705a268a1917325c) octopus (stable)
cephadm ls
[
{
"style": "cephadm:v1",
"name": "mon.osswrkprbe001",
"fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
"systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d(a)mon.osswrkprbe001",
"enabled": true,
"state": "running",
"container_id": "afbe6ef76198bf05ec972e832077849d4a4438bd56f2e177aeb9b11146577baf",
"container_image_name": "docker.io/ceph/ceph:v15.2.1",
"container_image_id": "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
"version": "15.2.1",
"started": "2020-10-19T19:03:16.759730",
"created": "2020-09-04T23:30:30.250336",
"deployed": "2020-09-04T23:48:20.956277",
"configured": "2020-09-04T23:48:22.100283"
},
{
"style": "cephadm:v1",
"name": "mgr.osswrkprbe001",
"fsid": "56820176-ae5b-4e58-84a2-442b2fc03e6d",
"systemd_unit": "ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d(a)mgr.osswrkprbe001",
"enabled": true,
"state": "running",
"container_id": "1737b2cf46310025c0ae853c3b48400320fb35b0443f6ab3ef3d6cbb10f460d8",
"container_image_name": "docker.io/ceph/ceph:v15.2.1",
"container_image_id": "bc83a388465f0568dab4501fb7684398dca8b50ca12a342a57f21815721723c2",
"version": "15.2.1",
"started": "2020-10-19T20:43:38.329529",
"created": "2020-09-04T23:30:31.110341",
"deployed": "2020-09-04T23:47:41.604057",
"configured": "2020-09-05T00:00:21.064246"
}
]
Thank you in advance.
Regards,
EMANUEL CASTELLI
Arquitecto de Información - Gerencia OSS
C: (+549) 116707-4107 | Interno: 1325 | T-Phone: 7510-1325 | ecastelli(a)telecentro.net.ar
Lavardén 157 1er piso. CABA (C1437FBC)
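A few checks that can help narrow down a hanging ceph CLI in a cephadm deployment like this one (a sketch only; the fsid and hostnames are taken from the `cephadm ls` output above, and whether the admin socket is reachable from `cephadm shell` is an assumption worth verifying):

```shell
# Fail fast instead of hanging forever while trying to reach the mons.
ceph -s --connect-timeout 15

# Is the mon container actually running? (unit name from cephadm ls)
systemctl status 'ceph-56820176-ae5b-4e58-84a2-442b2fc03e6d@mon.osswrkprbe001'

# Ask the mon directly over its admin socket, bypassing normal
# cluster connectivity, from inside the cephadm shell.
cephadm shell -- ceph daemon mon.osswrkprbe001 mon_status
```

A CLI that hangs until CTRL-C usually means no mon quorum is reachable, so the mon container state and `mon_status` output are the first things to look at.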
Hi,
I have a few existing RBDs, and I would like to create a new RBD image for PostgreSQL. Do you have any suggestions for such a use case? The current defaults are:
Object size (4MB) and Stripe Unit (None)
Features: Deep flatten + Layering + Exclusive Lock + Object Map + Fast Diff
Should I use them as is, or should I use a 16KB object size and a different set of features for PostgreSQL?
Thanks,
Gencer.
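As one data point, an image along the lines Gencer suggests might be created like this (a sketch, not a recommendation: the pool/image names and size are placeholders, and whether a smaller object size actually helps a PostgreSQL workload should be benchmarked before committing to it):

```shell
# 16 KiB objects to sit closer to PostgreSQL's 8 KiB page size; the
# feature set is trimmed to the commonly used ones (object-map and
# fast-diff both depend on exclusive-lock).
rbd create mypool/pgdata --size 200G \
    --object-size 16K \
    --image-feature layering,exclusive-lock,object-map,fast-diff
```

The trade-off is more objects per image (more metadata and potentially more OSD ops) in exchange for less write amplification on small random writes, which is why it needs measuring rather than assuming.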
Hi,
I've received a warning this morning:
HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot of disk space
MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot of disk space
mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB)
mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB)
mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB)
It hit the 15 GB threshold, so I restarted all three mons, which triggered compaction.
I've also run this command on the first node:
ceph tell mon.`hostname -s` compact
but it only went down to 13 GB.
du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/
13G total
Anything else I can do to reduce it?
Luminous 12.2.8 is the version.
Thank you in advance.
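Beyond the on-demand compaction already tried, a couple of options that are sometimes used (a sketch; verify the option names against the Luminous docs, since on 12.2.x mon settings live in ceph.conf rather than the config database):

```shell
# On-demand compaction, per mon (what was already tried):
ceph tell mon.monserver-2c01 compact

# On Luminous, enabling compaction at every mon start goes in ceph.conf:
#   [mon]
#   mon compact on start = true
# then restart the mon to apply it:
systemctl restart ceph-mon@monserver-2c01
```

Note also that mons retain old cluster maps for as long as any PG is not active+clean, so the store may simply refuse to shrink much until the cluster is fully healthy; if the cluster is healthy and the store is still large, compaction is the right lever.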