June 2020 - ceph-users - lists.ceph.io

Radosgw huge traffic to index bucket compared to incoming requests

by Mariusz Gronczewski

Hi, we're using Ceph as S3-compatible storage to serve static files (mostly css/js/images + some videos) and I've noticed that there seem to be huge read amplification for index pool. Incoming traffic magniture is of around 15k req/sec (mostly sub 1MB request but index pool is getting hammered: pool pl-war1.rgw.buckets.index id 10 client io 632 MiB/s rd, 277 KiB/s wr, 129.92k op/s rd, 415 op/s wr pool pl-war1.rgw.buckets.data id 11 client io 4.5 MiB/s rd, 6.8 MiB/s wr, 640 op/s rd, 1.65k op/s wr and is getting order of magnitude more requests running 15.2.3, nothing special in terms of tunning aside from disabling some logging as to not overflow the logs. We've had similar test cluster on 12.x (and way slower hardware) getting similar traffic and haven't observed that magnitude of difference. when enabling debug on affected OSD I only get spam of 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# = 0 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# 2020-06-17T12:35:05.700+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b1a34d8:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.214:head# 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# = 0 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# 2020-06-17T12:35:05.704+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# 2020-06-17T12:35:05.708+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# 2020-06-17T12:35:05.708+0200 7f80694c4700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b0d75b0:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.222:head# 2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) omap_get_header 10.51_head oid #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# = 0 2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# 2020-06-17T12:35:05.716+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# 2020-06-17T12:35:05.720+0200 7f806d4cc700 10 bluestore(/var/lib/ceph/osd/ceph-20) get_omap_iterator 10.51_head #10:8b5ed205:::.dir.88d4f221-0da5-444d-81a8-517771278350.454759.8.151:head# -- Mariusz Gronczewski (XANi) <xani666(a)gmail.com> GnuPG: 0xEA8ACE64 http://devrandom.pl

3 years, 10 months

3
5
0 0

Orchestrator: Cannot add node after mistake

by Simon Sutter

Hello, I did a mistake, while deploying a new node on octopus. The node is a fresh installed Centos8 machine. Bevore I did a "ceph orch host add node08" I pasted the wrong command: ceph orch daemon add osd node08:cl_node08/ceph That did not return anything, so I tried to add the node first with the host add command, but now I get an error: Error ENOENT: New host node08 (node08) failed check: ['Traceback (most recent call last):', ' File "<stdin>", line 4580, in <module>', ' File "<stdin>", line 3592, in command_check_host', "UnboundLocalError: local variable 'container_path' referenced before assignment"] I'm not a developer, so I don't know where to look, and how to fix this. I tried to reboot every node, to see if it's just a cached problem, but no luck there. Do any of you know how to fix this? Thanks in advance, Simon

3 years, 10 months

2
3
0 0

Enable msgr2 mon service restarted

by Amit Ghadge

Hello All, We saw all nodes mon services are restarted at the same time after enabling msgr2, So this make an impact on production running cluster? We are upgrading from Luminous to Nautilus. Thanks, AmitG

3 years, 10 months

1
1
0 0

Can't bind mon to v1 port in Octopus.

by Miguel Afonso

Hi, I have a lab single node cluster with octopus installed via ceph-ansible. Both v1 and v2 were enabled in ceph-ansible vars with the correct suffixes. The configuration was generated correctly and both ports were included in the mon array. [global] cluster network = 172.16.6.0/24 fsid = bb204a5c-957d-4a06-a372-redacted mon_host = [v2:172.16.6.210:3300/0,v1:172.16.6.210:6789/0] mon initial members = aio1 mon_pg_warn_max_per_osd = 0 osd pool default crush rule = -1 osd_pool_default_min_size = 1 osd_pool_default_size = 1 public network = 172.16.6.0/24 I can also see that `ms_bind_msgr1` is enabled in the live config. root@aio1 ~ # ceph daemon mon.aio1 config show | grep msgr "mon_warn_on_msgr2_not_enabled": "true", "ms_bind_msgr1": "true", "ms_bind_msgr2": "true", However only v2 is binding netstat -tlnp |grep mon tcp 0 0 172.16.6.210:3300 0.0.0.0:* LISTEN 2039098/ceph-mon I have a client that only speaks v1 (ceph-csi) that can't talk to the v2 port 2020-06-15T09:49:51.330+0100 7f8776038700 -1 --2- v2:172.16.6.210:3300/0 >> conn(0x563bfd6b2000 0x563bde5ff600 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner peer is using msgr V1 protocol 2020-06-15T09:49:52.258+0100 7f8776038700 -1 --2- v2:172.16.6.210:3300/0 >> conn(0x563bfd6b2000 0x563bde5ff600 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner peer is using msgr V1 protocol What could be the reason for mon not binding to port 6789? Thanks Miguel

3 years, 10 months

3
3
1 0

Re: ceph grafana dashboards: rbd overview empty

by Zhenshi Zhou

Yep, you should also tell mgr that rbd of which pool you wanna export statistics. Follow this, https://ceph.io/rbd/new-in-nautilus-rbd-performance-monitoring/ Marc Roos <M.Roos(a)f1-outsourcing.eu> 于2020年6月12日周五下午10:33写道： > > The grafana dashboard 'rbd overview' is empty. Queries have measurements > 'ceph_rbd_write_ops' that do not exist in prometheus (I think). Should I > enable something more than just 'ceph mgr module enable prometheus' > > I am on Nautilus > > > > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io >

3 years, 10 months

2
1
0 0

ceph mds slow requests

by locallocal

hi,guys. we have a ceph cluster which version is luminous 12.2.13. and Recently we encountered a problem.here are some log infomations: 2020-06-08 12:33:52.706070 7f4097e2d700 0 log_channel(cluster) log [WRN] : slow request 30.518930 seconds old, received at 2020-06-08 12:33:22.186924: client_request(client.48978906:941633993 create #0x100028cab8a/.filename 2020-06-08 12:33:22.197434 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply ... 2020-06-08 13:12:17.826727 7f4097e2d700 0 log_channel(cluster) log [WRN] : slow request 2220.991833 seconds old, received at 2020-06-08 12:35:16.764233: client_request(client.42390705:788369155 create #0x1000224f999/.filename 2020-06-08 12:35:16.774553 caller_uid=0, caller_gid=0{}) currently submit entry: journal_and_reply it looks like mds can't flush journal to osd of meta pool.but the osd type is ssd and the load is very low.this problem leads the client can't mount and the mds can't trim log. Is there anyone have encountered this problem.Please help!

3 years, 10 months

4
7
0 0

Jewel clients on recent cluster

by Christoph Ackermann

Hi all, we have a cluster starting from jewel to octopus nowadays. We would like to enable Upmap but unfortunately there are some old Jewel clients active. We cannot force Upmap by: ceph osd set-require-min-compat-client luminous Because of production state, we must not lose any client. ;-) "client": [ { "features": "0x27018fb86aa42ada", "release": "jewel", "num": 7 }, { "features": "0x3f01cfb8ffadffff", "release": "luminous", "num": 6 } The cluster and all clients are v15.2.3 and my assumtion was, that Centos7 with kernel 3.10 have backported kernel modules. Am i wrong? I also checked Centos7 client with 4.20-ml kernel without success. Clients always appear as Jewel clients... Fresh Centos8 run as Luminous client as expected. BTW: Is there a trick to identify Jewel clients by IP address / Hostname? Thank you much, Christoph -- -- Christoph Ackermann | System Engineer INFOSERVE GmbH | Am Felsbrunnen 15 | D-66119 Saarbrücken Fon +49 (0)681 88008-59 | Fax +49 (0)681 88008-33 | mailto:C.Ackermann@infoserve.de | https://www.infoserve.de INFOSERVE Datenschutzhinweise: https://infoserve.de/datenschutz Handelsregister: Amtsgericht Saarbrücken, HRB 11001 | Erfüllungsort: Saarbrücken Geschäftsführer: Dr. Stefan Leinenbach | Ust-IdNr.: DE168970599

3 years, 10 months

3
3
0 0

OSD heartbeat failure

by neil.ashby-senior＠bt.com

Hi, I have a Luminous (12.2.25) cluster with several OSDs down. The daemons start but they're reporting as down. I did see in some osd logs that heartbeats were failing but when I checked the ports for the heartbeats were incorrect for that osd, although another osd was listening on that. How does the osd know what ports to ping other osds on? Is there any way to force an update. The reason this happened is because someone took a VM snapshot of this cluster and restored the snapshot so the osds aren't up. I know this isn't a good implementation or a good idea and this will change going forward. Anyway, I was just wondering about the heartbeat issue and whether attempting to ping on the right ports might bring them up. Thanks, Neil.

3 years, 10 months

2
1
0 0

How to force backfill on undersized pgs ?

by Kári Bertilsson

Hello I'm running ceph 14.2.9. During heavy backfilling due to rebalancing one OSD crashed. I want to recover the data from the lost OSD before continuing the backfilling so i out'ed the lost osd and ran "ceph osd set norebalance". But i'm noticing with the norebalance flag set the system does not backfill the undersized PG's. Only the degraded ones. So now i have plenty of undersized PG's and the system is idle. How can i recover the undersized PG's before resuming normal backfilling/rebalancing ? Regards, Kári

3 years, 10 months

2
1
0 0

Re: Calculate recovery time

by 展荣臻（信泰）

you can calculate the difference of count of pg on osd before and after to estimate the amount of data migrated. Using the crush algorithm to calculate the difference of count of pg on osd without having to actually add or remove osd. > Date: Thu, 18 Jun 2020 01:18:30 +0430 > From: Seena Fallah <seenafallah(a)gmail.com> > Subject: [ceph-users] Re: Calculate recovery time > To: Janne Johansson <icepic.dz(a)gmail.com> > Cc: ceph-users <ceph-users(a)ceph.io> > Message-ID: > <CAK3+OmWxDZf_g0Ok5AEgtLWP+EujrwAQjauxx6J=xANmM7xchA(a)mail.gmail.com> > Content-Type: text/plain; charset="UTF-8" > > Yes I know but any point of view for backfill or priority used in Ceph when > recovering? > > On Wed, Jun 17, 2020 at 11:00 AM Janne Johansson <icepic.dz(a)gmail.com> > wrote: > > > Den ons 17 juni 2020 kl 02:14 skrev Seena Fallah <seenafallah(a)gmail.com>: > > > >> Hi all. > >> Is there any way that I could calculate how much time it takes to add > >> OSD to my cluster and get rebalanced or how much it takes to out OSD > >> from my cluster? > >> > > > > This is very dependent on all the variables of a cluster, from controller > > & disk speeds, network speeds, cpu/bus speeds, ram availability and/or ram > > allocation, the amount of copies the PGs and the pools are using, how many > > other OSDs there are in the same crush rules as the missing/new one, how > > full the OSDs are in general and the out:ed on specifically and of course > > on if you have few huge objects in your datasets or if you have millions of > > small ones. On top of that, it would be affected by the amount of client IO > > being done at the same time, and in some small sense, might even depend > > ever so slightly on the ability of the mons to react to changes for its own > > database in case the mons are super slow. > > > > This would probably be why you will not just find a fixed number saying > > "it will always take 5h45m for a 4TB drive". It is a problem that has 10 or > > more dimensions. > > But, you could always just out one. The cluster must be able to handle a > > broken drive, so you might aswell test it now, instead of some weekend > > night before that important database run someone at work needs done. > > > > You will see drives that break at some point, and if your dataset is > > anything like everyone elses the last 50 or so years, your data will grow > > so you just might want to get used to the "replace disk" and "add disk" > > procedures right now. > >

3 years, 10 months

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users June 2020