June 2020 - ceph-users - lists.ceph.io

by Marcelo Miziara

Hello...it's the first time I need to use the lifecycle, and I created a bucket and set it to expire in one day with s3cmd: s3cmd expire --expiry-days=1 s3://bucket The rgw_lifecycle_work_time is set to the default values(00:00-06:00). But I noticed in the rgw logs a lot of messages like: 2020-06-16 00:00:00.311369 7fe2cac87700 0 RGWLC::process() failed to get obj entry lc.8 2020-06-16 00:00:00.311623 7fe2c8c83700 0 RGWLC::process() failed to get obj entry lc.16 2020-06-16 00:00:00.311862 7fe2c6c7f700 0 RGWLC::process() failed to get obj entry lc.4 2020-06-16 00:00:00.319424 7fe2cac87700 0 RGWLC::process() failed to get obj entry lc.10 2020-06-16 00:00:00.319647 7fe2c8c83700 0 RGWLC::process() failed to get obj entry lc.18 2020-06-16 00:00:00.320682 7fe2c6c7f700 0 RGWLC::process() failed to get obj entry lc.16 2020-06-16 00:00:00.327770 7fe2cac87700 0 RGWLC::process() failed to get obj entry lc.6 2020-06-16 00:00:00.328941 7fe2c8c83700 0 RGWLC::process() failed to get obj entry lc.17 2020-06-16 00:00:00.332463 7fe2c6c7f700 0 RGWLC::process() failed to get obj entry lc.20 2020-06-16 00:00:00.336788 7fe2cac87700 0 RGWLC::process() failed to get obj entry lc.1 2020-06-16 00:00:00.336924 7fe2c8c83700 0 RGWLC::process() failed to get obj entry lc.24 2020-06-16 00:00:00.340915 7fe2c6c7f700 0 RGWLC::process() failed to get obj entry lc.2 The object was deleted, but these messages keep appearing. Is it safe to ignore them? For the records, i'm using redhat luminous 12.2.12 Thanks, Marcelo.

3 years, 9 months

2
1
0 0

find rbd locks by client IP

by Void Star Nill

Hello, Is there a way to list all locks held by a client with the given IP address? Also, I read somewhere that removing the lock with "rbd lock rm..." automatically blacklists that client connection. Is that correct? How do I blacklist a client with the given IP address? Thanks, Shridhar

3 years, 9 months

2
2
0 0

Bluestore performance tuning for hdd with nvme db+wal

by Mark Kirkwood

Hi, We have recently added a new storage node to our Luminous (12.2.13) cluster. The prev nodes are all setup as Filestore: e.g 12 osds on hdd (Seagate Constellations) with one NVMe (Intel P4600) journal. With the new guy we decided to introduce Bluestore so it is configured as: (same HW) 12 osd with data on hdd and db + wal on one NVMe. We noticed there are periodic slow requests logged, and the implicated osds are the Bluestore ones 98% of the time! This suggests that we need to tweak our Bluestore settings in some way. Investigating I'm seeing: - A great deal of rocksdb debug info in the logs - perhaps we should tone that down? (debug_rocksdb 4/5 -> 1/5) - We look to have the default cache settings (bluestore_cache_size_hdd|ssd etc), we have memory to increase these - There are some buffered io settings (bluefs_buffered_io, bluestore_default_buffered_write), set to (default) false. Are these safe (or useful) to change? - We have default rocksdb options, should some of these be changed? (bluestore_rocksdb_options, in particular max_background_compactions=2 - should we have less, or more?) Also, anything else we should be looking at? regards Mark

3 years, 9 months

4
6
0 0

[RGW] Space usage vastly overestimated since Octopus upgrade

by Liam Monahan

Hi, Since upgrading from Nautilus 14.2.9 -> Octopus 15.2.3 two weeks ago we are seeing large upticks in the reported size (both space and object count) for a number of our RGW users. It does not seem to be isolated to just one user, so I don't think it's something wrong in the users' usage patterns. Users are hitting their quotas very quickly even though they are not writing anywhere near the reported space usage. Has anyone else seen this happen to them? I'm not sure what the most useful debugging information I could send would be. For example, here is a bucket that all of a sudden reports that it has 18446744073709551615 objects! The actual count should be around 20,000. [root@objproxy01 ~]# radosgw-admin bucket stats --bucket=droot-2020 { "bucket": "droot-2020", "num_shards": 32, "tenant": "", "zonegroup": "29946069-33ce-49b7-b93d-de8c95a0c344", "placement_rule": "default-placement", "explicit_placement": { "data_pool": "", "data_extra_pool": "", "index_pool": "" }, "id": "8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.93433056.64", "marker": "8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.93433056.64", "index_type": "Normal", "owner": "-droot", "ver": "0#12052,1#15700,2#11033,3#11079,4#11521,5#13708,6#12427,7#10442,8#12769,9#11965,10#12820,11#11015,12#12073,13#11741,14#11851,15#124 97,16#10611,17#11652,18#10162,19#13699,20#9519,21#14224,22#13575,23#12635,24#9413,25#11450,26#12700,27#13122,28#10762,29#14674,30#10809,31#1223 2", "master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0 ,27#0,28#0,29#0,30#0,31#0", "mtime": "2020-06-29T15:14:49.363664Z", "creation_time": "2020-02-04T20:36:40.752748Z", "max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#,30#,31#", "usage": { "rgw.none": { "size": 0, "size_actual": 0, "size_utilized": 0, "size_kb": 0, "size_kb_actual": 0, "size_kb_utilized": 0, "num_objects": 18446744073709551615 }, "rgw.main": { "size": 11612169555286, "size_actual": 11612211085312, "size_utilized": 11612169555286, "size_kb": 11340009332, "size_kb_actual": 11340049888, "size_kb_utilized": 11340009332, "num_objects": 20034 }, "rgw.multimeta": { "size": 0, "size_actual": 0, "size_utilized": 0, "size_kb": 0, "size_kb_actual": 0, "size_kb_utilized": 0, "num_objects": 0 } }, "bucket_quota": { "enabled": false, "check_on_raw": false, "max_size": -1, "max_size_kb": 0, "max_objects": -1 } } The user who owns that bucket above is reportedly using 1.3PB of space, but the known usage is 1/10th I would guess of that until we did the upgrade. [root@objproxy01 ~]# radosgw-admin user stats --uid=-droot { "stats": { "size": 1428764900976977, "size_actual": 1428770491326464, "size_utilized": 0, "size_kb": 1395278223611, "size_kb_actual": 1395283682936, "size_kb_utilized": 0, "num_objects": 2604800 }, "last_stats_sync": "2020-06-29T13:42:26.474035Z", "last_stats_update": "2020-06-29T13:42:26.471413Z" } This seems to be happening with may users who actively write data in our Object Store. Any help appreciated! Thanks, Liam University of Maryland Institute for Advanced Computer Studies

3 years, 9 months

3
4
0 0

removing the private cluster network

by Magnus HAGDORN

Hi there, we currently have a ceph cluster with 6 nodes and a public and cluster network. Each node has two bonded 2x1GE network interfaces, one for the public and one for the cluster network. We are planning to upgrade the networking to 10GE. Given the modest size of our cluster we would like to shut down the cluster network. The new 10GE switches will be on the public netowkr. What's the best way achieving this while the cluster is running. Regards magnus The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

3 years, 9 months

2
1
0 0

Balancing request between rados gateway nodes

by Nghia Viet Tran

Hi everyone, We’re working on the Multi-site setup using Rados gateway with the Active-standby mode. The master zone processes all requests from the web application through the load balancer (3 Rados Gateway nodes behind). From our testing, we figured out that three Gateway nodes in backup zone makes a lot of requests into the load balancer after the period of time to check status of new objects and pulling them to the backup zone OSDs if needed. Our problem is that the workload on each Gateway nodes is not balance. Normally, one of the Gateway node is overloaded while others quite free (no incoming data, some of them even didn’t make any requests to load balancer for data checking). The load balancer from the master zone also under the high load since it’s receving data from web application and also sending data into backup zone….[cid:image001.png@01D64F0C.1252B3A0] The workload on 3 Gate ways [cid:image002.png@01D64F0C.1252B3A0] How to reduce the workload on master load balancer and balancing requests into gateway nodes in Backup zone? Many thanks! -- Nghia Viet Tran (Mr) mgm technology partners Vietnam Co. Ltd 7 Phan Châu Trinh Đà Nẵng, Vietnam +84 935905659 nghia.viet.tran(a)mgm-tp.com<mailto:nghia.viet.tran@mgm-tp.com> www.mgm-tp.com<https://www.mgm-tp.com/en/> Visit us on LinkedIn<https://www.linkedin.com/company/mgm-technology-partners-vietnam-co-ltd> and Facebook<https://www.facebook.com/mgmTechnologyPartnersVietnam>! Innovation Implemented. General Director: Frank Müller Registered office: 7 Pasteur, Hải Châu 1, Hải Châu, Đà Nẵng MST/Tax 0401703955

3 years, 9 months

2
1
0 0

Re: [RGW] Space usage vastly overestimated since Octopus upgrade

by Liam Monahan

Thanks, both. That’s a useful observation. I wonder what I can try to get accurate user stats. All of our users are quota-ed, so wrong users stats actually stop them from writing data. Since stats are only updated on write: I have some users who are inactive and their stats are correct. I have other users who have been actively writing. I see users who have up to 55 times the expected vs. actual size. I looped over buckets manually via the Admin Ops API and pulled the stats for all of the user’s buckets and summed these and compared that to the output from “radosgw-admin user stats" I would guess that underflowing counters could be one explanation, but there may be other things going wrong in the stats aggregation... Thanks, Liam > On Jun 30, 2020, at 6:36 AM, EDH - Manuel Rios <mriosfer(a)easydatahost.com> wrote: > > You can ignore rgw.none details, it dont make sense today from our experience > > Still dont know why dev dont cleanup bucket with those rgw.none stats... > > Some of our buckets got it others new ones no. > > > -----Mensaje original----- > De: Janne Johansson <icepic.dz(a)gmail.com> > Enviado el: martes, 30 de junio de 2020 8:40 > Para: Liam Monahan <liam(a)umiacs.umd.edu> > CC: ceph-users <ceph-users(a)ceph.io> > Asunto: [ceph-users] Re: [RGW] Space usage vastly overestimated since Octopus upgrade > > Den mån 29 juni 2020 kl 17:27 skrev Liam Monahan <liam(a)umiacs.umd.edu>: > >> >> For example, here is a bucket that all of a sudden reports that it has >> 18446744073709551615 objects! The actual count should be around 20,000. >> >> "rgw.none": { >> "size": 0, >> "size_actual": 0, >> "size_utilized": 0, >> "size_kb": 0, >> "size_kb_actual": 0, >> "size_kb_utilized": 0, >> "num_objects": 18446744073709551615 >> }, >> > > That number is a small negative 64bit signed value, printed as an unsigned > 64bit integer. > Seems like the counter underflowed. > > 2^64 = 18446744073709551616 > > > -- > May the most significant bit of your life be positive. > _______________________________________________ > ceph-users mailing list -- ceph-users(a)ceph.io > To unsubscribe send an email to ceph-users-leave(a)ceph.io

3 years, 9 months

1
0
0 0

Move WAL/DB to SSD for existing OSD?

by Lindsay Mathieson

Nautilus - Bluestore OSD's created with everything on disk. Now I have some spare SSD's - can I move the location of the existing WAL and/or DB to SSD partitions without recreating the OSD? I suspect not - saw emails from 2018, in the negative :( Failing that - is it difficult to add lvmcache to a osd? -- Lindsay

3 years, 9 months

3
4
0 0

Best practice for object store design

by Szabo, Istvan (Agoda)

Hi, What is the let's say best practice to place haproxy, rgw, mon services in a new cluster? We would like to have a new setup, but unsure how to create a best setup in front of the OSD nodes. Let's say we have 3 mons as ceph suggest it, where should I put haproxy and rados? Should be vm or physical? Thank you the ideas. ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

3 years, 9 months

1
0
0 0

Multisite setup with and without replicated region

by Szabo, Istvan (Agoda)

Hi, It is possible to create a multisite cluster with multiple zones? I'd like to have zone/region which is replicated across DCs, but I want to have without replication as well. Would prefer to use earlier version of ceph, not octopus yet. Thank you ________________________________ This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses.

3 years, 9 months

1
0
0 0

2024

2023

2022

2021

2020

2019

ceph-users June 2020