Hello,
Is there a way to list all locks held by a client with the given IP address?
Also, I read somewhere that removing the lock with "rbd lock rm..."
automatically blacklists that client connection. Is that correct?
How do I blacklist a client with the given IP address?
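For context, these are the commands I have been experimenting with so far (the pool/image name, lock ID, locker and address below are just placeholders):
  rbd lock list mypool/myimage
  rbd lock rm mypool/myimage "auto 123" client.4567
  ceph osd blacklist ls
  ceph osd blacklist add 192.0.2.10:0/0
None of these obviously give me "all locks held by a given client IP", hence the question.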
Thanks,
Shridhar
Hi,
We have recently added a new storage node to our Luminous (12.2.13) cluster. The previous nodes are all set up as Filestore: e.g. 12 osds on hdd (Seagate Constellations) with one NVMe (Intel P4600) journal. With the new node we decided to introduce Bluestore, so it is configured (same HW) as 12 osds with data on hdd and db + wal on one NVMe.
We noticed there are periodic slow requests logged, and the implicated osds are the Bluestore ones 98% of the time! This suggests that we need to tweak our Bluestore settings in some way. Investigating, I'm seeing:
- A great deal of rocksdb debug info in the logs - perhaps we should
tone that down? (debug_rocksdb 4/5 -> 1/5)
- We look to have the default cache settings (bluestore_cache_size_hdd|ssd etc.); we have memory to increase these
- There are some buffered io settings (bluefs_buffered_io,
bluestore_default_buffered_write), set to (default) false. Are these
safe (or useful) to change?
- We have the default rocksdb options; should some of these be changed? (bluestore_rocksdb_options, in particular max_background_compactions=2 - should we have fewer, or more?)
Also, anything else we should be looking at?
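For reference, here is roughly how I have been checking the current values on one of the new osds (osd.36 is just an example ID; the daemon commands are run on the host that osd lives on):
  ceph daemon osd.36 config diff
  ceph daemon osd.36 config get bluestore_cache_size_hdd
  ceph daemon osd.36 config get bluefs_buffered_io
and if we do quieten the rocksdb logging, I assume something like
  ceph tell osd.* injectargs '--debug_rocksdb=1/5'
(plus the matching ceph.conf change) is the way to do it at runtime - corrections welcome.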
regards
Mark
Hi,
Since upgrading from Nautilus 14.2.9 -> Octopus 15.2.3 two weeks ago we are seeing large upticks in the reported size (both space and object count) for a number of our RGW users. It does not seem to be isolated to just one user, so I don't think it's something wrong in the users' usage patterns. Users are hitting their quotas very quickly even though they are not writing anywhere near the reported space usage.
Has anyone else seen this happen to them? I'm not sure what the most useful debugging information I could send would be.
For example, here is a bucket that all of a sudden reports that it has 18446744073709551615 objects! The actual count should be around 20,000.
[root@objproxy01 ~]# radosgw-admin bucket stats --bucket=droot-2020
{
"bucket": "droot-2020",
"num_shards": 32,
"tenant": "",
"zonegroup": "29946069-33ce-49b7-b93d-de8c95a0c344",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.93433056.64",
"marker": "8b980d5b-23de-41f9-8b14-84a5bbc3f1c9.93433056.64",
"index_type": "Normal",
"owner": "-droot",
"ver": "0#12052,1#15700,2#11033,3#11079,4#11521,5#13708,6#12427,7#10442,8#12769,9#11965,10#12820,11#11015,12#12073,13#11741,14#11851,15#124
97,16#10611,17#11652,18#10162,19#13699,20#9519,21#14224,22#13575,23#12635,24#9413,25#11450,26#12700,27#13122,28#10762,29#14674,30#10809,31#1223
2",
"master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0,11#0,12#0,13#0,14#0,15#0,16#0,17#0,18#0,19#0,20#0,21#0,22#0,23#0,24#0,25#0,26#0
,27#0,28#0,29#0,30#0,31#0",
"mtime": "2020-06-29T15:14:49.363664Z",
"creation_time": "2020-02-04T20:36:40.752748Z",
"max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#,11#,12#,13#,14#,15#,16#,17#,18#,19#,20#,21#,22#,23#,24#,25#,26#,27#,28#,29#,30#,31#",
"usage": {
"rgw.none": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 18446744073709551615
},
"rgw.main": {
"size": 11612169555286,
"size_actual": 11612211085312,
"size_utilized": 11612169555286,
"size_kb": 11340009332,
"size_kb_actual": 11340049888,
"size_kb_utilized": 11340009332,
"num_objects": 20034
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 0
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}
The user who owns that bucket above is reportedly using 1.3 PB of space, but the known usage was, I would guess, about 1/10th of that until we did the upgrade.
[root@objproxy01 ~]# radosgw-admin user stats --uid=-droot
{
"stats": {
"size": 1428764900976977,
"size_actual": 1428770491326464,
"size_utilized": 0,
"size_kb": 1395278223611,
"size_kb_actual": 1395283682936,
"size_kb_utilized": 0,
"num_objects": 2604800
},
"last_stats_sync": "2020-06-29T13:42:26.474035Z",
"last_stats_update": "2020-06-29T13:42:26.471413Z"
}
This seems to be happening with many users who actively write data in our Object Store. Any help appreciated!
Thanks,
Liam
University of Maryland
Institute for Advanced Computer Studies
Hi there,
we currently have a ceph cluster with 6 nodes and a public and cluster
network. Each node has two bonded 2x1GE network interfaces, one for the
public and one for the cluster network. We are planning to upgrade the
networking to 10GE. Given the modest size of our cluster we would like
to shut down the cluster network. The new 10GE switches will be on the
public network. What's the best way to achieve this while the cluster is running?
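For reference, the relevant part of our ceph.conf currently looks roughly like this (the addresses below are made up):
  [global]
  public network = 192.168.1.0/24
  cluster network = 192.168.2.0/24
My assumption is that it comes down to dropping the "cluster network" line and restarting the OSDs, but I'd like to know how to sequence that safely on a running cluster.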
Regards
magnus
Hi everyone,
We're working on a multi-site setup using RADOS Gateway in active-standby mode. The master zone processes all requests from the web application through a load balancer (with 3 RADOS Gateway nodes behind it).
From our testing, we found that the three gateway nodes in the backup zone periodically make a lot of requests to the load balancer to check for new objects and pull them to the backup zone OSDs if needed. Our problem is that the workload on the gateway nodes is not balanced. Normally one of the gateway nodes is overloaded while the others are almost idle (no incoming data; some of them don't even make any requests to the load balancer for data checking). The load balancer in the master zone is also under high load, since it is receiving data from the web application and at the same time sending data to the backup zone.
[screenshots: the workload on the 3 gateways]
How can we reduce the workload on the master load balancer and balance requests across the gateway nodes in the backup zone?
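In case it is relevant, this is how we are inspecting the multisite configuration (the zone/zonegroup names below are just examples, not our real ones):
  radosgw-admin zonegroup get --rgw-zonegroup=myzonegroup
  radosgw-admin zone get --rgw-zone=backup
  radosgw-admin sync status
We are mainly looking at the "endpoints" lists in that output, since as far as we understand those are the addresses the gateways use when pulling from the other zone.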
Many thanks!
--
Nghia Viet Tran (Mr)
mgm technology partners Vietnam Co. Ltd
7 Phan Châu Trinh
Đà Nẵng, Vietnam
+84 935905659
nghia.viet.tran(a)mgm-tp.com
www.mgm-tp.com <https://www.mgm-tp.com/en/>
Thanks, both. That's a useful observation. I wonder what I can try to get accurate user stats. All of our users are quota-ed, so wrong user stats actually stop them from writing data. Since stats are only updated on write, I have some users who are inactive and whose stats are correct, and others who have been actively writing and whose reported size is up to 55 times the actual size. I looped over buckets manually via the Admin Ops API, pulled the stats for all of the user's buckets, summed them, and compared the total to the output of "radosgw-admin user stats".
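(For anyone who wants to reproduce the comparison without the Admin Ops API, roughly the same numbers can be pulled with radosgw-admin and jq, e.g.:
  radosgw-admin bucket stats --uid=-droot | jq '[.[].usage["rgw.main"].num_objects // 0] | add'
  radosgw-admin bucket stats --uid=-droot | jq '[.[].usage["rgw.main"].size_kb_actual // 0] | add'
  radosgw-admin user stats --uid=-droot
I am also going to try "radosgw-admin user stats --uid=... --sync-stats" to see whether forcing a recalculation brings the numbers back in line.)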
I would guess that underflowing counters could be one explanation, but there may be other things going wrong in the stats aggregation...
Thanks,
Liam
> On Jun 30, 2020, at 6:36 AM, EDH - Manuel Rios <mriosfer(a)easydatahost.com> wrote:
>
> You can ignore the rgw.none details, they don't make much sense today in our experience.
>
> Still don't know why the devs don't clean up buckets with those rgw.none stats...
>
> Some of our buckets have it, other newer ones don't.
>
>
> -----Original Message-----
> From: Janne Johansson <icepic.dz(a)gmail.com>
> Sent: Tuesday, June 30, 2020 8:40
> To: Liam Monahan <liam(a)umiacs.umd.edu>
> CC: ceph-users <ceph-users(a)ceph.io>
> Subject: [ceph-users] Re: [RGW] Space usage vastly overestimated since Octopus upgrade
>
> On Mon, 29 Jun 2020 at 17:27, Liam Monahan <liam(a)umiacs.umd.edu> wrote:
>
>>
>> For example, here is a bucket that all of a sudden reports that it has
>> 18446744073709551615 objects! The actual count should be around 20,000.
>>
>> "rgw.none": {
>> "size": 0,
>> "size_actual": 0,
>> "size_utilized": 0,
>> "size_kb": 0,
>> "size_kb_actual": 0,
>> "size_kb_utilized": 0,
>> "num_objects": 18446744073709551615
>> },
>>
>
> That number is a small negative 64bit signed value, printed as an unsigned
> 64bit integer.
> Seems like the counter underflowed.
>
> 2^64 = 18446744073709551616
>
>
> --
> May the most significant bit of your life be positive.
Nautilus - Bluestore OSDs created with everything on disk. Now I have some spare SSDs - can I move the location of the existing WAL and/or DB to SSD partitions without recreating the OSD?
I suspect not - saw emails from 2018, in the negative :(
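That said, the ceph-bluestore-tool man page does list bluefs-bdev-new-db / bluefs-bdev-migrate subcommands, so what I am hoping for is roughly (untested; osd.7 and the LV path are just examples):
  systemctl stop ceph-osd@7
  ceph-bluestore-tool bluefs-bdev-new-db --path /var/lib/ceph/osd/ceph-7 --dev-target /dev/ssd-vg/osd7-db
  systemctl start ceph-osd@7
i.e. attaching a new DB (and/or WAL) device to the existing OSD in place, rather than rebuilding it.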
Failing that - is it difficult to add lvmcache to an OSD?
--
Lindsay
Hi,
What is, let's say, the best practice for placing the haproxy, RGW and MON services in a new cluster?
We would like to build a new setup, but we are unsure how best to arrange these services in front of the OSD nodes.
Let's say we have 3 MONs, as Ceph suggests; where should I put haproxy and the RADOS gateways?
Should they be VMs or physical machines?
Thank you for the ideas.
Hi,
Is it possible to create a multisite cluster with multiple zones?
I'd like to have a zone/region which is replicated across DCs, but I would like to have one without replication as well.
We would prefer to use an earlier version of Ceph, not Octopus yet.
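Roughly the layout I have in mind (all names/URLs below are made up, just to illustrate):
  radosgw-admin realm create --rgw-realm=company --default
  radosgw-admin zonegroup create --rgw-zonegroup=replicated --master --default --endpoints=http://rgw-dc1.example.com:8080
  radosgw-admin zone create --rgw-zonegroup=replicated --rgw-zone=dc1 --master --endpoints=http://rgw-dc1.example.com:8080
  radosgw-admin zone create --rgw-zonegroup=replicated --rgw-zone=dc2 --endpoints=http://rgw-dc2.example.com:8080
(with dc2 actually created on the second cluster after a realm pull), plus a separate zone or zonegroup that stays local-only and is never synced. Is that a supported layout?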
Thank you
Hi all.
Is there any way to completely health-check one OSD host or instance?
For example, running rados bench just on that OSD, or doing some checks on the disk and the front and back networks?
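To give an idea of the kind of checks I mean (osd.12, /dev/sdc and the hostname below are just examples):
  ceph tell osd.12 bench
  ceph daemon osd.12 perf dump
  smartctl -a /dev/sdc
  iperf3 -s              # on the OSD host
  iperf3 -c osd-host-1   # from another node, once over the public and once over the cluster network
Is there a single tool or procedure that wraps this up per host/OSD?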
Thanks.