Hi,
All of a sudden, we are experiencing very concerning MON behaviour. We have five MONs, and all of them have thousands to tens of thousands of slow ops, with the oldest one blocking basically indefinitely (at least the timer keeps creeping up). Additionally, the MON stores keep inflating heavily. Under normal circumstances we have about 450-550 MB there; right now it's 27 GB and growing (rapidly).
I tried restarting all MONs, I disabled auto-scaling (just in case) and checked the system load and hardware. I also restarted the MGR and MDS daemons, but to no avail.
Is there any way I can debug this properly? I can’t seem to find how I can actually view what ops are causing this and what client (if any) may be responsible for it.
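For reference, the angles I've found so far (just a sketch, assuming the mon admin sockets under /var/run/ceph are available; I'm not sure these are the right tools):

```shell
# Dump the slow/in-flight ops on one monitor (shows each op's age and type):
ceph daemon mon.$(hostname -s) ops
# List the client sessions currently connected to this monitor:
ceph daemon mon.$(hostname -s) sessions
# Watch how fast the mon store is actually growing on disk:
du -sh /var/lib/ceph/mon/*/store.db
```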
Thanks
Janek
Hi,
I've tried to save some PGs from a dead OSD. Here's what I did: on the same server I picked an OSD that is not really used, stopped it, and imported the PG exported from the dead one.
root@server:~# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-33 --no-mon-config --pgid 44.c0s0 --op export --file ./pg44c0s0
Exporting 44.c0s0 info 44.c0s0( empty local-lis/les=0/0 n=0 ec=192123/175799 lis/c=4865474/4851556 les/c/f=4865475/4851557/0 sis=4865493)
Export successful
root@server:~# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 --no-mon-config --op import --file ./pg44c0s0
get_pg_num_history pg_num_history pg_num_history(e5583546 pg_nums {20={173213=256},21={219434=64},22={220991=64},24={219240=32},25={1446965=128},42={175793=32},43={197388=64},44={192123=512}} deleted_pools )
Importing pgid 44.c0s0
write_pg epoch 4865498 info 44.c0s0( empty local-lis/les=0/0 n=0 ec=192123/175799 lis/c=4865474/4851556 les/c/f=4865475/4851557/0 sis=4865493)
Import successful
I started osd.34 back up, and systemd says the OSD is running, but in the cluster map it is still down :/
root@server:~# systemctl status ceph-osd@34 -l
● ceph-osd@34.service - Ceph object storage daemon osd.34
Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled)
Active: active (running) since Thu 2021-03-18 10:38:00 CET; 8min ago
Process: 45388 ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 34 (code=exited, sta>
Main PID: 45392 (ceph-osd)
Tasks: 60
Memory: 856.2M
CGroup: /system.slice/system-ceph\x2dosd.slice/ceph-osd@34.service
└─45392 /usr/bin/ceph-osd -f --cluster ceph --id 34 --setuser ceph --setgroup ceph
Mar 18 10:38:00 server systemd[1]: Starting Ceph object storage daemon osd.34...
Mar 18 10:38:00 server systemd[1]: Started Ceph object storage daemon osd.34.
Mar 18 10:38:21 server ceph-osd[45392]: 2021-03-18T10:38:21.817+0100 7f41738d5dc0 -1 osd.34 5583546 log_to_mon>
Mar 18 10:38:21 server ceph-osd[45392]: 2021-03-18T10:38:21.825+0100 7f41738d5dc0 -1 osd.34 5583546 mon_cmd_ma>
Any idea?
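In case it helps, these are the checks I'm planning next (a sketch, using osd.34 from the output above):

```shell
ceph osd tree | grep -w 'osd.34'   # is it marked down, and where in the CRUSH map?
ceph osd dump | grep -w 'osd.34'   # any flags such as destroyed or out?
# The systemd log lines above are truncated; the full text may name the cause:
journalctl -u ceph-osd@34 --no-pager -n 50
```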
________________________________
I have a Ceph cluster with 5 nodes: 3 in one building and 2 in the other. I put this information into the CRUSH map so that Ceph places one copy of each object on the nodes of one building and the other copy on the nodes of the other building; that is, I set replicas=2 in order to have the same information in both locations. But I know a Ceph cluster needs half + 1 nodes up to keep quorum. I need at least a manual procedure to recover one of the two buildings if the other goes down, or even if the link between them goes down. I don't need 100% uptime, just a way to block and unblock some nodes, and to bring the two-node side up if the building that went down is the one with 3 nodes.
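As a starting point for such a manual procedure, my understanding is that the surviving monitors can be forced into quorum by editing the monmap. A rough, untested sketch (mon IDs a-e are placeholders, with a/b in the surviving building):

```shell
systemctl stop ceph-mon@a                       # stop a surviving mon first
ceph-mon -i a --extract-monmap /tmp/monmap      # pull the current monmap from its store
monmaptool /tmp/monmap --rm c --rm d --rm e     # drop the monitors in the unreachable building
ceph-mon -i a --inject-monmap /tmp/monmap       # inject the reduced map
systemctl start ceph-mon@a                      # the remaining mon(s) can now form quorum
```

The inject step would have to be repeated on each surviving mon, and it is destructive, so only for a real disaster.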
Hi,
What use is made of the ident data in the telemetry module? It's
disabled by default, and the docs don't seem to say what it's used for...
Thanks,
Matthew
Hi Guys,
So, new issue (I'm gonna get the hang of this if it kills me :-) ).
I have a working/healthy Ceph (Octopus) cluster (with qemu-img, libvirt,
etc., installed), and an erasure-coded pool called "my_pool". I now need
to create a "my_data" image within the "my_pool" pool. As this is for a
KVM host / block device (hence the qemu-img et al.), I'm attempting to
use qemu-img, so the command I am using is:
```
qemu-img create -f rbd rbd:my_pool/my_data 1T
```
The error message I received was:
```
qemu-img: rbd:my_pool/my_data: error rbd create: Operation not supported
```
So, I tried the 'raw' rbd command:
```
rbd create -s 1T my_pool/my_data
```
and got the error:
```
_add_image_to_directory: error adding image to directory: (95) Operation
not supported
rbd: create error: (95) Operation not supported
```
So I don't believe the issue is with the 'qemu-img' command - but I may
be wrong.
After doing some research I *think* I need to specify a replicated (as
opposed to erasure-coded) pool for my_pool's metadata (e.g.
'my_pool_metadata'), and thus use the command:
```
rbd create -s 1T --data-pool my_pool my_pool_metadata/my_data
```
First Question: Is this correct?
Second Question: What is the qemu-img equivalent command - is it:
```
qemu-img create -f rbd rbd:--data-pool my_pool my_pool_metadata/my_data 1T
```
or something similar?
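For what it's worth, here is what I'm planning to try next based on my reading (pool names as above; completely untested by me): enable overwrites on the EC pool, keep the image metadata in a replicated pool, and let qemu-img pick the data pool up from the client config rather than the command line.

```shell
# Allow RBD to do partial overwrites on the erasure-coded pool:
ceph osd pool set my_pool allow_ec_overwrites true
# Metadata in the replicated pool, data in the EC pool:
rbd create --size 1T --data-pool my_pool my_pool_metadata/my_data
# For qemu-img, set the data pool in ceph.conf instead of on the command line:
#   [client]
#   rbd default data pool = my_pool
qemu-img create -f rbd rbd:my_pool_metadata/my_data 1T
```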
Thanks in advance
Dulux-Oz
Hi all
When setting a quota on a pool (or on a directory in CephFS), is it the amount of client data written, or the client data × the number of replicas, that counts toward the quota?
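For concreteness, these are the two kinds of quota I mean (a sketch; 'mypool' and the mount path are placeholders):

```shell
# Pool quota, set in bytes (100 GB here):
ceph osd pool set-quota mypool max_bytes 107374182400
# CephFS directory quota, set as an xattr on a mounted directory:
setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/somedir
```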
Cheers
A
Hello,
If anybody out there has tried this or thought about it, I'd like to know...
I've been thinking about ways to squeeze as much performance as possible
from the NICs on a Ceph OSD node. The nodes in our cluster (6 x OSD, 3
x MGR/MON/MDS/RGW) each have 2 x 10Gb ports. Currently, one port
is assigned to the front-side network, and one to the back-side
network. However, there are times when the traffic on one side or the
other is more intense and might benefit from a bit more bandwidth.
The idea I had was to bond the two ports together, and to run the
back-side network in a tagged VLAN on the combined 20Gb LACP port. In
order to keep the balance and prevent starvation on either side, it
would be necessary to apply some sort of weighted fair queuing
mechanism via the 'tc' command. The idea is that if the client side
isn't using up its full 10Gb/node and there is a burst of re-balancing
activity, the bandwidth consumed by the back-side traffic could swell to
15Gb or more. Or vice versa.
From what I have read and studied, these algorithms are fairly
responsive to changes in load and would thus adjust rapidly if the
demand from either side suddenly changed.
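Concretely, the kind of thing I have in mind (untested; bond0 and back-side VLAN 100 are assumptions) is an HTB root where each side is guaranteed 10Gb but can borrow up to the full 20Gb when the other is idle:

```shell
tc qdisc add dev bond0 root handle 1: htb default 10
tc class add dev bond0 parent 1:  classid 1:1  htb rate 20gbit
tc class add dev bond0 parent 1:1 classid 1:10 htb rate 10gbit ceil 20gbit  # front side
tc class add dev bond0 parent 1:1 classid 1:20 htb rate 10gbit ceil 20gbit  # back side
# Steer the tagged back-side VLAN into its own class:
tc filter add dev bond0 parent 1: protocol 802.1q flower vlan_id 100 classid 1:20
```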
Maybe this is a crazy idea, or maybe it's really cool. Your thoughts?
Thanks.
-Dave
--
Dave Hall
Binghamton University
kdhall@binghamton.edu
Hi Guys,
Is the below "ceph -s" output normal?
This is a brand-new cluster with (at the moment) a single Monitor and 7
OSDs (each 6 TB) that has no data in it (yet), and yet it's taking
almost a day to "heal itself" after adding in the 2nd OSD.
~~~
  cluster:
    id:     [REDACTED]
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive, 256 pgs incomplete
            Degraded data redundancy: 12 pgs undersized

  services:
    mon: 1 daemons, quorum [REDACTED] (age 22h)
    mgr: [REDACTED](active, since 22h)
    osd: 7 osds: 7 up (since 21h), 7 in (since 21h); 32 remapped pgs

  data:
    pools:   5 pools, 288 pgs
    objects: 7 objects, 0 B
    usage:   7.1 GiB used, 38 TiB / 38 TiB avail
    pgs:     88.889% pgs not active
             6/21 objects misplaced (28.571%)
             256 creating+incomplete
             18  active+clean
             12  active+undersized+remapped
             2   active+clean+remapped

  progress:
    Rebalancing after osd.1 marked in (22h)
      [............................]
    PG autoscaler decreasing pool 1 PGs from 32 to 1 (19h)
      [............................]
~~~
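For reference, these are the commands I've been poking at to see why the 256 PGs stay creating+incomplete (a sketch):

```shell
ceph osd pool ls detail      # size / min_size / crush rule for each pool
ceph osd crush rule dump     # failure domain (e.g. 'host' vs 'osd')
ceph pg dump_stuck inactive  # which PGs are stuck, and on which OSDs
ceph health detail
```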
Thanks in advance
Matthew J
Hi,
I hope someone here can help me out with contact details, an email address, or a phone number for Samsung datacenter SSD support? When I contact the standard Samsung datacenter support, they tell me they are not there to support PM1735 drives.
We are planning a new Ceph cluster and are considering Samsung PM1735 NVMe U.2 SSDs.
Unfortunately the PM1735 is not available with a U.2 interface, but the PM1733 is.
A manager from Samsung once told me that the PM1733 and PM1735 are exactly the same hardware, only provisioned differently, but he did not know whom to ask. Any idea whom I could contact at Samsung, or how to provision a PM1733 (7.6 TB) down to a PM1735 (6.4 TB)?
I want the over-provisioning for better endurance (3 DWPD instead of 1 DWPD).
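In case it's relevant to the provisioning question: my understanding is that over-provisioning can also be done in software via NVMe namespace management, something like the following (a rough, untested sketch; /dev/nvme0 is a placeholder, this destroys all data, and the drive must actually support namespace management):

```shell
nvme id-ctrl /dev/nvme0 | grep -i oacs      # does the controller support ns management?
nvme delete-ns /dev/nvme0 -n 1              # DESTROYS the existing namespace
# ~6.4 TB expressed in 512-byte blocks (illustrative value only):
nvme create-ns /dev/nvme0 --nsze=12500000000 --ncap=12500000000 --flbas=0
nvme attach-ns /dev/nvme0 -n 1 -c 0
```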