Hello,
I was looking for an official announcement of the Octopus release, as the
latest update on the subject (back in Q3/2019) said it was scheduled for
March 1st.
Any updates on that?
BR,
--
Alex Chalkias
Product Manager
alex.chalkias(a)canonical.com
+33 766599367
Canonical | Ubuntu
Hello all,
I'm maintaining a small Nautilus 12-OSD cluster (36 TB raw). My mon nodes have the mgr/mds co-located/stacked with the mon. Each is allocated 10 GB of RAM.
During a recent single disk failure and the corresponding recovery, I noticed my mgrs/mons were getting OOM-killed/restarted every five hours or so, with the mgr using around 6.5 GB on all my nodes. My monitoring shows an interesting sawtooth pattern, with network usage (100 MB/s at peak), disk storage usage, and disk I/O (up to 300 MB/s against SSDs at peak) all increasing in parallel with memory usage.
I know the docs for hardware recommendations say:
> Monitor and manager daemon memory usage generally scales with the size of the cluster. For small clusters, 1-2 GB is generally sufficient. For large clusters, you should provide more (5-10 GB).
Now, I would like to think my cluster is on the small side of things, so I was hoping 10 GB would be enough for the mgr and mon (my OSD nodes are only allocated 32 GB of RAM), but that assumption appears to be false.
So I was wondering how mgrs (and to a lesser extent mons) are expected to scale in terms of memory. Is it the OSD count, the OSD sizes, the number of PGs, etc.? And is there a way to limit the amount of RAM used by these mgrs (it seems the mon_osd_cache_size and rocksdb_cache_size settings are for mons, if I'm not mistaken)?
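For reference, these are the knobs I have found so far; treat this as a sketch of what I am considering rather than a known fix, and the values are only placeholders - I am not sure any of these actually affect the mgr:
# check the values the mons currently use
ceph config get mon mon_osd_cache_size
ceph config get mon rocksdb_cache_size
# shrink the mon-side caches (placeholder values)
ceph config set mon mon_osd_cache_size 250
ceph config set mon rocksdb_cache_size 268435456
# the OSDs themselves do have an explicit memory target
ceph config set osd osd_memory_target 3221225472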
Regards,
Mark
I have some OSDs with a bluefs label formatted like this:
{
    "/dev/sdc2": {
        "osd_uuid": "cfb2eaa3-1811-4108-b9bb-aad49555246c",
        "size": 4000681099264,
        "btime": "2017-07-14 14:58:09.627614",
        "description": "main",
        "require_osd_release": "14"
    }
}
And I have some OSDs with a bluefs label formatted like this:
{
    "/dev/sdd2": {
        "osd_uuid": "d8912a1b-696c-4668-9337-c740ec47e0d0",
        "size": 8001457295360,
        "btime": "2018-06-01 18:27:47.760695",
        "description": "main",
        "bluefs": "1",
        "ceph_fsid": "0f1701f5-453a-4a3b-928d-f652a2bbbcb0",
        "kv_backend": "rocksdb",
        "magic": "ceph osd volume v026",
        "mkfs_done": "yes",
        "ready": "ready",
        "require_osd_release": "14",
        "whoami": "18"
    }
}
I assume this is due to a version difference? What are the implications of
having this, and how can I upgrade the labels?
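For reference, I am dumping the labels like this, and I am guessing missing keys could be added with set-label-key (that part is an assumption on my side, done with the OSD stopped):
# dump the label of a given device
ceph-bluestore-tool show-label --dev /dev/sdc2
# presumably missing keys could then be added one by one, e.g.
ceph-bluestore-tool set-label-key --dev /dev/sdc2 -k ceph_fsid -v 0f1701f5-453a-4a3b-928d-f652a2bbbcb0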
Hi
We have a new Mimic (13.2.6, will upgrade to Nautilus next month) cluster
where the RADOS gateway pool currently has many more objects per PG than
the other pools. This leads to a warning in ceph status:
1 pools have many more objects per pg than average
I tried to get rid of this warning by setting, on the command line:
ceph config set mgr mon_pg_warn_max_object_skew 0
I recall that this led to HEALTH_OK, but some time later it was at
HEALTH_WARN again with the same message. I tried setting the value to -1
as well and restarting the mon and mgr daemons. Unfortunately the message stays.
Setting it in ceph.conf does not help either.
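For completeness, the full sequence I tried was roughly the following (reconstructed from memory, so treat it as a sketch):
ceph config set mgr mon_pg_warn_max_object_skew 0
ceph config set mgr mon_pg_warn_max_object_skew -1
# restart the daemons on the mon/mgr hosts
systemctl restart ceph-mon.target
systemctl restart ceph-mgr.target
# plus the same option set in ceph.conf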
I know that the number of objects per PG for the other pools will increase
in the future, but in the meantime monitoring for other incoming health
warnings is hard to automate, so we aim to always have a HEALTH_OK state.
Has anyone had any luck getting rid of the warning by setting
mon_pg_warn_max_object_skew?
Thx
Marcel
Hey guys,
I'm trying to solve some lost+found errors, but when I try to run the
"scan_links" command, it fails.
Any tips?
Cheers!
ceph cluster version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4)
mimic (stable)
cephfs-data-scan version 14.2.7
(3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus (stable)
root@deployer:/etc/ceph# cephfs-data-scan scan_links
2020-03-02 01:52:42.876 7fda150fa700 -1 NetHandler create_socket
couldn't create socket (97) Address family not supported by protocol
2020-03-02 01:52:43.664 7fda23d26f40 -1 datascan.scan_links: Error
getting omap from '10000af78b3.00000000': (2) No such file or directory
Error ((2) No such file or directory)
root@deployer:/etc/ceph# cephfs-data-scan --version
ceph version 14.2.7 (3d58626ebeec02d8385a4cefb92c6cbc3a45bfe8) nautilus
(stable)
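In case it helps, this is how I intend to check whether the object from the error actually exists (the metadata pool name here is just a guess for my setup):
rados -p cephfs_metadata stat 10000af78b3.00000000
rados -p cephfs_metadata listomapvals 10000af78b3.00000000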
We have an inexplicable situation.
We have a Ceph cluster on 14.2.4,
about 14 nodes with 12 disks (4 TB) in each node.
The command ceph df returns the following report:
# ceph df
RAW STORAGE:
    CLASS     SIZE        AVAIL       USED        RAW USED     %RAW USED
    hdd       611 TiB     155 TiB     455 TiB      456 TiB         74.60
    TOTAL     611 TiB     155 TiB     455 TiB      456 TiB         74.60
POOLS:
    POOL            ID     STORED      OBJECTS     USED        %USED     MAX AVAIL
    poolera01       20     238 TiB      84.59M     362 TiB     76.39        75 TiB
    poolera01md     21      17 GiB      58.33k      17 GiB      0.01        37 TiB
    poolera02dt     25      71 TiB      24.56M      92 TiB     45.02        90 TiB
    poolera02md     26     749 MiB      42.23k     1.2 GiB         0        37 TiB
poolera01 - erasure-coded pool, 4+2
poolera02dt - erasure-coded pool, 8+2
poolera01md and poolera02md - both replicated, size 3
We are using two CephFS filesystems.
It seems that all our pools together can only utilize about 111 TiB of raw capacity, but we have 155 TiB available!
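For reference, this is the back-of-the-envelope calculation behind the ~111 TiB figure, taking only the EC overhead into account:
poolera01   (EC 4+2): 75 TiB MAX AVAIL x 6/4  = 112.5 TiB raw
poolera02dt (EC 8+2): 90 TiB MAX AVAIL x 10/8 = 112.5 TiB raw
so both data pools point at roughly the same ~112 TiB of raw space, well below the 155 TiB shown as AVAIL.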
We have enabled the balancer in upmap mode:
# ceph balancer status
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}
Can someone explain why ceph df shows less MAX AVAIL than should be possible given the 155 TiB AVAIL?
I did not have time to convert all drives to LVM yet, so I would like to
stick to using the partition until I have time to change
everything.
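If I do convert later, I assume this would be the rough ceph-volume equivalent of what I did with ceph-disk (just a sketch on my side, not something I have run yet):
# LVM-based replacement for 'ceph-disk prepare --bluestore /dev/sdb'
ceph-volume lvm create --bluestore --data /dev/sdb
# or, to keep the existing ceph-disk partitions under Nautilus:
ceph-volume simple scan /dev/sdb1
ceph-volume simple activate --all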
-----Original Message-----
Sent: 01 March 2020 18:17
Subject: Re: [ceph-users] Is it ok to add a luminous ceph-disk osd to
nautilus still?
So use ceph-volume.
The nautilus release notes explain why.
ceph-disk is not available in Nautilus.
Why scrub first? It is a new disk, not having any data yet. Scrubbing
verifies PGs, does it not?
I just created a VM on the Ceph node where I want to add this OSD, did a
passthrough of the disk, and installed a few RPMs with --nodeps to get the
ceph-disk command.
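If a scrub really is wanted, I assume this is what is meant once the OSD actually holds data (placeholders, just a sketch):
# deep-scrub everything on the new OSD
ceph osd deep-scrub <osd-id>
# or a single PG
ceph pg deep-scrub <pgid>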
-----Original Message-----
Sent: 01 March 2020 17:47
Subject: Re: [ceph-users] Is it ok to add a luminous ceph-disk osd to
nautilus still?
Ensure that it gets scrubbed at least once by Luminous first. But how
and why are you doing this? Why not use Nautilus binaries?
If I create an OSD with Luminous 12.0.3 binaries, can I just add it to
an existing Nautilus cluster?
I sort of did this already, just wondered if there are any drawbacks.
[@test2 software]# ceph-disk prepare --bluestore --zap-disk /dev/sdb
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using
fdisk or other utilities.
Creating new GPT entries.
The operation has completed successfully.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
Setting name!
partNum is 1
REALLY setting name!
The operation has completed successfully.
The operation has completed successfully.
meta-data=/dev/sdb1              isize=2048   agcount=4, agsize=6400 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=0, sparse=0
data     =                       bsize=4096   blocks=25600, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=864, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
The operation has completed successfully.