Hi,
we have a cluster running Ceph Luminous 12.2.12, RADOS Gateway only
(S3).
The data pool is placed on SAS HDDs (1430 of them) and the rest of the
pools are placed on SSDs (72 of them). There are 72 hosts with the OSD role
(3 rows, 2 racks per row, and 12 hosts per rack). BlueStore, of course.
The question is: how many PGs do we need for default.rgw.meta? Any ideas?
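For reference, the usual starting point is the classic PG formula, sized
against the OSDs that back the pool; a rough sketch for the 72 SSD OSDs,
assuming 3x replication (the result is the aggregate across all pools on
those OSDs, so a lightly used metadata pool normally gets a small share):

# total_pgs ~= (num_osds * 100) / replica_size, rounded to a power of 2
echo $(( 72 * 100 / 3 ))    # prints 2400; nearest power of 2 is 2048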
Example statistics from this cluster:
pool default.rgw.buckets.data id 15
189946/9416348469 objects misplaced (0.002%)
recovery io 13.5MiB/s, 83objects/s
client io 531MiB/s rd, 51.4MiB/s wr, 13.24kop/s rd, 4.84kop/s wr
pool .rgw.root id 16
nothing is going on
pool default.rgw.control id 17
nothing is going on
pool default.rgw.meta id 18
client io 47.0MiB/s rd, 0B/s wr, 57.95kop/s rd, 450op/s wr
pool default.rgw.log id 19
nothing is going on
pool default.rgw.buckets.index id 20
client io 3.12MiB/s rd, 0B/s wr, 3.19kop/s rd, 1.92kop/s wr
Regards,
Jarek
Hi Steve,
I was just about to follow your steps[0] with the ceph-objectstore-tool
(I do not want to remove more snapshots), but I have this error:
pg 17.36 is active+clean+inconsistent, acting [7,29,12]
2019-09-02 14:17:34.175139 7f9b3f061700 -1 log_channel(cluster) log [ERR] : deep-scrub 17.36 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:head : expected clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4 1 missing
I removed the snapshot with snapshot id 4 and did a pg repair, without any
result.
I am trying to understand this command of yours:
ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-229/ --pgid 2.9a6 '{"oid":"rb.0.2479b45.238e1f29","snapid":-2,"hash":2320771494,"max":0,"pool":2,"namespace":"","max":0}'
I think you are getting this info from the --op list, no? And grepping for
the "rbd_data.1f114174b0dc51.0000000000000974" occurrence? I have these
entries on osd.29:
["17.36",{"oid":"rbd_data.1f114174b0dc51.0000000000000974","key":"","sna
pid":63,"hash":1357874486,"max":0,"pool":17,"namespace":"","max":0}]
["17.36",{"oid":"rbd_data.1f114174b0dc51.0000000000000974","key":"","sna
pid":-2,"hash":1357874486,"max":0,"pool":17,"namespace":"","max":0}]
So I guess snapids of -2 are bad? I have actually noticed quite a few
-2 listings in this --op list output, and I do not understand why there are
so many while the cluster is healthy except for this pg 17.36.
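For what it's worth, a sketch of how such listings are typically produced
(the OSD should be stopped first; paths and IDs taken from this thread):

systemctl stop ceph-osd@29
ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-29 \
    --pgid 17.36 --op list | grep rbd_data.1f114174b0dc51.0000000000000974
# Note: snapid -2 is CEPH_NOSNAP, i.e. the head object itself, so one -2
# entry per object is normal; only stale clone entries are suspect.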
[0]
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg47212.html
-----Original Message-----
From: Steve Anthony [mailto:sma310@lehigh.edu]
Sent: vrijdag 16 november 2018 17:44
To: ceph-users(a)lists.ceph.com
Subject: Re: [ceph-users] pg 17.36 is active+clean+inconsistent head
expected clone 1 missing?
Looks similar to a problem I had after several OSDs crashed while
trimming snapshots. In my case, the primary OSD thought the snapshot was
gone, but some of the replicas still had it, so scrubbing flagged it.
First I purged all snapshots and then ran ceph pg repair on the
problematic placement groups. The first time I encountered this, that
action was sufficient to repair the problem. The second time however, I
ended up having to manually remove the snapshot objects.
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027431.html
Once I had done that, repairing the placement group fixed the issue.
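(Roughly, the sequence amounted to the following; the pool/image name is a
placeholder:)

rbd snap purge rbd/someimage    # drop all snapshots of the affected image
ceph pg repair 17.36            # then repair the inconsistent PG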
-Steve
On 11/16/2018 04:00 AM, Marc Roos wrote:
>
>
> I am not sure that is going to work, because I have had this error for
> quite some time, from before I added the 4th node. And on the 3-node
> cluster it was:
>
> osdmap e18970 pg 17.36 (17.36) -> up [9,0,12] acting [9,0,12]
>
> If I understand correctly what you intend to do (moving the data
> around), this was sort of accomplished by adding the 4th node.
>
>
>
> -----Original Message-----
> From: Frank Yu [mailto:flyxiaoyu@gmail.com]
> Sent: vrijdag 16 november 2018 3:51
> To: Marc Roos
> Cc: ceph-users
> Subject: Re: [ceph-users] pg 17.36 is active+clean+inconsistent head
> expected clone 1 missing?
>
> Try to restart osd.29, then use pg repair. If this doesn't work, or the
> error appears again after a while, scan the HDD used for osd.29; there
> may be bad sectors on the disk, in which case just replace it with a
> new one.
>
>
>
> On Thu, Nov 15, 2018 at 5:00 PM Marc Roos <M.Roos(a)f1-outsourcing.eu>
> wrote:
>
>
>
> Forgot, these are bluestore osds
>
>
>
> -----Original Message-----
> From: Marc Roos
> Sent: donderdag 15 november 2018 9:59
> To: ceph-users
> Subject: [ceph-users] pg 17.36 is active+clean+inconsistent head
> expected clone 1 missing?
>
>
>
> I thought I would give it another try, asking again here since there is
> another current thread. I have been having this error for a year or so.
>
> This I of course already tried:
> ceph pg deep-scrub 17.36
> ceph pg repair 17.36
>
>
> [@c01 ~]# rados list-inconsistent-obj 17.36
> {"epoch":24363,"inconsistents":[]}
>
>
> [@c01 ~]# ceph pg map 17.36
> osdmap e24380 pg 17.36 (17.36) -> up [29,12,6] acting [29,12,6]
>
>
> [@c04 ceph]# zgrep ERR ceph-osd.29.log*gz
> ceph-osd.29.log-20181114.gz:2018-11-13 14:19:55.766604 7f25a05b1700 -1
> log_channel(cluster) log [ERR] : deep-scrub 17.36
> 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:head expected
> clone 17:6ca1f70a:::rbd_data.1f114174b0dc51.0000000000000974:4 1 missing
> ceph-osd.29.log-20181114.gz:2018-11-13 14:24:55.943454 7f25a05b1700 -1
> log_channel(cluster) log [ERR] : 17.36 deep-scrub 1 errors
>
>
--
Steve Anthony
LTS HPC Senior Analyst
Lehigh University
sma310(a)lehigh.edu
Hi,
I have come across this issue on one of our Ceph storage nodes (in a
production OpenStack).
Now I would like to redeploy Ceph and start the OSD on this node again
with the same version.
I will try to upgrade to a newer version after increasing the capacity of
the cluster storage and bringing the Ceph health status back to "OK".
Has anyone used kolla-ansible to deploy or add a new node to an existing
Ceph storage cluster? Please let me know the steps.
What I have tried:
1. Ran the Ceph playbook from kolla-ansible: $ kolla-ansible
bootstrap-servers -i <path/multinode> --limit nodeXX.maas
Output: success
2. Unable to get through the "prechecks" as the Docker daemon is not
running.
3. Could not start Docker. I uninstalled Docker, but when reinstalling it
via docker-install/site.yml it would neither install nor start, because it
is looking for dependencies of the OSD device (disk).
4. Trying to uninstall Docker completely from the physical node, install
it from scratch, and re-set up Ceph on this node.
Has anyone tried to re-set up a Ceph OSD on an existing Ceph storage
cluster?
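For reference, the usual kolla-ansible sequence for (re)adding a node looks
roughly like this; the inventory path and node name are placeholders:

kolla-ansible bootstrap-servers -i <path/multinode> --limit nodeXX.maas
kolla-ansible prechecks -i <path/multinode> --limit nodeXX.maas
kolla-ansible deploy -i <path/multinode> --limit nodeXX.maas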
Thanks,
Reddi Prasad YENDLURI
Cloud Specialist
M +65 8345 9599 | D +65 6220 9908
Office: 51B Circular Road Singapore 049406
I have an EC RBD pool that I want to add compression to.
I have the metadata pool and the data pool; do I need to enable compression on both for it to function correctly, or only on one pool?
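(For reference, BlueStore compression is a per-pool setting; a sketch of
enabling it, with a placeholder pool name:)

ceph osd pool set ecpool compression_algorithm snappy
ceph osd pool set ecpool compression_mode aggressive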
Thanks
Hi,
my ceph cluster is recovering / rebalancing PGs since some OSDs are
marked out.
Question:
Which network is used for this task by default?
Is there any configuration for the network to be used in
/etc/ceph/ceph.conf?
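(For reference: replication/recovery traffic between OSDs uses the cluster
network when one is configured, and falls back to the public network
otherwise. A sketch, with example subnets:)

[global]
public network = 10.0.1.0/24     # client and monitor traffic
cluster network = 10.0.2.0/24    # replication, recovery, rebalancing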
THX
Hey,
What is the current status of Kinetic KV support in Ceph?
I'm asking because:
https://www.crn.com.au/news/seagate-quietly-bins-open-storage-project-519345 ..
and the fact that kinetic-cpp-client hasn't been updated in four years and
only compiles against OpenSSL 1.0.2, which will become EOL by the end of
2019.
Or am I totally wrong? ^
Thank you in advance for your reply,
/Johan
Hi folks,
Originally our osd tree looked like this:
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 2073.15186 root default
-14 176.63100 rack s01-rack
-19 176.63100 host s01
<snip osds>
-15 171.29900 rack s02-rack
-20 171.29900 host s02
<snip osds>
etc. You get the idea. It is a legacy layout, since we have been upgrading
this cluster since probably Firefly and started with far less hardware.
The crush rule was set up like this originally:
step take default
step chooseleaf firstn 0 type rack
which we have modified to
step take default
step chooseleaf firstn 0 type host
taking advantage of chooseleaf's behavior (i.e. searching in depth instead
of just a single level).
Now we thought we could get rid of the rack buckets simply by moving the
host buckets to the root using "ceph osd crush move s01 root=default";
however, this resulted in a bunch of data movement.
Swapping the IDs manually in the crushmap seems to work (verified via
crushtool's --compare), e.g. changing the ID of s01 to s01-rack's and vice
versa, including all shadow trees.
Looking around I saw that there is a swap-bucket command, but it swaps
only the bucket contents, not the IDs, so it would result in data movement.
Other than manually editing the crushmap, is there a better way to achieve
this? Is this approach optimal?
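(For the record, the manual edit-and-verify cycle looks roughly like this;
file names are arbitrary:)

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt              # decompile; swap IDs by hand
crushtool -c crush.txt -o crush-new.bin          # recompile the edited map
crushtool -i crush-new.bin --compare crush.bin   # verify mappings are unchanged
ceph osd setcrushmap -i crush-new.bin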
Cheers,
Zoltan
Hi,
I recently upgraded my cluster from 12.2 to 14.2 and I'm having some
trouble getting the mgr dashboards for grafana working.
I setup Prometheus and Grafana per
https://docs.ceph.com/docs/nautilus/mgr/prometheus/#mgr-prometheus
However, for the osd disk performance statistics graphs on the host details
dashboard I'm getting the following error:
"found duplicate series for the match group {device="dm-5",
instance=":9100"} on the right hand-side of the operation:
[{name="ceph_disk_occupation", ceph_daemon="osd.13", db_device="/dev/dm-8",
device="dm-5", instance=":9100", job="ceph"}, {name="ceph_disk_occupation",
ceph_daemon="osd.15", db_device="/dev/dm-10", device="dm-5",
instance=":9100", job="ceph"}];many-to-many matching not allowed: matching
labels must be unique on one side"
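(For context, those dashboard panels join node_exporter metrics against
ceph_disk_occupation on the (instance, device) label pair, roughly like the
sketch below, which is why that pair must be unique on the ceph side; the
metric name and query are illustrative, not the exact dashboard query:)

rate(node_disk_io_time_seconds_total[5m])
  * on (instance, device) group_left (ceph_daemon)
  ceph_disk_occupation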
This also happens on the following graphs:
Host Overview/AVG Disk Utilization
Host Details/OSD Disk Performance Statistics/*
Also the following graphs show no data points:
OSD Details/Physical Device Performance/*
prometheus version: 2.12.0
node exporter: 0.15.2
grafana version: 6.3.3
Note that my OSDs all have separate data and RocksDB devices. I have also
upgraded all the OSDs to Nautilus via ceph-bluestore-tool repair.
Any idea what's needed to fix this?
Thanks
Below are the Prometheus config files.
prometheus.yml
global:
  scrape_interval: 5s
  evaluation_interval: 5s
scrape_configs:
  - job_name: 'node'
    file_sd_configs:
      - files:
          - node_targets.yml
  - job_name: 'ceph'
    honor_labels: true
    file_sd_configs:
      - files:
          - ceph_targets.yml
----
node_targets.yml:
[
  {
    "targets": [ "nas-osd-01:9100" ],
    "labels": {
      "instance": "nas-osd-01"
    }
  },
  {
    "targets": [ "nas-osd-02:9100" ],
    "labels": {
      "instance": "nas-osd-02"
    }
  },
  {
    "targets": [ "nas-osd-02:9100" ],
    "labels": {
      "instance": "nas-osd-03"
    }
  }
]
---
ceph_targets.yml:
[
  {
    "targets": [ "nas-osd-01:9283" ],
    "labels": {}
  },
  {
    "targets": [ "nas-osd-02:9283" ],
    "labels": {}
  },
  {
    "targets": [ "nas-osd-03:9283" ],
    "labels": {}
  }
]